Friday, August 20, 2010

XML parsing with SAX Parser Demo

Java offers two well-known ways of parsing XML: the SAX parser and the DOM parser. The DOM parser holds the complete XML document in memory as a tree of DOM elements. Use this approach when you need to get hold of elements from the XML in any order. The price is memory, because the whole document is loaded however little of it you need. When you have an XML document that you want to read once and turn into an in-memory Java object (POJO), go for the SAX parser. It parses the document from top to bottom, tag by tag.
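For contrast, here is a minimal sketch of the DOM approach described above. The class name DomLookupSketch and the helper firstText are made up for this illustration and are not part of the demo; only standard JDK classes are used. It shows that once the tree is in memory, elements can be looked up in any order.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
class DomLookupSketch {

    /** Returns the text of the first element with the given tag name, or null if there is none. */
    static String firstText(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // The whole document is loaded into memory as a tree at this point.
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Random access: any element can be looked up, in any order.
            Element first = (Element) doc.getElementsByTagName(tagName).item(0);
            return first == null ? null : first.getTextContent();
        } catch (Exception e) {
            // Kept deliberately simple for the sketch.
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

Calling firstText with the employee XML and "department" returns "Development", and a later call with "name" returns "Mohammed". The order of the calls does not matter, which is exactly what SAX cannot offer.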
Using the SAX parser is simple. All we need to do is create a Handler class that extends the DefaultHandler class from the org.xml.sax.helpers package (part of the JDK). In this Handler class we define how tags are processed and stored in the in-memory object.
Once the Handler is written, create a Parser class that uses the abstract SAXParser class from the javax.xml.parsers package (also part of the JDK), and parse the document with the help of the Handler. This example uses a simple XML file; parsing nested, hierarchical XML is covered in another article.
Below is the simple XML file that needs to be parsed.

employee.xml
<?xml version="1.0" encoding="UTF-8"?>
<document>
    <Employee id="1">
        <name>Mohammed</name>
        <department>Development</department>
    </Employee>
</document>
This XML contains one <document> root tag with one <Employee> tag inside it. The Employee has an id attribute and two child tags, <name> and <department>, holding the respective information.
The in-memory object for this Employee is shown below. It is a simple POJO class.
Employee.java
package com.mbm.demo.xml.main;

/**
 * @author Mohammed Bin Mahmood
 */
public class Employee {
    private final int id;
    private String name, department;

    public Employee(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getDepartment() {
        return department;
    }

    public void setDepartment(String department) {
        this.department = department;
    }

    public int getId() {
        return id;
    }

    @Override
    public String toString() {
        return id + ":[" + name + "][" + department + "]";
    }
}
A possible Handler class for the above XML and POJO class is shown below.
Handler.java
package com.mbm.demo.xml.parser;

import java.math.BigDecimal;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import com.mbm.demo.xml.main.Employee;

/**
 * Handler extends DefaultHandler to capture data from XML response through
 * Input stream
 * 
 * @author Mohammed.
 */
public class Handler extends DefaultHandler {
    // logger
    // private static final Logger logger = Logger.getLogger(Handler.class);
    // private static final boolean isDebugEnabled = logger.isDebugEnabled();

    private final StringBuilder buffer = new StringBuilder(128);
    Employee emp;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        // reset the buffer
        buffer.setLength(0);

        if ("Employee".equals(qName)) {
            // start of Employee tag. only attributes values can be obtained
            // while starting.
            if (attributes != null) {
                final int empId = Integer.parseInt(attributes.getValue("id"));
                emp = new Employee(empId);
            }
        }

    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // add characters to the buffer
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localNameX, String qName)
            throws SAXException {
        if ("name".equals(qName)) {
            // closing tag for name, get value for the tag
            emp.setName(getBufferValue());
        } else if ("department".equals(qName)) {
            emp.setDepartment(getBufferValue());
        }
        // clear buffer after processing
        buffer.setLength(0);
    }

    /**
     * Returns the current value of the buffer, or null if it is empty or
     * whitespace. This method also resets the buffer.
     */
    private String getBufferValue() {
        if (buffer.length() == 0)
            return null;
        String value = buffer.toString().trim();
        buffer.setLength(0);
        return value.length() == 0 ? null : value;
    }

    public Employee getEmployee() {
        return emp;
    }

    // --------------UTIL METHODS --------------------
    /**
     * @return proper instance object from the given string. Possible Types are:
     *         Integer, BigDecimal, Boolean and String. null if value is null.
     */
    public static Object getValue(String value) {
        if (value == null)
            return null;
        // empty string
        if (value.length() == 0)
            return value;
        try {
            long l = Long.parseLong(value);
            if (l <= Integer.MAX_VALUE && l >= Integer.MIN_VALUE) {
                // range of integer.
                return Integer.valueOf((int) l);
            } else
                return BigDecimal.valueOf(l);
        } catch (NumberFormatException e) {
            // not a simple number. try double
            try {
                return BigDecimal.valueOf(Double.parseDouble(value));
            } catch (NumberFormatException ex) {
                // not even double. ignore
            }
        }
        final String TRUE = "true", FALSE = "false";
        if (TRUE.equals(value) || FALSE.equals(value)) {
            // boolean value
            return Boolean.valueOf(value);
        }
        return value;
    }
}
Before creating the Parser, let's see what is going on in the Handler class. While the document is parsed, the parser calls methods defined in the DefaultHandler class. DefaultHandler implements several interfaces; the methods below come from the org.xml.sax.ContentHandler interface.
public void startElement (String uri, String localName, String qName, Attributes attributes) throws SAXException;
startElement() is called at the beginning of every element in the XML document, e.g. <Employee id="1">.
The qName parameter gives the name of the tag, and the attributes parameter provides an API to retrieve the attribute values of that tag. See the comments and the code in the startElement method of the Handler, which show how a tag is processed at its beginning and how the object is created from the attribute values.
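As a small sketch of the Attributes API (not part of the demo itself), the hypothetical handler below records every attribute of every element it sees. The class name AttributeDumpSketch and the "element@attribute" key format are assumptions made for this illustration; getLength(), getQName(int) and getValue(int) are standard methods of org.xml.sax.Attributes.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only.
class AttributeDumpSketch extends DefaultHandler {
    // "element@attribute" -> value
    final Map<String, String> seen = new LinkedHashMap<String, String>();

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        // Attributes can be read by index as well as by name.
        for (int i = 0; i < attributes.getLength(); i++) {
            seen.put(qName + "@" + attributes.getQName(i), attributes.getValue(i));
        }
    }

    /** Parses the given XML string and returns all attributes found. */
    static Map<String, String> collect(String xml) {
        AttributeDumpSketch handler = new AttributeDumpSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Kept deliberately simple for the sketch.
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.seen;
    }
}
```

For the employee XML above, collect(...) returns a map in which the key "Employee@id" holds the value "1". Note that attributes.getValue("id") returns null when the attribute is missing, so a real handler should check for that before calling Integer.parseInt.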
public void characters (char ch[], int start, int length) throws SAXException;
The Handler class maintains an internal buffer to capture the text delivered to the characters() method. The parser may deliver the text of a single element in several chunks, so each call appends to the buffer instead of replacing its content. The buffer is cleared at the beginning and at the end of each element, i.e. in startElement() and endElement(), so that text from one tag does not leak into the next.
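The buffering can be tried without a parser at all. The sketch below is an illustration under assumptions: the class TextBufferSketch and its method simulateSplitDelivery are made up, and the callbacks are invoked by hand to imitate a parser that delivers the text of one element in two pieces.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only.
class TextBufferSketch extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    // text of the most recently closed element
    String lastText;

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        // a new element starts: forget any text seen before it
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // append, never assign: this may be only a part of the text
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lastText = buffer.toString().trim();
        buffer.setLength(0);
    }

    /** Imitates a parser that splits the text of one element into two chunks. */
    static String simulateSplitDelivery() {
        TextBufferSketch handler = new TextBufferSketch();
        char[] part1 = "Develop".toCharArray();
        char[] part2 = "ment".toCharArray();
        handler.startElement("", "", "department", null);
        handler.characters(part1, 0, part1.length);
        handler.characters(part2, 0, part2.length);
        handler.endElement("", "", "department");
        return handler.lastText;
    }
}
```

simulateSplitDelivery() returns "Development". A handler that stored only the latest chunk would have ended up with "ment".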
public void endElement (String uri, String localName, String qName) throws SAXException;
This method is called at the closing tag and reads the content of the tag from the buffer. When multiple elements are parsed, the resulting objects are kept in a collection, as Handler2 further below demonstrates.
Let's move on to the Parser class. It does not have to do much. Have a look below.
Parser.java
package com.mbm.demo.xml.parser;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Class to create a parser and parse an XML file
 * 
 * @author Mohammed.
 */
public class Parser {

    private DefaultHandler handler;

    private SAXParser saxParser;

    /**
     * Constructor
     * 
     * @param handler
     *            - DefaultHandler for the SAX parser
     * @throws IOException
     */
    public Parser(DefaultHandler handler) throws IOException {
        // initialize handler
        this.handler = handler;
        // create parser
        create();
    }

    /**
     * Create the SAX parser
     * 
     * @throws IOException
     */
    private void create() throws IOException {
        // Obtain a new instance of a SAXParserFactory.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Specifies that the parser produced by this code will NOT
        // provide support for XML namespaces.
        factory.setNamespaceAware(false);
        // Specifies that the parser produced by this code will NOT validate
        // documents as they are parsed.
        factory.setValidating(false);
        // Creates a new instance of a SAXParser using the currently
        // configured factory parameters.
        try {
            saxParser = factory.newSAXParser();
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    /**
     * Parse a File
     * 
     * @param file
     *            - File
     */
    public void parse(File file) throws IOException, SAXException {
        saxParser.parse(file, handler);
    }

    /**
     * Parse a URI
     * 
     * @param uri
     *            - String
     */
    public void parse(String uri) throws IOException, SAXException {
        saxParser.parse(uri, handler);
    }

    /**
     * Parse a Stream
     * 
     * @param stream
     *            - InputStream
     */
    public void parse(InputStream stream) throws IOException, SAXException {
        saxParser.parse(stream, handler);
    }

    public void parse(InputSource stream) throws IOException, SAXException {
        saxParser.parse(stream, handler);
    }
}
This class takes the Handler created earlier, wraps an instance of SAXParser and provides simplified methods to parse the document. It is a small facade: the caller does not need to obtain a SAXParserFactory instance and then create a new SAXParser by calling factory.newSAXParser(). Simply create this class with a Handler and call one of the parse methods; the rest is taken care of. You do, however, have to close any streams yourself after parsing is done.
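On Java 7 or later, closing the stream can be left to a try-with-resources block. The following is a minimal sketch under assumptions: StreamClosingSketch and CountingHandler are made-up names, the handler merely counts start tags in place of the demo's Handler, and the XML comes from a byte array so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, for illustration only.
class StreamClosingSketch {

    /** Counts start tags; stands in for the Handler of the demo. */
    static class CountingHandler extends DefaultHandler {
        int elements;

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes attributes) {
            elements++;
        }
    }

    /** Parses the given XML bytes and returns the number of elements. */
    static int countElements(byte[] xml) {
        CountingHandler handler = new CountingHandler();
        // The stream is closed automatically, even if parsing fails.
        try (InputStream stream = new ByteArrayInputStream(xml)) {
            SAXParserFactory.newInstance().newSAXParser().parse(stream, handler);
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.elements;
    }
}
```

For the employee.xml shown earlier, countElements returns 4: document, Employee, name and department. The same pattern works with a FileInputStream and replaces the nested try/finally blocks used for closing.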
A sample code for using the above classes and XML file is:
App.java
package com.mbm.demo.xml.main;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

import org.xml.sax.SAXException;

import com.mbm.demo.xml.parser.Parser;
import com.mbm.demo.xml.parser.Handler;
import com.mbm.demo.xml.parser.Handler2;

/**
 * Main class to invoke XML parsing demo
 * 
 * @author Mohammed Bin Mahmood
 */
public class App {
    private static String path = "employee.xml";

    public static void main(String[] args) throws IOException {
        // one employee
        parseEmployee();
        // multiple employees
        parseEmployees();
    }

    private static void parseEmployee() throws FileNotFoundException,
            IOException {
        File xmlFile = new File(path);
        InputStream stream = new FileInputStream(xmlFile);

        Handler handler = new Handler();
        Parser parser = new Parser(handler);

        // try block for handling input stream.
        try {
            // try block for parsing
            try {
                // calling this will trigger respective calls in handler class
                // and handler will hold the data parsed from XML.
                parser.parse(stream);
                final Employee employee = handler.getEmployee();
                System.out.println(employee);
            } catch (SAXException e) {
                e.printStackTrace();
                // handle exception
            } catch (IOException e) {
                e.printStackTrace();
                // handle exception
            }
        } finally {
            try {
                stream.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    private static void parseEmployees() throws FileNotFoundException,
            IOException {
        path = "employees.xml";
        File xml = new File(path);
        InputStream stream = new FileInputStream(xml);

        Handler2 handler = new Handler2();
        Parser parser = new Parser(handler);

        try {
            try {
                // calling this will trigger respective calls in handler class
                // and handler will hold the data parsed from XML.
                parser.parse(stream);
                final List<Employee> employeesList = handler.getEmployeesList();
                System.out.println(employeesList);

                System.out.println(handler.getEmployeesMap());
            } catch (SAXException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
        } finally {
            try {
                stream.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

}
As you may have noticed, the demo class App uses one more handler, named Handler2. This class is a demo for parsing multiple elements. Unlike before, where we had only one Employee, this example has multiple employees, so they are stored in a collection inside the handler. Handler2 demonstrates storing the employees in a simple List as well as in a Map where the key is the employee id and the value is the Employee object itself. Use whichever suits your requirement.
employees.xml
<?xml version="1.0" encoding="UTF-8"?>
<document>
    <Employee id="1">
        <name>Mohammed</name>
        <department>Development</department>
    </Employee>
    <Employee id="2">
        <name>MBM</name>
        <department>Design</department>
    </Employee>
</document>
Handler2.java
package com.mbm.demo.xml.parser;

import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import com.mbm.demo.xml.main.Employee;

/**
 * Handler extends DefaultHandler to capture data from XML response through
 * Input stream. Captures Employees in a list as well as a map, where key is the
 * id of Employee and value is Employee object itself.
 * 
 * @author Mohammed.
 */
public class Handler2 extends DefaultHandler {
    // logger
    // private static final Logger logger = Logger.getLogger(Handler2.class);
    // private static final boolean isDebugEnabled = logger.isDebugEnabled();

    private final StringBuilder buffer = new StringBuilder(128);
    // employees are kept in document order; a LinkedList is used here, but an
    // ArrayList would retain the order just as well.
    private List<Employee> employees = new LinkedList<Employee>();
    // LinkedHashMap keeps insertion order, which getLastObject() relies on; a
    // plain HashMap gives no ordering guarantee.
    private Map<Integer, Employee> employeesMap = new LinkedHashMap<Integer, Employee>();

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        // reset the buffer
        buffer.setLength(0);

        if ("Employee".equals(qName)) {
            // start of Employee tag. only attributes values can be obtained
            // while starting.
            if (attributes != null) {
                final int empId = Integer.parseInt(attributes.getValue("id"));
                Employee emp = new Employee(empId);
                System.out.println("adding employee:" + empId);
                // ADD to List
                employees.add(emp);
                // OR use MAP approach,either of one
                employeesMap.put(empId, emp);
            }
        }

    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // add characters to the buffer
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localNameX, String qName)
            throws SAXException {
        if ("name".equals(qName)) {
            // closing tag for name: the employee being parsed is the last
            // one added to the list
            Employee emp = employees.get(employees.size() - 1);
            // alternative: getLastObject(employeesMap), which is reliable
            // only when the map preserves insertion order
            emp.setName(getBufferValue());
        } else if ("department".equals(qName)) {
            // closing tag for department: same lookup as above
            Employee emp = employees.get(employees.size() - 1);
            emp.setDepartment(getBufferValue());
        }
        // clear buffer after processing
        buffer.setLength(0);
    }

    /**
     * Returns the current value of the buffer, or null if it is empty or
     * whitespace. This method also resets the buffer.
     */
    private String getBufferValue() {
        if (buffer.length() == 0)
            return null;
        String value = buffer.toString().trim();
        buffer.setLength(0);
        return value.length() == 0 ? null : value;
    }

    public List<Employee> getEmployeesList() {
        return employees;
    }

    public Map<Integer, Employee> getEmployeesMap() {
        return employeesMap;
    }

    // --------------UTIL METHODS --------------------
    /**
     * Method to create proper object instance from string.
     * 
     * @return proper instance object from the given string. Possible Types are:
     *         Integer, BigDecimal, Boolean and String. null if value is null.
     */
    public static Object getValue(String value) {
        if (value == null)
            return null;
        // empty string
        if (value.length() == 0)
            return value;
        try {
            long l = Long.parseLong(value);
            if (l <= Integer.MAX_VALUE && l >= Integer.MIN_VALUE) {
                // range of integer.
                return Integer.valueOf((int) l);
            } else
                return BigDecimal.valueOf(l);
        } catch (NumberFormatException e) {
            // not a simple number. try double
            try {
                return BigDecimal.valueOf(Double.parseDouble(value));
            } catch (NumberFormatException ex) {
                // not even double. ignore
            }
        }
        final String TRUE = "true", FALSE = "false";
        if (TRUE.equals(value) || FALSE.equals(value)) {
            // boolean value
            return Boolean.valueOf(value);
        }
        return value;
    }

    public static <T> T getLastObject(Map<?, T> map) {
        if (map == null || map.isEmpty())
            return null;
        T last = null;
        for (T val : map.values()) {
            last = val;
        }
        return last;
    }
}
Do leave your comments or suggestions.

1 comment:

  1. Hi,

    Your post helped in my coding. Thanks a lot..

    Samy,
    http://shinobukaneko.blogspot.com/
