There are several approaches to parsing an XML source, and you should pick the one that fits your needs.
You may choose one of these:
- DOM - Document Object Model
- SAX - Simple API for XML
- StAX - Streaming API for XML
Let's discuss each one.
Parsing with DOM
If you prefer this technique, you should know that the whole XML document is loaded into memory. The advantage of this
technique is that you can navigate to and read any node. You can append, delete, or update a child node because the data
is available in memory. However, if the XML contains a lot of data, loading it into memory is very expensive.
The whole document is also loaded even when you are looking for only one particular thing.
You should consider this technique when you need to alter the XML structure and you are sure that memory
consumption is not going to be a problem. Of the three approaches, it is also the only one that lets you navigate freely between parent and child
elements, which makes it easier to use.
If you are creating an XML document (one that is not big), you should use this technique. However, if you are going to
export data from a database to XML (where you do not need navigation in the XML and/or the data is huge), then you should
consider other approaches.
The DOM API is standardized by the W3C.
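To make the navigation and editing described above concrete, here is a minimal sketch using the JAXP DOM classes that ship with the JDK. The class name DomSketch, the editTree helper, and the sample recipe XML are made up for illustration and are not from the original article.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomSketch {
    // Hypothetical sample data (no whitespace between elements, so every child is an element).
    static final String XML =
        "<recipe><ingredient name=\"flour\" amount=\"2\"/>"
      + "<ingredient name=\"sugar\" amount=\"1\"/></recipe>";

    // Loads the whole document into an in-memory tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Appends a new child and removes the first one; possible because the tree is in memory.
    static void editTree(Document doc) {
        Element root = doc.getDocumentElement();
        Element first = (Element) root.getFirstChild();
        Element salt = doc.createElement("ingredient");
        salt.setAttribute("name", "salt");
        root.appendChild(salt);
        root.removeChild(first);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        NodeList items = doc.getElementsByTagName("ingredient");
        Element first = (Element) items.item(0);

        // Navigate from a child up to its parent.
        System.out.println(first.getParentNode().getNodeName()); // prints "recipe"

        // Navigate sideways to the next sibling.
        Element second = (Element) first.getNextSibling();
        System.out.println(second.getAttribute("name")); // prints "sugar"

        editTree(doc);
        System.out.println(doc.getElementsByTagName("ingredient").getLength());
    }
}
```

Everything here works on the in-memory tree, which is why parent, sibling, and child access are all one method call away. It is also why the cost grows with the size of the document.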
Parsing with SAX
SAX takes a totally different approach. It reads the XML document from beginning to end, but it does not keep the document
in memory. Instead it fires events, and you add event handlers depending on your requirements.
Your event handler is called, for example, when an element begins or ends, or when processing of the document begins or ends.
So you register a handler (or more than one), and the handlers are called when an event occurs.
Here is sample code from a site, which calculates the total amount of flour in the XML.
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.SAXParser; // requires Apache Xerces on the classpath

public class Flour extends DefaultHandler {
    float amount = 0;

    // Called by the parser for every opening tag.
    @Override
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        if ("http://recipes.org".equals(namespaceURI)
                && "ingredient".equals(localName)) {
            String n = atts.getValue("", "name");
            if ("flour".equals(n)) { // null-safe: 'name' may be missing
                String a = atts.getValue("", "amount"); // assume 'amount' exists
                amount = amount + Float.parseFloat(a);
            }
        }
    }

    public static void main(String[] args) {
        Flour f = new Flour();
        SAXParser p = new SAXParser();
        p.setContentHandler(f);
        try {
            p.parse(args[0]); // args[0] is the URI of the XML document
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(f.amount);
    }
}
With SAX, first of all, you do not need to worry about memory consumption. If performance is the main criterion (and you are only reading the XML, not
modifying it), SAX is a much better choice than DOM. However, you do not get a tree structure that you can query for parent or child
elements. You have to keep track of where you are in the document yourself.
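One common way to keep track of where you are is to maintain a stack of open elements inside the handler. The sketch below assumes the JAXP SAX parser bundled with the JDK (not Xerces directly). The class name PathTrackingHandler and the pathsOf helper are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PathTrackingHandler extends DefaultHandler {
    // Names of the elements that are currently open, outermost first.
    private final Deque<String> openElements = new ArrayDeque<>();
    // Path of every element seen so far, in document order.
    final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openElements.addLast(qName);
        paths.add("/" + String.join("/", openElements));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.removeLast();
    }

    // Parses the given XML text and returns the path of each element.
    static List<String> pathsOf(String xml) throws Exception {
        PathTrackingHandler handler = new PathTrackingHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.paths;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data.
        String xml = "<recipe><ingredient><name>flour</name></ingredient></recipe>";
        pathsOf(xml).forEach(System.out::println);
    }
}
```

The stack holds only the current ancestor chain, so memory use depends on the nesting depth rather than on the document size. The paths list is kept here just so the result can be inspected; a real handler would normally act on each event and keep nothing.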
Parsing with StAX
StAX is newer than the other two approaches, and it was specified through the Java Community Process as JSR-173.
Parsing with StAX looks like parsing with SAX. Again, StAX does not keep the document in memory, and the document is read once from beginning to end.
The difference is who drives the parsing. With SAX, the parser calls your event handler when an event occurs. With StAX, your code asks the parser to continue to the next event.
You can use StAX in two ways, the "cursor model" and the "iterator model".
Here is a simple code fragment I found on Google. The "cursor model" looks like this:
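import java.io.InputStream;
import java.net.URL;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Fragment: place inside a method that declares "throws Exception".
URL u = new URL("http://www.cafeconleche.org/");
InputStream in = u.openStream();
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser = factory.createXMLStreamReader(in);
while (true) {
    int event = parser.next(); // we pull the next event ourselves
    if (event == XMLStreamConstants.END_DOCUMENT) {
        parser.close();
        break;
    }
    if (event == XMLStreamConstants.START_ELEMENT) {
        System.out.println(parser.getLocalName());
    }
}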
As you can see above, we request the next event ourselves (parser.next()). In the "iterator model" the logic is the same, but while iterating you receive an object that
contains information about the current event, like this:
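// Needs java.io.FileInputStream, javax.xml.stream.* and javax.xml.stream.events.XMLEvent.
XMLEventReader eventReader = XMLInputFactory.newInstance().createXMLEventReader(
        new FileInputStream("abc.xml"));
while (eventReader.hasNext()) {
    // nextEvent() returns an XMLEvent; next() returns a plain Object.
    XMLEvent event = eventReader.nextEvent();
    // Print the text only if character data directly follows the start tag;
    // a blind cast to Characters would fail on nested elements.
    if (event.isStartElement() && eventReader.peek().isCharacters()) {
        System.out.println(eventReader.nextEvent().asCharacters().getData());
    }
}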
Those were the technologies (APIs). Each of them also has implementations.
After choosing your technology, you can choose an implementation.
Summary
- DOM: tree-based
  - loads the whole XML into memory; you can navigate to and read any node, and you can append, update, and delete child nodes
  - can generate XML
  - if the XML contains a lot of data, loading it into memory is very expensive
- SAX: event-based (push model, observer design pattern)
  - reads the XML from beginning to end without keeping it in memory, so you do not need to worry about memory consumption
  - does not need to parse the whole XML; can stop anywhere when a condition is met
  - can only read XML, cannot modify data
  - cannot access other nodes in the document
- StAX: event-based (pull model, iterator design pattern)
  - reads the XML from beginning to end without keeping it in memory, so you do not need to worry about memory consumption
  - does not need to parse the whole XML; can stop anywhere when a condition is met
  - can be more efficient than SAX, depending on the implementation and the use case
  - can generate XML (through its writer API)
  - cannot modify an existing document in place
  - cannot access other nodes in the document
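Two of the StAX points above, generating XML and stopping early, can be sketched together. The example assumes the StAX implementation bundled with the JDK. The class name StaxWriteAndStop, its two helper methods, and the ingredient data are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class StaxWriteAndStop {
    // Generates a small document with the StAX writer API.
    static String write() throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("recipe");
        for (String name : new String[] {"flour", "sugar", "salt"}) {
            w.writeEmptyElement("ingredient");
            w.writeAttribute("name", name);
        }
        w.writeEndElement();
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    // Pulls events until an ingredient with the wanted name is found, then stops.
    // Returns how many ingredient elements were read, or -1 if there was no match.
    static int elementsReadUntil(String xml, String wanted) throws Exception {
        XMLStreamReader r =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int seen = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "ingredient".equals(r.getLocalName())) {
                    seen++;
                    if (wanted.equals(r.getAttributeValue(null, "name"))) {
                        return seen; // stop pulling: the rest of the document is never read
                    }
                }
            }
            return -1;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = write();
        System.out.println(xml);
        System.out.println(elementsReadUntil(xml, "sugar")); // prints 2
    }
}
```

Stopping is just a matter of no longer calling next(), because your code drives the loop. With SAX the parser drives, so stopping early usually means throwing an exception from the handler.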
from: http://blog.sanaulla.info/2013/05/23/parsing-xml-using-dom-sax-and-stax-parser-in-java