StAX API
The StAX API exposes methods for iterative, event-based processing of XML documents. XML documents are treated as a filtered series of events, and infoset states can be stored in a procedural fashion. Moreover, unlike SAX, the StAX API is bidirectional, enabling both reading and writing of XML documents.
The StAX API is really two distinct API sets: a cursor API and an iterator API. These two API sets are explained in greater detail later in this chapter, but their main features are briefly described below.
Cursor API
As the name implies, the StAX cursor API represents a cursor with which you can walk an XML document from beginning to end. This cursor can point to one thing at a time, and always moves forward, never backward, usually one infoset element at a time.
The two main cursor interfaces are XMLStreamReader and XMLStreamWriter. XMLStreamReader includes accessor methods for all possible information retrievable from the XML Information model, including document encoding, element names, attributes, namespaces, text nodes, start tags, comments, processing instructions, document boundaries, and so forth; for example:
    public interface XMLStreamReader {
        public int next() throws XMLStreamException;
        public boolean hasNext() throws XMLStreamException;
        public String getText();
        public String getLocalName();
        public String getNamespaceURI();
        // ... other methods not shown
    }
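As a minimal sketch of how these reader methods fit together, the following walks a small document with the cursor and records what it sees at each position. The class name CursorReadSketch, the walk helper, and the sample XML are illustrative assumptions, not part of the StAX API; coalescing is switched on so that adjacent text is reported as a single event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CursorReadSketch {

    // Walks the document once, front to back, and records what the cursor sees.
    static List<String> walk(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask the parser to merge adjacent character data into one event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next(); // advance the cursor by one item
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        seen.add("start:" + reader.getLocalName());
                    } else if (event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()) {
                        seen.add("text:" + reader.getText());
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        seen.add("end:" + reader.getLocalName());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(walk("<Book><Title>Yoga</Title></Book>"));
    }
}
```

The accessor methods only make sense for the current cursor position, which is why each call is guarded by a check on the event code returned by next().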
You can call methods on XMLStreamReader, such as getText and getName, to get data at the current cursor location. XMLStreamWriter provides methods that correspond to StartElement and EndElement event types; for example:
    public interface XMLStreamWriter {
        public void writeStartElement(String localName) throws XMLStreamException;
        public void writeEndElement() throws XMLStreamException;
        public void writeCharacters(String text) throws XMLStreamException;
        // ... other methods not shown
    }
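The writer methods listed above can be sketched in use as follows. This is a small illustration rather than a reference implementation: the class name CursorWriteSketch, the writeTitle helper, and the Book and Title element names are assumptions chosen to match the sample catalogue used later in this section.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class CursorWriteSketch {

    // Writes <Book><Title>...</Title></Book> and returns the markup as a string.
    static String writeTitle(String title) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("Book");
            writer.writeStartElement("Title");
            writer.writeCharacters(title); // markup characters such as & are escaped
            writer.writeEndElement();      // closes Title
            writer.writeEndElement();      // closes Book
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write sample XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeTitle("Yoga"));
    }
}
```

Note that writeEndElement takes no argument: the writer keeps track of the open elements and closes the most recently opened one.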
The cursor API mirrors SAX in many ways. For example, methods are available for directly accessing string and character information, and integer indexes can be used to access attribute and namespace information. As with SAX, the cursor API methods return XML information as strings, which minimizes object allocation requirements.
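The index-based access mentioned above might look like the following sketch, which lists the attributes and namespace declarations of the first element in a document. The class name CursorIndexSketch and the firstElementDetails helper are hypothetical names introduced for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CursorIndexSketch {

    // Lists the attributes and namespace declarations of the first element.
    static List<String> firstElementDetails(String xml) {
        List<String> details = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                // Attributes are addressed by integer index and returned as strings.
                for (int i = 0; i < reader.getAttributeCount(); i++) {
                    details.add("attribute " + reader.getAttributeLocalName(i)
                            + "=" + reader.getAttributeValue(i));
                }
                // Namespace declarations on this element are indexed the same way.
                for (int i = 0; i < reader.getNamespaceCount(); i++) {
                    details.add("namespace " + reader.getNamespaceURI(i));
                }
                break; // only the first element is of interest here
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return details;
    }

    public static void main(String[] args) {
        System.out.println(firstElementDetails(
                "<Cost xmlns=\"http://www.publishing.org\" currency=\"INR\">11.50</Cost>"));
    }
}
```

No attribute or namespace objects are allocated here; every value comes back as a plain string, which is the property the paragraph above describes.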
Iterator API
The StAX iterator API represents an XML document stream as a set of discrete event objects. These events are pulled by the application and provided by the parser in the order in which they are read in the source XML document.
The base iterator interface is called XMLEvent, and there are subinterfaces for each event type listed in Table 3-2, below. The primary parser interface for reading iterator events is XMLEventReader, and the primary interface for writing iterator events is XMLEventWriter. The XMLEventReader interface declares only a handful of methods, the most important of which is nextEvent(), which returns the next event in an XML stream. XMLEventReader extends java.util.Iterator, which means that an XMLEventReader can be passed into routines that work with the standard Java Iterator, and the events it returns can be cached; for example:
    public interface XMLEventReader extends Iterator {
        public XMLEvent nextEvent() throws XMLStreamException;
        public boolean hasNext();
        public XMLEvent peek() throws XMLStreamException;
        // ... other methods not shown
    }
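A short sketch of the reader side, assuming a small in-memory document: nextEvent() consumes an event, while peek() looks at the following one without consuming it. The class name EventReadSketch and the elementNames helper are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class EventReadSketch {

    // Collects element names and notes whether text immediately follows the start tag.
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent(); // pull the next event object
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    XMLEvent upcoming = reader.peek(); // look ahead without consuming
                    boolean textFollows = upcoming != null && upcoming.isCharacters();
                    names.add(start.getName().getLocalPart()
                            + (textFollows ? " (text follows)" : ""));
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<Book><Title>Yoga</Title></Book>"));
    }
}
```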
Similarly, on the output side of the iterator API, you have:
    public interface XMLEventWriter {
        public void flush() throws XMLStreamException;
        public void close() throws XMLStreamException;
        public void add(XMLEvent e) throws XMLStreamException;
        public void add(XMLEventReader reader) throws XMLStreamException;
        // ... other methods not shown
    }
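On the output side, events are usually created with an XMLEventFactory and handed to the writer one at a time. The following is a sketch under that assumption; the class name EventWriteSketch, the writeCost helper, and the Cost element with its currency attribute are illustrative choices borrowed from the sample catalogue later in this section.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

class EventWriteSketch {

    // Builds a single Cost element from discrete event objects.
    static String writeCost(String currency, String amount) {
        StringWriter out = new StringWriter();
        try {
            XMLEventFactory events = XMLEventFactory.newInstance();
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            // Empty prefix and namespace URI: the element is in no namespace.
            writer.add(events.createStartElement("", "", "Cost"));
            // An Attribute is itself an XMLEvent, so it goes through add(XMLEvent).
            writer.add(events.createAttribute("currency", currency));
            writer.add(events.createCharacters(amount));
            writer.add(events.createEndElement("", "", "Cost"));
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write sample XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeCost("INR", "11.50"));
    }
}
```

The attribute event must be added directly after its start element and before any content, because the writer can only attach attributes while the start tag is still open.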
Iterator Event Types
Table 3-2 lists the thirteen XMLEvent types defined in the event iterator API.
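Table 3-2 XMLEvent types
Event Type: Description
StartDocument: Reports the beginning of a set of XML events, including the encoding, XML version, and standalone properties.
StartElement: Reports the start of an element, including any attributes and namespace declarations; also provides access to the prefix, namespace URI, and local name of the start tag.
EndElement: Reports the end tag of an element. Namespaces that have gone out of scope can be recalled here if they were explicitly set on the corresponding StartElement.
Characters: Corresponds to XML CData sections and CharacterData entities. Note that ignorable whitespace and significant whitespace are also reported as Characters events.
EntityReference: Entity references can be reported as discrete events, which the application can then resolve or pass through unresolved. Alternatively, the replacement text can be substituted and reported as Characters.
ProcessingInstruction: Reports the target and data of a processing instruction.
Comment: Returns the text of a comment.
EndDocument: Reports the end of a set of XML events.
DTD: Reports as a java.lang.String information about the DTD, if any, associated with the stream, and provides a method for returning custom objects found in the DTD.
EntityDeclaration: Reports an entity declaration found in the DTD.
NotationDeclaration: Reports a notation declaration found in the DTD.
Attribute: Attributes are generally reported as part of a StartElement event. However, there are times when it is desirable to return an attribute as a standalone Attribute event; for example, when an attribute is returned as the result of an XQuery or XPath expression.
Namespace: As with attributes, namespaces are usually reported as part of a StartElement, but there are times when it is desirable to report a namespace as a discrete Namespace event.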
Note that the DTD, EntityDeclaration, EntityReference, NotationDeclaration, and ProcessingInstruction events are only created if the document being processed contains a DTD.
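Application code typically tells these event types apart either with the isStartElement-style convenience methods or by switching on getEventType(). The sketch below takes the second route and maps each event to the type names used in Table 3-2. The class name EventTypeSketch and its helper methods are assumptions made for illustration, and only a subset of the thirteen types is handled explicitly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class EventTypeSketch {

    // Maps the integer event code to the type names used in the table above.
    static String typeName(XMLEvent event) {
        switch (event.getEventType()) {
            case XMLStreamConstants.START_DOCUMENT: return "StartDocument";
            case XMLStreamConstants.START_ELEMENT: return "StartElement";
            case XMLStreamConstants.END_ELEMENT: return "EndElement";
            case XMLStreamConstants.CHARACTERS: return "Characters";
            case XMLStreamConstants.COMMENT: return "Comment";
            case XMLStreamConstants.PROCESSING_INSTRUCTION: return "ProcessingInstruction";
            case XMLStreamConstants.END_DOCUMENT: return "EndDocument";
            default: return "Other(" + event.getEventType() + ")";
        }
    }

    // Returns the type name of every event in document order.
    static List<String> typeNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                names.add(typeName(reader.nextEvent()));
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(typeNames("<a>hi<!--note--></a>"));
    }
}
```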
Sample Event Mapping
As an example of how the event iterator API maps an XML stream, consider the following XML document:
    <?xml version="1.0"?>
    <BookCatalogue xmlns="http://www.publishing.org">
        <Book>
            <Title>Yogasana Vijnana: the Science of Yoga</Title>
            <ISBN>81-40-34319-4</ISBN>
            <Cost currency="INR">11.50</Cost>
        </Book>
    </BookCatalogue>
This document would be parsed into eighteen primary and secondary events, as shown below. Note that secondary events, shown in curly braces ({}), are typically accessed from a primary event rather than directly.
#. Event: Element/Attribute
1. StartDocument: version="1.0"
2. Characters: isCData = false; data = "\n"; IsWhiteSpace = true
3. StartElement: qname = BookCatalogue:http://www.publishing.org; attributes = null; namespaces = {"BookCatalogue" -> "http://www.publishing.org"}
4. StartElement: qname = Book; attributes = null; namespaces = null
5. StartElement: qname = Title; attributes = null; namespaces = null
6. Characters: isCData = false; data = "Yogasana Vijnana: the Science of Yoga\n\t"; IsWhiteSpace = false
7. EndElement: qname = Title; namespaces = null
8. StartElement: qname = ISBN; attributes = null; namespaces = null
9. Characters: isCData = false; data = "81-40-34319-4\n\t"; IsWhiteSpace = false
10. EndElement: qname = ISBN; namespaces = null
11. StartElement: qname = Cost; attributes = {"currency" -> INR}; namespaces = null
12. Characters: isCData = false; data = "11.50\n\t"; IsWhiteSpace = false
13. EndElement: qname = Cost; namespaces = null
14. Characters: isCData = false; data = "\n"; IsWhiteSpace = true
15. EndElement: qname = Book; namespaces = null
16. Characters: isCData = false; data = "\n"; IsWhiteSpace = true
17. EndElement: qname = BookCatalogue:http://www.publishing.org; namespaces = {"BookCatalogue" -> "http://www.publishing.org"}
18. EndDocument
There are several important things to note in the above example:
- Every StartElement has a corresponding EndElement, even for empty elements.
- Attribute events are treated as secondary events, and are accessed from their corresponding StartElement event.
- Like Attribute events, Namespace events are treated as secondary, but appear twice and are accessible twice in the event stream, first from their corresponding StartElement and then from their corresponding EndElement.
- Characters events are specified for all elements, even if those elements have no character data. Similarly, character data can be split across several Characters events.
- Namespace bindings are exposed through the javax.xml.namespace.NamespaceContext interface, and can be accessed by namespace prefix or URI.
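The notes above on secondary events can be sketched in code as follows: attributes and namespace declarations are pulled from the StartElement that carries them, and the in-scope bindings are queried through the NamespaceContext. The class name SecondaryEventSketch, the describe helper, and the pub prefix in the sample input are assumptions introduced for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.Namespace;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class SecondaryEventSketch {

    // Describes each element together with the secondary events it carries.
    static List<String> describe(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    lines.add("start " + start.getName().getLocalPart());
                    // Attributes are reached through the StartElement, not pulled directly.
                    for (Iterator<?> it = start.getAttributes(); it.hasNext();) {
                        Attribute attribute = (Attribute) it.next();
                        lines.add("attribute " + attribute.getName().getLocalPart()
                                + "=" + attribute.getValue());
                    }
                    // Namespace declarations are reached the same way.
                    for (Iterator<?> it = start.getNamespaces(); it.hasNext();) {
                        Namespace ns = (Namespace) it.next();
                        lines.add("namespace " + ns.getPrefix() + " -> " + ns.getNamespaceURI());
                        // The same binding can be looked up by prefix in the context.
                        lines.add("lookup " + ns.getPrefix() + " -> "
                                + start.getNamespaceContext().getNamespaceURI(ns.getPrefix()));
                    }
                } else if (event.isEndElement()) {
                    lines.add("end " + event.asEndElement().getName().getLocalPart());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<pub:Cost xmlns:pub=\"http://www.publishing.org\" currency=\"INR\">"
                + "11.50</pub:Cost>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

The iterators are declared as Iterator<?> and cast on each step so that the sketch compiles against both the older raw-typed and the newer generic signatures of getAttributes() and getNamespaces().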
Choosing Between Cursor and Iterator APIs
It is reasonable to ask at this point, "Which API should I choose? Should I create instances of XMLStreamReader or XMLEventReader? Why are there two kinds of APIs anyway?"
Development Goals
The authors of the StAX specification targeted three types of developers:
Given these wide-ranging development categories, the StAX authors felt it was more useful to define two small, efficient APIs rather than overloading one larger and necessarily more complex API.
Comparing Cursor and Iterator APIs
Before choosing between the cursor and iterator APIs, you should note a few things that you can do with the iterator API that you cannot do with the cursor API:
- Objects created from the XMLEvent subinterfaces are immutable, can be used in arrays, lists, and maps, and can be passed through your applications even after the parser has moved on to subsequent events.
- You can create subtypes of XMLEvent that are either completely new information items or extensions of existing items with additional methods.
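The first point can be illustrated with a small sketch: events are collected into an ordinary list, the reader is closed, and the stored events are inspected afterwards. The class name EventBufferSketch and both helper methods are hypothetical names used only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class EventBufferSketch {

    // Reads the whole document into a list of event objects.
    static List<XMLEvent> buffer(String xml) {
        List<XMLEvent> events = new ArrayList<>();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                events.add(reader.nextEvent());
            }
            reader.close(); // the parser is finished; the stored events remain usable
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return events;
    }

    // Works purely on the stored events; no parser is involved at this point.
    static List<String> startElementNames(List<XMLEvent> events) {
        List<String> names = new ArrayList<>();
        for (XMLEvent event : events) {
            if (event.isStartElement()) {
                names.add(event.asStartElement().getName().getLocalPart());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<XMLEvent> events = buffer("<Book><Title>Yoga</Title></Book>");
        System.out.println(startElementNames(events));
    }
}
```

With the cursor API the equivalent code would have to copy each value out of the reader before advancing, because the cursor exposes only the current position.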
Similarly, keep some general recommendations in mind when making your choice: