From Wikipedia, the free encyclopedia.
Abstract
This article explains how to use the XPath API of Xalan to access XML data from a Java application. This method requires only a few lines of code, even when the criteria for selecting data are complex. With J2SE 1.4, no external libraries are required.
XPath
XPath is a standard for addressing parts of an XML document. For example, the XPath expression
- report[@severity="warning"][5]
selects the fifth <report> element that has a severity attribute with value "warning".
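To make the predicate semantics concrete, here is a minimal sketch that evaluates this kind of expression from Java. It rests on several assumptions: the sample <log> document and the class name PredicateOrderSketch are invented for illustration, and the sketch uses the standard javax.xml.xpath API (part of Java since version 5) rather than the Xalan classes this article is about, so that it runs on a current JDK without extra libraries.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PredicateOrderSketch {
    // Hypothetical sample data: eight reports, six of them warnings.
    static final String XML =
        "<log>"
        + "<report id='r1' severity='warning'/>"
        + "<report id='r2' severity='error'/>"
        + "<report id='r3' severity='warning'/>"
        + "<report id='r4' severity='warning'/>"
        + "<report id='r5' severity='info'/>"
        + "<report id='r6' severity='warning'/>"
        + "<report id='r7' severity='warning'/>"
        + "<report id='r8' severity='warning'/>"
        + "</log>";

    // Evaluates an XPath expression against the sample document and returns
    // the string value of the result ("" if nothing matches).
    static String evaluate(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Filter by severity first, then take the fifth match.
        System.out.println(evaluate("/log/report[@severity=\"warning\"][5]/@id"));
        // Take the fifth report first, then test its severity.
        System.out.println(evaluate("/log/report[5][@severity=\"warning\"]/@id"));
    }
}
```

With this sample data, the first expression yields r7 (the fifth warning), while the second yields an empty string because the fifth report overall is not a warning. Predicates are applied from left to right, so their order matters.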
Among other things, XPath is used in XSL Transformations (XSLT) to specify the content that a template should be applied to.
Xalan
Xalan is an implementation of an XSLT processor. As such, it also implements XPath.
Xalan is included in J2SE 1.4.
Xalan XPath API
Xalan exposes its XPath implementation through the API defined in the org.apache.xpath package. A good starting point for learning the API is the class XPathAPI.
Example 1
In this example, we will create a small Java application that prints the article titles from Slashdot's RSS feed.
XML Document
The Slashdot RSS feed can be found at http://slashdot.org/index.rss. After removing some parts that are not relevant for this example, it looks like this:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://slashdot.org/">
    <title>Slashdot</title>
    <link>http://slashdot.org/</link>
    <description>News for nerds, stuff that matters</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://slashdot.org/article.pl?sid=04/05/08/2254227" />
        <rdf:li rdf:resource="http://slashdot.org/article.pl?sid=04/05/08/2210224" />
        <rdf:li rdf:resource="http://slashdot.org/article.pl?sid=04/05/08/1747258" />
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://slashdot.org/article.pl?sid=04/05/08/2254227">
    <title>What's Being Done About Nuclear Security</title>
    <link>http://slashdot.org/article.pl?sid=04/05/08/2254227</link>
    <description>KrisCowboy writes "Wired.com has an interesting article ... </description>
    <dc:subject>security</dc:subject>
  </item>
  <item rdf:about="http://slashdot.org/article.pl?sid=04/05/08/2210224">
    <title>Cyber-Soap Returns From The Dead</title>
    <link>http://slashdot.org/article.pl?sid=04/05/08/2210224</link>
    <description>An anonymous reader submits "Back in 1995, an experimental ...</description>
    <dc:subject>ent</dc:subject>
  </item>
  <item rdf:about="http://slashdot.org/article.pl?sid=04/05/08/1747258">
    <title>Phatbot Author Arrested In Germany</title>
    <link>http://slashdot.org/article.pl?sid=04/05/08/1747258</link>
    <description>Tacito writes "After arresting the author of Sasser, the ...</description>
    <dc:subject>security</dc:subject>
  </item>
</rdf:RDF>
Source Code
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.xpath.XPathAPI;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathDemo {

    // URL of Slashdot's RSS feed.
    private static final String URL = "http://slashdot.org/index.rss";

    // XPath expression that selects text content of titles of articles.
    private static final String XPATH = "RDF/item/title/text()";

    public static void main(String[] args) throws Exception {
        // Parse feed into DOM tree.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document feed = factory.newDocumentBuilder().parse(URL);

        // Select article titles into DOM node list.
        NodeList titles = XPathAPI.selectNodeList(feed, XPATH);

        // Iterate over node list and print article titles.
        for (int i = 0; i < titles.getLength(); i++) {
            System.out.println(titles.item(i).getNodeValue());
        }
    }
}
Compilation
The source code compiles with J2SE 1.4. No additional libraries are required.
Output
What's Being Done About Nuclear Security
Cyber-Soap Returns From The Dead
Phatbot Author Arrested In Germany
Explanation
First, the application parses the RSS document into a Document Object Model (DOM) tree. This is done using the J2SE classes DocumentBuilderFactory and DocumentBuilder.
Next, the application uses the Xalan method selectNodeList to select a subset of the DOM nodes. This method takes two inputs:
- The DOM node tree from which to select.
- The XPath expression (a string) that describes which nodes to select.
There are several static select methods in XPathAPI. They differ in the way they return the selected nodes and in the way they handle namespaces.
Lastly, the application iterates over the selected DOM nodes and prints their values.
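The feed declares several namespaces, and namespace handling is one of the points on which selection APIs differ. The following sketch runs the same three steps (parse, select, iterate) with a namespace-aware parser and explicit prefix bindings. It is an illustration under assumptions rather than a drop-in replacement: it uses the standard javax.xml.xpath API instead of XPathAPI, it inlines a shortened copy of the feed instead of fetching the URL, and the class name NamespaceAwareTitles and the prefix rss are our own choices.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceAwareTitles {
    // Shortened copy of the feed, inlined so that no network access is needed.
    static final String FEED =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
        + " xmlns=\"http://purl.org/rss/1.0/\""
        + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
        + "<item><title>What's Being Done About Nuclear Security</title>"
        + "<dc:subject>security</dc:subject></item>"
        + "<item><title>Cyber-Soap Returns From The Dead</title>"
        + "<dc:subject>ent</dc:subject></item>"
        + "<item><title>Phatbot Author Arrested In Germany</title>"
        + "<dc:subject>security</dc:subject></item>"
        + "</rdf:RDF>";

    // Prefixes usable in expressions. "rss" is our own name for the feed's
    // default namespace; XPath 1.0 has no syntax for a default namespace.
    static final Map<String, String> PREFIXES = new HashMap<>();
    static {
        PREFIXES.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        PREFIXES.put("rss", "http://purl.org/rss/1.0/");
        PREFIXES.put("dc", "http://purl.org/dc/elements/1.1/");
    }

    static List<String> select(String expression) {
        try {
            // Step 1: parse the feed into a namespace-aware DOM tree.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document feed = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(FEED)));

            // Step 2: select nodes, resolving prefixes through PREFIXES.
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    String uri = PREFIXES.get(prefix);
                    return uri != null ? uri : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return null; // reverse lookup is not needed here
                }
                public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, feed, XPathConstants.NODESET);

            // Step 3: collect the values of the selected nodes.
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Selection failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        for (String title : select("/rdf:RDF/rss:item/rss:title/text()")) {
            System.out.println(title);
        }
    }
}
```

In a namespace-aware tree, an unprefixed name such as item matches only elements that are in no namespace, which is why the expression in this sketch needs the rss prefix.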
Example 2
In this example, we will print only the titles of those articles that have the "security" subject.
Source Code
We only need to change one line of code:
// XPath expression selecting article titles.
private static final String XPATH = "RDF/item[subject='security']/title/text()";
Output
What's Being Done About Nuclear Security
Phatbot Author Arrested In Germany
Explanation
We were able to express the additional criterion "select only those articles with subject 'security'" by modifying only the XPath expression. The Java logic for parsing, selecting, and printing is unchanged.
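Because the criterion lives in the expression, it can also be supplied at run time. The sketch below binds an XPath variable $subject to a method argument instead of hard-coding 'security'. It is only an illustration under assumptions: it uses the standard javax.xml.xpath API rather than XPathAPI, it works on a simplified, namespace-free copy of the feed, and the names SubjectFilterSketch and titlesFor are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SubjectFilterSketch {
    // Simplified, namespace-free stand-in for the feed (an assumption made
    // to keep the sketch short).
    static final String FEED =
        "<RDF>"
        + "<item><title>What's Being Done About Nuclear Security</title>"
        + "<subject>security</subject></item>"
        + "<item><title>Cyber-Soap Returns From The Dead</title>"
        + "<subject>ent</subject></item>"
        + "<item><title>Phatbot Author Arrested In Germany</title>"
        + "<subject>security</subject></item>"
        + "</RDF>";

    // The selection criterion is written once, with a variable reference.
    static final String XPATH = "/RDF/item[subject=$subject]/title/text()";

    // Returns the titles of all articles with the given subject.
    static List<String> titlesFor(String subject) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $subject to the method argument; other variables stay unbound.
        xpath.setXPathVariableResolver(
            name -> "subject".equals(name.getLocalPart()) ? subject : null);
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                XPATH, new InputSource(new StringReader(FEED)), XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getNodeValue());
            }
            return titles;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Selection failed for subject: " + subject, e);
        }
    }

    public static void main(String[] args) {
        for (String title : titlesFor("security")) {
            System.out.println(title);
        }
    }
}
```

Passing the value through a variable rather than concatenating it into the expression string avoids quoting problems when the value itself contains quote characters.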
Discussion
Advantages
- Only a few lines of Java code are required, because even complex selection criteria can be expressed in XPath rather than in Java code.
- All required classes are included in J2SE 1.4.
Disadvantages
- The Xalan API is not officially part of J2SE 1.4. It is not guaranteed to be included in future J2SE releases.