Using Xalan's XPath API to Access XML Data

This article explains how to use the XPath API of Xalan to access XML data from a Java application. This method requires only a few lines of code, even if the criteria for selecting data are complex. When using J2SE 1.4, no external libraries are required.

XPath

XPath is a standard for addressing parts of an XML document. For example, the XPath expression

report[@severity="warning"][5]

selects the fifth <report> element that has a severity attribute with value "warning".
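To make the effect of the two predicates concrete, here is a minimal sketch that evaluates the expression against a small document. The sample document, its <log> root element, and the class name FifthWarning are assumptions made for illustration. The sketch uses the standard javax.xml.xpath API (part of the JDK since Java 5) rather than Xalan's own classes, so that it runs without extra libraries.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class FifthWarning {

    // Hypothetical sample data: six warnings with two errors mixed in.
    static final String XML =
        "<log>"
        + "<report severity=\"warning\">w1</report>"
        + "<report severity=\"error\">e1</report>"
        + "<report severity=\"warning\">w2</report>"
        + "<report severity=\"warning\">w3</report>"
        + "<report severity=\"error\">e2</report>"
        + "<report severity=\"warning\">w4</report>"
        + "<report severity=\"warning\">w5</report>"
        + "<report severity=\"warning\">w6</report>"
        + "</log>";

    // Returns the text of the fifth <report> whose severity is "warning".
    static String fifthWarning(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The leading "/log/" step is specific to the sample document above.
        String expression = "/log/report[@severity=\"warning\"][5]";
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The first predicate filters by attribute; the second picks by position
        // among the nodes that survived the first.
        System.out.println(fifthWarning(XML)); // prints "w5"
    }
}
```

Note that the fifth <report> element overall is e2; the position predicate counts only the elements that passed the severity test.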

Among other things, XPath is used in XSL Transformations (XSLT) to specify the content that a template should be applied to.

Xalan

Xalan is an implementation of an XSLT processor. As such, it also implements XPath.

Xalan is included in J2SE 1.4.

Xalan XPath API

Xalan exposes its XPath implementation through the API defined in the org.apache.xpath package. A good starting point for learning the API is the class XPathAPI.

Example 1

In this example, we will create a small Java application that prints the article titles from Slashdot's RSS feed.

XML Document

The Slashdot RSS feed can be found at http://slashdot.org/index.rss. After removing some parts that are not relevant for this example, it looks like this:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">

 <channel rdf:about="http://slashdot.org/">
  <title>Slashdot</title>
  <link>http://slashdot.org/</link>
  <description>News for nerds, stuff that matters</description>
 
  <items>
   <rdf:Seq>
    <rdf:li rdf:resource="http://slashdot.org/article.pl?sid=04/05/08/2254227" />
    <rdf:li rdf:resource="http://slashdot.org/article.pl?sid=04/05/08/2210224" />
    <rdf:li rdf:resource="http://slashdot.org/article.pl?sid=04/05/08/1747258" />
   </rdf:Seq>
  </items>
 </channel>

 <item rdf:about="http://slashdot.org/article.pl?sid=04/05/08/2254227">
  <title>What's Being Done About Nuclear Security</title>
  <link>http://slashdot.org/article.pl?sid=04/05/08/2254227</link>
  <description>KrisCowboy writes "Wired.com has an interesting article ... </description>
  <dc:subject>security</dc:subject>
 </item>

 <item rdf:about="http://slashdot.org/article.pl?sid=04/05/08/2210224">
  <title>Cyber-Soap Returns From The Dead</title>
  <link>http://slashdot.org/article.pl?sid=04/05/08/2210224</link>
  <description>An anonymous reader submits "Back in 1995, an experimental ...</description>
  <dc:subject>ent</dc:subject>
 </item>

 <item rdf:about="http://slashdot.org/article.pl?sid=04/05/08/1747258">
  <title>Phatbot Author Arrested In Germany</title>
  <link>http://slashdot.org/article.pl?sid=04/05/08/1747258</link>
  <description>Tacito writes "After arresting the author of Sasser, the ...</description>
  <dc:subject>security</dc:subject>
 </item>
</rdf:RDF>

Source Code

import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.xpath.XPathAPI;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathDemo {

    // URL of Slashdot's RSS feed.
    private static final String URL = "http://slashdot.org/index.rss";
    
    // XPath expression that selects the text content of the article titles.
    private static final String XPATH = "RDF/item/title/text()";

    public static void main(String[] args) throws Exception {
        // Parse feed into DOM tree.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document feed = factory.newDocumentBuilder().parse(URL);

        // Select article titles into DOM node list.
        NodeList titles = XPathAPI.selectNodeList(feed, XPATH);

        // Iterate over node list and print article titles.
        for (int i = 0; i < titles.getLength(); i++) {
            System.out.println(titles.item(i).getNodeValue());
        }
    }
}

Compilation

The source code compiles with J2SE 1.4. No additional libraries are required.

Output

What's Being Done About Nuclear Security
Cyber-Soap Returns From The Dead
Phatbot Author Arrested In Germany

Explanation

First, the application parses the RSS document into a Document Object Model (DOM) tree. This is done using the J2SE classes DocumentBuilderFactory and DocumentBuilder.

Next, the application uses the Xalan method selectNodeList to select a subset of the DOM nodes. This method takes two inputs:

The DOM node tree from which to select.
The XPath expression (a string) that describes which nodes to select.

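There are several static select methods in XPathAPI. They differ in the way they return the selected nodes: selectSingleNode returns the first matching node, selectNodeList returns a DOM NodeList, and selectNodeIterator returns a DOM NodeIterator. They also differ in the way they handle namespaces: each method has a variant that takes an additional namespace node, which is used to resolve the namespace prefixes in the XPath expression.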

Lastly, the application iterates over the selected DOM nodes and prints their values.
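The same three steps (parse, select, iterate) can also be sketched with the standard javax.xml.xpath API that has shipped with the JDK since Java 5. This is an alternative to Xalan's XPathAPI class, not the code used in this article. The sketch assumes a namespace-aware parser and therefore binds the prefixes rdf and rss explicitly; the prefix rss, the class name JaxpTitles, and the inline copy of the feed are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JaxpTitles {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS_NS = "http://purl.org/rss/1.0/";

    // Trimmed inline copy of the feed, so the sketch runs without network access.
    static final String FEED =
        "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns=\"" + RSS_NS + "\">"
        + "<channel rdf:about=\"http://slashdot.org/\"><title>Slashdot</title></channel>"
        + "<item><title>What's Being Done About Nuclear Security</title></item>"
        + "<item><title>Cyber-Soap Returns From The Dead</title></item>"
        + "<item><title>Phatbot Author Arrested In Germany</title></item>"
        + "</rdf:RDF>";

    // Maps the prefixes used in the XPath expression to namespace URIs.
    static final NamespaceContext PREFIXES = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if ("rdf".equals(prefix)) return RDF_NS;
            if ("rss".equals(prefix)) return RSS_NS;
            return XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String uri) {
            return null; // reverse lookup is not needed for evaluation
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return Collections.<String>emptyList().iterator();
        }
    };

    static List<String> titles(String xml) throws Exception {
        // Step 1: parse the document into a namespace-aware DOM tree.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document feed = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Step 2: select the title text nodes of all items.
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(PREFIXES);
        NodeList nodes = (NodeList) xpath.evaluate(
                "/rdf:RDF/rss:item/rss:title/text()", feed, XPathConstants.NODESET);

        // Step 3: iterate over the node list and collect the values.
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (String title : titles(FEED)) {
            System.out.println(title);
        }
    }
}
```

The channel title "Slashdot" is not selected, because the path only descends into item elements.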

Example 2

In this example, we will print only the titles of those articles that have the "security" subject.

Source Code

We only need to change one line of code:

// XPath expression selecting article titles.
private static final String XPATH = "RDF/item[subject='security']/title/text()";

Output

What's Being Done About Nuclear Security
Phatbot Author Arrested In Germany
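For comparison, the following sketch applies the same kind of predicate through the standard javax.xml.xpath API on a namespace-aware DOM tree. It is an illustration under stated assumptions, not the code of this article: the class name SecurityTitles, the helper methods, the prefix rss for the default namespace, and the inline copy of the feed are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SecurityTitles {

    // Trimmed inline copy of the feed: only titles and subjects are kept.
    static final String FEED =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
        + " xmlns=\"http://purl.org/rss/1.0/\""
        + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
        + "<item><title>What's Being Done About Nuclear Security</title>"
        + "<dc:subject>security</dc:subject></item>"
        + "<item><title>Cyber-Soap Returns From The Dead</title>"
        + "<dc:subject>ent</dc:subject></item>"
        + "<item><title>Phatbot Author Arrested In Germany</title>"
        + "<dc:subject>security</dc:subject></item>"
        + "</rdf:RDF>";

    // Only the predicate in square brackets differs from an unfiltered selection.
    static final String XPATH =
        "/rdf:RDF/rss:item[dc:subject='security']/rss:title/text()";

    // Builds a prefix-to-URI mapping for the prefixes used in the expressions.
    static NamespaceContext prefixes() {
        final Map<String, String> uris = new HashMap<String, String>();
        uris.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        uris.put("rss", "http://purl.org/rss/1.0/");
        uris.put("dc", "http://purl.org/dc/elements/1.1/");
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                String uri = uris.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                return null; // reverse lookup is not needed for evaluation
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                return Collections.<String>emptyList().iterator();
            }
        };
    }

    // Parses the XML and returns the values of all nodes the expression selects.
    static List<String> select(String xml, String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document feed = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(prefixes());
        NodeList nodes = (NodeList) xpath.evaluate(expression, feed, XPathConstants.NODESET);

        List<String> values = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        for (String title : select(FEED, XPATH)) {
            System.out.println(title);
        }
    }
}
```

Changing the selection criteria means editing the expression string only; the Java code in select stays the same.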

Conclusion

Only a few lines of Java code are required, because even complex selection criteria can be expressed in XPath rather than in Java code. All required classes are included in J2SE 1.4.