Using Xalan's XPath API to Access XML Data

From Wikipedia, the free encyclopedia.


Abstract

This article explains how to use the XPath API of Xalan to access XML data from a Java application. This method requires only a few lines of code, even if the criteria for selecting data are complex. When using J2SE 1.4, no external libraries are required.

XPath

XPath is a standard for addressing parts of an XML document. For example, the XPath expression

report[@severity="warning"][5]

selects the fifth <report> element that has a severity attribute with value "warning".
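To make the positional predicate concrete, here is a minimal sketch that evaluates the expression against a small document. The sample XML, the id attributes, and the class name FifthWarningDemo are made up for illustration. The sketch uses the standard javax.xml.xpath API rather than Xalan's own classes, so that it runs on a current JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class FifthWarningDemo {

    // Hypothetical sample data: six warnings with one error mixed in.
    static final String XML =
        "<log>"
        + "<report severity=\"warning\" id=\"w1\"/>"
        + "<report severity=\"error\" id=\"e1\"/>"
        + "<report severity=\"warning\" id=\"w2\"/>"
        + "<report severity=\"warning\" id=\"w3\"/>"
        + "<report severity=\"warning\" id=\"w4\"/>"
        + "<report severity=\"warning\" id=\"w5\"/>"
        + "<report severity=\"warning\" id=\"w6\"/>"
        + "</log>";

    // Returns the id of the fifth <report> whose severity is "warning".
    static String fifthWarningId() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        Element log = doc.getDocumentElement();

        // The expression is relative, so <log> is passed as the context node.
        // The first predicate filters by attribute, the second picks by
        // position within the filtered set.
        Element hit = (Element) XPathFactory.newInstance().newXPath()
            .evaluate("report[@severity=\"warning\"][5]", log, XPathConstants.NODE);
        return hit.getAttribute("id");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fifthWarningId());
    }
}
```

The error report is skipped by the first predicate, so the position is counted among the warnings only.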

Among other things, XPath is used in XSL Transformations (XSLT) to specify the content that a template should be applied to.

Xalan

Xalan is an implementation of an XSLT processor. As such, it also implements XPath.

Xalan is included in J2SE 1.4.

Xalan XPath API

Xalan exposes its XPath implementation through the API defined in the org.apache.xpath package. A good starting point for learning the API is the class XPathAPI.

Example 1

In this example, we will create a small Java application that prints the article titles from Slashdot's RSS feed.

XML Document

The Slashdot RSS feed can be found at http://slashdot.org/index.rss. After removing some parts that are not relevant for this example, it looks like this:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">

  <channel rdf:about="http://slashdot.org/">
    <title>Slashdot</title>
    <link>http://slashdot.org/</link>
    <description>News for nerds, stuff that matters</description>

    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://slashdot.org/article.pl?sid=04/05/08/2254227" />
        <rdf:li rdf:resource="http://slashdot.org/article.pl?sid=04/05/08/2210224" />
        <rdf:li rdf:resource="http://slashdot.org/article.pl?sid=04/05/08/1747258" />
      </rdf:Seq>
    </items>
  </channel>

  <item rdf:about="http://slashdot.org/article.pl?sid=04/05/08/2254227">
    <title>What's Being Done About Nuclear Security</title>
    <link>http://slashdot.org/article.pl?sid=04/05/08/2254227</link>
    <description>KrisCowboy writes "Wired.com has an interesting article ... </description>
    <dc:subject>security</dc:subject>
  </item>

  <item rdf:about="http://slashdot.org/article.pl?sid=04/05/08/2210224">
    <title>Cyber-Soap Returns From The Dead</title>
    <link>http://slashdot.org/article.pl?sid=04/05/08/2210224</link>
    <description>An anonymous reader submits "Back in 1995, an experimental ...</description>
    <dc:subject>ent</dc:subject>
  </item>

  <item rdf:about="http://slashdot.org/article.pl?sid=04/05/08/1747258">
    <title>Phatbot Author Arrested In Germany</title>
    <link>http://slashdot.org/article.pl?sid=04/05/08/1747258</link>
    <description>Tacito writes "After arresting the author of Sasser, the ...</description>
    <dc:subject>security</dc:subject>
  </item>
</rdf:RDF>

Source Code

import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.xpath.XPathAPI;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathDemo {

    // URL of Slashdot's RSS feed.
    private static final String URL = "http://slashdot.org/index.rss";

    // XPath expression that selects text content of titles of articles.
    private static final String XPATH = "RDF/item/title/text()";

    public static void main(String[] args) throws Exception {
        // Parse feed into DOM tree.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document feed = factory.newDocumentBuilder().parse(URL);

        // Select article titles into DOM node list.
        NodeList titles = XPathAPI.selectNodeList(feed, XPATH);

        // Iterate over node list and print article titles.
        for (int i = 0; i < titles.getLength(); i++) {
            System.out.println(titles.item(i).getNodeValue());
        }
    }
}

Compilation

The source code compiles with J2SE 1.4. No additional libraries are required.

Output

What's Being Done About Nuclear Security
Cyber-Soap Returns From The Dead
Phatbot Author Arrested In Germany

Explanation

First, the application parses the RSS document into a Document Object Model (DOM) tree. This is done using the J2SE classes DocumentBuilderFactory and DocumentBuilder.

Next, the application uses the Xalan method selectNodeList to select a subset of the DOM nodes. This method takes two inputs:

  • The DOM node tree from which to select.
  • The XPath expression (a string) that describes which nodes to select.

There are several static select methods in XPathAPI. They differ in the way they return the selected nodes and in the way they handle namespaces.
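The idea of choosing how the result is returned can be sketched as follows. XPathAPI offers methods such as selectSingleNode and selectNodeIterator next to selectNodeList. The sketch below does not call them. It uses the standard javax.xml.xpath API instead, where the return type is chosen with an XPathConstants value, so treat it as an analogue rather than a demonstration of the Xalan methods themselves. The two-item document and the class name ReturnTypeDemo are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ReturnTypeDemo {

    // Hypothetical two-item document, not the real feed.
    static final String XML =
        "<feed>"
        + "<item><title>First</title></item>"
        + "<item><title>Second</title></item>"
        + "</feed>";

    static Document parse() throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
    }

    static XPath xpath() {
        return XPathFactory.newInstance().newXPath();
    }

    // NODESET: every match, returned as a DOM NodeList.
    static int titleCount() throws Exception {
        NodeList all = (NodeList) xpath()
            .evaluate("/feed/item/title", parse(), XPathConstants.NODESET);
        return all.getLength();
    }

    // NODE: only the first match in document order.
    static String firstTitleViaNode() throws Exception {
        Node first = (Node) xpath()
            .evaluate("/feed/item/title", parse(), XPathConstants.NODE);
        return first.getTextContent();
    }

    // STRING: the string value of the first match, no DOM node involved.
    static String firstTitleAsString() throws Exception {
        return (String) xpath()
            .evaluate("/feed/item/title", parse(), XPathConstants.STRING);
    }

    // NUMBER: the result of a numeric XPath function such as count().
    static double itemCount() throws Exception {
        return (Double) xpath()
            .evaluate("count(/feed/item)", parse(), XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titleCount());
        System.out.println(firstTitleViaNode());
        System.out.println(firstTitleAsString());
        System.out.println(itemCount());
    }
}
```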

Lastly, the application iterates over the selected DOM nodes and prints their values.

Example 2

In this example, we will print only the titles of those articles that have the "security" subject.

Source Code

We only need to change one line of code:

// XPath expression selecting article titles.
private static final String XPATH = "RDF/item[subject='security']/title/text()";

Output

What's Being Done About Nuclear Security
Phatbot Author Arrested In Germany

Explanation

We were able to express the additional criterion "select only those articles with subject 'security'" by modifying the XPath expression alone. The Java logic that parses the feed, selects the nodes, and prints them did not change.
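The same point can be sketched with a single helper method that takes the expression as a parameter. The sketch makes two assumptions: it runs against a reduced copy of the feed with the namespace declarations and prefixes removed, and it uses the standard javax.xml.xpath API instead of XPathAPI. The class name ExpressionSwapDemo and the third expression are illustrative additions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ExpressionSwapDemo {

    // Assumption: a simplified, namespace-free copy of the three feed items.
    static final String FEED =
        "<RDF>"
        + "<item><title>What's Being Done About Nuclear Security</title>"
        + "<subject>security</subject></item>"
        + "<item><title>Cyber-Soap Returns From The Dead</title>"
        + "<subject>ent</subject></item>"
        + "<item><title>Phatbot Author Arrested In Germany</title>"
        + "<subject>security</subject></item>"
        + "</RDF>";

    // Evaluates any expression that selects text nodes and returns their values.
    static List<String> select(String expression) throws Exception {
        Document feed = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(FEED)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expression, feed, XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // All titles, then only the security titles, then only the last item.
        System.out.println(select("RDF/item/title/text()"));
        System.out.println(select("RDF/item[subject='security']/title/text()"));
        System.out.println(select("RDF/item[last()]/title/text()"));
    }
}
```

Each call reuses the same Java method. Only the string passed to it differs.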

Discussion

Advantages

  • Only a few lines of Java code are required, because even complex selection criteria can be expressed in XPath rather than in Java code.
  • All required classes are included in J2SE 1.4.

Disadvantages

  • The Xalan API is not officially part of J2SE 1.4. It is not guaranteed to be included in future J2SE releases.
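One way to avoid depending on the org.apache.xpath package is the javax.xml.xpath API, which became part of the standard library in J2SE 5.0. The following is a minimal sketch of Example 2 on that API, not a drop-in replacement for the code above. It assumes a namespace-aware parser, a reduced inline copy of the feed, and a hand-written NamespaceContext. The prefix rss for the feed's default namespace and the class names StandardXPathDemo and FeedNamespaces are choices made for this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StandardXPathDemo {

    // Reduced copy of the feed that keeps the original namespace declarations.
    static final String FEED =
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
        + " xmlns=\"http://purl.org/rss/1.0/\""
        + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
        + "<item><title>What's Being Done About Nuclear Security</title>"
        + "<dc:subject>security</dc:subject></item>"
        + "<item><title>Cyber-Soap Returns From The Dead</title>"
        + "<dc:subject>ent</dc:subject></item>"
        + "<item><title>Phatbot Author Arrested In Germany</title>"
        + "<dc:subject>security</dc:subject></item>"
        + "</rdf:RDF>";

    // Maps the prefixes used in the XPath expression to namespace URIs.
    // "rss" is an arbitrary prefix chosen here for the default namespace.
    static final class FeedNamespaces implements NamespaceContext {
        public String getNamespaceURI(String prefix) {
            if ("rdf".equals(prefix)) return "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
            if ("rss".equals(prefix)) return "http://purl.org/rss/1.0/";
            if ("dc".equals(prefix)) return "http://purl.org/dc/elements/1.1/";
            return XMLConstants.NULL_NS_URI;
        }
        // Reverse lookups are not needed for this sketch.
        public String getPrefix(String uri) { return null; }
        public Iterator<String> getPrefixes(String uri) { return Collections.emptyIterator(); }
    }

    static List<String> securityTitles() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for prefixed name tests
        Document feed = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(FEED)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new FeedNamespaces());
        NodeList nodes = (NodeList) xpath.evaluate(
            "/rdf:RDF/rss:item[dc:subject='security']/rss:title/text()",
            feed, XPathConstants.NODESET);

        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getNodeValue());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        for (String title : securityTitles()) {
            System.out.println(title);
        }
    }
}
```

With a namespace-aware parser, every element name in the expression needs a prefix that the NamespaceContext resolves, including elements that sit in the document's default namespace.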