Traversing Documents with DOM4J

weixin_34007906

Tags: java

Original link: https://my.oschina.net/fhd/blog/368948

dom4j offers several different options for traversing a Document object and its children.

Iterators, Lists, and Index-Based Access

For example, to print the value of the location attribute for every child element of an Element:

public void outputLocationAttributes(Element parent) {
    for(Iterator it = parent.elementIterator(); it.hasNext();){
        Element child = (Element) it.next();
        String value = child.attributeValue("location");
        if(value == null){
            System.out.println("No location attribute");
        }else{
            System.out.println("Location attribute value is " + value);
        }
    }
}

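Note that in this example I used the elementIterator() method, a convenience method that returns a java.util.Iterator over the List returned by the elements() method. If you would rather use index-based access than the Iterator interface, you can use the nodeCount() and node() methods: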

public void outputLocationAttributes2(Element parent) {
    for(int i=0;i<parent.nodeCount();i++){
        Node node = parent.node(i);
        if(node instanceof Element) {
            Element child = (Element) node;
            String value = child.attributeValue("location");
            if(value == null) {
                System.out.println("No location attribute");
            }else{
                System.out.println("Location attribute value is " + value);
            }
        }
    }
}
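
The List returned by elements() can also be walked directly with a for-each loop. The following is only a minimal sketch: the ElementListDemo class, the site and office element names, and the sample data are made up for illustration, and the document is built in memory with DocumentHelper rather than parsed from a file. The explicit cast keeps the loop compatible with dom4j versions in which elements() returns a raw List.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

class ElementListDemo {
    // Collects the location attribute of every child element.
    // A null entry marks a child that has no location attribute.
    static List<String> locations(Element parent) {
        List<String> values = new ArrayList<String>();
        for (Object o : parent.elements()) {
            Element child = (Element) o;
            values.add(child.attributeValue("location"));
        }
        return values;
    }

    // Hypothetical sample data: two child elements and one text node.
    static Element sampleParent() {
        Element parent = DocumentHelper.createDocument().addElement("site");
        parent.addElement("office").addAttribute("location", "Boston");
        parent.addElement("office");
        parent.addText("a text node, which elements() does not return");
        return parent;
    }

    public static void main(String[] args) {
        System.out.println(locations(sampleParent()));
    }
}
```

Unlike the node()-based loop, this version needs no instanceof check, because elements() contains only Element children.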

XPath

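dom4j has an XPath interface; instances are created with the createXPath() method of either DocumentFactory or DocumentHelper. What sets dom4j's XPath interface apart is that it can sort a list of results by an XPath expression, whether that is an existing list of Node objects (the sort() method) or the result of evaluating an expression (the two- and three-argument selectNodes() methods).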

As an example, consider the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<books>
    <book>
        <title>Java &amp; XML</title>
        <pubDate>2006</pubDate>
    </book>
    <book>
        <title>Learning UML</title>
        <pubDate>2003</pubDate>
    </book>
    <book>
        <title>XML in a Nutshell</title>
        <pubDate>2004</pubDate>
    </book>
    <book>
        <title>Apache cookbook</title>
        <pubDate>2003</pubDate>
    </book>
</books>

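If you want to sort the list of book titles by publication date, you can create two separate XPath expressions, one that selects the book elements and one that supplies the value to sort them by, and then use them together like this: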

package javaxml3;

import java.io.File;
import java.util.Iterator;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.XPath;
import org.dom4j.io.SAXReader;

public class SortingXPath {
    public static void main(String[] args) throws Exception {
        Document doc = new SAXReader().read(new File("books.xml"));
        // Selects every book element in the document
        XPath bookPath = DocumentHelper.createXPath("//book");
        // Evaluated against each book to obtain its sort key
        XPath sortPath = DocumentHelper.createXPath("pubDate");
        List books = bookPath.selectNodes(doc, sortPath);
        for (Iterator it = books.iterator(); it.hasNext();) {
            Element book = (Element) it.next();
            System.out.println(book.elementText("title"));
        }
    }
}

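The titles are printed in ascending order of publication date, starting with Learning UML and ending with Java & XML. There is no mechanism for sorting in descending order; instead, you can reverse the list with the static reverse() method of the java.util.Collections class. The three-argument selectNodes() method removes Node objects with duplicate sort values from the result list when the third argument is true (if it is false, the duplicates are kept). Applied to the nodes in the example above, the call looks like this: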

List books = bookPath.selectNodes(doc,sortPath,true);

This prints only three titles. Apache Cookbook is left out because it has the same publication date as Learning UML.
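
The descending order mentioned above takes only one extra call. The sketch below deliberately uses a plain list of title strings as a stand-in for the List returned by selectNodes(), so that it runs with the JDK alone; the DescendingOrderDemo class and its descending() helper are hypothetical names, not part of dom4j. Collections.reverse() works the same way on a list of Node objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DescendingOrderDemo {
    // Returns a reversed copy, leaving the caller's ascending list untouched.
    static <T> List<T> descending(List<T> ascending) {
        List<T> copy = new ArrayList<T>(ascending);
        Collections.reverse(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Stand-in for the three titles left after duplicate removal, oldest first
        List<String> ascending =
                Arrays.asList("Learning UML", "XML in a Nutshell", "Java & XML");
        System.out.println(descending(ascending));
    }
}
```

Copying before reversing is a design choice for the sketch; calling Collections.reverse() directly on the result list is equally valid when the ascending order is no longer needed.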

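Besides the XPath class, the Node interface has several methods to which you can simply pass a String in order to evaluate an XPath expression. The XPath-specific methods of the Node interface are: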

public interface Node{
    List selectNodes(String xpathExpression);
    Object selectObject(String xpathExpression);
    List selectNodes(String xpathExpression,String comparisonXPathExpression);
    List selectNodes(String xpathExpression,String comparisonXPathExpression,boolean removeDuplicates);
    Node selectSingleNode(String xpathExpression);
    String valueOf(String xpathExpression);
    Number numberValueOf(String xpathExpression);
    boolean matches(String xpathExpression);
}

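Behind the scenes, these methods generally use the XPath class to evaluate the expression passed to them. Because they take a String, each call will typically create a new XPath object. So if you need to evaluate the same XPath expression repeatedly, the XPath class is the better choice, since your expression is compiled only once. These methods also cannot handle namespaces, variables, or custom functions; if you need any of those features, the XPath class is your only option. That does not make these methods useless, though. They are very convenient to use and keep the code short.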
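
The two styles can be compared side by side. This is a sketch under stated assumptions: dom4j and its XPath engine (Jaxen) are on the classpath, the ReusedXPathDemo class and its helper methods are invented for illustration, and the document is a two-book subset of the sample built in memory.

```java
import java.util.ArrayList;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.XPath;

class ReusedXPathDemo {
    // Hypothetical helper: builds a two-book subset of the sample document.
    static Document sampleBooks() {
        Document doc = DocumentHelper.createDocument();
        Element books = doc.addElement("books");
        addBook(books, "Java & XML", "2006");
        addBook(books, "Learning UML", "2003");
        return doc;
    }

    static void addBook(Element books, String title, String pubDate) {
        Element book = books.addElement("book");
        book.addElement("title").addText(title);
        book.addElement("pubDate").addText(pubDate);
    }

    // Convenience style: the expression String is handed to the node on every call.
    static List<String> titlesViaStrings(Document doc) {
        List<String> titles = new ArrayList<String>();
        for (Object o : doc.getRootElement().elements("book")) {
            titles.add(((Element) o).valueOf("title"));
        }
        return titles;
    }

    // Reuse style: one XPath object is created up front and applied to each book.
    static List<String> titlesViaXPath(Document doc) {
        XPath titlePath = DocumentHelper.createXPath("title");
        List<String> titles = new ArrayList<String>();
        for (Object o : doc.getRootElement().elements("book")) {
            titles.add(titlePath.valueOf(o));
        }
        return titles;
    }

    public static void main(String[] args) {
        Document doc = sampleBooks();
        System.out.println(titlesViaStrings(doc));
        System.out.println(titlesViaXPath(doc));
    }
}
```

Both methods return the same titles; the difference only matters when the same expression is evaluated many times, or when namespaces, variables, or custom functions have to be configured on the XPath object.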

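A few other methods of the Node interface deserve a mention. The getPath() and getUniquePath() methods each return an XPath expression that evaluates to a list of nodes containing the current node. getUniquePath() goes one step further than getPath(): it adds indexes to ensure that the XPath expression evaluates to this node alone. Besides the no-argument versions, getPath() and getUniquePath() are both overloaded to accept an Element, in which case they produce a relative XPath expression that leads from the passed Element to the current node.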
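
A short sketch makes the difference between these methods easier to see. The NodePathDemo class and its helper are hypothetical, and the document is a two-book subset of the sample built in memory; the calls themselves are the Node methods described above.

```java
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

class NodePathDemo {
    // Hypothetical helper: builds two books and returns the title of the second.
    static Element secondTitle() {
        Document doc = DocumentHelper.createDocument();
        Element books = doc.addElement("books");
        books.addElement("book").addElement("title").addText("Java & XML");
        Element second = books.addElement("book");
        return second.addElement("title").addText("Learning UML");
    }

    public static void main(String[] args) {
        Element title = secondTitle();
        Element book = title.getParent();

        // Absolute path: matches the title of every book
        System.out.println(title.getPath());
        // Absolute path with an index on the book step: matches this title only
        System.out.println(title.getUniquePath());
        // Relative path from the enclosing book element to the title
        System.out.println(title.getPath(book));
    }
}
```

The index appears only on steps where a parent has more than one child with the same name, which is why the book step is indexed while the title step is not.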

Reposted from: https://my.oschina.net/fhd/blog/368948