This article was originally published on the 择维士 community.
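What is XPath
XPath is a language for extracting information from XML content. It is similar to JSONPath, which extracts information from JSON (see "How to Use JSONPath"). This article covers the basic XPath syntax and how to use XPath in Java to extract information.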
Basic XPath syntax
XPath expressions
A typical expression looks like this: /foo/bar
It selects the bar nodes in XML content such as:
<foo>
<bar/>
</foo>
or:
<foo>
<bar/>
<bar/>
<bar/>
</foo>
An expression that starts with // ignores depth: it matches nodes at any level of the document.
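To make the difference between an absolute path and a // path concrete, here is a minimal sketch built on the JDK's javax.xml.xpath API. The class name PathDepthDemo, the count helper, and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; names and sample XML are illustrative only.
final class PathDepthDemo {
    // One bar directly under the root foo, and one nested deeper under baz.
    static final String XML = "<foo><bar/><baz><bar/></baz></foo>";

    // Counts the nodes that an XPath expression selects in the sample XML.
    static int count(String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xPath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/foo/bar")); // 1: only the bar directly under foo
        System.out.println(count("//bar"));    // 2: every bar, whatever its depth
    }
}
```

The nested bar is missed by /foo/bar because each step of an absolute path only looks at direct children.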
Common location paths for attributes and text:
Location Path | Description |
---|---|
/foo/bar/@id | The id attribute of the bar element |
/foo/bar/text() | The text value of the bar element |
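The two location paths in the table can be tried out with a small sketch like the following. It assumes a sample document with a single bar element; the class name AttributeAndTextDemo and the valueOf helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; the sample XML is illustrative only.
final class AttributeAndTextDemo {
    static final String XML = "<foo><bar id=\"42\">hello</bar></foo>";

    // Evaluates the expression against the sample XML and returns its string value.
    static String valueOf(String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            return xPath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(valueOf("/foo/bar/@id"));    // 42
        System.out.println(valueOf("/foo/bar/text()")); // hello
    }
}
```

The two-argument evaluate method returns the result converted to a string, which is convenient when only a single value is needed.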
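Predicates let us select only the nodes that satisfy a condition. The format is [expression]. For example:
Select every foo node, at any depth (children, grandchildren, ...), that has an include attribute whose value is true: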
//foo[@include='true']
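Predicates can be chained. This expression additionally requires a mode attribute whose value is bar:
//foo[@include='true'][@mode='bar']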
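As a rough illustration of how attribute predicates narrow a selection, the sketch below runs both expressions against a small hand-written document. The class name PredicateDemo, the count helper, and the sample XML are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the sample XML is illustrative only.
final class PredicateDemo {
    // Four foo elements: three with include="true", one of them nested in wrap.
    static final String XML =
            "<root>"
            + "<foo include=\"true\" mode=\"bar\"/>"
            + "<foo include=\"true\" mode=\"baz\"/>"
            + "<foo include=\"false\" mode=\"bar\"/>"
            + "<wrap><foo include=\"true\" mode=\"bar\"/></wrap>"
            + "</root>";

    // Counts the nodes that an XPath expression selects in the sample XML.
    static int count(String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xPath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("//foo[@include='true']"));               // 3
        System.out.println(count("//foo[@include='true'][@mode='bar']"));  // 2
    }
}
```

Each additional predicate filters the node set produced by the previous one, so chained predicates act like a logical AND.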
More predicate forms
<?xml version="1.0"?>
<Tutorials>
<Tutorial tutId="01" type="java">
<title>Guava</title>
<description>Introduction to Guava</description>
<date>04/04/2016</date>
<author>GuavaAuthor</author>
</Tutorial>
<Tutorial tutId="02" type="java">
<title>XML</title>
<description>Introduction to XPath</description>
<date>04/05/2016</date>
<author>XMLAuthor</author>
</Tutorial>
</Tutorials>
For the XML above, for example:
/Tutorials/Tutorial[1]
/Tutorials/Tutorial[last()]
/Tutorials/Tutorial[position()<4]
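Positional predicates are easiest to understand by running them. The sketch below uses a trimmed copy of the Tutorials document and relies on the position() and last() functions from XPath 1.0 (note that positions start at 1, and that XPath 1.0 defines no first() function). The class name PositionDemo and its helpers are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML is a trimmed copy of the Tutorials example.
final class PositionDemo {
    static final String XML =
            "<Tutorials>"
            + "<Tutorial tutId=\"01\"><title>Guava</title></Tutorial>"
            + "<Tutorial tutId=\"02\"><title>XML</title></Tutorial>"
            + "</Tutorials>";

    // Returns the string value of the first node the expression selects.
    static String value(String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            return xPath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Counts the nodes the expression selects.
    static int count(String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xPath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(value("/Tutorials/Tutorial[1]/title"));        // Guava
        System.out.println(value("/Tutorials/Tutorial[last()]/title"));   // XML
        System.out.println(count("/Tutorials/Tutorial[position()<4]"));   // 2
    }
}
```

With only two Tutorial elements in the document, position()<4 matches both of them.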
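Using XPath in Java
JDK 11 supports XPath natively through the javax.xml.xpath package (which has been part of the JDK since Java 5), so no third-party library is needed. The examples below parse the XML shown above.
Getting a list of nodes
Return all /Tutorials/Tutorial nodes: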
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import java.io.*;
import java.nio.charset.StandardCharsets;

public class XmlDemo {
    public static void main(String[] args) throws Exception {
        // Parse the XML string into a DOM document, using an explicit charset.
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        Document xmlDocument = builder.parse(
                new ByteArrayInputStream(EXAMPLE_STRING.getBytes(StandardCharsets.UTF_8)));

        // Compile the expression and ask for the result as a node set.
        XPath xPath = XPathFactory.newInstance().newXPath();
        String expression = "/Tutorials/Tutorial";
        NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
        System.out.println("Found nodes length = " + nodeList.getLength());
    }

    // The same XML document as shown earlier in the article.
    static final String EXAMPLE_STRING = "<?xml version=\"1.0\"?>" +
            "<Tutorials>\n" +
            "    <Tutorial tutId=\"01\" type=\"java\">\n" +
            "        <title>Guava</title>\n" +
            "        <description>Introduction to Guava</description>\n" +
            "        <date>04/04/2016</date>\n" +
            "        <author>GuavaAuthor</author>\n" +
            "    </Tutorial>\n" +
            "    <Tutorial tutId=\"02\" type=\"java\">\n" +
            "        <title>XML</title>\n" +
            "        <description>Introduction to XPath</description>\n" +
            "        <date>04/05/2016</date>\n" +
            "        <author>XMLAuthor</author>\n" +
            "    </Tutorial>\n" +
            "</Tutorials>";
}
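Counting the nodes is usually only the first step. As a minimal sketch of what might come next, the example below walks the returned NodeList and reads an attribute and a child element from each node. The class name NodeListIterationDemo, the summarize helper, and the trimmed XML are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class; the XML is a trimmed copy of the Tutorials example.
final class NodeListIterationDemo {
    static final String XML =
            "<Tutorials>"
            + "<Tutorial tutId=\"01\" type=\"java\"><title>Guava</title></Tutorial>"
            + "<Tutorial tutId=\"02\" type=\"java\"><title>XML</title></Tutorial>"
            + "</Tutorials>";

    // Returns one "tutId: title" line per Tutorial element in the given XML.
    static List<String> summarize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(
                    "/Tutorials/Tutorial", doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element tutorial = (Element) nodes.item(i);
                // A relative expression is evaluated with the current node as context.
                String title = xPath.evaluate("title", tutorial);
                result.add(tutorial.getAttribute("tutId") + ": " + title);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read tutorials", e);
        }
    }

    public static void main(String[] args) {
        summarize(XML).forEach(System.out::println); // 01: Guava, then 02: XML
    }
}
```

Passing a node instead of the document to evaluate lets a relative expression such as title start from that node.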
Getting a node by id
Get the Tutorial node whose tutId is 01:
String expression = "/Tutorials/Tutorial[@tutId=\"01\"]";
Node node = (Node) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODE);
System.out.println("Found node: " + node);
Getting nodes by a child tag
Get the Tutorial nodes that contain a title element whose value is Guava:
String expression = "//Tutorial[descendant::title[text()='Guava']]";
NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(xmlDocument, XPathConstants.NODESET);
System.out.println("Found title=Guava length:" + nodeList.getLength());
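When the value being searched for comes from outside the program, splicing it into the expression string breaks as soon as it contains a quote. One possible alternative, sketched below, is an XPath variable resolved through XPathVariableResolver from javax.xml.xpath. The class name TitleLookupDemo, the countByTitle helper, and the variable name $title are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; the XML is a trimmed copy of the Tutorials example.
final class TitleLookupDemo {
    static final String XML =
            "<Tutorials>"
            + "<Tutorial tutId=\"01\"><title>Guava</title></Tutorial>"
            + "<Tutorial tutId=\"02\"><title>XML</title></Tutorial>"
            + "</Tutorials>";

    // Counts the Tutorial nodes that have a title with exactly the given text.
    static int countByTitle(String title) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Resolve $title to the method argument; no other variable is defined.
        xPath.setXPathVariableResolver(
                name -> "title".equals(name.getLocalPart()) ? title : null);
        try {
            NodeList nodes = (NodeList) xPath.evaluate(
                    "//Tutorial[descendant::title[text()=$title]]",
                    new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countByTitle("Guava"));    // 1
        System.out.println(countByTitle("O'Reilly")); // 0, and the quote causes no error
    }
}
```

The resolver has to be set before the expression is compiled or evaluated, because variables are bound at that point.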
References
1. The JDK API documentation