XPath is a language for extracting information from XML documents.
Prerequisites:
- HTML / XML
- XML / XML Namespaces
What is XPath:
- XPath is a syntax for defining parts of an XML document
- XPath uses path expressions to navigate in XML documents
- XPath contains a library of standard functions
- XPath is a major element in XSLT
- XPath is a W3C recommendation
XPath has seven kinds of nodes:
- element
- attribute
- text
- namespace
- processing-instruction
- comment
- document node
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
The following are examples of nodes:
<bookstore> is the root element node
<author>J K. Rowling</author> is an element node
lang="en" is an attribute node
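The node kinds above can be inspected from code. The following is a minimal sketch, assuming Python's standard-library `xml.etree.ElementTree`, which exposes element nodes as objects, attribute nodes as a dictionary on each element, and text as the `.text` property. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The bookstore example, given as bytes so the parser reads the
# encoding from the XML declaration.
XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)          # the root element node, <bookstore>
author = root.find("book/author")  # an element node
title = root.find("book/title")    # another element node

root_name = root.tag               # name of the root element
author_text = author.text          # text content of <author>
lang_value = title.get("lang")     # value of the lang attribute

print(root_name, author_text, lang_value)
```

ElementTree has no separate object for attribute or text nodes; they are reached through the element that owns them.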
Atomic values have no children and no parent.
Examples of atomic values:
J K. Rowling
"en"
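Node relationships - parent: every element and attribute has exactly one parent
Node relationships - children: an element may have zero, one, or more children
Node relationships - siblings: nodes that share the same parent
Node relationships - ancestors: a node's parent, the parent's parent, and so on
Node relationships - descendants: a node's children, the children's children, and so on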
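These relationships can be walked programmatically. The sketch below assumes Python's `xml.etree.ElementTree`; because its elements do not store a reference to their parent, a child-to-parent dictionary (here called `parent_of`, a name chosen for illustration) is built first. The document node itself is not represented, so the ancestor chain stops at the root element.

```python
import xml.etree.ElementTree as ET

XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

# Map every element to its parent element.
parent_of = {child: parent for parent in root.iter() for child in parent}

book = root.find("book")
title = book.find("title")

# children: the elements directly below <book>
children = [c.tag for c in book]

# parent: the element directly above <title>
parent = parent_of[title].tag

# siblings: other elements with the same parent as <title>
siblings = [s.tag for s in parent_of[title] if s is not title]

# ancestors: parent, parent's parent, ... up to the root element
ancestors = []
node = title
while node in parent_of:
    node = parent_of[node]
    ancestors.append(node.tag)

# descendants: everything below the root element
descendants = [d.tag for d in root.iter() if d is not root]

print(children, parent, siblings, ancestors, descendants)
```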
XPath uses path expressions to select a node or a set of nodes in an XML document.
<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>
Path expressions:
nodename - selects all child elements of the current node named "nodename"
/ - selects from the root node
// - selects nodes anywhere in the document, starting from the current node, that match the selection
. - selects the current node
.. - selects the parent of the current node
@ - selects attributes
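The path expressions above can be tried out with Python's `xml.etree.ElementTree`, which supports a limited subset of XPath. This is a sketch under that assumption. Paths given to an element are always relative to it, so the absolute path `/bookstore/book` is written as `book` evaluated on the `<bookstore>` root, and `//title` becomes `.//title`. Attributes cannot be returned directly by this library; `@` is only usable inside a predicate, and the value is read with `.get()`.

```python
import xml.etree.ElementTree as ET

XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # the <bookstore> element

# nodename: child elements of the current node named "book"
books = root.findall("book")

# //: title elements at any depth below the current node
titles = root.findall(".//title")

# .: the current node itself
current = root.find(".")

# ..: the parent of each title element
parents = root.findall(".//title/..")

# @: keep titles that have a lang attribute, then read its value
langs = [t.get("lang") for t in root.findall(".//title[@lang]")]

print(len(books), [t.text for t in titles], [p.tag for p in parents], langs)
```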
Predicates are used to find a specific node or a node that contains a specific value.
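/bookstore/book[1] - selects the first book child element of bookstore
/bookstore/book[last()] - selects the last book child element of bookstore
/bookstore/book[last() - 1] - selects the second-to-last book child element of bookstore
/bookstore/book[position() < 3] - selects the first two book child elements of bookstore
//title[@lang] - selects all title elements that have a lang attribute
//title[@lang='eng'] - selects all title elements whose lang attribute equals eng
/bookstore/book[price > 0.35] - selects all book elements under bookstore whose price child element has a value greater than 0.35
/bookstore/book[price > 0.35]/title - selects the title elements of all book elements whose price is greater than 0.35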
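The predicates above can be sketched with Python's `xml.etree.ElementTree`, assuming its restricted XPath subset. Positional predicates (`[1]`, `[last()]`, `[last()-1]`, written without spaces) and attribute predicates are supported; `position() < 3` and numeric comparisons such as `price > 0.35` are not, so they are emulated here with ordinary Python filtering.

```python
import xml.etree.ElementTree as ET

XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # the <bookstore> element

# Positional predicates (1-based, as in XPath).
first = root.findall("book[1]")
last = root.findall("book[last()]")
second_to_last = root.findall("book[last()-1]")

# Attribute predicates.
with_lang = root.findall(".//title[@lang]")
eng_titles = root.findall(".//title[@lang='eng']")

# Emulation of book[position() < 3]: slice the first two matches.
first_two = root.findall("book")[:2]

# Emulation of book[price > 0.35]/title: compare the price text as a number.
pricey = [b for b in root.findall("book") if float(b.findtext("price")) > 0.35]
pricey_titles = [b.findtext("title") for b in pricey]

print(first[0].findtext("title"), last[0].findtext("title"), pricey_titles)
```

A full XPath engine would evaluate all eight expressions directly; the emulated lines only approximate them for this small document.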
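XPath wildcards can be used to select unknown XML elements:
* - matches any element node
@* - matches any attribute node
node() - matches any node of any kind
/bookstore/* - selects all child elements of bookstore
//* - selects all elements in the document
//title[@*] - selects all title elements that have at least one attribute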
//book/title | //book - selects all title children of book elements and all book elements
//title | //price - selects all title elements and all price elements
/bookstore/book/title | //price - selects all title elements of the book elements under bookstore, and all price elements in the document
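Wildcards and the `|` union operator can be approximated with Python's `xml.etree.ElementTree`. This is only a sketch: the library understands `*` in paths, but not the `@*` predicate or `|`, so those two are emulated with plain Python. The union is built in a single pass over the tree so that the result stays in document order without duplicates.

```python
import xml.etree.ElementTree as ET

XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)  # the <bookstore> element

# /bookstore/* : all child elements of bookstore
child_elems = root.findall("*")

# //* : every element in the document, including the root element
all_elems = list(root.iter())

# //title[@*] : title elements with at least one attribute (emulated)
titles_with_attrs = [t for t in root.iter("title") if t.attrib]

# //title | //price : union, emulated in one document-order pass
title_or_price = [e for e in root.iter() if e.tag in ("title", "price")]

print([e.tag for e in child_elems], len(all_elems),
      [e.tag for e in title_or_price])
```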
An XPath axis defines a node set relative to the current node
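ancestor - selects all ancestors of the current node
ancestor-or-self - selects all ancestors of the current node and the current node itself
attribute - selects all attributes of the current node
child - selects all children of the current node
descendant - selects all descendants of the current node
descendant-or-self - selects all descendants of the current node and the current node itself
following - selects all nodes that come after the current node in the document (excluding its descendants)
following-sibling - selects all siblings after the current node
namespace - selects all namespace nodes of the current node
parent - selects the parent of the current node
preceding - selects all nodes that come before the current node in the document (excluding its ancestors)
preceding-sibling - selects all siblings before the current node
self - selects the current node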
child::book - selects all book nodes that are children of the current node
attribute::lang - selects the lang attribute of the current node
child::* - selects all element children of the current node
attribute::* - selects all attributes of the current node
child::text() - selects all text node children of the current node
child::node() - selects all children of the current node
descendant::book - selects all book descendants of the current node
ancestor::book - selects all book ancestors of the current node
ancestor-or-self::book - selects all book ancestors of the current node, and the current node as well if it is a book node
child::*/child::price - selects all price grandchildren of the current node
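Python's `xml.etree.ElementTree` does not understand the `axis::nodetest` syntax, so the sketch below emulates a few axes by hand to show what they select. The helper names (`following_siblings`, `preceding_siblings`, `ancestors_or_self`) and the `parent_of` map are hypothetical, introduced only for this illustration; a full XPath engine would evaluate the axis expressions directly.

```python
import xml.etree.ElementTree as ET

XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
  <book>
    <title lang="eng">Harry Potter</title>
    <price>29.99</price>
  </book>
  <book>
    <title lang="eng">Learning XML</title>
    <price>39.95</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)
parent_of = {child: parent for parent in root.iter() for child in parent}


def following_siblings(elem):
    """Emulates following-sibling::* for an element."""
    parent = parent_of.get(elem)
    if parent is None:
        return []
    kids = list(parent)
    return kids[kids.index(elem) + 1:]


def preceding_siblings(elem):
    """Emulates preceding-sibling::* (returned in document order)."""
    parent = parent_of.get(elem)
    if parent is None:
        return []
    kids = list(parent)
    return kids[:kids.index(elem)]


def ancestors_or_self(elem):
    """Emulates ancestor-or-self::*, starting from the element itself."""
    chain = [elem]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


first_book, second_book = root.findall("book")
title = first_book.find("title")
price = first_book.find("price")

after_first_book = following_siblings(first_book)        # following-sibling::*
before_price = preceding_siblings(price)                 # preceding-sibling::*
book_chain = [e for e in ancestors_or_self(title) if e.tag == "book"]  # ancestor-or-self::book
grandchild_prices = root.findall("*/price")              # child::*/child::price

print([e.tag for e in after_first_book], [e.tag for e in before_price],
      [e.tag for e in book_chain], len(grandchild_prices))
```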
XPath Operators
|, +, -, *, div, =, !=, <, <=, >, >=, or, and, mod
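Two of these operators behave differently from their closest Python counterparts. In XPath 1.0, `div` is floating-point division and `mod` returns the remainder of a truncating division, so the result takes the sign of the left operand. The sketch below mirrors that behaviour; `xpath_div` and `xpath_mod` are hypothetical helper names, and division by zero (which XPath resolves to infinity or NaN) is deliberately not handled.

```python
import math


def xpath_div(a, b):
    """Floating-point division, like XPath's `div` (b must be non-zero here)."""
    return a / b


def xpath_mod(a, b):
    """Truncating remainder, like XPath's `mod`; sign follows the left operand."""
    return math.fmod(a, b)


print(xpath_div(5, 2))   # 2.5
print(xpath_mod(5, 2))   # 1.0
print(xpath_mod(-5, 2))  # -1.0, whereas Python's -5 % 2 gives 1
print(xpath_mod(5, -2))  # 1.0, whereas Python's 5 % -2 gives -1
```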