[size=large]While learning HtmlUnit I came across XPath, which is mainly used to select elements in HTML or XML documents.
Here is some code first:
WebClient client = new WebClient(BrowserVersion.INTERNET_EXPLORER_8);
HtmlPage page = client
.getPage("http://218.75.208.250:8089/opac/jdjsjg.jsp");
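That gives us the HtmlPage.
List<?> nodeList = page.getByXPath("//table[@class='xxtable']");
Each item in the list is a DomNode. The expression starts with // because the table is not the root element of the page, and the class value is written without a CSS-style leading dot (assuming the class is actually named xxtable).
Alternatively, you can hand the page over to Jsoup: Document d = Jsoup.parse(page.asXml());
Then use d.select(...) to get the elements you need.
Today I mainly want to talk about XPath, which I learned from W3cschool.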
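nodename Selects all nodes with the given name
/ Selects from the root node
// Selects matching nodes no matter how deep they are below the starting point (at the start of an expression, anywhere in the document)
. Selects the current node
.. Selects the parent of the current node
@ Selects attributes
Here are a few examples: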
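To try these path operators without HtmlUnit, here is a minimal sketch that uses the javax.xml.xpath API shipped with the JDK. The bookstore XML, the class name XPathBasics, and the helper methods are all made up for illustration; they are not from HtmlUnit or W3cschool.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathBasics {
    // Hypothetical sample data, invented for this sketch.
    static final String XML =
        "<bookstore>"
        + "<book><title lang='eng'>Harry Potter</title><price>29.99</price></book>"
        + "<book><title lang='eng'>Learning XML</title><price>39.95</price></book>"
        + "<book><title lang='fr'>Le Petit Prince</title><price>12.50</price></book>"
        + "<book><title>Untitled Notes</title><price>45.00</price></book>"
        + "</bookstore>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates expr relative to context (a Document or any Node) and returns the matches.
    static List<Node> nodes(Object context, String expr) {
        try {
            NodeList found = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, context, XPathConstants.NODESET);
            List<Node> out = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                out.add(found.item(i));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Node names of the matches (element name or attribute name).
    static List<String> names(Object context, String expr) {
        List<String> out = new ArrayList<>();
        for (Node n : nodes(context, expr)) {
            out.add(n.getNodeName());
        }
        return out;
    }

    // Text content of the matches (for an attribute, its value).
    static List<String> texts(Object context, String expr) {
        List<String> out = new ArrayList<>();
        for (Node n : nodes(context, expr)) {
            out.add(n.getTextContent());
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Node firstBook = nodes(doc, "/bookstore/book").get(0);

        System.out.println(names(doc, "/bookstore"));            // path starting at the root
        System.out.println(texts(doc, "//title"));               // every title in the document
        System.out.println(texts(doc, "//title/@lang"));         // attribute nodes
        System.out.println(names(firstBook, "."));               // the context node itself
        System.out.println(names(firstBook, ".."));              // its parent
        System.out.println(texts(firstBook, "title"));           // child elements named title
        System.out.println(texts(firstBook, ".//price").size()); // prices below this book only
        System.out.println(texts(firstBook, "//price").size());  // prices in the whole document
    }
}
```

The last two lines show the difference that is easy to miss: .//price searches below the context node, while //price always starts from the document root, whatever the context node is.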
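/bookstore/book[1] Selects the first book element that is a child of the bookstore element
/bookstore/book[last()] Selects the last book element that is a child of the bookstore element
/bookstore/book[last()-1] Selects the last but one book element that is a child of the bookstore element
/bookstore/book[position()<3] Selects the first two book elements that are children of the bookstore element
//title[@lang] Selects all the title elements that have an attribute named lang
//title[@lang='eng'] Selects all the title elements that have a lang attribute with the value 'eng'
/bookstore/book[price>35.00] Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00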
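The predicates above can be checked with a small sketch like the following, again using the JDK's javax.xml.xpath API. The sample XML, the class name XPathPredicates, and the texts helper are assumptions made for this example. Where a predicate selects book elements, /title is appended so that the output stays readable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathPredicates {
    // Hypothetical sample data, invented for this sketch.
    static final String XML =
        "<bookstore>"
        + "<book><title lang='eng'>Harry Potter</title><price>29.99</price></book>"
        + "<book><title lang='eng'>Learning XML</title><price>39.95</price></book>"
        + "<book><title lang='fr'>Le Petit Prince</title><price>12.50</price></book>"
        + "<book><title>Untitled Notes</title><price>45.00</price></book>"
        + "</bookstore>";

    // Parses the sample, evaluates expr from the document root,
    // and returns the text content of every matched node.
    static List<String> texts(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList found = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                out.add(found.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/bookstore/book[1]/title",            // positions start at 1, not 0
            "/bookstore/book[last()]/title",
            "/bookstore/book[last()-1]/title",
            "/bookstore/book[position()<3]/title",
            "//title[@lang]",                      // has a lang attribute
            "//title[@lang='eng']",                // lang attribute equals 'eng'
            "/bookstore/book[price>35.00]/title"   // price is compared as a number
        };
        for (String expr : expressions) {
            System.out.println(expr + " -> " + texts(expr));
        }
    }
}
```

Note that XPath positions are 1-based, so book[1] is the first book.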
* Matches any element node
@* Matches any attribute node
node() Matches any node of any kind
/bookstore/* Selects all the child element nodes of the bookstore element
//* Selects all elements in the document
//title[@*] Selects all title elements which have any attribute
//book/title | //book/price Selects all the title AND price elements of all book elements
//title | //price Selects all the title AND price elements in the document
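/bookstore/book/title | //price Selects all the title elements of the book elements of the bookstore element AND all the price elements in the document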
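For the wildcards and the | operator, counting the matches is an easy way to see what each expression selects. This is only a sketch under the same assumptions as before: the sample XML, the class name XPathWildcards, and the count helper are hypothetical, and the evaluation uses the JDK's javax.xml.xpath API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathWildcards {
    // Hypothetical sample data, invented for this sketch:
    // 1 bookstore, 4 books, 4 titles (3 with a lang attribute), 4 prices.
    static final String XML =
        "<bookstore>"
        + "<book><title lang='eng'>Harry Potter</title><price>29.99</price></book>"
        + "<book><title lang='eng'>Learning XML</title><price>39.95</price></book>"
        + "<book><title lang='fr'>Le Petit Prince</title><price>12.50</price></book>"
        + "<book><title>Untitled Notes</title><price>45.00</price></book>"
        + "</bookstore>";

    // Parses the sample and returns how many nodes expr selects from the document root.
    static int count(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList found = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return found.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/bookstore/*",                     // element children of bookstore
            "//*",                              // every element in the document
            "//@*",                             // every attribute in the document
            "//title[@*]",                      // titles with at least one attribute
            "/bookstore/book[1]/title/node()",  // any kind of child node (here a text node)
            "//book/title | //book/price",      // union of two node sets
            "//title | //price",
            "/bookstore/book/title | //price",
            "//price | //book/price"            // same nodes twice, counted once
        };
        for (String expr : expressions) {
            System.out.println(expr + " -> " + count(expr));
        }
    }
}
```

The last expression is there to show that | builds a set: a node matched by both sides appears only once in the result.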
[/size]