XPath: A Query Language for Web Documents
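XPath is best classified as a domain-specific language with a fairly narrow field of application: it is a useful tool built specifically for selecting information from documents written in a markup language such as HTML or XML.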
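To make the idea concrete, here is a minimal R sketch using the XML package (the same package loaded in the session below). The toy document and its names (`library`, `book`, `title`, `lang`) are invented purely for illustration. It shows three common kinds of XPath expressions: an absolute path that starts at the root, a `//` path that matches elements at any depth, and a bracketed predicate that filters on an attribute value.

```r
library(XML)

# A tiny, made-up XML document (hypothetical content, for illustration only)
xml_text <- "<library>
  <book lang='en'><title>R Basics</title></book>
  <book lang='de'><title>XML in der Praxis</title></book>
</library>"

# Parse the string directly instead of reading a file or URL
doc <- xmlParse(xml_text, asText = TRUE)

# Absolute path: start at the root element and walk down step by step
titles_abs <- xpathSApply(doc, "/library/book/title", xmlValue)

# '//' selects matching <title> nodes at any depth in the document
titles_any <- xpathSApply(doc, "//title", xmlValue)

# Predicate in [...]: keep only books whose lang attribute equals 'en'
titles_en <- xpathSApply(doc, "//book[@lang='en']/title", xmlValue)

print(titles_abs)
print(titles_en)
```

Here the absolute path and the `//` path return the same two titles, because every `<title>` sits at the same depth; on messier real-world HTML the two forms usually differ.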
Reference page: http://www.r-datacollection.com/materials/ch-4-xpath/fortunes/fortunes.html
Parsing the document
> library(XML)
> parsed_doc<-htmlParse(file = "http://www.r-datacollection.com/materials/ch-4-xpath/fortunes/fortunes.html")
> print(parsed_doc)
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head><title>Collected R wisdoms</title></head>
<body>
<div id="R Inventor" lang="english" date="June/2003">
<h1>Robert Gentleman</h1>
<p><i>'What we have is nice, but we need something very different'</i></p>
<p><b>Source: </b>Statistical Computing 2003, Reisensburg</p>
</div>
<div lang="english" date="October/2011">
<h1>Rolf Turner</h1>
<p><i>'R is wonderful, but it cannot work magic'</i> <br><emph>answering a request for automatic generation of 'data from a known mean and 95% CI'</emph></p>
<p><b>Source: </b><a href="https://stat.ethz.ch/mailman/listinfo/r-help">R-help</a></p>
</div>
<address>
<a href="http://www.r-datacollectionbook.com"><i>The book homepage</i></a><a></a>
</address>
</body>
</html>
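Once the page is parsed, XPath expressions can be passed to `xpathSApply()` to pull out specific parts of it. The sketch below embeds a shortened copy of the fortunes markup printed above as a string (an assumption made so that it runs offline) and parses it with `htmlParse(..., asText = TRUE)`. It then selects every author name, the quote inside the `<div>` dated October/2011, and the `date` attribute of each `<div>`. The variable names are illustrative only.

```r
library(XML)

# Shortened copy of the printed fortunes.html markup, embedded as a string
# so the sketch does not need network access
html_text <- '<html><body>
<div id="R Inventor" lang="english" date="June/2003">
  <h1>Robert Gentleman</h1>
  <p><i>\'What we have is nice, but we need something very different\'</i></p>
</div>
<div lang="english" date="October/2011">
  <h1>Rolf Turner</h1>
  <p><i>\'R is wonderful, but it cannot work magic\'</i></p>
</div>
</body></html>'

parsed <- htmlParse(html_text, asText = TRUE)

# Every <h1> anywhere in the document holds an author name
authors <- xpathSApply(parsed, "//h1", xmlValue)

# Attribute predicate: the italic quote inside the div dated October/2011
quote_2011 <- xpathSApply(parsed, "//div[@date='October/2011']/p/i", xmlValue)

# Extract an attribute value instead of text: the date of each <div>
dates <- xpathSApply(parsed, "//div", xmlGetAttr, "date")

print(authors)
print(quote_2011)
print(dates)
```

Passing a function such as `xmlValue` or `xmlGetAttr` as the third argument decides what is returned for each matched node; without it, `xpathSApply()` returns the matched nodes themselves.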
xpathSApply(&