PHP: parsing HTML into a DOM – DOMDocument for parsing HTML (instead of regex)

Latest recommended article published 2023-06-02 13:41:32

六堡茶之家

Article tags: PHP, parsing HTML into a DOM

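I am trying to use DOMDocument to parse HTML code.

I am only doing simple things. I liked gordon's answer on "scrap data using regex and simplehtmldom" and based my code on his work.

I find the documentation on PHP.net lacking: the information is limited, there are almost no examples, and most of the details are about parsing XML.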

<?php
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile('http://www.nu.nl/internet/1106541/taalunie-keurt-open-sourcewoordenlijst-goed.html');
libxml_clear_errors();

$recipe = array();
$xpath = new DOMXPath($dom);
$contentDiv = $dom->getElementById('page'); // would have preferred getContentbyClass('content') (unique) in this case.

# title
print_r($xpath->evaluate('string(div/div/div/div/div/h1)', $contentDiv));

# content (this is not working)
#print_r($xpath->evaluate('string(div/div/div/div['content'])', $contentDiv)); // if only this worked
print_r($xpath->evaluate('string(div/div/div/div)', $contentDiv));
?>

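For testing purposes, I am trying to get the title (between the h1 tags) and the content (HTML) of a nu.nl news article.

As you can see, I can get the title, although I am not happy with that evaluate string either: it only works because the h1 happens to be the only one at that div level.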
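One possible way to make the title lookup less dependent on nesting depth is to query descendants of the context node with `.//` rather than spelling out every `div` step. The sketch below runs against made-up markup (the `$html` string and its nesting are assumptions for illustration, not the real nu.nl page), so it shows the technique rather than a drop-in fix.

```php
<?php
// Hypothetical markup standing in for the page; the nesting depth is made up.
$html = '<div id="page"><div><div><div>'
      . '<h1>Example title</h1>'
      . '<div class="content"><p>Example body.</p></div>'
      . '</div></div></div></div>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$contentDiv = $dom->getElementById('page');

// ".//h1" searches all descendants of the context node, whatever the depth.
$title = $xpath->evaluate('string(.//h1)', $contentDiv);

// query() returns a DOMNodeList; item(0) is the first matching div.
$content = $xpath->query('.//div[@class="content"]', $contentDiv)->item(0);

echo $title, PHP_EOL;
echo $dom->saveHTML($content), PHP_EOL;
```

The second argument to `evaluate()` and `query()` is the context node, so both expressions are resolved relative to the element with the id `page`.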

Solution:

Here is how to do it with DOM and XPath:

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile('http://www.nu.nl/…');
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Title: the string value of the h1 under the lead article.
echo $xpath->evaluate('string(id("leadarticle")/div/h1)');

// Content: serialize the first matching div as HTML.
echo $dom->saveHTML(
    $xpath->evaluate('id("leadarticle")/div[@class="content"]')->item(0)
);

The XPath string(id("leadarticle")/div/h1) returns the textContent of the h1 that is a child of a div, which in turn is a child of the element with the id leadarticle.
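To try the title expression without fetching a remote page, a minimal sketch can load an inline string with `loadHTML()` instead. The `$html` markup here is hypothetical and only mimics the structure the expression expects.

```php
<?php
// Hypothetical markup standing in for the article page; not the real nu.nl HTML.
$html = '<html><body><div id="leadarticle">'
      . '<div class="header"><h1>Example title</h1></div>'
      . '<div class="content"><p>Example body.</p></div>'
      . '</div></body></html>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// string() converts the first matching node to its text value.
$title = $xpath->evaluate('string(id("leadarticle")/div/h1)');
echo $title, PHP_EOL;
```

Because `string()` yields a scalar, `evaluate()` returns a PHP string here rather than a DOMNodeList.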

The XPath id("leadarticle")/div[@class="content"] returns the div whose class attribute is content and which is a child of the element with the id leadarticle.
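Note that `@class="content"` compares the whole attribute value, so it would not match an element carrying several classes. A common XPath 1.0 idiom for matching a single class token is sketched below. The markup and the extra class names (`wide`, `contentfooter`) are assumptions made up for the example.

```php
<?php
// Hypothetical markup; the class lists are assumptions made for illustration.
$html = '<div id="leadarticle">'
      . '<div class="content wide"><p>Body text.</p></div>'
      . '<div class="contentfooter">Footer</div>'
      . '</div>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Pad the class list with spaces so " content " only matches a whole token,
// not a substring such as "contentfooter".
$query = 'id("leadarticle")/div[contains(concat(" ", normalize-space(@class), " "), " content ")]';
$nodes = $xpath->query($query);

foreach ($nodes as $node) {
    echo $dom->saveHTML($node), PHP_EOL;
}
```

With this input only the first div is selected, since `contentfooter` does not contain the space-delimited token `content`.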

Tags: php, dom, parsing, xpath

Source: https://codeday.me/bug/20190929/1833095.html
