php - Remove DOMDocument warnings when parsing page content

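I am trying to parse the content of an arbitrary URL. The output should not contain any HTML code.

This works fine, but a bunch of warnings appear while the content at the given URL is being read. How can I get rid of these warnings?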

<?php
$url = 'https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page';

$doc = new DOMDocument();
$doc->loadHTMLFile($url);

$xpath = new DOMXPath($doc);

// Remove every <script> element so its code does not end up in the text
foreach ($xpath->query("//script") as $script) {
    $script->parentNode->removeChild($script);
}

$textContent = $doc->textContent; // inherited from DOMNode
echo $textContent;
?>

Warnings:

content-from-a-web-page, line: 255 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 255 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 273 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 273 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 412 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 412 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 551 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 551 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Warning: DOMDocument::loadHTMLFile(): ID display-name already defined in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 731 in /opt/lampp/htdocs/FB/ec2/test.php on line 13

Solution:

libxml_use_internal_errors(true);
$doc->loadHTMLFile($url);
libxml_clear_errors();

As Peehaa pointed out in the comments below, it is a good idea to reset the error state afterwards. You can do it like this:

$errors = libxml_use_internal_errors(true); // store the previous setting
$doc->loadHTMLFile($url);
libxml_clear_errors();
libxml_use_internal_errors($errors); // restore the previous setting

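Here is how it works:

> libxml_use_internal_errors(true) tells libxml to handle errors and warnings internally instead of sending them to the browser. Its return value, the previous setting, is stored in a variable.

> The HTML file is then loaded with the loadHTMLFile() method.

> libxml_clear_errors() clears the error buffer.

> The previous error-handling setting is restored.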
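
The four steps above can be wrapped in a small helper. The following is only a sketch under stated assumptions: the function name extractTextQuietly and the sample markup are hypothetical and not part of the original answer, and it parses an inline string with loadHTML() so that it runs without network access. It also reads the buffered messages with libxml_get_errors() before clearing them, in case you would rather log the parser messages than discard them.

```php
<?php
// Hypothetical helper (not from the original answer): parse an HTML string
// without emitting warnings and return its text plus the parser messages.
function extractTextQuietly(string $html): array
{
    // Step 1: switch to internal error handling and keep the previous setting.
    $previous = libxml_use_internal_errors(true);

    try {
        // Step 2: load the markup; parser warnings now go to libxml's buffer.
        $doc = new DOMDocument();
        $doc->loadHTML($html);

        // Read the buffered messages before they are cleared.
        $messages = [];
        foreach (libxml_get_errors() as $error) {
            $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }

        // Drop <script> elements, as in the question.
        $xpath = new DOMXPath($doc);
        foreach ($xpath->query('//script') as $script) {
            $script->parentNode->removeChild($script);
        }

        return ['text' => trim($doc->textContent), 'messages' => $messages];
    } finally {
        // Steps 3 and 4: clear the buffer and restore the previous setting,
        // even if something above threw an exception.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}

// Assumed sample markup containing an unescaped "&" and a duplicate id.
$sample = '<html><body><p id="a">Tom & Jerry</p><p id="a">again</p>'
        . '<script>var x = 1;</script></body></html>';

$result = extractTextQuietly($sample);
echo $result['text'], PHP_EOL;
```

Putting the cleanup in a finally block means the previous setting is restored on every exit path. Which messages end up in $result['messages'] depends on the installed libxml2 version, so the sketch makes no assumption about their exact wording.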

Tags: php, dom, domdocument

Source: https://codeday.me/bug/20191003/1846495.html
