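I am trying to extract the content of any URL as plain text, without any HTML markup.
This works, but loading the content from the given URL produces a bunch of warnings. How can I get rid of them?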
<?php
$url = 'https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page';
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
// Remove <script> elements so their code does not end up in the extracted text
foreach ($xpath->query("//script") as $script) {
    $script->parentNode->removeChild($script);
}
$textContent = $doc->textContent; // inherited from DOMNode
echo $textContent;
?>
Warnings:
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 255 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 273 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 273 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 412 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 412 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 551 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: expecting ';' in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 551 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
Warning: DOMDocument::loadHTMLFile(): ID display-name already defined in https://stackoverflow.com/questions/12097352/how-can-i-parse-dynamic-content-from-a-web-page, line: 731 in /opt/lampp/htdocs/FB/ec2/test.php on line 13
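To try the same extraction approach without fetching a live page, here is a minimal sketch that parses an inline HTML string with `loadHTML()` instead of `loadHTMLFile()`. The function name `extractText` and the sample markup are hypothetical. Removing `<style>` elements in addition to `<script>` is an assumption beyond the original code; without it, CSS rules would also show up in the extracted text.

```php
<?php
// Hypothetical helper: return the text of an HTML string with the
// contents of <script> and <style> elements removed.
function extractText(string $html): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    // The XPath result is a fixed node list, so removing nodes while looping is safe.
    foreach ($xpath->query('//script | //style') as $node) {
        $node->parentNode->removeChild($node);
    }
    // Text of the <html> element and all its descendants.
    return $doc->documentElement->textContent;
}

// Hypothetical sample markup.
$html = '<html><head><style>p { color: red; }</style></head>'
      . '<body><p>Hello</p><script>alert(1);</script><p>world</p></body></html>';
echo extractText($html), "\n";
```

Note that `textContent` does not insert whitespace between elements, so the text of adjacent paragraphs runs together.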
Solution:
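libxml_use_internal_errors(true); // collect libxml errors internally instead of emitting PHP warnings
$doc->loadHTMLFile($url);
libxml_clear_errors();            // discard the collected errors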
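To see what these three calls change, here is a small sketch that counts the PHP warnings raised while loading an inline string, once with default handling and once after `libxml_use_internal_errors(true)`. The markup with bare `&` characters and the helper name `countLoadWarnings` are hypothetical. The exact number of warnings in the first case may depend on the libxml2 version, so only the second count is expected to be zero.

```php
<?php
// Hypothetical sample markup: bare "&" characters are the kind of input
// behind the "htmlParseEntityRef: expecting ';'" warnings.
$html = '<p>Q&A: cats&dogs</p>';

// Hypothetical helper: count the PHP warnings raised while loading $html.
function countLoadWarnings(string $html): int
{
    $count = 0;
    set_error_handler(function () use (&$count) {
        $count++;
        return true; // mark as handled so the warning is not printed
    });
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    restore_error_handler();
    return $count;
}

echo 'Default handling: ', countLoadWarnings($html), " warning(s)\n";

libxml_use_internal_errors(true);   // buffer libxml errors instead of raising warnings
echo 'Internal errors:  ', countLoadWarnings($html), " warning(s)\n";
libxml_clear_errors();              // discard the buffered errors
```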
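As Peehaa pointed out in the comments below, it is a good idea to restore the previous error-handling state afterwards. You can do that like this: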
$errors = libxml_use_internal_errors(true); //enable internal errors and store the previous state
$doc->loadHTMLFile($url);
libxml_clear_errors();
libxml_use_internal_errors($errors); //reset back to previous state
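If this save-and-restore pattern is needed in several places, one option is to wrap it in a helper. This is a minimal sketch with a hypothetical function name, `collectLibxmlErrors`. It uses `try`/`finally` so the previous state is restored even if the loading code throws, and it returns the buffered `LibXMLError` objects instead of silently discarding them.

```php
<?php
/**
 * Hypothetical helper: run $fn with libxml internal error handling enabled,
 * then restore the previous setting even if $fn throws.
 * Returns the LibXMLError objects collected while $fn ran.
 */
function collectLibxmlErrors(callable $fn): array
{
    $previous = libxml_use_internal_errors(true); // store the previous state
    libxml_clear_errors();                        // start from an empty buffer
    try {
        $fn();
        return libxml_get_errors();               // evaluated before the finally block runs
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);    // reset back to previous state
    }
}

$doc = new DOMDocument();
$errors = collectLibxmlErrors(function () use ($doc) {
    $doc->loadHTML('<p>Q&A</p>'); // hypothetical inline markup
});
printf("%d libxml message(s) collected\n", count($errors));
```

Passing a closure keeps the helper independent of whether `loadHTML()`, `loadHTMLFile()` or `loadXML()` does the actual loading.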
Here is how it works:
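> libxml_use_internal_errors(true) tells libxml to handle errors and warnings internally instead of sending them to the output (the browser). Its return value, the previous error state, is stored in a variable.
> The HTML file is then loaded with the loadHTMLFile() method.
> libxml_clear_errors() clears the error buffer.
> The previous error state is restored.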
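The four steps above can be sketched end to end on an inline string. In this variation, which goes beyond the original answer, the buffered messages are read with `libxml_get_errors()` and formatted before step 3 discards them, which can be useful for logging. The sample markup and the `$levels` labels are illustrative assumptions.

```php
<?php
$previous = libxml_use_internal_errors(true);   // 1. buffer errors, store the old state

$doc = new DOMDocument();
$doc->loadHTML('<p>Q&A: cats&dogs</p>');       // 2. load (hypothetical inline markup)

// Illustrative labels for the LibXMLError::$level values.
$levels = [
    LIBXML_ERR_WARNING => 'warning',
    LIBXML_ERR_ERROR   => 'error',
    LIBXML_ERR_FATAL   => 'fatal',
];
$log = [];
foreach (libxml_get_errors() as $error) {       // each item is a LibXMLError object
    $log[] = sprintf(
        '[%s] line %d: %s',
        $levels[$error->level] ?? 'unknown',
        $error->line,
        trim($error->message)
    );
}

libxml_clear_errors();                          // 3. empty the buffer
libxml_use_internal_errors($previous);          // 4. restore the old state

echo implode("\n", $log), "\n";
```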
Tags: php, dom, domdocument
Source: https://codeday.me/bug/20191003/1846495.html