PHP DOMDocument->getElementById adds Â in place of empty tags

Latest recommended article published 2022-11-15 10:41:22

落木君



Tags: php domdocument getelementbyid



I'm using PHP's DOMDocument object to parse some HTML (fetched with cURL). When I get an element by ID and output it, any empty tags get an additional character and become Â .

The Code:

$document = new DOMDocument();
$document->validateOnParse = true;
$document->loadHTML( curl_exec($handle) );
curl_close($handle);
$element = $document->getElementById( __ELEMENT_ID__ );
echo $document->saveHTML();
echo $document->saveHTML($element);

The $document->saveHTML() command behaves as expected and prints out the entire page. But, as I said above, the echo $document->saveHTML($element) command transforms empty tags into Â.

This happens to all tags within $element.

What in this process (getting the element by ID and outputting it) is inserting this extra character? I could work around it, but I'm more interested in finding the root cause.

Original: https://stackoverflow.com/questions/13629351

Updated: 2019-11-29 11:57
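A likely root cause, stated here as an assumption that is consistent with the accepted answer: a no-break space is stored in UTF-8 as the two bytes C2 A0. If those bytes are read as ISO-8859-1, each byte becomes its own character (C2 becomes Â, A0 becomes a no-break space), and writing the result back out as UTF-8 shows "Â" followed by a visually empty space. The sketch below reproduces only that byte-level step in plain PHP; the helper latin1ToUtf8() is hypothetical and only approximates what happens inside the parser.

```php
<?php
// Hypothetical helper: treat every byte as an ISO-8859-1 code point and
// re-encode it as UTF-8 (roughly what happens when UTF-8 input is read as Latin-1).
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b); // ASCII bytes are identical in both encodings
        } else {
            // Latin-1 code points U+0080..U+00FF become two UTF-8 bytes
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$nbsp = "\u{00A0}";                 // U+00A0 NO-BREAK SPACE
echo bin2hex($nbsp), "\n";          // c2a0: its two UTF-8 bytes
$misread = latin1ToUtf8($nbsp);
echo bin2hex($misread), "\n";       // c382c2a0: "Â" (U+00C2) plus a no-break space
```

The extra character is therefore not inserted by getElementById() itself; it comes from the bytes being decoded with the wrong charset.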

Accepted answer


I was able to fix the problem by setting the character encoding of the page. The page I was fetching did not declare a character encoding, and my page was just a snippet with no header information defined. When I added a character-encoding declaration, the problem disappeared.

2012-11-30
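The answer does not show the exact markup that was added. A minimal sketch, assuming the fix was a charset declaration placed before the content: the same UTF-8 snippet is parsed once without and once with a meta http-equiv="Content-Type" tag, and the text of the element is compared. The snippet, the id "target" and the helper targetText() are hypothetical. That DOMDocument::loadHTML() falls back to ISO-8859-1 when no charset is declared is the usual libxml2 behaviour, but it may differ between libxml2 versions.

```php
<?php
// Hypothetical UTF-8 snippet with a no-break space (U+00A0) inside the element.
$snippet = '<div id="target">a' . "\u{00A0}" . 'b</div>';

// Parse $html with DOMDocument and return the text content of #target.
function targetText(string $html): string
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    $element = $doc->getElementById('target');
    return $element === null ? '' : $element->textContent;
}

// No charset declared: the bytes C2 A0 are typically decoded as two Latin-1 characters.
$without = targetText($snippet);

// Charset declared before the content, along the lines of the accepted answer.
$meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$with = targetText($meta . $snippet);

echo bin2hex($without), "\n"; // typically contains c382 (the stray "Â")
echo bin2hex($with), "\n";
```

Prepending the meta tag only changes how the input bytes are decoded; getElementById() and saveHTML() behave the same in both runs.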

Related questions


You can use DOMDocumentFragment and its appendXML() method, for example:

<?php
$doc = new DOMDocument();
$doc->formatOutput = true;
$ele = $doc->createElement("someele", "Hello");
$xmlstuff = $doc->createElement("otherxmlstuff");
$fragment = $doc->createDocumentFragm

...
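The excerpt above is cut off mid-call. A minimal sketch of the same idea, assuming the goal is to splice a string of XML markup under an existing element: createDocumentFragment() and DOMDocumentFragment::appendXML() are real DOM methods, while the element names and markup are illustrative only. appendXML() expects well-formed XML and returns false otherwise, so the result is checked before appending.

```php
<?php
// Build a small document with a root element that already has text content.
$doc = new DOMDocument();
$doc->formatOutput = true;

$root = $doc->createElement('someele', 'Hello');
$doc->appendChild($root);

// Hypothetical markup to inject; it must be well-formed XML.
$fragment = $doc->createDocumentFragment();
if ($fragment->appendXML('<otherxmlstuff><item>one</item><item>two</item></otherxmlstuff>')) {
    $root->appendChild($fragment); // the fragment's children move under <someele>
}

echo $doc->saveXML();
```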

function getInnerHtml( $node ) {
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}

$html = getInnerHtml($d

...
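The function above serializes each child with saveXML(), which produces XML-style output (for example self-closing empty tags). For a document loaded with loadHTML(), a variant that swaps in saveHTML($child) may be closer to what a browser expects; saveHTML() accepts a node argument since PHP 5.3.6. A minimal sketch, with hypothetical input markup:

```php
<?php
// Collect the serialized markup of a node's children ("inner HTML").
function getInnerHtml(DOMNode $node): string
{
    $innerHTML = '';
    foreach ($node->childNodes as $child) {
        // Serialize each child on its own, using HTML rules instead of XML rules
        $innerHTML .= $node->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="box"><b>bold</b> text</div>'); // hypothetical input
$box = $doc->getElementById('box');
echo getInnerHtml($box), "\n";
```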

Replace &nbsp; with &amp;nbsp; before loading; then, when the HTML DOM document is read, it will return &nbsp;.
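A minimal sketch of that replacement trick, assuming the input contains literal &nbsp; entities. After the first str_replace() the parser sees an escaped ampersand, so the text node keeps the six characters "&nbsp;" instead of a U+00A0 character (which is what can turn into Â when encodings are mixed). The second str_replace() on the serialized output is an assumption added here: saveHTML() escapes the ampersand again, so it has to be undone if real &nbsp; entities are wanted in the result. The input markup is hypothetical.

```php
<?php
// Hypothetical input containing a literal &nbsp; entity.
$html = '<p id="p">a&nbsp;b</p>';

// Step 1: hide the entity from the parser by escaping its ampersand.
$escaped = str_replace('&nbsp;', '&amp;nbsp;', $html);

$doc = new DOMDocument();
$doc->loadHTML($escaped);
$p = $doc->getElementById('p');

// The parsed text now holds the characters "&nbsp;" rather than U+00A0.
$text = $p->textContent;

// Step 2 (assumed follow-up): saveHTML() escapes "&" again, so undo that on output.
$restored = str_replace('&amp;nbsp;', '&nbsp;', $doc->saveHTML($p));
echo $restored, "\n";
```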

You can suppress the output of parse errors with libxml_use_internal_errors(true);

To check whether the returned response is a 404, you can inspect $http_response_header after calling DOMDocument::load(). For example:

libxml_use_internal_errors(true);
$rssDom = new DOMDocument();
$rssDom->load($url);
if (strpos($http_response_header[0], '404')) {

...
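With libxml_use_internal_errors(true), parse problems are buffered instead of printed, and can be read back with libxml_get_errors(). A minimal sketch, assuming you want the messages rather than just silence; the helper loadHtmlQuietly() and the sloppy input markup are hypothetical, and the 404 check from the excerpt is left out because it needs a live URL. Which messages appear depends on the libxml2 version.

```php
<?php
// Parse markup while collecting libxml diagnostics instead of printing warnings.
function loadHtmlQuietly(string $html): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors internally
    $doc = new DOMDocument();
    $ok = $doc->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {     // LibXMLError objects
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }
    libxml_clear_errors();                        // empty the error buffer
    libxml_use_internal_errors($previous);        // restore the previous setting

    return [$ok, $doc, $messages];
}

// Hypothetical, deliberately sloppy markup.
[$ok, $doc, $messages] = loadHtmlQuietly('<p>one<p>two <custom-tag>x</custom-tag>');
var_dump($ok);
print_r($messages);
```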

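I think that if the external entity loader is disabled, external entities obviously cannot be loaded. The only solution is to enable loading of external entities with libxml_disable_entity_loader(false). Since this setting is not thread-safe, I can see two ways of enabling it globally while using other features to block loading of unwanted entities (usually from the network): register your own entity loader with libxml_set_external_entity_loader (I think this is the safest solution), or use the parse option LIBXML_NONET. If you only want to disable network access for libxml2, this should be suff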

...
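A minimal sketch of the custom-loader approach, assuming the goal is to allow one known DTD and refuse everything else. libxml_set_external_entity_loader(), LIBXML_DTDLOAD and LIBXML_NOENT are real PHP APIs; the system ID http://example.invalid/root.dtd, the temporary DTD file and its entity are hypothetical. Note that libxml_disable_entity_loader() is deprecated as of PHP 8.0, so on current PHP versions the loader callback and parse options are the main controls.

```php
<?php
// Hypothetical local DTD that the resolver serves instead of a network URL.
$dtdPath = tempnam(sys_get_temp_dir(), 'dtd');
file_put_contents($dtdPath, '<!ENTITY greeting "hello">');

// Resolver: only this one system ID is allowed; everything else is refused.
libxml_set_external_entity_loader(
    function (?string $publicId, ?string $systemId, array $context) use ($dtdPath) {
        if ($systemId === 'http://example.invalid/root.dtd') {
            return $dtdPath;   // a path the parser can open locally
        }
        return null;           // null makes this load fail
    }
);

$xml = '<?xml version="1.0"?>'
     . '<!DOCTYPE root SYSTEM "http://example.invalid/root.dtd">'
     . '<root>&greeting;</root>';

$doc = new DOMDocument();
// LIBXML_DTDLOAD reads the external DTD; LIBXML_NOENT substitutes its entities.
$doc->loadXML($xml, LIBXML_DTDLOAD | LIBXML_NOENT);

libxml_set_external_entity_loader(null); // restore the default loader
unlink($dtdPath);

echo $doc->documentElement->textContent, "\n";
```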

Got it, though I don't know how it was invalid. Proof:

$xpath = new \DOMXpath($document);
$nodes = $xpath->query('//img[@id="banner"]');
// Return content if we don't have exactly one image with id="banner"
if(1 !== $nodes->length) return $content;
// DOMNode of the banner
$banner

...
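What the truncated excerpt does with $banner is unknown. A minimal sketch of just the lookup, assuming the same "exactly one match" guard: XPath matches the id attribute directly, so it does not depend on whether libxml registered the attribute as an ID. The input markup and the src value are hypothetical, and the guard uses an if block because this runs at top level rather than inside a function.

```php
<?php
// Hypothetical page content with exactly one banner image.
$content = '<div><img id="banner" src="banner.png"><p>text</p></div>';

$document = new DOMDocument();
$document->loadHTML($content);

$xpath = new DOMXPath($document);
$nodes = $xpath->query('//img[@id="banner"]');

// Same guard as the excerpt: only proceed with exactly one match.
if ($nodes->length === 1) {
    $banner = $nodes->item(0);            // DOMElement for the <img>
    echo $banner->getAttribute('src'), "\n";
}
```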

Use this:

$str = file_get_contents('http://dream-portal.net/index.php/board,65.0.html');
$doc = new DOMDocument();
@$doc->loadHTML($str);
$selector = new DOMXPath($doc);
foreach ($selector->query('//*[starts-with(@id, "msg_")]') as $node) {
    var_dump

...
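A self-contained variant of the same query, assuming local markup in place of the fetched page so it runs without network access. starts-with(@id, "msg_") selects every element whose id begins with that prefix; here the matching IDs are collected in document order. The markup is hypothetical.

```php
<?php
// Hypothetical local markup standing in for the fetched forum page.
$str = '<div id="msg_1">first</div><div id="other">skip</div><div id="msg_2">second</div>';

$doc = new DOMDocument();
@$doc->loadHTML($str);          // "@" silences parser warnings, as in the excerpt

$selector = new DOMXPath($doc);
$ids = [];
foreach ($selector->query('//*[starts-with(@id, "msg_")]') as $node) {
    $ids[] = $node->getAttribute('id');   // collect matching IDs in document order
}
print_r($ids);
```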

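The user trying to create the file is not "yurow" (the user who presumably has permission to create it). Instead, it is a user such as "apache" or "httpd". Systems are usually configured to prevent the apache/httpd user from creating files in the web root. This is done for security reasons, and I don't recommend getting around it by giving apache/httpd write access to the webroot. Instead, you can create the document inside /home/yurow (not inside /home/yurow/wwwroot). An example might be /home/yurow/xmldata/test02.xml.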

...
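A minimal sketch of saving outside the web root, assuming you check writability first so a failure is reported instead of surfacing as a warning from DOMDocument::save(). The directory below is hypothetical: it sits under the system temp directory only so the sketch runs anywhere; in the answer's setup it would be something like /home/yurow/xmldata.

```php
<?php
// Hypothetical data directory outside the web root (temp dir used for portability).
$dataDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'xmldata_demo';
if (!is_dir($dataDir)) {
    mkdir($dataDir, 0750, true);
}
$path = $dataDir . DIRECTORY_SEPARATOR . 'test02.xml';

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->appendChild($doc->createElement('root', 'data'));

// Check permissions first so the failure names the directory involved.
if (!is_writable($dataDir)) {
    echo "Directory $dataDir is not writable by the PHP process\n";
    $bytes = false;
} else {
    $bytes = $doc->save($path);   // number of bytes written, or false on failure
}
echo $bytes === false ? "save failed\n" : "wrote $bytes bytes to $path\n";
```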

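Try writing the HTML in the correct form, with attribute values delimited by double quotes rather than single quotes, so that they are not encoded. JavaScript recognizes strings delimited by single quotes. Here is an example:

$html = 'click here';
$doc = new DOMDocument();
$doc->loadHTML( $html );
echo $html . "\n";
echo "-----------------\n";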

...
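The $html value in the excerpt lost its markup in this copy, so the link below is hypothetical. A minimal sketch of the suggested form, assuming an inline onclick handler: the attribute value is delimited by double quotes and the JavaScript inside it uses single-quoted strings, so the parsed value keeps the single quotes and serialization does not need to escape them.

```php
<?php
// Hypothetical link: double-quoted attribute, single-quoted JavaScript string inside.
$html = '<a href="#" onclick="alert(\'hi\'); return false;">click here</a>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$link = $doc->getElementsByTagName('a')->item(0);

echo $link->getAttribute('onclick'), "\n";   // parsed attribute value
echo $doc->saveHTML($link), "\n";            // serialized element
```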
