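I'm looking to find the first h2 in an article. Once it's found, I want to find all the h3s that follow it until the next h2 is reached. Rinse and repeat until all headings and subheadings have been found.
Before you immediately flag or close this as a duplicate parsing question, please note the question title: this is not about basic node retrieval. I already have that part working.
My code is as follows: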
$matches = array();
$dom = new DOMDocument;
$dom->loadHTML($content);
foreach($dom->getElementsByTagName('h2') as $node) {
    $matches['heading-two'][] = $dom->saveHtml($node);
}
foreach($dom->getElementsByTagName('h3') as $node) {
    $matches['heading-three'][] = $dom->saveHtml($node);
}
if($matches){
    $this->key_points = $matches;
}
This gives me output similar to:
array(
    'heading-two' => array(
        '<h2>Here is the first heading two</h2>',
        '<h2>Here is the SECOND heading two</h2>'
    ),
    'heading-three' => array(
        '<h3>Here is the first h3</h3>',
        '<h3>Here is the second h3</h3>',
        '<h3>Here is the third h3</h3>',
        '<h3>Here is the fourth h3</h3>',
    )
);
What I'd like is something more like this:
array(
    '<h2>Here is the first heading two</h2>' => array(
        '<h3>Here is an h3 under the first h2</h3>',
        '<h3>Here is another h3 found under first h2, but after the first h3</h3>'
    ),
    '<h2>Here is the SECOND heading two</h2>' => array(
        '<h3>Here is an h3 under the SECOND h2</h3>',
        '<h3>Here is another h3 found under SECOND h2, but after the first h3</h3>'
    )
);
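I'm not looking for a complete code solution (though if you feel that would help others more, go ahead), but rather for some guidance or a nudge in the right direction toward building a nested array like the one directly above.
Solution:
I'm assuming all the headings are at the same level in the DOM, so every h3 is a sibling of its h2. Under that assumption, you can walk through the h2's following siblings until you reach the next h2: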
foreach($dom->getElementsByTagName('h2') as $node) {
    $key = $dom->saveHtml($node);
    $matches[$key] = array();
    while(($node = $node->nextSibling) && $node->nodeName !== 'h2') {
        if($node->nodeName == 'h3') {
            $matches[$key][] = $dom->saveHtml($node);
        }
    }
}
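To see the sibling walk end to end, here is a minimal self-contained sketch of the same approach, wrapped in a function. The helper name `groupHeadings` and the sample HTML are hypothetical, chosen only for illustration. Like the answer, it assumes every h3 shares a parent element with its h2. It uses a separate `$sibling` variable instead of reassigning the loop variable, which behaves the same but is easier to read.

```php
<?php
// Sketch of the sibling-walk approach described above.
// Assumption: each <h3> is a sibling of the <h2> it belongs to (same parent).
// groupHeadings() is a hypothetical helper name, not part of the DOM extension.

function groupHeadings(string $content): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings (e.g. unknown HTML5 tags) instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($content);
    libxml_clear_errors();

    $matches = array();
    foreach ($dom->getElementsByTagName('h2') as $h2) {
        // The h2's outer HTML becomes the array key.
        $key = $dom->saveHTML($h2);
        $matches[$key] = array();

        // Walk forward through the following siblings until the next h2 or the end.
        $sibling = $h2->nextSibling;
        while ($sibling !== null && $sibling->nodeName !== 'h2') {
            if ($sibling->nodeName === 'h3') {
                $matches[$key][] = $dom->saveHTML($sibling);
            }
            $sibling = $sibling->nextSibling;
        }
    }
    return $matches;
}

// Hypothetical sample input: all headings share the same <div> parent.
$html = '<div>'
    . '<h2>First h2</h2><p>Intro</p><h3>A</h3><p>Text</p><h3>B</h3>'
    . '<h2>Second h2</h2><h3>C</h3>'
    . '</div>';

$result = groupHeadings($html);
print_r($result);
```

Because the keys are the h2 elements' HTML, two h2 elements with identical markup would collapse into a single key. If that can happen in your content, a list of `array('heading' => ..., 'subheadings' => array(...))` entries avoids the collision.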
Tags: php, dom, parsing, html-parsing, domdocument
Source: https://codeday.me/bug/20190728/1566286.html