php 正则 id value,php – 如何使用正则表达式中的特定单词获取ID？

最新推荐文章于 2022-05-24 23:47:50 发布

韧笔

最新推荐文章于 2022-05-24 23:47:50 发布

阅读量288

点赞数

文章标签： php 正则 id value

这种方法包括使用html结构来检索DOMXPath所需的元素.正则表达式第二次用于从文本节点或属性中提取信息：

$classRel = ['sect2' => 'section-ref',

'figure' => 'fig-ref'];

libxml_use_internal_errors(true);

$dom = new DOMDocument;

$dom->loadHTML($html); // or $dom->loadHTMLFile($url);

$xp = new DOMXPath($dom);

// make a custom php function available for the XPath query

// (it isn't really necessary, but it is more rigorous than writing

// "contains(@class, 'myClass')" )

$xp->registerNamespace("php", "http://php.net/xpath");

function hasClass($classNode, $className) {

if (!empty($classNode))

return in_array($className, preg_split('~\s+~', $classNode[0]->value, -1, PREG_SPLIT_NO_EMPTY));

return false;

}

$xp->registerPHPFunctions('hasClass');

// The XPath query will find the first ancestor of a text node with '[label*'

// that is a div tag with an id and a class attribute,

// if the class attribute doesn't contain the "metadata" class.

$labelQuery = <<

//text()[contains(., 'label*')]

/ancestor::div

[@id and @class and not(php:function('hasClass', @class, 'metadata'))][1]

EOD;

$idNodeList = $xp->query($labelQuery);

$links = [];

// For each div node, a new link node is created in the associative array $links.

// The keys are labels.

foreach($idNodeList as $divNode) {

// The pattern extract the first text part in group 1 and the label in group 2

if (preg_match('~(\S+) .*? \[label\* ([^]]+) ]~x', $divNode->textContent, $m)) {

$links[$m[2]] = $dom->createElement('a');

$links[$m[2]]->setAttribute('href', $divNode->getAttribute('id'));

$links[$m[2]]->setAttribute('class', $classRel[$divNode->getAttribute('class')]);

$links[$m[2]]->nodeValue = $m[1];

}

}

if ($links) { // if $links is empty no need to do anything

$refNodeList = $xp->query("//text()[contains(., '[ref*')]");

foreach ($refNodeList as $refNode) {

// split the text with square brackets parts, the reference name is preserved in a capture

$parts = preg_split('~\[ref\*([^]]+)]~', $refNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);

// create a fragment to receive text parts and links

$frag = $dom->createDocumentFragment();

foreach ($parts as $k=>$part) {

if ($k%2 && isset($links[$part])) { // delimiters are always odd items

$clone = $links[$part]->cloneNode(true);

$frag->appendChild($clone);

} elseif ($part !== '') {

$frag->appendChild($dom->createTextNode($part));

}

}

$refNode->parentNode->replaceChild($frag, $refNode);

}

}

$result = '';

$childNodes = $dom->getElementsByTagName('body')->item(0)->childNodes;

foreach ($childNodes as $childNode) {

$result .= $dom->saveXML($childNode);

}

echo $result;

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
php 正则 id value,php – 如何使用正则表达式中的特定单词获取ID？

这种方法包括使用html结构来检索DOMXPath所需的元素.正则表达式第二次用于从文本节点或属性中提取信息：$classRel = ['sect2' => 'section-ref','figure' => 'fig-ref'];libxml_use_internal_errors(true);$dom = new DOMDocument;$dom->loadHTML($ht...
复制链接

扫一扫

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。