PHP – How to get an ID by matching a specific word with a regular expression?

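This approach uses the HTML structure to retrieve the required elements with DOMXPath. Regular expressions are then applied in a second step to extract information from text nodes or attributes: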

// maps the class of a labelled div to the class given to the generated link
$classRel = ['sect2'  => 'section-ref',
             'figure' => 'fig-ref'];

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html); // or $dom->loadHTMLFile($url);
$xp = new DOMXPath($dom);

// make a custom php function available for the XPath query
// (it isn't really necessary, but it is more rigorous than writing
// "contains(@class, 'myClass')" )
$xp->registerNamespace("php", "http://php.net/xpath");

function hasClass($classNode, $className) {
    if (!empty($classNode)) {
        return in_array($className, preg_split('~\s+~', $classNode[0]->value, -1, PREG_SPLIT_NO_EMPTY));
    }
    return false;
}

$xp->registerPHPFunctions('hasClass');

// For each text node containing 'label*', the XPath query finds the nearest
// ancestor that is a div tag with an id and a class attribute,
// provided the class attribute doesn't contain the "metadata" class.

$labelQuery = <<<'EOD'
//text()[contains(., 'label*')]
/ancestor::div
[@id and @class and not(php:function('hasClass', @class, 'metadata'))][1]
EOD;

$idNodeList = $xp->query($labelQuery);
$links = [];

// For each div node, a new link node is created in the associative array $links.
// The keys are labels.
foreach ($idNodeList as $divNode) {
    // The pattern extracts the first word of the text in group 1 and the label in group 2
    if (preg_match('~(\S+) .*? \[label\* ([^]]+) ]~x', $divNode->textContent, $m)) {
        $links[$m[2]] = $dom->createElement('a');
        $links[$m[2]]->setAttribute('href', $divNode->getAttribute('id'));
        $links[$m[2]]->setAttribute('class', $classRel[$divNode->getAttribute('class')]);
        $links[$m[2]]->nodeValue = $m[1];
    }
}

if ($links) { // if $links is empty no need to do anything
    $refNodeList = $xp->query("//text()[contains(., '[ref*')]");

    foreach ($refNodeList as $refNode) {
        // split the text on the bracketed references; the reference name is preserved by the capture group
        $parts = preg_split('~\[ref\*([^]]+)]~', $refNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);

        // create a fragment to receive text parts and links
        $frag = $dom->createDocumentFragment();

        foreach ($parts as $k => $part) {
            if ($k % 2 && isset($links[$part])) { // delimiters are always odd items
                $clone = $links[$part]->cloneNode(true);
                $frag->appendChild($clone);
            } elseif ($part !== '') {
                $frag->appendChild($dom->createTextNode($part));
            }
        }

        $refNode->parentNode->replaceChild($frag, $refNode);
    }
}

// serialize only the children of <body>, without the wrapper added by loadHTML
$result = '';
$childNodes = $dom->getElementsByTagName('body')->item(0)->childNodes;

foreach ($childNodes as $childNode) {
    $result .= $dom->saveXML($childNode);
}

echo $result;
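
The two regular expressions can be tried on their own, outside the DOM code. The following is only a minimal sketch: the sample strings are hypothetical, and their `[label*name]` / `[ref*name]` marker syntax is inferred from the patterns used above, not taken from the original question. It shows why the reference names end up at odd indexes after `preg_split` with `PREG_SPLIT_DELIM_CAPTURE`, and what the two capture groups of the label pattern contain.

```php
<?php
// Hypothetical sample text; the marker syntax is inferred from the patterns above.
$text = 'See [ref*intro] and [ref*tree] for details.';

// With PREG_SPLIT_DELIM_CAPTURE (and without PREG_SPLIT_NO_EMPTY), plain text
// lands at even indexes and the captured reference names at odd indexes.
$parts = preg_split('~\[ref\*([^]]+)]~', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
// $parts is ['See ', 'intro', ' and ', 'tree', ' for details.']

foreach ($parts as $k => $part) {
    $kind = ($k % 2) ? 'ref ' : 'text';
    printf("%d %s %s\n", $k, $kind, var_export($part, true));
}

// The label pattern uses the x (extended) modifier, so the literal spaces
// written in the pattern are ignored and only serve readability.
$label = 'Introduction to parsing [label*intro]';
if (preg_match('~(\S+) .*? \[label\* ([^]]+) ]~x', $label, $m)) {
    // group 1: first run of non-space characters, group 2: the label name
    printf("first word: %s, label: %s\n", $m[1], $m[2]);
}
```

Note that the odd/even rule only holds because empty parts are kept; adding `PREG_SPLIT_NO_EMPTY` would shift the indexes whenever the text starts or ends with a reference.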
