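A step-by-step PHP regex implementation: strip hyperlinks and plain-text links from an article while keeping the text of the stripped hyperlinks, and leave in place hyperlinks that contain images, remotely loaded JS scripts, standalone images, and similar links.
For example: <A href="http://dx.120bjhs.com/" target=_blank>把我过滤了好吗</A><A href="http://dx.120bjhs.com/" target=_blank><img src="http://baidu.com" /></A><script type="text/javascript" src="http://baidu.com"></script><img src="http://baidu.com" />http://baidu.com不要过滤我http://www.gdzjdaily.com.cn/cooperate/yszx/2013-08/20/content_2489278.htm
(Note: in the original post the code to be filtered was struck through. Here that is the <A> tag pair around 把我过滤了好吗 ("please filter me out"), whose text is kept, and the two plain-text URLs at the end; 不要过滤我 means "do not filter me".)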
<?php
/**
+---------------------------------------------------------------------------------------
* Description
+---------------------------------------------------------------------------------------
* @copyright http://blog.csdn.net/wang_huan2011
* @author Straiway <wang_huan2011@foxmail.com>
* @version $Id: index2.php 2013-8-24 UTF-8 $
+---------------------------------------------------------------------------------------
*/
$article['content'] = <<< HTML
<A href="http://dx.120bjhs.com/" target=_blank>把我过滤了好吗</A>
<A href="http://dx.120bjhs.com/" target=_blank><img src="http://baidu.com" /></A>
<script type="text/javascript" src="http://baidu.com"></script>
<img src="http://baidu.com" />
http://baidu.com不要过滤我
http://www.gdzjdaily.com.cn/cooperate/yszx/2013-08/20/content_2489278.htm
HTML;
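// Anchors: the "s" flag lets "." span newlines; group 1 captures the inner HTML.
$pattern_anchor = '#<a\s[^>]*>(.*?)</a>#is';
$pattern_img = '#<img\b[^>]*>#i';
// Plain-text URLs, scheme optional (so any dotted word such as "a.b" also matches):
// host, optional port, then a path/query of printable ASCII that ends at whitespace.
$pattern_link = '#(?:https?://)?(?:[\w\-]+\.)+\w+(?::\d+)?(?:[/?\#][\x21-\x7e]*)?#i';
// Any tag; the capture group makes preg_split() return the tags as pieces too.
$pattern_tag = '#(<[^>]*>)#';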
$search = array();
$replace = array();
// Step 1: strip anchors, keeping their inner text.
preg_match_all($pattern_anchor, $article['content'], $matches);
foreach ($matches[0] as $i => $val) {
    // Anchors that wrap an image are left untouched.
    if (!preg_match($pattern_img, $val)) {
        $search[] = $val;
        $replace[] = $matches[1][$i]; // group 1 is the anchor's inner text
    }
}
// Apply the replacements.
if (count($search) > 0) {
    $article['content'] = str_replace($search, $replace, $article['content']);
}
// Step 2: strip plain-text URLs. First split the HTML into tag pieces and text pieces.
$html_pieces = preg_split($pattern_tag, $article['content'], -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
// Check whether each piece is a tag.
foreach ($html_pieces as $key => $val) {
    // Only text pieces are filtered, so URLs inside tag attributes survive.
    if (!preg_match($pattern_tag, $val)) {
        $html_pieces[$key] = preg_replace($pattern_link, '', $val);
    }
}
// Join the pieces back together.
$article['content'] = implode('', $html_pieces);
echo $article['content'];
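I was actually hoping to do the whole replacement with a single regular expression, but I could not get that to work. Advice from the experts is welcome, and everyone else should feel free to pick it apart...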
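For what it is worth, a single preg_replace_callback() call can get close to the one-regex goal. What follows is only a sketch under assumptions, not a tested replacement for the script above: strip_links() is a hypothetical helper name, the URL sub-pattern is a simplified guess at what counts as a link, and, like any regex-based HTML handling, it will trip over malformed markup. The idea is to put anchors, other tags, and plain-text URLs into one alternation and let the callback decide what each match becomes.

```php
<?php
// Hypothetical helper, not part of the original script.
function strip_links($html)
{
    $pattern = '#(<a\s[^>]*>(.*?)</a>)'   // groups 1-2: whole anchor and its inner HTML
        . '|(<[^>]*>)'                    // group 3: any other tag
        // no group: plain-text URL; the path stops at whitespace, "<", or a non-ASCII byte
        . '|(?:https?://)?(?:[\w\-]+\.)+\w+(?::\d+)?(?:[/?\#][^\s<\x80-\xff]*)?'
        . '#is';

    return preg_replace_callback($pattern, function ($m) {
        if (!empty($m[3])) {
            return $m[3];                 // non-anchor tag: keep unchanged
        }
        if (!empty($m[1])) {
            // Keep anchors that wrap an image; otherwise keep only the inner
            // HTML, filtered again so that a URL used as link text also goes.
            return preg_match('#<img\b#i', $m[1]) ? $m[1] : strip_links($m[2]);
        }
        return '';                        // plain-text URL: drop it
    }, $html);
}

$sample = '<A href="http://dx.120bjhs.com/" target=_blank>把我过滤了好吗</A>'
    . '<img src="http://baidu.com" />http://baidu.com不要过滤我';
echo strip_links($sample);
```

Because a tag is consumed as a whole match before the URL alternative gets a chance, URLs inside attributes such as src and href are never touched, which is what the tag-splitting step in the two-pass version achieves. One difference from that version: text inside an anchor that is kept because it wraps an image is not scanned for plain-text URLs here.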