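PHP sensitive-word filtering

My recent task was sensitive-word filtering: detect sensitive words, mobile phone numbers, and URLs in an article and highlight them. I started by searching Baidu and reading the existing project code, which led to

Version 1: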

1.worklevel.php

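// Fetch the sensitive-word list and the article content by article ID
public function actionSpecial($id){
    // Only active sensitive words are loaded
    $sensitiveWords  =   SensitiveWord::find()->where(['status'=>SensitiveWord::STATUS_1])->all();
    // one() returns null when the article has no content row, instead of failing on $text[0]
    $article         =   ArticleContent::find()->where(['article_id' => $id])->asArray()->one();
    $content         =   $article ? $article['content'] : '';

    return SensitiveWord::setSpecialColor($sensitiveWords,$content);
}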

2.sensitiveword.php

public static function setSpecialColor($sensitiveWords=[],$text=''){
    // Highlight each sensitive word; an empty word list simply skips the loop
    foreach ($sensitiveWords as $v) {
        $tmp   =  '<span class="setjuhuospecialcolor">' . $v->name . '</span>';
        $text  =  str_replace($v->name, $tmp, $text);
    }

    // Highlight phone numbers: "1" followed by 10 digits.
    // This runs once, outside the word loop. preg_replace wraps each match exactly once,
    // so a number that appears several times is not wrapped in nested spans.
    $text  =  preg_replace('/1\d{10}/', '<span class="setjuhuospeciaphonelcolor">$0</span>', $text);

    // Highlight URLs: scheme, host, optional port, optional path
    $pattern = '/https?:\/\/[\w\-]+(?:\.[\w\-]+)*(?::\d+)?(?:\/[\w\/.\-]*)?/i';
    $text  =  preg_replace($pattern, '<span class="setjuhuospeciaphonelcolor">$0</span>', $text);

    return $text;
}

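This version does work, but once it was running in production, opening the article preview would time out. It also scales poorly: the larger the word list grows, the slower it gets.

So I decided to switch to an asynchronous approach. Since many people may be using the page at the same time, I also moved the whole filtering step into JavaScript instead of leaving it to the server, so the page loads faster.

Version 2: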

1.view.php

<style type="text/css">
    .title{   text-align: center; font-size: 25px; font-weight: bold;}
    .time{    text-align: center; margin: 25px;}
    .content{ width: 92%; margin-left: 4%;}
    .remark{  margin-top: 50px; border: 1px solid #000000; width: 92%; margin-left: 4%; padding: 5px; border-radius: 5px;}
    .setjuhuospecialcolor{color: #ffcc00}
    .setjuhuospeciaphonelcolor{color: #ffcc00}
</style>


<script src="/js/admin/jquery-1.10.2.min.js"></script>
<script type="text/javascript">

    // Fetch the article (the id is cast to int so it cannot inject script into the page)
    $.get('/admin/work-level/get-article?id=<?php echo (int)$_GET['id'];?>',function (article_content) {
        // Show the unfiltered content to the user first
        $("#content").html(article_content);

        // Then show the content with sensitive items highlighted (may take a long time)
        $.get('/admin/work-level/get-sensitive-word',function (data) {
            // The server returns the sensitive words joined with '-', so split them back into an array
            var sensitiveword    =  data.split('-');

            // Loop over the keywords and highlight each one.
            // split/join replaces every literal occurrence without building a regex from the word via eval.
            $.each(sensitiveword,function(key,value){
                if(value === '') return; // skip empty entries
                article_content  =  article_content.split(value).join('<span class="setjuhuospeciaphonelcolor">'+value+'</span>');
            });

            // Highlight phone numbers: "1" followed by 10 digits.
            // A global replace wraps each match exactly once, so no de-duplication pass is needed.
            article_content      =  article_content.replace(/1\d{10}/g, '<span class="setjuhuospeciaphonelcolor">$&</span>');

            // Highlight URLs: scheme, host, optional port, optional path
            var reg_url          =  /https?:\/\/[\w\-]+(?:\.[\w\-]+)*(?::\d+)?(?:\/[\w\/.\-]*)?/g;
            article_content      =  article_content.replace(reg_url, '<span class="setjuhuospeciaphonelcolor">$&</span>');

            $("#content").html(article_content);
        });
    });
</script>
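
The replacement passes above each rescan the whole article, and a later pass can match text inside markup that an earlier pass inserted (for example, an 11-digit run inside a URL). As a rough sketch only, not the code used in the project, the same highlighting can be done in one pass by combining the URL pattern, the phone pattern, and the escaped words into a single alternation. `escapeRegExp` and `highlightAll` are hypothetical names introduced here for illustration; the CSS class is the one defined in the stylesheet above.

```javascript
// Escape regex metacharacters so a word is matched literally.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Highlight URLs, phone numbers, and words in a single pass over the text.
function highlightAll(text, words) {
  var parts = [
    'https?://[\\w-]+(?:\\.[\\w-]+)*(?::\\d+)?(?:/[\\w/.-]*)?', // URL
    '1\\d{10}'                                                  // phone number
  ];
  // Longer words first, so "badword" wins over "bad" at the same position.
  var sorted = words
    .filter(function (w) { return w !== ''; })
    .sort(function (a, b) { return b.length - a.length; });
  for (var i = 0; i < sorted.length; i++) {
    parts.push(escapeRegExp(sorted[i]));
  }
  var pattern = new RegExp(parts.join('|'), 'g');
  // '$&' in the replacement stands for the matched text.
  return text.replace(pattern, '<span class="setjuhuospeciaphonelcolor">$&</span>');
}
```

Because the regex engine consumes each match before moving on, a URL such as `http://a.com/13812345678` is wrapped once as a whole, rather than having its digits wrapped separately by the phone pattern.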


2.WorkLevelController.php

<?php
namespace app\controllers\admin;

use app\components\common\Tools;
use app\models\WorkLevel;
use app\models\SensitiveWord;
use app\models\ArticleContent;
use Yii;


class WorkLevelController extends AdminController{

    // Return the article content
    public function actionGetArticle($id){
        // one() returns null when no row exists, instead of failing on $result[0]
        $result = ArticleContent::find()->where(['article_id' => $id])->asArray()->one();
        return $result ? $result['content'] : '';
    }

    // Return the active sensitive words as a single '-'-joined string
    public function actionGetSensitiveWord(){
        $sql   =  'select name FROM sensitive_word where status = 1';
        $rows  =  SensitiveWord::findBySql($sql)->asArray()->all();
        // array_column yields [] for an empty result, so implode never sees an undefined variable
        return implode('-', array_column($rows, 'name'));
    }
}
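
One caveat with joining the words using '-': a word that itself contains '-' is split into pieces on the client. Here is a minimal sketch of the difference, assuming the server could instead emit a JSON array (for example with PHP's `json_encode`). `decodeJoined` and `decodeJson` are hypothetical helper names, not part of the project.

```javascript
// Decode the '-'-joined payload the way the view does.
function decodeJoined(payload) {
  return payload.split('-');
}

// Decode a JSON array payload, e.g. what json_encode would produce on the PHP side.
function decodeJson(payload) {
  return JSON.parse(payload);
}

var words = ['foo', 'e-mail'];

// Joined form: the hyphen inside "e-mail" is indistinguishable from the separator.
console.log(decodeJoined(words.join('-')));     // [ 'foo', 'e', 'mail' ]

// JSON form: the array round-trips unchanged.
console.log(decodeJson(JSON.stringify(words))); // [ 'foo', 'e-mail' ]
```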

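Final result: everything that needs filtering is shown in gold:

Of course, the approach above assumes the sensitive-word list is still fairly small. If the list is very large, consider these references:

1. Implementing a sensitive-word filtering system in PHP

2. Keyword filtering based on the AC (Aho-Corasick) automaton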
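
For context on the second reference: an Aho-Corasick automaton finds every occurrence of many keywords in a single scan of the text, so the scan itself does not slow down as the word list grows. The following is a minimal sketch of the technique, not code taken from either referenced article; `buildAutomaton` and `findAll` are hypothetical names.

```javascript
// Build a trie of the words, then add failure links breadth-first.
function buildAutomaton(words) {
  var root = { next: {}, fail: null, out: [] };
  words.forEach(function (word) {
    if (word === '') return;
    var node = root;
    for (var i = 0; i < word.length; i++) {
      var ch = word[i];
      if (!node.next[ch]) node.next[ch] = { next: {}, fail: null, out: [] };
      node = node.next[ch];
    }
    node.out.push(word); // a word ends at this node
  });

  var queue = [];
  Object.keys(root.next).forEach(function (ch) {
    root.next[ch].fail = root; // depth-1 nodes fall back to the root
    queue.push(root.next[ch]);
  });
  while (queue.length > 0) {
    var cur = queue.shift();
    Object.keys(cur.next).forEach(function (ch) {
      var child = cur.next[ch];
      // Follow failure links until a node with an edge on ch is found.
      var f = cur.fail;
      while (f !== null && !f.next[ch]) f = f.fail;
      child.fail = f === null ? root : f.next[ch];
      // Inherit words that end at the fallback state (proper suffixes).
      child.out = child.out.concat(child.fail.out);
      queue.push(child);
    });
  }
  return root;
}

// Scan the text once and report every match as { word, index }.
function findAll(root, text) {
  var matches = [];
  var node = root;
  for (var i = 0; i < text.length; i++) {
    var ch = text[i];
    while (node !== root && !node.next[ch]) node = node.fail;
    if (node.next[ch]) node = node.next[ch];
    for (var k = 0; k < node.out.length; k++) {
      var word = node.out[k];
      matches.push({ word: word, index: i - word.length + 1 });
    }
  }
  return matches;
}

var automaton = buildAutomaton(['he', 'she', 'hers']);
console.log(findAll(automaton, 'ushers')); // she at index 1, he at index 2, hers at index 2
```

The match positions returned by `findAll` could then be used to wrap each hit in a highlight span, as the per-word replacement does above.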

Reposted from: https://my.oschina.net/u/1587469/blog/1456673
