Keyword filtering in PHP 7.2 with the trie-filter extension

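trie-filter is a keyword-filtering extension that checks whether a piece of text contains any keyword from a word list. It is implemented on top of a Double-Array Trie.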

1. Install libiconv
libiconv is a dependency of libdatrie.

Steps for the PHP 7.2 setup:
wget http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.14.tar.gz   
tar zxvf libiconv-1.14.tar.gz   
cd libiconv-1.14   
./configure   
make   
make install

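If the build fails with the following error (typical on systems with glibc 2.16 or later, where gets is no longer declared):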
In file included from progname.c:26:0:
./stdio.h:1010:1: error: 'gets' undeclared here (not in a function)
 _GL_WARN_ON_USE (gets, "gets is a security hole - use fgets instead");
 ^
make[2]: *** [progname.o] Error 1
make[2]: Leaving directory `/usr/local/directadmin/custombuild/libiconv-1.14/srclib'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/local/directadmin/custombuild/libiconv-1.14/srclib'
make: *** [all] Error 2
*******************************************
*******************************************

Cannot find /usr/local/bin/php
Please recompile php with custombuild, eg:
cd /usr/local/directadmin/custombuild
./build all d

This appears to be a 64-bit system.
a common cause of http/php compile failures is mentioned here:
http://help.directadmin.com/item.php?id=213

*******************************************
*******************************************

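Fix: edit stdio.in.h in the srclib directory of the unpacked source (the path below assumes the archive was unpacked under /tmp):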

cd /tmp/libiconv-1.14/srclib

vim stdio.in.h

Find this line:
_GL_WARN_ON_USE (gets, "gets is a security hole - use fgets instead");

Replace it with:
#if defined(__GLIBC__) && !defined(__UCLIBC__) && !__GLIBC_PREREQ(2, 16)
_GL_WARN_ON_USE (gets, "gets is a security hole - use fgets instead");
#endif

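Note: the closing #endif must be included as well. After saving the file, go back to the libiconv-1.14 directory and run make and make install again.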
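
The manual edit can also be scripted, which helps when the build is repeated on several machines. The following is a minimal sketch and not part of the original instructions: the function name patch_gets_warning is made up for illustration, and the block only touches a scratch file so it can be tried safely.

```shell
#!/bin/sh
# patch_gets_warning FILE  (hypothetical helper, not part of libiconv)
# Wraps the _GL_WARN_ON_USE (gets, ...) line in the glibc version guard.
patch_gets_warning() {
    file=$1
    # Already patched: do nothing, so the helper is safe to run twice.
    if grep -q '__GLIBC_PREREQ(2, 16)' "$file"; then
        return 0
    fi
    awk '
        /^_GL_WARN_ON_USE \(gets,/ {
            print "#if defined(__GLIBC__) && !defined(__UCLIBC__) && !__GLIBC_PREREQ(2, 16)"
            print
            print "#endif"
            next
        }
        { print }
    ' "$file" > "$file.patched" && mv "$file.patched" "$file"
}

# Demonstration on a scratch file holding only the offending line.
demo=$(mktemp)
printf '%s\n' '_GL_WARN_ON_USE (gets, "gets is a security hole - use fgets instead");' > "$demo"
patch_gets_warning "$demo"
cat "$demo"
```

On the real source tree the call would be patch_gets_warning stdio.in.h from inside srclib, followed by a fresh make.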

2. Install libdatrie

wget http://www.az1314.cn/uploads/libdatrie.gz
tar zxvf libdatrie.gz
cd libdatrie-0.2.4
./configure --prefix=/usr/local/libdatrie
make
make install

If the build reports the following link error:
[root@localhost libdatrie-0.2.4]# ./configure
...
libtool: link: gcc -g -O2 -o .libs/trietool-0.2 trietool.o  ../datrie/.libs/libdatrie.so
trietool.o: In function `conv_from_alpha':
/usr/local/src/libdatrie-0.2.4/tools/trietool.c:170: undefined reference to `libiconv'
trietool.o: In function `conv_to_alpha.isra.2.constprop.7':
trietool.c:(.text+0x181): undefined reference to `libiconv'

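Pass the libiconv location to configure (keeping the same --prefix, since step 3 expects libdatrie under /usr/local/libdatrie), then run make and make install again:
./configure --prefix=/usr/local/libdatrie LDFLAGS=-L/usr/local/lib LIBS=-liconv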
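
The extra flags only help if libiconv really was installed under /usr/local/lib, which is where the plain ./configure from step 1 normally puts it. A quick check before re-running configure can confirm that. This is only a sketch: the function name has_libiconv and the LIBICONV_DIR variable are hypothetical.

```shell
#!/bin/sh
# has_libiconv DIR  (hypothetical helper)
# Succeeds if DIR holds a shared or static libiconv that -liconv can resolve.
has_libiconv() {
    dir=$1
    [ -e "$dir/libiconv.so" ] || [ -e "$dir/libiconv.a" ]
}

# LIBICONV_DIR is an illustrative override; /usr/local/lib is the assumed default.
libdir=${LIBICONV_DIR:-/usr/local/lib}
if has_libiconv "$libdir"; then
    echo "libiconv found in $libdir"
else
    echo "libiconv not found in $libdir; repeat step 1 or adjust LDFLAGS"
fi
```

If the library lives somewhere else, the -L path in LDFLAGS should point at that directory instead.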


3. Install the trie-filter extension

git clone https://github.com/zzjin/php-ext-trie-filter
cd php-ext-trie-filter
/usr/bin/phpize7.2 
./configure  --with-trie_filter=/usr/local/libdatrie
make 
make install

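Create the ini file and add the line extension=trie_filter.so to it (the module is built as trie_filter.so, matching the --with-trie_filter option), then link it into the CLI and FPM configuration:
sudo vim /etc/php/7.2/mods-available/trie-filter.ini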
sudo ln -s /etc/php/7.2/mods-available/trie-filter.ini /etc/php/7.2/cli/conf.d/20-trie-filter.ini
sudo ln -s /etc/php/7.2/mods-available/trie-filter.ini /etc/php/7.2/fpm/conf.d/20-trie-filter.ini

Restart PHP (for example the php7.2-fpm service) so that the extension is loaded.
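
Whether the extension was picked up can be checked from the command line before writing any PHP code. The sketch below filters a module list such as the output of php -m. The helper name module_listed is hypothetical, and the module name trie_filter is an assumption based on the --with-trie_filter configure option.

```shell
#!/bin/sh
# module_listed NAME  (hypothetical helper)
# Reads a module list on stdin, one name per line, and succeeds if NAME is in it.
module_listed() {
    grep -qxF "$1"
}

# Sample list standing in for real `php -m` output.
sample='[PHP Modules]
Core
json
trie_filter'

if printf '%s\n' "$sample" | module_listed trie_filter; then
    echo "trie_filter is loaded"
fi
```

On the real system the equivalent call is php -m | module_listed trie_filter. That covers the CLI only; the FPM side is easiest to confirm through a phpinfo() page.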

Example code:

$dataPath = '/htdocs/demo/words';
$badtrie = $dataPath.'/badtrie.dic';

if(!file_exists($badtrie)) {
    // First run: build the trie from the keyword table and cache it on disk.
    $resTrie = trie_filter_new(); //create an empty trie tree
    // Word is the application's own model class for the keyword table.
    $sensitives = Word::where('id', '>=', 0)->get()->toArray();
    foreach ($sensitives as $v) {
        trie_filter_store($resTrie, $v['name']);
    }
    trie_filter_save($resTrie, $badtrie);
} else {
    // Later runs: load the cached dictionary and skip the database query.
    $resTrie = trie_filter_load($badtrie);
}

$word = '骄傲的fl骄傲下贱as234副驾驶的环境金凤凰XJP大师傅发的623暴力膜54防守打法';
$arrRet = trie_filter_search_all($resTrie, $word);

echo "<pre>";
print_r($arrRet);
echo 'Contains sensitive words: '.($arrRet ? 'yes': 'no');

if($arrRet){
    // Each match is array(byte offset, byte length) into $word.
    $badwords = array();
    foreach($arrRet as $match){
        $badwords[] = substr($word, $match[0], $match[1]);
    }
    $badwords = array_unique($badwords);

    // Sort by byte length, longest first, so longer keywords are
    // highlighted before any shorter keywords they contain.
    usort($badwords, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    foreach($badwords as $v){
        if($v !== ''){
            $word = str_replace($v, "<font color=red>".$v."</font>", $word);
        }
    }

    echo $word."\r\n";
}

trie_filter_free($resTrie);

Opening the page in a browser shows the result:

The matched keywords are picked out and highlighted in red.