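A keyword-filtering PHP extension (trie_filter) that checks whether a piece of text contains any keyword from a given list. It is implemented on top of a Double-Array Trie.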
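A trie stores all keywords in one tree, so the text can be checked against every keyword by walking the tree from each starting position and recording a hit whenever a node marks the end of a keyword. The sketch below shows that matching idea with a plain nested PHP array. It is not a double-array trie and not the extension's implementation. buildTrie() and searchAll() are hypothetical names. It returns [byte offset, byte length] pairs, the same shape that the example code later in this article passes to substr().

```php
<?php
// Marker key for "a keyword ends here". It is longer than one byte,
// so it can never collide with a single-byte edge key.
const END_MARK = '#end';

// Build a byte-level trie (nested arrays) from a list of keywords.
function buildTrie(array $keywords): array
{
    $root = [];
    foreach ($keywords as $kw) {
        $node = &$root;                       // start at the root for each keyword
        $len = strlen($kw);
        for ($i = 0; $i < $len; $i++) {
            $byte = $kw[$i];
            if (!isset($node[$byte])) {
                $node[$byte] = [];            // create the missing child edge
            }
            $node = &$node[$byte];            // descend one byte
        }
        $node[END_MARK] = true;               // mark end of this keyword
        unset($node);                         // drop the reference before the next keyword
    }
    return $root;
}

// Return every match as [byte offset, byte length], including overlapping ones.
function searchAll(array $trie, string $text): array
{
    $hits = [];
    $n = strlen($text);
    for ($start = 0; $start < $n; $start++) {
        $node = $trie;
        // Follow edges while the next byte of the text exists in the trie.
        for ($i = $start; $i < $n && isset($node[$text[$i]]); $i++) {
            $node = $node[$text[$i]];
            if (isset($node[END_MARK])) {
                $hits[] = [$start, $i - $start + 1];
            }
        }
    }
    return $hits;
}

// Usage: UTF-8 Chinese characters are 3 bytes each.
$text = '骄傲的fl骄傲下贱as';
foreach (searchAll(buildTrie(['骄傲', '骄傲下贱']), $text) as [$off, $len]) {
    echo $off, ' ', $len, ' ', substr($text, $off, $len), "\n";
}
```

A real double-array trie, as used by libdatrie, packs the same tree into two compact integer arrays (base and check), which makes lookups fast and lets the structure be saved to a file such as the .dic used below. The nested-array version here only illustrates the matching logic.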
1. Install libiconv
libiconv is a dependency of libdatrie.
For PHP 7.2:
wget http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.14.tar.gz
tar zxvf libiconv-1.14.tar.gz
cd libiconv-1.14
./configure
make
make install
If the build fails with the following error:
In file included from progname.c:26:0:
./stdio.h:1010:1: error: 'gets' undeclared here (not in a function)
_GL_WARN_ON_USE (gets, "gets is a security hole - use fgets instead");
^
make[2]: *** [progname.o] Error 1
make[2]: Leaving directory `/usr/local/directadmin/custombuild/libiconv-1.14/srclib'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/local/directadmin/custombuild/libiconv-1.14/srclib'
make: *** [all] Error 2
*******************************************
*******************************************
Cannot find /usr/local/bin/php
Please recompile php with custombuild, eg:
cd /usr/local/directadmin/custombuild
./build all d
This appears to be a 64-bit system.
a common cause of http/php compile failures is mentioned here:
http://help.directadmin.com/item.php?id=213
*******************************************
*******************************************
Fix: edit srclib/stdio.in.h in the libiconv source tree (adjust the path to wherever you unpacked it):
cd /tmp/libiconv-1.14/srclib
vim stdio.in.h
Find this line:
_GL_WARN_ON_USE (gets, "gets is a security hole - use fgets instead");
Replace it with:
#if defined(__GLIBC__) && !defined(__UCLIBC__) && !__GLIBC_PREREQ(2, 16)
_GL_WARN_ON_USE (gets, "gets is a security hole - use fgets instead");
#endif
Note: include the closing #endif as well. Then run make and make install again.
2. Install libdatrie
wget http://www.az1314.cn/uploads/libdatrie.gz
tar zxvf libdatrie.gz
cd libdatrie-0.2.4
./configure --prefix=/usr/local/libdatrie
make
make install
If the build fails with an undefined reference to libiconv like this:
[root@localhost libdatrie-0.2.4]# ./configure
...
libtool: link: gcc -g -O2 -o .libs/trietool-0.2 trietool.o ../datrie/.libs/libdatrie.so
trietool.o: In function `conv_from_alpha':
/usr/local/src/libdatrie-0.2.4/tools/trietool.c:170: undefined reference to `libiconv'
trietool.o: In function `conv_to_alpha.isra.2.constprop.7':
trietool.c:(.text+0x181): undefined reference to `libiconv'
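re-run configure with these extra arguments (keeping the same --prefix, because the trie-filter extension below is configured against /usr/local/libdatrie), then run make and make install again:
./configure --prefix=/usr/local/libdatrie LDFLAGS=-L/usr/local/lib LIBS=-liconv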
3. Install the trie-filter extension
git clone https://github.com/zzjin/php-ext-trie-filter
cd php-ext-trie-filter
/usr/bin/phpize7.2
./configure --with-trie_filter=/usr/local/libdatrie
make
make install
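Create /etc/php/7.2/mods-available/trie-filter.ini (e.g. with sudo vim) containing the line:
extension=trie_filter.so
(The module built from this source is normally named trie_filter.so; use the file name that make install reports.) Then enable it for both the CLI and FPM: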
sudo ln -s /etc/php/7.2/mods-available/trie-filter.ini /etc/php/7.2/cli/conf.d/20-trie-filter.ini
sudo ln -s /etc/php/7.2/mods-available/trie-filter.ini /etc/php/7.2/fpm/conf.d/20-trie-filter.ini
Restart PHP-FPM so the extension is loaded, e.g. sudo service php7.2-fpm restart
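After restarting, it is worth confirming the extension is actually loaded. The CLI and FPM read separate conf.d directories (which is why the ini file is linked into both), so run the check once with php on the command line and once through the web server. The extension name trie_filter is an assumption taken from the --with-trie_filter configure option, trieFilterStatus() is a hypothetical helper, and the function list is simply the trie_filter_* calls used in the example code in this article.

```php
<?php
// Report whether the trie_filter extension (name assumed from the
// --with-trie_filter configure option) and the functions used in this
// article are available in the current SAPI.
function trieFilterStatus(): array
{
    $required = [
        'trie_filter_new', 'trie_filter_store', 'trie_filter_save',
        'trie_filter_load', 'trie_filter_search_all', 'trie_filter_free',
    ];
    $missing = [];
    foreach ($required as $fn) {
        if (!function_exists($fn)) {
            $missing[] = $fn;
        }
    }
    return ['loaded' => extension_loaded('trie_filter'), 'missing' => $missing];
}

$status = trieFilterStatus();
echo $status['loaded'] ? "trie_filter loaded\n" : "trie_filter NOT loaded\n";
if ($status['missing']) {
    echo 'missing functions: ' . implode(', ', $status['missing']) . "\n";
}
```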
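Example usage (in the author's application, Word appears to be a Laravel Eloquent model whose name column holds the keywords):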
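// Directory holding the compiled dictionary (the author's path; adjust as needed)
$dataPath = '/htdocs/demo/words';
$badtrie = $dataPath . '/badtrie.dic';

if (!file_exists($badtrie)) {
    // First run: build the trie from the keyword table and save it to disk.
    // Delete badtrie.dic to force a rebuild after the word list changes.
    $resTrie = trie_filter_new(); // create an empty trie
    $sensitives = Word::where('id', '>=', 0)->get()->toArray();
    foreach ($sensitives as $v) {
        trie_filter_store($resTrie, $v['name']);
    }
    trie_filter_save($resTrie, $badtrie);
} else {
    // Later runs: load the precompiled dictionary instead of querying the DB.
    $resTrie = trie_filter_load($badtrie);
}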
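$word = '骄傲的fl骄傲下贱as234副驾驶的环境金凤凰XJP大师傅发的623暴力膜54防守打法';
// Each hit is an [offset, length] pair in bytes (hence substr() below)
$arrRet = trie_filter_search_all($resTrie, $word);
echo "<pre>";
print_r($arrRet);
echo 'Contains sensitive words: ' . ($arrRet ? 'yes' : 'no') . "\n";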
$badwords = array();
if ($arrRet) {
    foreach ($arrRet as $v) {
        // $v[0] = byte offset, $v[1] = byte length of the match
        $badwords[] = substr($word, $v[0], $v[1]);
    }
    $badwords = array_unique($badwords);

    // Wrap every matched keyword in a red span. strtr() with an array tries
    // the longest keys first and never rescans text it has already replaced,
    // so a short keyword contained in a longer one is not wrapped twice and
    // no manual longest-first sort is needed.
    $map = array();
    foreach ($badwords as $v) {
        $map[$v] = '<span style="color:red">' . $v . '</span>';
    }
    echo strtr($word, $map) . "\r\n";
}
trie_filter_free($resTrie);
When the page is opened in a browser, the matched keywords are picked out and highlighted in red.
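In the code above, the hits from trie_filter_search_all() are passed to substr(), so they are byte positions, and in UTF-8 each of these Chinese characters takes three bytes. If you need character positions instead (for example to highlight on the client side), a helper along these lines can convert them. This is a sketch that assumes the text is UTF-8 and that each hit starts and ends on a character boundary, which holds when the keywords themselves are valid UTF-8. hitsToCharPositions() and utf8Length() are hypothetical names; only PCRE is used, so mbstring is not required.

```php
<?php
// Count UTF-8 characters using PCRE's /u mode (no mbstring needed).
function utf8Length(string $s): int
{
    $n = preg_match_all('/./us', $s);
    if ($n === false) {
        throw new InvalidArgumentException('input is not valid UTF-8');
    }
    return $n;
}

// Convert [byte offset, byte length] hits into character-based positions.
function hitsToCharPositions(string $text, array $hits): array
{
    $out = [];
    foreach ($hits as [$byteOffset, $byteLength]) {
        $word = substr($text, $byteOffset, $byteLength);
        $out[] = [
            'word'   => $word,
            'offset' => utf8Length(substr($text, 0, $byteOffset)), // chars before the hit
            'length' => utf8Length($word),
        ];
    }
    return $out;
}
```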