PHP sensitive keyword detection, strings - An approach to detecting sensitive and illegal keywords in PHP

Latest recommended article published on 2023-11-28 08:38:16

weixin_39623273


Article tags: php sensitive keyword detection

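There are already 6,000 keywords, split into 3 batches.

One batch is replaced when matched (replace), one batch sends the article to manual review when matched (censor), and the last batch blocks publication outright when matched (banned).

The table is designed as follows: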

mysql> desc tbl_censor;
+-------------+----------------------+------+-----+---------+----------------+
| Field       | Type                 | Null | Key | Default | Extra          |
+-------------+----------------------+------+-----+---------+----------------+
| id          | smallint(6) unsigned | NO   | PRI | NULL    | auto_increment |
| censortype  | smallint(6)          | NO   |     | 1       |                |
| find        | varchar(120)         | NO   | UNI |         |                |
| replacement | varchar(255)         | NO   |     |         |                |
| extra       | varchar(255)         | NO   |     |         |                |
| uptime      | int(11)              | YES  |     | NULL    |                |
| enable      | int(1)               | NO   |     | 1       |                |
+-------------+----------------------+------+-----+---------+----------------+
7 rows in set (0.01 sec)
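To make the three batches concrete, here is a minimal sketch of how rows from this table might be grouped in plain PHP. It assumes, as the listing further down suggests, that the `replacement` column holds the marker `{censor}` or `{banned}`; treating any other non-empty value as literal replacement text is my assumption, and the function name `groupCensorRows` is hypothetical.

```php
<?php
// Hypothetical helper: split keyword rows into the three batches.
// Assumption: `replacement` holds '{banned}' or '{censor}' as a marker,
// and any other non-empty value is literal replacement text.
function groupCensorRows(array $rows): array
{
    $groups = ['banned' => [], 'censor' => [], 'replace' => []];
    foreach ($rows as $row) {
        if (empty($row['enable']) || $row['replacement'] === '') {
            continue; // skip disabled rows and rows without a replacement
        }
        switch ($row['replacement']) {
            case '{banned}':
                $groups['banned'][] = $row['find'];
                break;
            case '{censor}':
                $groups['censor'][] = $row['find'];
                break;
            default:
                // keyword => replacement text
                $groups['replace'][$row['find']] = $row['replacement'];
        }
    }
    return $groups;
}

// Sample rows shaped like tbl_censor records (values are made up).
$rows = [
    ['find' => 'foo', 'replacement' => '{banned}', 'enable' => 1],
    ['find' => 'bar', 'replacement' => '{censor}', 'enable' => 1],
    ['find' => 'baz', 'replacement' => '***',      'enable' => 1],
    ['find' => 'qux', 'replacement' => '***',      'enable' => 0],
];
$groups = groupCensorRows($rows);
```

Keeping the replace batch as a keyword-to-replacement map means it can later be passed straight to a function such as `strtr()`.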

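With more than 6,000 keywords, should I loop over them with foreach and strstr, or use preg_match?

Efficiency matters: more than 100,000 articles are submitted per hour.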
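The two options in the question can be put side by side in a small sketch. It is not a benchmark and not the author's code: the function names and the `$chunkSize` parameter are assumptions of mine. The loop version uses `strpos()` rather than `strstr()`, since only a yes/no answer is needed. The regex version joins the keywords into a few alternation patterns, each escaped with `preg_quote()`.

```php
<?php
// Option 1: loop over every keyword with a plain substring search.
function findWithLoop(string $text, array $words): array
{
    $hits = [];
    foreach ($words as $word) {
        // strpos() works on bytes, which is still exact for valid UTF-8 input
        if ($word !== '' && strpos($text, $word) !== false) {
            $hits[] = $word;
        }
    }
    return $hits;
}

// Option 2: compile the keywords into a few alternation patterns.
// $chunkSize is an assumed tuning knob that keeps each pattern small.
function buildPatterns(array $words, int $chunkSize = 500): array
{
    $words = array_filter($words, function ($w) { return $w !== ''; });
    $patterns = [];
    foreach (array_chunk($words, $chunkSize) as $chunk) {
        $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $chunk);
        $patterns[] = '/' . implode('|', $quoted) . '/u';
    }
    return $patterns;
}

function findWithRegex(string $text, array $patterns): array
{
    $hits = [];
    foreach ($patterns as $pattern) {
        if (preg_match_all($pattern, $text, $m)) {
            $hits = array_merge($hits, $m[0]);
        }
    }
    return array_values(array_unique($hits));
}

$words = ['spam', 'casino', 'a.b'];
$text  = 'cheap casino spam';
$loopHits  = findWithLoop($text, $words);
$regexHits = findWithRegex($text, buildPatterns($words));
```

One difference to keep in mind: the loop reports every keyword that occurs, while an alternation pattern reports only non-overlapping matches, so when one keyword is a prefix of another only one of them is returned at a given position. With patterns cached between requests, the regex version scans the text a handful of times instead of 6,000 times, but which one is faster on real data has to be measured.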

Here is one version I just wrote:

<?php

namespace app\helpers;

use app\models\other\Censor;
use app\models\other\CensorLog;

class CensorHelper
{
    public $id;             // cache key
    public $data;           // compiled keyword data
    public $match_banned;   // banned keywords found in the text
    public $match_censor;   // review keywords found in the text

    public function __construct($id = 'censor')
    {
        $this->id = $id;
        $this->match_banned = [];
        $this->match_censor = [];
        $this->data = $this->getData();
    }

    /**
     * @description Get the regular expressions
     * @return array|mixed
     */
    public function getData()
    {
        // Try the cache first; rebuild from the database on a miss.
        $data = \Yii::$app->cache->get($this->id);
        if (empty($data)) {
            $words = Censor::find()
                ->where(['enable' => 1])
                ->andWhere(['!=', 'replacement', ''])
                ->orderBy(['replacement' => SORT_ASC, 'find' => SORT_DESC])
                ->asArray()
                ->all();
            $censor = [];
            $banned = [];
            $replace = [];
            // Sort each keyword into its batch by the marker in `replacement`.
            foreach ($words as $row) {
                switch ($row['replacement']) {
                    case '{censor}':
                        $censor[] = $row['find'];
                        break;
                    case '{banned}':
                    // ... (the listing is cut off at this point in the source)
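Because the listing stops before the keywords are applied to a text, the following is only a hypothetical sketch of how the three batches could be used once they have been collected. It is not the author's code: `checkText` and `firstMatch` are illustrative names, the result shape is invented, and the order of checks (banned first, then replacement, then review) is a design choice of mine. It assumes no keyword is an empty string.

```php
<?php
// Hypothetical continuation, not the author's code: apply the three
// batches in order of severity. All names here are illustrative.
function checkText(string $text, array $banned, array $censor, array $replace): array
{
    $hit = firstMatch($text, $banned);
    if ($hit !== null) {
        // A banned keyword blocks publication; the text is left untouched.
        return ['status' => 'banned', 'word' => $hit, 'text' => $text];
    }

    // strtr() with an array tries the longest keys first and does not
    // re-scan text it has already replaced.
    $clean = $replace ? strtr($text, $replace) : $text;

    $hit = firstMatch($clean, $censor);
    if ($hit !== null) {
        return ['status' => 'censor', 'word' => $hit, 'text' => $clean];
    }
    return ['status' => 'ok', 'word' => null, 'text' => $clean];
}

// Returns the first keyword found in $text, or null if none matches.
// $chunkSize is an assumed knob that keeps each pattern small.
function firstMatch(string $text, array $words, int $chunkSize = 500): ?string
{
    foreach (array_chunk($words, $chunkSize) as $chunk) {
        $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $chunk);
        if (preg_match('/' . implode('|', $quoted) . '/u', $text, $m)) {
            return $m[0];
        }
    }
    return null;
}

$result = checkText('buy spam now', ['casino'], ['spam'], ['buy' => '***']);
```

Here `$result['status']` is `'censor'` and `$result['text']` is `'*** spam now'`: the replace batch has been applied and the article is flagged for review. In a real deployment the compiled patterns would be cached, as the listing does with `\Yii::$app->cache`, rather than rebuilt for every article.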
