A PHP class for filtering sensitive words (no extension required)

Without further ado, here is the code:

```php
<?php
class Logic_BlackWord
{
  const APP_FORUM = 1;
  const APP_BLOG  = 2;
  const APP_VOTE  = 3;
  /**
   * Find the banned words contained in a piece of text
   * @param string $txt
   * @return array banned words grouped by type: array(type => array(word, ...))
   */
  public function getHitList($txt)
  {
    $hitList = array();
    // Check the text against the banned-word list in batches of $size ids
    $max = $this->getMax();
    if($max)
    {
      $size = 1000;
      $last = ceil($max/$size);
      for($page=1;$page<=$last;$page++)
      {
        $result = $this->getHitListByPage($txt,$page,$size);
        // array_replace() keeps numeric keys (e.g. a banned word "110");
        // array_merge() would renumber them and lose the word
        if($result) $hitList = array_replace($hitList,$result);
      }
    }
    $hitList2 = array();
    foreach($hitList as $hit=>$type)
    {
      // PHP stores numeric-string keys as ints, so cast back to string
      $hitList2[$type][] = (string)$hit;
    }
    return $hitList2;
  }
  /**
   * Largest banned-word id, cached in Redis for 300 seconds
   * @return int
   */
  private function getMax()
  {
    $redis = Rds::factory();
    $memKey = 'blackWord_max';
    $max = $redis->get($memKey);
    if($max===false)
    {
      $max = 0;
      $blackWord = new Model_BlackWord_BlackWord();
      $para = array();
      $para['field'] = "MAX(id) AS max";
      $result = $blackWord->search($para);
      if(isset($result[0]['max'])) $max = $result[0]['max'];
      $redis->setex($memKey,300,$max);
    }
    return $max;
  }
  /**
   * Find banned words in $txt using one batch of the word list
   * @param string $txt
   * @param int $page
   * @param int $size
   * @return array map of banned word => type
   */
  private function getHitListByPage($txt,$page=1,$size=1000)
  {
    $hitList = array();
    // Trie built from this batch of banned words
    $wordTree = $this->getWordTreeByPage($page,$size);
    // Drop HTML tags, then keep only ASCII letters, digits and CJK ideographs,
    // so inserted symbols or spaces (e.g. "b*a*d") cannot split a banned word
    $txt = strip_tags($txt);
    $txt = preg_replace('/[^a-zA-Z0-9\\x{4e00}-\\x{9fa5}]/iu','',$txt);
    $len = mb_strlen($txt,'UTF-8');
    for($i=0;$i<$len;$i++)
    {
      $char = mb_substr($txt,$i,1,'UTF-8');
      // Only walk the trie from positions where some banned word can start;
      // words longer than 50 characters are never matched
      if(isset($wordTree[$char]))
      {
        $result = $this->getHitListByTree(mb_substr($txt,$i,50,'UTF-8'),$wordTree);
        if($result)
        {
          foreach($result as $hit=>$type)
          {
            $hitList[$hit] = $type;
          }
        }
      }
    }
    return $hitList;
  }
  /**
   * Walk the trie from the first character of $txt and collect every
   * banned word that is a prefix of $txt
   * @param string $txt
   * @param array $wordTree
   * @return array map of banned word => type
   */
  private function getHitListByTree($txt,&$wordTree)
  {
    $len = mb_strlen($txt,'UTF-8');
    $point = & $wordTree;
    $hit = '';
    $hitList = array();
    for($i=0;$i<$len;$i++)
    {
      $char = mb_substr($txt,$i,1,'UTF-8');
      if(isset($point[$char]))
      {
        $hit .= $char;
        $point = & $point[$char];
        if(isset($point['type']))// a complete banned word ends here
        {
          $hitList[$hit] = $point['type'];
        }
      }
      else
      {
        break;
      }
    }
    return $hitList;
  }
  /**
   * Build (or load from Redis) the trie for one batch of banned words
   * @param int $page
   * @param int $size
   * @return array nested trie: character => child node; a node that ends
   *               a word also carries a 'type' key
   */
  private function getWordTreeByPage($page=1,$size=1000)
  {
    $redis = Rds::factory();
    $memKey = 'blackWord_tree_'.$page.'_'.$size;
    $wordTree = $redis->get($memKey);
    if($wordTree===false)
    {
      $wordTree = array();
      $blackWord = new Model_BlackWord_BlackWord();
      $start = ($page-1)*$size;
      $end = $start + $size;
      $para = array();
      // Only active words (status=1) whose id falls in (start, end]
      $para['where'] = "status=1 AND id>".$start." AND id<=".$end;
      $result = $blackWord->search($para);
      if($result)
      {
        foreach($result as $value)
        {
          if($value['word'])
          {
            // Split the word into UTF-8 characters
            $value['word'] = preg_split('/(?<!^)(?!$)/u',$value['word']);
            $point = & $wordTree;
            foreach($value['word'] as $char)
            {
              $point = & $point[$char];
            }
            $point['type'] = $value['type'];
          }
        }
        // Drop the last reference into $wordTree
        unset($point);
      }
      // Assumes the Rds wrapper serializes arrays; plain phpredis cannot
      // store an array as-is unless Redis::OPT_SERIALIZER is set
      $redis->setex($memKey,300,$wordTree);
    }
    return $wordTree;
  }
}
```
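The class boils down to two steps: build a character trie out of nested PHP arrays, then start a trie walk at every character of the normalized text. Below is a minimal, self-contained sketch of just those two steps, assuming the word list is already in memory as a `word => type` array. Redis caching, the database model and the id batching are left out, and `buildWordTree()` / `findHits()` are hypothetical names, not part of the original code.

```php
<?php
// Hypothetical helper: build a nested-array trie from word => type pairs,
// in the same shape getWordTreeByPage() produces.
function buildWordTree(array $words)
{
    $tree = array();
    foreach ($words as $word => $type) {
        $word = (string)$word; // numeric words arrive as int keys
        if ($word === '') {
            continue;
        }
        $point = &$tree;
        foreach (preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY) as $char) {
            if (!isset($point[$char])) {
                $point[$char] = array();
            }
            $point = &$point[$char];
        }
        $point['type'] = $type; // marks the end of a banned word
        unset($point);          // break the reference before the next word
    }
    return $tree;
}

// Hypothetical helper: normalize the text the way getHitListByPage() does,
// then walk the trie from every character position.
function findHits($txt, array $tree, $maxLen = 50)
{
    $txt = strip_tags($txt);
    $txt = preg_replace('/[^a-zA-Z0-9\x{4e00}-\x{9fa5}]/u', '', $txt);
    $chars = preg_split('//u', $txt, -1, PREG_SPLIT_NO_EMPTY);
    $hits = array();
    $n = count($chars);
    for ($i = 0; $i < $n; $i++) {
        $node = $tree;
        $hit = '';
        for ($j = $i; $j < $n && $j < $i + $maxLen; $j++) {
            if (!isset($node[$chars[$j]])) {
                break; // no banned word continues with this character
            }
            $hit .= $chars[$j];
            $node = $node[$chars[$j]];
            if (isset($node['type'])) {
                $hits[$hit] = $node['type']; // keep every word that ends here
            }
        }
    }
    return $hits;
}

$tree = buildWordTree(array('敏感' => 1, '敏感词' => 2, 'bad' => 3));
print_r(findHits('<b>这是敏*感词</b> and bad', $tree));
// Array
// (
//     [敏感] => 1
//     [敏感词] => 2
//     [bad] => 3
// )
```

The sample text normalizes to `这是敏感词andbad`, so both `敏感` and the longer `敏感词` are reported along with `bad`; overlapping matches are all kept, as in the original class. Splitting the text into a character array once with `preg_split('//u', ...)` avoids calling `mb_substr()` at every position, which for UTF-8 generally has to walk the string from the start each time.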

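`getHitList()` never builds a single trie for the whole table: it reads `MAX(id)`, divides the id space into pages of `$size` (1000) ids, and builds and caches one trie per page. The sketch below reproduces only that arithmetic so it can be checked in isolation; `pageRanges()` is a hypothetical helper that returns the `where` clause each page would use.

```php
<?php
// Hypothetical helper mirroring the batching arithmetic in getHitList()
// and getWordTreeByPage(): page p covers ids in the range (start, end].
function pageRanges($max, $size = 1000)
{
    $ranges = array();
    $last = (int)ceil($max / $size);
    for ($page = 1; $page <= $last; $page++) {
        $start = ($page - 1) * $size;
        $end = $start + $size;
        $ranges[$page] = "status=1 AND id>" . $start . " AND id<=" . $end;
    }
    return $ranges;
}

foreach (pageRanges(2500) as $page => $where) {
    echo $page, ': ', $where, PHP_EOL;
}
// 1: status=1 AND id>0 AND id<=1000
// 2: status=1 AND id>1000 AND id<=2000
// 3: status=1 AND id>2000 AND id<=3000
```

Because each page covers a fixed id range, gaps left by deleted or inactive rows only make a page smaller; no page is skipped. Since both `blackWord_max` and the `blackWord_tree_*` keys expire after 300 seconds, a newly added word can take up to about five minutes to be picked up.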
Reposted from: http://www.thinkphp.cn/extend/1121.html
