Checking for Prohibited Content with PHP

黑夜路人

Article link: https://blog.csdn.net/heiyeshuwu/article/details/586976


* Author: heiyeluren
* Date: 2006-01-23
* Blog: http://blog.csdn.net/heiyeshuwu

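【Approach】
The idea is fairly simple-minded: a list file stores the prohibited keywords, one per line. The program reads the keywords and matches them against the user's input with regular expressions; if an exact or fuzzy match finds a prohibited keyword, the user is notified. There are two kinds of keyword list. Ordinary keywords are rejected only when the whole input matches, so they use exact matching. Keywords that must never appear anywhere in the input use fuzzy matching.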
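
The difference between the two lists can be illustrated with a minimal sketch. The keyword arrays and the function name `classify_input` are hypothetical, and plain string functions stand in here for the regular expressions used in the article's own code.

```php
<?php
// Hypothetical in-memory lists; the article reads them from text files.
$common_list  = array('admin', 'root');   // rejected only on a whole-string match
$signify_list = array('badword');         // rejected wherever they appear

// Returns 'exact', 'fuzzy', or 'clean' for the given input.
function classify_input($input, array $common_list, array $signify_list)
{
    $input = trim($input);
    foreach ($common_list as $word) {
        // Exact match: the whole input equals the keyword (ASCII case-insensitive).
        if (strcasecmp($input, $word) === 0) {
            return 'exact';
        }
    }
    foreach ($signify_list as $word) {
        // Fuzzy match: the keyword occurs anywhere inside the input.
        if ($word !== '' && stripos($input, $word) !== false) {
            return 'fuzzy';
        }
    }
    return 'clean';
}

echo classify_input('Admin', $common_list, $signify_list), "\n";
echo classify_input('administrator', $common_list, $signify_list), "\n";
echo classify_input('xxBADWORDxx', $common_list, $signify_list), "\n";
```

Under these assumptions, `administrator` is accepted even though it contains `admin`, because ordinary keywords only count when they make up the entire input.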

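【Implementation】
Three functions are used: an exact-match function, a fuzzy-match function, and an interface function that calls the two. There is nothing sophisticated here; it is a simple regex-based implementation. I originally considered having PHP allocate a block of memory and keep the keyword lists there for long-running use, but testing showed that with small lists, reading the files directly causes no real problems and is reasonably efficient. When searching the keyword list, it would be best to use something like binary search rather than a linear scan, so that prohibited content is found as quickly as possible. Binary search needs a sorted list, though, and sorting the keywords is troublesome: when I tried it, the sort results for UTF-8 and GBK text were very odd. Anyone interested can look into it.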
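
One possible way to approach the binary-search idea is sketched below. It is not the article's implementation: the function names `build_sorted_list` and `binary_search_keyword` are hypothetical, and the sketch assumes a byte-wise ordering via `strcmp`, which is well defined for any encoding as long as the sort and the search use the same comparison. Binary search only helps the exact-match list; a fuzzy (substring) check still has to look at every keyword.

```php
<?php
// Sort byte-wise so the order does not depend on locale or encoding.
function build_sorted_list(array $keywords)
{
    $keywords = array_map('trim', $keywords);
    $keywords = array_values(array_unique(array_filter($keywords, 'strlen')));
    usort($keywords, 'strcmp');
    return $keywords;
}

// Binary search using the same byte-wise comparison as the sort.
function binary_search_keyword(array $sorted, $needle)
{
    $low  = 0;
    $high = count($sorted) - 1;
    while ($low <= $high) {
        $mid = (int) (($low + $high) / 2);
        $cmp = strcmp($sorted[$mid], $needle);
        if ($cmp === 0) {
            return true;            // found
        }
        if ($cmp < 0) {
            $low = $mid + 1;        // needle sorts after the middle element
        } else {
            $high = $mid - 1;       // needle sorts before the middle element
        }
    }
    return false;
}

// Hypothetical keywords, as they might come from file() with line endings.
$sorted = build_sorted_list(array("root\n", "admin\n", "guest\n", "admin\n"));
var_dump(binary_search_keyword($sorted, 'admin'));
var_dump(binary_search_keyword($sorted, 'adm'));
```

This comparison is case-sensitive. To keep the case-insensitive behaviour of the article's regexes, one option is to lowercase both the list and the input before sorting and searching.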
Tip: two groups of PHP functions, "Semaphore, Shared Memory and IPC Functions" and "Shared Memory Functions", can allocate shared memory. They are worth a look if you are interested.
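
As a rough illustration of the shared-memory idea, the sketch below keeps a keyword list in a segment using the shmop functions (the "Shared Memory Functions" group). It assumes the shmop extension is enabled and a Unix-like system; the key value and the three function names are hypothetical, and error handling is kept to a minimum.

```php
<?php
// Hypothetical key; ftok() can derive one from a file path instead.
define('KEYWORD_SHM_KEY', 0x4b574c31);

// Store the keywords, one per line, in a shared memory segment.
function shm_store_keywords($key, array $keywords)
{
    $data = implode("\n", $keywords);
    if ($data === '') {
        return false;                       // a segment needs a size above zero
    }
    // Remove a stale segment so the new one can have a different size.
    $old = @shmop_open($key, 'w', 0, 0);
    if ($old !== false) {
        shmop_delete($old);
        unset($old);
    }
    $shm = @shmop_open($key, 'c', 0644, strlen($data));
    if ($shm === false) {
        return false;
    }
    return shmop_write($shm, $data, 0) === strlen($data);
}

// Read the keywords back; returns false if the segment does not exist.
function shm_load_keywords($key)
{
    $shm = @shmop_open($key, 'a', 0, 0);
    if ($shm === false) {
        return false;
    }
    $data = shmop_read($shm, 0, shmop_size($shm));
    // Some systems round the segment size up, leaving trailing NUL bytes.
    return explode("\n", rtrim($data, "\0"));
}

// Mark the segment for deletion.
function shm_delete_keywords($key)
{
    $shm = @shmop_open($key, 'w', 0, 0);
    return $shm !== false && shmop_delete($shm);
}

if (function_exists('shmop_open')) {
    shm_store_keywords(KEYWORD_SHM_KEY, array('admin', 'root'));
    print_r(shm_load_keywords(KEYWORD_SHM_KEY));
    shm_delete_keywords(KEYWORD_SHM_KEY);
}
```

As the article notes, for small lists reading the files directly was already fast enough, so this kind of caching is probably only worth the extra complexity for large lists or heavy traffic.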

【Code】

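//======================================================================
//
// Function: lib_lawless_string_filter($string, $match_type=1)
// Purpose:  filter prohibited content
// Parameters:
//   $string      the string to check
//   $match_type  match type: 1 = exact match, 2 = fuzzy match (default 1)
//
// Returns: true if prohibited content is found, false otherwise
// Notes:   the keyword lists are stored in txt files, one list of ordinary
//          keywords and one list of serious keywords
// Author:  heiyeluren
// Date:    2006-1-18
//
//======================================================================
function lib_lawless_string_filter($string, $match_type=1)
{
    // An empty string has nothing to check, so it is treated as clean
    $string = trim($string);
    if ($string === '')
    {
        return false;
    }
    // File names of the ordinary and the serious keyword lists
    $common_file = "common_list.txt"; // ordinary keyword list
    $signify_file = "signify_list.txt"; // serious keyword list

    // If either list file is missing, return false; otherwise read both files into arrays
    if (!file_exists($common_file) || !file_exists($signify_file))
    {
        return false;
    }
    // Drop line endings and blank lines: a blank keyword would match every input
    $common_list = file($common_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $signify_list = file($signify_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($common_list === false || $signify_list === false)
    {
        return false;
    }

    // Stays false when $match_type is neither 1 nor 2
    $is_lawless = false;

    // Exact match
    if ($match_type == 1)
    {
        $is_lawless = exact_match($string, $common_list);
    }

    // Fuzzy match
    if ($match_type == 2)
    {
        $is_lawless = blur_match($string, $common_list, $signify_list);
    }

    // A non-empty result array means prohibited content was found
    if (is_array($is_lawless) && !empty($is_lawless))
    {
        return true;
    }
    else
    {
        return false;
    }
}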

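//---------------------
// Exact match, used by the filter
//---------------------
function exact_match($string, $common_list)
{
    $string = trim($string);
    $string = (string) lib_replace_end_tag($string);
    $blist = array();

    // Escape regex metacharacters so the user input is matched literally
    $pattern = '/^' . preg_quote($string, '/') . '$/i';

    // Scan the ordinary keyword list
    foreach($common_list as $block)
    {
        $block = trim($block);
        if ($block !== '' && preg_match($pattern, $block))
        {
            $blist[] = $block;
        }
    }
    // Return the matched keywords, if any
    if (!empty($blist))
    {
        return array_unique($blist);
    }

    return false;
}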

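//----------------------
// Fuzzy match, used by the filter
//----------------------
function blur_match($string, $common_list, $signify_list)
{
    $string = trim($string);
    $string = (string) lib_replace_end_tag($string);
    $blist = array();

    // Scan the ordinary keyword list (the whole input must equal the keyword)
    $pattern = '/^' . preg_quote($string, '/') . '$/i';
    foreach($common_list as $block)
    {
        $block = trim($block);
        if ($block !== '' && preg_match($pattern, $block))
        {
            $blist[] = $block;
        }
    }
    // Scan the serious keyword list (the keyword may appear anywhere in the input)
    foreach($signify_list as $block)
    {
        $block = trim($block);
        // Skip blank lines (an empty pattern matches everything) and
        // escape the keyword so it is matched literally
        if ($block !== '' && preg_match('/' . preg_quote($block, '/') . '/i', $string))
        {
            $blist[] = $block;
        }
    }
    // Return the matched keywords, if any
    if (!empty($blist))
    {
        return array_unique($blist);
    }

    return false;
}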

//--------------------------
// Strip the slash from HTML closing tags, used by the filter
//--------------------------
function lib_replace_end_tag($string)
{
    if ((string) $string === '') return '';
    // Escape HTML special characters, then remove every slash
    $string = htmlspecialchars($string);
    $string = str_replace('/', "", $string);

    return $string;

    // HTML tags that could be used for extended filtering
    /*
    $tags = array("/html", "/head", "/body", "/div", "/span", "/DOCTYPE", "/title", "/link", "/meta", "/style", "/p", "/h1", "/h2", "/h3", "/h4", "/h5", "/h6", "/strong", "/em", "/abbr", "/acronym", "/address", "/bdo", "/blockquote", "/cite", "/q", "/code", "/ins", "/del", "/dfn", "/kbd", "/pre", "/samp", "/var", "/br", "/a", "/img", "/area", "/map", "/object", "/param", "/ul", "/ol", "/li", "/dl", "/dt", "/dd", "/table", "/tr", "/td", "/th", "/tbody", "/thead", "/tfoot", "/col", "/colgroup", "/caption", "/form", "/input", "/textarea", "/select", "/option", "/optgroup", "/button", "/label", "/fieldset", "/legend", "/script", "/noscript", "/b", "/i", "/tt", "/sub", "/sup", "/big", "/small", "/hr" );
    */
}
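
When a pattern is built from arbitrary text, as in the matching functions above, regex metacharacters in that text need escaping; otherwise input such as `(` makes the pattern invalid and `.` matches any character. The following is a small sketch of the escaping step with `preg_quote`. The helper name `equals_ignore_case` is hypothetical and is not part of the article's code.

```php
<?php
// Hypothetical helper: case-insensitive whole-string comparison through a
// regex whose pattern is built from arbitrary text.
function equals_ignore_case($text, $candidate)
{
    // The second argument also escapes the "/" delimiter.
    $pattern = '/^' . preg_quote($text, '/') . '$/i';
    return preg_match($pattern, $candidate) === 1;
}

var_dump(equals_ignore_case('a.c', 'abc'));        // "." is treated literally
var_dump(equals_ignore_case('a.c', 'A.C'));
var_dump(equals_ignore_case('1/2 (x)', '1/2 (X)'));
```

For a plain whole-string comparison, `strcasecmp` would do the same job without a regex; the regex form is shown only because it is what the article's code uses.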

Written by heiyeluren
2006-01-23