Sensitive Word Filtering

Latest recommended article published on 2024-02-18 12:42:32

趴在树上写代码的猫

Category: C/C++

Article link: https://blog.csdn.net/kinly_jiang/article/details/80801071

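Sensitive-word filtering and anti-addiction checks are basic features that almost every production system ends up needing. There is plenty of code for them online, using many different approaches, but I found much of it convoluted to read.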

So I wrote a version that I find clean and clear, with a fairly simple structure, and I am recording it here.

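The node structure is recursive: censor (to inspect or check) walks it recursively, while insert descends through it in a loop. Performance is acceptable: in a Windows release build with a sample of 5700 entries, running censor on a string (censor_str) of length 1000 takes roughly 0.2 ms.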
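A figure like the one above could be reproduced with a small timing helper. The following is only a sketch: average_ms is a hypothetical name, not part of the original code, and it measures mean wall-clock time per call with std::chrono::steady_clock.

```cpp
#include <chrono>

// Hypothetical helper (not from the original post): runs `fn` `runs` times
// and returns the mean wall-clock time per call in milliseconds.
template <typename Fn>
double average_ms(Fn&& fn, int runs) {
    if (runs <= 0) return 0.0;   // nothing to measure
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

With the mgc class from this post, a call might look like average_ms([&] { filter.censor(text); }, 1000), where filter and text are placeholders for a loaded word list and a 1000-character input. In an optimized build, the result of censor should be used somewhere so the compiler does not discard the call.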

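It also handles the case where the word list contains ab and abcd and the input is abc: the match on ab is still found. (The few implementations I downloaded and tried did not seem to support this, and I think it is necessary.)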
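The ab / abcd / abc case comes down to remembering the last position at which a complete word ended while still trying to extend the match. Below is a minimal iterative sketch of that idea. TrieNode, add_word, longest_match and match_length_for are hypothetical names for illustration, and the children are held through std::unique_ptr, which differs from the layout used in this post.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical names for illustration; not the author's types.
struct TrieNode {
    std::unordered_map<char, std::unique_ptr<TrieNode>> children;
    bool is_end = false;   // a complete dictionary word ends at this node
};

// Adds one word to the trie rooted at `root`; empty words are ignored.
void add_word(TrieNode& root, const std::string& word) {
    if (word.empty()) return;
    TrieNode* cur = &root;
    for (char c : word) {
        std::unique_ptr<TrieNode>& child = cur->children[c];
        if (!child) child.reset(new TrieNode());
        cur = child.get();
    }
    cur->is_end = true;
}

// Returns the length of the longest dictionary word that starts at
// text[start], or 0 if no word starts there.
std::size_t longest_match(const TrieNode& root, const std::string& text,
                          std::size_t start) {
    const TrieNode* cur = &root;
    std::size_t matched = 0;     // characters matched so far
    std::size_t last_full = 0;   // length of the last complete word seen
    for (std::size_t i = start; i < text.size(); ++i) {
        auto it = cur->children.find(text[i]);
        if (it == cur->children.end()) break;
        cur = it->second.get();
        ++matched;
        if (cur->is_end) last_full = matched;
    }
    return last_full;
}

// Builds the word list {"ab", "abcd"} and measures the match at text[0].
std::size_t match_length_for(const std::string& text) {
    TrieNode root;
    add_word(root, "ab");
    add_word(root, "abcd");
    return longest_match(root, text, 0);
}
```

For the input abc the walk reaches c without completing abcd, but last_full was already set to 2 when ab ended, so the shorter word is still reported.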

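#include <cstddef>
#include <string>
#include <unordered_map>

struct mgc_node
{
    // Child nodes, keyed by the next char (byte) of a word.
    // Note: the standard does not guarantee that std::unordered_map accepts
    // an incomplete value type; this relies on the implementation allowing it.
    std::unordered_map<char, mgc_node> _node;
    bool _end;   // true if a complete word ends at this node

    mgc_node()
        : _end(false)
    {
    }

    // Walks the trie starting at source[pos]. cnt counts the characters
    // matched so far; last_cnt records the length of the longest complete word.
    void censor(const std::string& source, std::size_t& pos, std::size_t& cnt, std::size_t& last_cnt) const
    {
        if (pos >= source.size())
            return;

        auto subit = _node.find(source[pos]);
        if (subit == _node.end())
            return;

        cnt += 1;   // matched one more character

        if (subit->second._end)   // a complete word ends here
            last_cnt = cnt;

        pos += 1;

        subit->second.censor(source, pos, cnt, last_cnt);
    }
};

class mgc
{
private:
    mgc_node _root;
public:
    // Adds one word to the word list; an empty word is ignored.
    bool insert(const std::string& source)
    {
        if (source.empty())
            return true;

        mgc_node* _curr = &_root;
        for (char ch : source)
            _curr = &_curr->_node[ch];   // creates the child node if missing

        _curr->_end = true;   // mark the end of a complete word
        return true;
    }

    // Returns a copy of source with each matched word replaced by a single "*".
    std::string censor(const std::string& source) const
    {
        std::string result;

        const std::size_t length = source.size();
        for (std::size_t i = 0; i < length; )
        {
            std::size_t pos = i;        // used by the recursion: where matching starts
            std::size_t cnt = 0;        // used by the recursion: characters matched so far
            std::size_t last_cnt = 0;   // set by the recursion: length of the longest complete word
            _root.censor(source, pos, cnt, last_cnt);

            if (last_cnt > 0)
            {
                result += '*';
                i += last_cnt;
            }
            else
            {
                result += source[i];
                i += 1;
            }
        }
        return result;
    }
};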
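One way to sanity-check the output of mgc::censor is to compare it against a brute-force reference that needs no trie. The sketch below is not the approach used in this post: mask_naive is a hypothetical helper that tries every word at every position and keeps the longest one, which is intended to give the same result, only more slowly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reference implementation (not from the original post):
// at each position, try every word and keep the longest one that matches.
std::string mask_naive(const std::vector<std::string>& words,
                       const std::string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size();) {
        std::size_t best = 0;   // length of the longest word matching at i
        for (const std::string& w : words) {
            if (!w.empty() && w.size() > best &&
                text.compare(i, w.size(), w) == 0) {
                best = w.size();
            }
        }
        if (best > 0) {
            out += '*';   // one '*' per matched word, as in mgc::censor
            i += best;
        } else {
            out += text[i];
            ++i;
        }
    }
    return out;
}
```

With the word list {"ab", "abcd"}, the input abc becomes *c and the input xabcdy becomes x*y. Feeding the same word list and inputs to both implementations and comparing the strings is a cheap regression test.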