Chinese and Pinyin Analysis in Elasticsearch

1、IK analyzer (Git)

1.1、Installing the IK analysis plugin

/usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.2.3/elasticsearch-analysis-ik-6.2.3.zip
sudo systemctl stop elasticsearch.service
sudo systemctl start elasticsearch.service

PS: in cluster mode, the plugin must be installed on every node.
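
Since every node needs the plugin, it can help to check the whole cluster after installing. Below is a minimal Python sketch, assuming the plugin shows up under the component name "analysis-ik" in GET /_cat/plugins and that the cluster is reachable without authentication. The function names and the base URL are illustrative, not part of Elasticsearch.

```python
import json
from urllib.request import urlopen


def nodes_missing_plugin(node_rows, plugin_rows, component):
    """Return the names of nodes that do not list `component` as a plugin.

    node_rows:   parsed JSON from GET /_cat/nodes?format=json&h=name
    plugin_rows: parsed JSON from GET /_cat/plugins?format=json
    """
    installed_on = {row["name"] for row in plugin_rows if row["component"] == component}
    return sorted(row["name"] for row in node_rows if row["name"] not in installed_on)


def check_cluster(base_url, component="analysis-ik"):
    """Query a live cluster; base_url (e.g. http://localhost:9200) is an assumption."""
    with urlopen(base_url + "/_cat/nodes?format=json&h=name") as resp:
        node_rows = json.load(resp)
    with urlopen(base_url + "/_cat/plugins?format=json") as resp:
        plugin_rows = json.load(resp)
    return nodes_missing_plugin(node_rows, plugin_rows, component)
```

Calling check_cluster("http://localhost:9200") should return an empty list once every node has the plugin. Passing component="analysis-pinyin" would run the same check for the pinyin plugin, assuming that is the name it registers under.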

1.2、Local dictionary configuration

vi /etc/elasticsearch/analysis-ik/IKAnalyzer.cfg.xml

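Set "ext_dict". For example, create a directory named custom under /etc/elasticsearch/analysis-ik/ and copy fresh.dic into it. The path in the entry is relative to the directory that contains IKAnalyzer.cfg.xml.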

<entry key="ext_dict">custom/fresh.dic</entry>

Then restart Elasticsearch.

PS: in cluster mode, every node needs the same configuration.
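
An IK dictionary file is plain UTF-8 text with one term per line. The following Python sketch writes such a file; the helper name and the sample words are made up for illustration, and the target path is whatever "ext_dict" points at.

```python
from pathlib import Path


def write_ik_dictionary(path, words):
    """Write words to a dictionary file: UTF-8, one term per line.

    Blank entries and duplicates are dropped; the original order is kept.
    Returns the number of terms written.
    """
    seen = set()
    lines = []
    for word in words:
        term = word.strip()
        if term and term not in seen:
            seen.add(term)
            lines.append(term)
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # e.g. creates custom/
    target.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

For example, write_ik_dictionary("/etc/elasticsearch/analysis-ik/custom/fresh.dic", words) would produce the file referenced by the entry above, provided the process has write access to that directory.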

1.3、Remote dictionary configuration (hot reload)

vi /etc/elasticsearch/analysis-ik/IKAnalyzer.cfg.xml

Set "remote_ext_dict":

<entry key="remote_ext_dict">http://ip:port/products/freshdictrequest</entry>

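With a remote extension dictionary configured, hot words can be updated without restarting Elasticsearch. The plugin polls the URL periodically and reloads the dictionary whenever the Last-Modified or ETag response header changes. The response body must be UTF-8 text with one term per line.

Implementation: query the word store for hot words added in the last minute; if there are any, change the ETag response header in OpenResty (ETag = os.time()).

PS: in cluster mode, every node needs the same configuration.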
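
To make the header mechanism concrete, here is a minimal Python sketch of a dictionary endpoint. It is not the OpenResty setup described above: it uses Python's standard http.server instead, and it derives the ETag from a hash of the content rather than from the current time. HOT_WORDS, LAST_CHANGED and make_server are hypothetical names; a real service would read the words from a database.

```python
import hashlib
from email.utils import formatdate
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory word store; a real service would query a database.
HOT_WORDS = ["车厘子", "牛油果"]
LAST_CHANGED = 0.0  # Unix timestamp of the last change to HOT_WORDS


def dictionary_body():
    """Dictionary payload: UTF-8, one term per line."""
    return "".join(word + "\n" for word in HOT_WORDS).encode("utf-8")


class DictHandler(BaseHTTPRequestHandler):
    def _send_headers(self, body):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        # ETag is a content hash, so it changes only when the word list changes.
        self.send_header("ETag", '"%s"' % hashlib.sha256(body).hexdigest())
        self.send_header("Last-Modified", formatdate(LAST_CHANGED, usegmt=True))
        self.end_headers()

    def do_HEAD(self):
        # Header-only response, used to detect whether anything changed.
        self._send_headers(dictionary_body())

    def do_GET(self):
        body = dictionary_body()
        self._send_headers(body)
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Bind to 127.0.0.1; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), DictHandler)
```

Running make_server(8000).serve_forever() and pointing "remote_ext_dict" at that address would be one way to try it out locally. A content hash has the side effect that the ETag stays the same when nothing changed, so an unchanged word list does not trigger a reload.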

This approach has one pitfall:

After setting this in IKAnalyzer.cfg.xml and restarting ES, the following exception is thrown:

java.security.AccessControlException: access denied (java.net.SocketPermission ip:port connect,resolve)

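In that case, edit

/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64/jre/lib/security/java.policy

and add the following permissions for the dictionary host inside a grant { ... }; block (the exception only asks for connect and resolve; accept and listen are broader than an outbound request needs):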

permission java.net.SocketPermission "ip:port","accept";

permission java.net.SocketPermission "ip:port","listen";

permission java.net.SocketPermission "ip:port","resolve";

permission java.net.SocketPermission "ip:port","connect";

2、Pinyin analyzer (Git)

/usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v6.2.3/elasticsearch-analysis-pinyin-6.2.3.zip
sudo systemctl stop elasticsearch.service
sudo systemctl start elasticsearch.service
PS: in cluster mode, the plugin must be installed on every node.

3、Combined IK + Pinyin analyzer (for reference only; corrections welcome!)

PUT test
{
    "settings": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": [
                        "my_pinyin",
                        "word_delimiter"
                    ]
                }
            },
            "filter": {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "keep_separate_first_letter" : false,
                    "keep_full_pinyin" : true,
                    "keep_original" : false,
                    "limit_first_letter_length" : 10,
                    "lowercase" : true,
                    "remove_duplicated_term" : true
                }
            }
        }
    }
}
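
To see what the analyzer produces, the _analyze API can be called against the new index. The Python sketch below builds and sends such a request. The helper names and the base URL are assumptions, and it assumes the test index above was created successfully on a cluster that needs no authentication.

```python
import json
from urllib.request import Request, urlopen


def build_analyze_request(base_url, index, analyzer, text):
    """Build a POST /<index>/_analyze request without sending it."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return Request(
        "%s/%s/_analyze" % (base_url.rstrip("/"), index),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(base_url, index, analyzer, text):
    """Send the request and return the token strings from the response."""
    with urlopen(build_analyze_request(base_url, index, analyzer, text)) as resp:
        return [token["token"] for token in json.load(resp)["tokens"]]
```

For example, analyze("http://localhost:9200", "test", "ik_pinyin_analyzer", "刘德华") returns the list of tokens emitted for that text. This makes it easy to compare the effect of options such as keep_full_pinyin or keep_original before indexing real data.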
