Reposted from: https://blog.csdn.net/chengyuqiang/article/details/78991570 (Elasticsearch 6.3.0).
Plugin Installation
Offline installation
Download the release package from https://github.com/medcl/elasticsearch-analysis-ik/releases.
Go to the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins directory and create an ik directory.
Unzip the downloaded package into F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins\ik and restart Elasticsearch.
Online installation
Go to the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\bin\ directory.
In a command prompt window, run:
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.0/elasticsearch-analysis-ik-6.3.0.zip
Note: with the online installation, the IK plugin's configuration files live under the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\config directory.
Testing the IK Chinese Analyzer
(1) ik_smart
GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "安徽省长江流域"
}
Response:
{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "长江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
(2) ik_max_word
GET _analyze?pretty
{
  "analyzer": "ik_max_word",
  "text": "安徽省长江流域"
}
Response:
{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "安徽",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "省长",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "长江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "长江",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "江流",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "流域",
      "start_offset": 5,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 6
    }
  ]
}
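The two granularities above can be sketched with a toy dictionary. This is a simplified illustration, not IK's real algorithm (IK uses dictionary tries plus disambiguation rules), and the small DICT set below is an assumption for the demo: ik_max_word-style output emits every dictionary word found anywhere in the text, while ik_smart-style output behaves like a greedy longest-match pass.

```python
# Toy model of the two IK granularities (illustration only, not IK's code).
# DICT is a made-up mini dictionary covering the examples in this article.
DICT = {"安徽省", "安徽", "省长", "长江流域", "长江", "江流", "流域", "王者", "荣耀"}

def max_word_style(text):
    """ik_max_word-like: emit every dictionary word found in the text."""
    hits = []
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            if text[i:j] in DICT:
                hits.append((text[i:j], i, j))  # (token, start_offset, end_offset)
    return hits

def smart_style(text):
    """ik_smart-like: greedy longest match from left to right."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in DICT:
                tokens.append((text[i:j], i, j))
                i = j
                break
        else:
            i += 1  # no dictionary word starts here; skip one character
    return tokens

print(smart_style("安徽省长江流域"))     # the two coarse-grained tokens
print(max_word_style("安徽省长江流域"))  # all seven dictionary hits
```

The offsets line up with the start_offset/end_offset fields in the responses above; only the ordering of the fine-grained hits differs (IK happens to list a longer word before its sub-words at the same start).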
(3) Segmentation of a new word
GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "王者荣耀"
}
Response:
{
  "tokens": [
    {
      "token": "王者",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "荣耀",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
Extending the Dictionary
step1. Go to the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins\ik\config directory and create a custom folder.
step2. In F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins\ik\config\custom, create a file named my_word.dic with the content below. The file encoding must be UTF-8 without a BOM; the author was stuck on this for quite a while.
王者荣耀
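Creating the file from a script sidesteps the encoding trap: Python's "utf-8" codec writes no BOM, while "utf-8-sig" would add one. A minimal sketch; the write_dic helper and the relative path are made up for the demo:

```python
# Write a custom IK dictionary as UTF-8 *without* a BOM, one word per line.
# A BOM (bytes EF BB BF) at the start of the file is the encoding trap
# described above, which is why the article insists on UTF-8 without BOM.
def write_dic(path, words):
    with open(path, "w", encoding="utf-8", newline="\n") as f:  # not "utf-8-sig"
        f.write("\n".join(words) + "\n")

write_dic("my_word.dic", ["王者荣耀"])

# Verify: the file must not start with a UTF-8 BOM.
raw = open("my_word.dic", "rb").read()
assert not raw.startswith(b"\xef\xbb\xbf")
```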
step3. Edit the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins\ik\config\IKAnalyzer.cfg.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <!-- Your own extension dictionaries go here -->
  <entry key="ext_dict">custom/my_word.dic</entry>
  <!-- Your own extension stopword dictionaries go here -->
  <entry key="ext_stopwords"></entry>
  <!-- Remote extension dictionaries go here -->
  <!-- <entry key="remote_ext_dict">words_location</entry> -->
  <!-- Remote extension stopword dictionaries go here -->
  <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
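As a sanity check on the edit, the ext_dict entry can be read back with a standard XML parser (IK resolves the relative value against the directory containing IKAnalyzer.cfg.xml). A small sketch using an inline copy of the file, comments trimmed:

```python
import xml.etree.ElementTree as ET

# Inline copy of the IKAnalyzer.cfg.xml from step3 (comments trimmed).
CFG = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <entry key="ext_dict">custom/my_word.dic</entry>
  <entry key="ext_stopwords"></entry>
</properties>"""

root = ET.fromstring(CFG)
# Collect every <entry> into a dict keyed by its "key" attribute.
entries = {e.get("key"): (e.text or "") for e in root.iter("entry")}
print(entries["ext_dict"])  # custom/my_word.dic
```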
step4. Restart Elasticsearch (and Kibana).
If the startup log prints the custom dictionary being loaded, the extension took effect.
step5. Test the segmentation:
GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "王者荣耀"
}
Response:
{
  "tokens": [
    {
      "token": "王者荣耀",
      "start_offset": 0,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 0
    }
  ]
}