IK installation
Note: the IK analyzer plugin version must exactly match your Elasticsearch version.

1. Download the source from https://github.com/medcl/elasticsearch-analysis-ik/tree/6.x
2. Unzip it, enter the elasticsearch-analysis-ik-6.x directory, and build the plugin with Maven from the command line:
   - mvn clean
   - mvn compile
   - mvn package
3. In elasticsearch-analysis-ik/target/releases/, find the zip matching your Elasticsearch version and upload it to your Linux server.
4. Unzip the uploaded zip.
5. Move the extracted folder into the Elasticsearch plugins directory.
6. Restart Elasticsearch.
   (Screenshot: installation succeeded; the IK analyzer was loaded.)
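The steps above can be sketched as a shell session. This is a hedged sketch: ES_HOME and ES_VERSION are assumptions you must adjust to your own environment, and the plugin version must match Elasticsearch exactly.

```shell
# Build the IK plugin from source and install it (sketch; adjust paths/versions).
ES_HOME=/opt/elasticsearch   # assumption: your Elasticsearch install directory
ES_VERSION=6.8.0             # assumption: must equal your Elasticsearch version

cd elasticsearch-analysis-ik-6.x
mvn clean package            # produces target/releases/*.zip

# The release zip is named after the plugin version.
ZIP="elasticsearch-analysis-ik-$ES_VERSION.zip"

# Unzip straight into a dedicated folder under plugins/, then restart Elasticsearch.
unzip "target/releases/$ZIP" -d "$ES_HOME/plugins/ik"
```

Restart Elasticsearch afterwards so the plugin is picked up at startup.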
Testing the IK analyzer
IK ships two analyzers: ik_smart and ik_max_word.

// ik_smart
GET _analyze
{
  "text": ["中华人民共和国国歌"],
  "analyzer": "ik_smart"
}
// Result
{
  "tokens": [
    {
      "token": "中华人民共和国",
      "start_offset": 0,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "国歌",
      "start_offset": 7,
      "end_offset": 9,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
// ik_max_word
GET _analyze
{
  "text": ["中华人民共和国国歌"],
  "analyzer": "ik_max_word"
}
// Result
{
  "tokens": [
    {
      "token": "中华人民共和国",
      "start_offset": 0,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "中华人民",
      "start_offset": 0,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "中华",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "华人",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "人民共和国",
      "start_offset": 2,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "人民",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "共和国",
      "start_offset": 4,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 6
    },
    {
      "token": "共和",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 7
    },
    {
      "token": "国",
      "start_offset": 6,
      "end_offset": 7,
      "type": "CN_CHAR",
      "position": 8
    },
    {
      "token": "国歌",
      "start_offset": 7,
      "end_offset": 9,
      "type": "CN_WORD",
      "position": 9
    }
  ]
}
Official explanation:
- ik_max_word: performs the finest-grained segmentation. For example, it splits "中华人民共和国国歌" into "中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌", exhausting every possible combination. Suitable for Term Queries.
- ik_smart: performs the coarsest-grained segmentation. For example, it splits "中华人民共和国国歌" into "中华人民共和国, 国歌". Suitable for Phrase queries.
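The responses above can be sanity-checked offline: each token's start_offset/end_offset are character positions that slice the token back out of the original string. A minimal sketch in plain Python (no Elasticsearch needed; the token list is copied from the ik_smart response above):

```python
# Verify that each token's offsets slice the original text back out.
# Token data copied verbatim from the ik_smart response above.
text = "中华人民共和国国歌"

ik_smart_tokens = [
    {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7},
    {"token": "国歌", "start_offset": 7, "end_offset": 9},
]

for t in ik_smart_tokens:
    # Offsets are character (not byte) positions in the original string.
    assert text[t["start_offset"]:t["end_offset"]] == t["token"]

print("offsets check out")
```

Note that ik_smart covers the whole string with non-overlapping tokens, while ik_max_word (see the second response) additionally emits overlapping sub-tokens over the same offsets.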