1. Download the IK plugin release from GitHub:
https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v7.4.2
2. Upload the zip to the Linux server
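One way to do the upload, assuming the zip was downloaded to the current directory and the server is reachable as `user@es-host` (a hypothetical account and host name; substitute your own):

```shell
# Copy the plugin archive to the server's /tmp directory over SSH
scp elasticsearch-analysis-ik-7.4.2.zip user@es-host:/tmp/
```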
3. Unzip it
unzip elasticsearch-analysis-ik-7.4.2.zip -d /usr/local/elasticsearch-7.4.2/plugins/ik
4. Change into /usr/local/elasticsearch-7.4.2/plugins/ik and you can see the unpacked IK Chinese analyzer
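To confirm the plugin unpacked correctly, list the directory (the expected contents below are an assumption based on what the plugin zip typically ships):

```shell
cd /usr/local/elasticsearch-7.4.2/plugins/ik
ls
# Expect the plugin jars, a config/ directory holding the IK dictionaries,
# and plugin-descriptor.properties
```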
5. Switch to a non-root user and restart the Elasticsearch process (Elasticsearch refuses to run as root)
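A minimal sketch of step 5, assuming a normal user named `esuser` and the install path from step 3 (both names are assumptions; substitute your own):

```shell
# Elasticsearch will not start as root, so switch to a normal user first
su - esuser

# Find the running node's pid, stop it, then start Elasticsearch as a daemon
ps -ef | grep [e]lasticsearch        # note the pid
kill <pid>                           # replace <pid> with the number found above
/usr/local/elasticsearch-7.4.2/bin/elasticsearch -d
```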
6. Test
Following the instructions on GitHub, we can now test with the analyzers provided by elasticsearch-analysis-ik instead of Elasticsearch's built-in default analyzer
6.1 ik_max_word performs the finest-grained segmentation, enumerating every word it can recognize; a fitting showcase for the breadth and depth of our five-thousand-year-old culture.
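The response below can be reproduced with the `_analyze` API; the sample sentence "上下班车流量很大" is reconstructed from the token offsets in the response, and `localhost:9200` is an assumed node address:

```shell
curl -X POST "http://localhost:9200/_analyze?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"analyzer": "ik_max_word", "text": "上下班车流量很大"}'
```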
Response:
{
"tokens": [
{
"token": "上下班",
"start_offset": 0,
"end_offset": 3,
"type": "CN_WORD",
"position": 0
},
{
"token": "上下",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 1
},
{
"token": "下班",
"start_offset": 1,
"end_offset": 3,
"type": "CN_WORD",
"position": 2
},
{
"token": "班车",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 3
},
{
"token": "车流量",
"start_offset": 3,
"end_offset": 6,
"type": "CN_WORD",
"position": 4
},
{
"token": "车流",
"start_offset": 3,
"end_offset": 5,
"type": "CN_WORD",
"position": 5
},
{
"token": "流量",
"start_offset": 4,
"end_offset": 6,
"type": "CN_WORD",
"position": 6
},
{
"token": "很大",
"start_offset": 6,
"end_offset": 8,
"type": "CN_WORD",
"position": 7
}
]
}
6.2 ik_smart performs a coarser-grained segmentation.
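The same `_analyze` request, switched to the `ik_smart` analyzer (sentence and node address assumed as above), yields the response below:

```shell
curl -X POST "http://localhost:9200/_analyze?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"analyzer": "ik_smart", "text": "上下班车流量很大"}'
```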
Response:
{
"tokens": [
{
"token": "上下班",
"start_offset": 0,
"end_offset": 3,
"type": "CN_WORD",
"position": 0
},
{
"token": "车流量",
"start_offset": 3,
"end_offset": 6,
"type": "CN_WORD",
"position": 1
},
{
"token": "很大",
"start_offset": 6,
"end_offset": 8,
"type": "CN_WORD",
"position": 2
}
]
}