Setting up the IK Chinese analyzer

@所谓伊人

Categories: Elasticsearch, 电商网站搭建 (building an e-commerce site)

Article link: https://blog.csdn.net/qq_39750835/article/details/119544057

1. Download elasticsearch-analysis-ik

https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v7.4.2

2. Upload it to the Linux server

3. Unzip it into the Elasticsearch plugins directory

unzip elasticsearch-analysis-ik-7.4.2.zip -d /usr/local/elasticsearch-7.4.2/plugins/ik

4. Change into /usr/local/elasticsearch-7.4.2/plugins/ik, where you can see the unpacked IK Chinese analyzer
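
Besides looking at the directory by eye, the unpacked plugin can be checked against the Elasticsearch version. The following is a minimal sketch, assuming the archive unpacks a plugin-descriptor.properties file with an elasticsearch.version key, as Elasticsearch plugins generally do. The function names are hypothetical and not part of any IK or Elasticsearch tooling.

```python
from pathlib import Path


def read_plugin_descriptor(plugin_dir):
    """Parse plugin-descriptor.properties in plugin_dir into a dict."""
    descriptor = Path(plugin_dir) / "plugin-descriptor.properties"
    props = {}
    for line in descriptor.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and comments; keep only key=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def plugin_matches_es(plugin_dir, es_version):
    """True if the plugin declares exactly the given Elasticsearch version."""
    props = read_plugin_descriptor(plugin_dir)
    return props.get("elasticsearch.version") == es_version
```

With the paths used in this article, the call would be `plugin_matches_es("/usr/local/elasticsearch-7.4.2/plugins/ik", "7.4.2")`. A result of False would suggest that the downloaded release does not match the installed es version.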

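5. Switch to a non-root user (Elasticsearch refuses to start as root) and restart the es process

6. Test

Following the instructions on GitHub, we can now test with the analyzers provided by elasticsearch-analysis-ik instead of the default analyzer built into es.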
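
The test request itself is not shown in this article. The following is a minimal sketch of how such a test might be sent to the `_analyze` API, assuming es listens on http://localhost:9200 without authentication. The sample text 上下班车流量很大 is inferred from the token offsets in the outputs in this section, and the function names are hypothetical.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer, base_url="http://localhost:9200"):
    """Build a POST request for the _analyze API (nothing is sent yet)."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, analyzer, base_url="http://localhost:9200"):
    """Send the request and return the list of token strings."""
    request = build_analyze_request(text, analyzer, base_url)
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return [item["token"] for item in result["tokens"]]
```

Calling `analyze("上下班车流量很大", "ik_max_word")` or `analyze("上下班车流量很大", "ik_smart")` against a node with the plugin loaded should return token lists comparable to those in 6.1 and 6.2. If the plugin did not load, es would be expected to reject the analyzer name instead.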

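6.1 ik_max_word performs the finest-grained split, emitting every dictionary word it can find in the text, which shows off the breadth and depth of a culture five thousand years in the making. For the input 上下班车流量很大 (inferred from the token offsets), it returns: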

{
	"tokens": [
		{
			"token": "上下班",
			"start_offset": 0,
			"end_offset": 3,
			"type": "CN_WORD",
			"position": 0
		},
		{
			"token": "上下",
			"start_offset": 0,
			"end_offset": 2,
			"type": "CN_WORD",
			"position": 1
		},
		{
			"token": "下班",
			"start_offset": 1,
			"end_offset": 3,
			"type": "CN_WORD",
			"position": 2
		},
		{
			"token": "班车",
			"start_offset": 2,
			"end_offset": 4,
			"type": "CN_WORD",
			"position": 3
		},
		{
			"token": "车流量",
			"start_offset": 3,
			"end_offset": 6,
			"type": "CN_WORD",
			"position": 4
		},
		{
			"token": "车流",
			"start_offset": 3,
			"end_offset": 5,
			"type": "CN_WORD",
			"position": 5
		},
		{
			"token": "流量",
			"start_offset": 4,
			"end_offset": 6,
			"type": "CN_WORD",
			"position": 6
		},
		{
			"token": "很大",
			"start_offset": 6,
			"end_offset": 8,
			"type": "CN_WORD",
			"position": 7
		}
	]
}
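
To read the response above, note that start_offset and end_offset are character positions in the input, so each token is a slice of the text. The sketch below checks this for the tokens listed and also finds which tokens overlap. The input text is an assumption reconstructed from the offsets, and the helper names are hypothetical.

```python
# Sample text inferred from the offsets in the response (an assumption).
TEXT = "上下班车流量很大"

# (token, start_offset, end_offset) copied from the ik_max_word response.
MAX_WORD_TOKENS = [
    ("上下班", 0, 3), ("上下", 0, 2), ("下班", 1, 3), ("班车", 2, 4),
    ("车流量", 3, 6), ("车流", 3, 5), ("流量", 4, 6), ("很大", 6, 8),
]


def offsets_match(text, tokens):
    """Check that every token equals the slice of text its offsets name."""
    return all(text[start:end] == token for token, start, end in tokens)


def overlapping_pairs(tokens):
    """Return pairs of tokens whose character ranges overlap."""
    pairs = []
    for i, (tok_a, start_a, end_a) in enumerate(tokens):
        for tok_b, start_b, end_b in tokens[i + 1:]:
            # Half-open ranges [start, end) overlap when each starts
            # before the other ends.
            if start_a < end_b and start_b < end_a:
                pairs.append((tok_a, tok_b))
    return pairs
```

The overlaps are what make ik_max_word fine-grained: 班车 spans the boundary between 上下班 and 车流量, so the same characters are indexed under several words.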

6.2 ik_smart performs the coarsest-grained split, so the same text yields fewer, longer tokens:

{
	"tokens": [
		{
			"token": "上下班",
			"start_offset": 0,
			"end_offset": 3,
			"type": "CN_WORD",
			"position": 0
		},
		{
			"token": "车流量",
			"start_offset": 3,
			"end_offset": 6,
			"type": "CN_WORD",
			"position": 1
		},
		{
			"token": "很大",
			"start_offset": 6,
			"end_offset": 8,
			"type": "CN_WORD",
			"position": 2
		}
	]
}
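
In this sample, the ik_smart tokens cover the text end to end without overlapping. The sketch below checks that property for the tokens listed above. It is an observation about this one output, not a guarantee for every input, and the input text and helper name are assumptions.

```python
# Sample text inferred from the offsets in the response (an assumption).
TEXT = "上下班车流量很大"

# (token, start_offset, end_offset) copied from the ik_smart response.
SMART_TOKENS = [("上下班", 0, 3), ("车流量", 3, 6), ("很大", 6, 8)]


def tiles_text(text, tokens):
    """True if tokens cover text end to end with no gap and no overlap."""
    cursor = 0
    for token, start, end in sorted(tokens, key=lambda t: t[1]):
        # Each token must begin where the previous one ended
        # and must match the characters it claims to span.
        if start != cursor or text[start:end] != token:
            return False
        cursor = end
    return cursor == len(text)
```

A common pattern, described in the plugin's README, is to index with ik_max_word for recall and search with ik_smart for precision. Whether that suits a given index depends on the data.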