[Elasticsearch] (3) Installing the IK and Pinyin Analysis Plugins for a Local Elasticsearch

Author: 时间可以改变一切

Article link: https://blog.csdn.net/wang926454/article/details/99648829

Contents: https://github.com/dolyw/ProjectStudy/tree/master/Elasticsearch

Project repository

GitHub: https://github.com/dolyw/ProjectStudy/tree/master/Elasticsearch
Gitee: https://gitee.com/dolyw/ProjectStudy/tree/master/Elasticsearch

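Installing the IK analysis plugin for a local `Elasticsearch`

Go to https://github.com/medcl/elasticsearch-analysis-ik/releases and download the IK analysis plugin that matches your Elasticsearch version, here elasticsearch-analysis-ik-7.2.0.zip. Opening the archive shows the following files: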

commons-codec-1.9.jar
commons-logging-1.2.jar
config/
elasticsearch-analysis-ik-7.2.0.jar
httpclient-4.5.2.jar
httpcore-4.4.4.jar
plugin-descriptor.properties
plugin-security.policy

If the contents look right, extract the archive into the plugins directory of your Elasticsearch installation. For example, my path is D:\Tools\elasticsearch-7.2.0\plugins\elasticsearch-analysis-ik-7.2.0
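The extraction step can also be scripted. The following is a minimal sketch, not part of the original project: the function names `read_descriptor` and `install_plugin` are made up for illustration, and it assumes that plugin-descriptor.properties sits at the root of the zip (as in the listing above) and carries an `elasticsearch.version` entry that should equal the version of the node.

```python
import zipfile
from pathlib import Path


def read_descriptor(zip_path):
    """Parse plugin-descriptor.properties inside the plugin zip into a dict."""
    props = {}
    with zipfile.ZipFile(zip_path) as archive:
        text = archive.read("plugin-descriptor.properties").decode("utf-8")
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments; keep simple key=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def install_plugin(zip_path, es_home, es_version):
    """Extract the zip into <es_home>/plugins/<zip file name without .zip>."""
    declared = read_descriptor(zip_path).get("elasticsearch.version")
    if declared != es_version:
        # Refuse to install a plugin built for a different Elasticsearch version.
        raise ValueError(
            f"plugin targets Elasticsearch {declared}, not {es_version}"
        )
    target = Path(es_home) / "plugins" / Path(zip_path).stem
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

With the paths from this article, the call would look like `install_plugin("elasticsearch-analysis-ik-7.2.0.zip", r"D:\Tools\elasticsearch-7.2.0", "7.2.0")`. Checking the version before extracting is a design choice of this sketch: a plugin built for another version would otherwise only be noticed when the node is restarted.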

Restart Elasticsearch. The console log should include the line:

loaded plugin [analysis-ik]

Test it with both IK analyzers, `ik_smart` and `ik_max_word`. Each request is followed by its response:

POST /_analyze
{
  "text":"中华人民共和国国徽",
  "analyzer":"ik_smart"
}

{
	"tokens": [
		{
			"token": "中华人民共和国",
			"start_offset": 0,
			"end_offset": 7,
			"type": "CN_WORD",
			"position": 0
		},
		{
			"token": "国徽",
			"start_offset": 7,
			"end_offset": 9,
			"type": "CN_WORD",
			"position": 1
		}
	]
}

POST /_analyze
{
  "text":"中华人民共和国国徽",
  "analyzer":"ik_max_word"
}

{
	"tokens": [
		{
			"token": "中华人民共和国",
			"start_offset": 0,
			"end_offset": 7,
			"type": "CN_WORD",
			"position": 0
		},
		{
			"token": "中华人民",
			"start_offset": 0,
			"end_offset": 4,
			"type": "CN_WORD",
			"position": 1
		},
		{
			"token": "中华",
			"start_offset": 0,
			"end_offset": 2,
			"type": "CN_WORD",
			"position": 2
		},
		{
			"token": "华人",
			"start_offset": 1,
			"end_offset": 3,
			"type": "CN_WORD",
			"position": 3
		},
		{
			"token": "人民共和国",
			"start_offset": 2,
			"end_offset": 7,
			"type": "CN_WORD",
			"position": 4
		},
		{
			"token": "人民",
			"start_offset": 2,
			"end_offset": 4,
			"type": "CN_WORD",
			"position": 5
		},
		{
			"token": "共和国",
			"start_offset": 4,
			"end_offset": 7,
			"type": "CN_WORD",
			"position": 6
		},
		{
			"token": "共和",
			"start_offset": 4,
			"end_offset": 6,
			"type": "CN_WORD",
			"position": 7
		},
		{
			"token": "国",
			"start_offset": 6,
			"end_offset": 7,
			"type": "CN_CHAR",
			"position": 8
		},
		{
			"token": "国徽",
			"start_offset": 7,
			"end_offset": 9,
			"type": "CN_WORD",
			"position": 9
		}
	]
}

The IK analysis plugin is now installed.
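To make the difference between the two analyzers concrete, here is a small sketch that reuses the tokens and offsets from the two responses above. The helper names `offsets_match` and `only_in` are illustrative only. It checks that every token is exactly the slice of the input text given by its offsets, then lists the terms that `ik_max_word` emits but `ik_smart` does not.

```python
TEXT = "中华人民共和国国徽"

# (token, start_offset, end_offset) copied from the two responses above.
IK_SMART = [("中华人民共和国", 0, 7), ("国徽", 7, 9)]
IK_MAX_WORD = [
    ("中华人民共和国", 0, 7),
    ("中华人民", 0, 4),
    ("中华", 0, 2),
    ("华人", 1, 3),
    ("人民共和国", 2, 7),
    ("人民", 2, 4),
    ("共和国", 4, 7),
    ("共和", 4, 6),
    ("国", 6, 7),
    ("国徽", 7, 9),
]


def offsets_match(text, tokens):
    """True if every token equals text[start:end].

    Python indexes by code point; for the characters used here that
    coincides with the offsets Elasticsearch reports.
    """
    return all(text[start:end] == token for token, start, end in tokens)


def only_in(first, second):
    """Terms produced by `first` that `second` does not produce, in order."""
    second_terms = {token for token, _, _ in second}
    return [token for token, _, _ in first if token not in second_terms]


extra_terms = only_in(IK_MAX_WORD, IK_SMART)
```

Here `extra_terms` holds the eight overlapping sub-words that only the exhaustive `ik_max_word` analyzer produces, while `ik_smart` keeps just the coarsest, non-overlapping split.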

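Installing the pinyin analysis plugin for a local `Elasticsearch`

Go to https://github.com/medcl/elasticsearch-analysis-pinyin/releases and download the pinyin analysis plugin that matches your Elasticsearch version, here elasticsearch-analysis-pinyin-7.2.0.zip. Opening the archive shows the following files: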

elasticsearch-analysis-pinyin-7.2.0.jar
nlp-lang-1.7.jar
plugin-descriptor.properties

If the contents look right, extract the archive into the plugins directory of your Elasticsearch installation. For example, my path is D:\Tools\elasticsearch-7.2.0\plugins\elasticsearch-analysis-pinyin-7.2.0

Restart Elasticsearch. The console log should include the line:

loaded plugin [analysis-pinyin]
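Instead of scanning the startup log, the installed plugins can also be queried from the running node. This is a hedged sketch rather than something the original article does: it assumes a node at http://localhost:9200 and uses the `_cat/plugins` endpoint with `format=json`, whose rows are expected to carry a `component` field holding the plugin name. The function names are illustrative.

```python
import json
import urllib.request


def fetch_plugins(base_url="http://localhost:9200"):
    """Return the parsed JSON rows of GET /_cat/plugins?format=json.

    The base URL is an assumption; adjust it to your node.
    """
    with urllib.request.urlopen(f"{base_url}/_cat/plugins?format=json") as resp:
        return json.loads(resp.read().decode("utf-8"))


def missing_plugins(rows, expected=("analysis-ik", "analysis-pinyin")):
    """List the expected plugin names that no row reports as installed."""
    installed = {row.get("component") for row in rows}
    return [name for name in expected if name not in installed]
```

Calling `missing_plugins(fetch_plugins())` against a running node should return an empty list once both plugins are loaded.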

Test it with the `pinyin` analyzer. The request is followed by its response:

POST /_analyze
{
  "text":"中华人民共和国国徽",
  "analyzer":"pinyin"
}

{
	"tokens": [
		{
			"token": "zhong",
			"start_offset": 0,
			"end_offset": 0,
			"type": "word",
			"position": 0
		},
		{
			"token": "zhrmghggh",
			"start_offset": 0,
			"end_offset": 0,
			"type": "word",
			"position": 0
		},
		{
			"token": "hua",
			"start_offset": 0,
			"end_offset": 0,
			"type": "word",
			"position": 1
		},
		{
			"token": "ren",
			"start_offset": 0,
			"end_offset": 0,
			"type": "word",
			"position": 2
		},
		{
			"token": "min",
			"start_offset": 0,
			"end_offset": 0,
			"type": "word",
			"position": 3
		},
		{
			"token": "gong",
			"start_offset": 0,
			"end_offset": 0,
			"type": "word",
			"position": 4
		},
		{
			"token": "he",
			"start_offset": 0,
			"end_offset": 0,
			"type": "word",
			"position": 5
		},
		{
			"token": "guo",
			"start_offset": 0,
			"end_offset": 0,
			"type": "word",
			"position": 6
		},
		{
			"token": "guo",
			"start_offset": 0,
			"end_offset": 0,
			"type": "word",
			"position": 7
		},
		{
			"token": "hui",
			"start_offset": 0,
			"end_offset": 0,
			"type": "word",
			"position": 8
		}
	]
}

The pinyin analysis plugin is now installed.

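Using the IK and pinyin plugins (see each plugin's GitHub documentation for detailed usage)

Create the index, defining a custom pinyin tokenizer and an analyzer that uses it: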

PUT /book
{
	"settings": {
		"analysis": {
			"analyzer": {
				"pinyin_analyzer": {
					"tokenizer": "my_pinyin"
				}
			},
			"tokenizer": {
				"my_pinyin": {
					"type": "pinyin",
					"keep_separate_first_letter": false,
					"keep_full_pinyin": true,
					"keep_original": true,
					"limit_first_letter_length": 16,
					"lowercase": true,
					"remove_duplicated_term": true
				}
			}
		}
	}
}

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "book"
}

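Create the mapping and assign analyzers per field: name gets pinyin analysis (through its pinyin subfield), content gets IK analysis, and describe gets both IK and pinyin analysis: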

POST /book/_mapping
{
	"properties": {
		"name": {
			"type": "keyword",
			"fields": {
				"pinyin": {
					"type": "text",
					"store": false,
					"term_vector": "with_offsets",
					"analyzer": "pinyin_analyzer",
					"boost": 10
				}
			}
		},
		"content": {
			"type": "text",
			"analyzer": "ik_max_word",
			"search_analyzer": "ik_smart"
		},
		"describe": {
			"type": "text",
			"analyzer": "ik_max_word",
			"search_analyzer": "ik_smart",
			"fields": {
				"pinyin": {
					"type": "text",
					"store": false,
					"term_vector": "with_offsets",
					"analyzer": "pinyin_analyzer",
					"boost": 10
				}
			}
		},
		"id": {
			"type": "long"
		}
	}
}

{
    "acknowledged": true
}

The index and its per-field analyzers are now in place.

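Note: before searching, check which terms the text you are searching was analyzed into. If your query input is not among the terms that analysis produced, the search finds nothing.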
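One way to follow this advice is to run the text through the index's own analyzer using the `_analyze` API scoped to the `book` index. The sketch below is an illustration under assumptions: the node address http://localhost:9200 and the helper names are not from the original article, while `book` and `pinyin_analyzer` are the index and analyzer defined above.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer, index="book",
                          base_url="http://localhost:9200"):
    """Build (but do not send) a POST /<index>/_analyze request."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def terms_from_response(response):
    """Extract the token strings from a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def analyze(text, analyzer, **kwargs):
    """Send the request to a running node and return the produced terms."""
    request = build_analyze_request(text, analyzer, **kwargs)
    with urllib.request.urlopen(request) as resp:
        return terms_from_response(json.loads(resp.read().decode("utf-8")))
```

For example, `analyze("中华人民共和国国徽", "pinyin_analyzer")` returns the terms that the pinyin subfields would index for that text, and `analyze("国徽", "ik_smart")` shows what a query against content is turned into at search time. A query can only match when the two sides share at least one term.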
