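1. Download the pinyin analysis plugin. Its version must match the installed ES version; mine is 7.9.3.
Plugin source: https://github.com/medcl/elasticsearch-analysis-pinyin
I could not find a matching release for 7.9.3,
so I had to download the 7.9.3 source code myself.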
2. After downloading, build it with Maven by running mvn clean package; a zip package is generated in the releases directory.
The generated zip turned out to be versioned 7.7:
elasticsearch-analysis-pinyin-7.7.0.zip
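3. Unzip it, rename the folder to pinyin, and put it under the plugins directory of ES. After restarting ES, it still reported a version error,
so I edited plugin-descriptor.properties:
version=7.9.3
elasticsearch.version=7.9.3
After restarting ES again, it ran normally. Note that this only changes the version the plugin declares; the jar is still the 7.7.0 build, so it is worth running the test below before relying on it.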
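The descriptor edit above can also be scripted. The following is only a minimal sketch: the function name set_descriptor_versions and the sample descriptor text are made up for illustration, and it assumes the file uses plain key=value lines.

```python
def set_descriptor_versions(text: str, es_version: str) -> str:
    """Return descriptor text with `version` and `elasticsearch.version`
    both set to es_version. All other lines are kept unchanged."""
    wanted = {"version": es_version, "elasticsearch.version": es_version}
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_comment = line.lstrip().startswith("#")
        if "=" in line and not is_comment and key in wanted:
            out.append(f"{key}={wanted[key]}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical descriptor content, not copied from the real plugin.
    sample = "version=7.7.0\nelasticsearch.version=7.7.0\njava.version=1.8\n"
    print(set_descriptor_versions(sample, "7.9.3"))
```

To apply it to a real install, read plugins/pinyin/plugin-descriptor.properties under the ES home directory (the path is assumed from step 3), pass its content through the function, and write the result back before restarting ES.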
Test
Create the index:
PUT /medcl/
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "pinyin_analyzer": {
            "tokenizer": "my_pinyin"
          }
        },
        "tokenizer": {
          "my_pinyin": {
            "type": "pinyin",
            "keep_separate_first_letter": false,
            "keep_full_pinyin": true,
            "keep_original": true,
            "limit_first_letter_length": 16,
            "lowercase": true,
            "remove_duplicated_term": true
          }
        }
      }
    }
  }
}
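Parameters:
keep_first_letter: when enabled, emits the first letters joined together, e.g. 刘德华 > ldh. Default: true
keep_separate_first_letter: when enabled, keeps the first letters as separate terms, e.g. 刘德华 > l,d,h. Default: false. Note: query results may be too fuzzy because these terms are so frequent
keep_full_pinyin: when enabled, emits the full pinyin of each character, e.g. 刘德华 > [liu,de,hua]. Default: true
keep_original: when enabled, the original input is kept as well. Default: false
limit_first_letter_length: maximum length of the first_letter result. Default: 16
lowercase: lowercases non-Chinese letters. Default: true
remove_duplicated_term: when enabled, duplicated terms are removed to save index space, e.g. de的 > de. Default: false. Note: position-related queries may be affected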
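To make the options above easier to picture, here is a toy model in Python. It is not the plugin's algorithm: the function toy_pinyin_tokens is hypothetical, the pinyin syllables are passed in by hand, lowercase is left out, and the token order is chosen for readability and may differ from what ES returns.

```python
def toy_pinyin_tokens(text, syllables, *, keep_first_letter=True,
                      keep_separate_first_letter=False,
                      keep_full_pinyin=True, keep_original=False,
                      limit_first_letter_length=16,
                      remove_duplicated_term=False):
    """Toy illustration of the tokenizer options (NOT the real plugin logic).

    text      -- original input, e.g. "刘德华"
    syllables -- its pinyin, supplied by the caller, e.g. ["liu", "de", "hua"]
    """
    tokens = []
    if keep_full_pinyin:
        tokens.extend(syllables)                      # liu, de, hua
    if keep_separate_first_letter:
        tokens.extend(s[0] for s in syllables)        # l, d, h
    if keep_original:
        tokens.append(text)                           # 刘德华
    if keep_first_letter:
        joined = "".join(s[0] for s in syllables)     # ldh
        tokens.append(joined[:limit_first_letter_length])
    if remove_duplicated_term:
        tokens = list(dict.fromkeys(tokens))          # drop repeats, keep order
    return tokens


if __name__ == "__main__":
    # Same option values as the my_pinyin tokenizer defined above.
    print(toy_pinyin_tokens("刘德华", ["liu", "de", "hua"],
                            keep_original=True, remove_duplicated_term=True))
```

The point of the sketch is only that name.pinyin ends up holding both the full syllables and the joined first letters, which is why a query such as ldh can match.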
POST /medcl/_analyze
{
  "text": ["刘德华"],
  "analyzer": "pinyin_analyzer"
}
POST /medcl/_mapping
{
  "properties": {
    "name": {
      "type": "keyword",
      "fields": {
        "pinyin": {
          "type": "text",
          "store": false,
          "term_vector": "with_offsets",
          "analyzer": "pinyin_analyzer",
          "boost": 10
        }
      }
    }
  }
}
POST /medcl/_bulk
{"index":{"_index":"medcl"}}
{"name":"刘德华"}
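Outside the Kibana console, the _bulk body has to be sent as newline-delimited JSON: one action line, then one document line, with a newline after the last line. A small sketch of building that body in Python follows; the helper name build_bulk_body is made up for this example.

```python
import json


def build_bulk_body(index, docs):
    """Build an NDJSON body for the _bulk API: an index action line
    followed by the document itself, for every document in docs."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}, ensure_ascii=False))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body("medcl", [{"name": "刘德华"}]), end="")
```

When posting this body over HTTP, set the Content-Type header to application/x-ndjson.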
POST /medcl/_search
{
  "query": {
    "match": {
      "name.pinyin": {
        "query": "ldh"
      }
    }
  }
}
Result:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.3439677,
    "hits": [
      {
        "_index": "medcl",
        "_type": "_doc",
        "_id": "aNyPaXYBMZ73IDxRFEYg",
        "_score": 0.3439677,
        "_source": {
          "name": "刘德华"
        }
      }
    ]
  }
}
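The same search can be run from a script instead of the console. This is a rough sketch using only the Python standard library; it assumes ES is reachable at http://localhost:9200 without authentication, and the helper names (build_match_query, search, hit_names) are invented for this example.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unsecured node


def build_match_query(field, text):
    """Build the same match query body used in the console request."""
    return {"query": {"match": {field: {"query": text}}}}


def search(index, body, base_url=ES_URL):
    """POST the query to /<index>/_search and return the parsed response."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def hit_names(response):
    """Extract (name, score) pairs from a search response."""
    return [(h["_source"]["name"], h["_score"])
            for h in response["hits"]["hits"]]


if __name__ == "__main__":
    response = search("medcl", build_match_query("name.pinyin", "ldh"))
    for name, score in hit_names(response):
        print(name, score)
```

Trying a few other inputs, such as the full pinyin of the name, is a quick way to check that the repackaged plugin behaves as expected on 7.9.3.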