Learning Elasticsearch (4): Sorting Chinese by Pinyin and Searching by Pinyin

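  • Using the pinyin analysis plugin
  1. Download the pinyin analysis plugin source from https://github.com/medcl/elasticsearch-analysis-pinyin
  2. Unpack it and change into the elasticsearch-analysis-pinyin directory
  3. In pom.xml, set the Elasticsearch version to the version you are running
  4. Run mvn package on the command line to build the plugin
  5. Go to elasticsearch-analysis-pinyin-master\target\releases and unzip elasticsearch-analysis-pinyin-7.7.0.zip
  6. Copy the unzipped files into plugins/pinyin under the Elasticsearch installation directory
  7. Restart Elasticsearch
  8. If the build or the restart reports errors, fix them and repeat steps 4 to 7
  • Index settings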
PUT /book
{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "pinyin_analyzer": {
                        "tokenizer": "my_pinyin"
                    }
                }, 
                "tokenizer": {
                    "my_pinyin": {
                        "type": "pinyin", 
                        "keep_none_chinese": false,
                        "keep_full_pinyin": false,
                        "keep_joined_full_pinyin": true,
                        "keep_none_chinese_in_joined_full_pinyin": true,
                        "keep_first_letter": false,
                        "keep_none_chinese_in_first_letter": false,
                        "none_chinese_pinyin_tokenize": false
                    }
                }
            }
        }
    }
}

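Options used in the settings above (examples use 刘德华):

keep_none_chinese: false. Non-Chinese letters and digits are not kept as tokens of their own.

keep_full_pinyin: false. Turns off the per-character output 刘德华 -> liu, de, hua.

keep_joined_full_pinyin: true. 刘德华 -> liudehua.

keep_none_chinese_in_joined_full_pinyin: true. 刘德华2016 -> liudehua2016.

keep_first_letter: false. When set to true, 刘德华 -> ldh.

keep_none_chinese_in_first_letter: false. When set to true, 刘德华2016 -> ldh2016.

none_chinese_pinyin_tokenize: false. The author found that this option made no difference here.

Other options of the plugin and their defaults:

keep_separate_first_letter: keeps the first letters as separate tokens, e.g. 刘德华 -> l, d, h. Default: false.

keep_full_pinyin: keeps the full pinyin of each character, e.g. 刘德华 -> [liu, de, hua]. Default: true.

limit_first_letter_length: maximum length of the first_letter result. Default: 16.

lowercase: lowercases non-Chinese letters. Default: true.

keep_none_chinese: keeps non-Chinese letters and digits in the result. Default: true.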
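
As a rough mental model of how these switches combine, the sketch below mimics four of them in plain Python. It is not the plugin's code: `pinyin_tokens` is a hypothetical helper, the syllables for 刘德华 are hard-coded, the defaults are the ones listed above, and the order of the returned tokens is arbitrary rather than the plugin's real token positions.

```python
from typing import List


def pinyin_tokens(
    syllables: List[str],
    *,
    keep_full_pinyin: bool = True,
    keep_joined_full_pinyin: bool = False,
    keep_first_letter: bool = True,
    keep_separate_first_letter: bool = False,
) -> List[str]:
    """Toy model of four pinyin switches (hypothetical helper, not the plugin itself)."""
    tokens: List[str] = []
    if keep_separate_first_letter:
        tokens.extend(s[0] for s in syllables)           # l, d, h
    if keep_full_pinyin:
        tokens.extend(syllables)                          # liu, de, hua
    if keep_joined_full_pinyin:
        tokens.append("".join(syllables))                 # liudehua
    if keep_first_letter:
        tokens.append("".join(s[0] for s in syllables))   # ldh
    return tokens


# Hand-written pinyin for 刘德华; the real plugin derives this from the text.
LIU_DEHUA = ["liu", "de", "hua"]

# The switches used in the index settings of this article.
article_tokens = pinyin_tokens(
    LIU_DEHUA,
    keep_full_pinyin=False,
    keep_joined_full_pinyin=True,
    keep_first_letter=False,
)
print(article_tokens)  # ['liudehua']
```

With only keep_joined_full_pinyin enabled, each value is reduced to a single token, which is what makes the field usable as a sort key later on.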

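  • Field mapping
POST /book/_mapping
{
    "properties": {
        "title": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                },
                "sort": {
                    "type": "text",
                    "analyzer": "pinyin_analyzer",
                    "fielddata": true
                }
            }
        },
        "author": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                },
                "sort": {
                    "type": "text",
                    "analyzer": "pinyin_analyzer",
                    "fielddata": true
                }
            }
        }
    }
}

Note: "Only text fields support the analyzer mapping parameter", so the sort sub-field has to be a text field rather than a keyword field. Sorting on a text field needs fielddata, which is disabled by default, which is why the sort sub-fields set "fielddata": true.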

  • Search
GET /book/_search
{
  "query": {
    "match": {
      "title": "测试"
    }
  },
  "from": 0,
  "size": 20,
  "sort": {
    "title.sort": "asc"
  }
}
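
To see what ordering to expect from an ascending sort on `title.sort`, here is a small illustration in plain Python. It assumes each title is reduced to one joined-pinyin token, as in the index settings above. The titles and the pinyin in `PINYIN` are made-up sample data, and no Elasticsearch call is involved.

```python
from typing import Dict, List

# Made-up sample titles with hand-written joined pinyin (assumption for illustration).
PINYIN: Dict[str, str] = {
    "张三": "zhangsan",
    "李四": "lisi",
    "王五": "wangwu",
}


def sort_by_code_point(titles: List[str]) -> List[str]:
    """Order titles by Unicode code point, i.e. a plain string comparison."""
    return sorted(titles)


def sort_by_joined_pinyin(titles: List[str]) -> List[str]:
    """Order titles by their joined pinyin, mimicking an ascending sort on title.sort."""
    return sorted(titles, key=lambda title: PINYIN[title])


titles = ["王五", "张三", "李四"]
print(sort_by_code_point(titles))     # ['张三', '李四', '王五']
print(sort_by_joined_pinyin(titles))  # ['李四', '王五', '张三']
```

The first ordering is what a plain string sort produces and is unrelated to pronunciation; the second is the ordering the pinyin sub-field is meant to provide.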
  • Index settings with IK tokenizers and a pinyin token filter (requires the IK analysis plugin)
PUT /book
{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "ik_smart_pinyin": {
                        "type": "custom",
                        "tokenizer": "ik_smart",
                        "filter": "my_pinyin_filter"
                    },
                    "ik_max_word_pinyin": {
                        "type": "custom",
                        "tokenizer": "ik_max_word",
                        "filter": "my_pinyin_filter"
                    },
                    "pinyin_analyzer": {
                        "tokenizer": "my_pinyin_tokenizer"
                    }
                }, 
                "tokenizer": {
                    "my_pinyin_tokenizer": {
                        "type": "pinyin", 
                        "keep_first_letter": false, 
                        "keep_full_pinyin": false, 
                        "keep_joined_full_pinyin": true, 
                        "keep_none_chinese_in_first_letter": true, 
                        "none_chinese_pinyin_tokenize": false, 
                        "lowercase": true, 
                        "with_tone_number": true
                    }
                },
                "filter": {
                    "my_pinyin_filter": {
                        "type": "pinyin", 
                        "keep_first_letter": false, 
                        "keep_full_pinyin": false, 
                        "keep_joined_full_pinyin": true, 
                        "keep_none_chinese_in_first_letter": true, 
                        "none_chinese_pinyin_tokenize": false, 
                        "lowercase": true, 
                        "with_tone_number": true
                    }
                }
            }
        }
    }
}
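
The tokenizer and the token filter above repeat the same option block, so the two copies can drift apart when one is edited. A minimal sketch of generating the request body from a single shared definition follows. The helper names `ik_pinyin_analyzer` and `build_index_settings` are hypothetical, the option values are copied from the settings above, and the filter is written as a one-element list instead of a bare string.

```python
import json
from typing import Any, Dict

# Options copied from the settings above; shared by the tokenizer and the filter.
PINYIN_OPTIONS: Dict[str, Any] = {
    "type": "pinyin",
    "keep_first_letter": False,
    "keep_full_pinyin": False,
    "keep_joined_full_pinyin": True,
    "keep_none_chinese_in_first_letter": True,
    "none_chinese_pinyin_tokenize": False,
    "lowercase": True,
    "with_tone_number": True,
}


def ik_pinyin_analyzer(ik_tokenizer: str) -> Dict[str, Any]:
    """Custom analyzer: an IK tokenizer followed by the shared pinyin filter."""
    return {
        "type": "custom",
        "tokenizer": ik_tokenizer,
        "filter": ["my_pinyin_filter"],
    }


def build_index_settings() -> Dict[str, Any]:
    """Build the body of PUT /book from one shared option block."""
    analysis = {
        "analyzer": {
            "ik_smart_pinyin": ik_pinyin_analyzer("ik_smart"),
            "ik_max_word_pinyin": ik_pinyin_analyzer("ik_max_word"),
            "pinyin_analyzer": {"tokenizer": "my_pinyin_tokenizer"},
        },
        # dict(...) makes independent copies so later edits to one do not leak.
        "tokenizer": {"my_pinyin_tokenizer": dict(PINYIN_OPTIONS)},
        "filter": {"my_pinyin_filter": dict(PINYIN_OPTIONS)},
    }
    return {"settings": {"index": {"analysis": analysis}}}


settings_body = build_index_settings()
print(json.dumps(settings_body, indent=4))
```

The printed JSON can be pasted into the console as the body of PUT /book.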
  • Field mapping
POST /book/_mapping
{
    "properties": {
        "title": {
            "type": "text",
            "analyzer": "ik_max_word_pinyin",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        },
        "author": {
            "type": "text",
            "analyzer": "ik_max_word_pinyin",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        }
    }
}

This configuration supports searching by Chinese text or by pinyin, but it cannot sort by pinyin.

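  • Using the ICU analysis plugin
    • Install the plugin
      • List the installed plugins from the Elasticsearch installation directory: ./bin/elasticsearch-plugin list
      • Install the plugin from the same directory: ./bin/elasticsearch-plugin install analysis-icu
      • Restart Elasticsearch
    • Index settings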
PUT /book

POST /book/_mapping
{
    "properties": {
        "title": {
            "type": "text",
            "analyzer": "icu_analyzer",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                },
                "sort": {
                    "type": "icu_collation_keyword",
                    "index": false,
                    "language": "zh",
                    "country": "CN"
                }
            }
        },
        "author": {
            "type": "text",
            "analyzer": "icu_analyzer",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                },
                "sort": {
                    "type": "icu_collation_keyword",
                    "index": false,
                    "language": "zh",
                    "country": "CN"
                }
            }
        }
    }
}
  • Search
GET /book/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match_all": {
                      
                    }
                }
            ]
        }
    }, 
    "from": 0, 
    "size": 10, 
    "sort": [
        {
            "title.sort": "asc"
        }
    ]
}
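
If the search has to run from a script instead of the console, the request can be put together with the Python standard library. The sketch below only builds the request and does not send it. `BASE_URL` assumes a local node without authentication, `build_sorted_search` is a hypothetical helper, and the bool/must wrapper from the console request is reduced to a plain match_all, which should match the same documents.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: a local, unsecured node; adjust to your cluster.
BASE_URL = "http://localhost:9200"


def build_sorted_search(index: str, sort_field: str, size: int = 10) -> urllib.request.Request:
    """Build (but do not send) a match_all search sorted ascending on sort_field."""
    body: Dict[str, Any] = {
        "query": {"match_all": {}},
        "from": 0,
        "size": size,
        "sort": [{sort_field: "asc"}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # _search accepts POST as well as GET
    )


request = build_sorted_search("book", "title.sort")
print(request.get_method(), request.full_url)
```

Against a running node, passing `request` to `urllib.request.urlopen` and decoding the response as JSON gives the sorted documents under `hits.hits`.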
  • When you need both tokenized full-text search and sorting by pinyin, use the ICU analysis plugin.