/**
* CentOS 7.2 running under vm12
* elasticsearch 5.2.2
*/
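When searching for products on Taobao, you may notice that Chinese characters, pinyin, or a mix of the two all return the results you want. I looked into it today: in Elasticsearch this kind of search is implemented with a pinyin analysis plugin:
1) Installing ik was covered in an earlier post, so I won't repeat it here.
2) Installing the pinyin plugin on ES 2.4 is very simple and much like ik; the last step is to add the analysis configuration to elasticsearch.yml, so I won't go over it again.
Original blog post: http://blog.csdn.net/hhl2046/article/details/53319637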
index:
  analysis:
    analyzer:
      ik:
        alias: [news_analyzer_ik, ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_analyzer_pinyin:                          # analyzer name
        type: custom                               # custom means user-defined
        tokenizer: ik                              # component that splits text into tokens: ik
        filter: [synonym_test_filter, pinyin_mcl]  # post-process the tokens: synonyms and pinyin
    filter:
      synonym_test_filter:
        type: synonym_filter
        synonyms_path: synonym.txt
        dynamic_reload: true
        reload_interval: 10s
        expand: true
      pinyin_mcl:
        type: pinyin
        first_letter: none
        padding_char: ""
ik: https://github.com/medcl/elasticsearch-analysis-ik
pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin
Next, installing the pinyin analyzer on version 5.2.2:
1. Download the source and build it:
https://github.com/medcl/elasticsearch-analysis-pinyin
mvn package
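After a successful build, elasticsearch-analysis-pinyin-5.2.2.zip can be found under target/releases.
2. Put the zip file in the {ES_HOME}/plugins/pinyin/ directory and unzip it there, then restart Elasticsearch so the plugin is loaded.
3. Test: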
curl -XPUT http://localhost:9200/medcl/ -d'
{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "pinyin_analyzer" : {
          "tokenizer" : "my_pinyin"
        }
      },
      "tokenizer" : {
        "my_pinyin" : {
          "type" : "pinyin",
          "keep_separate_first_letter" : false,
          "keep_full_pinyin" : true,
          "keep_original" : true,
          "limit_first_letter_length" : 16,
          "lowercase" : true,
          "remove_duplicated_term" : true
        }
      }
    }
  }
}'
http://localhost:9200/medcl/_analyze?text=%e5%88%98%e5%be%b7%e5%8d%8e&analyzer=pinyin_analyzer
The analysis result is:
{
  "tokens" : [
    { "token" : "liu", "start_offset" : 0, "end_offset" : 1, "type" : "word", "position" : 0 },
    { "token" : "de", "start_offset" : 1, "end_offset" : 2, "type" : "word", "position" : 1 },
    { "token" : "hua", "start_offset" : 2, "end_offset" : 3, "type" : "word", "position" : 2 },
    { "token" : "刘德华", "start_offset" : 0, "end_offset" : 3, "type" : "word", "position" : 3 },
    { "token" : "ldh", "start_offset" : 0, "end_offset" : 3, "type" : "word", "position" : 4 }
  ]
}
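The text parameter in the _analyze URL above is simply the percent-encoded UTF-8 form of 刘德华. The following Python sketch decodes it and rebuilds the request, both as a query string and as a JSON request body (a form that, as far as I know, the 5.x _analyze API also accepts). The function names and the host default are my own assumptions for illustration, not part of the plugin or of Elasticsearch.

```python
import json
from urllib.parse import quote, unquote

# The text parameter exactly as it appears in the URL above.
ENCODED_TEXT = "%e5%88%98%e5%be%b7%e5%8d%8e"


def decode_text(encoded: str) -> str:
    """Percent-decode a UTF-8 query-string value."""
    return unquote(encoded, encoding="utf-8")


def analyze_url(index: str, analyzer: str, text: str,
                host: str = "http://localhost:9200") -> str:
    """Build the query-string form of the _analyze request (hypothetical helper)."""
    return f"{host}/{index}/_analyze?text={quote(text)}&analyzer={analyzer}"


def analyze_body(analyzer: str, text: str) -> str:
    """Build the JSON request-body form of the same request (hypothetical helper)."""
    return json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)


if __name__ == "__main__":
    text = decode_text(ENCODED_TEXT)
    print(text)
    print(analyze_url("medcl", "pinyin_analyzer", text))
    print(analyze_body("pinyin_analyzer", text))
```

Note that quote() emits uppercase hex digits, so the generated URL differs from the one above only in letter case, which is equivalent in percent-encoding.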
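4. Configure the IK + pinyin analyzer
Index settings (if the medcl index from step 3 still exists, delete it first with curl -XDELETE http://localhost:9200/medcl, because creating an index that already exists fails):
curl -XPUT "http://localhost:9200/medcl/" -d'
{
  "index": {
    "analysis": {
      "analyzer": {
        "ik_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["my_pinyin", "word_delimiter"]
        }
      },
      "filter": {
        "my_pinyin": {
          "type": "pinyin",
          "first_letter": "prefix",
          "padding_char": " "
        }
      }
    }
  }
}'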
Create the mapping:
curl -XPOST http://localhost:9200/medcl/folks/_mapping -d'
{
  "folks": {
    "properties": {
      "name": {
        "type": "keyword",
        "fields": {
          "pinyin": {
            "type": "text",
            "store": false,
            "term_vector": "with_positions_offsets",
            "analyzer": "ik_pinyin_analyzer",
            "boost": 10
          }
        }
      }
    }
  }
}'
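The settings and the mapping above are sent as two requests, but the create-index API also accepts both in one body under the settings and mappings keys. Below is a minimal Python sketch that assembles such a body and checks that every analyzer referenced by a sub-field is actually defined in the settings. The helper names are hypothetical, the values are copied from the two curl commands, and store is written as a JSON boolean here.

```python
import json

# Index settings: the ik_pinyin_analyzer definition from the curl command above.
SETTINGS = {
    "index": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": ["my_pinyin", "word_delimiter"],
                }
            },
            "filter": {
                "my_pinyin": {
                    "type": "pinyin",
                    "first_letter": "prefix",
                    "padding_char": " ",
                }
            },
        }
    }
}

# Mapping for the folks type, as in the _mapping request above.
MAPPINGS = {
    "folks": {
        "properties": {
            "name": {
                "type": "keyword",
                "fields": {
                    "pinyin": {
                        "type": "text",
                        "store": False,
                        "term_vector": "with_positions_offsets",
                        "analyzer": "ik_pinyin_analyzer",
                        "boost": 10,
                    }
                },
            }
        }
    }
}


def build_create_index_body(settings: dict, mappings: dict) -> dict:
    """Combine settings and mappings into a single create-index body."""
    return {"settings": settings, "mappings": mappings}


def undefined_analyzers(body: dict) -> set:
    """Return analyzer names used by sub-fields but missing from the settings."""
    defined = set(body["settings"]["index"]["analysis"]["analyzer"])
    used = set()
    for type_mapping in body["mappings"].values():
        for field in type_mapping["properties"].values():
            for sub in field.get("fields", {}).values():
                if "analyzer" in sub:
                    used.add(sub["analyzer"])
    return used - defined


if __name__ == "__main__":
    body = build_create_index_body(SETTINGS, MAPPINGS)
    assert not undefined_analyzers(body)
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON could be used as the body of a single PUT to http://localhost:9200/medcl/ in place of the separate settings and mapping requests.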
Add test documents:
curl -XPOST http://localhost:9200/medcl/folks/andy -d'{"name":"刘德华"}'
curl -XPOST http://localhost:9200/medcl/folks/tina -d'{"name":"中华人民共和国国歌"}'
Test the analysis results:
Pinyin search:
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:liu"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:de"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:hua"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:ldh"
ik search:
curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
{
  "query": {
    "match": { "name.pinyin": "国歌" }
  },
  "highlight": {
    "fields": { "name.pinyin": {} }
  }
}'
ik + pinyin search:
curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
{
  "query": {
    "match": { "name.pinyin": "zhonghua" }
  },
  "highlight": {
    "fields": { "name.pinyin": {} }
  }
}'
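The two search requests above differ only in the query text, so the body can be generated from a small helper. This is just a sketch: match_with_highlight is a hypothetical name of my own, and the output is meant to be passed to curl -d or any HTTP client.

```python
import json


def match_with_highlight(field: str, text: str) -> dict:
    """Build a match query on `field` that also highlights matches in it."""
    return {
        "query": {"match": {field: text}},
        "highlight": {"fields": {field: {}}},
    }


if __name__ == "__main__":
    # The Chinese query exercises ik, the pinyin query exercises ik + pinyin.
    for text in ("国歌", "zhonghua"):
        body = match_with_highlight("name.pinyin", text)
        print(json.dumps(body, ensure_ascii=False))
```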
References: http://blog.csdn.net/napoay/article/details/53907921
http://www.jianshu.com/p/653f7b33e63c
https://github.com/medcl/elasticsearch-analysis-pinyin
https://my.oschina.net/xiaohui249/blog/214505