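- Installing the ik analyzer
Download the ik analyzer from https://github.com/medcl/elasticsearch-analysis-ik. The ik version must match the Elasticsearch version.
Unzip elasticsearch-analysis-ik-5.4.3.zip and copy the extracted files into elasticsearch-5.4.3/plugins/ (ES_HOME below stands for the Elasticsearch installation directory):
    mkdir /opt/ik
    unzip elasticsearch-analysis-ik-5.4.3.zip -d /opt/ik
    mv /opt/ik ${ES_HOME}/plugins
Restart Elasticsearch; the ik analyzer is now installed.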
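If Elasticsearch refuses to start after this step, a quick look at the plugin directory can help. The sketch below is only an illustration: `plugin_dir_ok` is a hypothetical helper name, and the fallback path `/opt/elasticsearch-5.4.3` is an assumption that should be replaced with your own installation directory. It relies on the fact that an unpacked Elasticsearch plugin carries a `plugin-descriptor.properties` file at the top of its directory.

```shell
# Hypothetical helper: succeeds if the directory looks like an unpacked plugin,
# i.e. it exists and contains plugin-descriptor.properties at its top level.
plugin_dir_ok() {
  [ -d "$1" ] && [ -f "$1/plugin-descriptor.properties" ]
}

# ES_HOME is assumed to point at the Elasticsearch installation directory;
# the fallback path is only a placeholder.
if plugin_dir_ok "${ES_HOME:-/opt/elasticsearch-5.4.3}/plugins/ik"; then
  echo "ik plugin directory looks complete"
else
  echo "plugin-descriptor.properties not found; re-check the unzip and mv steps"
fi
```

A missing descriptor usually means the archive was unpacked one directory level too deep or too shallow.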
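- Installing the pinyin analyzer
Installing the pinyin analyzer takes a little more work, because you have to compile and package it from source yourself.
Download and build the source:
    git clone https://github.com/medcl/elasticsearch-analysis-pinyin.git
    cd elasticsearch-analysis-pinyin
    mvn clean install -Dmaven.test.skip
Install the pinyin analyzer (the version in the zip file name depends on the branch that was built; as with ik, it needs to match your Elasticsearch version):
    cd target/releases
    unzip elasticsearch-analysis-pinyin-5.5.1.zip
    mv elasticsearch elasticsearch-analysis-pinyin
    mv elasticsearch-analysis-pinyin ${ES_HOME}/plugins
Restart Elasticsearch; the pinyin analyzer is now installed.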
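Once the node is back up, the `_cat/plugins` API can confirm that both plugins were loaded. The following is a minimal sketch, not part of the original walkthrough: `lists_plugin` is a hypothetical helper, `ES_URL` is an assumed variable defaulting to the localhost address used in this post, and the component names `analysis-ik` and `analysis-pinyin` are assumptions about what the plugins report, so adjust them if your listing shows different names.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: succeeds if the plugin listing on stdin mentions
# the given component name as a whole word.
lists_plugin() {
  grep -qwF -- "$1"
}

# _cat/plugins prints one row per node and plugin.
# If the node is unreachable, continue with an empty listing.
plugins="$(curl -s --max-time 5 "$ES_URL/_cat/plugins?v" || true)"

for name in analysis-ik analysis-pinyin; do
  if printf '%s\n' "$plugins" | lists_plugin "$name"; then
    echo "$name: loaded"
  else
    echo "$name: not listed"
  fi
done
```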
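- Creating the index
Create the index and configure its analysis settings. The custom analyzer ik_pinyin_analyzer tokenizes text with ik_smart and then passes the tokens through the my_pinyin filter and the built-in word_delimiter filter:
    curl -XPUT "http://localhost:9200/medcl/" -d'
    {
      "index": {
        "analysis": {
          "analyzer": {
            "ik_pinyin_analyzer": {
              "type": "custom",
              "tokenizer": "ik_smart",
              "filter": ["my_pinyin", "word_delimiter"]
            }
          },
          "filter": {
            "my_pinyin": {
              "type": "pinyin",
              "first_letter": "prefix",
              "padding_char": " "
            }
          }
        }
      }
    }'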
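Before defining a mapping, it may be worth checking what the new analyzer actually emits. The sketch below sends a sample string to the index's `_analyze` endpoint. `analyze_body` is a hypothetical helper and `ES_URL` an assumed variable; the helper does no JSON escaping, so it only suits simple text without quotes or backslashes.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: builds the JSON body for the _analyze API.
# No escaping is done, so the text must not contain quotes or backslashes.
analyze_body() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

body="$(analyze_body ik_pinyin_analyzer '刘德华')"

# Ask the medcl index to run the text through ik_pinyin_analyzer
# and print the resulting tokens.
curl -s --max-time 5 -XPOST "$ES_URL/medcl/_analyze?pretty" \
  -H 'Content-Type: application/json' -d "$body" \
  || echo "Elasticsearch not reachable at $ES_URL"
```

The token list in the response shows which terms will be indexed for a field that uses this analyzer, which makes it easier to see why a given pinyin query does or does not match.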
- Creating the type and mapping
Create a type and set its mapping. The name field is stored as a keyword, with a text sub-field name.pinyin that is analyzed by ik_pinyin_analyzer:
    curl -XPOST http://localhost:9200/medcl/folks/_mapping -d'
    {
      "folks": {
        "properties": {
          "name": {
            "type": "keyword",
            "fields": {
              "pinyin": {
                "type": "text",
                "store": "no",
                "term_vector": "with_positions_offsets",
                "analyzer": "ik_pinyin_analyzer",
                "boost": 10
              }
            }
          }
        }
      }
    }'
- Creating documents
Create two documents:
    curl -XPOST http://localhost:9200/medcl/folks/andy -d'{"name":"刘德华"}'
    curl -XPOST http://localhost:9200/medcl/folks/tina -d'{"name":"中华人民共和国国歌"}'
- Testing pinyin analysis
Each of the following four queries returns the document "刘德华":
    curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:liu"
    curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:de"
    curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:hua"
    curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:ldh"
Sample response:
    {
      "took": 2,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.85669875,
        "hits": [
          {
            "_index": "medcl",
            "_type": "folks",
            "_id": "andy",
            "_score": 0.85669875,
            "_source": {
              "name": "刘德华"
            }
          }
        ]
      }
    }
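The four pinyin queries differ only in the search term, so they can be run in one loop. This is a small convenience sketch under the same assumptions as the rest of the post (index `medcl`, type `folks`, a node on localhost); `search_url` and `ES_URL` are hypothetical names introduced here.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: builds the URI-search URL for one term
# against the name.pinyin sub-field.
search_url() {
  printf '%s/medcl/folks/_search?q=name.pinyin:%s' "$ES_URL" "$1"
}

# Full pinyin of each character (liu, de, hua) and the first-letter form (ldh).
for term in liu de hua ldh; do
  echo "== $term =="
  curl -s --max-time 5 "$(search_url "$term")" || echo "request failed"
  echo
done
```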
- Testing ik analysis
Send the request:
    curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
    {
      "query": {
        "match": {
          "name.pinyin": "国歌"
        }
      },
      "highlight": {
        "fields": {
          "name.pinyin": {}
        }
      }
    }'
Response:
    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 9.507006,
        "hits" : [
          {
            "_index" : "medcl",
            "_type" : "folks",
            "_id" : "tina",
            "_score" : 9.507006,
            "_source" : {
              "name" : "中华人民共和国国歌"
            },
            "highlight" : {
              "name.pinyin" : [
                "<em>中华人民共和国</em><em>国歌</em>"
              ]
            }
          }
        ]
      }
    }
- Testing ik + pinyin analysis
Send the request:
    curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
    {
      "query": {
        "match": {
          "name.pinyin": "zhonghua"
        }
      },
      "highlight": {
        "fields": {
          "name.pinyin": {}
        }
      }
    }'
Response:
    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 3,
        "max_score" : 6.188843,
        "hits" : [
          {
            "_index" : "medcl",
            "_type" : "folks",
            "_id" : "tina",
            "_score" : 6.188843,
            "_source" : {
              "name" : "中华人民共和国国歌"
            },
            "highlight" : {
              "name.pinyin" : [
                "<em>中华人民共和国</em>国歌"
              ]
            }
          },
          {
            "_index" : "medcl",
            "_type" : "folks",
            "_id" : "3",
            "_score" : 3.0490103,
            "_source" : {
              "@timestamp" : "2017-07-13T06:42:00.203Z",
              "last_modify_time" : "2017-07-13T02:52:53.000Z",
              "name" : "可能猜到可以使用iterator来删除循环中的元素",
              "@version" : "1",
              "id" : 3,
              "type" : "jdbc"
            },
            "highlight" : {
              "name.pinyin" : [
                "可能猜到可以使用iterator来删除循<em>环中</em>的元素"
              ]
            }
          },
          {
            "_index" : "medcl",
            "_type" : "folks",
            "_id" : "andy",
            "_score" : 0.22534128,
            "_source" : {
              "name" : "刘德华"
            },
            "highlight" : {
              "name.pinyin" : [
                "<em>刘德华</em>"
              ]
            }
          }
        ]
      }
    }
P.S. The test index contained a few extra documents, so the second hit in the response can be ignored; that document is not created anywhere in this post.