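Real-world search requirements often call for searching Chinese text by pinyin, by initial letters, or by a mix of Chinese characters, pinyin and initials.
Take the Chinese text "广发聚财信用" as an example. The keywords a user might type include "广发", "聚财", "guangfa", "gfjc", "guangfajucai", "guangfjc", "gfajcai", "广发juc" and other mixed forms. This post uses elasticsearch-analysis-lc-pinyin to support these searches: mixed Chinese and pinyin, full pinyin, and initials.
The plugin was developed against elasticsearch 1.4.5; whether it works on later versions is something you will need to test yourself.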
I. Installing the plugin
Git repository: http://git.oschina.net/music_code_m/elasticsearch-analysis-lc-pinyin
1. Download the source with git
git clone https://git.oschina.net/music_code_m/elasticsearch-analysis-lc-pinyin.git
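2. Build
Go to the root directory of elasticsearch-analysis-lc-pinyin and run the following command
3. Install
When the build finishes, the archive elasticsearch-analysis-lc-pinyin-1.4.5.zip is generated in the releases folder.
Copy that file into the ES plugins directory and unzip it.
Edit elasticsearch.yml in the ES config directory to declare the pinyin plugin by adding the following at the end of the file
Then restart ES and you are done.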
The configuration above defines the lc_pinyin analyzers:
lc_index: the analyzer to specify when indexing documents
lc_search: the analyzer to specify at search time
Now we can try out the analysis.
1) First, create an index
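A minimal sketch of this step, assuming Elasticsearch is reachable at localhost:9200 and that the index is called index, as the _search URLs and the "_index" value in the results further down suggest. The variable names ES_URL and INDEX_NAME are only illustrative.

```shell
# Assumption: host, port and index name are inferred from the queries in this post.
ES_URL="http://localhost:9200"
INDEX_NAME="index"

# Create the index with default settings.
# If the node cannot be reached, print a hint instead of failing silently.
curl -s --max-time 5 -XPUT "$ES_URL/$INDEX_NAME?pretty" \
  || echo "request failed: is Elasticsearch running at $ES_URL?"
```

On success Elasticsearch answers with a short acknowledgement; running the command a second time returns an error saying the index already exists.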
2) Create the mapping
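The mapping might look roughly like the sketch below. The type name fulltext and the field name content are taken from the search results shown later; wiring lc_index in as the index-time analyzer and lc_search as the search-time analyzer is an assumption based on how the two analyzers are described above. The index_analyzer and search_analyzer keys are Elasticsearch 1.x mapping syntax.

```shell
# Assumption: same host and index name as used by the queries in this post.
ES_URL="http://localhost:9200"

# One string field, "content": analyzed with lc_index when documents are indexed
# and with lc_search when a query is run against it.
MAPPING='{
  "fulltext": {
    "properties": {
      "content": {
        "type": "string",
        "index_analyzer": "lc_index",
        "search_analyzer": "lc_search"
      }
    }
  }
}'

# Register the mapping for type "fulltext" on the index named "index".
curl -s --max-time 5 -XPOST "$ES_URL/index/fulltext/_mapping?pretty" -d "$MAPPING" \
  || echo "request failed: is Elasticsearch running at $ES_URL?"
```

With search_analyzer set in the mapping, the explicit "analyzer": "lc_search" in the queries below is redundant but harmless.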
3) Index a document
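A sketch of indexing the sample document. The id 1, the type fulltext and the document body are read off the search results later in this post; the host is assumed as before.

```shell
# Assumption: same host and index name as used by the queries in this post.
ES_URL="http://localhost:9200"

# The single test document that every query in this post is run against.
DOC='{"content": "广发聚财信用"}'

# Store it under index "index", type "fulltext", id 1.
curl -s --max-time 5 -XPOST "$ES_URL/index/fulltext/1?pretty" -d "$DOC" \
  || echo "request failed: is Elasticsearch running at $ES_URL?"
```

A newly indexed document only becomes searchable after the next index refresh, which by default happens about once a second, so a query sent immediately afterwards may still come back empty.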
4) Build the query
Search: guangfa
curl -XPOST 'http://localhost:9200/index/_search?pretty' -d '
{
  "query": {
    "match": {
      "content": {
        "query": "guangfa",
        "analyzer": "lc_search",
        "type": "phrase",
        "slop": 1,
        "zero_terms_query": "NONE"
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "",
      ""
    ],
    "post_tags": [
      "",
      ""
    ],
    "fields": {
      "content": {}
    }
  }
}'
Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 4.841117,
    "hits" : [ {
      "_index" : "index",
      "_type" : "fulltext",
      "_id" : "1",
      "_score" : 4.841117,
      "_source": {"content":"广发聚财信用"},
      "highlight" : {
        "content" : [ "广发聚财信用" ]
      }
    } ]
  }
}
Search: guang发ju财
curl -XPOST 'http://localhost:9200/index/_search?pretty' -d '
{
  "query": {
    "match": {
      "content": {
        "query": "guang发ju财",
        "analyzer": "lc_search",
        "type": "phrase",
        "slop": 1,
        "zero_terms_query": "NONE"
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "",
      ""
    ],
    "post_tags": [
      "",
      ""
    ],
    "fields": {
      "content": {}
    }
  }
}'
Result:
{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 3.6822338,
    "hits" : [ {
      "_index" : "index",
      "_type" : "fulltext",
      "_id" : "1",
      "_score" : 3.6822338,
      "_source":{"content":"广发聚财信用"},
      "highlight" : {
        "content" : [ "广发聚财信用" ]
      }
    } ]
  }
}
Search: j财xy
curl -XPOST 'http://localhost:9200/index/_search?pretty' -d '
{
  "query": {
    "match": {
      "content": {
        "query": "j财xy",
        "analyzer": "lc_search",
        "type": "phrase",
        "slop": 1,
        "zero_terms_query": "NONE"
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "",
      ""
    ],
    "post_tags": [
      "",
      ""
    ],
    "fields": {
      "content": {}
    }
  }
}'
Result:
{
  "took" : 22,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 6.682234,
    "hits" : [ {
      "_index" : "index",
      "_type" : "fulltext",
      "_id" : "1",
      "_score" : 6.682234,
      "_source":{"content":"广发聚财信用"},
      "highlight" : {
        "content" : [ "广发聚财信用" ]
      }
    } ]
  }
}
Search: gfjc
curl -XPOST 'http://localhost:9200/index/_search?pretty' -d '
{
  "query": {
    "match": {
      "content": {
        "query": "gfjc",
        "analyzer": "lc_search",
        "type": "phrase",
        "slop": 1,
        "zero_terms_query": "NONE"
      }
    }
  },
  "highlight": {
    "pre_tags": [
      "",
      ""
    ],
    "post_tags": [
      "",
      ""
    ],
    "fields": {
      "content": {}
    }
  }
}'
Result:
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 6.682234,
    "hits" : [ {
      "_index" : "index",
      "_type" : "fulltext",
      "_id" : "1",
      "_score" : 6.682234,
      "_source":{"content":"građ发聚财信用"},
      "highlight" : {
        "content" : [ "广发聚财信用" ]
      }
    } ]
  }
}
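Since the four queries above differ only in the keyword, the remaining keyword forms from the introduction can be tried with a small helper. This is a sketch under the same assumptions about host and index name; build_query and search are hypothetical helper names, the highlight section is left out for brevity, and no particular result is claimed for keywords that were not run above.

```shell
# Assumption: same host and index name as used by the queries in this post.
ES_URL="http://localhost:9200"

# build_query KEYWORD: print the match-phrase request body used in this post for one keyword.
# The keyword is inserted verbatim, so it must not contain double quotes or backslashes.
build_query() {
  printf '{"query":{"match":{"content":{"query":"%s","analyzer":"lc_search","type":"phrase","slop":1,"zero_terms_query":"NONE"}}}}' "$1"
}

# search KEYWORD: send the query for one keyword and pretty-print the response.
search() {
  curl -s --max-time 5 -XPOST "$ES_URL/index/_search?pretty" -d "$(build_query "$1")" \
    || echo "request failed for '$1': is Elasticsearch running at $ES_URL?"
}

# Keyword forms listed at the start of this post.
for kw in "广发" "聚财" "guangfa" "gfjc" "guangfajucai" "guangfjc" "gfajcai" "广发juc"; do
  echo "== $kw =="
  search "$kw"
done
```

Comparing "total" under "hits" across the keywords shows quickly which input forms the analyzers match against the sample document.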