Some ways to implement instant search (search-as-you-type) in Elasticsearch
-
For fields such as phone numbers and postal codes, map the field as keyword and then search it with a prefix query or a match_phrase_prefix query.
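As a rough illustration of the prefix query mentioned above, the request body can be built as a plain Python dict. The field name `phone` and the helper `build_prefix_query` are hypothetical and only serve the example; the body would be sent to the index's `_search` endpoint.

```python
import json


def build_prefix_query(field: str, typed: str) -> dict:
    """Build a search body matching keyword values that start with `typed`."""
    return {"query": {"prefix": {field: {"value": typed}}}}


if __name__ == "__main__":
    # `phone` is a hypothetical keyword field used only for this sketch.
    print(json.dumps(build_prefix_query("phone", "138"), indent=2))
```

Because a keyword field is indexed as one unanalyzed term, the prefix is compared against the whole stored value, which suits short identifiers like phone numbers.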
-
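For longer values that mix digits, letters, and Chinese characters, take a field named title containing "network学习专区". If a search for "net" should find it and a search for "专区" should find it too, there are two approaches:
Method 1: store the value twice, once as a keyword field and once as a text field. Query the keyword field when you need instant (prefix) search, and query the text field when you need fuzzy (full-text) search.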
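One possible way to sketch Method 1 is with a multi-field, so the value is sent once but indexed both ways. This is an assumption about how "storing it twice" might be done, not necessarily what the author used; the sub-field name `raw` and both helper functions are hypothetical, and a plain match query stands in for the "fuzzy" search.

```python
def build_title_mapping() -> dict:
    """Mapping sketch: `title` is indexed as text, `title.raw` as keyword."""
    return {
        "properties": {
            "title": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},  # hypothetical sub-field name
            }
        }
    }


def build_title_query(typed: str, instant: bool) -> dict:
    """Pick the keyword sub-field for instant search, the text field otherwise."""
    if instant:
        return {"query": {"prefix": {"title.raw": {"value": typed}}}}
    return {"query": {"match": {"title": typed}}}
```

Note that a prefix query on the keyword copy only matches values that begin with the typed text, so "net" would match "network学习专区" there, while "专区" would have to be found through the text field.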
Method 2: define a custom analyzer.
Add the following to the index settings:
"analysis": {
  "filter": {
    "my_autocomplete_filter": {
      "type": "edge_ngram",
      "min_gram": 1,
      "max_gram": 10
    }
  },
  "analyzer": {
    "my_autocomplete_analyzer": {
      "type": "custom",
      "tokenizer": "standard",
      "filter": [
        "lowercase",
        "my_autocomplete_filter"
      ]
    }
  }
}
Then, in the mappings, set the field's analyzer to my_autocomplete_analyzer:
"title": {
  "type": "text",
  "analyzer": "my_autocomplete_analyzer"
}
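To show how the settings and mapping fragments above might fit into one create-index request body, here is a minimal sketch as a Python dict. It assumes Elasticsearch 7 or later (typeless mappings under `properties`); the helper name `build_index_body` is hypothetical, and the body would be sent with a PUT to the index, for example `myindex`.

```python
import json


def build_index_body() -> dict:
    """Combine the custom analyzer settings with the `title` field mapping."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "my_autocomplete_filter": {
                        "type": "edge_ngram",
                        "min_gram": 1,
                        "max_gram": 10,
                    }
                },
                "analyzer": {
                    "my_autocomplete_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "my_autocomplete_filter"],
                    }
                },
            }
        },
        # Assumes typeless mappings (Elasticsearch 7+).
        "mappings": {
            "properties": {
                "title": {"type": "text", "analyzer": "my_autocomplete_analyzer"}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```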
To check how the analyzer tokenizes text, run:
curl -H "Content-Type: application/json" -XPOST "http://127.0.0.1:9200/myindex/_analyze?pretty" -d '{"text": "network专区", "analyzer": "my_autocomplete_analyzer"}'
The result is:
{
  "tokens" : [
    {
      "token" : "n",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "ne",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "net",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "netw",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "netwo",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "networ",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "network",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "专",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "区",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    }
  ]
}
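The token list above can be approximated in plain Python to see why a search for "net" matches. This is only a rough emulation under stated assumptions: the regex stands in for the standard tokenizer (runs of ASCII letters and digits, plus single CJK characters) and does not reproduce its full behavior; the function names are hypothetical.

```python
import re
from typing import List


def rough_tokenize(text: str) -> List[str]:
    """Crude stand-in for the standard tokenizer: ASCII alphanumeric runs and single CJK characters."""
    return re.findall(r"[A-Za-z0-9]+|[\u4e00-\u9fff]", text)


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 10) -> List[str]:
    """Return the leading substrings of `token` with lengths from min_gram to max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def autocomplete_terms(text: str) -> List[str]:
    """Tokenize, lowercase, then expand each token into its edge n-grams."""
    terms: List[str] = []
    for token in rough_tokenize(text):
        terms.extend(edge_ngrams(token.lower()))
    return terms


if __name__ == "__main__":
    terms = autocomplete_terms("network专区")
    print(terms)
    print("net" in terms)
```

Since "net" is one of the indexed terms, a query for "net" on the title field finds the document. Tokens longer than max_gram (10 here) are only indexed up to their first 10 characters, so longer typed prefixes would not line up with an indexed term.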