概述
搜索一般都会要求具有“搜索推荐”或者叫“搜索补全”的功能,即在用户输入搜索的过程中,进行自动补全或者纠错。以此来提高搜索文档的匹配精准度,进而提升用户的搜索体验,这就是Suggest。
四种Suggester
1 Term Suggester
Term Suggester: term suggester正如其名,只基于tokenizer之后的单个term去匹配建议词,并不会考虑多个term之间的关系,对给入的文本进行分词,为每个词进行模糊查询提供词项建议。(建议对搜索词进行长度控制,超过长度则不会进行TermSuggest,原因也是一般Term Suggester适用于单个词使用 把得分最高的推荐词进行返回代表纠错)
POST <index>/_search
{
"suggest": {
"<suggest_name>": {
"text": "<search_content>",
"term": {
"suggest_mode": "<suggest_mode>",
"field": "<field_name>"
}
}
}
}
2 Phrase Suggester
Phrase Suggester:phrase suggester和term suggester相比,对建议的文本会参考上下文,也就是一个句子的其他token,不只是单纯的token距离匹配,它可以基于共生和频率选出更好的建议。在term的基础上,会考量多个term之间的关系,比如是否同时出现在索引的原文里,相邻程度,以及词频等。
如果说term suggester建议处理单个词的纠错 那么Phrase Suggester就建议作为一整句话的纠错(返回值的suggest列表中返回的也是一整句话)
DELETE test
POST test/_bulk
{ "index" : { "_id":1} }
{"title": "lucene and elasticsearch"}
{ "index" : {"_id":2} }
{"title": "lucene and elasticsearhc"}
{ "index" : { "_id":3} }
{"title": "luceen and elasticsearch"}
POST test/_search
GET test/_mapping
POST test/_search
{
"suggest": {
"text": "Luceen and elasticsearhc",
"simple_phrase": {
"phrase": {
"field": "title.trigram",
"max_errors": 2,
"gram_size": 1,
"confidence":0,
"direct_generator": [
{
"field": "title.trigram",
"suggest_mode": "always"
}
],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
3 completion suggester
自动补全,自动完成,支持三种查询【前缀查询(prefix)模糊查询(fuzzy)正则表达式查询(regex)】 ,主要针对的应用场景就是"Auto Completion"。 此场景下用户每输入一个字符的时候,就需要即时发送一次查询请求到后端查找匹配项,在用户输入速度较高的情况下对后端响应速度要求比较苛刻。因此实现上它和前面两个Suggester采用了不同的数据结构,索引并非通过倒排来完成,而是将analyze过的数据编码成FST和索引一起存放。对于一个open状态的索引,FST会被ES整个装载到内存里的,进行前缀查找速度极快。但是FST只能用于前缀查找,这也是Completion Suggester的局限所在。
DELETE suggest_carinfo
PUT suggest_carinfo
{
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "ik_max_word",
"fields": {
"suggest": {
"type": "completion",
"analyzer": "ik_max_word"
}
}
},
"content": {
"type": "text",
"analyzer": "ik_max_word"
}
}
}
}
POST _bulk
{"index":{"_index":"suggest_carinfo","_id":1}}
{"title":"宝马X5 两万公里准新车","content":"这里是宝马X5图文描述"}
{"index":{"_index":"suggest_carinfo","_id":2}}
{"title":"宝马5系","content":"这里是奥迪A6图文描述"}
{"index":{"_index":"suggest_carinfo","_id":3}}
{"title":"宝马3系","content":"这里是奔驰图文描述"}
{"index":{"_index":"suggest_carinfo","_id":4}}
{"title":"奥迪Q5 两万公里准新车","content":"这里是宝马X5图文描述"}
{"index":{"_index":"suggest_carinfo","_id":5}}
{"title":"奥迪A6 无敌车况","content":"这里是奥迪A6图文描述"}
{"index":{"_index":"suggest_carinfo","_id":6}}
{"title":"奥迪双钻","content":"这里是奔驰图文描述"}
{"index":{"_index":"suggest_carinfo","_id":7}}
{"title":"奔驰AMG 两万公里准新车","content":"这里是宝马X5图文描述"}
{"index":{"_index":"suggest_carinfo","_id":8}}
{"title":"奔驰大G 无敌车况","content":"这里是奥迪A6图文描述"}
{"index":{"_index":"suggest_carinfo","_id":9}}
{"title":"奔驰C260","content":"这里是奔驰图文描述"}
{"index":{"_index":"suggest_carinfo","_id":10}}
{"title":"nir奔驰C260","content":"这里是奔驰图文描述"}
GET suggest_carinfo/_search?pretty
{
"suggest": {
"car_suggest": {
"prefix": "奥迪",
"completion": {
"field": "title.suggest"
}
}
}
}
4 context suggester
完成建议者会考虑索引中的所有文档,但是通常来说,我们在进行智能推荐的时候最好通过某些条件过滤,并且有可能会针对某些特性提升权重。
# context suggester
# 定义一个名为 place_type 的类别上下文,其中类别必须与建议一起发送。
# 定义一个名为 location 的地理上下文,类别必须与建议一起发送
DELETE place
PUT place
{
"mappings": {
"properties": {
"suggest": {
"type": "completion",
"contexts": [
{
"name": "place_type",
"type": "category"
},
{
"name": "location",
"type": "geo",
"precision": 4
}
]
}
}
}
}
PUT place/_doc/1
{
"suggest": {
"input": [ "timmy's", "starbucks", "dunkin donuts" ],
"contexts": {
"place_type": [ "cafe", "food" ]
}
}
}
PUT place/_doc/2
{
"suggest": {
"input": [ "monkey", "timmy's", "Lamborghini" ],
"contexts": {
"place_type": [ "money"]
}
}
}