现代的搜索引擎,一般会具备"Suggest As You Type"功能,即在用户输入搜索的过程中,进行自动补全或者纠错。 通过协助用户输入更精准的关键词,提高后续全文搜索阶段文档匹配的程度。例如在京东上输入部分关键词,甚至输入拼写错误的关键词时,它依然能够提示出用户想要输入的内容
如果自己亲手去试一下,可以看到京东在用户刚开始输入的时候是自动补全的,而当输入到一定长度,如果因为单词拼写错误无法补全,就开始尝试提示相似的词。
那么类似的功能在Elasticsearch里如何实现呢? 答案就在Suggesters API。 Suggesters基本的运作原理是将输入的文本分解为token,然后在索引的字典里查找相似的term并返回。 根据使用场景的不同,Elasticsearch里设计了4种类别的Suggester,分别是:
- Term Suggester
- Phrase Suggester
- Completion Suggester
- Context Suggester
看一个Term Suggester的示例
准备一个叫做blogs的索引,配置一个text字段
PUT /blogs/
{
"mappings": {
"properties": {
"body":{
"type": "text"
}
}
}
}
POST _bulk/?refresh=true
{"index":{"_index":"blogs"}}
{"body":"Lucene is cool"}
{"index":{"_index":"blogs"}}
{"body":"Elasticsearch builds on top of lucene"}
{"index":{"_index":"blogs"}}
{"body":"Elasticsearch rocks"}
{"index":{"_index":"blogs"}}
{"body":"Elastic is the company behind ELK stack"}
{"index":{"_index":"blogs"}}
{"body":"elk rocks"}
{"index":{"_index":"blogs"}}
{"body":"elasticsearch is rock solid"}
- 分析
POST _analyze
{
"text": [
"Lucene is cool",
"Elasticsearch builds on top of lucene",
"Elasticsearch rocks",
"Elastic is the company behind ELK stack",
"elk rocks",
"elasticsearch is rock solid"
]
}
{
"tokens" : [
{
"token" : "lucene",
"start_offset" : 0,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "is",
"start_offset" : 7,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "cool",
"start_offset" : 10,
"end_offset" : 14,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "elasticsearch",
"start_offset" : 15,
"end_offset" : 28,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "builds",
"start_offset" : 29,
"end_offset" : 35,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "on",
"start_offset" : 36,
"end_offset" : 38,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "top",