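When we type into a search bar, it often autocompletes or corrects what we typed.
In Elasticsearch (es), this kind of feature can be built with the Suggesters API.
Elasticsearch provides four kinds of suggester: Term, Phrase, Completion, and Context.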
Term Suggester
Prepare the index and load some data
PUT /chatting/
{
"mappings": {
"properties": {
"body": {
"type": "text"
}
}
}
}
POST _bulk/?refresh=true
{"index":{"_index":"chatting"}}
{"body":"I am iron man"}
{"index":{"_index":"chatting"}}
{"body":"php is the best language in the world"}
{"index":{"_index":"chatting"}}
{"body":"hello world"}
{"index":{"_index":"chatting"}}
{"body":"how to speaking this word"}
The analyzer tokenizes these documents and adds the resulting terms to the dictionary.
Run the suggester
POST /chatting/_search
{
"suggest": {
"my-suggestion": {
"text": "hallo word",
"term": {
"suggest_mode": "missing",
"field": "body"
}
}
}
}
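In the response, hallo is not in the dictionary, so it gets suggestions, while word is in the dictionary, so it gets none. That is because suggest_mode is missing, which only suggests for terms that are not in the index.
Next, change suggest_mode to popular.
Now word gets a suggestion as well: world appears in more documents than word, so the suggestion becomes world.
Finally, change suggest_mode to always.
In this mode similar terms are always suggested, whether or not the input term is in the dictionary.
After experimenting, my impression is that this kind of suggestion works poorly for Chinese (I was using the ik analyzer). I don't know whether there is a better approach.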
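To make the three suggest_mode values concrete, here is a rough Python sketch that replays the experiment on the four sample documents. It is only an approximation under my own assumptions: whitespace tokenization, plain Levenshtein distance with at most 2 edits, and the helper names DOC_FREQ, edit_distance and suggest are all made up for illustration. It is not how Elasticsearch implements the Term Suggester internally.

```python
from collections import Counter

DOCS = [
    "I am iron man",
    "php is the best language in the world",
    "hello world",
    "how to speaking this word",
]

# Document frequency: in how many documents each lowercased token occurs.
# (Assumption: simple whitespace tokenization stands in for the analyzer.)
DOC_FREQ = Counter(tok for doc in DOCS for tok in set(doc.lower().split()))


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def suggest(term: str, mode: str, max_edits: int = 2) -> list[str]:
    """Hypothetical stand-in for one term of a term-suggester request."""
    term_freq = DOC_FREQ.get(term, 0)
    # missing: only suggest for terms that are not in the dictionary.
    if mode == "missing" and term_freq > 0:
        return []
    scored = []
    for cand, freq in DOC_FREQ.items():
        if cand == term:
            continue
        dist = edit_distance(term, cand)
        if dist > max_edits:
            continue
        # popular: only keep candidates found in more documents than the term.
        if mode == "popular" and freq <= term_freq:
            continue
        scored.append((dist, -freq, cand))
    # Closest candidates first, then the more frequent ones.
    return [cand for _, _, cand in sorted(scored)]


for mode in ("missing", "popular", "always"):
    print(mode, {t: suggest(t, mode) for t in ("hallo", "word")})
```

With missing, only hallo gets a suggestion; with popular and always, word is also mapped to world, which matches what I saw in the experiment.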
Phrase Suggester
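The Phrase Suggester builds on the Term Suggester: it also takes into account whether the terms occur together in the indexed documents, how close together they are, and how frequent they are.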
POST /chatting/_search
{
"suggest": {
"my-suggestion": {
"text": "php is the best languages",
"phrase": {
"field": "body",
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
In the response, the terms that were replaced are wrapped in the highlight tags.
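As a small illustration of what the highlight option does to the output, the Python sketch below wraps every token that differs between the original text and a corrected phrase in the configured tags. This only mimics the shape of the result: highlight_changes is a hypothetical helper, the corrected phrase is my assumption about what the suggester would return, and the sketch assumes both phrases have the same number of tokens.

```python
def highlight_changes(original: str, corrected: str,
                      pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Wrap tokens of `corrected` that differ from `original` in the tags.

    Assumption: both phrases split into the same number of tokens.
    """
    out = []
    for old, new in zip(original.split(), corrected.split()):
        out.append(new if old == new else f"{pre_tag}{new}{post_tag}")
    return " ".join(out)


# Assumed correction of the query text used above.
print(highlight_changes("php is the best languages",
                        "php is the best language"))
```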
Completion Suggester
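The Completion Suggester does not rely on the inverted index. Instead, the analyzed input is encoded into an FST (finite state transducer) that is stored alongside the index. For an open index, ES loads the whole FST into memory, so prefix lookups are extremely fast. However, an FST can only be used for prefix lookups, and that is also the limitation of the Completion Suggester.
Define a new index and load the data again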
PUT /chatting_completion/
{
"mappings": {
"properties": {
"body": {
"type": "completion"
}
}
}
}
POST _bulk/?refresh=true
{"index":{"_index":"chatting_completion"}}
{"body":"I am iron man"}
{"index":{"_index":"chatting_completion"}}
{"body":"php is the best language in the world"}
{"index":{"_index":"chatting_completion"}}
{"body":"hello world"}
{"index":{"_index":"chatting_completion"}}
{"body":"how to speaking this word"}
Prefix query
POST /chatting_completion/_search?pretty
{
"size": 0,
"suggest": {
"blog-suggest": {
"prefix": "i am",
"completion": {
"field": "body"
}
}
}
}
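The prefix-only behaviour can be sketched in a few lines of Python. This is not an FST: as a stand-in I use a sorted list with binary search, which has the same visible property that only inputs starting with the typed text are found. The names INPUTS and complete are made up for this example, and lowercasing is my simplification of the analysis step.

```python
import bisect

# Lowercased, sorted inputs (a simplified stand-in for the in-memory FST).
INPUTS = sorted(s.lower() for s in [
    "I am iron man",
    "php is the best language in the world",
    "hello world",
    "how to speaking this word",
])


def complete(prefix: str) -> list[str]:
    """Return every stored input that starts with `prefix`."""
    prefix = prefix.lower()
    # All entries sharing a prefix are contiguous in sorted order,
    # starting at the insertion point of the prefix itself.
    start = bisect.bisect_left(INPUTS, prefix)
    matches = []
    for entry in INPUTS[start:]:
        if not entry.startswith(prefix):
            break
        matches.append(entry)
    return matches


print(complete("i am"))  # matches the start of an input
print(complete("iron"))  # "iron" is in the middle of an input, so nothing
```

The second call returns nothing even though iron occurs in a document, which is exactly the limitation mentioned above.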
Context Suggester
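The Context Suggester is an extension of the Completion Suggester.
It can provide suggestions based on context.
There are two types of context:
- Category - an arbitrary string;
- Geo - geographic location information;
Create the index and load data
Here the comments index defines two fields, comment and comment_autocomplete. comment_autocomplete is of type completion and has one context of type category, named comment_category.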
PUT comments/
{
"mappings": {
"properties": {
"comment": {
"type": "text"
},
"comment_autocomplete": {
"type": "completion",
"contexts": [
{
"type": "category",
"name": "comment_category"
}
]
}
}
}
}
POST comments/_doc
{
"comment":"I love the star war movies",
"comment_autocomplete":{
"input":["star wars"],
"contexts":{
"comment_category":"movies"
}
}
}
POST comments/_doc
{
"comment":"Where can I find a Starbucks",
"comment_autocomplete":{
"input":["starbucks"],
"contexts":{
"comment_category":"coffee"
}
}
}
Search
POST comments/_search
{
"suggest": {
"MY_SUGGESTION": {
"prefix": "sta",
"completion": {
"field": "comment_autocomplete",
"contexts": {
"comment_category": "coffee"
}
}
}
}
}
Both documents have an input that starts with sta, but the query restricts the context to "coffee", so only the document "Where can I find a Starbucks" is returned.
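The effect of the category context can be mimicked with a short Python sketch: filter by category first, then match by prefix. The Entry class and suggest_in_context function are hypothetical names for illustration only. The real suggester does this inside Elasticsearch rather than by scanning a list.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    text: str      # value of "input" in comment_autocomplete
    category: str  # value of the comment_category context
    comment: str   # the comment field of the same document


ENTRIES = [
    Entry("star wars", "movies", "I love the star war movies"),
    Entry("starbucks", "coffee", "Where can I find a Starbucks"),
]


def suggest_in_context(prefix: str, category: str) -> list[str]:
    """Return comments whose input starts with `prefix` in the given category."""
    prefix = prefix.lower()
    return [
        e.comment
        for e in ENTRIES
        if e.category == category and e.text.lower().startswith(prefix)
    ]


print(suggest_in_context("sta", "coffee"))
print(suggest_in_context("sta", "movies"))
```

The same prefix gives a different document depending on the category, which is the whole point of adding a context.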