ES Study Notes 12: Suggesters

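The suggest feature uses a suggester to propose similar-looking terms based on a provided text. (The literal translation of "suggester" sounds a bit awkward, but the idea is easy to grasp.) In the current version, parts of the suggest feature are still under development and are not yet complete. Put simply, this is the autocomplete that search engines offer: when you type part of a query into the search box, related queries appear below it. Producing those recommendations is the suggester's job, much like the related-term hints Baidu shows beneath its search box as you type: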

[Figure: Baidu search suggestions]
The suggest part of a request is defined alongside the query in a _search request. For example:

curl -X POST "localhost:9200/twitter/_search" -H 'Content-Type: application/json' -d'
{
  "query" : {
    "match": {
      "message": "tring out Elasticsearch"
    }
  },
  "suggest" : {
    "my-suggestion" : {
      "text" : "trying out Elasticsearch",
      "term" : {
        "field" : "message"
      }
    }
  }
}
'

// Result
{
    "took": 22,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 5,
        "max_score": 0.5753642,
        "hits": [
            {
                "_index": "twitter",
                "_type": "_doc",
                "_id": "5",
                "_score": 0.5753642,
                "_source": {
                    "user": "kimchy,5",
                    "likes": 13,
                    "post_date": 1542261564,
                    "message": "trying out Elasticsearch"
                }
            },
            {
                "_index": "twitter",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.5753642,
                "_source": {
                    "user": "kimchy1",
                    "likes": 10,
                    "post_date": 1542197868,
                    "message": "trying out Elasticsearch"
                }
            },
            {
                "_index": "twitter",
                "_type": "_doc",
                "_id": "3",
                "_score": 0.5753642,
                "_source": {
                    "user": "kimchy3",
                    "likes": 8,
                    "post_date": 1542197898,
                    "message": "trying out Elasticsearch"
                }
            },
            {
                "_index": "twitter",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.36464313,
                "_source": {
                    "user": "kimchy2",
                    "likes": 9,
                    "post_date": 1542197883,
                    "message": "trying out Elasticsearch"
                }
            },
            {
                "_index": "twitter",
                "_type": "_doc",
                "_id": "4",
                "_score": 0.36464313,
                "_source": {
                    "user": "kimchy4",
                    "likes": 12,
                    "post_date": 1542197912,
                    "message": "trying out Elasticsearch"
                }
            }
        ]
    },
    "suggest": {
        "my-suggestion": [
            {
                "text": "trying",
                "offset": 0,
                "length": 6,
                "options": []
            },
            {
                "text": "out",
                "offset": 7,
                "length": 3,
                "options": []
            },
            {
                "text": "elasticsearch",
                "offset": 11,
                "length": 13,
                "options": []
            }
        ]
    }
}
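
For readers who prefer to script this, the curl request above can be sketched in Python with only the standard library. The helper name build_suggest_request and its parameters are made up for illustration; the host, index, and field are the ones used in the example. The request is only built here, not sent, since sending it requires a running cluster.

```python
import json
import urllib.request


def build_suggest_request(host, index, query_text, suggest_text, field):
    """Build (but do not send) a _search request carrying one term suggestion."""
    body = {
        "query": {"match": {field: query_text}},
        "suggest": {
            # "my-suggestion" is an arbitrary name, as in the curl example.
            "my-suggestion": {
                "text": suggest_text,
                "term": {"field": field},
            }
        },
    }
    return urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_suggest_request(
    "http://localhost:9200",
    "twitter",
    query_text="tring out Elasticsearch",
    suggest_text="trying out Elasticsearch",
    field="message",
)

# Sending needs a reachable Elasticsearch node, for example:
#   with urllib.request.urlopen(request) as resp:
#       result = json.load(resp)
print(request.get_method(), request.full_url)
```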

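A single request can specify several suggestions, each identified by an arbitrary name. The example below requests two suggestions, my-suggest-1 and my-suggest-2. Both use the term suggester, but each has its own text: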

curl -X POST "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "suggest": {
    "my-suggest-1" : {
      "text" : "trying out Elasticsearch",
      "term" : {
        "field" : "message"
      }
    },
    "my-suggest-2" : {
      "text" : "kmichy3",
      "term" : {
        "field" : "user"
      }
    }
  }
}
'

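In the response, each suggestion section contains a list of entries. Each entry is effectively a token (an analyzed term) taken from the suggest text, and it holds the entry text, its offset and length in the original suggest text, and an options array listing any suggestions found for it. Here is the response: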

{
    "took": 10,
    "timed_out": false,
    "_shards": ...,
    "hits": ...,
    "suggest": {
        "my-suggest-1": [
            {
                "text": "trying",
                "offset": 0,
                "length": 6,
                "options": []
            },
            {
                "text": "out",
                "offset": 7,
                "length": 3,
                "options": []
            },
            {
                "text": "elasticsearch",
                "offset": 11,
                "length": 13,
                "options": []
            }
        ],
        "my-suggest-2": [
            {
                "text": "kmichy3",
                "offset": 0,
                "length": 7,
                "options": [
                    {
                        "text": "kimchy3",
                        "score": 0.85714287,
                        "freq": 3
                    },
                    {
                        "text": "kimchy1",
                        "score": 0.71428573,
                        "freq": 1
                    }
                ]
            }
        ]
    }
}
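
As a rough sketch of how a client might consume this structure, the snippet below rebuilds a "did you mean" string for each suggestion by replacing every token with its first option, or keeping the token when options is empty. The function name best_corrections is hypothetical, and treating the first option as the best one is an assumption about the ordering of options; the data is a trimmed copy of the suggest section shown above.

```python
def best_corrections(suggest_section):
    """Map each suggestion name to its text with tokens replaced by their first option."""
    corrected = {}
    for name, entries in suggest_section.items():
        words = []
        for entry in entries:
            options = entry["options"]
            # Assumption: options arrive best-first; keep the token if none were found.
            words.append(options[0]["text"] if options else entry["text"])
        corrected[name] = " ".join(words)
    return corrected


# Trimmed copy of the "suggest" section from the response above.
suggest_section = {
    "my-suggest-1": [
        {"text": "trying", "offset": 0, "length": 6, "options": []},
        {"text": "out", "offset": 7, "length": 3, "options": []},
        {"text": "elasticsearch", "offset": 11, "length": 13, "options": []},
    ],
    "my-suggest-2": [
        {
            "text": "kmichy3",
            "offset": 0,
            "length": 7,
            "options": [
                {"text": "kimchy3", "score": 0.85714287, "freq": 3},
                {"text": "kimchy1", "score": 0.71428573, "freq": 1},
            ],
        }
    ],
}

print(best_corrections(suggest_section))
# {'my-suggest-1': 'trying out elasticsearch', 'my-suggest-2': 'kimchy3'}
```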

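Each options array contains option objects. An option holds the suggested text, its document frequency (freq), and its score relative to the suggest entry text. For the term suggester, the score is based on edit distance.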
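
To make the edit-distance idea concrete, here is a small sketch. It computes an optimal string alignment distance (insertions, deletions, substitutions, and swaps of adjacent characters) and turns it into a similarity by dividing by the length of the shorter string. That normalization is an assumption chosen because it reproduces the two scores in the sample response; it is not taken from the Elasticsearch source, and the function names are made up.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute, adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # An adjacent transposition counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the shorter string."""
    if not a or not b:
        return 0.0
    return 1.0 - osa_distance(a, b) / min(len(a), len(b))


print(round(similarity("kmichy3", "kimchy3"), 4))  # 0.8571 (one swap: "mi" -> "im")
print(round(similarity("kmichy3", "kimchy1"), 4))  # 0.7143 (one swap, one substitution)
```

Under this assumption, the values agree with the 0.85714287 and 0.71428573 reported for kimchy3 and kimchy1 above, up to floating-point precision.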

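Global suggest text

To avoid repeating the suggest text, you can define it once at the global level. The example below defines a global suggest text and applies it to both my-suggest-1 and my-suggest-2: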

curl -X POST "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "suggest": {
    "text" : "tring out Elasticsearch",
    "my-suggest-1" : {
      "term" : {
        "field" : "message"
      }
    },
    "my-suggest-2" : {
      "term" : {
        "field" : "user"
      }
    }
  }
}
'

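Note: besides the global definition shown above, the suggest text can also be specified as an option of an individual suggestion. A text set at the suggestion level overrides the global one.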
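
The override rule can be illustrated with a few lines of Python that work out which text each suggestion ends up using. This is only a client-side illustration of the rule described in the note: Elasticsearch applies it on the server, and the function name effective_texts is hypothetical. Here my-suggest-2 carries its own text, so it ignores the global one.

```python
def effective_texts(suggest_body):
    """Return the text each named suggestion would use, applying the override rule."""
    global_text = suggest_body.get("text")
    resolved = {}
    for name, spec in suggest_body.items():
        if name == "text":
            continue  # the global text is a setting, not a suggestion
        # A suggestion-level text wins; otherwise fall back to the global text.
        resolved[name] = spec.get("text", global_text)
    return resolved


suggest_body = {
    "text": "tring out Elasticsearch",
    "my-suggest-1": {"term": {"field": "message"}},
    "my-suggest-2": {"text": "kmichy", "term": {"field": "user"}},
}

print(effective_texts(suggest_body))
# {'my-suggest-1': 'tring out Elasticsearch', 'my-suggest-2': 'kmichy'}
```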
