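Suggester: the suggest feature uses a suggester to recommend terms that look similar to the text the user provides. Parts of the suggest feature are still under development in the current version. In plain terms, this is the autocomplete offered by search engines: when you type part of a query into the search box, related queries are listed below it, and producing those recommendations is the job of a suggester. Baidu, for instance, shows related terms under the search box as you type.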
The suggest part of a request is defined alongside the query part of a _search request, for example:
curl -X POST "localhost:9200/twitter/_search" -H 'Content-Type: application/json' -d'
{
"query" : {
"match": {
"message": "tring out Elasticsearch"
}
},
"suggest" : {
"my-suggestion" : {
"text" : "trying out Elasticsearch",
"term" : {
"field" : "message"
}
}
}
}
'
// Response
{
"took": 22,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 0.5753642,
"hits": [
{
"_index": "twitter",
"_type": "_doc",
"_id": "5",
"_score": 0.5753642,
"_source": {
"user": "kimchy,5",
"likes": 13,
"post_date": 1542261564,
"message": "trying out Elasticsearch"
}
},
{
"_index": "twitter",
"_type": "_doc",
"_id": "1",
"_score": 0.5753642,
"_source": {
"user": "kimchy1",
"likes": 10,
"post_date": 1542197868,
"message": "trying out Elasticsearch"
}
},
{
"_index": "twitter",
"_type": "_doc",
"_id": "3",
"_score": 0.5753642,
"_source": {
"user": "kimchy3",
"likes": 8,
"post_date": 1542197898,
"message": "trying out Elasticsearch"
}
},
{
"_index": "twitter",
"_type": "_doc",
"_id": "2",
"_score": 0.36464313,
"_source": {
"user": "kimchy2",
"likes": 9,
"post_date": 1542197883,
"message": "trying out Elasticsearch"
}
},
{
"_index": "twitter",
"_type": "_doc",
"_id": "4",
"_score": 0.36464313,
"_source": {
"user": "kimchy4",
"likes": 12,
"post_date": 1542197912,
"message": "trying out Elasticsearch"
}
}
]
},
"suggest": {
"my-suggestion": [
{
"text": "trying",
"offset": 0,
"length": 6,
"options": []
},
{
"text": "out",
"offset": 7,
"length": 3,
"options": []
},
{
"text": "elasticsearch",
"offset": 11,
"length": 13,
"options": []
}
]
}
}
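The same request can also be sent from a program. The sketch below uses only Python's standard library and assumes, as the curl example does, an unsecured node at http://localhost:9200 with a twitter index; the helper names build_search_with_suggest and search are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# Assumption: a local, unsecured Elasticsearch node, as in the curl example.
ES_URL = "http://localhost:9200"


def build_search_with_suggest(query_text: str, suggest_text: str, field: str) -> dict:
    """Build one _search body carrying both a match query and a term suggestion."""
    return {
        "query": {"match": {field: query_text}},
        "suggest": {
            # "my-suggestion" is an arbitrary label; it is echoed back under "suggest".
            "my-suggestion": {
                "text": suggest_text,
                "term": {"field": field},
            }
        },
    }


def search(index: str, body: dict) -> dict:
    """POST body to /<index>/_search and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling search("twitter", build_search_with_suggest("tring out Elasticsearch", "trying out Elasticsearch", "message")) sends the same body as the curl example; the suggestions are then found under the "suggest" key of the returned dict, next to "hits".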
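Several suggestions can be specified per request, each identified by an arbitrary name. The example below requests two suggestions, my-suggest-1 and my-suggest-2; both use the term suggester, but with different text (and on different fields):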
curl -X POST "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
"suggest": {
"my-suggest-1" : {
"text" : "trying out Elasticsearch",
"term" : {
"field" : "message"
}
},
"my-suggest-2" : {
"text" : "kmichy3",
"term" : {
"field" : "user"
}
}
}
}
'
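Because each suggestion is just a named entry under "suggest", a multi-suggestion body can be assembled from small pieces. A minimal sketch; term_suggestion and suggest_body are hypothetical helper names, not part of any Elasticsearch client.

```python
def term_suggestion(text: str, field: str) -> dict:
    """A single term-suggester spec: the text to check and the field whose terms are candidates."""
    return {"text": text, "term": {"field": field}}


def suggest_body(suggestions: dict) -> dict:
    """Wrap named suggestions (arbitrary name -> spec) in one "suggest" object.

    Names such as "my-suggest-1" are free-form labels; they come back as keys
    under "suggest" in the response, one array of entries per name.
    """
    return {"suggest": dict(suggestions)}
```

For example, suggest_body({"my-suggest-1": term_suggestion(text_1, "message"), "my-suggest-2": term_suggestion(text_2, "user")}) has the same shape as the curl body above, with text_1 and text_2 standing for whatever text each suggestion should check.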
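In the response, each suggestion contains a list of entries. Each entry is effectively a token taken from the suggest text (the output of analysis); it contains the entry text, its start offset and length in the original suggest text and, if any were found, an arbitrary number of suggestions in the options array. The response looks like this: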
{
"took": 10,
"timed_out": false,
"_shards": ...,
"hits": ...,
"suggest": {
"my-suggest-1": [
{
"text": "trying",
"offset": 0,
"length": 6,
"options": []
},
{
"text": "out",
"offset": 7,
"length": 3,
"options": []
},
{
"text": "elasticsearch",
"offset": 11,
"length": 13,
"options": []
}
],
"my-suggest-2": [
{
"text": "kmichy3",
"offset": 0,
"length": 7,
"options": [
{
"text": "kimchy3",
"score": 0.85714287,
"freq": 3
},
{
"text": "kimchy1",
"score": 0.71428573,
"freq": 1
}
]
}
]
}
}
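A client typically turns entries like these into a "did you mean" string. The sketch below is a hypothetical helper (did_you_mean is not an Elasticsearch API): it uses each entry's offset and length to locate the token in the original suggest text and swaps in the highest-scoring option when there is one. The term suggester sorts options by score by default, so options[0] would usually suffice; taking the max avoids relying on that setting.

```python
def did_you_mean(suggest_text: str, entries: list) -> str:
    """Rebuild suggest_text, replacing each token span that has options
    with the text of its highest-scoring option."""
    pieces = []
    cursor = 0
    for entry in sorted(entries, key=lambda e: e["offset"]):
        start = entry["offset"]
        end = start + entry["length"]
        # Keep whatever lies between tokens (spaces, punctuation) unchanged.
        pieces.append(suggest_text[cursor:start])
        if entry["options"]:
            best = max(entry["options"], key=lambda option: option["score"])
            pieces.append(best["text"])
        else:
            # No suggestion: keep the user's original spelling and casing,
            # not the analyzed (e.g. lowercased) entry text.
            pieces.append(suggest_text[start:end])
        cursor = end
    pieces.append(suggest_text[cursor:])
    return "".join(pieces)
```

Applied to the my-suggest-2 entries above with the text "kmichy3", it returns "kimchy3"; entries whose options are empty, like those of my-suggest-1, leave the text unchanged.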
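Each options array contains option objects, each holding the suggested text, its document frequency, and a score relative to the suggest entry text. What the score means depends on the suggester; the term suggester's score is based on edit distance.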
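The scores in the my-suggest-2 response can be reproduced with a small sketch. Two assumptions, not Lucene's actual code: the default internal string distance (which the Elasticsearch docs describe as based on Damerau-Levenshtein) is modeled as optimal string alignment, where swapping two adjacent characters costs 1; and the score is 1 minus the distance divided by the length of the shorter term. For the equal-length terms here, dividing by the longer term would give the same numbers.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions, substitutions
    and swaps of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "mi" -> "im"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def term_score(entry_text: str, candidate: str) -> float:
    """Assumed scoring: 1 - distance / length of the shorter term, clamped to [0, 1]."""
    if entry_text == candidate:
        return 1.0
    shorter = min(len(entry_text), len(candidate))
    if shorter == 0:
        return 0.0
    return max(0.0, 1.0 - osa_distance(entry_text, candidate) / shorter)
```

Under these assumptions, "kmichy3" to "kimchy3" is one transposition, so term_score gives 1 - 1/7 ≈ 0.857, and "kimchy1" needs one more substitution, giving 1 - 2/7 ≈ 0.714, in line with the options shown above. Plain Levenshtein would count the swap as two edits and yield lower scores.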
Global suggest text
To avoid repeating the suggest text, a global text can be defined. The example below defines a global suggest text and applies it to both my-suggest-1 and my-suggest-2:
curl -X POST "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
"suggest": {
"text" : "tring out Elasticsearch",
"my-suggest-1" : {
"term" : {
"field" : "message"
}
},
"my-suggest-2" : {
"term" : {
"field" : "user"
}
}
}
}
'
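Note: besides being defined globally as above, the suggest text can also be specified as an option of an individual suggestion; text specified at the suggestion level overrides the global text.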
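The precedence rule in the note can be stated as a small function. This is a hypothetical helper that mirrors the documented behaviour on a request body, not Elasticsearch's implementation: a suggestion's own text wins, otherwise the global text defined directly under suggest is used.

```python
def effective_texts(suggest: dict) -> dict:
    """Map each named suggestion to the text it will check: its own "text"
    if present, otherwise the global "text" defined under "suggest"."""
    global_text = suggest.get("text")
    resolved = {}
    for name, spec in suggest.items():
        if name == "text":
            continue  # the global text itself, not a named suggestion
        resolved[name] = spec.get("text", global_text)
    return resolved
```

With the global-text request above, both suggestions resolve to "tring out Elasticsearch"; adding a "text" key inside my-suggest-2 would make that suggestion use its own text instead.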