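This article is based on Elasticsearch 7.x.
What is search suggestion?
When we search on Google, it suggests related search terms, and even when we misspell a keyword it still suggests what we were looking for. This is search suggestion.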
Elasticsearch Suggesters
Elasticsearch provides four suggesters. The sections below introduce each one and show how to use it from the Java client.
Term Suggester
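The Term Suggester suggests similar terms. It analyzes the input text into terms, then looks up similar terms in the inverted index and returns them. By default (suggest_mode missing), if the input term itself already exists in the index, no suggestions are returned for it.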
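The lookup described above can be sketched in a few lines of Python. This is a toy illustration, not Elasticsearch's implementation: the term dictionary TERM_FREQS, the whitespace tokenizer standing in for an analyzer, and the plain Levenshtein distance with a cutoff of 2 edits are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical term dictionary: term -> number of documents containing it.
TERM_FREQS: Dict[str, int] = {"lucene": 2, "rock": 2, "rocks": 1, "elasticsearch": 3}


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def suggest_terms(text: str, max_edits: int = 2) -> Dict[str, List[str]]:
    """For each token, list similar indexed terms; empty if the token is indexed."""
    result: Dict[str, List[str]] = {}
    for token in text.lower().split():  # stand-in for a real analyzer
        if token in TERM_FREQS:
            result[token] = []  # the exact term exists, nothing to suggest
            continue
        candidates = [t for t in TERM_FREQS if edit_distance(token, t) <= max_edits]
        result[token] = sorted(candidates, key=lambda t: edit_distance(token, t))
    return result


print(suggest_terms("lucen rocks"))
```

In this toy dictionary "lucen" is missing, so its nearest neighbour "lucene" is suggested, while "rocks" is present and therefore gets an empty list.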
Syntax
POST <index>/_search
{
  "suggest": {
    "<suggestion name>": {
      "text": "<search text>",
      "term": {
        "field": "<field>",
        "analyzer": "",
        "size": "",
        "sort": "",
        "suggest_mode": "",
        "prefix_length": "",
        ...
      }
    }
  }
}
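When the request is sent from code rather than from the Kibana console, the body above is just a nested JSON object. A minimal sketch of building it with the Python standard library follows; build_term_suggest is a hypothetical helper, and the suggestion name, field, and option values are placeholders chosen for illustration.

```python
import json
from typing import Any, Dict


def build_term_suggest(name: str, text: str, field: str, **options: Any) -> Dict[str, Any]:
    """Return a _search body containing a single term suggestion."""
    term: Dict[str, Any] = {"field": field}
    term.update(options)  # e.g. size=3, suggest_mode="always"
    return {"suggest": {name: {"text": text, "term": term}}}


body = build_term_suggest("my_suggestion", "lucen rocks", "content",
                          suggest_mode="always", size=3)
print(json.dumps(body, indent=2))
```

The resulting string can be sent as the body of POST <index>/_search with any HTTP client.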
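(1) analyzer
The analyzer applied to the search text.
(2) size
The maximum number of suggestions returned for each term.
(3) sort
How the suggested terms are sorted.
- score
Sort by similarity score first, then by document frequency.
- frequency
Sort by document frequency first, then by similarity score.
(4) suggest_mode
Which terms get suggestions.
- missing
Suggest only for terms that are not in the index. This is the default.
- popular
Suggest only terms that occur in more documents than the original term.
- always
Suggest regardless of whether the term exists in the index (the result is still empty when there is no similar term).
(5) prefix_length
The number of leading characters that must match for a term to be a candidate. The default is 1, so by default a suggestion must share its first letter with the input.
(6) Other
Many more parameters are listed in the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html#term-suggester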
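To make the interaction of suggest_mode, prefix_length, sort, and size concrete, here is a small model of the semantics described above. It is a sketch of the documented behaviour, not of how Elasticsearch implements it internally; Candidate and select are hypothetical names, and the candidate list is assumed to be already computed and to exclude the input term itself.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    text: str
    score: float  # similarity to the input term
    freq: int     # number of documents containing the candidate


def select(term: str, term_freq: int, candidates: List[Candidate],
           suggest_mode: str = "missing", prefix_length: int = 1,
           sort: str = "score", size: int = 5) -> List[Candidate]:
    # missing: a term that already exists in the index gets no suggestions.
    if suggest_mode == "missing" and term_freq > 0:
        return []
    # prefix_length: candidates must share the first N characters with the input.
    kept = [c for c in candidates if c.text[:prefix_length] == term[:prefix_length]]
    # popular: keep only candidates that are more frequent than the input term.
    if suggest_mode == "popular":
        kept = [c for c in kept if c.freq > term_freq]
    if sort == "frequency":
        kept.sort(key=lambda c: (-c.freq, -c.score, c.text))
    else:
        kept.sort(key=lambda c: (-c.score, -c.freq, c.text))
    return kept[:size]


lucene = Candidate("lucene", 0.8333333, 1)
print(select("wucene", 0, [lucene]))                   # first letter differs
print(select("wucene", 0, [lucene], prefix_length=0))  # prefix check disabled
```

With the default prefix_length of 1, "wucene" cannot be corrected to "lucene" because the first letters differ; setting prefix_length to 0 lets the candidate through.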
Examples
(1) Index sample data
POST articles/_bulk
{ "index" : { } }
{ "title": "lucene", "content": "lucene is very cool"}
{ "index" : { } }
{ "title": "elasticsearch", "content": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title": "Elasticsearch", "content": "Elasticsearch rock"}
{ "index" : { } }
{ "title": "ELK", "content": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title": "Elk", "content": "Elk stack rocks"}
{ "index" : {} }
{ "title": "es", "content": "elasticsearch is rock solid"}
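Outside the Kibana console, the bulk request above has to be sent as newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. A minimal sketch of producing that body with the standard library is shown below; to_bulk_body is a hypothetical helper and only two of the sample documents are included.

```python
import json
from typing import Dict, Iterable


def to_bulk_body(docs: Iterable[Dict[str, str]]) -> str:
    """Serialize documents as NDJSON for POST <index>/_bulk."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line, _id is auto-generated
        lines.append(json.dumps(doc))            # source line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


docs = [
    {"title": "lucene", "content": "lucene is very cool"},
    {"title": "es", "content": "elasticsearch is rock solid"},
]
print(to_bulk_body(docs), end="")
```

The string would be posted to articles/_bulk with the Content-Type header set to application/x-ndjson.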
(2) Example 1
POST articles/_search
{
  "suggest": {
    "my_suggestion_01": {
      "text": "lucen rocks",
      "term": {
        "field": "content"
      }
    },
    "my_suggestion_02": {
      "text": "wucene",
      "term": {
        "prefix_length": 0,
        "field": "title"
      }
    }
  }
}
Result ("rocks" gets no options because it already exists in content and the default suggest_mode is missing):
"suggest" : {
  "my_suggestion_01" : [
    {
      "text" : "lucen",
      "offset" : 0,
      "length" : 5,
      "options" : [
        {
          "text" : "lucene",
          "score" : 0.8,
          "freq" : 2
        }
      ]
    },
    {
      "text" : "rocks",
      "offset" : 6,
      "length" : 5,
      "options" : [ ]
    }
  ],
  "my_suggestion_02" : [
    {
      "text" : "wucene",
      "offset" : 0,
      "length" : 6,
      "options" : [
        {
          "text" : "lucene",
          "score" : 0.8333333,
          "freq" : 1
        }
      ]
    }
  ]
}
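The two scores in this response are consistent with a simple normalisation of edit distance: 1 - distance / min(len(input), len(suggestion)). The sketch below reproduces both values under that assumption. It is a reading of the numbers for the default string distance, not a statement about Elasticsearch's internal code, and edit_distance and similarity are hypothetical helpers.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def edit_distance(a: str, b: str) -> int:
    """Recursive Levenshtein distance with memoization."""
    if not a or not b:
        return len(a) + len(b)
    if a[0] == b[0]:
        return edit_distance(a[1:], b[1:])
    return 1 + min(
        edit_distance(a[1:], b),      # delete from a
        edit_distance(a, b[1:]),      # insert into a
        edit_distance(a[1:], b[1:]),  # substitute
    )


def similarity(term: str, suggestion: str) -> float:
    """Assumed scoring: 1 - distance / length of the shorter string."""
    return 1 - edit_distance(term, suggestion) / min(len(term), len(suggestion))


print(round(similarity("lucen", "lucene"), 7))   # 0.8
print(round(similarity("wucene", "lucene"), 7))  # 0.8333333
```

Both inputs are one edit away from "lucene"; the scores differ only because the shorter string has 5 characters in the first case and 6 in the second.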
(3) Example 2
POST articles/_search
{
  "suggest": {
    "text": "lucen rocks",
    "my_suggestion_01": {
      "term": {
        "suggest_mode": "always",
        "field": "content"
      }
    },
    "my_suggestion_02": {
      "term": {
        "field": "title"
      }
    }
  }
}
Result (the text defined at the suggest level is shared by both suggestions; with suggest_mode always, "rocks" now gets "rock" from content):
"suggest" : {
  "my_suggestion_01" : [
    {
      "text" : "lucen",
      "offset" : 0,
      "length" : 5,
      "options" : [
        {
          "text" : "lucene",
          "score" : 0.8,
          "freq" : 2
        }
      ]
    },
    {
      "text" : "rocks",
      "offset" : 6,
      "length" : 5,
      "options" : [
        {
          "text" : "rock",
          "score" : 0.75,
          "freq" : 2
        }
      ]
    }
  ],
  "my_suggestion_02" : [
    {
      "text" : "lucen",
      "offset" : 0,
      "length" : 5,
      "options" : [
        {
          "text" : "lucene",
          "score" : 0.8,
          "freq" : 1
        }
      ]
    },
    {
      "text" : "rocks",
      "offset" : 6,
      "length" : 5,
      "options" : [ ]
    }
  ]
}
Phrase Suggester
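The Phrase Suggester suggests similar phrases. It builds on the Term Suggester and additionally considers the relationship between terms (for example whether they co-occur in the indexed text, how close together they are, and how frequent they are) to suggest a similar phrase.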
Syntax
POST <index>/_search
{
  "suggest": {
    "<suggestion name>": {
      "text": "<search text>",
      "phrase": {
        "field": "<field>",
        "analyzer": "",
        "size": "",
        "max_errors": "",
        "highlight": "",
        ...
      }
    }
  }
}
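(1) analyzer
The analyzer applied to the search text.
(2) size
The maximum number of suggested phrases returned. The default is 5; typical values are 3 or 5.
(3) max_errors
The maximum number of terms that may be misspelled. The default is 1; typical values are 1 or 2. A value of 1 or more is an absolute number of terms, while a value between 0 and 1 is read as a fraction of the query terms.
(4) highlight
Highlights the corrected terms in the suggested phrase.
(5) Other
Many more parameters are listed in the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html#phrase-suggester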
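The syntax template leaves the value of highlight blank; in the Elasticsearch reference it is an object with a pre_tag and a post_tag that wrap each changed term. The sketch below builds a phrase suggest body in that shape with the standard library. build_phrase_suggest is a hypothetical helper, and the suggestion name, input text, and <em> tags are illustrative choices.

```python
import json
from typing import Any, Dict


def build_phrase_suggest(name: str, text: str, field: str,
                         max_errors: float = 1, size: int = 5,
                         pre_tag: str = "<em>", post_tag: str = "</em>") -> Dict[str, Any]:
    """Return a _search body containing a single phrase suggestion."""
    phrase = {
        "field": field,
        "size": size,
        "max_errors": max_errors,
        # Corrected terms in each suggested phrase are wrapped in these tags.
        "highlight": {"pre_tag": pre_tag, "post_tag": post_tag},
    }
    return {"suggest": {name: {"text": text, "phrase": phrase}}}


phrase_body = build_phrase_suggest("my_phrase", "lucne and elasticsear rock", "content",
                                   max_errors=2)
print(json.dumps(phrase_body, indent=2))
```

As with the term suggester, the body is sent to POST <index>/_search.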
Examples
(1) Index sample data
POST articles/_bulk
{ "index" : { } }
{ "title"