10.completion_suggester

1. Introduction to the Completion Suggester

To understand the format of suggestions, please read the Suggesters page first. For more flexible search-as-you-type searches that do not use suggesters, see the search_as_you_type field type.

The completion suggester provides auto-complete/search-as-you-type functionality. This is a navigational feature that guides users to relevant results as they type, improving search precision. It is not meant for spell correction or did-you-mean functionality like the term or phrase suggesters.

Ideally, auto-complete should keep pace with the user's typing and give instant feedback relevant to what has already been typed. Hence, the completion suggester is optimized for speed. It uses data structures that enable fast lookups, but they are costly to build and are stored in memory.

Mapping

To use this feature, specify a special mapping for the field, which indexes the field values for fast completions.

PUT music
{
    "mappings": {
        "properties" : {
            "suggest" : {
                "type" : "completion"
            },
            "title" : {
                "type": "keyword"
            }
        }
    }
}


Mapping supports the following parameters:

1. analyzer: The index analyzer to use, defaults to simple.

2. search_analyzer: The search analyzer to use, defaults to the value of analyzer.

3. preserve_separators: Preserves the separators, defaults to true. If disabled, you could find a field starting with Foo Fighters if you suggest for foof.

4. preserve_position_increments: Enables position increments, defaults to true. If disabled and a stopwords analyzer is used, you could get a field starting with The Beatles if you suggest for b. Note: if you are able to enrich your data, you could also achieve this by indexing two inputs, Beatles and The Beatles, without changing the simple analyzer.

5. max_input_length: Limits the length of a single input, defaults to 50 UTF-16 code points. This limit is only used at index time to reduce the total number of characters per input string, which prevents massive inputs from bloating the underlying data structure. Most use cases won't be influenced by the default value, since prefix completions seldom grow beyond a handful of characters.
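As a rough illustration of how these parameters fit into the mapping body, here is a minimal Python sketch. The helper name completion_mapping is hypothetical (it is not part of any Elasticsearch client); it only builds the JSON body you would send with PUT music, with the defaults listed above written out explicitly.

```python
import json

# Mapping parameters from the list above, with their documented defaults.
# search_analyzer is left out by default because it falls back to analyzer.
DEFAULTS = {
    "analyzer": "simple",
    "preserve_separators": True,
    "preserve_position_increments": True,
    "max_input_length": 50,
}
ALLOWED = set(DEFAULTS) | {"search_analyzer"}


def completion_mapping(field="suggest", **overrides):
    """Build a mappings body containing a single completion field.

    Hypothetical helper: it only assembles a dict, it does not talk to
    Elasticsearch.
    """
    unknown = set(overrides) - ALLOWED
    if unknown:
        raise ValueError(f"unsupported mapping parameters: {sorted(unknown)}")
    params = {"type": "completion", **DEFAULTS, **overrides}
    return {"mappings": {"properties": {field: params}}}


# Example: disable separators so that "foof" can match "Foo Fighters".
body = completion_mapping(preserve_separators=False)
print(json.dumps(body, indent=2))
```

Spelling out the defaults is optional; omitting a parameter from the mapping should have the same effect as passing its default value.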

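2. Indexing documents

Documents are indexed the same way as ordinary documents. Note that the suggest field in the examples below is not a special field; suggest is simply the field name defined in the mapping, and any other name would work.
At index time you can supply parameters such as input and weight.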

PUT music/_doc/1?refresh
{
    "suggest" : {
        "input": [ "Nevermind", "Nirvana" ],
        "weight" : 34
    }
}


The following parameters are supported:

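1. input: The input to store; this can be an array of strings or just a string. This field is mandatory.
This value cannot contain the following UTF-16 control characters:

\u0000 (null)
\u001f (information separator one)
\u001e (information separator two)

2. weight: A positive integer or a string containing a positive integer, which defines a weight and allows you to rank your suggestions. This field is optional.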
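The two rules above can be checked on the client side before a document is sent. The following is a minimal sketch, assuming you want to fail early in your own code; check_suggestion is a hypothetical helper, not an Elasticsearch API.

```python
# Control characters that an input value must not contain (from the list above).
RESERVED = {
    "\u0000": "null",
    "\u001f": "information separator one",
    "\u001e": "information separator two",
}


def check_suggestion(inputs, weight=None):
    """Validate input/weight and return a suggestion object ready for indexing."""
    if isinstance(inputs, str):
        inputs = [inputs]  # a single string is allowed as well as an array
    if not inputs:
        raise ValueError("input is mandatory")
    for text in inputs:
        for ch, name in RESERVED.items():
            if ch in text:
                raise ValueError(
                    f"input contains reserved character U+{ord(ch):04X} ({name})"
                )
    suggestion = {"input": list(inputs)}
    if weight is not None:
        if isinstance(weight, bool) or not isinstance(weight, (int, str)):
            raise TypeError("weight must be an integer or a string holding one")
        value = int(weight)  # accepts 34 as well as "34"
        if value <= 0:
            raise ValueError("weight must be a positive integer")
        suggestion["weight"] = value
    return suggestion


print(check_suggestion(["Nevermind", "Nirvana"], weight=34))
```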

You can index multiple suggestions, each with its own weight, for one document as follows:

PUT music/_doc/1?refresh
{
    "suggest" : [
        {
            "input": "Nevermind",
            "weight" : 10
        },
        {
            "input": "Nirvana",
            "weight" : 3
        }
    ]
}


Or use the following shorthand form, which does not let you specify a weight:

PUT music/_doc/1?refresh
{
  "suggest" : [ "Nevermind", "Nirvana" ]
}
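The three document shapes shown in this section carry the same kind of information. As a sketch of how they relate, the hypothetical helper below flattens any of them into (input, weight) pairs. The fallback weight of 1 for entries without a weight is an assumption made for this illustration.

```python
def normalize(suggest_value, default_weight=1):
    """Flatten any accepted shape of the suggest field into (input, weight) pairs.

    default_weight is an assumption for entries that carry no weight.
    """
    items = suggest_value if isinstance(suggest_value, list) else [suggest_value]
    pairs = []
    for item in items:
        if isinstance(item, str):
            # Shorthand form: a bare string has no weight of its own.
            pairs.append((item, default_weight))
            continue
        inputs = item["input"]
        if isinstance(inputs, str):
            inputs = [inputs]
        weight = int(item.get("weight", default_weight))
        pairs.extend((text, weight) for text in inputs)
    return pairs


# One object with several inputs sharing a weight.
print(normalize({"input": ["Nevermind", "Nirvana"], "weight": 34}))
# An array of objects, each with its own weight.
print(normalize([{"input": "Nevermind", "weight": 10}, {"input": "Nirvana", "weight": 3}]))
# The shorthand array of strings.
print(normalize(["Nevermind", "Nirvana"]))
```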


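3. Querying

Suggesting works as usual, except that you have to specify the suggest type as completion. Suggestions are near real-time: new suggestions become visible after a refresh, and documents are never shown once they have been deleted.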

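POST music/_search?pretty
{
    "suggest": {
        "song-suggest" : {
            "prefix" : "nir",
            "completion" : {
                "field" : "suggest"
            }
        }
    }
}

song-suggest is the name of the suggestion
prefix is the prefix used to search for suggestions
completion is the type of suggester
field is the name of the field to search for suggestions in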


returns

{
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits": ...
  "took": 2,
  "timed_out": false,
  "suggest": {
    "song-suggest" : [ {
      "text" : "nir",
      "offset" : 0,
      "length" : 3,
      "options" : [ {
        "text" : "Nirvana",
        "_index": "music",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "suggest": ["Nevermind", "Nirvana"]
        }
      } ]
    } ]
  }
}

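The _source metadata field must be enabled, which is the default behavior, to enable returning _source with suggestions.

The configured weight for a suggestion is returned as _score. The text field uses the input of your indexed suggestion. Suggestions return the full document _source by default. The size of the _source can impact performance due to disk fetch and network transport overhead. To save some network overhead, filter out unnecessary fields from the _source using source filtering to minimize _source size. Note that the _suggest endpoint doesn't support source filtering, but using suggest on the _search endpoint does: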

POST music/_search
{
    "_source": "suggest", 
    "suggest": {
        "song-suggest" : {
            "prefix" : "nir",
            "completion" : {
                "field" : "suggest", 
                "size" : 5 
            }
        }
    }
}


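"_source": "suggest" filters the source to return only the suggest field
field is the name of the field to search for suggestions in
size is the number of suggestions to return

The response should look like this: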

{
    "took": 6,
    "timed_out": false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits": {
        "total" : {
            "value": 0,
            "relation": "eq"
        },
        "max_score" : null,
        "hits" : []
    },
    "suggest": {
        "song-suggest" : [ {
            "text" : "nir",
            "offset" : 0,
            "length" : 3,
            "options" : [ {
                "text" : "Nirvana",
                "_index": "music",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "suggest": ["Nevermind", "Nirvana"]
                }
            } ]
        } ]
    }
}
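To show how a client might consume this response, here is a short Python sketch that pulls the option text, score, and document id out of the suggest section. The function name suggestion_options is hypothetical, and the sample dictionary is a trimmed copy of the response above.

```python
def suggestion_options(response, name):
    """Collect (text, score, id) for every option of the named suggestion."""
    options = []
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            options.append((option["text"], option["_score"], option["_id"]))
    return options


# Trimmed copy of the response shown above.
sample = {
    "suggest": {
        "song-suggest": [
            {
                "text": "nir",
                "offset": 0,
                "length": 3,
                "options": [
                    {
                        "text": "Nirvana",
                        "_index": "music",
                        "_id": "1",
                        "_score": 1.0,
                        "_source": {"suggest": ["Nevermind", "Nirvana"]},
                    }
                ],
            }
        ]
    }
}

print(suggestion_options(sample, "song-suggest"))
```

Note that the matches live under suggest, not under hits: in the response above hits.hits is empty even though a suggestion was found.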

The basic completion suggester query supports the following parameters:

1. field: The name of the field on which to run the query (required).
2. size: The number of suggestions to return (defaults to 5).
3. skip_duplicates: Whether duplicate suggestions should be filtered out (defaults to false).
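The parameters above map directly onto the request body. The sketch below assembles that body in Python; completion_suggest is a hypothetical helper that only builds a dict, and sending it to the _search endpoint is left to whatever HTTP client you use.

```python
import json


def completion_suggest(name, prefix, field, size=5, skip_duplicates=False, source=None):
    """Build a _search body holding one completion suggestion.

    size and skip_duplicates default to the documented values (5 and false).
    source, if given, is passed through as the _source filter.
    """
    body = {
        "suggest": {
            name: {
                "prefix": prefix,
                "completion": {
                    "field": field,
                    "size": size,
                    "skip_duplicates": skip_duplicates,
                },
            }
        }
    }
    if source is not None:
        body["_source"] = source
    return body


request_body = completion_suggest("song-suggest", "nir", "suggest", source="suggest")
print(json.dumps(request_body, indent=2))
```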

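The completion suggester considers all documents in the index. See Context Suggester for an explanation of how to query a subset of documents instead.

When a completion query spans more than one shard, the suggest is executed in two phases, and the last phase fetches the relevant documents from the shards. Executing completion requests against a single shard is therefore more performant, because of the document fetch overhead incurred when the suggest spans multiple shards. To get the best performance for completions, it is recommended to index completions into a single-shard index. If heap usage is high due to shard size, it is still recommended to break the index into multiple shards instead of optimizing for completion performance.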

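4. Skipping duplicate suggestions

Queries can return duplicate suggestions coming from different documents. You can modify this behavior by setting skip_duplicates to true. When set, this option filters out documents with duplicate suggestions from the result.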

POST music/_search?pretty
{
    "suggest": {
        "song-suggest" : {
            "prefix" : "nor",
            "completion" : {
                "field" : "suggest",
                "skip_duplicates": true
            }
        }
    }
}

When set to true, this option can slow down the search because more suggestions need to be visited to find the top N.
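The cost can be pictured with a toy model. The Python sketch below is not how Elasticsearch implements the lookup; it simply walks a list of candidates that is already sorted by weight and counts how many entries it has to visit before it has N results. The candidate titles and weights are made-up sample data.

```python
def top_n(candidates, n, skip_duplicates=False):
    """Return (results, visited) for candidates sorted by descending weight.

    Toy model only: visited counts how many candidates were inspected
    before n results were collected.
    """
    results, seen, visited = [], set(), 0
    for text, weight in candidates:
        if len(results) == n:
            break
        visited += 1
        if skip_duplicates and text in seen:
            continue  # duplicate text from another document, keep looking
        seen.add(text)
        results.append((text, weight))
    return results, visited


# Illustrative data: three documents share the same suggestion text.
candidates = [
    ("Nirvana", 10),
    ("Nirvana", 9),
    ("Nirvana", 8),
    ("Norah Jones", 5),
    ("Northern Lights", 2),
]

print(top_n(candidates, 2))                        # duplicates kept, 2 visited
print(top_n(candidates, 2, skip_duplicates=True))  # duplicates skipped, 4 visited
```

In this model the deduplicated query has to look past the repeated entries before it can fill the second slot, which is the extra work the note above refers to.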

5. Fuzzy queries

The completion suggester also supports fuzzy queries, which means you can have a typo in your search and still get results back.

POST music/_search?pretty
{
    "suggest": {
        "song-suggest" : {
            "prefix" : "nor",
            "completion" : {
                "field" : "suggest",
                "fuzzy" : {
                    "fuzziness" : 2
                }
            }
        }
    }
}


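Suggestions that share the longest prefix with the query prefix are scored higher.
The fuzzy query can take specific fuzzy parameters. The following parameters are supported:

1. fuzziness: The fuzziness factor, defaults to AUTO. See Fuzziness for allowed settings.

2. transpositions: If set to true, transpositions are counted as one change instead of two, defaults to true.

3. min_length: Minimum length of the input before fuzzy suggestions are returned, defaults to 3.

4. prefix_length: Minimum length of the input, which is not checked for fuzzy alternatives, defaults to 1.

5. unicode_aware: If true, all measurements (like fuzzy edit distance, transpositions, and lengths) are measured in Unicode code points instead of in bytes. This is slightly slower than raw bytes, so it is set to false by default.

If you want to stick with the default values but still use fuzzy, you can use either fuzzy: {} or fuzzy: true.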
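To make the transpositions parameter concrete, here is a small Python sketch of an edit distance that can optionally count a swap of two adjacent characters as a single change. It uses a plain dynamic-programming table for illustration; Lucene's own matching is automaton-based and is not reproduced here.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance between a and b.

    With transpositions=True, swapping two adjacent characters costs 1
    (optimal string alignment); otherwise it costs 2 as in plain
    Levenshtein distance.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete everything from a
    for j in range(cols):
        d[0][j] = j  # insert everything from b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("nor", "nir"))                                # 1
print(edit_distance("nrivana", "nirvana"))                        # 1
print(edit_distance("nrivana", "nirvana", transpositions=False))  # 2
```

The first call shows why the prefix nor in the request above can still reach Nirvana: it is one substitution away from nir, well within a fuzziness of 2.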

6. Regex queries

The completion suggester also supports regex queries, meaning you can express a prefix as a regular expression.

POST music/_search?pretty
{
    "suggest": {
        "song-suggest" : {
            "regex" : "n[ever|i]r",
            "completion" : {
                "field" : "suggest"
            }
        }
    }
}


The regex query can take specific regex parameters. The following parameters are supported:

1. flags: Possible flags are ALL (default), ANYSTRING, COMPLEMENT, EMPTY, INTERSECTION, INTERVAL, or NONE. See regexp-syntax for their meaning.

2. max_determinized_states: Regular expressions are dangerous because it's easy to accidentally create an innocuous-looking one that requires an exponential number of internal determinized automaton states (and corresponding RAM and CPU) for Lucene to execute. Lucene prevents these using the max_determinized_states setting (defaults to 10000). You can raise this limit to allow more complex regular expressions to execute.
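To get a feel for what a regex prefix selects, the sketch below emulates the idea with Python's re module. This is only an approximation: Lucene's regular expression syntax differs from Python's, and lowercasing the inputs is a stand-in for the simple analyzer. For the pattern used in the request above, the two syntaxes should agree, since [ever|i] is a character class that matches exactly one of the characters e, v, r, | or i.

```python
import re


def regex_prefix_matches(pattern, inputs):
    """Return the inputs that start with something matching pattern.

    Emulation only: re.match anchors at the start of the string and
    allows trailing characters, which mirrors prefix matching.
    """
    compiled = re.compile(pattern)
    return [text for text in inputs if compiled.match(text.lower())]


inputs = ["Nevermind", "Nirvana"]
print(regex_prefix_matches("n[ever|i]r", inputs))
```

Under this emulation only Nirvana is selected: nir fits the pattern, while Nevermind has v where the pattern requires r in the third position.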
