Elasticsearch Highlighting

Article link: https://blog.csdn.net/hyhanyu/article/details/98492920

Elasticsearch supports three highlighters.

Unified highlighter:

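The unified highlighter uses the Lucene Unified Highlighter. It breaks the text into sentences and uses the BM25 algorithm to score individual sentences as if they were documents in the corpus. It also supports accurate phrase and multi-term (fuzzy, prefix, regex) highlighting. This is the default highlighter.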

Plain highlighter:

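The plain highlighter uses the standard Lucene highlighter. It tries to reflect the query matching logic in terms of word importance and any word positioning criteria in phrase queries.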

Note:

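The plain highlighter works best for highlighting simple query matches in a single field. To accurately reflect query logic, it creates a tiny in-memory index and re-runs the original query criteria through Lucene's query execution planner to get access to low-level match information for the current document. This is repeated for every field and every document that needs to be highlighted. If you want to highlight many fields in many documents with complex queries, the recommendation is to use the unified highlighter on postings or term_vector fields.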
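The highlighter is selected with the `type` setting of a highlight request. The following is a minimal sketch of building such a request body in Python. The helper name `highlight_request` and the field name `content` are illustrative assumptions, and the body is only printed, not sent to a cluster.

```python
import json

# The three highlighter types described in this article.
HIGHLIGHTER_TYPES = ("unified", "plain", "fvh")


def highlight_request(query_text, highlighter_type, field="content"):
    """Build a search body that highlights `field` with the chosen highlighter.

    `field` is a hypothetical field name used only for illustration.
    """
    if highlighter_type not in HIGHLIGHTER_TYPES:
        raise ValueError(f"unknown highlighter type: {highlighter_type}")
    return {
        "query": {"match": {field: query_text}},
        "highlight": {"fields": {field: {"type": highlighter_type}}},
    }


body = highlight_request("only fox", "plain")
print(json.dumps(body, indent=2))
```

Setting `type` per field rather than globally keeps the choice local, which helps when only some fields are mapped with term vectors.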

Fast vector highlighter:

The fvh highlighter uses the Lucene Fast Vector highlighter. It can be used on fields whose term_vector is set to with_positions_offsets in the mapping.

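The fvh highlighter:

Can be customized with a boundary_scanner.

Requires term_vector to be set to with_positions_offsets, which increases the size of the index.

Can combine matches from multiple fields into one result. See matched_fields.

Can assign different weights to matches at different positions. This allows, for example, phrase matches to be sorted above term matches when highlighting a boosting query that boosts phrase matches over term matches.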

Note:

The fvh highlighter does not support span queries. If you need support for span queries, try another highlighter, such as the unified highlighter.
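As an illustration of the fvh points above, here is a hedged sketch of a request body that combines matches from a field and its sub-field through `matched_fields`. It assumes a field `content` with a sub-field `content.plain`, both mapped with `term_vector` set to `with_positions_offsets`. The search term is arbitrary.

```python
import json

# Sketch only: assumes `content` and `content.plain` both store term vectors
# with positions and offsets, which the fvh highlighter requires.
fvh_body = {
    "query": {
        "multi_match": {
            "query": "foxes",
            "fields": ["content", "content.plain"],
        }
    },
    "highlight": {
        "order": "score",  # return the best-scoring fragments first
        "fields": {
            "content": {
                "type": "fvh",
                # Matches from both fields are merged into the `content` result.
                "matched_fields": ["content", "content.plain"],
            }
        },
    },
}

print(json.dumps(fvh_body, indent=2))
```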

number_of_fragments

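The maximum number of fragments to return. If it is set to 0, no fragments are returned; instead, the entire field content is highlighted and returned. This is handy for short texts such as a title or an address, where fragmentation is not needed. If number_of_fragments is 0, fragment_size is ignored. Defaults to 5.

fragment_size

The size of a highlighted fragment, in characters. Defaults to 100.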
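The two settings can be contrasted with a small sketch. The helper names and the `title` field are hypothetical; the defaults mirror the values stated above.

```python
import json


def whole_field_highlight(field):
    """Highlight the entire field: number_of_fragments=0 disables fragmenting,
    so fragment_size is deliberately left out (it would be ignored)."""
    return {"fields": {field: {"number_of_fragments": 0}}}


def fragment_highlight(field, number_of_fragments=5, fragment_size=100):
    """Highlight up to `number_of_fragments` fragments of about
    `fragment_size` characters each (defaults as described in the text)."""
    return {
        "fields": {
            field: {
                "number_of_fragments": number_of_fragments,
                "fragment_size": fragment_size,
            }
        }
    }


# `title` is a hypothetical short field; `content` is a longer body field.
print(json.dumps(whole_field_highlight("title"), indent=2))
print(json.dumps(fragment_highlight("content", number_of_fragments=3), indent=2))
```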

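Note:

The highlighter splits the text into sentences. Suppose the sentence containing the match has 30 characters, fragment_size is 70, and the next sentence has 50 characters. Because the next sentence would not fit completely, Elasticsearch shows only the current 30 characters. If the next sentence were shorter than 40 characters, it would be shown as well.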

PUT test_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "english",
          "term_vector": "with_positions_offsets",
          "fields": {
            "plain": {
              "type": "text",
              "analyzer": "standard",
              "term_vector": "with_positions_offsets"
            }
          }
        }
      }
    }
  }
}

PUT test_index/_doc/doc1
{
  "content": "For you I'm only a fox like a hundred thousand other foxes. But if you tame me, we'll need each other. You'll be the only boy in the world for me. I'll be the only fox in the world for you."
}

GET test_index/_search
{
  "query": {
    "match_phrase": {"content": "fox like"}
  },
  "highlight": {
    "type": "unified",
    "number_of_fragments": 3,
    "fragment_size": "30",
    "fields": {
      "content": {}
    }
  }
}

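Result 1

"highlight" : {
  "content" : [
    "For you I'm only a fox like a hundred"
  ]
}

The sentence "For you I'm only a fox like a hundred thousand other foxes." is longer than 30 characters, so only its beginning is shown. The fragment is cut at a word boundary shortly after the 30-character limit.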

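Result 2, with "fragment_size": "70"

"highlight" : {
  "content" : [
    "For you I'm only a fox like a hundred thousand other foxes."
  ]
}

70 characters is more than the current sentence needs but not enough to include the next sentence, so only the current sentence is shown.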

Result 3, with "fragment_size": "130"

"highlight" : {
  "content" : [
    "For you I'm only a fox like a hundred thousand other foxes. But if you tame me, we'll need each other."
  ]
}

130 characters is enough to cover the next sentence as well, so that sentence is also shown.
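The fragment sizes used in these results can be checked against the sentence lengths of the sample document. The sketch below only does that arithmetic with a naive regular-expression sentence split. It is not how Lucene's break iterator is implemented, and the variable names are illustrative.

```python
import re

text = (
    "For you I'm only a fox like a hundred thousand other foxes. "
    "But if you tame me, we'll need each other. "
    "You'll be the only boy in the world for me. "
    "I'll be the only fox in the world for you."
)

# Naive split: break after sentence-ending punctuation followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", text)
lengths = [len(s) for s in sentences]

first = lengths[0]                        # length of the sentence with the match
first_two = lengths[0] + 1 + lengths[1]   # both sentences plus the joining space

for size in (30, 70, 130):
    print(
        f"fragment_size={size}: "
        f"fits first sentence={size >= first}, "
        f"fits first two sentences={size >= first_two}"
    )
```

Under this simple model, 30 is shorter than the first sentence, 70 covers the first sentence only, and 130 covers the first two, which matches the results observed above.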

Result 4, with "fragment_size": "1"

"highlight" : {
  "content" : [
    "I'm only a fox",
    "like a hundred"
  ]
}

If fragment_size is too small, the matched words within a single sentence are split into two fragments.

highlight_query:

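Highlights matches for a query other than the search query. This is especially useful if you use a rescore query, because by default highlighting does not take rescore queries into account.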

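Note:

Elasticsearch does not validate in any way that highlight_query contains the search query, so it is possible to define it such that legitimate query results are not highlighted. Generally, you should include the search query as part of the highlight_query.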
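One way to follow that advice is to wrap the search query and the extra highlighting query in a `bool` query. The helper below is a hypothetical sketch, with `with_highlight_query` and its parameters being illustrative names rather than part of any client library.

```python
import json


def with_highlight_query(search_query, extra_query, field="content"):
    """Build a search body whose highlight_query includes the search query
    itself plus an additional query, combined as bool/should clauses."""
    return {
        "query": search_query,
        "highlight": {
            "fields": {
                field: {
                    "highlight_query": {
                        "bool": {"should": [search_query, extra_query]}
                    }
                }
            }
        },
    }


search = {"match_phrase": {"content": "fox like"}}
extra = {"match": {"content": "tame"}}  # arbitrary extra term to highlight
body = with_highlight_query(search, extra)
print(json.dumps(body, indent=2))
```

Because the search query is one of the `should` clauses, everything that matched the search remains eligible for highlighting, and the extra clause only adds more highlighted terms.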

GET test_index/_search

{

"query": {

"match_phrase": {"content" : "fox like"}

},

"highlight": {

"type" : "unified",

"number_of_fragments" : 3,

"fragmenter": "span",

"fragment_size": "100",

"fields": {

"content": {

"highlight_query": {