Elasticsearch: Sorting Query Results

java编程小帅

Article link: https://blog.csdn.net/qq_37107851/article/details/120710269

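Allows you to add one or more sorts on specific fields. Each sort can also be reversed. The sort is defined on a per-field level, with the special field name _score to sort by score and _doc to sort by index order.

Assume the following index mapping: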

PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "post_date": { "type": "date" },
      "user": {
        "type": "keyword"
      },
      "name": {
        "type": "keyword"
      },
      "age": { "type": "integer" }
    }
  }
}

GET /my-index-000001/_search
{
  "sort" : [
    { "post_date" : {"order" : "asc", "format": "strict_date_optional_time_nanos"}},
    "user",
    { "name" : "desc" },
    { "age" : "desc" },
    "_score"
  ],
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}
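To make the tie-breaking rule concrete, the following Python sketch orders a few made-up hits locally with the same sort keys as the request above: the first key decides, and each later key only breaks ties left by the earlier ones. It is a simulation of the documented semantics, not how Elasticsearch executes the sort; the sample documents and the `SORT_SPEC` name are hypothetical.

```python
from functools import cmp_to_key

# Made-up hits carrying every field used in the sort request above.
hits = [
    {"id": "a", "post_date": "2021-10-01", "user": "kimchy", "name": "x", "age": 30, "_score": 1.0},
    {"id": "b", "post_date": "2021-10-01", "user": "kimchy", "name": "y", "age": 25, "_score": 1.0},
    {"id": "c", "post_date": "2021-09-30", "user": "kimchy", "name": "z", "age": 40, "_score": 1.0},
    {"id": "d", "post_date": "2021-10-01", "user": "kimchy", "name": "y", "age": 35, "_score": 1.0},
]

# (field, order) pairs mirroring the request: post_date asc, user asc (the default),
# name desc, age desc, and _score desc (the default order for _score).
SORT_SPEC = [("post_date", "asc"), ("user", "asc"), ("name", "desc"),
             ("age", "desc"), ("_score", "desc")]


def compare(a, b):
    """Compare two hits key by key; a later key only matters if all earlier keys tie."""
    for field, order in SORT_SPEC:
        x, y = a[field], b[field]
        if x == y:
            continue
        result = -1 if x < y else 1  # ISO-8601 date strings compare correctly as text
        return result if order == "asc" else -result
    return 0


ordered = sorted(hits, key=cmp_to_key(compare))
print([h["id"] for h in ordered])
```

Hit c comes first because of its earlier post_date; d and b share the name y, so age (descending) puts d ahead of b; a has the smaller name x and therefore comes last under name desc.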

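_doc has no real use case besides being the most efficient sort order. So if you don't care about the order in which documents are returned, you should sort by _doc. This especially helps when scrolling.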

Sort Values

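The search response includes the sort values for each document. Use the format parameter to specify a date format for the sort values of date and date_nanos fields. The following search returns the sort values for the post_date field in the strict_date_optional_time_nanos format.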

GET /my-index-000001/_search
{
  "sort" : [
    { "post_date" : {"format": "strict_date_optional_time_nanos"}}
  ],
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}

Sort Order

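The order option can have the following values:

asc	Sort in ascending order
desc	Sort in descending order

The order defaults to desc when sorting on _score, and defaults to asc when sorting on anything else.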

Sort mode option

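Elasticsearch supports sorting by array or multi-valued fields. The mode option controls which array value is picked for sorting the document it belongs to. The mode option can have the following values:

min	Pick the lowest value.
max	Pick the highest value.
sum	Use the sum of all values as the sort value. Only applicable to number-based array fields.
avg	Use the average of all values as the sort value. Only applicable to number-based array fields.
median	Use the median of all values as the sort value. Only applicable to number-based array fields.

The default sort mode for ascending order is min, which picks the lowest value. The default sort mode for descending order is max, which picks the highest value.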
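The order defaults and mode reductions described above can be summarized in a short Python sketch. The helper names (`default_order`, `default_mode`, `pick_sort_value`) are hypothetical, and the code only models the documented behavior locally; it is not Elasticsearch's implementation.

```python
import statistics


def default_order(field):
    """_score defaults to desc; every other sort key defaults to asc."""
    return "desc" if field == "_score" else "asc"


def default_mode(order):
    """min for ascending sorts, max for descending sorts."""
    return "min" if order == "asc" else "max"


def pick_sort_value(values, mode):
    """Reduce a multi-valued field to the single value used for sorting."""
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    # sum, avg and median only make sense for numeric arrays.
    if not all(isinstance(v, (int, float)) for v in values):
        raise ValueError(f"mode {mode!r} only applies to numeric arrays")
    if mode == "sum":
        return sum(values)
    if mode == "avg":
        return sum(values) / len(values)
    if mode == "median":
        return statistics.median(values)
    raise ValueError(f"unknown mode {mode!r}")


prices = [20, 4]
order = default_order("price")
print(order, default_mode(order), pick_sort_value(prices, "avg"))
```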

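Sort mode example usage

In the example below, the field price has multiple prices per document. In this case, the hits are sorted by price in ascending order, based on the average price per document.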

PUT /my-index-000001/_doc/1?refresh
{
   "product": "chocolate",
   "price": [20, 4]
}

POST /_search
{
   "query" : {
      "term" : { "product" : "chocolate" }
   },
   "sort" : [
      {"price" : {"order" : "asc", "mode" : "avg"}}
   ]
}

Sorting numeric fields

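For numeric fields it is also possible to cast the values from one type to another using the numeric_type option. This option accepts the following values: ["double", "long", "date", "date_nanos"], and can be useful for searches across multiple data streams or indices where the sort field is mapped differently.

For instance, consider these two indices: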

PUT /index_double
{
  "mappings": {
    "properties": {
      "field": { "type": "double" }
    }
  }
}

PUT /index_long
{
  "mappings": {
    "properties": {
      "field": { "type": "long" }
    }
  }
}

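Since field is mapped as a double in the first index and as a long in the second index, by default it is not possible to use this field to sort requests that query both indices. However, you can use the numeric_type option to force one type or the other for all indices: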

POST /index_long,index_double/_search
{
   "sort" : [
      {
        "field" : {
            "numeric_type" : "double"
        }
      }
   ]
}

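In the example above, values from the index_long index are cast to double in order to be compatible with the values produced by the index_double index. It is also possible to convert a floating-point field to long, but note that in this case each floating-point value is replaced with the largest integer less than or equal to it (or, for negative values, the smallest integer greater than or equal to it); in other words, the fractional part is truncated toward zero.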
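A minimal Python sketch of the two casts described above, applied to one sort value at a time: numeric_type double widens longs to floats, and numeric_type long truncates floats toward zero. The function name and sample values are hypothetical. Python itself can compare ints and floats directly, so the sketch only illustrates the value transformation, not why mixed mappings make the Elasticsearch request fail.

```python
import math


def cast_sort_value(value, numeric_type):
    """Model numeric_type casting of a single sort value (local sketch)."""
    if numeric_type == "double":
        return float(value)
    if numeric_type == "long":
        # Truncate toward zero: floor for positive values, ceiling for negative ones.
        return math.trunc(value)
    raise ValueError(f"unsupported numeric_type {numeric_type!r}")


# Made-up values as they might come from index_long and index_double.
from_long = [3, -2, 7]
from_double = [2.5, -2.7, 6.9]

as_double = sorted(cast_sort_value(v, "double") for v in from_long + from_double)
as_long = sorted(cast_sort_value(v, "long") for v in from_long + from_double)
print(as_double)
print(as_long)
```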

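This option can also be used to convert a date field that uses millisecond resolution to a date_nanos field with nanosecond resolution. For instance, consider these two indices: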

PUT /index_double
{
  "mappings": {
    "properties": {
      "field": { "type": "date" }
    }
  }
}

PUT /index_long
{
  "mappings": {
    "properties": {
      "field": { "type": "date_nanos" }
    }
  }
}

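The values in these indices are stored with different resolutions, so sorting on this field always places the date values before the date_nanos values (in ascending order). With the numeric_type option you can set a single resolution for the sort: setting it to date converts the date_nanos values to millisecond resolution, while date_nanos converts the values in the date field to nanosecond resolution: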

POST /index_long,index_double/_search
{
   "sort" : [
      {
        "field" : {
            "numeric_type" : "date_nanos"
        }
      }
   ]
}

To avoid overflow, the conversion to date_nanos cannot be applied to dates before 1970 or after 2262, because nanoseconds are represented as longs.
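The resolution change and the upper bound mentioned above can be checked with a little arithmetic. This Python sketch assumes epoch-based integer timestamps and that converting nanoseconds to milliseconds simply drops the sub-millisecond digits; the helper names are hypothetical. The last lines compute the latest instant whose nanosecond count still fits in a signed 64-bit long.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_LONG = 2**63 - 1  # largest signed 64-bit integer


def millis_to_nanos(epoch_millis):
    """Model numeric_type=date_nanos applied to a millisecond date value."""
    if epoch_millis < 0:
        # Per the text above, dates before 1970 cannot be converted to date_nanos.
        raise ValueError("dates before 1970 cannot be converted to date_nanos")
    nanos = epoch_millis * 1_000_000
    if nanos > MAX_LONG:
        raise OverflowError("date is too far in the future for date_nanos")
    return nanos


def nanos_to_millis(epoch_nanos):
    """Model numeric_type=date applied to a nanosecond value (assumed: digits dropped)."""
    return epoch_nanos // 1_000_000


# Latest instant representable as nanoseconds since the epoch in a signed long.
latest = EPOCH + timedelta(microseconds=MAX_LONG // 1000)
print(latest.isoformat())
```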

Sorting within nested objects

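Elasticsearch also supports sorting by fields that are inside one or more nested objects. Sorting by a nested field is configured with the nested sort option, which has the following properties:

path: Defines the nested object to sort on. The actual sort field must be a direct field inside this nested object. This property is mandatory when sorting by a nested field.

filter: A filter that the inner objects inside the nested path must match for their field values to be taken into account by the sort. A common case is to repeat the query/filter inside the nested filter or query. By default no filter is active.

max_children: The maximum number of children to consider per root document when picking the sort value. Defaults to unlimited.

nested: Same as the top-level nested, but applies to another nested path within the current nested object.

Note: The nested sort options nested_path and nested_filter, used before Elasticsearch 6.1, have been deprecated in favor of the options above.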

Nested sorting examples

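In the example below, offer is a field of type nested. The nested path must be specified; otherwise, Elasticsearch doesn't know at which nested level sort values need to be captured.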

POST /_search
{
   "query" : {
      "term" : { "product" : "chocolate" }
   },
   "sort" : [
       {
          "offer.price" : {
             "mode" :  "avg",
             "order" : "asc",
             "nested": {
                "path": "offer",
                "filter": {
                   "term" : { "offer.color" : "blue" }
                }
             }
          }
       }
    ]
}
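To see what the nested path and filter do, the following Python sketch replays the example on made-up in-memory documents: only offers whose color is blue contribute, their prices are averaged (mode avg), and a document with no matching offer falls back to the missing value, sorted last here. The query on product is ignored, and the document structure and helper names are assumptions for illustration.

```python
# Made-up root documents, each holding an array of nested "offer" objects.
docs = {
    "doc1": {"offer": [{"color": "blue", "price": 10}, {"color": "red", "price": 1},
                       {"color": "blue", "price": 20}]},
    "doc2": {"offer": [{"color": "blue", "price": 12}]},
    "doc3": {"offer": [{"color": "red", "price": 5}]},
}


def nested_sort_value(doc, path, field, predicate, mode):
    """Collect `field` from nested objects under `path` that pass `predicate`, then reduce."""
    values = [obj[field] for obj in doc.get(path, []) if predicate(obj)]
    if not values:
        return None  # no matching inner object: the missing value applies
    if mode == "avg":
        return sum(values) / len(values)
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    raise ValueError(f"mode {mode!r} is not modeled in this sketch")


def is_blue(offer):
    """Stand-in for the nested filter: term on offer.color = blue."""
    return offer["color"] == "blue"


keyed = {doc_id: nested_sort_value(d, "offer", "price", is_blue, "avg") for doc_id, d in docs.items()}
# Ascending order, documents with a missing value last (the default _last).
ordered = sorted(keyed, key=lambda k: (keyed[k] is None, keyed[k] if keyed[k] is not None else 0))
print(keyed, ordered)
```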

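In the example below, the parent and child fields are of type nested. The nested path must be specified at each level; otherwise, Elasticsearch doesn't know at which nested level sort values need to be captured.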

POST /_search
{
   "query": {
      "nested": {
         "path": "parent",
         "query": {
            "bool": {
                "must": {"range": {"parent.age": {"gte": 21}}},
                "filter": {
                    "nested": {
                        "path": "parent.child",
                        "query": {"match": {"parent.child.name": "matt"}}
                    }
                }
            }
         }
      }
   },
   "sort" : [
      {
         "parent.child.age" : {
            "mode" :  "min",
            "order" : "asc",
            "nested": {
               "path": "parent",
               "filter": {
                  "range": {"parent.age": {"gte": 21}}
               },
               "nested": {
                  "path": "parent.child",
                  "filter": {
                     "match": {"parent.child.name": "matt"}
                  }
               }
            }
         }
      }
   ]
}
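The two-level case can be modeled the same way: the outer filter selects parents, the inner filter selects children of those parents, and min picks the youngest remaining child. In this Python sketch the match query on the child name is simplified to exact string equality, and the documents and the `youngest_matt` name are made up.

```python
# Made-up root documents with nested "parent" objects that contain nested "child" objects.
docs = {
    "doc1": {"parent": [
        {"age": 30, "child": [{"name": "matt", "age": 5}, {"name": "anna", "age": 2}]},
        {"age": 18, "child": [{"name": "matt", "age": 1}]},  # excluded by the parent filter
    ]},
    "doc2": {"parent": [
        {"age": 45, "child": [{"name": "matt", "age": 12}, {"name": "matt", "age": 9}]},
    ]},
}


def youngest_matt(doc):
    """min(parent.child.age) over children named matt whose parent is at least 21."""
    ages = [
        child["age"]
        for parent in doc["parent"] if parent["age"] >= 21      # outer nested filter
        for child in parent["child"] if child["name"] == "matt"  # inner nested filter
    ]
    return min(ages) if ages else None  # None stands for a missing value


keyed = {doc_id: youngest_matt(d) for doc_id, d in docs.items()}
print(keyed, sorted(keyed, key=keyed.get))
```

Without the parent filter, doc1 would get the value 1 from the child of the 18-year-old parent; with both filters applied it gets 5.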

Nested sorting is also supported when sorting by scripts and when sorting by geo distance.

Missing Values

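The missing parameter specifies how documents that are missing the sort field should be treated: the missing value can be set to _last, _first, or a custom value (which will be used as the sort value for documents missing the field). The default is _last.

For example: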

GET /_search
{
  "sort" : [
    { "price" : {"missing" : "_last"} }
  ],
  "query" : {
    "term" : { "product" : "chocolate" }
  }
}

If a nested inner object doesn't match the nested filter, the missing value is used.
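The three missing options can be modeled with a small Python function. This is a local sketch assuming that _last and _first keep their position regardless of the sort order, while a custom value is substituted and sorted like any other value; the function name and data are hypothetical.

```python
def sort_with_missing(values, order="asc", missing="_last"):
    """Sort (doc_id, value) pairs; docs whose value is None are placed according to `missing`."""
    present = [(d, v) for d, v in values if v is not None]
    absent = [d for d, v in values if v is None]
    if missing not in ("_last", "_first"):
        # A custom missing value is substituted and sorted like any other value.
        present += [(d, missing) for d in absent]
        absent = []
    present.sort(key=lambda pair: pair[1], reverse=(order == "desc"))
    ids = [d for d, _ in present]
    return absent + ids if missing == "_first" else ids + absent


docs = [("a", 12), ("b", None), ("c", 3)]  # made-up price values; b has no price
print(sort_with_missing(docs))                    # missing docs last
print(sort_with_missing(docs, order="desc"))      # still last when descending
print(sort_with_missing(docs, missing="_first"))
print(sort_with_missing(docs, missing=5))         # b sorts as if its price were 5
```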

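Ignoring unmapped fields

By default, the search request fails if there is no mapping associated with a field. The unmapped_type option lets you ignore fields that have no mapping and not sort by them. The value of this parameter is used to determine what sort values to emit. Here is an example of how it can be used: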

GET /_search
{
  "sort" : [
    { "price" : {"unmapped_type" : "long"} }
  ],
  "query" : {
    "term" : { "product" : "chocolate" }
  }
}

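If any of the queried indices has no mapping for price, Elasticsearch handles it as if there were a mapping of type long, with no document in that index having a value for this field.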
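A hypothetical Python model of the behavior above: two made-up indices, one of which has no mapping for price. Without unmapped_type the sketch raises an error (standing in for the failed search); with it, the unmapped index's documents are treated as having no value and sort last. The index names, data, and error message are invented for illustration, and the sketch does not use the unmapped_type value beyond checking that it is set.

```python
# Made-up per-index mappings and price values.
indices = {
    "products_v1": {"mapping": {"price": "long"}, "docs": {"p1": 30, "p2": 10}},
    "products_v2": {"mapping": {}, "docs": {"p3": None}},  # no mapping for price at all
}


def sort_by_price(indices, unmapped_type=None):
    """Ascending sort on price across indices, modeling unmapped_type (local sketch)."""
    rows = []
    for name, index in indices.items():
        if "price" not in index["mapping"]:
            if unmapped_type is None:
                raise LookupError(f"price is unmapped in index {name}")
            # Treated as mapped with unmapped_type, but no document has a value.
            rows += [(doc_id, None) for doc_id in index["docs"]]
        else:
            rows += list(index["docs"].items())
    # Documents without a value sort last (the default missing behavior).
    return [d for d, v in sorted(rows, key=lambda r: (r[1] is None, r[1] or 0))]


print(sort_by_price(indices, unmapped_type="long"))
try:
    sort_by_price(indices)
except LookupError as err:
    print("request fails:", err)
```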

Geo Distance Sorting

Allows sorting by _geo_distance. Here is an example, assuming pin.location is a field of type geo_point:

GET /_search
{
  "sort" : [
    {
      "_geo_distance" : {
          "pin.location" : [-70, 40],
          "order" : "asc",
          "unit" : "km",
          "mode" : "min",
          "distance_type" : "arc",
          "ignore_unmapped": true
      }
    }
  ],
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}

distance_type:

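How to compute the distance. Can be either arc (default) or plane (faster, but inaccurate over long distances and close to the poles).

mode:

What to do when a field has several geo points. By default, the shortest distance is taken into account when sorting in ascending order, and the longest distance when sorting in descending order. Supported values are min, max, median, and avg.

unit:

The unit to use when computing sort values. The default is m (meters).

ignore_unmapped:

Indicates whether an unmapped field should be treated as a missing value. Setting it to true is equivalent to specifying an unmapped_type in a field sort. The default is false (an unmapped field causes the search to fail).

Note: Geo distance sorting does not support configurable missing values: the distance is always considered equal to infinity when a document has no value for the field used in the distance computation.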
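The difference between the two distance_type values can be illustrated in Python. This sketch uses the haversine formula as a stand-in for arc and an equirectangular approximation as a stand-in for plane, with an assumed mean Earth radius; it is not a copy of Elasticsearch's code. It also applies mode to a document with several points and returns infinity for a document without points, matching the note above.

```python
import math
import statistics

# Assumed mean Earth radius in meters.
EARTH_RADIUS_M = 6_371_008.7714


def arc_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters (haversine), standing in for distance_type=arc."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def plane_distance(lat1, lon1, lat2, lon2):
    """Equirectangular approximation in meters, standing in for distance_type=plane."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return math.hypot(x, y) * EARTH_RADIUS_M


REDUCERS = {"min": min, "max": max, "median": statistics.median,
            "avg": lambda v: sum(v) / len(v)}


def geo_sort_value(doc_points, ref, mode="min", unit_m=1000.0, distance=arc_distance):
    """Sort value for one document: distances to `ref`, reduced by `mode`, in units of `unit_m` meters."""
    if not doc_points:
        return math.inf  # no configurable missing value: treated as infinitely far
    dists = [distance(lat, lon, *ref) / unit_m for lat, lon in doc_points]
    return REDUCERS[mode](dists)


ref = (40.0, -70.0)  # (lat, lon) of the [-70, 40] reference point above
arc_km = geo_sort_value([(42.0, -71.0)], ref)
plane_km = geo_sort_value([(42.0, -71.0)], ref, distance=plane_distance)
print(round(arc_km, 2), round(plane_km, 2))
```

For the roughly 240 km between these two points the two methods agree closely; the plane approximation degrades over long distances and near the poles, as described above.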

The following formats are supported for providing the coordinates:

Lat Lon as Properties

GET /_search
{
  "sort" : [
    {
      "_geo_distance" : {
        "pin.location" : {
          "lat" : 40,
          "lon" : -70
        },
        "order" : "asc",
        "unit" : "km"
      }
    }
  ],
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}

Lat Lon as String

Format: lat,lon.

GET /_search
{
  "sort": [
    {
      "_geo_distance": {
        "pin.location": "40,-70",
        "order": "asc",
        "unit": "km"
      }
    }
  ],
  "query": {
    "term": { "user": "kimchy" }
  }
}

Geohash

GET /_search
{
  "sort": [
    {
      "_geo_distance": {
        "pin.location": "drm3btev3e86",
        "order": "asc",
        "unit": "km"
      }
    }
  ],
  "query": {
    "term": { "user": "kimchy" }
  }
}

Lat Lon as Array

Format: [lon, lat]. Note the order of lon/lat here, which conforms to GeoJSON.

GET /_search
{
  "sort": [
    {
      "_geo_distance": {
        "pin.location": [ -70, 40 ],
        "order": "asc",
        "unit": "km"
      }
    }
  ],
  "query": {
    "term": { "user": "kimchy" }
  }
}
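The four coordinate formats above can be normalized to a single (lat, lon) pair, as in this Python sketch. The function names are hypothetical, and the geohash decoder returns the center of the geohash cell. Decoding drm3btev3e86 this way gives roughly lat 41.12, lon -71.34, so the Geohash example does not refer to exactly the same point as the 40, -70 used in the other examples.

```python
# Base32 alphabet used by geohashes.
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(gh):
    """Return the (lat, lon) center of a geohash cell."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    even = True  # geohash bits alternate, starting with longitude
    for ch in gh:
        bits = _BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            bit = (bits >> shift) & 1
            if even:
                mid = (lon_lo + lon_hi) / 2
                lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
            else:
                mid = (lat_lo + lat_hi) / 2
                lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
            even = not even
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


def to_lat_lon(point):
    """Normalize the four documented formats to a (lat, lon) tuple."""
    if isinstance(point, dict):                  # {"lat": 40, "lon": -70}
        return float(point["lat"]), float(point["lon"])
    if isinstance(point, (list, tuple)):         # [lon, lat], GeoJSON order
        lon, lat = point
        return float(lat), float(lon)
    if isinstance(point, str) and "," in point:  # "lat,lon"
        lat, lon = point.split(",")
        return float(lat), float(lon)
    if isinstance(point, str):                   # geohash
        return decode_geohash(point)
    raise TypeError(f"unsupported point format: {point!r}")


for p in ({"lat": 40, "lon": -70}, "40,-70", [-70, 40], "drm3btev3e86"):
    print(p, to_lat_lon(p))
```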

Multiple reference points

Multiple geo points can be passed as an array containing any geo_point format, for example:

GET /_search
{
  "sort": [
    {
      "_geo_distance": {
        "pin.location": [ [ -70, 40 ], [ -71, 42 ] ],
        "order": "asc",
        "unit": "km"
      }
    }
  ],
  "query": {
    "term": { "user": "kimchy" }
  }
}

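The final distance for a document is then the min/max/avg (defined via mode) distance of all points contained in the document to all points given in the sort request.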
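A short Python sketch of that rule: compute the distance for every (document point, reference point) pair and reduce the list with mode. It uses a haversine distance with an assumed Earth radius; the reference points are the ones from the request above converted from [lon, lat] to (lat, lon), and the document point is made up.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    (lat1, lon1), (lat2, lon2) = a, b
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


REDUCERS = {"min": min, "max": max, "avg": lambda v: sum(v) / len(v)}


def multi_ref_sort_value(doc_points, ref_points, mode="min"):
    """Reduce the distances of every (document point, reference point) pair with `mode`."""
    dists = [haversine_km(p, r) for p in doc_points for r in ref_points]
    return REDUCERS[mode](dists) if dists else math.inf


# [[-70, 40], [-71, 42]] from the request, converted from [lon, lat] to (lat, lon).
refs = [(40.0, -70.0), (42.0, -71.0)]
doc = [(40.0, -70.0)]  # made-up document point
nearest = multi_ref_sort_value(doc, refs, "min")
farthest = multi_ref_sort_value(doc, refs, "max")
average = multi_ref_sort_value(doc, refs, "avg")
print(nearest, round(farthest, 1), round(average, 1))
```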

Script Based Sorting

Allows sorting based on custom scripts. Here is an example:

GET /_search
{
  "query": {
    "term": { "user": "kimchy" }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "doc['field_name'].value * params.factor",
        "params": {
          "factor": 1.1
        }
      },
      "order": "asc"
    }
  }
}
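Conceptually, the script produces one numeric key per hit and the hits are ordered by that key. The following Python sketch mirrors the Painless expression on made-up data; the hit structure and the `script_value` name are hypothetical.

```python
# Made-up hits carrying a numeric "field_name".
hits = [{"id": "a", "field_name": 10}, {"id": "b", "field_name": 2}, {"id": "c", "field_name": 7}]
params = {"factor": 1.1}


def script_value(doc, params):
    """Python mirror of the Painless source: doc['field_name'].value * params.factor."""
    return doc["field_name"] * params["factor"]


# "type": "number" means the script yields numeric keys; "order": "asc" sorts them ascending.
ordered = sorted(hits, key=lambda h: script_value(h, params))
print([(h["id"], round(script_value(h, params), 2)) for h in ordered])
```

With a positive constant factor the resulting order is the same as sorting on field_name directly; a script sort becomes useful when the key combines several fields or parameters.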