Translated documentation: Elasticsearch 7.13.* search aggregations

东风不来

Article link: https://blog.csdn.net/u010953706/article/details/119703999

https://www.elastic.co/guide/en/elasticsearch/reference/7.13/search-aggregations.html
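An aggregation summarizes your data as metrics, statistics, or other analytics. Aggregations help you answer questions like:
* What is the average load time for my website?
* Who are my most valuable customers, based on transaction volume?
* What would be considered a large file on my network?
* How many products are in each product category?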

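Elasticsearch organizes aggregations into three categories:
* Metric aggregations: calculate metrics, such as a sum or an average, from field values.
* Bucket aggregations: group documents into buckets, also called bins, based on field values, ranges, or other criteria.
* Pipeline aggregations: take their input from other aggregations instead of from fields or documents.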

Run an aggregation:
Aggregations are defined in the aggs parameter of the search API. The following search runs a terms aggregation on the field my-field.

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      }
    }
  }
}
'

Response:

{
  "took": 78,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [...]
  },
  "aggregations": {
    "my-agg-name": {                               (1)
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

(1): Results for the my-agg-name aggregation.
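To make the response shape concrete, here is a minimal Python sketch that reads the buckets of a terms aggregation from a response body. The bucket keys and counts below are made up for illustration (the response above has an empty buckets array), and bucket_counts is a hypothetical helper, not part of any Elasticsearch client.

```python
import json

# Hypothetical response body: the bucket keys and counts are invented for
# illustration. Only the overall structure follows the response shown above.
RESPONSE_JSON = """
{
  "aggregations": {
    "my-agg-name": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": "foo", "doc_count": 3},
        {"key": "bar", "doc_count": 2}
      ]
    }
  }
}
"""


def bucket_counts(response, agg_name):
    """Map each bucket key of a terms aggregation to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


response = json.loads(RESPONSE_JSON)
counts = bucket_counts(response, "my-agg-name")
print(counts)
```

The aggregation name used in the request ("my-agg-name") is the key under "aggregations" in the response, so the same name has to be used when reading the result.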

Change an aggregation's scope:
Use the query parameter to limit the documents on which an aggregation runs:

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  },
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      }
    }
  }
}
'

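Return only aggregation results:
By default, a search that contains an aggregation returns both search hits and aggregation results. To return only aggregation results, set size to 0: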

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      }
    }
  }
}
'

Run multiple aggregations:
You can specify multiple aggregations in the same request:

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "my-first-agg-name": {
      "terms": {
        "field": "my-field"
      }
    },
    "my-second-agg-name": {
      "avg": {
        "field": "my-other-field"
      }
    }
  }
}
'

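Run sub-aggregations:
Bucket aggregations support bucket or metric sub-aggregations. For example, a terms aggregation with an avg sub-aggregation calculates an average value for the documents in each bucket. There is no limit on the number of levels or the depth of nested sub-aggregations.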

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      },
      "aggs": {
        "my-sub-agg-name": {
          "avg": {
            "field": "my-other-field"
          }
        }
      }
    }
  }
}
'

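As shown above, a bucket aggregation can split each bucket into further buckets or compute metrics for each bucket. It also follows that a metric aggregation cannot take sub-aggregations of its own.
The response nests sub-aggregation results under their parent aggregation: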

{
  ...
  "aggregations": {
    "my-agg-name": {                          (1)                   
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "foo",
          "doc_count": 5,
          "my-sub-agg-name": {                (2)                 
            "value": 75.0
          }
        }
      ]
    }
  }
}

(1): Results for the parent aggregation, my-agg-name.
(2): Results for my-sub-agg-name, the sub-aggregation of my-agg-name.

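As a rough illustration of that nesting, the following Python sketch walks the buckets of the parent aggregation and picks out the sub-aggregation value stored inside each bucket. It uses the values from the response above; sub_agg_values is a hypothetical helper name.

```python
# The "aggregations" part of the response shown above.
aggregations = {
    "my-agg-name": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "foo",
                "doc_count": 5,
                "my-sub-agg-name": {"value": 75.0},
            }
        ],
    }
}


def sub_agg_values(aggs, agg_name, sub_agg_name):
    """Map each parent bucket key to the value of its metric sub-aggregation."""
    return {
        bucket["key"]: bucket[sub_agg_name]["value"]
        for bucket in aggs[agg_name]["buckets"]
    }


averages = sub_agg_values(aggregations, "my-agg-name", "my-sub-agg-name")
print(averages)
```

Each bucket carries its own copy of the sub-aggregation result, keyed by the sub-aggregation's name, next to the bucket's key and doc_count.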
Add custom metadata:
Use the meta object to attach custom metadata to an aggregation:

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "my-agg-name": {
      "terms": {
        "field": "my-field"
      },
      "meta": {
        "my-metadata-field": "foo"
      }
    }
  }
}
'

The response returns the meta object inside the aggregation's result:

{
  ...
  "aggregations": {
    "my-agg-name": {
      "meta": {
        "my-metadata-field": "foo"
      },
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

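Return the aggregation type:
By default, aggregation results include the aggregation's name but not its type. To return the aggregation type, use the typed_keys query parameter.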

curl -X GET "localhost:9200/my-index-000001/_search?typed_keys&pretty" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "my-agg-name": {
      "histogram": {
        "field": "my-field",
        "interval": 1000
      }
    }
  }
}
'

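typed_keys is passed in the request URL. In the response, the aggregation type appears as a prefix to the aggregation's name, separated from it by a # character.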

{
  ...
  "aggregations": {
    "histogram#my-agg-name": {                 
      "buckets": []
    }
  }
}

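Note: some aggregations return a different aggregation type from the one in the request.
For example, the terms, significant terms, and percentiles aggregations return different aggregation types depending on the data type of the aggregated field.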
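When reading a typed_keys response, the prefixed key has to be split back into type and name. A minimal Python sketch, assuming the first # in the key separates the two; split_typed_key is a hypothetical helper name.

```python
def split_typed_key(key):
    """Split a typed key such as 'histogram#my-agg-name' into (type, name).

    Assumption: the first '#' is the separator added by typed_keys. Keys
    without a '#' are returned with a type of None.
    """
    agg_type, separator, name = key.partition("#")
    if not separator:
        return None, key
    return agg_type, name


# The "aggregations" part of the typed_keys response shown above.
aggregations = {"histogram#my-agg-name": {"buckets": []}}

parsed = {split_typed_key(key): value for key, value in aggregations.items()}
print(parsed)
```

Because the returned type can differ from the requested one, it is safer to look results up by the name part and treat the type as information rather than as a fixed key.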

Use scripts in an aggregation:
When a field does not exactly match the aggregation you need, aggregate on a runtime field instead.

curl -X GET "localhost:9200/my-index-000001/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
  "runtime_mappings": {
    "message.length": {
      "type": "long",
      "script": "emit(doc[\u0027message.keyword\u0027].value.length())"
    }
  },
  "aggs": {
    "message_length": {
      "histogram": {
        "interval": 10,
        "field": "message.length"
      }
    }
  }
}
'

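Scripts calculate field values dynamically, which adds a little overhead to the aggregation.
Beyond the time spent calculating, some aggregations, such as terms and filters, cannot use some of their optimizations with runtime fields.
In total, the performance cost of using a runtime field varies from aggregation to aggregation.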

Aggregation caches:

For faster responses, Elasticsearch caches the results of frequently run aggregations in the shard request cache.
To get cached results, use the same preference string for each search.
If you don't need search hits, set size to 0 to avoid filling the cache.

Elasticsearch routes searches with the same preference string to the same shards. If the shards’ data doesn’t change between searches, the shards return cached aggregation results.
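One way to keep the preference string stable is to build the search URL in a single place. The sketch below only constructs the URL and does not contact a cluster; the host, the index name, and the preference value "my-session-id" are assumptions for illustration, and search_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def search_url(base_url, index, preference):
    """Build a _search URL that returns no hits and reuses one preference string."""
    query = urlencode({"size": 0, "preference": preference})
    return f"{base_url}/{index}/_search?{query}"


# Hypothetical values: reuse the same preference string for every repeated search.
url = search_url("http://localhost:9200", "my-index-000001", "my-session-id")
print(url)
```

Sending the same aggregation body to this URL on every call keeps the requests on the same shards, which is the precondition described above for getting cached results.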

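Limits for long values:
When running aggregations, Elasticsearch uses double values to hold and represent numeric data. As a result, aggregations on long numbers greater than 2^53 are approximate.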
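The 2^53 boundary comes from the 64-bit double format, which cannot represent every integer above that value. The short Python sketch below illustrates the precision loss with Python's own float type, which is also a 64-bit double; it says nothing about how Elasticsearch computes aggregations internally.

```python
LIMIT = 2 ** 53  # 9007199254740992

# 2^53 itself converts to a double without loss.
exact_ok = float(LIMIT) == LIMIT

# 2^53 + 1 has no double representation, so converting it rounds to a
# neighbouring representable value and the original number is lost.
above = LIMIT + 1
as_double = float(above)
lossy = int(as_double) != above

print(exact_ok, lossy, int(as_double))
```

Any long value that passes through a double in this way may come back slightly changed, which is why results on such values should be treated as approximate.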