Elasticsearch (Part 10): Aggregations

The aggregations framework helps provide aggregated data based on a search query. It is based on simple building blocks called aggregations, that can be composed in order to build complex summaries of the data.

In short, aggregations are used to run complex summary queries over the data.

An aggregation can be seen as a unit-of-work that builds analytic information over a set of documents. The context of the execution defines what this document set is (e.g. a top-level aggregation executes within the context of the executed query/filters of the search request).

Put simply, an aggregation is a unit of work that analyzes information over a set of documents.

There are many different types of aggregations, each with its own purpose and output. To better understand these types, it is often easier to break them into four main families:

  • Bucketing
  • Metric
  • Matrix
  • Pipeline

Bucketing
A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion. When the aggregation is executed, all the bucket criteria are evaluated on every document in the context, and when a criterion matches, the document is considered to "fall in" the relevant bucket. By the end of the aggregation process, we end up with a list of buckets, each one with a set of documents that "belong" to it.
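The bucketing process described above can be imitated locally. The following is only a sketch in plain Python, not Elasticsearch code: the documents, the bucket keys (`cheap`, `mid`, `expensive`) and the helper `build_buckets` are all made up for illustration.

```python
# Local sketch of bucketing: each bucket has a key and a criterion,
# and every document in the context is tested against every criterion.
docs = [{"price": 5}, {"price": 25}, {"price": 60}, {"price": 120}]

# Hypothetical bucket definitions (key -> criterion), similar in spirit
# to what a range aggregation would define.
bucket_criteria = {
    "cheap": lambda d: d["price"] < 50,
    "mid": lambda d: 50 <= d["price"] < 100,
    "expensive": lambda d: d["price"] >= 100,
}


def build_buckets(documents, criteria):
    """Return {bucket key: [documents that matched the bucket's criterion]}."""
    buckets = {key: [] for key in criteria}
    for doc in documents:
        for key, matches in criteria.items():
            if matches(doc):
                # The document "falls in" this bucket.
                buckets[key].append(doc)
    return buckets


buckets = build_buckets(docs, bucket_criteria)
for key, members in buckets.items():
    print(key, len(members))
```

Each bucket ends up holding the documents that belong to it, which is the list of buckets the paragraph above refers to.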

Metric
Aggregations that keep track of and compute metrics over a set of documents.

Matrix
A family of aggregations that operate on multiple fields and produce a matrix result based on the values extracted from the requested document fields. Unlike metric and bucket aggregations, this aggregation family does not yet support scripting.

Pipeline
Aggregations that aggregate the output of other aggregations and their associated metrics.

The interesting part comes next. Since each bucket effectively defines a document set (all documents belonging to the bucket), one can associate aggregations at the bucket level, and those will execute within the context of that bucket. This is where the real power of aggregations kicks in: aggregations can be nested!
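To make the idea of nesting concrete, here is a small local sketch in plain Python (again, not Elasticsearch itself). It groups documents into buckets by one field and then computes an average inside each bucket. The sample documents and the helper name `terms_with_avg` are assumptions made for this illustration.

```python
from collections import defaultdict

# Made-up sample documents.
docs = [
    {"age": 20, "grade": 80},
    {"age": 20, "grade": 90},
    {"age": 21, "grade": 70},
]


def terms_with_avg(documents, bucket_field, metric_field):
    """Group by bucket_field, then average metric_field inside each bucket."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[bucket_field]].append(doc[metric_field])
    # The inner metric runs once per bucket, over that bucket's documents only.
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    }


result = terms_with_avg(docs, "age", "grade")
print(result)
```

The inner average never sees documents from another bucket, which is what "executing within the context of that bucket" means.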

Structure of an aggregation

"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    [,"<aggregation_name_2>" : { ... } ]*
}

The aggregations object (the key aggs can also be used) in the JSON holds the aggregations to be computed. Each aggregation is associated with a logical name that the user defines (e.g. if the aggregation computes the average price, then it would make sense to name it avg_price). These logical names will also be used to uniquely identify the aggregations in the response. Each aggregation has a specific type (<aggregation_type> in the above snippet) and is typically the first key within the named aggregation body. Each type of aggregation defines its own body, depending on the nature of the aggregation (e.g. an avg aggregation on a specific field will define the field on which the average will be calculated). At the same level of the aggregation type definition, one can optionally define a set of additional aggregations, though this only makes sense if the aggregation you defined is of a bucketing nature. In this scenario, the sub-aggregations you define on the bucketing aggregation level will be computed for all the buckets built by the bucketing aggregation. For example, if you define a set of aggregations under the range aggregation, the sub-aggregations will be computed for the range buckets that are defined.
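As a sketch of how the structure above maps onto a concrete request body, the following Python builds a `range` aggregation with an `avg` sub-aggregation as a plain dictionary and prints it as JSON. The helper `named_agg`, the aggregation names, the `price` field and the range boundaries are hypothetical. Nothing is sent to a cluster.

```python
import json


def named_agg(name, agg_type, body, sub_aggs=None):
    """Build {name: {agg_type: body[, "aggs": sub_aggs]}} as in the structure above."""
    definition = {agg_type: body}
    if sub_aggs:
        # Sub-aggregations sit at the same level as the aggregation type.
        definition["aggs"] = sub_aggs
    return {name: definition}


# Metric aggregation: the body names the field to average.
avg_price = named_agg("avg_price", "avg", {"field": "price"})

# Bucketing aggregation: avg_price will be computed once per range bucket.
price_ranges = named_agg(
    "price_ranges",
    "range",
    {"field": "price", "ranges": [{"to": 50}, {"from": 50, "to": 100}, {"from": 100}]},
    sub_aggs=avg_price,
)

request_body = {"size": 0, "aggs": price_ranges}
print(json.dumps(request_body, indent=2))
```

The printed JSON could then be used as the body of a `_search` request against an index that has a numeric `price` field.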

Values Source

Some aggregations work on values extracted from the aggregated documents. Typically, the values will be extracted from a specific document field which is set using the field key for the aggregations. It is also possible to define a script which will generate the values (per document).

When both field and script settings are configured for the aggregation, the script will be treated as a value script. While normal scripts are evaluated on a document level (i.e. the script has access to all the data associated with the document), value scripts are evaluated on the value level. In this mode, the values are extracted from the configured field and the script is used to apply a "transformation" over these values.

Elasticsearch uses the type of the field in the mapping in order to figure out how to run the aggregation and format the response. However, there are two cases in which Elasticsearch cannot figure out this information: unmapped fields (for instance, in a search request across multiple indices where only some of them have a mapping for the field) and pure scripts. For those cases, it is possible to give Elasticsearch a hint using the value_type option, which accepts the following values: string, long (works for all integer types), double (works for all decimal types like float or scaled_float), date, ip and boolean.

Metrics aggregations

The aggregations in this family compute metrics based on values extracted in one way or another from the documents that are being aggregated. The values are typically extracted from the fields of the document (using the field data), but can also be generated using scripts.

Numeric metrics aggregations are a special type of metrics aggregation which output numeric values. Some aggregations output a single numeric metric (e.g. avg) and are called single-value numeric metrics aggregation, others generate multiple metrics (e.g. stats) and are called multi-value numeric metrics aggregation. The distinction between single-value and multi-value numeric metrics aggregations plays a role when these aggregations serve as direct sub-aggregations of some bucket aggregations (some bucket aggregations enable you to sort the returned buckets based on the numeric metrics in each bucket).
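The difference between single-value and multi-value metrics can be sketched locally. The two functions below are stand-ins written for this illustration (they are not Elasticsearch APIs). They mimic the shape of an `avg` result and a `stats` result over some made-up grades.

```python
grades = [60, 75, 90]  # made-up values


def avg_metric(values):
    """Single-value metric: one number, under the key "value"."""
    return {"value": sum(values) / len(values)}


def stats_metric(values):
    """Multi-value metric: several numbers computed in one pass."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


print(avg_metric(grades))
print(stats_metric(grades))
```

Because a multi-value metric returns several numbers, sorting buckets by it requires naming which of those numbers to sort on, whereas a single-value metric can be referenced by its name alone.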

Avg aggregation

A single-value metrics aggregation that computes the average of numeric values extracted from the aggregated documents. These values can be extracted either from specific numeric fields in the documents or generated by a provided script.

Assuming the data consists of documents representing students' exam grades (between 0 and 100), we can average their scores with:


POST /exams/_search?size=0
{
    "aggs" : {
        "avg_grade" : { "avg" : { "field" : "grade" } }
    }
}


The above aggregation computes the average grade over all documents. The aggregation type is avg and the field setting defines the numeric field of the documents the average will be computed on. The above will return the following:


{
    ...
    "aggregations": {
        "avg_grade": {
            "value": 75.0
        }
    }
}

The name of the aggregation (avg_grade above) also serves as the key by which the aggregation result can be retrieved from the returned response.
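A minimal sketch of reading that result on the client side, assuming the response has already been received as a JSON string. The string below contains only the `aggregations` part of the sample response above; how the response is fetched is left out.

```python
import json

# The aggregations part of the sample response, as raw JSON text.
raw = '{"aggregations": {"avg_grade": {"value": 75.0}}}'

response = json.loads(raw)

# The aggregation name chosen in the request is the lookup key in the response.
avg_grade = response["aggregations"]["avg_grade"]["value"]
print(avg_grade)
```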

Aggregation queries

Sum of a field


GET /lib5/item/_search
{
  "aggs":{
    "price_of_sum":{
      "sum":{
        "field":"price"
      }
    }
  }
}

Minimum value of a field

GET /lib5/item/_search
{
  "size":0,
  "aggs":{
    "price_of_min":{
      "min":{
        "field":"price"
      }
    }
  }
}

Average of a field


GET /lib5/item/_search
{
  "size":0,
  "aggs":{
    "price_of_avg":{
      "avg":{
        "field":"price"
      }
    }
  }
}

Cardinality: the number of distinct values


GET /lib5/item/_search
{
  "size":0,
  "aggs":{
    "price_of_cardinality":{
      "cardinality":{
        "field":"price"
      }
    }
  }
}
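What a cardinality request counts can be sketched with a local, exact computation. The sample items and the helper `exact_cardinality` are made up for this illustration. Note that Elasticsearch's cardinality aggregation returns an approximate count, so on large data sets its result may differ slightly from an exact count like this one.

```python
# Made-up sample documents.
items = [{"price": 10}, {"price": 25}, {"price": 10}, {"price": 40}]


def exact_cardinality(documents, field):
    """Count distinct values of `field`, skipping documents that lack it."""
    return len({doc[field] for doc in documents if field in doc})


distinct_prices = exact_cardinality(items, "price")
print(distinct_prices)
```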

Grouping

GET /lib5/item/_search
{
  "size":0,
  "aggs":{
    "price_of_group":{
      "terms":{
        "field":"price"
      }
    }
  }
  
}
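The shape of a terms result can be sketched locally: one bucket per distinct value, each with a `key` and a `doc_count`, ordered by document count in descending order. The sample items and the helper `terms_buckets` are assumptions for illustration. The tie-break by key and the default of 10 buckets are written here to mirror the aggregation's defaults as I understand them.

```python
from collections import Counter

# Made-up sample documents.
items = [
    {"price": 10}, {"price": 25}, {"price": 10},
    {"price": 40}, {"price": 10}, {"price": 25},
]


def terms_buckets(documents, field, size=10):
    """Return up to `size` buckets, most frequent value first."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    # Sort by doc_count descending; break ties by key ascending (assumed).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


buckets = terms_buckets(items, "price")
for bucket in buckets:
    print(bucket)
```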


Exercise: among users who like singing, group by age


GET /lib4/user/_search
{
  "size":0,
  "query": {
    "match": {
      "interest": "唱歌"
    }
  },
  "aggs": {
    "age_of_group": {
      "terms": {"field":"age"}
    }
  }
}
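The effect of combining a query with an aggregation can be imitated locally: the query narrows the document set first, and the aggregation then runs only over the matches. This is a simplified sketch. The users are made up, the helper `search_then_group` is hypothetical, and matching a whole whitespace-separated word is only a rough stand-in for what a `match` query does after text analysis.

```python
from collections import Counter

# Made-up sample users; interests are space-separated words.
users = [
    {"name": "a", "age": 20, "interest": "singing dancing"},
    {"name": "b", "age": 20, "interest": "singing"},
    {"name": "c", "age": 25, "interest": "reading"},
    {"name": "d", "age": 25, "interest": "singing hiking"},
]


def search_then_group(documents, keyword, group_field):
    """Filter by keyword first, then count the matches per group_field value."""
    matched = [d for d in documents if keyword in d["interest"].split()]
    return Counter(d[group_field] for d in matched)


age_groups = search_then_group(users, "singing", "age")
print(dict(age_groups))
```

User `c` does not match the query, so it contributes to no bucket, which is the sense in which a top-level aggregation executes within the context of the query.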

Compound queries

(Two screenshots from the original post are not available.)

References

Official Elasticsearch documentation: Aggregations
