Elasticsearch from Getting Started to Giving Up (4) -- Aggregations [Based on the Official 7.5 Documentation]

疯狂学习的白菜

Category: ElasticSearch

Article link: https://blog.csdn.net/xcvbxv01/article/details/103537636

View the original note (with source code and images): http://note.youdao.com/noteshare?id=06d431f9eab9bec860f12b96a2590500&sub=302E360D41E04B4C99570C983B8906B0

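I. Avg Aggregation

1. A single-value metrics aggregation that computes the average of the numeric values extracted from the aggregated documents. These values can be extracted from a specific numeric field in the documents or generated by a provided script.

(1) A simple average of the account balance

GET bank/_search?size=0
{
  "aggs": {
    "avg_grade": {
      "avg": { "field": "balance" }
    }
  }
}

The aggregation above computes the average balance over all documents. The aggregation type is avg, and the field setting names the numeric document field whose average is computed.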

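2. Script

(1) Plain script: the script parameter is interpreted as an inline script in the Painless scripting language, with no script parameters

GET bank/_search?size=0
{
  "aggs": {
    "avg_age": {
      "avg": {
        "script": { "source": "doc.balance.value + 2" }
      }
    }
  }
}

(2) Value script: a script applied to each value of a field, here with a script parameter. The field value is available as _value, and params.correction11 adds 2 to every age before the average is taken. (This is not a stored script; a stored script would be referenced by its id.)

GET bank/_search?size=0
{
  "aggs": {
    "avg_change_age": {
      "avg": {
        "field": "age",
        "script": {
          "lang": "painless",
          "source": "_value + params.correction11",
          "params": { "correction11": 2 }
        }
      }
    }
  }
}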

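(3) The missing parameter defines how documents that lack a value are treated. By default they are ignored, but they can also be treated as if they had a value

GET bank/_search?size=0
{
  "aggs": {
    "age_avg": {
      "avg": { "field": "age", "missing": 10 }
    }
  }
}

Documents without a value in the age field fall into the same bucket as documents whose value is 10 (that is, a missing value is treated as 10)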
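To make the effect of missing concrete, here is a minimal Python sketch of the arithmetic only; it does not talk to Elasticsearch and says nothing about how Elasticsearch implements the aggregation. The helper name avg_with_missing and the sample ages are hypothetical.

```python
from typing import Optional, Sequence


def avg_with_missing(values: Sequence[Optional[float]],
                     missing: Optional[float] = None) -> Optional[float]:
    """Average `values`, where None stands for a document without the field.

    With missing=None such documents are skipped (the default behaviour
    described above); otherwise each of them is counted as `missing`.
    """
    if missing is None:
        kept = [v for v in values if v is not None]
    else:
        kept = [missing if v is None else v for v in values]
    return sum(kept) / len(kept) if kept else None


ages = [30, None, 50]                      # hypothetical sample data
print(avg_with_missing(ages))              # documents without age are ignored
print(avg_with_missing(ages, missing=10))  # documents without age count as 10
```

With the default, only two documents contribute to the average; with missing set to 10, all three do, which pulls the result down.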

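II. Weighted Avg Aggregation

In a weighted average, each data point carries its own weight. How much each data point contributes to the final value is extracted from the document or provided by a script.

A regular average can be thought of as a weighted average in which every value has an implicit weight of 1. As a formula: weighted average = ∑(value * weight) / ∑(weight)

Parameter	Description	Required
value	Configuration of the field or script that provides the values	Required
weight	Configuration of the field or script that provides the weights	Required
format	Formatter for the numeric response	Optional
value_type	A hint about the values for pure scripts or unmapped fields	Optional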

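Example 1: if the documents have a "grade" field holding a score from 0 to 100 and a "weight" field holding an arbitrary numeric weight, the weighted average can be computed like this

POST /exams/_search
{
  "size": 0,
  "aggs": {
    "weighted_grade": {
      "weighted_avg": {
        "value": { "field": "grade" },
        "weight": { "field": "weight" }
      }
    }
  }
}

The value field may hold multiple values per document, but only one weight is allowed per document. If the aggregation meets a document with more than one weight (for example, when the weight field is multi-valued), it throws an exception. In that case, specify a script for the weight and use it to combine the multiple weights into the single value to use.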

Example 2: averaging a document that has several values but a single weight

The three values (1, 2, 3) are included as independent values, each with a weight of 2

POST /exams/_doc?refresh
{
  "grade": [1, 2, 3],
  "weight": 2
}

POST /exams/_search
{
  "size": 0,
  "aggs": {
    "weighted_grade": {
      "weighted_avg": {
        "value": { "field": "grade" },
        "weight": { "field": "weight" }
      }
    }
  }
}

The result is 2.0, which matches what we get when computing by hand: ((1*2) + (2*2) + (3*2)) / (2+2+2) == 2
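The hand calculation can be checked with a few lines of Python. This is only a sketch of the formula ∑(value * weight) / ∑(weight) applied to the document from the example; the function name weighted_avg and the (values, weight) tuple layout are choices made for this illustration.

```python
def weighted_avg(docs):
    """Weighted average: sum(value * weight) / sum(weight).

    Each doc is a (values, weight) pair; every value of a multi-valued
    field is paired with the document's single weight.
    """
    numerator = 0.0
    denominator = 0.0
    for values, weight in docs:
        for value in values:
            numerator += value * weight
            denominator += weight
    return numerator / denominator


# The document indexed above: grade = [1, 2, 3], weight = 2
print(weighted_avg([([1, 2, 3], 2)]))  # 12 / 6
```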

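Both the value and the weight can be derived from a script instead of a field:

Example 3: the following uses scripts to add 1 to both the grade and the weight of each document

POST /exams/_search
{
  "size": 0,
  "aggs": {
    "weighted_grade": {
      "weighted_avg": {
        "value": { "script": "doc.grade.value + 1" },
        "weight": { "script": "doc.weight.value + 1" }
      }
    }
  }
}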

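Missing values

The missing parameter defines how documents that lack a value are treated. The default behaviour differs between value and weight.

By default, if the value field is missing, the document is ignored and the aggregation moves on to the next document. If the weight field is missing, the weight is assumed to be 1 (as in a regular average).

Both defaults can be overridden with the missing parameter

POST /exams/_search
{
  "size": 0,
  "aggs": {
    "weighted_grade": {
      "weighted_avg": {
        "value": { "field": "grade", "missing": 2 },
        "weight": { "field": "weight", "missing": 3 }
      }
    }
  }
}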
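The two defaults and their overrides can be sketched in Python as follows. This mirrors only the rules stated above (a missing value skips the document, a missing weight counts as 1); the function name, the dict layout and the three sample documents are hypothetical.

```python
def weighted_avg_with_missing(docs, value_missing=None, weight_missing=1):
    """docs: list of dicts that may lack "grade" and/or "weight".

    value_missing=None reproduces the default (skip the document);
    weight_missing=1 reproduces the default weight of 1.
    """
    numerator = 0.0
    denominator = 0.0
    for doc in docs:
        value = doc.get("grade", value_missing)
        if value is None:
            continue  # no value and no substitute: ignore the document
        weight = doc.get("weight", weight_missing)
        numerator += value * weight
        denominator += weight
    return numerator / denominator if denominator else None


docs = [{"grade": 4, "weight": 1}, {"grade": 8}, {"weight": 5}]  # hypothetical
print(weighted_avg_with_missing(docs))        # defaults: third doc is skipped
print(weighted_avg_with_missing(docs, 2, 3))  # missing grade -> 2, weight -> 3
```

With the defaults the result is (4*1 + 8*1) / (1 + 1); with the overrides it becomes (4*1 + 8*3 + 2*5) / (1 + 3 + 5).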

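III. Extended Stats Aggregation

A multi-value metrics aggregation that computes statistics over the numeric values extracted from the aggregated documents. These values can be extracted from a specific numeric field in the documents or generated by a provided script.

1. Assume the data consists of documents representing students' exam grades (0 to 100)

GET /exams/_search
{
  "size": 0,
  "aggs": {
    "grades_stats": {
      "extended_stats": { "field": "grade" }
    }
  }
}

The aggregation above computes grade statistics over all documents. The aggregation type is extended_stats, and the field setting names the numeric document field the statistics are computed on. It returns the following

{
  ...
  "aggregations": {
    "grades_stats": {
      "count": 2,
      "min": 50.0,
      "max": 100.0,
      "avg": 75.0,
      "sum": 150.0,
      "sum_of_squares": 12500.0,
      "variance": 625.0,
      "std_deviation": 25.0,
      "std_deviation_bounds": {
        "upper": 125.0,
        "lower": 25.0
      }
    }
  }
}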
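The numbers in this response can be reproduced by hand. The sketch below assumes the two grades are 50 and 100 (inferred from min, max and count), uses the population variance, and places the bounds at avg ± 2 standard deviations, which is what the response values are consistent with. The function name and the sigma argument are choices made for this illustration.

```python
import math


def extended_stats(values, sigma=2.0):
    """Population statistics in the shape of the response above."""
    count = len(values)
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    avg = total / count
    # Population variance: E[x^2] - (E[x])^2, clamped against rounding noise
    variance = max(0.0, sum_of_squares / count - avg * avg)
    std = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std,
        "std_deviation_bounds": {
            "upper": avg + sigma * std,
            "lower": avg - sigma * std,
        },
    }


print(extended_stats([50.0, 100.0]))
```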

The name of the aggregation (grades_stats above) also serves as the key under which the result is found in the response.

IV. Geo Bounds Aggregation

A metric aggregation that computes the bounding box containing all geo_point values of a field

PUT /museums
{
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" }
    }
  }
}

POST /museums/_bulk?refresh
{"index":{"_id":1}}
{"location": "52.374081,4.912350", "name": "NEMO Science Museum"}
{"index":{"_id":2}}
{"location": "52.369219,4.901618", "name": "Museum Het Rembrandthuis"}
{"index":{"_id":3}}
{"location": "52.371667,4.914722", "name": "Nederlands Scheepvaartmuseum"}
{"index":{"_id":4}}
{"location": "51.222900,4.405200", "name": "Letterenhuis"}
{"index":{"_id":5}}
{"location": "48.861111,2.336389", "name": "Musée du Louvre"}
{"index":{"_id":6}}
{"location": "48.860000,2.327000", "name": "Musée d'Orsay"}

POST /museums/_search?size=0
{
  "query": {
    "match": { "name": "musée" }
  },
  "aggs": {
    "viewport": {
      "geo_bounds": {
        "field": "location",
        "wrap_longitude": true
      }
    }
  }
}

The geo_bounds aggregation specifies the field from which the bounds are obtained

wrap_longitude is an optional parameter that specifies whether the bounding box is allowed to overlap the international date line. The default value is true

{
  ...
  "aggregations": {
    "viewport": {
      "bounds": {
        "top_left": {
          "lat": 48.86111099738628,
          "lon": 2.3269999679178
        },
        "bottom_right": {
          "lat": 48.85999997612089,
          "lon": 2.3363889567553997
        }
      }
    }
  }
}
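For points that do not straddle the date line, the bounding box is simply the extreme latitudes and longitudes. The Python sketch below applies that to the two documents matched by the "musée" query; it deliberately ignores wrap_longitude, and the function name is made up for this illustration.

```python
def geo_bounds(points):
    """Bounding box of (lat, lon) pairs; ignores date-line wrapping."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


# The two museums matched by the query: Musée du Louvre and Musée d'Orsay
paris = [(48.861111, 2.336389), (48.860000, 2.327000)]
print(geo_bounds(paris))
```

The response above shows slightly different trailing digits (for example 48.86111099738628 instead of 48.861111), presumably because of the precision at which geo_point values are stored.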

V. Geo Centroid Aggregation

A metric aggregation that computes the weighted centroid from all coordinate values of a geo_point field.

PUT /museums
{
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" }
    }
  }
}

POST /museums/_bulk?refresh
{"index":{"_id":1}}
{"location": "52.374081,4.912350", "city": "Amsterdam", "name": "NEMO Science Museum"}
{"index":{"_id":2}}
{"location": "52.369219,4.901618", "city": "Amsterdam", "name": "Museum Het Rembrandthuis"}
{"index":{"_id":3}}
{"location": "52.371667,4.914722", "city": "Amsterdam", "name": "Nederlands Scheepvaartmuseum"}
{"index":{"_id":4}}
{"location": "51.222900,4.405200", "city": "Antwerp", "name": "Letterenhuis"}
{"index":{"_id":5}}
{"location": "48.861111,2.336389", "city": "Paris", "name": "Musée du Louvre"}
{"index":{"_id":6}}
{"location": "48.860000,2.327000", "city": "Paris", "name": "Musée d'Orsay"}

POST /museums/_search?size=0
{
  "aggs": {
    "centroid": {
      "geo_centroid": { "field": "location" }
    }
  }
}

The geo_centroid aggregation specifies the field used to compute the centroid. (Note: the field must be of type geo_point.)

{
  ...
  "aggregations": {
    "centroid": {
      "location": {
        "lat": 51.00982965203002,
        "lon": 3.9662131341174245
      },
      "count": 6
    }
  }
}

The geo_centroid aggregation becomes more interesting when it is used as a sub-aggregation of a bucket aggregation

POST /museums/_search?size=0
{
  "aggs": {
    "cities": {
      "terms": { "field": "city.keyword" },
      "aggs": {
        "centroid": {
          "geo_centroid": { "field": "location" }
        }
      }
    }
  }
}

{
  ...
  "aggregations": {
    "cities": {
      "sum_other_doc_count": 0,
      "doc_count_error_upper_bound": 0,
      "buckets": [
        {
          "key": "Amsterdam",
          "doc_count": 3,
          "centroid": {
            "location": { "lat": 52.371655656024814, "lon": 4.909563297405839 },
            "count": 3
          }
        },
        {
          "key": "Paris",
          "doc_count": 2,
          "centroid": {
            "location": { "lat": 48.86055548675358, "lon": 2.3316944623366 },
            "count": 2
          }
        },
        {
          "key": "Antwerp",
          "doc_count": 1,
          "centroid": {
            "location": { "lat": 51.22289997059852, "lon": 4.40519998781383 },
            "count": 1
          }
        }
      ]
    }
  }
}
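For this small dataset, the per-city centroids in the response are consistent with a plain arithmetic mean of the latitudes and longitudes in each bucket. The Python sketch below reproduces them that way; this is an observation about these six points, not a claim about how Elasticsearch computes centroids in general, and the function name is hypothetical.

```python
from collections import defaultdict


def centroids_by_city(docs):
    """Group (city, lat, lon) tuples by city and average the coordinates."""
    groups = defaultdict(list)
    for city, lat, lon in docs:
        groups[city].append((lat, lon))
    return {
        city: {
            "lat": sum(lat for lat, _ in pts) / len(pts),
            "lon": sum(lon for _, lon in pts) / len(pts),
            "count": len(pts),
        }
        for city, pts in groups.items()
    }


museums = [
    ("Amsterdam", 52.374081, 4.912350),
    ("Amsterdam", 52.369219, 4.901618),
    ("Amsterdam", 52.371667, 4.914722),
    ("Antwerp", 51.222900, 4.405200),
    ("Paris", 48.861111, 2.336389),
    ("Paris", 48.860000, 2.327000),
]
for city, centroid in centroids_by_city(museums).items():
    print(city, centroid)
```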

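VI. Max Aggregation

A single-value metrics aggregation that keeps track of and returns the maximum of the numeric values extracted from the aggregated documents. These values can be extracted from a specific numeric field in the documents or generated by a provided script.

Example: compute the maximum price over all documents

POST /sales/_search?size=0
{
  "aggs": {
    "max_price": {
      "max": { "field": "price" }
    }
  }
}

{
  ...
  "aggregations": {
    "max_price": {
      "value": 200.0
    }
  }
}

As shown, the name of the aggregation (max_price above) also serves as the key under which the result is found in the response.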

VII. Min Aggregation

Example: compute the minimum price over all documents

POST /sales/_search?size=0
{
  "aggs": {
    "min_price": {
      "min": { "field": "price" }
    }
  }
}

Response:

{
  ...
  "aggregations": {
    "min_price": {
      "value": 10.0
    }
  }
}

VIII. Scripted Metric Aggregation

A metric aggregation that is executed through scripts, which provide the metric output.

POST ledger/_search?size=0
{
  "query": { "match_all": {} },
  "aggs": {
    "profit": {
      "scripted_metric": {
        "init_script": "state.transactions = []",
        "map_script": "state.transactions.add(doc.type.value == 'sale' ? doc.amount.value : -1 * doc.amount.value)",
        "combine_script": "double profit = 0; for (t in state.transactions) { profit += t } return profit",
        "reduce_script": "double profit = 0; for (a in states) { profit += a } return profit"
      }
    }
  }
}

init_script is an optional parameter; all other scripts are required.

The aggregation above shows how a scripted metric aggregation can compute the total profit from sale and cost transactions.

{
  "took": 218,
  ...
  "aggregations": {
    "profit": {
      "value": 240.0
    }
  }
}

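init_script

Executed before any documents are collected. Lets the aggregation set up any initial state.

In the example above, init_script creates an array named transactions in the state object.

map_script

Executed once per collected document. This script is required.

If no combine_script is specified, the resulting state needs to be stored in the state object.

In the example above, map_script checks the value of the type field. If the value is sale, the value of the amount field is added to the transactions array; otherwise the negated value of the amount field is added.

combine_script

Executed once on each shard after document collection has finished. This script is required. Lets the aggregation consolidate the state returned from each shard.

In the example above, combine_script iterates over all stored transactions, adds the values up in the profit variable, and finally returns profit.

reduce_script

Executed once on the coordinating node after all shards have returned their results. This script is required. The script has access to a variable named states, which is an array holding the combine_script result from each shard.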

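Example: suppose the following documents are indexed into an index with two shards (A and B):

PUT /transactions/_bulk?refresh
{"index":{"_id":1}}
{"type": "sale","amount": 80}
{"index":{"_id":2}}
{"type": "cost","amount": 10}
{"index":{"_id":3}}
{"type": "cost","amount": 30}
{"index":{"_id":4}}
{"type": "sale","amount": 130}

Assume documents 1 and 3 end up on shard A, and documents 2 and 4 end up on shard B

The flow is as follows:

Before init_script (the initial state)

"state" : {}

After init_script: shards A and B each hold their own copy of the state

Shard A
"state" : { "transactions" : [] }

Shard B
"state" : { "transactions" : [] }

After map_script: the script runs for every document on each shard, and amounts of non-sale documents become negative

Shard A
"state" : { "transactions" : [ 80, -30 ] }

Shard B
"state" : { "transactions" : [ -10, 130 ] }

After combine_script: each shard sums its own transactions, 80 + (-30) on shard A and -10 + 130 on shard B

Shard A
50

Shard B
120

After reduce_script: the per-shard results are summed, 50 + 120

The final response is

{
  ...
  "aggregations": {
    "profit": {
      "value": 170
    }
  }
}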
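The four phases of the walk-through can be simulated in plain Python. This is only a sketch of the data flow under the shard assignment assumed above (documents 1 and 3 on shard A, 2 and 4 on shard B); the Python function names mirror the script names but are otherwise ordinary functions, not anything Elasticsearch provides.

```python
def init_script():
    # One fresh state object per shard
    return {"transactions": []}


def map_script(state, doc):
    # Sales count as positive amounts, everything else as negative
    amount = doc["amount"]
    state["transactions"].append(amount if doc["type"] == "sale" else -amount)


def combine_script(state):
    # Per-shard result: sum of that shard's transactions
    return sum(state["transactions"])


def reduce_script(states):
    # Coordinating node: sum of all per-shard results
    return sum(states)


shards = {
    "A": [{"type": "sale", "amount": 80}, {"type": "cost", "amount": 30}],
    "B": [{"type": "cost", "amount": 10}, {"type": "sale", "amount": 130}],
}

per_shard = []
for docs in shards.values():
    state = init_script()
    for doc in docs:
        map_script(state, doc)
    per_shard.append(combine_script(state))

profit = reduce_script(per_shard)
print(per_shard, profit)
```

Splitting the work this way means each shard only ships a single number to the coordinating node rather than its whole transactions list.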

IX. Sum Aggregation

A single-value metrics aggregation that sums up the numeric values extracted from the aggregated documents. These values can be extracted from a specific numeric field in the documents or generated by a provided script

POST /bank/_search?size=0
{
  "query": { "match_all": {} },
  "aggs": {
    "sum_1": {
      "sum": { "field": "age" }
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : { "value" : 1000, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "sum_1" : {
      "value" : 30171.0
    }
  }
}

X. Top Hits Aggregation

In the following example, the sales are grouped by type, and for each type the most recent sale is shown. For each sale, only the date and price fields are included in the source

POST /sales/_search?size=0
{
  "aggs": {
    "top_tags": {
      "terms": { "field": "type", "size": 3 },
      "aggs": {
        "top_sales_hits": {
          "top_hits": {
            "sort": [ { "date": { "order": "desc" } } ],
            "_source": { "includes": [ "date", "price" ] },
            "size": 1
          }
        }
      }
    }
  }
}

{
  ...
  "aggregations": {
    "top_tags": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "hat",
          "doc_count": 3,
          "top_sales_hits": {
            "hits": {
              "total": { "value": 3, "relation": "eq" },
              "max_score": null,
              "hits": [
                {
                  "_index": "sales",
                  "_type": "_doc",
                  "_id": "AVnNBmauCQpcRyxw6ChK",
                  "_source": { "date": "2015/03/01 00:00:00", "price": 200 },
                  "sort": [ 1425168000000 ],
                  "_score": null
                }
              ]
            }
          }
        },
        {
          "key": "t-shirt",
          "doc_count": 3,
          "top_sales_hits": {
            "hits": {
              "total": { "value": 3, "relation": "eq" },
              "max_score": null,
              "hits": [
                {
                  "_index": "sales",
                  "_type": "_doc",
                  "_id": "AVnNBmauCQpcRyxw6ChL",
                  "_source": { "date": "2015/03/01 00:00:00", "price": 175 },
                  "sort": [ 1425168000000 ],
                  "_score": null
                }
              ]
            }
          }
        },
        {
          "key": "bag",
          "doc_count": 1,
          "top_sales_hits": {
            "hits": {
              "total": { "value": 1, "relation": "eq" },
              "max_score": null,
              "hits": [
                {
                  "_index": "sales",
                  "_type": "_doc",
                  "_id": "AVnNBmatCQpcRyxw6ChH",
                  "_source": { "date": "2015/01/01 00:00:00", "price": 150 },
                  "sort": [ 1420070400000 ],
                  "_score": null
                }
              ]
            }
          }
        }
      ]
    }
  }
}
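Conceptually, this request does three things: bucket by type, sort each bucket by date descending, and keep the first size documents with only the selected source fields. The Python sketch below shows that logic on made-up data; the sample sales and the function name are hypothetical, and only the newest hat and bag entries are chosen to match the response above.

```python
from collections import defaultdict


def top_hits_by_type(sales, size=1):
    """Bucket sales by "type" and keep the `size` newest ones per bucket."""
    buckets = defaultdict(list)
    for sale in sales:
        buckets[sale["type"]].append(sale)

    result = {}
    for key, docs in buckets.items():
        # Dates in the fixed "YYYY/MM/DD HH:MM:SS" format sort correctly as strings
        newest = sorted(docs, key=lambda d: d["date"], reverse=True)[:size]
        result[key] = {
            "doc_count": len(docs),
            # Equivalent of _source.includes: keep only date and price
            "hits": [{"date": d["date"], "price": d["price"]} for d in newest],
        }
    return result


sales = [  # hypothetical sample data
    {"type": "hat", "date": "2015/01/01 00:00:00", "price": 100},
    {"type": "hat", "date": "2015/02/01 00:00:00", "price": 50},
    {"type": "hat", "date": "2015/03/01 00:00:00", "price": 200},
    {"type": "bag", "date": "2015/01/01 00:00:00", "price": 150},
]
print(top_hits_by_type(sales))
```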
