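Elasticsearch Aggregation (Part 8)
Pipeline aggregations
Cumulative sum aggregation
A parent pipeline aggregation that calculates the cumulative sum of a specified metric in a parent histogram (or date_histogram) aggregation. The cumulative sum for a bucket is that bucket's metric plus the metrics of all preceding buckets. The specified metric must be numeric, and the enclosing histogram must have min_doc_count set to 0 (the default for histogram aggregations).
Syntax
In isolation, a cumulative_sum aggregation looks like this: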
{
"cumulative_sum": {
"buckets_path": "the_sum"
}
}
The following snippet calculates the cumulative sum of the total monthly sales:
curl -X POST "localhost:9200/sales/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
},
"cumulative_sales": {
"cumulative_sum": {
"buckets_path": "sales"
}
}
}
}
}
}
'
Response:
{
"took": 11,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
},
"cumulative_sales": {
"value": 550.0
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
},
"cumulative_sales": {
"value": 610.0
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"sales": {
"value": 375.0
},
"cumulative_sales": {
"value": 985.0
}
}
]
}
}
}
Take the 2015/03/01 00:00:00 bucket in the example above: its cumulative sum is 550.0 + 60.0 + 375.0 = 985.0.
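The running total can be reproduced outside Elasticsearch with a few lines of plain Python. This is only a sketch of the arithmetic, not how Elasticsearch implements it; the list `monthly_sales` simply copies the `sales` values from the response above.

```python
from itertools import accumulate

# Monthly "sales" values copied from the response above.
monthly_sales = [550.0, 60.0, 375.0]

# Running total: each entry adds the current bucket to all earlier buckets.
cumulative_sales = list(accumulate(monthly_sales))

print(cumulative_sales)  # [550.0, 610.0, 985.0]
```

Each printed entry corresponds to the `cumulative_sales` value of one monthly bucket.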
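Derivative aggregation
The derivative aggregation is not covered here.
Extended stats bucket aggregation
A sibling pipeline aggregation that calculates a variety of statistics across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.
Syntax
In isolation, an extended_stats_bucket aggregation looks like this: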
{
"extended_stats_bucket": {
"buckets_path": "the_sum"
}
}
The following snippet calculates the extended statistics for the monthly sales buckets:
curl -X POST "localhost:9200/sales/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"stats_monthly_sales": {
"extended_stats_bucket": {
"buckets_path": "sales_per_month>sales"
}
}
}
}
'
Response:
{
"took": 11,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"sales": {
"value": 375.0
}
}
]
},
"stats_monthly_sales": {
"count": 3,
"min": 60.0,
"max": 550.0,
"avg": 328.3333333333333,
"sum": 985.0,
"sum_of_squares": 446725.0,
"variance": 41105.55555555556,
"variance_population": 41105.55555555556,
"variance_sampling": 61658.33333333334,
"std_deviation": 202.74505063146563,
"std_deviation_population": 202.74505063146563,
"std_deviation_sampling": 248.3109609609156,
"std_deviation_bounds": {
"upper": 733.8234345962646,
"lower": -77.15676792959795,
"upper_population" : 733.8234345962646,
"lower_population" : -77.15676792959795,
"upper_sampling" : 824.9552552551645,
"lower_sampling" : -168.28858858849787
}
}
}
}
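To see where these numbers come from, the sketch below recomputes several of them from the three monthly `sales` values using Python's standard library. It is an illustration of the arithmetic only. The variable `sigma = 2` is an assumption: the bounds in the response are consistent with the average plus or minus two population standard deviations.

```python
import math
import statistics

# Monthly "sales" values copied from the response above.
monthly_sales = [550.0, 60.0, 375.0]
sigma = 2  # assumed number of standard deviations used for the bounds

count = len(monthly_sales)
avg = statistics.mean(monthly_sales)
sum_of_squares = sum(v * v for v in monthly_sales)

# Population variance divides by n; sampling variance divides by n - 1.
variance_population = statistics.pvariance(monthly_sales)
variance_sampling = statistics.variance(monthly_sales)
std_deviation_population = math.sqrt(variance_population)

upper = avg + sigma * std_deviation_population
lower = avg - sigma * std_deviation_population

print(count, sum_of_squares, variance_population, variance_sampling)
print(upper, lower)
```

The difference between the `_population` and `_sampling` fields is only the divisor, which is why the sampling figures are larger for such a small number of buckets.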
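Max bucket aggregation
A sibling pipeline aggregation that identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of the bucket(s). The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.
Syntax
In isolation, a max_bucket aggregation looks like this: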
{
"max_bucket": {
"buckets_path": "the_sum"
}
}
The following snippet calculates the maximum of the total monthly sales:
curl -X POST "localhost:9200/sales/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"max_monthly_sales": {
"max_bucket": {
"buckets_path": "sales_per_month>sales"
}
}
}
}
'
The response might look like this:
{
"took": 11,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"sales": {
"value": 375.0
}
}
]
},
"max_monthly_sales": {
"keys": ["2015/01/01 00:00:00"],
"value": 550.0
}
}
}
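The selection logic can be sketched in plain Python. This is an illustration rather than Elasticsearch's implementation; the `buckets` list copies the keys and `sales` values from the response above. The sketch returns every bucket that ties for the maximum, which is consistent with `keys` being an array in the response.

```python
# (key_as_string, sales) pairs copied from the response above.
buckets = [
    ("2015/01/01 00:00:00", 550.0),
    ("2015/02/01 00:00:00", 60.0),
    ("2015/03/01 00:00:00", 375.0),
]

max_value = max(value for _, value in buckets)

# Collect every key whose metric equals the maximum (ties yield several keys).
max_keys = [key for key, value in buckets if value == max_value]

max_monthly_sales = {"keys": max_keys, "value": max_value}
print(max_monthly_sales)
```

Replacing `max` with `min` gives the corresponding sketch for the min bucket aggregation.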
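Min bucket aggregation
Works like the max bucket aggregation, but identifies the bucket(s) with the minimum value.
Moving function aggregation
Given an ordered series of data, the moving function aggregation slides a window across the data and lets the user specify a custom script that is executed on each window of data. For convenience, a number of common functions are predefined, such as min/max and moving averages.
Syntax
In isolation, a moving_fn aggregation looks like this: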
{
"moving_fn": {
"buckets_path": "the_sum",
"window": 10,
"script": "MovingFunctions.min(values)"
}
}
A moving_fn aggregation must be embedded inside a histogram or date_histogram aggregation. It can be embedded like any other metric aggregation:
curl -X POST "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"my_date_histo": {
"date_histogram": {
"field": "date",
"calendar_interval": "1M"
},
"aggs": {
"the_sum": {
"sum": { "field": "price" }
},
"the_movfn": {
"moving_fn": {
"buckets_path": "the_sum",
"window": 10,
"script": "MovingFunctions.unweightedAvg(values)"
}
}
}
}
}
}
'
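The moving function is built by first specifying a histogram or date_histogram over a field. You can then optionally add numeric metrics, such as a sum, inside that histogram. Finally, the moving_fn is embedded inside the histogram. The buckets_path parameter is then used to "point" at one of the sibling metrics inside the histogram.
An example response from the above aggregation might look like this: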
{
"took": 11,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"my_date_histo": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"the_sum": {
"value": 550.0
},
"the_movfn": {
"value": null
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"the_sum": {
"value": 60.0
},
"the_movfn": {
"value": 550.0
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"the_sum": {
"value": 375.0
},
"the_movfn": {
"value": 305.0
}
}
]
}
}
}
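The `the_movfn` values in the response can be reproduced with a small plain-Python sketch. It assumes, as the response suggests, that the window for a bucket covers the preceding buckets and excludes the current one. The helper names `moving_fn` and `unweighted_avg` are hypothetical and only mirror the names used in the request.

```python
def moving_fn(values, window, fn):
    """Apply fn to the (at most `window`) values that precede each position."""
    results = []
    for i in range(len(values)):
        previous = values[max(0, i - window):i]  # excludes the current bucket
        results.append(fn(previous))
    return results


def unweighted_avg(window_values):
    """Simple mean; None stands in for the JSON null of an empty window."""
    if not window_values:
        return None
    return sum(window_values) / len(window_values)


# "the_sum" values copied from the response above.
the_sum = [550.0, 60.0, 375.0]

the_movfn = moving_fn(the_sum, window=10, fn=unweighted_avg)
print(the_movfn)  # [None, 550.0, 305.0]
```

The first bucket has no preceding data, the second sees only 550.0, and the third averages 550.0 and 60.0 to get 305.0.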
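Custom user scripting
The moving function aggregation lets the user specify an arbitrary script that defines custom logic. The script is invoked each time a new window of data is collected. The values are provided to the script in the values variable. The script should then perform some kind of calculation and emit a single double as the result. Emitting null is not permitted, although NaN and +/- Inf are allowed.
For example, this script simply returns the first value in the window, or NaN if no values are available: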
curl -X POST "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"my_date_histo": {
"date_histogram": {
"field": "date",
"calendar_interval": "1M"
},
"aggs": {
"the_sum": {
"sum": { "field": "price" }
},
"the_movavg": {
"moving_fn": {
"buckets_path": "the_sum",
"window": 10,
"script": "return values.length > 0 ? values[0] : Double.NaN"
}
}
}
}
}
}
'
Built-in functions
max()
min()
sum()
stdDev()
unweightedAvg()
linearWeightedAvg()
ewma()
holt()
holtWinters()
These functions are available from the MovingFunctions namespace, for example MovingFunctions.max().
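As an illustration of what a weighted function such as ewma() computes, here is a textbook exponentially weighted moving average in plain Python. This is a sketch under assumptions: the function name, the `alpha` smoothing parameter, and the handling of an empty window are chosen for this example and are not taken from the Elasticsearch source, whose exact behavior may differ.

```python
import math


def ewma(values, alpha):
    """Textbook exponentially weighted moving average over one window.

    alpha is the smoothing factor in (0, 1]; larger values favor recent data.
    """
    avg = math.nan  # assumed result for an empty window
    for i, v in enumerate(values):
        # Seed with the first value, then blend each later value in.
        avg = v if i == 0 else alpha * v + (1 - alpha) * avg
    return avg


window = [550.0, 60.0, 375.0]
print(ewma(window, alpha=0.5))  # 340.0
```

With `alpha=0.5` the steps are 550.0, then 0.5 * 60.0 + 0.5 * 550.0 = 305.0, then 0.5 * 375.0 + 0.5 * 305.0 = 340.0.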
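Moving percentiles aggregation
Given an ordered series of percentiles, the moving percentiles aggregation slides a window across those percentiles and lets the user compute the cumulative percentile.
This is conceptually very similar to the moving function pipeline aggregation, except that it works on the percentile sketches rather than on the actual bucket values.
Syntax
In isolation, a moving_percentiles aggregation looks like this: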
{
"moving_percentiles": {
"buckets_path": "the_percentile",
"window": 10
}
}
A moving_percentiles aggregation must be embedded inside a histogram or date_histogram aggregation. It can be embedded like any other metric aggregation:
curl -X POST "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"my_date_histo": {
"date_histogram": {
"field": "date",
"calendar_interval": "1M"
},
"aggs": {
"the_percentile": {
"percentiles": {
"field": "price",
"percents": [ 1.0, 99.0 ]
}
},
"the_movperc": {
"moving_percentiles": {
"buckets_path": "the_percentile",
"window": 10
}
}
}
}
}
}
'
The response might look like this:
{
"took": 11,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"my_date_histo": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"the_percentile": {
"values": {
"1.0": 150.0,
"99.0": 200.0
}
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"the_percentile": {
"values": {
"1.0": 10.0,
"99.0": 50.0
}
},
"the_movperc": {
"values": {
"1.0": 150.0,
"99.0": 200.0
}
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"the_percentile": {
"values": {
"1.0": 175.0,
"99.0": 200.0
}
},
"the_movperc": {
"values": {
"1.0": 10.0,
"99.0": 200.0
}
}
}
]
}
}
}
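The idea of a sliding window over percentiles can be approximated in plain Python by pooling raw values instead of merging percentile sketches. This is a deliberate simplification: Elasticsearch works on sketches, not raw documents. The per-month prices below are hypothetical, chosen so that their percentiles match `the_percentile` values in the response, and the nearest-rank rule is likewise an assumption of this sketch.

```python
import math


def nearest_rank(sorted_values, percent):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(percent / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def moving_percentiles(buckets, window, percents):
    """Percentiles of the pooled values of the buckets preceding each bucket."""
    results = []
    for i in range(len(buckets)):
        pooled = sorted(v for b in buckets[max(0, i - window):i] for v in b)
        if not pooled:
            results.append(None)  # the first bucket has no preceding data
        else:
            results.append({p: nearest_rank(pooled, p) for p in percents})
    return results


# Hypothetical raw prices per month (assumed, not part of the response).
prices_per_month = [
    [200.0, 200.0, 150.0],  # 2015/01
    [10.0, 50.0],           # 2015/02
    [200.0, 175.0],         # 2015/03
]

the_movperc = moving_percentiles(prices_per_month, window=10, percents=[1.0, 99.0])
print(the_movperc)
```

Under these assumptions the second bucket reports the January percentiles (150.0 and 200.0) and the third bucket reports those of January and February combined (10.0 and 200.0), matching the `the_movperc` values above.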
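Percentiles bucket aggregation
A sibling pipeline aggregation that calculates percentiles across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.
Syntax
In isolation, a percentiles_bucket aggregation looks like this: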
{
"percentiles_bucket": {
"buckets_path": "the_sum"
}
}
The following snippet calculates the percentiles of the total monthly sales buckets:
curl -X POST "localhost:9200/sales/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"percentiles_monthly_sales": {
"percentiles_bucket": {
"buckets_path": "sales_per_month>sales",
"percents": [ 25.0, 50.0, 75.0 ]
}
}
}
}
'
The response might look like this:
{
"took": 11,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"sales": {
"value": 375.0
}
}
]
},
"percentiles_monthly_sales": {
"values" : {
"25.0": 375.0,
"50.0": 375.0,
"75.0": 550.0
}
}
}
}
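Note that every reported percentile is one of the actual bucket values; nothing is interpolated. The sketch below reproduces the three values with a nearest-index rule. The rule itself (scale the percent to an index in the sorted values and round half up) is an assumption that happens to match this response, and `percentiles_bucket` is a hypothetical helper name.

```python
import math


def percentiles_bucket(values, percents):
    """Pick, for each percent, the sorted value at the nearest index."""
    ordered = sorted(values)
    result = {}
    for p in percents:
        # Assumed rule: round half up to the nearest index, no interpolation.
        index = math.floor(p / 100.0 * (len(ordered) - 1) + 0.5)
        result[p] = ordered[index]
    return result


# Monthly "sales" values copied from the response above.
monthly_sales = [550.0, 60.0, 375.0]

percentiles_monthly_sales = percentiles_bucket(monthly_sales, [25.0, 50.0, 75.0])
print(percentiles_monthly_sales)  # {25.0: 375.0, 50.0: 375.0, 75.0: 550.0}
```

With only three buckets, the 25th and 50th percentiles both land on the middle value 375.0, which explains the repeated number in the response.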
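Stats bucket aggregation
A sibling pipeline aggregation that calculates a variety of statistics across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.
Syntax
In isolation, a stats_bucket aggregation looks like this: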
{
"stats_bucket": {
"buckets_path": "the_sum"
}
}
The following snippet calculates the statistics for the monthly sales:
curl -X POST "localhost:9200/sales/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"stats_monthly_sales": {
"stats_bucket": {
"buckets_path": "sales_per_month>sales"
}
}
}
}
'
The response might look like this:
{
"took": 11,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"sales": {
"value": 375.0
}
}
]
},
"stats_monthly_sales": {
"count": 3,
"min": 60.0,
"max": 550.0,
"avg": 328.3333333333333,
"sum": 985.0
}
}
}
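Sum bucket aggregation
A sibling pipeline aggregation that calculates the sum across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.
Syntax
In isolation, a sum_bucket aggregation looks like this: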
{
"sum_bucket": {
"buckets_path": "the_sum"
}
}
The following snippet calculates the sum of all the total monthly sales buckets:
curl -X POST "localhost:9200/sales/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"sum_monthly_sales": {
"sum_bucket": {
"buckets_path": "sales_per_month>sales"
}
}
}
}
'
The response might look like this:
{
"took": 11,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"sales": {
"value": 550.0
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"sales": {
"value": 60.0
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"sales": {
"value": 375.0
}
}
]
},
"sum_monthly_sales": {
"value": 985.0
}
}
}
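The sibling aggregations above all use a buckets_path of the form `sales_per_month>sales`: the name of the multi-bucket aggregation, then the metric inside each bucket. The sketch below shows one way to read that path against a response and then sum the values. The helper `resolve_buckets_path` is hypothetical and handles only this two-part form; it is not part of any Elasticsearch client.

```python
def resolve_buckets_path(aggregations, buckets_path):
    """Collect one metric value per bucket for a path like 'agg>metric'."""
    agg_name, metric_name = buckets_path.split(">")
    return [b[metric_name]["value"] for b in aggregations[agg_name]["buckets"]]


# Trimmed copy of the "aggregations" part of the response above.
aggregations = {
    "sales_per_month": {
        "buckets": [
            {"key_as_string": "2015/01/01 00:00:00", "sales": {"value": 550.0}},
            {"key_as_string": "2015/02/01 00:00:00", "sales": {"value": 60.0}},
            {"key_as_string": "2015/03/01 00:00:00", "sales": {"value": 375.0}},
        ]
    }
}

values = resolve_buckets_path(aggregations, "sales_per_month>sales")
sum_monthly_sales = {"value": sum(values)}
print(sum_monthly_sales)  # {'value': 985.0}
```

The result equals the last `cumulative_sales` value from the cumulative sum example, since both add up the same three monthly totals.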