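Preface
Beyond search, Elasticsearch also provides aggregations for statistical analysis of data. The aggregations API lets you build complex summaries of your data, and each type of aggregation has its own purpose and output. To make the types easier to understand, they are usually grouped into three broad categories.
The three categories of aggregations
Bucketing (bucket aggregations)
Each bucket is associated with a key and a document criterion. Running a bucket aggregation returns a list of buckets, each one being the set of documents that match its criterion.
Metric (metric aggregations)
Aggregations that compute metrics over a set of documents.
Pipeline (pipeline aggregations)
Aggregations that take the output of other aggregations and their metrics as input and aggregate it a second time.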
Pipeline aggregations
Average bucket aggregation
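A sibling pipeline aggregation that calculates the mean of a specified metric in a sibling aggregation. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.
The request below first sums the order revenue for each day in sales_per_date, then applies avg_bucket to those daily sums to get the average daily sales. Here gap_policy skip ignores buckets whose metric is missing, and format controls how the result is rendered in value_as_string.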
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_date": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "taxless_total_price"
          }
        }
      }
    },
    "avg_day_sales": {
      "avg_bucket": {
        "buckets_path": "sales_per_date>sales",
        "gap_policy": "skip",
        "format": "#,##0.00;(#,##0.00)"
      }
    }
  }
}
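As a rough illustration of what avg_bucket computes, the sketch below averages per-bucket values in plain Python. The numbers are made up rather than taken from the sample data set, the helper name avg_bucket is hypothetical, and None stands for a bucket whose metric is missing, which gap_policy skip would ignore.

```python
from typing import Optional, Sequence

def avg_bucket(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average the metric across buckets, skipping gaps (None) like gap_policy=skip."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # no bucket carries a value, so there is nothing to average
    return sum(present) / len(present)

# Hypothetical daily sums; None marks a bucket with a missing metric.
daily_sales = [100.0, None, 200.0, 300.0]
print(avg_bucket(daily_sales))  # 200.0
```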
Bucket selector aggregation
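A parent pipeline aggregation that runs a script to decide whether each bucket of the parent multi-bucket aggregation is kept. The script must return a boolean value.
The request below first sums the order revenue for each day, then uses bucket_selector to keep only the days whose total is greater than 13000; all other days are dropped from the result.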
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_date": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "taxless_total_price"
          }
        },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "total_sales"
            },
            "script": "params.totalSales > 13000"
          }
        }
      }
    }
  }
}
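The filtering step can be pictured as a predicate applied to every bucket. The following is a minimal Python sketch under that assumption; the bucket dictionaries and the helper name bucket_selector are hypothetical and only mirror the shape of the request above.

```python
from typing import Callable, Dict, List

def bucket_selector(buckets: List[Dict], keep: Callable[[Dict], bool]) -> List[Dict]:
    """Keep only the buckets for which the predicate returns True."""
    return [b for b in buckets if keep(b)]

# Hypothetical per-day buckets; total_sales mirrors the sum sub-aggregation.
buckets = [
    {"key": 1, "total_sales": 12000.0},
    {"key": 2, "total_sales": 13000.0},
    {"key": 3, "total_sales": 15000.0},
]

# Same condition as the script "params.totalSales > 13000".
kept = bucket_selector(buckets, lambda b: b["total_sales"] > 13000)
print([b["key"] for b in kept])  # [3]
```

Note that the comparison is strict, so a day with exactly 13000 in sales is dropped as well.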
Bucket sort aggregation
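A parent pipeline aggregation that sorts the buckets of its parent multi-bucket aggregation. It supports sorting on multiple fields: each bucket can be sorted by its _key, its _count, or its sub-aggregations. The from and size parameters can be used to truncate the result.
Show the five days with the highest sales: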
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_date": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "total_sales": {
          "sum": {
            "field": "taxless_total_price"
          }
        },
        "sales_bucket_sort": {
          "bucket_sort": {
            "sort": [
              {
                "total_sales": {
                  "order": "desc"
                }
              }
            ],
            "size": 5
          }
        }
      }
    }
  }
}
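To make the interplay of sort, from and size concrete, here is a small Python sketch. It handles a single sort field only, the helper name bucket_sort and its parameters are hypothetical, and the bucket values are invented for illustration.

```python
from typing import Dict, List, Optional

def bucket_sort(buckets: List[Dict], sort_field: Optional[str] = None,
                descending: bool = False, from_: int = 0,
                size: Optional[int] = None) -> List[Dict]:
    """Optionally sort buckets by one field, then apply from/size truncation."""
    ordered = list(buckets)  # copy so the caller's list keeps its order
    if sort_field is not None:
        ordered.sort(key=lambda b: b[sort_field], reverse=descending)
    end = None if size is None else from_ + size
    return ordered[from_:end]

# Hypothetical per-day buckets.
buckets = [{"key": k, "total_sales": v}
           for k, v in [("d1", 50.0), ("d2", 90.0), ("d3", 70.0), ("d4", 10.0)]]

top2 = bucket_sort(buckets, "total_sales", descending=True, size=2)
print([b["key"] for b in top2])  # ['d2', 'd3']
```

Calling the helper without sort_field leaves the original bucket order untouched and only slices the list.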
Truncating without sorting
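This aggregation can also truncate the result buckets without sorting them at all. To do so, use the from and/or size parameters and leave out sort.
Without truncation: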
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_date": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      }
    }
  }
}
With truncation:
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_date": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "bucket_truncate": {
          "bucket_sort": {
            "from": 1,
            "size": 1
          }
        }
      }
    }
  }
}
Only one bucket is returned: from 1 skips the first bucket and size 1 keeps just the next one.
Cumulative sum aggregation
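A parent pipeline aggregation that calculates the cumulative sum of a specified metric in a parent histogram (or date_histogram) aggregation. The specified metric must be numeric, and the enclosing histogram must have min_doc_count set to 0 (the default for histogram aggregations).
The request below sums the sales for each month and also computes the running total up to and including that month.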
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "taxless_total_price"
          }
        },
        "cumulative_sales": {
          "cumulative_sum": {
            "buckets_path": "sales"
          }
        }
      }
    }
  }
}
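The running total is easy to reproduce offline. The sketch below uses itertools.accumulate on invented monthly sums; it only illustrates the arithmetic and is not how Elasticsearch implements the aggregation.

```python
from itertools import accumulate

# Hypothetical monthly sums of taxless_total_price.
monthly_sales = [120.0, 80.0, 100.0]

# Each entry is the total of the current bucket and all earlier ones.
cumulative_sales = list(accumulate(monthly_sales))
print(cumulative_sales)  # [120.0, 200.0, 300.0]
```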
Derivative aggregation
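A parent pipeline aggregation that calculates the derivative of a specified metric in a parent histogram (or date_histogram) aggregation. The specified metric must be numeric, and the enclosing histogram must have min_doc_count set to 0 (the default for histogram aggregations).
The request below compares each day's sales with those of the previous day: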
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "taxless_total_price"
          }
        },
        "sales_deriv": {
          "derivative": {
            "buckets_path": "sales"
          }
        }
      }
    }
  }
}
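For a first-order derivative, the value attached to each bucket is the difference from the previous bucket, so the first bucket has none. A minimal Python sketch of that idea, with a hypothetical helper name and made-up daily values:

```python
from typing import List, Optional

def derivative(values: List[float]) -> List[Optional[float]]:
    """Difference between consecutive buckets; the first bucket has no predecessor."""
    if not values:
        return []
    return [None] + [cur - prev for prev, cur in zip(values, values[1:])]

# Hypothetical daily sums.
daily_sales = [100.0, 130.0, 90.0]
print(derivative(daily_sales))  # [None, 30.0, -40.0]
```

A positive value means sales rose compared with the previous day, a negative value means they fell.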
Stats bucket aggregation
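A sibling pipeline aggregation that calculates a set of statistics (count, min, max, avg and sum) across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.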
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "taxless_total_price"
          }
        }
      }
    },
    "stats_sales_month": {
      "stats_bucket": {
        "buckets_path": "sales_per_month>sales"
      }
    }
  }
}
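The statistics reported by stats_bucket can be reproduced from the per-bucket values. The sketch below is a plain Python approximation with a hypothetical helper name and invented monthly sums; it assumes at least one bucket.

```python
from typing import Dict, Sequence

def stats_bucket(values: Sequence[float]) -> Dict[str, float]:
    """Return count, min, max, avg and sum over the per-bucket metric values."""
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }

# Hypothetical monthly sums.
monthly_sales = [100.0, 300.0, 200.0]
print(stats_bucket(monthly_sales))
# {'count': 3, 'min': 100.0, 'max': 300.0, 'avg': 200.0, 'sum': 600.0}
```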
Extended stats bucket aggregation
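A sibling pipeline aggregation that calculates a set of statistics across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.
Compared with the stats_bucket aggregation, it reports additional statistics such as the sum of squares and the standard deviation.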
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "taxless_total_price"
          }
        }
      }
    },
    "extended_stats_sales_month": {
      "extended_stats_bucket": {
        "buckets_path": "sales_per_month>sales"
      }
    }
  }
}
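The extra statistics can be sketched with Python's statistics module. This is an illustration only: the helper name is hypothetical, the input values are invented, and it assumes that variance and std_deviation refer to the population variants, which should be checked against the documentation for your Elasticsearch version.

```python
import statistics
from typing import Dict, Sequence

def extended_stats_bucket(values: Sequence[float]) -> Dict[str, float]:
    """Basic stats plus sum of squares, population variance and std deviation."""
    return {
        "count": len(values),
        "avg": statistics.fmean(values),
        "sum_of_squares": sum(v * v for v in values),
        "variance": statistics.pvariance(values),   # assumed: population variance
        "std_deviation": statistics.pstdev(values),  # assumed: population std deviation
    }

# Hypothetical per-bucket sums with mean 5 and population std deviation 2.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
result = extended_stats_bucket(values)
print(result["sum_of_squares"], result["variance"], result["std_deviation"])
```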
Max bucket aggregation
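A sibling pipeline aggregation that identifies the bucket or buckets with the maximum value of a specified metric in a sibling aggregation and outputs both the value and the keys of those buckets. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.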
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "taxless_total_price"
          }
        }
      }
    },
    "max_bucket_sales": {
      "max_bucket": {
        "buckets_path": "sales_per_month>sales"
      }
    }
  }
}
Min bucket aggregation
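A sibling pipeline aggregation that identifies the bucket or buckets with the minimum value of a specified metric in a sibling aggregation and outputs both the value and the keys of those buckets. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.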
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "taxless_total_price"
          }
        }
      }
    },
    "min_bucket_sales": {
      "min_bucket": {
        "buckets_path": "sales_per_month>sales"
      }
    }
  }
}
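Both max_bucket and min_bucket report a value together with the keys of the buckets that hold it. The Python sketch below mimics that output shape; the helper name extreme_bucket, the bucket keys and the sales figures are all hypothetical.

```python
from typing import Callable, Dict, List

def extreme_bucket(buckets: List[Dict], metric: str,
                   pick: Callable = max) -> Dict:
    """Return the extreme metric value and the keys of every bucket that has it."""
    value = pick(b[metric] for b in buckets)
    keys = [b["key"] for b in buckets if b[metric] == value]
    return {"value": value, "keys": keys}

# Hypothetical monthly buckets.
buckets = [
    {"key": "2024-01", "sales": 300.0},
    {"key": "2024-02", "sales": 500.0},
    {"key": "2024-03", "sales": 100.0},
]
print(extreme_bucket(buckets, "sales", max))  # {'value': 500.0, 'keys': ['2024-02']}
print(extreme_bucket(buckets, "sales", min))  # {'value': 100.0, 'keys': ['2024-03']}
```

Returning a list of keys covers the case where several buckets tie for the extreme value.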
Sum bucket aggregation
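A sibling pipeline aggregation that calculates the sum of a specified metric across all buckets in a sibling aggregation. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.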
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "taxless_total_price"
          }
        }
      }
    },
    "sum_sales_month": {
      "sum_bucket": {
        "buckets_path": "sales_per_month>sales"
      }
    }
  }
}
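Because sum_bucket is a sibling aggregation, its result sits next to sales_per_month in the response rather than inside each bucket. The Python sketch below reads both from a hypothetical, heavily abridged response body; the keys and numbers are invented, and a real response carries more fields.

```python
# Hypothetical, abridged response body for the request above.
response = {
    "aggregations": {
        "sales_per_month": {
            "buckets": [
                {"key_as_string": "m1", "sales": {"value": 10.0}},
                {"key_as_string": "m2", "sales": {"value": 32.0}},
            ]
        },
        # Sibling pipeline result: beside sales_per_month, not inside its buckets.
        "sum_sales_month": {"value": 42.0},
    }
}

aggs = response["aggregations"]
per_bucket = [b["sales"]["value"] for b in aggs["sales_per_month"]["buckets"]]
total = aggs["sum_sales_month"]["value"]
print(sum(per_bucket) == total)  # True
```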
Percentiles bucket aggregation
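A sibling pipeline aggregation that calculates percentiles across all buckets of a specified metric in a sibling aggregation. The specified metric must be numeric, and the sibling aggregation must be a multi-bucket aggregation.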
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "taxless_total_price"
          }
        }
      }
    },
    "percentiles_daily_sales": {
      "percentiles_bucket": {
        "buckets_path": "sales_per_day>sales"
      }
    }
  }
}
You can also specify which percentiles to calculate with the percents parameter:
GET /kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "taxless_total_price"
          }
        }
      }
    },
    "percentiles_daily_sales": {
      "percentiles_bucket": {
        "buckets_path": "sales_per_day>sales",
        "percents": [25, 50, 75]
      }
    }
  }
}
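One way to picture percentiles over bucket values is a nearest-rank lookup that returns an existing data point instead of interpolating. The sketch below follows that idea in Python. The helper name is hypothetical, the values are invented, and the index rounding rule is an assumption: the exact rule Elasticsearch applies may differ between versions.

```python
import math
from typing import Dict, Sequence

def percentiles_bucket(values: Sequence[float],
                       percents: Sequence[float] = (1, 5, 25, 50, 75, 95, 99)
                       ) -> Dict[float, float]:
    """Pick an existing data point for each percentile, without interpolation."""
    data = sorted(values)  # assumes at least one bucket value
    result = {}
    for p in percents:
        # Assumed rounding rule: nearest index, with halves rounded up.
        index = math.floor(p / 100.0 * (len(data) - 1) + 0.5)
        result[float(p)] = data[index]
    return result

# Hypothetical daily sums.
daily_sales = [50.0, 10.0, 40.0, 20.0, 30.0]
print(percentiles_bucket(daily_sales, percents=[25, 50, 75]))
# {25.0: 20.0, 50.0: 30.0, 75.0: 40.0}
```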