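Elasticsearch aggregations are used for complex analysis and statistics over data. They fall into four main families: metric aggregations, bucket aggregations, pipeline aggregations, and matrix aggregations. Metric and bucket aggregations are the most commonly used.
The examples in this article use the official shakespeare sample dataset, which can be downloaded from the Elasticsearch website. The content follows the official documentation.
1 Metric Aggregations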
1.1 Max Aggregation
Max Aggregation returns the maximum value of a field. For example, to find the largest line_id in the shakespeare index:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"max_line_id": {
"max": {
"field": "line_id"
}
}
}
}
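max_line_id is the name of the result and can be any string. The key beneath max_line_id selects the aggregation type: max means Max Aggregation, and field specifies the document field to aggregate on. Setting size to 0 suppresses the matching documents so that only the aggregation result is returned.
This is similar to select max(line_id) from shakespeare in MySQL.
The response is: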
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"max_line_id" : {
"value" : 111396.0
}
}
}
The result is found under aggregations: the maximum value is 111396.
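To make the semantics concrete, here is a minimal Python sketch of what a max aggregation computes. It runs on a small in-memory list rather than on Elasticsearch; the sample documents and the helper name max_agg are made up for illustration.

```python
# Hypothetical sample documents; only line_id matters here.
docs = [
    {"line_id": 4, "play_name": "Play A"},
    {"line_id": 17, "play_name": "Play A"},
    {"line_id": 9, "play_name": "Play B"},
    {"play_name": "Play B"},  # no line_id: skipped by the aggregation
]


def max_agg(docs, field):
    """Return the largest value of `field` as a float, or None if no document has it."""
    values = [d[field] for d in docs if field in d]
    return float(max(values)) if values else None


# Mimic the shape of the "aggregations" part of the response.
result = {"aggregations": {"max_line_id": {"value": max_agg(docs, "line_id")}}}
print(result)
```

Swapping max for min in the helper gives the corresponding sketch for Min Aggregation.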
1.2 Min Aggregation
Min Aggregation is the opposite of Max Aggregation and returns the minimum value of a field. For example, to find the smallest line_id in the shakespeare index:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"min_line_id": {
"min": {
"field": "line_id"
}
}
}
}
The result is likewise found under aggregations.
1.3 Avg Aggregation
Avg Aggregation computes the average. For example, to compute the average of the line_id field in the shakespeare index:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"avg_line_id": {
"avg": {
"field": "line_id"
}
}
}
}
The result is likewise found under aggregations.
1.4 Sum Aggregation
Sum Aggregation computes the sum. For example, to compute the sum of the line_id field in the shakespeare index:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"sum_line_id": {
"sum": {
"field": "line_id"
}
}
}
}
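As an illustration of what avg and sum compute, the following Python sketch applies both to a toy in-memory list. The documents and the helper names (field_values, sum_agg, avg_agg) are hypothetical and not part of Elasticsearch.

```python
# Hypothetical documents; the last one has no line_id and is skipped.
docs = [{"line_id": 4}, {"line_id": 10}, {"line_id": 16}, {"speaker": "SOMEONE"}]


def field_values(docs, field):
    """Collect the values of `field` from the documents that have it."""
    return [d[field] for d in docs if field in d]


def sum_agg(docs, field):
    """Sum of all values of `field`, as a float."""
    return float(sum(field_values(docs, field)))


def avg_agg(docs, field):
    """Arithmetic mean of `field`, or None when no document has the field."""
    values = field_values(docs, field)
    return sum(values) / len(values) if values else None


print(sum_agg(docs, "line_id"))  # 30.0
print(avg_agg(docs, "line_id"))  # 10.0
```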
1.5 Cardinality Aggregation
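Cardinality Aggregation counts distinct values: it works like a SQL distinct followed by a count of the resulting set. The count is approximate for fields with many distinct values. For example, the following query counts the distinct values of play_name, that is, the number of plays: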
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"player_sum": {
"cardinality": {
"field": "play_name.keyword"
}
}
}
}
Response:
{
# other fields omitted
"aggregations" : {
"player_sum" : {
"value" : 36
}
}
}
That is, the index contains 36 distinct plays.
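The distinct-then-count idea can be sketched in a few lines of Python. This version is exact and uses a set, whereas Elasticsearch relies on an approximate algorithm (HyperLogLog++), so treat it only as a model of the intended result. The documents and the helper name cardinality_agg are invented for the example.

```python
# Hypothetical documents with repeated play names.
docs = [
    {"play_name": "Play A"},
    {"play_name": "Play A"},
    {"play_name": "Play B"},
    {"play_name": "Play C"},
    {"play_name": "Play B"},
]


def cardinality_agg(docs, field):
    """Exact number of distinct values of `field` (a set removes duplicates)."""
    return len({d[field] for d in docs if field in d})


print(cardinality_agg(docs, "play_name"))  # 3
```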
1.6 Stats Aggregation
Stats Aggregation returns basic statistics in a single request: count, max, min, avg, and sum. For example, to get these statistics for line_id:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"line_id_stats": {
"stats": {
"field": "line_id"
}
}
}
}
Response:
{
# other fields omitted
"aggregations" : {
"line_id_stats" : {
"count" : 110486,
"min" : 4.0,
"max" : 111396.0,
"avg" : 55715.89386890647,
"sum" : 6.15582625E9
}
}
}
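The five values are all derived in one pass over the same field. A minimal Python sketch on toy data follows; the helper name stats_agg is hypothetical, and the values returned for an empty input are a choice made for this sketch rather than a statement about Elasticsearch.

```python
def stats_agg(docs, field):
    """Compute count, min, max, avg and sum of `field` over the documents that have it."""
    values = [d[field] for d in docs if field in d]
    if not values:
        # Assumption for this sketch only: empty input yields zero count and no min/max/avg.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / len(values),
        "sum": total,
    }


docs = [{"line_id": 4}, {"line_id": 10}, {"line_id": 16}]
print(stats_agg(docs, "line_id"))
```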
1.7 Extended Stats Aggregation
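Extended Stats Aggregation returns four more fields than Stats Aggregation: the sum of squares (sum_of_squares), the variance (variance), the standard deviation (std_deviation), and the interval of the mean plus or minus two standard deviations (std_deviation_bounds). For example: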
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"line_id_stats": {
"extended_stats": {
"field": "line_id"
}
}
}
}
Response:
{
# other fields omitted
"aggregations" : {
"line_id_stats" : {
"count" : 110486,
"min" : 4.0,
"max" : 111396.0,
"avg" : 55715.89386890647,
"sum" : 6.15582625E9,
"sum_of_squares" : 4.57201930511864E14,
"variance" : 1.0338374861198297E9,
"std_deviation" : 32153.34331169668,
"std_deviation_bounds" : {
"upper" : 120022.58049229984,
"lower" : -8590.792754486894
}
}
}
}
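The extra fields can be derived from count, avg, and sum_of_squares. The Python sketch below recomputes them from the numbers in the response above, assuming the population variance (dividing by count) and bounds placed two standard deviations from the mean; both assumptions agree with the figures shown, up to floating-point rounding.

```python
import math

# Values copied from the extended_stats response above.
count = 110486
avg = 55715.89386890647
sum_of_squares = 4.57201930511864e14

# Population variance: mean of the squares minus the square of the mean.
variance = sum_of_squares / count - avg ** 2
std_deviation = math.sqrt(variance)

# Bounds at two standard deviations around the mean.
upper = avg + 2 * std_deviation
lower = avg - 2 * std_deviation

print(variance, std_deviation, upper, lower)
```

The negative lower bound is expected here: the bounds are purely arithmetic and are not clipped to the observed minimum of 4.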
1.8 Percentiles Aggregation
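Percentiles Aggregation computes percentiles: the values of a field are sorted in ascending order, and the p-th percentile is the value at or below which p percent of the data falls. For example: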
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"line_id_percent": {
"percentiles": {
"field": "line_id",
"percents": [1, 5, 25, 50, 75, 95, 99]
}
}
}
}
Response:
{
# other fields omitted
"aggregations" : {
"line_id_percent" : {
"values" : {
"1.0" : 1115.3600000000001,
"5.0" : 5575.834045307443,
"25.0" : 27887.286615736997,
"50.0" : 55711.257765161325,
"75.0" : 83561.89545235902,
"95.0" : 105830.47105865781,
"99.0" : 110287.32171428572
}
}
}
}
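For intuition, here is a small Python sketch that computes exact percentiles with linear interpolation between neighbouring ranks. This is one common definition, chosen for the sketch; Elasticsearch computes percentiles approximately, so its output on real data will not necessarily match this method digit for digit. The helper name percentile and the sample values are hypothetical.

```python
def percentile(sorted_values, p):
    """Percentile p (0 to 100) of an ascending list, by linear interpolation between ranks."""
    if not sorted_values:
        return None
    rank = (len(sorted_values) - 1) * p / 100.0  # fractional index into the list
    lo = int(rank)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = rank - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


values = sorted([50, 15, 40, 20, 35])  # ascending order is required
result = {str(float(p)): percentile(values, p) for p in [25, 50, 75]}
print(result)  # {'25.0': 20.0, '50.0': 35.0, '75.0': 40.0}
```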
1.9 Value Count Aggregation
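Value Count Aggregation counts the values of a field. For example, the following query counts the values of line_id; because each document holds a single line_id, this equals the number of documents that contain the field: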
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"line_id_count": {
"value_count": {
"field": "line_id"
}
}
}
}
Response:
{
# other fields omitted
"aggregations" : {
"line_id_count" : {
"value" : 110486
}
}
}
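The difference between counting values and counting documents only shows up with multi-valued fields. The Python sketch below illustrates this on made-up documents; the helper name value_count_agg is hypothetical.

```python
def value_count_agg(docs, field):
    """Count the values of `field`; duplicates are counted, nothing is de-duplicated."""
    count = 0
    for doc in docs:
        if field not in doc:
            continue
        value = doc[field]
        # A multi-valued field (a list) contributes one count per value.
        count += len(value) if isinstance(value, list) else 1
    return count


docs = [{"line_id": 1}, {"line_id": 2}, {"speaker": "SOMEONE"}, {"line_id": [3, 4]}]
print(value_count_agg(docs, "line_id"))  # 4 values in 3 documents
```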
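2 Bucket Aggregations
Bucket aggregations are similar to GROUP BY in SQL: they go through the documents and place each one into a bucket according to its content.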
2.1 Terms Aggregation
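Terms Aggregation groups documents by term. For example, the query below groups documents by the play_name field and counts the documents in each group, which corresponds to select count(*) from shakespeare group by play_name: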
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"per_player": {
"terms": {
"field": "play_name.keyword",
"size": 10
}
}
}
}
field corresponds to the column named after GROUP BY, and size limits the response to the 10 buckets with the most documents.
Response:
{
# other fields omitted
"aggregations" : {
"per_player" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 72631,
"buckets" : [
{
"key" : "Hamlet",
"doc_count" : 4219
},
{
"key" : "Coriolanus",
"doc_count" : 3958
},
{
"key" : "Cymbeline",
"doc_count" : 3927
},
{
"key" : "Richard III",
"doc_count" : 3911
},
{
"key" : "Antony and Cleopatra",
"doc_count" : 3815
},
{
"key" : "Othello",
"doc_count" : 3742
},
{
"key" : "King Lear",
"doc_count" : 3735
},
{
"key" : "Troilus and Cressida",
"doc_count" : 3682
},
{
"key" : "A Winters Tale",
"doc_count" : 3469
},
{
"key" : "Henry VIII",
"doc_count" : 3397
}
]
}
}
}
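In the response above, the ten doc_count values add up to 37855, and 37855 + 72631 (sum_other_doc_count) gives 110486, the total number of documents. The Python sketch below models that relationship on toy data with collections.Counter. It is a single-process model with a hypothetical helper name (terms_agg); a real cluster merges per-shard results, which is why the response also carries doc_count_error_upper_bound.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Group by `field`, keep the `size` largest buckets, and report the remainder."""
    counts = Counter(d[field] for d in docs if field in d)
    top = counts.most_common(size)  # sorted by descending count
    buckets = [{"key": key, "doc_count": n} for key, n in top]
    other = sum(counts.values()) - sum(n for _, n in top)
    return {"sum_other_doc_count": other, "buckets": buckets}


# Hypothetical documents: 3 in Play A, 2 in Play B, 1 in Play C.
docs = (
    [{"play_name": "Play A"}] * 3
    + [{"play_name": "Play B"}] * 2
    + [{"play_name": "Play C"}]
)
print(terms_agg(docs, "play_name", size=2))
```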
2.2 Filter Aggregation
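Filter Aggregation is a single-bucket aggregation: it collects the documents that match the filter condition into one bucket, which nested sub-aggregations can then break down further. For example: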
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"per_player": {
"filter": {
"term": {
"text_entry": "apple"
}
},
"aggs": {
"player": {
"terms": {
"field": "play_name.keyword",
"size": 10
}
}
}
}
}
}
The query above finds the documents whose text_entry contains the word apple and groups them by play_name.
Response:
{
# other fields omitted
"aggregations" : {
"per_player" : {
"doc_count" : 10,
"player" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Taming of the Shrew",
"doc_count" : 2
},
{
"key" : "Twelfth Night",
"doc_count" : 2
},
{
"key" : "A Midsummer nights dream",
"doc_count" : 1
},
{
"key" : "Henry IV",
"doc_count" : 1
},
{
"key" : "King Lear",
"doc_count" : 1
},
{
"key" : "Loves Labours Lost",
"doc_count" : 1
},
{
"key" : "Merchant of Venice",
"doc_count" : 1
},
{
"key" : "The Tempest",
"doc_count" : 1
}
]
}
}
}
}
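The two-step shape of this request, filter first and then group the survivors, can be modelled in Python as follows. The tokenizer is a crude stand-in for Elasticsearch text analysis, and the documents and helper names (contains_term, filter_then_terms) are invented for the sketch.

```python
import re
from collections import Counter


def contains_term(text, term):
    """Crude stand-in for text analysis: lowercase, split on non-letters, match whole words."""
    return term in re.findall(r"[a-z]+", text.lower())


def filter_then_terms(docs, term, size=10):
    """Keep documents whose text_entry contains `term`, then count them per play_name."""
    matched = [d for d in docs if contains_term(d["text_entry"], term)]
    counts = Counter(d["play_name"] for d in matched)
    buckets = [{"key": k, "doc_count": n} for k, n in counts.most_common(size)]
    return {"doc_count": len(matched), "player": {"buckets": buckets}}


# Hypothetical documents; "pineapple" is a different word and does not match "apple".
docs = [
    {"play_name": "Play A", "text_entry": "An apple on the table"},
    {"play_name": "Play A", "text_entry": "Apple, pear and plum"},
    {"play_name": "Play B", "text_entry": "one more apple"},
    {"play_name": "Play B", "text_entry": "pineapple is not a match"},
    {"play_name": "Play C", "text_entry": "nothing relevant"},
]
print(filter_then_terms(docs, "apple"))
```

The outer doc_count is the size of the single filter bucket, and the inner buckets always sum to at most that number.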
2.3 Filters Aggregation
Filters Aggregation differs from Filter Aggregation in that it accepts multiple filters, each of which produces its own bucket. For example:
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"per_player": {
"filters": {
"filters": [
{"match": { "text_entry": "apple" } }
]
},
"aggs": {
"player": {
"terms": {
"field": "play_name.keyword",
"size": 10
}
}
}
}
}
}
The filters array can hold multiple filters.
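With more than one filter, each filter yields its own bucket, and a document that satisfies several filters is counted in each of them. The Python sketch below models this on toy data; the word matching is a simplified stand-in for a match query, and the names tokens and filters_agg are hypothetical.

```python
import re


def tokens(text):
    """Lowercased set of words in the text (simplified stand-in for text analysis)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def filters_agg(docs, terms):
    """One bucket per term, in the order given; a document may land in several buckets."""
    return {
        "buckets": [
            {"doc_count": sum(1 for d in docs if term in tokens(d["text_entry"]))}
            for term in terms
        ]
    }


# Hypothetical documents; the first one matches both filters.
docs = [
    {"text_entry": "an apple and a pear"},
    {"text_entry": "just an apple"},
    {"text_entry": "a pear alone"},
    {"text_entry": "neither fruit"},
]
print(filters_agg(docs, ["apple", "pear"]))
```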
2.4 Range Aggregation
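Range Aggregation buckets documents by value range and shows how the data is distributed. Each range includes its from value and excludes its to value. For example, to aggregate line_id into the ranges 0 to 10000, 10000 to 50000, and 50000 and above: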
GET /shakespeare/_search
{
"size": 0,
"aggs": {
"id_range": {
"range": {
"field": "line_id",
"ranges": [
{ "from": 0, "to": 10000 },
{ "from": 10000, "to": 50000},
{ "from": 50000 }
]
}
}
}
}
Response:
{
# other fields omitted
"aggregations" : {
"id_range" : {
"buckets" : [
{
"key" : "0.0-10000.0",
"from" : 0.0,
"to" : 10000.0,
"doc_count" : 9909
},
{
"key" : "10000.0-50000.0",
"from" : 10000.0,
"to" : 50000.0,
"doc_count" : 39664
},
{
"key" : "50000.0-*",
"from" : 50000.0,
"doc_count" : 60913
}
]
}
}
}
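The three counts in the response add up to 9909 + 39664 + 60913 = 110486, so every document falls into exactly one range. The Python sketch below shows the half-open bucketing rule (from inclusive, to exclusive) on boundary values; the helper name range_agg and the sample documents are hypothetical.

```python
def range_agg(docs, field, ranges):
    """Count documents per range; `from` is inclusive, `to` is exclusive, either may be absent."""
    buckets = []
    for r in ranges:
        lo, hi = r.get("from"), r.get("to")
        n = sum(
            1
            for d in docs
            if field in d
            and (lo is None or d[field] >= lo)
            and (hi is None or d[field] < hi)
        )
        # Build a key like "0.0-10000.0", with "*" for an open end.
        key = f"{'*' if lo is None else float(lo)}-{'*' if hi is None else float(hi)}"
        buckets.append({"key": key, "doc_count": n})
    return buckets


# Values chosen to sit on the range boundaries.
docs = [{"line_id": v} for v in [0, 9999, 10000, 49999, 50000, 111396]]
ranges = [{"from": 0, "to": 10000}, {"from": 10000, "to": 50000}, {"from": 50000}]
for bucket in range_agg(docs, "line_id", ranges):
    print(bucket)
```

A value of exactly 10000 lands in the second bucket and 50000 in the third, which is why adjacent ranges can share a boundary without double counting.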