Buckets and Metrics
Buckets are conceptually similar to grouping in SQL (GROUP BY), while metrics are similar to statistical functions such as COUNT(), SUM(), and MAX().
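Buckets provide a way to group documents so that the metrics we care about can be computed for each group.
Most metrics are simple mathematical operations (for example minimum, average, maximum, and sum) computed from the values in the documents. In practice, metrics let you calculate figures such as the average salary, the highest sale price, or the 95th-percentile query latency.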
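To make the split between buckets and metrics concrete, here is a minimal in-memory sketch in plain Python. It does not talk to Elasticsearch; the sample documents and the helper names `bucket_by` and `metrics_for` are hypothetical and only mimic what a terms bucket plus a few metric aggregations do conceptually.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample documents; field names mirror the examples in this note.
docs = [
    {"village": "A", "price": 100},
    {"village": "A", "price": 300},
    {"village": "B", "price": 50},
]

def bucket_by(documents, field):
    """Group documents into buckets keyed by the value of `field` (like GROUP BY)."""
    buckets = defaultdict(list)
    for doc in documents:
        buckets[doc[field]].append(doc)
    return dict(buckets)

def metrics_for(bucket, field):
    """Compute a few simple metrics over one bucket's values of `field`."""
    values = [doc[field] for doc in bucket]
    return {
        "count": len(values),
        "sum": sum(values),
        "max": max(values),
        "avg": mean(values),
    }

# One metrics dict per bucket key.
summary = {
    key: metrics_for(bucket, "price")
    for key, bucket in bucket_by(docs, "village").items()
}
print(summary)
```

The bucketing step decides which documents belong together; the metric step only ever sees the documents of a single bucket.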
Count aggregation
eg: Count the number of documents per village, equivalent to count(*) in SQL
select village, count(*) from xxx group by village
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"villages" : {
"terms" : {
"field" : "village.keyword"
}
}
}
}
The aggregation result is:
{
"aggregations": {
"villages": {
"doc_count_error_upper_bound": 7109,
"sum_other_doc_count": 11029748,
"buckets": [
{
"key": "数据A",
"doc_count": 30304
},
{
"key": "数据B",
"doc_count": 20846
},
{
"key": "数据C",
"doc_count": 9466
}
]
}
}
}
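The response above can be read programmatically. The following is a rough sketch, assuming the response has already been parsed into a Python dict; `summarize_terms` is a hypothetical helper, not part of any client library. It relies on `sum_other_doc_count` being the number of documents that fall into buckets not returned in the response, and on `doc_count_error_upper_bound` being an upper bound on the error of the reported counts.

```python
def summarize_terms(response, agg_name):
    """Return per-key counts plus the count of documents outside the returned buckets."""
    agg = response["aggregations"][agg_name]
    counts = {b["key"]: b["doc_count"] for b in agg["buckets"]}
    return {
        "counts": counts,
        "returned_docs": sum(counts.values()),
        "other_docs": agg.get("sum_other_doc_count", 0),
        "max_count_error": agg.get("doc_count_error_upper_bound", 0),
    }

# The response shown above, as a Python dict.
response = {
    "aggregations": {
        "villages": {
            "doc_count_error_upper_bound": 7109,
            "sum_other_doc_count": 11029748,
            "buckets": [
                {"key": "数据A", "doc_count": 30304},
                {"key": "数据B", "doc_count": 20846},
                {"key": "数据C", "doc_count": 9466},
            ],
        }
    }
}

result = summarize_terms(response, "villages")
print(result["counts"])
print(result["returned_docs"], result["other_docs"])
```

A large `other_docs` value relative to `returned_docs`, as in this sample, indicates that most documents belong to villages that were not among the returned buckets.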
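Other metric aggregations
eg: Compute the average unit price (price) for each village. The price field must be a numeric type; it cannot be text or keyword.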
select village, avg(price) from xxx group by village
{
"size" : 0,
"aggs": {
"avg_price_for_village": {
"terms": {
"field": "village.keyword"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
The aggregation result is:
{
"aggregations": {
"avg_price_for_village": {
"doc_count_error_upper_bound": 7109,
"sum_other_doc_count": 11029748,
"buckets": [
{
"key": "数据A",
"doc_count": 30304,
"avg_price": {
"value": 32500
}
},
{
"key": "数据B",
"doc_count": 20846,
"avg_price": {
"value": 32500
}
},
{
"key": "数据C",
"doc_count": 9466,
"avg_price": {
"value": 32500
}
}
]
}
}
}
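The request body for this kind of "terms bucket plus one metric" aggregation follows a fixed shape, so it can be generated instead of written by hand. The sketch below is one possible way to do that; `terms_with_metric` and its parameter names are hypothetical. It only builds the body as a dict and does not send it anywhere.

```python
import json

def terms_with_metric(agg_name, group_field, metric_name, metric_type, metric_field):
    """Build a request body: a terms bucket with one metric sub-aggregation."""
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            agg_name: {
                "terms": {"field": group_field},
                "aggs": {
                    metric_name: {metric_type: {"field": metric_field}},
                },
            }
        },
    }

# Reproduces the average-price-per-village request shown above.
body = terms_with_metric(
    "avg_price_for_village", "village.keyword", "avg_price", "avg", "price"
)
print(json.dumps(body, indent=2))
```

Passing a different `metric_type` such as "max", "min", or "sum" yields the corresponding metric aggregation over the same buckets.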
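Nested bucket aggregation
Aggregate along multiple dimensions (car color and manufacturer). The SQL below is only a rough equivalent: in the query that follows, avg_price sits next to makeby inside each color bucket, so the average is computed per color, not per color and manufacturer pair.
select color, makeby, count(*), avg(price) from xx group by color, makeby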
{
"size" : 0,
"aggs": {
"colors": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
},
"makeby": {
"terms": {
"field": "makeby"
}
}
}
}
}
}
An example of the aggregation result:
{
"aggregations": {
"colors": {
"buckets": [
{
"key": "red",
"doc_count": 4,
"makeby": {
"buckets": [
{
"key": "本田",
"doc_count": 3
},
{
"key": "宝马",
"doc_count": 1
}
]
},
"avg_price": {
"value": 32500
}
},
...
}
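A nested result like this is often easier to work with as flat rows, similar to the output of the SQL query. The sketch below assumes the response is already a Python dict; `flatten_colors` is a hypothetical helper, and the sample contains only the one color bucket that is visible in the truncated example above.

```python
def flatten_colors(response):
    """Turn nested color -> makeby buckets into one row per (color, makeby) pair."""
    rows = []
    for color in response["aggregations"]["colors"]["buckets"]:
        for make in color["makeby"]["buckets"]:
            rows.append({
                "color": color["key"],
                "makeby": make["key"],
                "count": make["doc_count"],
                # avg_price belongs to the color bucket, so it repeats per row.
                "avg_price_for_color": color["avg_price"]["value"],
            })
    return rows

# Only the bucket visible in the example above; the rest is elided there.
sample = {
    "aggregations": {
        "colors": {
            "buckets": [
                {
                    "key": "red",
                    "doc_count": 4,
                    "makeby": {
                        "buckets": [
                            {"key": "本田", "doc_count": 3},
                            {"key": "宝马", "doc_count": 1},
                        ]
                    },
                    "avg_price": {"value": 32500},
                }
            ]
        }
    }
}

rows = flatten_colors(sample)
for row in rows:
    print(row)
```

The counts of the inner buckets add up to the doc_count of the outer bucket here (3 + 1 = 4), which is a quick sanity check when every inner bucket is returned.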