Introduction: how an aggregation is defined
"aggregations" : {
  "<aggregation_name>" : {
    "<aggregation_type>" : {
      <aggregation_body>
    }
    [,"meta" : { [<meta_data_body>] } ]?
    [,"aggregations" : { [<sub_aggregation>]+ } ]?
  }
  [,"<aggregation_name_2>" : { ... } ]*
}
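From the aggs syntax we can see that:
- Every aggregation has a name, which is also used in the response to identify which aggregation each result belongs to.
- Every aggregation has an aggregation_type that identifies what kind of aggregation it is, followed by that type's aggregation_body. Each aggregation type has its own request syntax, and its response has a corresponding format.
- Several sibling aggregations can be defined over the same documents; they sit side by side at the aggregation_name level.
- An aggregation can contain sub-aggregations. A sub-aggregation only operates on the data of its parent aggregation, and it is declared at the same level as the parent's aggregation_type.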
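The nesting rules above can be sketched as a plain Python dictionary that serializes to a request body. This is only an illustration: the field names (grade, class) and the aggregation names (avg_grade, by_class, max_grade_per_class) are hypothetical and do not come from a real index.

```python
import json

# Hypothetical request body: two sibling aggregations, one with a sub-aggregation.
request_body = {
    "size": 0,
    "aggs": {
        # <aggregation_name> : { <aggregation_type> : { <aggregation_body> } }
        "avg_grade": {"avg": {"field": "grade"}},
        # A sibling aggregation, declared at the same level as "avg_grade".
        "by_class": {
            "terms": {"field": "class"},
            # The sub-aggregation sits next to the parent's aggregation_type
            # and only sees the documents of each "by_class" bucket.
            "aggs": {
                "max_grade_per_class": {"max": {"field": "grade"}}
            },
        },
    },
}

print(json.dumps(request_body, indent=2))
```

Here "aggs" is used as the short form of "aggregations", as in the request examples in this article.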
metrics aggregations
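Metrics aggregations compute numeric metrics from the documents being aggregated. Most of them operate on numeric fields, although some of the ones covered here (value_count, cardinality, top_hits) also accept other field types. The sections below describe the syntax of each one.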
1.avg aggregation
Computes the average value of the given field.
POST /exams/_search
{
  "size": 0,
  "aggs" : {
    "avg_grade" : {
      "avg" : {
        "field" : "grade"
      }
    }
  }
}
The response has the following format:
{
  ...
  "aggregations": {
    "avg_grade": {
      "value": 75.0
    }
  }
}
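Note: the user-defined aggregation name is echoed back in the response and identifies which aggregation each result belongs to.
The missing parameter is also available. It specifies the default value to use for documents that have no value for the field. Almost all metrics aggregations support this parameter, so it is not repeated in the sections below.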
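The effect of missing can be sketched in plain Python. This is a simplified model of the behavior described above, not Elasticsearch code; the function name avg_with_missing and the sample documents are hypothetical.

```python
def avg_with_missing(docs, field, missing=None):
    """Average of `field` over docs.

    Documents without a value are skipped, unless `missing` is given,
    in which case they are counted with that default value.
    """
    values = []
    for doc in docs:
        value = doc.get(field)
        if value is None:
            value = missing
        if value is not None:
            values.append(value)
    return sum(values) / len(values) if values else None


# Hypothetical documents: the third one has no "grade" field.
docs = [{"grade": 90}, {"grade": 60}, {}]

print(avg_with_missing(docs, "grade"))     # 75.0, the third document is ignored
print(avg_with_missing(docs, "grade", 0))  # 50.0, the third document counts as 0
```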
2.min aggregation
Returns the minimum value of the field.
POST /exams/_search
{
  "size": 0,
  "aggs" : {
    "min_grade" : {
      "min" : {
        "field" : "grade"
      }
    }
  }
}
The response has the following format:
{
  ...
  "aggregations": {
    "min_grade": {
      "value": 75.0
    }
  }
}
3.max aggregation
Returns the maximum value of the field.
POST /exams/_search
{
  "size": 0,
  "aggs" : {
    "max_grade" : {
      "max" : {
        "field" : "grade"
      }
    }
  }
}
The response has the following format:
{
  ...
  "aggregations": {
    "max_grade": {
      "value": 75.0
    }
  }
}
4.sum aggregation
Returns the sum of the field's values.
POST /exams/_search
{
  "size": 0,
  "aggs" : {
    "sum_grade" : {
      "sum" : {
        "field" : "grade"
      }
    }
  }
}
The response has the following format:
{
  ...
  "aggregations": {
    "sum_grade": {
      "value": 75.0
    }
  }
}
5.value count aggregation
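Counts the values extracted from the field across the matching documents. For a single-valued field this equals the number of documents that contain the field.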
POST /exams/_search
{
  "size": 0,
  "aggs" : {
    "count_grade" : {
      "value_count" : {
        "field" : "grade"
      }
    }
  }
}
The response has the following format:
{
  ...
  "aggregations": {
    "count_grade": {
      "value": 3
    }
  }
}
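value_count counts values rather than documents, which only makes a difference when a field holds several values. The sketch below models that distinction in plain Python; the function name and the sample documents are hypothetical.

```python
def value_count(docs, field):
    """Count every value of `field`; a multi-valued field contributes once per value."""
    total = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field contribute nothing
        total += len(value) if isinstance(value, list) else 1
    return total


# Hypothetical documents: one single-valued, one multi-valued, one without "grade".
docs = [{"grade": 80}, {"grade": [70, 90]}, {"name": "no grade here"}]

docs_with_field = sum(1 for doc in docs if "grade" in doc)
print(value_count(docs, "grade"))  # 3 values
print(docs_with_field)             # 2 documents
```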
6.stats aggregation
Returns the count, max, min, avg and sum of the field in a single request.
POST /exams/_search
{
  "size": 0,
  "aggs" : {
    "stats_grade" : {
      "stats" : {
        "field" : "grade"
      }
    }
  }
}
The response has the following format:
{
  ...
  "aggregations": {
    "stats_grade": {
      "count": 6,
      "min": 60,
      "max": 98,
      "avg": 78.5,
      "sum": 471
    }
  }
}
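An extended variant, extended_stats, is also available. It returns the same values plus additional ones such as the sum of squares, variance and standard deviation.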
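To make the five stats values concrete, here is a minimal Python sketch that computes them over a list of grades. The grades are hypothetical: the original data set is not shown in this article, and these six values were chosen only because they reproduce the numbers in the sample response.

```python
def stats(values):
    """Return count, min, max, avg and sum, mirroring the keys of the stats response."""
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


# Hypothetical grades, picked to match the sample response.
grades = [60, 70, 75, 80, 88, 98]
print(stats(grades))
```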
7.cardinality aggregation
Returns the cardinality of the field, that is, the number of distinct values (think of it as a deduplicated count).
POST /exams/_search
{
  "size": 0,
  "aggs" : {
    "cardinality_grade" : {
      "cardinality" : {
        "field" : "grade"
      }
    }
  }
}
The response has the following format:
{
  ...
  "aggregations": {
    "cardinality_grade": {
      "value": 3
    }
  }
}
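Note: the returned value is an approximation, not an exact count. Elasticsearch computes it with the HyperLogLog++ algorithm: results are expected to be close to exact for low cardinalities, and the error grows once the number of distinct values exceeds the precision_threshold setting.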
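For reference, the exact quantity that cardinality estimates can be computed with a set, as in the sketch below. This is not how Elasticsearch does it: keeping every distinct value in memory is what the approximate algorithm avoids. The function name and the sample documents are hypothetical.

```python
def exact_cardinality(docs, field):
    """Exact number of distinct values of `field` across docs."""
    distinct = set()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field are ignored
        values = value if isinstance(value, list) else [value]
        distinct.update(values)
    return len(distinct)


# Hypothetical documents: 75 appears twice, one document has no "grade".
docs = [{"grade": 60}, {"grade": 75}, {"grade": 75}, {"grade": 98}, {}]
print(exact_cardinality(docs, "grade"))  # 3
```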
8.top hits aggregation
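Returns the top-n matching documents in sorted order. This aggregation is mainly used as a sub-aggregation of a bucket aggregation, to fetch the top-n documents of each bucket.
The following parameters can be specified:
- from: offset of the first hit to return
- size: maximum number of hits to return per bucket (default 3)
- sort: how the hits are sorted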
POST /sales/_search
{
  "size" : 0,
  "aggs": {
    "term_type": {
      "terms": {
        "field": "type",
        "size": 3
      },
      "aggs": {
        "top_sales_hits": {
          "top_hits": {
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [ "date", "price" ]
            },
            "size" : 1
          }
        }
      }
    }
  }
}
The response:
{
  ...
  "aggregations": {
    "term_type": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "hat",
          "doc_count": 3,
          "top_sales_hits": {
            "hits": {
              "total": 3,
              "max_score": null,
              "hits": [
                {
                  "_index": "sales",
                  "_type": "sale",
                  "_id": "AVnNBmauCQpcRyxw6ChK",
                  "_source": {
                    "date": "2015/03/01 00:00:00",
                    "price": 200
                  },
                  "sort": [
                    1425168000000
                  ],
                  "_score": null
                }
              ]
            }
          }
        },
        {
          "key": "t-shirt",
          "doc_count": 3,
          "top_sales_hits": {
            "hits": {
              "total": 3,
              "max_score": null,
              "hits": [
                {
                  "_index": "sales",
                  "_type": "sale",
                  "_id": "AVnNBmauCQpcRyxw6ChL",
                  "_source": {
                    "date": "2015/03/01 00:00:00",
                    "price": 175
                  },
                  "sort": [
                    1425168000000
                  ],
                  "_score": null
                }
              ]
            }
          }
        },
        {
          "key": "bag",
          "doc_count": 1,
          "top_sales_hits": {
            "hits": {
              "total": 1,
              "max_score": null,
              "hits": [
                {
                  "_index": "sales",
                  "_type": "sale",
                  "_id": "AVnNBmatCQpcRyxw6ChH",
                  "_source": {
                    "date": "2015/01/01 00:00:00",
                    "price": 150
                  },
                  "sort": [
                    1420070400000
                  ],
                  "_score": null
                }
              ]
            }
          }
        }
      ]
    }
  }
}
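The combination of terms and top_hits shown above amounts to "group by type, sort each group by date descending, keep the first size documents". The Python sketch below models that logic on in-memory data. It is an illustration, not Elasticsearch code: the function name is hypothetical, the first hat sale is invented for the example, and dates are compared as strings, which only works because the yyyy/MM/dd HH:mm:ss format sorts lexicographically.

```python
from collections import defaultdict


def top_hits_per_bucket(docs, bucket_field, sort_field, size=3):
    """Group docs by `bucket_field` and keep the `size` docs with the highest `sort_field`."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[bucket_field]].append(doc)
    return {
        key: sorted(group, key=lambda d: d[sort_field], reverse=True)[:size]
        for key, group in buckets.items()
    }


# Sample sales; the 120 hat sale is hypothetical.
sales = [
    {"type": "hat", "date": "2015/01/01 00:00:00", "price": 120},
    {"type": "hat", "date": "2015/03/01 00:00:00", "price": 200},
    {"type": "bag", "date": "2015/01/01 00:00:00", "price": 150},
]

latest = top_hits_per_bucket(sales, "type", "date", size=1)
for key, hits in latest.items():
    print(key, hits)
```

With size=1, each bucket keeps only its most recent sale, which is the role "size" : 1 plays inside top_hits in the request.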
9.percentiles aggregation
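A percentile is the point below which a given percentage of the observed values fall. For example, the 95th percentile is the value that is greater than 95% of the observed values. Percentiles are used to estimate the distribution of the data and to determine whether it is skewed, bimodal, and so on.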
{
  "aggs" : {
    "load_time_outlier" : {
      "percentiles" : {
        "field" : "load_time"
      }
    }
  }
}
By default, the percentiles metric generates the percentiles [1, 5, 25, 50, 75, 95, 99]:
{
  ...
  "aggregations": {
    "load_time_outlier": {
      "values" : {
        "1.0": 15,
        "5.0": 20,
        "25.0": 23,
        "50.0": 25,
        "75.0": 29,
        "95.0": 60,
        "99.0": 150
      }
    }
  }
}
You can also specify which percentiles to compute:
{
  "aggs" : {
    "load_time_outlier" : {
      "percentiles" : {
        "field" : "load_time",
        "percents" : [95, 99, 99.9]
      }
    }
  }
}
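To illustrate what a percentile is, the sketch below computes exact percentiles with linear interpolation between the two closest ranks. This is an assumption made for the example: Elasticsearch computes percentiles approximately (with the TDigest algorithm by default), so its results on real data can differ slightly. The function name and the sample load times are hypothetical.

```python
import math


def percentile(values, percent):
    """Exact percentile (0 <= percent <= 100) using linear interpolation between ranks."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * percent / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical load times in milliseconds.
load_times = [15, 20, 23, 25, 29, 60, 150]
for p in (95, 99, 99.9):
    print(p, percentile(load_times, p))
```

On a sample this small, the high percentiles are dominated by the single largest value, which is why they are useful for spotting outliers such as the 150 ms load time.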