文章目录
- 1. 数据准备
- 2. metric aggregation分类
- 3.使用样例
- 1 . Avg Aggregation : 求query出来的结果的average 值
- 2 . Weighted Avg Aggregation: 带权重的average值,可以选取另一个字段的值作为权重值
- 3 . Max Aggregation: 求query出来的结果的max值
- 4 . Min Aggregation: 求query出来的结果的min值
- 5 . Sum Aggregation: 求query出来的结果的sum值
- 6 . Value Count Aggregation: query出来的结果的某个field的的值的个数,注意可能这个field的值是一个数组则这个一个doc可能就贡献了多个value count
- 7 . Cardinality Aggregation: 某个filed的value去重后的count, 可以理解为value_count做了去重
- 8 . Percentiles Aggregation: 百分比求值,对于大小的数值,求百分位,比如响应时间的99分位,95分位等对应的具体响应时间是多少
- 9 . Percentile Ranks Aggregation: 某个具体的响应时间在总体中所处的分位值
- 10. Stats Aggregation: 上面value_count,min,max,sum,avg的快捷方式
- 11. top_hit
elasticsearch的aggregate查询现在越来越丰富了,目前总共有4类。
- metric aggregation: 主要是min,max,avg,sum,percetile 等单个统计指标的查询,同时,可以用作bucket agg的子聚合查询,但是本身不能包含子查询
- bucket aggregation: 主要是类似group by的查询操作,而且可以含有子查询
- matrix aggregation: 使用多个字段的值进行计算从而产生一个多维矩阵
- pipline aggregation: 主要是能够在其他的aggregation进行一些附加的处理来增强数据
本篇就主要学习metric aggregation
1. 数据准备
演唱会的票信息
GET seats1028/_search
{
"play" : "Auntie Jo", # 演唱会名称
"date" : "2018-11-6", # 时间
"theatre" : "Skyline", # 地点
"sold" : false, # 这个票是否已经卖出
"actors" : [ # 演员
"Jo Hangum",
"Jon Hittle",
"Rob Kettleman",
"Laura Conrad",
"Simon Hower",
"Nora Blue"
],
"number" : 8, #可以使用的人数(团体座位)
"datetime" : 1541497200000,
"price" : 8321, # 票价
"tip" : 17.5, # 优惠
"time" : "5:40PM"
}
总共有3w条这样的数据
2. metric aggregation分类
1 . Avg Aggregation : 求query出来的结果的average 值
2 . Weighted Avg Aggregation: 带权重的average值,可以选取另一个字段的值作为权重值
3 . Max Aggregation: 求query出来的结果的max值
4 . Min Aggregation: 求query出来的结果的min值
5 . Sum Aggregation: 求query出来的结果的sum值
6 . Value Count Aggregation: query出来的结果的某个field的的值的个数,注意可能这个field的值是一个数组则这个一个doc可能就贡献了多个value count
7 . Cardinality Aggregation: 某个filed的value去重后的count, 可以理解为value_count做了去重
8 . Percentiles Aggregation: 百分比求值,对于大小的数值,求百分位,比如响应时间的99分位,95分位等对应的具体响应时间是多少
9 . Percentile Ranks Aggregation: 某个具体的响应时间在总体中所处的分位值
10. Stats Aggregation: 上面value_count,min,max,sum,avg的快捷方式
11. Extended Stats Aggregation: 在stats的基础上增加了平方和、方差、标准差、平均值加/减两个标准差的区间
12. Top Hits Aggregation: 一般是嵌套查询,用在term查询当中,返回每个bucket的topN
13. Geo Bounds Aggregation: 地理位置的边界聚合
14. Geo Centroid Aggregation:
15. Scripted Metric Aggregation: 使用script的聚合
16. Median Absolute Deviation Aggregation: 平方差聚合
metric agg可以用作bucket agg的子聚合查询,但是metric agg 本身不能包含子查询
3.使用样例
1 . Avg Aggregation : 求query出来的结果的average 值
GET seats1028/_search
{
"size": 0,
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
返回
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"avg_price" : {
"value" : 4995.498812220319
}
}
}
2 . Weighted Avg Aggregation: 带权重的average值,可以选取另一个字段的值作为权重值
GET seats1028/_search
{
"size": 0,
"aggs": {
"avg_price": {
"weighted_avg": {
"value": {
"field": "price"
},
"weight": {
"field": "number"
}
}
}
}
}
返回
{
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"avg_price" : {
"value" : 4981.713482182667
}
}
}
3 . Max Aggregation: 求query出来的结果的max值
GET seats1028/_search
{
"size": 0,
"aggs": {
"max_price": {
"max": {
"field": "price"
}
}
}
}
返回
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"max_price" : {
"value" : 9999.0
}
}
}
4 . Min Aggregation: 求query出来的结果的min值
GET seats1028/_search
{
"size": 0,
"aggs": {
"min_price": {
"min": {
"field": "price"
}
}
}
}
返回
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"min_price" : {
"value" : 0.0
}
}
}
5 . Sum Aggregation: 求query出来的结果的sum值
GET seats1028/_search
{
"size": 0,
"aggs": {
"sum_price": {
"sum": {
"field": "price"
}
}
}
}
返回
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"sum_price" : {
"value" : 1.80847048E8
}
}
}
6 . Value Count Aggregation: query出来的结果的某个field的的值的个数,注意可能这个field的值是一个数组则这个一个doc可能就贡献了多个value count
GET seats1028/_search
{
"size": 0,
"aggs": {
"count_price": {
"value_count": {
"field": "price"
}
}
}
}
返回
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"count_price" : {
"value" : 36202
}
}
}
7 . Cardinality Aggregation: 某个filed的value去重后的count, 可以理解为value_count做了去重
GET seats1028/_search
{
"size": 0,
"aggs": {
"unique_count_price": {
"cardinality": {
"field": "price"
}
}
}
}
返回
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"unique_count_price" : {
"value" : 9591
}
}
}
8 . Percentiles Aggregation: 百分比求值,对于大小的数值,求百分位,比如响应时间的99分位,95分位等对应的具体响应时间是多少
GET seats1028/_search
{
"size": 0,
"aggs": {
"percentile_price": {
"percentiles": {
"field": "price"
}
}
}
}
返回
{
"took" : 56,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"percentile_price" : {
"values" : {
"1.0" : 100.22,
"5.0" : 500.1716280451575,
"25.0" : 2478.398741375389,
"50.0" : 4990.070945393942,
"75.0" : 7509.41570777247,
"95.0" : 9487.07620155039,
"99.0" : 9894.284278074867
}
}
}
}
9 . Percentile Ranks Aggregation: 某个具体的响应时间在总体中所处的分位值
GET seats1028/_search
{
"size": 0,
"aggs": {
"rank_price": {
"percentile_ranks": {
"field": "price",
"values": [
1000,
2500
]
}
}
}
}
返回
{
"took" : 15,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"rank_price" : {
"values" : {
"1000.0" : 9.948376390210644,
"2500.0" : 25.172259605722964
}
}
}
}
10. Stats Aggregation: 上面value_count,min,max,sum,avg的快捷方式
GET seats1028/_search
{
"size": 0,
"aggs": {
"stats_price": {
"stats": {
"field": "price"
}
}
}
}
返回
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"stats_price" : {
"count" : 36202,
"min" : 0.0,
"max" : 9999.0,
"avg" : 4995.498812220319,
"sum" : 1.80847048E8
}
}
}
11. top_hit
这个一般用于嵌套的子查询,比如下面的查询按照row划分bucket,然后找出每个bucket中的price最高的两个
GET seats1105/_search
{
"size": 1,
"query": {
"match_all": {}
},
"aggs": {
"row_term": {
"terms": {
"field": "row",
"size": 10
},
"aggs": {
"top_price": {
"top_hits": {
"size": 2,
"sort": [
{
"price": {
"order": "desc"
}
}
]
}
}
}
}
}
}
返回
"aggregations" : {
"row_term" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 2,
"doc_count" : 5796,
"top_price" : {
"hits" : {
"total" : {
"value" : 5796,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "seats1105",
"_type" : "_doc",
"_id" : "1957",
"_score" : null,
"_source" : {
"price" : 9998
},
"sort" : [
9998
]
},
{
"_index" : "seats1105",
"_type" : "_doc",
"_id" : "7105",
"_score" : null,
"_source" : {
"price" : 9997
},
"sort" : [
9997
]
}
]
}
}
},
{
"key" : 3,
"doc_count" : 5796,
"top_price" : {
"hits" : {
"total" : {
"value" : 5796,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "seats1105",
"_type" : "_doc",
"_id" : "3993",
"_score" : null,
"_source" : {
"price" : 9999
},
"sort" : [
9999
]
},
{
"_index" : "seats1105",
"_type" : "_doc",
"_id" : "6656",
"_score" : null,
"_source" : {
"price" : 9999
},
"sort" : [
9999
]
}
]
}
}
},
{
"key" : 1,
"doc_count" : 5791,
"top_price" : {
"hits" : {
"total" : {
"value" : 5791,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "seats1105",
"_type" : "_doc",
"_id" : "4321",
"_score" : null,
"_source" : {
"price" : 9999
},
"sort" : [
9999
]
},
{
"_index" : "seats1105",
"_type" : "_doc",
"_id" : "8380",
"_score" : null,
"_source" : {
"price" : 9999
},
"sort" : [
9999
]
}
]
}
}
}
]
}