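1. Preparing the data
Our sample data comes from the accounts.json file in the Elasticsearch repository and has the following structure: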
https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"hattiebond@netagy.com","city":"Dante","state":"TN"}
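Bulk-insert the data into our custom bank index with POST /bank/account/_bulk. Mapping types such as account are deprecated in 7.x, so the typeless form POST /bank/_bulk is preferred.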
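As a rough sketch of what the _bulk request body looks like, the following Python builds the newline-delimited JSON from a list of account dicts. The helper name build_bulk_body is hypothetical, the two records are trimmed copies of the samples above, and actually sending the body to Elasticsearch is left out.

```python
import json

def build_bulk_body(accounts):
    """Return an NDJSON bulk body: one action line plus one source line per account."""
    lines = []
    for account in accounts:
        # Action line: index the document, using account_number as its _id
        # (this mirrors the layout of accounts.json).
        lines.append(json.dumps({"index": {"_id": str(account["account_number"])}}))
        # Source line: the document itself.
        lines.append(json.dumps(account))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Trimmed sample records (most fields omitted for brevity).
accounts = [
    {"account_number": 1, "balance": 39225, "firstname": "Amber", "age": 32},
    {"account_number": 6, "balance": 5686, "firstname": "Hattie", "age": 36},
]
body = build_bulk_body(accounts)
print(body)
```

Each document contributes exactly two lines, which is why the sample file alternates between {"index": ...} lines and account lines.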
2. Definitions
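Elasticsearch 7.x groups aggregations into three categories: metric aggregations, which compute values such as a sum or an average from field values; bucket aggregations, which group documents into buckets based on field values, ranges, or other criteria; and pipeline aggregations, which take their input from other aggregations rather than from documents or fields.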
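3. Examples
The following examples demonstrate a few of these aggregation APIs.
1) Find the age distribution and the average age of everyone whose address contains mill
First, use the Query DSL to find every document whose address contains mill. If we only want the statistics, setting size to 0 suppresses the hits themselves.
Then aggregate with aggs. terms groups the matched documents by age and counts each group; avg computes the average age across the matched documents.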
GET /bank/_search
{
  "query": {
    "match": {
      "address": "mill"
    }
  },
  "aggs": {
    "ageAgg": {
      "terms": { "field": "age" }
    },
    "avgAge": { "avg": { "field": "age" } }
  },
  "size": 0
}
The result is:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "ageAgg" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 38,
          "doc_count" : 2
        },
        {
          "key" : 28,
          "doc_count" : 1
        },
        {
          "key" : 32,
          "doc_count" : 1
        }
      ]
    },
    "avgAge" : {
      "value" : 34.0
    }
  }
}
As you can see, the aggregations object contains the ageAgg and avgAge objects we defined, and they hold the values we asked for.
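To make the two aggregations concrete, here is a minimal Python sketch that mimics what terms and avg compute. It runs locally and is not how Elasticsearch implements them. The ages list is an assumption reconstructed from the buckets in the response above (two documents aged 38, one aged 28, one aged 32), and the function names are made up for illustration.

```python
from collections import Counter

# Ages of the four matching documents, reconstructed from the buckets above (assumption).
ages = [38, 38, 28, 32]

def terms_agg(values):
    """Mimic a terms aggregation: one bucket per distinct value,
    ordered by doc_count descending, ties ordered by key ascending."""
    counts = Counter(values)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]

def avg_agg(values):
    """Mimic an avg aggregation: arithmetic mean of the values."""
    return {"value": sum(values) / len(values)}

print(terms_agg(ages))
print(avg_agg(ages))
```

The buckets come out in the same order as in the response, and the mean (38 + 38 + 28 + 32) / 4 = 34.0 agrees with avgAge.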
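2) Aggregate by age and find the average balance for each age group
This time we aggregate once and then aggregate again inside each resulting bucket, which is known as a sub-aggregation.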
GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": { "field": "age" },
      "aggs": {
        "avgBalance": { "avg": { "field": "balance" } }
      }
    }
  },
  "size": 0
}
Inside ageAgg we nest another aggs that averages the balance field within each age bucket.
The response has the following structure:
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 463,
"buckets" : [
{
"key" : 31,
"doc_count" : 61,
"avgBalance" : {
"value" : 28312.918032786885
}
}
...
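On the client side, the nested metric sits inside each bucket under the name we gave it. The sketch below shows one way to flatten such a response into a plain mapping in Python. The response dict is a trimmed copy of the fragment above containing only the one bucket the source shows, and avg_balance_by_age is a hypothetical helper name.

```python
# Trimmed copy of the response fragment above; only the first bucket is known.
response = {
    "aggregations": {
        "ageAgg": {
            "buckets": [
                {"key": 31, "doc_count": 61, "avgBalance": {"value": 28312.918032786885}},
            ]
        }
    }
}

def avg_balance_by_age(resp):
    """Map each age bucket key to the value of its nested avgBalance metric."""
    buckets = resp["aggregations"]["ageAgg"]["buckets"]
    return {bucket["key"]: bucket["avgBalance"]["value"] for bucket in buckets}

print(avg_balance_by_age(response))
```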
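3) Find the age distribution and, for each age group, the average balance for M, the average balance for F, and the overall average balance
This time we group by age, nest a gender grouping inside each age bucket, and then average balance within each gender bucket. Note that the query below does not compute the overall average for each age group; adding an avg sub-aggregation on balance next to genderAgg, as in example 2, would return it.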
GET /bank/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "ageAgg": {
      "terms": { "field": "age" },
      "aggs": {
        "genderAgg": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "balanceAvg": {
              "avg": { "field": "balance" }
            }
          }
        }
      }
    }
  },
  "size": 0
}
The response has the following structure:
"aggregations" : {
"ageAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 463,
"buckets" : [
{
"key" : 31,
"doc_count" : 61,
"genderAgg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "M",
"doc_count" : 35,
"balanceAvg" : {
"value" : 29565.628571428573
}
},
{
"key" : "F",
"doc_count" : 26,
"balanceAvg" : {
"value" : 26626.576923076922
}
}
]
}
},
...
...
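The overall average for an age group can also be recovered from the gender buckets as a mean weighted by doc_count. The Python sketch below does this for the age 31 bucket shown above. It assumes every document in the age bucket falls into one of the listed gender buckets (here 35 + 26 = 61 and sum_other_doc_count is 0), and overall_avg is a hypothetical helper name.

```python
# The age 31 bucket from the response above, trimmed to the fields we need.
age_bucket = {
    "key": 31,
    "doc_count": 61,
    "genderAgg": {
        "buckets": [
            {"key": "M", "doc_count": 35, "balanceAvg": {"value": 29565.628571428573}},
            {"key": "F", "doc_count": 26, "balanceAvg": {"value": 26626.576923076922}},
        ]
    },
}

def overall_avg(bucket):
    """Weighted mean of the per-gender averages, weighted by doc_count."""
    gender_buckets = bucket["genderAgg"]["buckets"]
    # doc_count * average gives back the sum of balances in each gender bucket.
    total = sum(g["doc_count"] * g["balanceAvg"]["value"] for g in gender_buckets)
    count = sum(g["doc_count"] for g in gender_buckets)
    return total / count

print(overall_avg(age_bucket))
```

The result is approximately 28312.92, which agrees with the avgBalance value that example 2 returned for age 31.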
4. Summary
Elasticsearch provides many aggregation types; when you need one, consult the corresponding section of the official documentation.