[Elasticsearch Aggregation Queries] ES Aggregation Statistics with a Spring Boot Implementation for Comparison


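Elasticsearch Aggregations

Besides search, Elasticsearch can run statistical analysis over the data it stores. Aggregations make it easy to compute statistics, analyses, and calculations over that data.

Basic syntax
An aggregation request is structured like other queries and usually contains the following parts:

  • Query: selects the documents to aggregate, using standard Elasticsearch query syntax such as term, match, or range.
  • Aggregation function: the operation to run, such as sum, avg, min, max, terms, or date_histogram. Each aggregation produces its own result.
  • Nested aggregations: aggregations can be nested to analyze the data at a finer granularity.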


I. Data Preparation

1.Index Mapping

{
    "mappings": {
        "properties": {
            "age": {
                "type": "integer"
            },
            "job": {
                "fields": {
                    "keyword": {
                        "ignore_above": 50,
                        "type": "keyword"
                    }
                },
                "type": "text"
            },
            "name": {
                "type": "keyword"
            },
            "salary": {
                "type": "integer"
            },
            "sex": {
                "type": "keyword"
            }
        }
    }
}

2.Index Data

[
  { "name" : "李四","age":41,"job":"Dev Manager","sex":"male","salary": 50000},
  { "name" : "绯色","age":36,"job":"Java Developer","sex":"female","salary":38000 },
  { "name" : "埃斯基","age":33,"job":"Java Developer","sex":"male","salary":28000},
  { "name" : "张三","age":32,"job":"Manager","sex":"female","salary":35000 },
  { "name" : "王佛为","age":32,"job":"Java Developer","sex":"male","salary":22000 },
  { "name" : "马里奥","age":32,"job":"Javascript Developer","sex":"male","salary": 25000},
  { "name" : "马路","age":31,"job":"UI","sex":"female","salary": 25000},
  { "name" : "李佛尔","age":31,"job":"Java Developer","sex":"male","salary": 32000},
  { "name" : "应善","age":30,"job":"Java Developer","sex":"female","salary":30000 },
  { "name" : "坦克","age":30,"job":"DBA","sex":"male","salary": 30000},
  { "name" : "王五","age":25,"job":"Designer","sex":"male","salary":18000 },
  { "name" : "坤坤","age":26,"job":"Designer","sex":"female","salary": 22000},
  { "name" : "王超","age":25,"job":"UI","sex":"female","salary":18000 },
  { "name" : "李飞","age":27,"job":"UI","sex":"male","salary":20000 },
  { "name" : "万五千","age":27,"job":"Java Developer","sex":"male","salary": 20000},
  { "name" : "李讲萨","age":20,"job":"Java Developer","sex":"male","salary": 9000},
  { "name" : "海坤","age":21,"job":"Javascript Developer","sex":"male","salary": 16000},
  { "name" : "奥特","age":25,"job":"Javascript Developer","sex":"male","salary": 16000},
  { "name" : "图图","age":29,"job":"Javascript Developer","sex":"female","salary": 20000},
  { "name" : "李澎","age":29,"job":"DBA","sex":"female","salary": 20000}
]

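II. Bucket Aggregation

A bucket aggregation assigns documents to buckets according to a rule, which effectively classifies them. Common bucket aggregations provided by ES:

  • Terms: the field must be aggregatable
    • keyword fields are aggregatable by default (through doc_values)
    • text fields need fielddata enabled in the mapping, and are bucketed by their analyzed tokens
  • Numeric types
    • Range / Date Range
    • Histogram / Date Histogram
  • Nesting is supported: buckets can be split into further buckets

Bucket aggregations fit many scenarios, for example:

  • Grouped statistics, such as grouping by region, age range, or sex.
  • Time-series analysis over periods such as hour, day, month, quarter, or year.
  • Classifying tag information and counting the documents under each tag.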

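1. Terms Aggregation

Terms aggregation groups documents by the value of a given field and computes the document count or other metrics for each group.

Configurable properties:

  • field: the field to aggregate on
  • size: the number of buckets to return
  • order: how the buckets are sorted
    By default, the terms aggregation counts the documents in each bucket as _count and sorts the buckets by _count in descending order.

Example: count the people aged 33 or older (age >= 33) per job, sorted by count in ascending order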

GET aggs_index/_search
{
  "size": 0, 
  "aggs": {
    "cardinate_job": {
      "terms": {
        "field": "job.keyword",
        "order": {
          "_count": "asc"
        }
      }
    }
  },
  "query": {
    "range": {
      "age": {
        "gte": 33
      }
    }
  }
}

Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "cardinate_job" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Dev Manager",
          "doc_count" : 1
        },
        {
          "key" : "Java Developer",
          "doc_count" : 2
        }
      ]
    }
  }
}

Java implementation:

    /**
     * Count documents per job for people aged 33 or older.
     */
    @RequestMapping(value = "/query_terms", method = RequestMethod.GET)
    @ApiOperation(value = "Aggregation - query_terms")
    public void qtQuery() throws Exception {
        // Build the search request body
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Query: age >= 33
        QueryBuilder queryBuilder = QueryBuilders.rangeQuery("age").gte(33);
        // Field to bucket on
        String jobField = "job.keyword";
        // Aggregation name
        String cardinate_job = "cardinate_job";
        // Bucket order: true = ascending, false = descending
        BucketOrder order = BucketOrder.count(true);
        AggregationBuilder jobTerms = AggregationBuilders.terms(cardinate_job).field(jobField).order(order);
        builder.query(queryBuilder).aggregation(jobTerms);
        // Run the search and print each bucket
        SearchResponse search = client.aggregationSearch(builder, INDEX_NAME);
        Map<String, Aggregation> map = search.getAggregations().asMap();
        Terms cardData = (Terms) map.get(cardinate_job);
        List<? extends Terms.Bucket> buckets = cardData.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            LOGGER.info("key:{};  doc_count:{};", bucket.getKey().toString(), bucket.getDocCount());
        }
    }

Output:
key:Dev Manager;  doc_count:1;
key:Java Developer;  doc_count:2;
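
To make the query / bucket / order steps concrete, here is a minimal plain-Java sketch that reproduces the same two buckets in memory. It does not call Elasticsearch; the class name TermsSketch and the arrays (copied from the sample data in section I) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; no Elasticsearch client is involved.
class TermsSketch {
    // age and job of the 20 sample documents, in index order
    static final int[] AGES = {41, 36, 33, 32, 32, 32, 31, 31, 30, 30,
                               25, 26, 25, 27, 27, 20, 21, 25, 29, 29};
    static final String[] JOBS = {
        "Dev Manager", "Java Developer", "Java Developer", "Manager", "Java Developer",
        "Javascript Developer", "UI", "Java Developer", "Java Developer", "DBA",
        "Designer", "Designer", "UI", "UI", "Java Developer",
        "Java Developer", "Javascript Developer", "Javascript Developer",
        "Javascript Developer", "DBA"};

    /** Buckets documents with age >= minAge by job, ordered by count ascending. */
    static List<Map.Entry<String, Long>> termsByJob(int minAge) {
        Map<String, Long> counts = new TreeMap<>();
        for (int i = 0; i < AGES.length; i++) {
            if (AGES[i] >= minAge) {                  // "query": range filter on age
                counts.merge(JOBS[i], 1L, Long::sum); // "terms": one bucket per distinct value
            }
        }
        List<Map.Entry<String, Long>> buckets = new ArrayList<>(counts.entrySet());
        buckets.sort(Map.Entry.comparingByValue());   // "order": _count ascending (stable sort)
        return buckets;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> b : termsByJob(33)) {
            System.out.println("key:" + b.getKey() + ";  doc_count:" + b.getValue() + ";");
        }
    }
}
```

Only three documents pass the age filter, so only the Dev Manager and Java Developer buckets appear.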

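Notes:
1. "size": 0 in the request means that only the aggregation result is returned, without any search hits.
2. The terms aggregation runs on job.keyword rather than on job, because job is mapped as a text field (with a keyword sub-field). To aggregate directly on a text field, fielddata must be enabled on it (see below). Even then, a terms aggregation on job.keyword and one on job produce different buckets: the text field is bucketed by its analyzed tokens, while the keyword field is bucketed by the whole field value of each document.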

PUT /aggs_index/_mapping
{
  "properties" : {
    "job":{
       "type":  "text",
       "fielddata": true
    }
  }
}
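
The difference between the two kinds of bucketing can be sketched in plain Java. This is an approximation under stated assumptions: the tokenizer below (lowercase, split on whitespace) is a simplified stand-in for the default analyzer, and the class name KeywordVsTextSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; no Elasticsearch client is involved.
class KeywordVsTextSketch {
    static final String[] JOBS = {
        "Dev Manager", "Java Developer", "Java Developer", "Manager", "Java Developer",
        "Javascript Developer", "UI", "Java Developer", "Java Developer", "DBA",
        "Designer", "Designer", "UI", "UI", "Java Developer",
        "Java Developer", "Javascript Developer", "Javascript Developer",
        "Javascript Developer", "DBA"};

    /** keyword-style bucketing: the whole field value is a single term. */
    static Map<String, Integer> bucketsByWholeValue() {
        Map<String, Integer> counts = new TreeMap<>();
        for (String job : JOBS) {
            counts.merge(job, 1, Integer::sum);
        }
        return counts;
    }

    /** text-style bucketing: every token is a term; a document counts once per distinct token. */
    static Map<String, Integer> bucketsByToken() {
        Map<String, Integer> counts = new TreeMap<>();
        for (String job : JOBS) {
            // Assumption: lowercase + whitespace split approximates the analyzer
            String[] tokens = job.toLowerCase(Locale.ROOT).split("\\s+");
            for (String token : new LinkedHashSet<>(Arrays.asList(tokens))) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("whole value: " + bucketsByWholeValue());
        System.out.println("tokens:      " + bucketsByToken());
    }
}
```

With the sample data, whole-value bucketing gives 7 buckets, while token bucketing gives 8, with "developer" covering both Java and Javascript developers.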

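2. Range Aggregation

Range aggregation groups documents into the given ranges and computes statistics for the documents in each range.

  • Buckets are defined by numeric ranges
  • A custom key can be assigned to each range

Example: count the people in three salary ranges: 0-10000, 10000-20000, and 20000+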

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "aggs_salary": {
      "range": {
        "field": "salary",
        "ranges": [
          {
            "from": 0,
            "to": 10000
          },{
            "from": 10000,
            "to": 20000
          },{
            "from": 20000
          }
        ]
      }
    }
  }
}

Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "aggs_salary" : {
      "buckets" : [
        {
          "key" : "0.0-10000.0",
          "from" : 0.0,
          "to" : 10000.0,
          "doc_count" : 1
        },
        {
          "key" : "10000.0-20000.0",
          "from" : 10000.0,
          "to" : 20000.0,
          "doc_count" : 4
        },
        {
          "key" : "20000.0-*",
          "from" : 20000.0,
          "doc_count" : 15
        }
      ]
    }
  }
}

Java implementation:

    /**
     * Count the people in each salary range.
     */
    @RequestMapping(value = "/range", method = RequestMethod.GET)
    @ApiOperation(value = "Aggregation - range")
    public void rangeQuery() throws Exception {
        // Build the search request body
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Field to aggregate on
        String salary = "salary";
        // Aggregation name
        String range_salary = "range_salary";
        // Each range includes "from" and excludes "to".
        // Double.MAX_VALUE stands in for an open upper end, which is why the last
        // bucket key prints as 20000.0-1.7976931348623157E308 rather than 20000.0-*.
        AggregationBuilder range = AggregationBuilders.range(range_salary)
                .field(salary)
                .addRange(0,10000)
                .addRange(10000,20000)
                .addRange(20000,Double.MAX_VALUE);
        builder.aggregation(range);
        // Run the search and print each bucket
        SearchResponse search = client.aggregationSearch(builder, INDEX_NAME);
        Map<String, Aggregation> map = search.getAggregations().asMap();
        Range rangeData = (Range) map.get(range_salary);
        List<? extends Range.Bucket> buckets = rangeData.getBuckets();
        for (Range.Bucket bucket : buckets) {
            LOGGER.info("key:{};  doc_count:{};", bucket.getKey().toString(), bucket.getDocCount());
        }
    }

Output:
key:0.0-10000.0;  doc_count:1;
key:10000.0-20000.0;  doc_count:4;
key:20000.0-1.7976931348623157E308;  doc_count:15;
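
Each range bucket is half-open: it includes "from" and excludes "to", so a salary of exactly 20000 lands in the last bucket. A minimal plain-Java sketch of that rule, with the salaries copied from the sample data (the class name RangeSketch is an assumption for illustration):

```java
// Plain-Java illustration only; no Elasticsearch client is involved.
class RangeSketch {
    // salary of the 20 sample documents, in index order
    static final int[] SALARIES = {50000, 38000, 28000, 35000, 22000, 25000, 25000,
                                   32000, 30000, 30000, 18000, 22000, 18000, 20000,
                                   20000, 9000, 16000, 16000, 20000, 20000};

    /** Counts salaries s with from <= s < to; use Double.POSITIVE_INFINITY for an open end. */
    static long countInRange(double from, double to) {
        long count = 0;
        for (int s : SALARIES) {
            if (s >= from && s < to) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("0-10000:     " + countInRange(0, 10000));
        System.out.println("10000-20000: " + countInRange(10000, 20000));
        System.out.println("20000-*:     " + countInRange(20000, Double.POSITIVE_INFINITY));
    }
}
```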

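3. Histogram Aggregation

Histogram aggregation groups documents into buckets of a fixed interval and computes statistics for the documents in each interval.

Example: bucket salaries with an interval of 5000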

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "agg_his": {
      "histogram": {
        "field": "salary",
        "interval": 5000
      }
    }
  }
}

Response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "agg_his" : {
      "buckets" : [
        {
          "key" : 5000.0,
          "doc_count" : 1
        },
        {
          "key" : 10000.0,
          "doc_count" : 0
        },
        {
          "key" : 15000.0,
          "doc_count" : 4
        },
        {
          "key" : 20000.0,
          "doc_count" : 6
        },
        {
          "key" : 25000.0,
          "doc_count" : 3
        },
        {
          "key" : 30000.0,
          "doc_count" : 3
        },
        {
          "key" : 35000.0,
          "doc_count" : 2
        },
        {
          "key" : 40000.0,
          "doc_count" : 0
        },
        {
          "key" : 45000.0,
          "doc_count" : 0
        },
        {
          "key" : 50000.0,
          "doc_count" : 1
        }
      ]
    }
  }
}

Java implementation:

    @RequestMapping(value = "/Histogram", method = RequestMethod.GET)
    @ApiOperation(value = "Aggregation - Histogram")
    public void histogramQuery() throws Exception {
        // Build the search request body
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Field to aggregate on
        String salary = "salary";
        // Aggregation name
        String histogram_salary = "histogram_salary";
        AggregationBuilder his = AggregationBuilders.histogram(histogram_salary).field(salary).interval(5000);
        builder.aggregation(his);
        // Run the search and print each bucket
        SearchResponse search = client.aggregationSearch(builder, INDEX_NAME);
        Map<String, Aggregation> map = search.getAggregations().asMap();
        Histogram histogramData = (Histogram) map.get(histogram_salary);
        List<? extends Histogram.Bucket> buckets = histogramData.getBuckets();
        for (Histogram.Bucket bucket : buckets) {
            LOGGER.info("key:{};  doc_count:{};", bucket.getKey().toString(), bucket.getDocCount());
        }
    }

Output:
key:5000.0;  doc_count:1;
key:10000.0;  doc_count:0;
key:15000.0;  doc_count:4;
key:20000.0;  doc_count:6;
key:25000.0;  doc_count:3;
key:30000.0;  doc_count:3;
key:35000.0;  doc_count:2;
key:40000.0;  doc_count:0;
key:45000.0;  doc_count:0;
key:50000.0;  doc_count:1;

The response contains several empty buckets. Setting min_doc_count to 1 returns only the buckets that hold at least one document:

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "agg_his": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "min_doc_count": 1
      }
    }
  }
}

Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "agg_his" : {
      "buckets" : [
        {
          "key" : 5000.0,
          "doc_count" : 1
        },
        {
          "key" : 15000.0,
          "doc_count" : 4
        },
        {
          "key" : 20000.0,
          "doc_count" : 6
        },
        {
          "key" : 25000.0,
          "doc_count" : 3
        },
        {
          "key" : 30000.0,
          "doc_count" : 3
        },
        {
          "key" : 35000.0,
          "doc_count" : 2
        },
        {
          "key" : 50000.0,
          "doc_count" : 1
        }
      ]
    }
  }
}
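
The bucket keys above follow the rule key = floor((value - offset) / interval) * interval + offset, with offset 0 in these requests. The sketch below applies that rule to the sample salaries in plain Java and then mimics min_doc_count. The class and method names are assumptions for illustration, and the gap filling is a simplified model of how empty buckets appear between the first and last key.

```java
import java.util.TreeMap;

// Plain-Java illustration only; no Elasticsearch client is involved.
class HistogramSketch {
    static final int[] SALARIES = {50000, 38000, 28000, 35000, 22000, 25000, 25000,
                                   32000, 30000, 30000, 18000, 22000, 18000, 20000,
                                   20000, 9000, 16000, 16000, 20000, 20000};

    /** Key of the bucket that a value falls into. */
    static double bucketKey(double value, double interval, double offset) {
        return Math.floor((value - offset) / interval) * interval + offset;
    }

    /** Histogram over SALARIES; buckets with fewer than minDocCount documents are dropped. */
    static TreeMap<Double, Long> histogram(double interval, long minDocCount) {
        TreeMap<Double, Long> buckets = new TreeMap<>();
        for (int s : SALARIES) {
            buckets.merge(bucketKey(s, interval, 0), 1L, Long::sum);
        }
        // Fill the gaps between the first and last key with empty buckets
        double last = buckets.lastKey();
        for (double k = buckets.firstKey(); k < last; k += interval) {
            buckets.putIfAbsent(k, 0L);
        }
        // min_doc_count: keep only buckets with enough documents
        buckets.values().removeIf(count -> count < minDocCount);
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(histogram(5000, 0)); // includes the empty buckets
        System.out.println(histogram(5000, 1)); // empty buckets removed
    }
}
```

A salary of 9000 gets key 5000, which is why the first bucket is labeled 5000 even though it covers [5000, 10000).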

The bucket with key 5000 has a count of 1, meaning one person falls in [5000, 10000). To make the buckets start from 0, set a min and max through extended_bounds:

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "agg_his": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "extended_bounds": {
          "min": 0,
          "max": 20000
        }
      }
    }
  }
}

Response:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "agg_his" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 0
        },
        {
          "key" : 5000.0,
          "doc_count" : 1
        },
        {
          "key" : 10000.0,
          "doc_count" : 0
        },
        {
          "key" : 15000.0,
          "doc_count" : 4
        },
        {
          "key" : 20000.0,
          "doc_count" : 6
        },
        {
          "key" : 25000.0,
          "doc_count" : 3
        },
        {
          "key" : 30000.0,
          "doc_count" : 3
        },
        {
          "key" : 35000.0,
          "doc_count" : 2
        },
        {
          "key" : 40000.0,
          "doc_count" : 0
        },
        {
          "key" : 45000.0,
          "doc_count" : 0
        },
        {
          "key" : 50000.0,
          "doc_count" : 1
        }
      ]
    }
  }
}

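Note that the min and max did not fully take effect: max is 20000, yet buckets up to 50000 are still returned.
extended_bounds does not restrict the result. It only forces the histogram to start at min and to extend at least to max, creating empty buckets where needed. Unlike hard_bounds, it never drops buckets: documents outside the bounds are still bucketed and returned.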
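
A minimal plain-Java sketch of this widening behavior, assuming the same key rule as the histogram aggregation and the sample salaries. The class name ExtendedBoundsSketch is hypothetical, and the code models only extended_bounds, not hard_bounds.

```java
import java.util.TreeMap;

// Plain-Java illustration only; no Elasticsearch client is involved.
class ExtendedBoundsSketch {
    static final int[] SALARIES = {50000, 38000, 28000, 35000, 22000, 25000, 25000,
                                   32000, 30000, 30000, 18000, 22000, 18000, 20000,
                                   20000, 9000, 16000, 16000, 20000, 20000};

    /** Histogram whose key range is widened (never narrowed) to cover boundsMin..boundsMax. */
    static TreeMap<Double, Long> histogram(double interval, double boundsMin, double boundsMax) {
        TreeMap<Double, Long> buckets = new TreeMap<>();
        for (int s : SALARIES) {
            // Every document is bucketed, whether or not it lies inside the bounds
            buckets.merge(Math.floor(s / interval) * interval, 1L, Long::sum);
        }
        double first = Math.min(buckets.firstKey(), Math.floor(boundsMin / interval) * interval);
        double last = Math.max(buckets.lastKey(), Math.floor(boundsMax / interval) * interval);
        for (double k = first; k <= last; k += interval) {
            buckets.putIfAbsent(k, 0L); // empty buckets are emitted with doc_count 0
        }
        return buckets;
    }

    public static void main(String[] args) {
        // min = 0 adds the empty bucket 0.0; max = 20000 changes nothing
        System.out.println(histogram(5000, 0, 20000));
    }
}
```

With min 0 and max 20000 the result has 11 buckets from 0 to 50000, the same shape as the response above.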

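III. Metrics Aggregations

Metrics aggregations compute numeric metrics over a field, such as the average, sum, minimum, maximum, or count. They are calculated over the documents matched by the query (or over the documents in each bucket when nested), which allows deeper analysis of the data.

1. Avg, Sum, Min, Max Aggregation

(average / sum / minimum / maximum)

Example: compute the average, total, minimum, and maximum salary for each job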

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "term_job": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "sum_salary": {
          "sum": {
            "field": "salary"
          }
        },
        "max_salary": {
          "max": {
            "field": "salary"
          }
        },
        "min_salary": {
          "min": {
            "field": "salary"
          }
        }
      }
    }
  }
}

Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "term_job" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Developer",
          "doc_count" : 7,
          "max_salary" : {
            "value" : 38000.0
          },
          "sum_salary" : {
            "value" : 179000.0
          },
          "min_salary" : {
            "value" : 9000.0
          },
          "avg_salary" : {
            "value" : 25571.428571428572
          }
        },
        {
          "key" : "Javascript Developer",
          "doc_count" : 4,
          "max_salary" : {
            "value" : 25000.0
          },
          "sum_salary" : {
            "value" : 77000.0
          },
          "min_salary" : {
            "value" : 16000.0
          },
          "avg_salary" : {
            "value" : 19250.0
          }
        },
        {
          "key" : "UI",
          "doc_count" : 3,
          "max_salary" : {
            "value" : 25000.0
          },
          "sum_salary" : {
            "value" : 63000.0
          },
          "min_salary" : {
            "value" : 18000.0
          },
          "avg_salary" : {
            "value" : 21000.0
          }
        },
        {
          "key" : "DBA",
          "doc_count" : 2,
          "max_salary" : {
            "value" : 30000.0
          },
          "sum_salary" : {
            "value" : 50000.0
          },
          "min_salary" : {
            "value" : 20000.0
          },
          "avg_salary" : {
            "value" : 25000.0
          }
        },
        {
          "key" : "Designer",
          "doc_count" : 2,
          "max_salary" : {
            "value" : 22000.0
          },
          "sum_salary" : {
            "value" : 40000.0
          },
          "min_salary" : {
            "value" : 18000.0
          },
          "avg_salary" : {
            "value" : 20000.0
          }
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1,
          "max_salary" : {
            "value" : 50000.0
          },
          "sum_salary" : {
            "value" : 50000.0
          },
          "min_salary" : {
            "value" : 50000.0
          },
          "avg_salary" : {
            "value" : 50000.0
          }
        },
        {
          "key" : "Manager",
          "doc_count" : 1,
          "max_salary" : {
            "value" : 35000.0
          },
          "sum_salary" : {
            "value" : 35000.0
          },
          "min_salary" : {
            "value" : 35000.0
          },
          "avg_salary" : {
            "value" : 35000.0
          }
        }
      ]
    }
  }
}


Java implementation:

    @RequestMapping(value = "/termsQuery", method = RequestMethod.GET, produces = "text/html;charset=UTF-8")
    @ApiOperation(value = "Aggregation - salary statistics per job")
    public void termsQuery() throws Exception {
        // Build the search request body
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Field to bucket on
        String job = "job.keyword";
        // Name of the bucket aggregation
        String terms_job = "term_job";
        // Bucket by job
        AggregationBuilder job_terms = AggregationBuilders.terms(terms_job).field(job);

        // Field to compute metrics on
        String salary = "salary";
        // 1. Average salary
        String avg_salary = "avg_salary";
        AggregationBuilder avg = AggregationBuilders.avg(avg_salary).field(salary);
        // Nest the metric inside the bucket aggregation
        job_terms.subAggregation(avg);
        // 2. Maximum salary
        String max_salary = "max_salary";
        AggregationBuilder max = AggregationBuilders.max(max_salary).field(salary);
        job_terms.subAggregation(max);
        // 3. Minimum salary
        String min_salary = "min_salary";
        AggregationBuilder min = AggregationBuilders.min(min_salary).field(salary);
        job_terms.subAggregation(min);
        // 4. Salary sum
        String sum_salary = "sum_salary";
        AggregationBuilder sum = AggregationBuilders.sum(sum_salary).field(salary);
        job_terms.subAggregation(sum);

        // Aggregation only, so no search hits are needed
        builder.size(0).aggregation(job_terms);

        // Run the search
        SearchResponse search = client.aggregationSearch(builder, INDEX_NAME);
        // Print the metrics of each bucket
        Map<String, Aggregation> map = search.getAggregations().asMap();
        Terms termsData = (Terms) map.get(terms_job);
        List<? extends Terms.Bucket> buckets = termsData.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            Map<String, Aggregation> metrics = bucket.getAggregations().asMap();
            System.out.println("key:"+bucket.getKey().toString() + ";  doc_count:" + bucket.getDocCount() + ";");
            Avg avgSalary = (Avg) metrics.get(avg_salary);
            Max maxSalary = (Max) metrics.get(max_salary);
            Sum sumSalary = (Sum) metrics.get(sum_salary);
            Min minSalary = (Min) metrics.get(min_salary);
            System.out.println("Average salary: " + avgSalary.getValue());
            System.out.println("Max salary: " + maxSalary.getValue());
            System.out.println("Min salary: " + minSalary.getValue());
            System.out.println("Salary sum: " + sumSalary.getValue() + "\n");
        }
    }

Printed output:
key:Java Developer;  doc_count:7;
Average salary: 25571.428571428572
Max salary: 38000.0
Min salary: 9000.0
Salary sum: 179000.0

key:Javascript Developer;  doc_count:4;
Average salary: 19250.0
Max salary: 25000.0
Min salary: 16000.0
Salary sum: 77000.0

key:UI;  doc_count:3;
Average salary: 21000.0
Max salary: 25000.0
Min salary: 18000.0
Salary sum: 63000.0

key:DBA;  doc_count:2;
Average salary: 25000.0
Max salary: 30000.0
Min salary: 20000.0
Salary sum: 50000.0

key:Designer;  doc_count:2;
Average salary: 20000.0
Max salary: 22000.0
Min salary: 18000.0
Salary sum: 40000.0

key:Dev Manager;  doc_count:1;
Average salary: 50000.0
Max salary: 50000.0
Min salary: 50000.0
Salary sum: 50000.0

key:Manager;  doc_count:1;
Average salary: 35000.0
Max salary: 35000.0
Min salary: 35000.0
Salary sum: 35000.0

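2. Stats Aggregation

Computes the count, minimum, maximum, average, and sum in a single aggregation.

Example: compute the average, total, minimum, and maximum salary for each job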

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "term_job": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "stats_salary": {
          "stats": {
            "field": "salary"
          }
        }
      }
    }
  }
}

Response:
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "term_job" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Developer",
          "doc_count" : 7,
          "stats_salary" : {
            "count" : 7,
            "min" : 9000.0,
            "max" : 38000.0,
            "avg" : 25571.428571428572,
            "sum" : 179000.0
          }
        },
        {
          "key" : "Javascript Developer",
          "doc_count" : 4,
          "stats_salary" : {
            "count" : 4,
            "min" : 16000.0,
            "max" : 25000.0,
            "avg" : 19250.0,
            "sum" : 77000.0
          }
        },
        {
          "key" : "UI",
          "doc_count" : 3,
          "stats_salary" : {
            "count" : 3,
            "min" : 18000.0,
            "max" : 25000.0,
            "avg" : 21000.0,
            "sum" : 63000.0
          }
        },
        {
          "key" : "DBA",
          "doc_count" : 2,
          "stats_salary" : {
            "count" : 2,
            "min" : 20000.0,
            "max" : 30000.0,
            "avg" : 25000.0,
            "sum" : 50000.0
          }
        },
        {
          "key" : "Designer",
          "doc_count" : 2,
          "stats_salary" : {
            "count" : 2,
            "min" : 18000.0,
            "max" : 22000.0,
            "avg" : 20000.0,
            "sum" : 40000.0
          }
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1,
          "stats_salary" : {
            "count" : 1,
            "min" : 50000.0,
            "max" : 50000.0,
            "avg" : 50000.0,
            "sum" : 50000.0
          }
        },
        {
          "key" : "Manager",
          "doc_count" : 1,
          "stats_salary" : {
            "count" : 1,
            "min" : 35000.0,
            "max" : 35000.0,
            "avg" : 35000.0,
            "sum" : 35000.0
          }
        }
      ]
    }
  }
}

Java implementation:

    @RequestMapping(value = "/statsQuery", method = RequestMethod.GET)
    @ApiOperation(value = "Aggregation - salary statistics per job using stats")
    public void statsSalaryQuery() throws Exception {
        // Build the search request body
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Field to bucket on
        String job = "job.keyword";
        // Name of the bucket aggregation
        String terms_job = "term_job";
        // Bucket by job
        AggregationBuilder job_terms = AggregationBuilders.terms(terms_job).field(job);

        // Field to compute metrics on
        String salary = "salary";
        // Name of the stats aggregation (count, min, max, avg, sum in one pass)
        String stats_salary = "stats_salary";
        AggregationBuilder stats = AggregationBuilders.stats(stats_salary).field(salary);
        // Nest the metric inside the bucket aggregation
        job_terms.subAggregation(stats);

        // Aggregation only, so no search hits are needed
        builder.size(0).aggregation(job_terms);

        // Run the search
        SearchResponse search = client.aggregationSearch(builder, INDEX_NAME);
        // Print the metrics of each bucket
        Map<String, Aggregation> map = search.getAggregations().asMap();
        Terms termsData = (Terms) map.get(terms_job);
        List<? extends Terms.Bucket> buckets = termsData.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            Map<String, Aggregation> metrics = bucket.getAggregations().asMap();
            System.out.println("key:"+bucket.getKey().toString() + ";  doc_count:" + bucket.getDocCount() + ";");
            Stats statsSalary = (Stats) metrics.get(stats_salary);
            System.out.println("Average salary: " + statsSalary.getAvg());
            System.out.println("Max salary: " + statsSalary.getMax());
            System.out.println("Min salary: " + statsSalary.getMin());
            System.out.println("Salary sum: " + statsSalary.getSum() + "\n");
        }
    }

Output:
key:Java Developer;  doc_count:7;
Average salary: 25571.428571428572
Max salary: 38000.0
Min salary: 9000.0
Salary sum: 179000.0

key:Javascript Developer;  doc_count:4;
Average salary: 19250.0
Max salary: 25000.0
Min salary: 16000.0
Salary sum: 77000.0

key:UI;  doc_count:3;
Average salary: 21000.0
Max salary: 25000.0
Min salary: 18000.0
Salary sum: 63000.0

key:DBA;  doc_count:2;
Average salary: 25000.0
Max salary: 30000.0
Min salary: 20000.0
Salary sum: 50000.0

key:Designer;  doc_count:2;
Average salary: 20000.0
Max salary: 22000.0
Min salary: 18000.0
Salary sum: 40000.0

key:Dev Manager;  doc_count:1;
Average salary: 50000.0
Max salary: 50000.0
Min salary: 50000.0
Salary sum: 50000.0

key:Manager;  doc_count:1;
Average salary: 35000.0
Max salary: 35000.0
Min salary: 35000.0
Salary sum: 35000.0
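
For comparison, the same five values per job can be computed in memory with the JDK's IntSummaryStatistics, which happens to collect count, min, max, average, and sum in one pass, much like the stats aggregation. This is a plain-Java sketch over the sample data, not an Elasticsearch call; the class name StatsSketch is an assumption.

```java
import java.util.IntSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Plain-Java illustration only; no Elasticsearch client is involved.
class StatsSketch {
    static final String[] JOBS = {
        "Dev Manager", "Java Developer", "Java Developer", "Manager", "Java Developer",
        "Javascript Developer", "UI", "Java Developer", "Java Developer", "DBA",
        "Designer", "Designer", "UI", "UI", "Java Developer",
        "Java Developer", "Javascript Developer", "Javascript Developer",
        "Javascript Developer", "DBA"};
    static final int[] SALARIES = {50000, 38000, 28000, 35000, 22000, 25000, 25000,
                                   32000, 30000, 30000, 18000, 22000, 18000, 20000,
                                   20000, 9000, 16000, 16000, 20000, 20000};

    /** Groups document indexes by job (bucket) and summarizes the salaries (metric). */
    static Map<String, IntSummaryStatistics> statsByJob() {
        return IntStream.range(0, JOBS.length).boxed()
                .collect(Collectors.groupingBy(
                        i -> JOBS[i],
                        TreeMap::new,
                        Collectors.summarizingInt(i -> SALARIES[i])));
    }

    public static void main(String[] args) {
        statsByJob().forEach((job, stats) -> System.out.println(job + " -> " + stats));
    }
}
```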


3. Extended Stats Aggregation

Extends the stats aggregation with the sum of squares, variance, standard deviation, and standard deviation bounds.

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "term_job": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "stats_salary": {
          "extended_stats": {
            "field": "salary"
          }
        }
      }
    }
  }
}

Response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "term_job" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Developer",
          "doc_count" : 7,
          "stats_salary" : {
            "count" : 7,
            "min" : 9000.0,
            "max" : 38000.0,
            "avg" : 25571.428571428572,
            "sum" : 179000.0,
            "sum_of_squares" : 5.117E9,
            "variance" : 7.710204081632654E7,
            "std_deviation" : 8780.776777502464,
            "std_deviation_bounds" : {
              "upper" : 43132.982126433504,
              "lower" : 8009.875016423644
            }
          }
        },
        {
          "key" : "Javascript Developer",
          "doc_count" : 4,
          "stats_salary" : {
            "count" : 4,
            "min" : 16000.0,
            "max" : 25000.0,
            "avg" : 19250.0,
            "sum" : 77000.0,
            "sum_of_squares" : 1.537E9,
            "variance" : 1.36875E7,
            "std_deviation" : 3699.6621467371856,
            "std_deviation_bounds" : {
              "upper" : 26649.324293474372,
              "lower" : 11850.675706525628
            }
          }
        },
        {
          "key" : "UI",
          "doc_count" : 3,
          "stats_salary" : {
            "count" : 3,
            "min" : 18000.0,
            "max" : 25000.0,
            "avg" : 21000.0,
            "sum" : 63000.0,
            "sum_of_squares" : 1.349E9,
            "variance" : 8666666.666666666,
            "std_deviation" : 2943.920288775949,
            "std_deviation_bounds" : {
              "upper" : 26887.8405775519,
              "lower" : 15112.159422448101
            }
          }
        },
        {
          "key" : "DBA",
          "doc_count" : 2,
          "stats_salary" : {
            "count" : 2,
            "min" : 20000.0,
            "max" : 30000.0,
            "avg" : 25000.0,
            "sum" : 50000.0,
            "sum_of_squares" : 1.3E9,
            "variance" : 2.5E7,
            "std_deviation" : 5000.0,
            "std_deviation_bounds" : {
              "upper" : 35000.0,
              "lower" : 15000.0
            }
          }
        },
        {
          "key" : "Designer",
          "doc_count" : 2,
          "stats_salary" : {
            "count" : 2,
            "min" : 18000.0,
            "max" : 22000.0,
            "avg" : 20000.0,
            "sum" : 40000.0,
            "sum_of_squares" : 8.08E8,
            "variance" : 4000000.0,
            "std_deviation" : 2000.0,
            "std_deviation_bounds" : {
              "upper" : 24000.0,
              "lower" : 16000.0
            }
          }
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1,
          "stats_salary" : {
            "count" : 1,
            "min" : 50000.0,
            "max" : 50000.0,
            "avg" : 50000.0,
            "sum" : 50000.0,
            "sum_of_squares" : 2.5E9,
            "variance" : 0.0,
            "std_deviation" : 0.0,
            "std_deviation_bounds" : {
              "upper" : 50000.0,
              "lower" : 50000.0
            }
          }
        },
        {
          "key" : "Manager",
          "doc_count" : 1,
          "stats_salary" : {
            "count" : 1,
            "min" : 35000.0,
            "max" : 35000.0,
            "avg" : 35000.0,
            "sum" : 35000.0,
            "sum_of_squares" : 1.225E9,
            "variance" : 0.0,
            "std_deviation" : 0.0,
            "std_deviation_bounds" : {
              "upper" : 35000.0,
              "lower" : 35000.0
            }
          }
        }
      ]
    }
  }
}
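
The extra fields can be checked by hand. The sketch below recomputes them for the Java Developer bucket in plain Java, assuming the population variance (sum of squares / count minus the squared average) and bounds of average ± 2 standard deviations, which is consistent with the numbers in the response above. The class name ExtendedStatsSketch is hypothetical.

```java
// Plain-Java illustration only; no Elasticsearch client is involved.
class ExtendedStatsSketch {
    /** Returns {variance, stdDeviation, upperBound, lowerBound}. */
    static double[] extendedStats(int[] values) {
        double sum = 0;
        double sumOfSquares = 0;
        for (int v : values) {
            sum += v;
            sumOfSquares += (double) v * v;
        }
        double avg = sum / values.length;
        // Population variance; clamp tiny negative rounding errors to 0
        double variance = Math.max(0.0, sumOfSquares / values.length - avg * avg);
        double std = Math.sqrt(variance);
        return new double[] {variance, std, avg + 2 * std, avg - 2 * std};
    }

    public static void main(String[] args) {
        // Salaries of the seven Java Developer documents in the sample data
        int[] javaDevSalaries = {38000, 28000, 22000, 32000, 30000, 20000, 9000};
        double[] r = extendedStats(javaDevSalaries);
        System.out.println("variance:      " + r[0]);
        System.out.println("std_deviation: " + r[1]);
        System.out.println("upper bound:   " + r[2]);
        System.out.println("lower bound:   " + r[3]);
    }
}
```

The printed values agree with the response up to floating-point rounding in the last digits.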

4. Cardinality Aggregation

Counts the number of distinct values of a field.

Example: count how many distinct salary values each job has

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "term_job": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "card_salary": {
          "cardinality": {
            "field": "salary"
          }
        }
      }
    }
  }
}

Response:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "term_job" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Developer",
          "doc_count" : 7,
          "card_salary" : {
            "value" : 7
          }
        },
        {
          "key" : "Javascript Developer",
          "doc_count" : 4,
          "card_salary" : {
            "value" : 3
          }
        },
        {
          "key" : "UI",
          "doc_count" : 3,
          "card_salary" : {
            "value" : 3
          }
        },
        {
          "key" : "DBA",
          "doc_count" : 2,
          "card_salary" : {
            "value" : 2
          }
        },
        {
          "key" : "Designer",
          "doc_count" : 2,
          "card_salary" : {
            "value" : 2
          }
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1,
          "card_salary" : {
            "value" : 1
          }
        },
        {
          "key" : "Manager",
          "doc_count" : 1,
          "card_salary" : {
            "value" : 1
          }
        }
      ]
    }
  }
}

Java implementation:

    @RequestMapping(value = "/cardinality", method = RequestMethod.GET)
    @ApiOperation(value = "Aggregation - distinct salary values per job")
    public void cardinalityQuery() throws Exception {
        // Build the search request body
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Field to bucket on
        String job = "job.keyword";
        // Name of the bucket aggregation
        String terms_job = "term_job";
        // Bucket by job
        AggregationBuilder job_terms = AggregationBuilders.terms(terms_job).field(job);

        // Field to compute the metric on
        String salary = "salary";
        // Name of the cardinality aggregation (distinct salary values)
        String card_salary = "card_salary";
        AggregationBuilder card = AggregationBuilders.cardinality(card_salary).field(salary);
        // Nest the metric inside the bucket aggregation
        job_terms.subAggregation(card);

        // Aggregation only, so no search hits are needed
        builder.size(0).aggregation(job_terms);

        // Run the search
        SearchResponse search = client.aggregationSearch(builder, INDEX_NAME);
        // Print the metric of each bucket
        Map<String, Aggregation> map = search.getAggregations().asMap();
        Terms termsData = (Terms) map.get(terms_job);
        List<? extends Terms.Bucket> buckets = termsData.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            Map<String, Aggregation> salaryMap = bucket.getAggregations().asMap();
            System.out.println("key:"+bucket.getKey().toString() + ";  doc_count:" + bucket.getDocCount() + ";");
            Cardinality cardSalary = (Cardinality) salaryMap.get(card_salary);
            System.out.println("Distinct salary values: " + cardSalary.getValue() + "\n");
        }
    }

Output:
key:Java Developer;  doc_count:7;
去重后工资类型数量:7

key:Javascript Developer;  doc_count:4;
去重后工资类型数量:3

key:UI;  doc_count:3;
去重后工资类型数量:3

key:DBA;  doc_count:2;
去重后工资类型数量:2

key:Designer;  doc_count:2;
去重后工资类型数量:2

key:Dev Manager;  doc_count:1;
去重后工资类型数量:1

key:Manager;  doc_count:1;
去重后工资类型数量:1

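5. Value Count Aggregation

Counts the number of values extracted from the specified field across the matching documents. Documents in which the field is missing or null contribute nothing, and duplicate values are each counted. value_count therefore answers "how many values are there", not "how many distinct values are there"; the latter is what cardinality is for. It is typically used to see how many documents actually carry a value for a field, for example as the denominator behind an average.

Example: for each job, count the documents whose salary is not null
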

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "term_job": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "value_salary": {
          "value_count": {
            "field": "salary"
          }
        }
      }
    }
  }
}

The response is as follows:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "term_job" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Developer",
          "doc_count" : 7,
          "value_salary" : {
            "value" : 7
          }
        },
        {
          "key" : "Javascript Developer",
          "doc_count" : 4,
          "value_salary" : {
            "value" : 4
          }
        },
        {
          "key" : "UI",
          "doc_count" : 3,
          "value_salary" : {
            "value" : 3
          }
        },
        {
          "key" : "DBA",
          "doc_count" : 2,
          "value_salary" : {
            "value" : 2
          }
        },
        {
          "key" : "Designer",
          "doc_count" : 2,
          "value_salary" : {
            "value" : 2
          }
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1,
          "value_salary" : {
            "value" : 1
          }
        },
        {
          "key" : "Manager",
          "doc_count" : 1,
          "value_salary" : {
            "value" : 1
          }
        }
      ]
    }
  }
}
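
To make the difference between value_count and cardinality concrete, here is a minimal plain-Java sketch that works on an in-memory list instead of calling Elasticsearch. The class and method names are hypothetical. The sample salaries are those of the Javascript Developer bucket: three of them appear in the Top Hits response later in this article, and the fourth (16000) is inferred from the bucket average of 19250. Note that the real cardinality aggregation is approximate, while this sketch counts exactly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical helper, not part of any Elasticsearch client.
class CountSketch {
    /** Mimics value_count: number of non-null values, duplicates included. */
    static long valueCount(List<Integer> values) {
        return values.stream().filter(Objects::nonNull).count();
    }

    /** Mimics cardinality: number of distinct non-null values (exact in this sketch). */
    static long cardinality(List<Integer> values) {
        return values.stream().filter(Objects::nonNull).distinct().count();
    }

    public static void main(String[] args) {
        // Salaries of the "Javascript Developer" bucket (last value inferred from the average)
        List<Integer> salaries = Arrays.asList(25000, 20000, 16000, 16000);
        System.out.println("value_count = " + valueCount(salaries));
        System.out.println("cardinality = " + cardinality(salaries));
    }
}
```

For this bucket the two numbers differ (4 values, 3 distinct values), which matches the value_salary and card_salary results shown in this article.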

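6. Scripted Metric Aggregation

Computes a metric value with custom scripts.

A Scripted Metric Aggregation is defined by four scripts:

  1. init_script: runs once per shard before any document is collected and sets up the initial state.
  2. map_script: runs once for every matching document and updates the state of the shard.
  3. combine_script: runs once per shard after all of its documents have been collected and condenses the state into the value returned to the coordinating node.
  4. reduce_script: runs once on the coordinating node; it receives the values returned by all shards in the variable states and computes the final result.

In short: 1. each shard creates a container (here a list) --> 2. each shard adds an entry for every matching document --> 3. each shard condenses its container into one value --> 4. the coordinating node merges the values submitted by all shards and returns the final result.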
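
The four phases can be sketched in plain Java without Elasticsearch. This is only an illustration of the data flow, assuming two hypothetical shards; the class, the Employee type and the shard split are made up for the example. The sample documents are the three UI employees from this article's data set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of the scripted_metric phases; not Elasticsearch code.
class ScriptedMetricSketch {
    /** A document reduced to the two fields the scripts read. */
    static final class Employee {
        final String sex;
        final int salary;
        Employee(String sex, int salary) { this.sex = sex; this.salary = salary; }
    }

    /** init_script + map_script: one list per shard, one entry per document. */
    static List<Integer> mapShard(List<Employee> shardDocs) {
        List<Integer> transactions = new ArrayList<>();   // init_script
        for (Employee doc : shardDocs) {                   // map_script
            transactions.add("male".equals(doc.sex) ? doc.salary : 0);
        }
        return transactions;
    }

    /** combine_script: condenses one shard's list into a single number. */
    static double combine(List<Integer> transactions) {
        double price = 0;
        for (int a : transactions) { price += a; }
        return price;
    }

    /** reduce_script: merges the per-shard numbers on the coordinating node. */
    static double reduce(List<Double> states) {
        double total = 0;
        for (double t : states) { total += t; }
        return total;
    }

    /** Runs all phases over the given shards. */
    static double run(List<List<Employee>> shards) {
        List<Double> states = new ArrayList<>();
        for (List<Employee> shard : shards) {
            states.add(combine(mapShard(shard)));
        }
        return reduce(states);
    }

    public static void main(String[] args) {
        // The three "UI" employees, split across two hypothetical shards
        List<Employee> shard1 = Arrays.asList(new Employee("female", 25000), new Employee("male", 20000));
        List<Employee> shard2 = Arrays.asList(new Employee("female", 18000));
        System.out.println(run(Arrays.asList(shard1, shard2)));
    }
}
```

However the documents are distributed across shards, the result is the same, because both combine and reduce are plain sums.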

Example: for each job, compute the total salary of the employees with sex=male

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "term_job": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "scripted_salary": {
          "scripted_metric": {
            "init_script": "state.transactions = []",
            "map_script": "state.transactions.add(doc.sex.value=='male' ? doc.salary.value : 0)",
            "combine_script": "double price = 0; for(a in state.transactions) {price+=a} return price",
            "reduce_script": "double allpro = 0; for (t in states) {allpro+=t} return allpro"
          }
        }
      }
    }
  }
}
The response is as follows:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "term_job" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Developer",
          "doc_count" : 7,
          "scripted_salary" : {
            "value" : 111000.0
          }
        },
        {
          "key" : "Javascript Developer",
          "doc_count" : 4,
          "scripted_salary" : {
            "value" : 57000.0
          }
        },
        {
          "key" : "UI",
          "doc_count" : 3,
          "scripted_salary" : {
            "value" : 20000.0
          }
        },
        {
          "key" : "DBA",
          "doc_count" : 2,
          "scripted_salary" : {
            "value" : 30000.0
          }
        },
        {
          "key" : "Designer",
          "doc_count" : 2,
          "scripted_salary" : {
            "value" : 18000.0
          }
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1,
          "scripted_salary" : {
            "value" : 50000.0
          }
        },
        {
          "key" : "Manager",
          "doc_count" : 1,
          "scripted_salary" : {
            "value" : 0.0
          }
        }
      ]
    }
  }
}

Java implementation:

    @RequestMapping(value = "/scripted_metric", method = RequestMethod.GET)
    @ApiOperation(value = "Aggregation - scripted_metric")
    public void scriptedQuery() throws Exception {
        // Build the search request body
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Field used for bucketing
        String job = "job.keyword";
        // Name of the terms (bucket) aggregation
        String terms_job = "term_job";
        // Bucket by job
        AggregationBuilder job_terms = AggregationBuilders.terms(terms_job).field(job);

        // Name of the scripted metric: total salary of male employees
        String scripted_salary = "scripted_salary";
        AggregationBuilder scripted = AggregationBuilders.scriptedMetric(scripted_salary)
                // Per shard: create the list that collects the values
                .initScript(new Script("state.transactions = []"))
                // Per document: add the salary for male employees, 0 otherwise
                .mapScript(new Script("state.transactions.add(doc.sex.value=='male' ? doc.salary.value : 0)"))
                // Per shard: sum the collected values
                .combineScript(new Script("double price = 0; for(a in state.transactions) {price+=a} return price"))
                // Coordinating node: sum the per-shard totals
                .reduceScript(new Script("double allpro = 0; for (t in states) {allpro+=t} return allpro"));
        // Nest the metric inside each bucket
        job_terms.subAggregation(scripted);
        // Only aggregation results are needed, so skip the top-level hits
        builder.size(0).aggregation(job_terms);

        // Run the search
        SearchResponse search = client.aggregationSearch(builder, INDEX_NAME);
        // Print the results
        Map<String, Aggregation> map = search.getAggregations().asMap();
        Terms termsData = (Terms) map.get(terms_job);
        List<? extends Terms.Bucket> buckets = termsData.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            Map<String, Aggregation> salaryMap = bucket.getAggregations().asMap();
            System.out.println("key:"+bucket.getKey().toString() + ";  doc_count:" + bucket.getDocCount() + ";");
            ScriptedMetric scrSalary = (ScriptedMetric) salaryMap.get(scripted_salary);
            System.out.println("male salary total:" + scrSalary.aggregation().toString() + "\n");
        }
    }

The output is as follows:
key:Java Developer;  doc_count:7;
male salary total:111000.0

key:Javascript Developer;  doc_count:4;
male salary total:57000.0

key:UI;  doc_count:3;
male salary total:20000.0

key:DBA;  doc_count:2;
male salary total:30000.0

key:Designer;  doc_count:2;
male salary total:18000.0

key:Dev Manager;  doc_count:1;
male salary total:50000.0

key:Manager;  doc_count:1;
male salary total:0.0

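7. Top Hits Aggregation

top_hits returns the top documents of each bucket, much like combining GROUP BY with a TOP clause in SQL. Used as a sub-aggregation, it returns the documents that rank first in every bucket according to the given sort order, and the fields to return (including script-computed fields) can be specified. It is the right choice when the concrete documents inside each group are needed rather than only a summary value.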
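
The "GROUP BY plus TOP" idea can be sketched with Java streams on an in-memory list. This is an illustration only: the class, the Employee type and the placeholder names such as ui-31 are hypothetical, and buckets are kept in first-seen order here, whereas a terms aggregation orders buckets by document count by default.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration of terms + top_hits; not Elasticsearch code.
class TopHitsSketch {
    static final class Employee {
        final String name;
        final String job;
        final int age;
        Employee(String name, String job, int age) {
            this.name = name;
            this.job = job;
            this.age = age;
        }
    }

    /** Groups by job (terms), then keeps the names of the `size` oldest employees per group (top_hits, age desc). */
    static Map<String, List<String>> topByAge(List<Employee> docs, int size) {
        // Bucket the documents by job, preserving first-seen order
        Map<String, List<Employee>> buckets = docs.stream()
                .collect(Collectors.groupingBy(e -> e.job, LinkedHashMap::new, Collectors.toList()));
        Map<String, List<String>> result = new LinkedHashMap<>();
        buckets.forEach((job, members) -> result.put(job, members.stream()
                .sorted(Comparator.comparingInt((Employee e) -> e.age).reversed())
                .limit(size)
                .map(e -> e.name)
                .collect(Collectors.toList())));
        return result;
    }
}
```

Calling topByAge(docs, 3) corresponds to the query in the example of this section: one entry per job, each holding at most three employees sorted from oldest to youngest.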

Example: for each job, return the details of the 3 oldest employees

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "job_term": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "age_top": {
          "top_hits": {
            "size": 3,
            "sort": [{
              "age": {
                "order": "desc"
              }
            }]
          }
        }
      }
    }
  }
}

The response is as follows:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "job_term" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Developer",
          "doc_count" : 7,
          "age_top" : {
            "hits" : {
              "total" : {
                "value" : 7,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "-8SKH40BE9ToH2j_hE3R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "female",
                    "name" : "绯色",
                    "job" : "Java Developer",
                    "salary" : 38000,
                    "age" : 36
                  },
                  "sort" : [
                    36
                  ]
                },
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "_8SKH40BE9ToH2j_hE3R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "male",
                    "name" : "埃斯基",
                    "job" : "Java Developer",
                    "salary" : 28000,
                    "age" : 33
                  },
                  "sort" : [
                    33
                  ]
                },
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "-cSKH40BE9ToH2j_hE3R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "male",
                    "name" : "王佛为",
                    "job" : "Java Developer",
                    "salary" : 22000,
                    "age" : 32
                  },
                  "sort" : [
                    32
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "Javascript Developer",
          "doc_count" : 4,
          "age_top" : {
            "hits" : {
              "total" : {
                "value" : 4,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "_sSKH40BE9ToH2j_hE3R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "male",
                    "name" : "马里奥",
                    "job" : "Javascript Developer",
                    "salary" : 25000,
                    "age" : 32
                  },
                  "sort" : [
                    32
                  ]
                },
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "AsSKH40BE9ToH2j_hE7R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "female",
                    "name" : "图图",
                    "job" : "Javascript Developer",
                    "salary" : 20000,
                    "age" : 29
                  },
                  "sort" : [
                    29
                  ]
                },
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "AcSKH40BE9ToH2j_hE7R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "male",
                    "name" : "奥特",
                    "job" : "Javascript Developer",
                    "salary" : 16000,
                    "age" : 25
                  },
                  "sort" : [
                    25
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "UI",
          "doc_count" : 3,
          "age_top" : {
            "hits" : {
              "total" : {
                "value" : 3,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "9sSKH40BE9ToH2j_hE3R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "female",
                    "name" : "马路",
                    "job" : "UI",
                    "salary" : 25000,
                    "age" : 31
                  },
                  "sort" : [
                    31
                  ]
                },
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "98SKH40BE9ToH2j_hE3R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "male",
                    "name" : "李飞",
                    "job" : "UI",
                    "salary" : 20000,
                    "age" : 27
                  },
                  "sort" : [
                    27
                  ]
                },
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "9cSKH40BE9ToH2j_hE3R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "female",
                    "name" : "王超",
                    "job" : "UI",
                    "salary" : 18000,
                    "age" : 25
                  },
                  "sort" : [
                    25
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "DBA",
          "doc_count" : 2,
          "age_top" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "A8SKH40BE9ToH2j_hE7R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "male",
                    "name" : "坦克",
                    "job" : "DBA",
                    "salary" : 30000,
                    "age" : 30
                  },
                  "sort" : [
                    30
                  ]
                },
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "BMSKH40BE9ToH2j_hE7R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "female",
                    "name" : "李澎",
                    "job" : "DBA",
                    "salary" : 20000,
                    "age" : 29
                  },
                  "sort" : [
                    29
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "Designer",
          "doc_count" : 2,
          "age_top" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "9MSKH40BE9ToH2j_hE3R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "female",
                    "name" : "坤坤",
                    "job" : "Designer",
                    "salary" : 22000,
                    "age" : 26
                  },
                  "sort" : [
                    26
                  ]
                },
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "88SKH40BE9ToH2j_hE3R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "male",
                    "name" : "王五",
                    "job" : "Designer",
                    "salary" : 18000,
                    "age" : 25
                  },
                  "sort" : [
                    25
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1,
          "age_top" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "8sSKH40BE9ToH2j_hE3R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "male",
                    "name" : "李四",
                    "job" : "Dev Manager",
                    "salary" : 50000,
                    "age" : 41
                  },
                  "sort" : [
                    41
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "Manager",
          "doc_count" : 1,
          "age_top" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_index",
                  "_type" : "_doc",
                  "_id" : "8cSKH40BE9ToH2j_hE3R",
                  "_score" : null,
                  "_source" : {
                    "sex" : "female",
                    "name" : "张三",
                    "job" : "Manager",
                    "salary" : 35000,
                    "age" : 32
                  },
                  "sort" : [
                    32
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Java implementation:

    @RequestMapping(value = "/tophits", method = RequestMethod.GET)
    @ApiOperation(value = "Aggregation - Tophits")
    public void tophitsQuery() throws Exception {
        // Build the search request body
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Fields to analyze
        String job = "job.keyword";
        String age  = "age";
        // Aggregation names
        String job_term = "job_term";
        String age_top  = "age_top";

        // Bucket by job
        AggregationBuilder jobData = AggregationBuilders.terms(job_term).field(job);
        // Keep the 3 oldest employees of each bucket
        AggregationBuilder ageData = AggregationBuilders.topHits(age_top)
                .size(3)
                .sort(SortBuilders.fieldSort(age)
                        .order(SortOrder.DESC));
        jobData.subAggregation(ageData);
        // Only aggregation results are needed, so skip the top-level hits ("size": 0 in the DSL above)
        builder.size(0).aggregation(jobData);
        // Run the search
        SearchResponse search = client.aggregationSearch(builder, INDEX_NAME);
        // Print the results
        Map<String, Aggregation> map = search.getAggregations().asMap();
        Terms termsData = (Terms) map.get(job_term);
        List<? extends Terms.Bucket> buckets = termsData.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            Map<String, Aggregation> pileMap = bucket.getAggregations().asMap();
            System.out.println("key:"+bucket.getKey().toString() + ";  doc_count:" + bucket.getDocCount() + ";");
            TopHits ageTopData = (TopHits) pileMap.get(age_top);
            // Print the documents of this bucket
            SearchHits hits = ageTopData.getHits();
            for (SearchHit hit: hits.getHits()) {
                System.out.println(hit.getSourceAsMap().toString());
            }
            // Blank line between buckets
            System.out.println();
        }
    }

The output is as follows:
key:Java Developer;  doc_count:7;
{sex=female, name=绯色, job=Java Developer, salary=38000, age=36}
{sex=male, name=埃斯基, job=Java Developer, salary=28000, age=33}
{sex=male, name=王佛为, job=Java Developer, salary=22000, age=32}

key:Javascript Developer;  doc_count:4;
{sex=male, name=马里奥, job=Javascript Developer, salary=25000, age=32}
{sex=female, name=图图, job=Javascript Developer, salary=20000, age=29}
{sex=male, name=奥特, job=Javascript Developer, salary=16000, age=25}

key:UI;  doc_count:3;
{sex=female, name=马路, job=UI, salary=25000, age=31}
{sex=male, name=李飞, job=UI, salary=20000, age=27}
{sex=female, name=王超, job=UI, salary=18000, age=25}

key:DBA;  doc_count:2;
{sex=male, name=坦克, job=DBA, salary=30000, age=30}
{sex=female, name=李澎, job=DBA, salary=20000, age=29}

key:Designer;  doc_count:2;
{sex=female, name=坤坤, job=Designer, salary=22000, age=26}
{sex=male, name=王五, job=Designer, salary=18000, age=25}

key:Dev Manager;  doc_count:1;
{sex=male, name=李四, job=Dev Manager, salary=50000, age=41}

key:Manager;  doc_count:1;
{sex=female, name=张三, job=Manager, salary=35000, age=32}

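IV. Pipeline Aggregation

A pipeline aggregation runs a second round of analysis on the results of other aggregations. Its input is usually buckets rather than documents: it is a post-processing step applied to buckets that have already been computed.

1. Overview

Pipeline Aggregation is a special kind of aggregation in Elasticsearch that further processes the output of other aggregations.

There are two main families: Sibling Aggregation and Parent Aggregation.

  • A Sibling Aggregation is declared at the same level as the aggregation it reads from. Its input is the output of its [sibling aggregation], and its result is a new aggregation placed next to that sibling. Commonly used sibling aggregations:

    • Avg Bucket: the average of a metric across all buckets.
    • Sum Bucket: the sum of a metric across all buckets.
    • Max Bucket: the largest metric value across all buckets, together with the key(s) of the bucket(s) holding it.
    • Min Bucket: the smallest metric value across all buckets, together with the key(s) of the bucket(s) holding it.
    • Stats Bucket: count, min, max, avg and sum of a metric across all buckets.
    • Percentiles Bucket: percentiles of a metric across all buckets.
  • A Parent Aggregation is nested inside the aggregation it reads from. Its "input" is the output of its [parent aggregation], and it adds a new value to each bucket of the parent. Commonly used parent aggregations:

    • Derivative: the difference of a metric between consecutive buckets.
    • Moving Average: the average of a metric over a sliding window of buckets.
    • Cumulative Sum: the running total of a metric across the buckets.

Special case: Bucket Script runs a custom script on the metric values of each bucket and stores the result in that bucket.

Summary:
Pipeline Aggregation is a powerful aggregation type in Elasticsearch that aggregates over results that have already been aggregated. A Sibling Aggregation sits beside the aggregation it consumes and summarizes all of its buckets in one result; a Parent Aggregation sits inside the aggregation it consumes and works bucket by bucket. Together they allow computations such as "the job with the lowest average salary" or "the change from one interval to the next" without post-processing on the client.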
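
The difference between the two families can be sketched in plain Java. This is an illustration under stated assumptions, not Elasticsearch code: the class and method names are hypothetical, the input map stands for the per-bucket metric values that a bucket aggregation has already produced, and an empty map is not handled. In Elasticsearch itself, parent pipeline aggregations such as cumulative_sum are normally nested inside a histogram or date_histogram.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of sibling vs. parent pipeline aggregations.
class PipelineKindsSketch {
    /** Sibling style (like min_bucket): reads every bucket and emits ONE result next to the buckets. */
    static List<String> minBucketKeys(Map<String, Double> metricByBucket) {
        double min = Collections.min(metricByBucket.values());
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Double> e : metricByBucket.entrySet()) {
            if (e.getValue() == min) {
                keys.add(e.getKey());   // several buckets may share the minimum
            }
        }
        return keys;
    }

    /** Parent style (like cumulative_sum): emits one new value INSIDE each bucket, in bucket order. */
    static Map<String, Double> cumulativeSum(Map<String, Double> metricByBucket) {
        Map<String, Double> out = new LinkedHashMap<>();
        double running = 0;
        for (Map.Entry<String, Double> e : metricByBucket.entrySet()) {
            running += e.getValue();
            out.put(e.getKey(), running);
        }
        return out;
    }
}
```

The sibling-style method returns a single answer for the whole set of buckets, while the parent-style method returns as many values as there are buckets.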

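2. buckets_path syntax

A pipeline aggregation uses the buckets_path parameter to point at the aggregation or metric whose results it should read.

A buckets_path is built with two separators:

  • > separates aggregation names and steps down one nesting level. For example, terms_job>avg_salary refers to the avg_salary aggregation nested inside terms_job.
  • . separates an aggregation name from one of its metrics, which is needed for multi-value metric aggregations. For example, terms_job>stats_salary.avg refers to the avg value of a stats aggregation named stats_salary (a hypothetical name used only for illustration).

A few additional rules apply:

  • Paths are relative to the position of the pipeline aggregation, not absolute, and a path cannot go back up the aggregation tree.
  • The special path _count refers to the document count of each bucket instead of a metric. For example, terms_job>_count.
  • A single bucket of a multi-bucket aggregation can be selected by its key in square brackets. For example, terms_job['UI']>avg_salary refers to the avg_salary of the UI bucket only.

The following example shows buckets_path in use: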

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "terms_job": {
      "terms": {
        "field": "job.keyword"
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "my_bucket" : {
      "min_bucket": {
        "buckets_path": "terms_job>avg_salary"
      }
    }
  }
}

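In the example above, a bucket aggregation named terms_job is created first. The my_bucket aggregation then uses buckets_path to reference the avg_salary value inside each terms_job bucket, and finally returns the job whose average salary is the lowest.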
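
As a rough illustration of how a path such as terms_job>avg_salary is structured, the sketch below splits a path string into aggregation names and an optional metric. It is a deliberately simplified, hypothetical helper and not how Elasticsearch parses paths: it assumes that > separates aggregation names and that a trailing .metric selects a value of a multi-value metric, and it does not handle bucket keys in square brackets or names that contain dots. The name stats_salary in the comments is made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Simplified, hypothetical buckets_path splitter for illustration only.
class BucketsPathSketch {
    /** Aggregation names from outermost to innermost: "terms_job>avg_salary" -> [terms_job, avg_salary]. */
    static List<String> aggregationNames(String path) {
        int dot = path.lastIndexOf('.');
        String names = dot >= 0 ? path.substring(0, dot) : path;
        return Arrays.asList(names.split(">"));
    }

    /** Metric selected after the last '.', if any: "terms_job>stats_salary.avg" -> avg. */
    static Optional<String> metric(String path) {
        int dot = path.lastIndexOf('.');
        return dot >= 0 ? Optional.of(path.substring(dot + 1)) : Optional.empty();
    }
}
```

For the example above, aggregationNames("terms_job>avg_salary") yields the two names terms_job and avg_salary, and metric(...) is empty because avg is a single-value metric.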

3. Sibling Aggregation

3.1 Min_bucket

Example: find the job with the lowest average salary

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "min_salary_by_job": {
      "min_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}

The response is as follows:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "jobs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Developer",
          "doc_count" : 7,
          "avg_salary" : {
            "value" : 25571.428571428572
          }
        },
        {
          "key" : "Javascript Developer",
          "doc_count" : 4,
          "avg_salary" : {
            "value" : 19250.0
          }
        },
        {
          "key" : "UI",
          "doc_count" : 3,
          "avg_salary" : {
            "value" : 21000.0
          }
        },
        {
          "key" : "DBA",
          "doc_count" : 2,
          "avg_salary" : {
            "value" : 25000.0
          }
        },
        {
          "key" : "Designer",
          "doc_count" : 2,
          "avg_salary" : {
            "value" : 20000.0
          }
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1,
          "avg_salary" : {
            "value" : 50000.0
          }
        },
        {
          "key" : "Manager",
          "doc_count" : 1,
          "avg_salary" : {
            "value" : 35000.0
          }
        }
      ]
    },
    "min_salary_by_job" : {
      "value" : 19250.0,
      "keys" : [
        "Javascript Developer"
      ]
    }
  }
}

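Analysis: the documents are first bucketed by job, then the average salary of each job is computed, and finally min_bucket picks the bucket with the lowest average salary.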

Java implementation:

    @RequestMapping(value = "/pipeline_min", method = RequestMethod.GET)
    @ApiOperation(value = "min_bucket")
    public void minQuery() throws Exception {
        // Build the search request body
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Field used for bucketing
        String job = "job.keyword";
        // Name of the terms (bucket) aggregation
        String terms_job = "term_job";
        // Bucket by job
        AggregationBuilder job_terms = AggregationBuilders.terms(terms_job).field(job);

        // Field to analyze
        String salary = "salary";
        // 1. Name of the metric aggregation: average salary
        String avg_salary = "avg_salary";
        AggregationBuilder avgAggs = AggregationBuilders.avg(avg_salary).field(salary);
        // 2. Nest the metric inside each bucket
        job_terms.subAggregation(avgAggs);
        // Only aggregation results are needed, so skip the top-level hits
        builder.size(0).aggregation(job_terms);
        // Name of the pipeline aggregation
        String bucket_name = "min_salary_by_job";
        // buckets_path pointing at the average salary of each job bucket
        String buckets_path = terms_job + ">" + avg_salary;
        // Add the pipeline aggregation as a sibling of the terms aggregation
        builder.aggregation(PipelineAggregatorBuilders.minBucket(bucket_name, buckets_path));
        // Run the search
        SearchResponse search = client.aggregationSearch(builder, INDEX_NAME);
        // Print the results
        Map<String, Aggregation> map = search.getAggregations().asMap();
        Terms termsData = (Terms) map.get(terms_job);
        List<? extends Terms.Bucket> buckets = termsData.getBuckets();
        for (Terms.Bucket bucket : buckets) {
            Map<String, Aggregation> salaryMap = bucket.getAggregations().asMap();
            System.out.println("key:"+bucket.getKey().toString() + ";  doc_count:" + bucket.getDocCount() + ";");
            Avg avg = (Avg) salaryMap.get(avg_salary);
            System.out.println("average salary:" + avg.getValue());
        }
        // Read the bucket with the lowest average salary
        BucketMetricValue minData = (BucketMetricValue) map.get(bucket_name);
        System.out.println("job:" + minData.keys()[0] + "; lowest average salary:" + minData.getValueAsString());

    }

The output is as follows:
key:Java Developer;  doc_count:7;
average salary:25571.428571428572
key:Javascript Developer;  doc_count:4;
average salary:19250.0
key:UI;  doc_count:3;
average salary:21000.0
key:DBA;  doc_count:2;
average salary:25000.0
key:Designer;  doc_count:2;
average salary:20000.0
key:Dev Manager;  doc_count:1;
average salary:50000.0
key:Manager;  doc_count:1;
average salary:35000.0

job:Javascript Developer; lowest average salary:19250.0

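3.2 Max_bucket

Usage is the same as Min_bucket, so it is not repeated here.

3.3 Sum_bucket

Usage is the same as Min_bucket, so it is not repeated here.

3.4 Avg_bucket

Usage is the same as Min_bucket, so it is not repeated here.

3.5 Stats_bucket

Usage is the same as Min_bucket; the only difference is that Stats_bucket returns several statistics at once: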

    "min_salary_by_job" : {
      "count" : 7,
      "min" : 19250.0,
      "max" : 50000.0,
      "avg" : 27974.48979591837,
      "sum" : 195821.42857142858
    }

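3.6 Extended_stats_bucket

Usage is the same as Stats_bucket. In addition to the Stats_bucket values, Extended_stats_bucket returns further statistics such as the sum of squares, the variance, the standard deviation and the standard deviation bounds: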

    "min_salary_by_job" : {
      "count" : 7,
      "min" : 19250.0,
      "max" : 50000.0,
      "avg" : 27974.48979591837,
      "sum" : 195821.42857142858,
      "sum_of_squares" : 6.215460459183674E9,
      "variance" : 1.0535084339858396E8,
      "std_deviation" : 10264.055894166982,
      "std_deviation_bounds" : {
        "upper" : 48502.601584252334,
        "lower" : 7446.378007584404
      }
    }
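
The numbers in the two responses above can be reproduced by hand from the seven per-job average salaries. The following plain-Java sketch is a worked check rather than Elasticsearch code: the class name is hypothetical, and it assumes population variance (dividing by the count) and bounds of two standard deviations around the mean, which is consistent with the values shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical worked check of the stats_bucket / extended_stats_bucket values.
class ExtendedStatsBucketSketch {
    /** Computes the statistics over the per-bucket metric values. */
    static Map<String, Double> extendedStats(double[] values) {
        double sum = 0;
        double sumOfSquares = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            sum += v;
            sumOfSquares += v * v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double count = values.length;
        double avg = sum / count;
        // Population variance: mean of the squares minus the square of the mean
        double variance = sumOfSquares / count - avg * avg;
        double stdDeviation = Math.sqrt(variance);

        Map<String, Double> stats = new LinkedHashMap<>();
        stats.put("count", count);
        stats.put("min", min);
        stats.put("max", max);
        stats.put("avg", avg);
        stats.put("sum", sum);
        stats.put("sum_of_squares", sumOfSquares);
        stats.put("variance", variance);
        stats.put("std_deviation", stdDeviation);
        // Bounds at two standard deviations around the mean (assumed sigma = 2)
        stats.put("upper", avg + 2 * stdDeviation);
        stats.put("lower", avg - 2 * stdDeviation);
        return stats;
    }

    public static void main(String[] args) {
        // Average salary of each job bucket, taken from the Min_bucket response
        double[] avgSalaries = {25571.428571428572, 19250.0, 21000.0, 25000.0, 20000.0, 50000.0, 35000.0};
        extendedStats(avgSalaries).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Because the input is one value per bucket, count is 7 (the number of jobs) and not 20 (the number of documents).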

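3.7 Percentiles_bucket

percentiles_bucket is a sibling pipeline aggregation that computes percentiles over the values of a given metric across all buckets of a sibling aggregation.

A percentile is a statistic that gives the value below which a certain percentage of a data set falls. For example, the 50th percentile is the median: half of the data is smaller and half is larger.

The input of percentiles_bucket is the list of per-bucket metric values selected by buckets_path (in the example below, the average salary of each job). For each requested percentile it returns one of these existing values; it does not interpolate between them. The selected metric must be numeric.

percentiles_bucket accepts the following parameters:

  • percents: the list of percentiles to compute. If omitted, the 1st, 5th, 25th, 50th, 75th, 95th and 99th percentiles are returned, as in the example below.
  • buckets_path: the path to the metric whose per-bucket values are analyzed.
  • gap_policy: how to handle buckets with missing values. The default is "skip", which ignores such buckets; the other option is "insert_zeros", which treats missing values as 0.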
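
To see where the percentile values in this section's example come from, here is a minimal plain-Java sketch. The class name is hypothetical, and the index rule round(percent / 100 * (n - 1)) is an assumption about the selection rule; it is used here because it reproduces every value in the response of the example (19250, 21000, 25000, 35000 and 50000 for the seven per-job averages).

```java
import java.util.Arrays;

// Hypothetical illustration of picking percentiles from per-bucket values without interpolation.
class PercentilesBucketSketch {
    /**
     * Returns an existing data point for the requested percent.
     * Assumed rule: index = round(percent / 100 * (n - 1)) in the sorted values.
     */
    static double percentile(double[] bucketValues, double percent) {
        double[] sorted = bucketValues.clone();
        Arrays.sort(sorted);
        int index = (int) Math.round(percent / 100.0 * (sorted.length - 1));
        return sorted[index];
    }

    public static void main(String[] args) {
        // Average salary of each job bucket, in bucket order
        double[] avgSalaries = {25571.428571428572, 19250.0, 21000.0, 25000.0, 20000.0, 50000.0, 35000.0};
        for (double p : new double[] {1, 5, 25, 50, 75, 95, 99}) {
            System.out.println(p + " : " + percentile(avgSalaries, p));
        }
    }
}
```

Since only existing values are returned, a small number of buckets leads to repeated results, which is why the 1st and 5th percentiles are identical in the example.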

Example: percentiles of the average salary per job

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        }
      }
    },
    "min_salary_by_job": {
      "percentiles_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}

The response is as follows:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "jobs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Java Developer",
          "doc_count" : 7,
          "avg_salary" : {
            "value" : 25571.428571428572
          }
        },
        {
          "key" : "Javascript Developer",
          "doc_count" : 4,
          "avg_salary" : {
            "value" : 19250.0
          }
        },
        {
          "key" : "UI",
          "doc_count" : 3,
          "avg_salary" : {
            "value" : 21000.0
          }
        },
        {
          "key" : "DBA",
          "doc_count" : 2,
          "avg_salary" : {
            "value" : 25000.0
          }
        },
        {
          "key" : "Designer",
          "doc_count" : 2,
          "avg_salary" : {
            "value" : 20000.0
          }
        },
        {
          "key" : "Dev Manager",
          "doc_count" : 1,
          "avg_salary" : {
            "value" : 50000.0
          }
        },
        {
          "key" : "Manager",
          "doc_count" : 1,
          "avg_salary" : {
            "value" : 35000.0
          }
        }
      ]
    },
    "min_salary_by_job" : {
      "values" : {
        "1.0" : 19250.0,
        "5.0" : 19250.0,
        "25.0" : 21000.0,
        "50.0" : 25000.0,
        "75.0" : 35000.0,
        "95.0" : 50000.0,
        "99.0" : 50000.0
      }
    }
  }
}

Java implementation:

    @RequestMapping(value = "/percentiles_bucket", method = RequestMethod.GET)
    @ApiOperation(value = "percentiles_bucket")
    public void percentilesQuery() throws Exception {
        // Build the search request body
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // Field used for bucketing
        String job = "job.keyword";
        // Name of the terms (bucket) aggregation
        String terms_job = "term_job";
        // Bucket by job
        AggregationBuilder job_terms = AggregationBuilders.terms(terms_job).field(job);

        // Field to analyze
        String salary = "salary";
        // 1. Name of the metric aggregation: average salary
        String per_salary = "avg_salary";
        AggregationBuilder avgAggs = AggregationBuilders.avg(per_salary).field(salary);
        // 2. Nest the metric inside each bucket
        job_terms.subAggregation(avgAggs);
        // Only aggregation results are needed, so skip the top-level hits
        builder.size(0).aggregation(job_terms);
        // Name of the pipeline aggregation
        String bucket_name = "per_salary_by_job";
        // buckets_path pointing at the average salary of each job bucket
        String buckets_path = terms_job + ">" + per_salary;
        // Add the pipeline aggregation as a sibling of the terms aggregation
        builder.aggregation(PipelineAggregatorBuilders.percentilesBucket(bucket_name, buckets_path));
        // Run the search
        SearchResponse search = client.aggregationSearch(builder, INDEX_NAME);
        // Print the results
        Map<String, Aggregation> map = search.getAggregations().asMap();
        ParsedPercentiles perData = (ParsedPercentiles) map.get(bucket_name);
        Iterator<Percentile> it = perData.iterator();
        while (it.hasNext()) {
            Percentile entry = it.next();
            System.out.println("key:" + entry.getPercent() + "; value:" + entry.getValue());
        }
    }
The output is as follows:
key:1.0; value:19250.0
key:5.0; value:19250.0
key:25.0; value:21000.0
key:50.0; value:25000.0
key:75.0; value:35000.0
key:95.0; value:50000.0
key:99.0; value:50000.0

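Summary: The difference between two percentiles of the same set of observations is called an interpercentile range. It describes how widely the observations lying between those two percentiles are spread. In the test above, which is computed over the average salary of each job category, P25 = 21000.0 and P75 = 35000.0, so P75 - P25 = 14000.0. In other words, roughly the middle 50% of job categories have an average salary between 21000.0 and 35000.0, a spread of 14000.0. The interpercentile range can therefore be used as a measure of how dispersed the data is.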
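The percentile values and the P75 - P25 spacing above can be reproduced by hand. The following is a minimal sketch, not the Elasticsearch implementation itself: it assumes a nearest-rank rule without interpolation (pick the sorted value at index round(p / 100 * (n - 1))), which is consistent with every value in the response shown earlier. The class and method names are made up for illustration.

```java
import java.util.Arrays;

class PercentilesBucketSketch {

    // Assumed nearest-rank rule: sort the bucket values and take the element
    // at index round(percent / 100 * (n - 1)), with no interpolation.
    static double percentile(double[] values, double percent) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int index = (int) Math.round((percent / 100.0) * (sorted.length - 1));
        return sorted[index];
    }

    public static void main(String[] args) {
        // avg_salary of each job bucket, copied from the response above
        double[] avgSalaries = {
            25571.428571428572, 19250.0, 21000.0, 25000.0, 20000.0, 50000.0, 35000.0
        };
        double p25 = percentile(avgSalaries, 25.0);
        double p75 = percentile(avgSalaries, 75.0);
        // Interpercentile range between P25 and P75
        System.out.println("P25=" + p25 + ", P75=" + p75 + ", spacing=" + (p75 - p25));
    }
}
```

With the seven bucket averages from the response, this yields P25 = 21000.0 and P75 = 35000.0, so the spacing is 14000.0, matching the summary.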

4.Parent Aggregation

4.1 Derivative Aggregation

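The derivative aggregation is an Elasticsearch pipeline aggregation that computes the derivative of a metric, that is, the difference between the metric's value in one bucket and its value in the previous bucket. It is typically used on time-series data, for example to measure how much a value changes from one time interval to the next.

The derivative aggregation takes the following parameters:

  • buckets_path: the path to the metric whose derivative is computed. It is usually the name of a sibling metric aggregation; nested aggregations are separated with ">" and a specific metric of a multi-value aggregation is selected with ".".

  • gap_policy: how missing data points are handled. insert_zeros replaces a missing value with 0, while skip (the default) ignores the gap.

  • format: the output format of the derivative value. If omitted, the default format is used; a custom pattern can be supplied instead.

Note that the derivative aggregation works only on numeric values, must be nested inside a parent aggregation of type histogram or date_histogram, and its buckets_path must point to a metric that has already been computed in that parent (for example by a sum or avg sub-aggregation).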

Example: bucket salaries at an interval of 5000 and compute the derivative of the salary total of each bucket.

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "agg_his": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "min_doc_count": 0
      },
      "aggs": {
        "sum_sa": {
          "sum": {
            "field": "salary"
          }
        },
        "dvt" : {
          "derivative": {
            "buckets_path": "sum_sa",
            "gap_policy": "skip"
          }
        }
      }
    }
  }
}

The response is:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "agg_his" : {
      "buckets" : [
        {
          "key" : 5000.0,
          "doc_count" : 1,
          "sum_sa" : {
            "value" : 9000.0
          }
        },
        {
          "key" : 10000.0,
          "doc_count" : 0,
          "sum_sa" : {
            "value" : 0.0
          },
          "dvt" : {
            "value" : null
          }
        },
        {
          "key" : 15000.0,
          "doc_count" : 4,
          "sum_sa" : {
            "value" : 68000.0
          },
          "dvt" : {
            "value" : null
          }
        },
        {
          "key" : 20000.0,
          "doc_count" : 6,
          "sum_sa" : {
            "value" : 124000.0
          },
          "dvt" : {
            "value" : 56000.0
          }
        },
        {
          "key" : 25000.0,
          "doc_count" : 3,
          "sum_sa" : {
            "value" : 78000.0
          },
          "dvt" : {
            "value" : -46000.0
          }
        },
        {
          "key" : 30000.0,
          "doc_count" : 3,
          "sum_sa" : {
            "value" : 92000.0
          },
          "dvt" : {
            "value" : 14000.0
          }
        },
        {
          "key" : 35000.0,
          "doc_count" : 2,
          "sum_sa" : {
            "value" : 73000.0
          },
          "dvt" : {
            "value" : -19000.0
          }
        },
        {
          "key" : 40000.0,
          "doc_count" : 0,
          "sum_sa" : {
            "value" : 0.0
          },
          "dvt" : {
            "value" : null
          }
        },
        {
          "key" : 45000.0,
          "doc_count" : 0,
          "sum_sa" : {
            "value" : 0.0
          },
          "dvt" : {
            "value" : null
          }
        },
        {
          "key" : 50000.0,
          "doc_count" : 1,
          "sum_sa" : {
            "value" : 50000.0
          },
          "dvt" : {
            "value" : null
          }
        }
      ]
    }
  }
}
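The dvt values in the response can be checked by hand: each one is the current bucket's sum_sa minus the previous bucket's sum_sa, and under gap_policy skip a bucket with doc_count 0 counts as a gap, so neither that bucket nor the one right after it gets a value. The sketch below is a plain-Java illustration of this arithmetic under that reading of the response; it is not the Elasticsearch implementation, and the class and method names are hypothetical.

```java
import java.util.Arrays;

class DerivativeSketch {

    // First-order difference between consecutive buckets.
    // A bucket with doc_count == 0 is treated as a gap (skip policy):
    // the result is null for the gap itself and for the bucket after it.
    static Double[] derivativeSkip(long[] docCounts, double[] sums) {
        Double[] result = new Double[sums.length];
        Double previous = null;
        for (int i = 0; i < sums.length; i++) {
            Double current = docCounts[i] == 0 ? null : Double.valueOf(sums[i]);
            if (previous != null && current != null) {
                result[i] = current - previous;
            }
            previous = current;
        }
        return result;
    }

    public static void main(String[] args) {
        // doc_count and sum_sa of each histogram bucket, copied from the response above
        long[] docCounts = {1, 0, 4, 6, 3, 3, 2, 0, 0, 1};
        double[] sums = {9000, 0, 68000, 124000, 78000, 92000, 73000, 0, 0, 50000};
        System.out.println(Arrays.toString(derivativeSkip(docCounts, sums)));
        // prints [null, null, null, 56000.0, -46000.0, 14000.0, -19000.0, null, null, null]
    }
}
```

One difference from the real response: the first bucket there carries no dvt entry at all, whereas this sketch simply reports null for it.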

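4.2 Cumulative_sum (cumulative sum)

In Elasticsearch, cumulative_sum is a pipeline aggregation that computes the running total of a metric across the buckets of a parent histogram. For each bucket, the metric's value is added to the total of all preceding buckets, and the result is attached to that bucket.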

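Notes:

  • The parent of a cumulative_sum aggregation must be a histogram, date_histogram, or auto_date_histogram aggregation; otherwise the request is rejected with an error.
  • The min_doc_count of the parent histogram aggregation must be 0 (which is the default).
  • The metric being accumulated must be numeric.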

Example: bucket by age at an interval of 10 and compute the cumulative sum of the average salary.

GET aggs_index/_search
{
  "size": 0,
  "aggs": {
    "his_age": {
      "histogram": {
        "field": "age",
        "interval": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": {
            "field": "salary"
          }
        },
        "cus" : {
          "cumulative_sum": {
            "buckets_path": "avg_salary"
          }
        }
      }
    }
  }
}

The response is:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "his_age" : {
      "buckets" : [
        {
          "key" : 20.0,
          "doc_count" : 10,
          "avg_salary" : {
            "value" : 17900.0
          },
          "cus" : {
            "value" : 17900.0
          }
        },
        {
          "key" : 30.0,
          "doc_count" : 9,
          "avg_salary" : {
            "value" : 29444.444444444445
          },
          "cus" : {
            "value" : 47344.444444444445
          }
        },
        {
          "key" : 40.0,
          "doc_count" : 1,
          "avg_salary" : {
            "value" : 50000.0
          },
          "cus" : {
            "value" : 97344.44444444444
          }
        }
      ]
    }
  }
}


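Explanation: a cumulative sum adds each bucket's metric value to the running total of the buckets before it and attaches the result to that bucket. In the example above, the first bucket (key = 20) has an average of 17900.0, so its cumulative value is 0 + 17900.0 = 17900.0. The second bucket (key = 30) has an average of 29444.444444444445, so its cumulative value is the previous total 17900.0 plus 29444.444444444445, which gives 47344.444444444445. The remaining buckets are accumulated in the same way.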
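The accumulation described above is an ordinary running total. As a minimal sketch in plain Java, independent of any Elasticsearch client (the class and method names are made up for illustration):

```java
import java.util.Arrays;

class CumulativeSumSketch {

    // Running total: out[i] = values[0] + ... + values[i]
    static double[] cumulativeSum(double[] values) {
        double[] out = new double[values.length];
        double running = 0.0;
        for (int i = 0; i < values.length; i++) {
            running += values[i];
            out[i] = running;
        }
        return out;
    }

    public static void main(String[] args) {
        // avg_salary of the age buckets 20, 30 and 40, copied from the response above
        double[] avgSalaries = {17900.0, 29444.444444444445, 50000.0};
        System.out.println(Arrays.toString(cumulativeSum(avgSalaries)));
    }
}
```

The three results correspond to the cus values of the buckets with keys 20, 30 and 40, up to floating-point rounding.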

Java implementation:

    @RequestMapping(value = "/cumulative_sum", method = RequestMethod.GET)
    @ApiOperation(value = "Cumulative_sum (cumulative sum)")
    public void cumulativeQuery() throws Exception {
        // Build the search request body
        SearchSourceBuilder builder = new SearchSourceBuilder();
        // 1. Define the fields and aggregation names
        String age = "age";                 // age field
        String salary = "salary";           // salary field
        String his_age = "his_age";         // name of the histogram aggregation
        String avg_salary = "avg_salary";   // name of the average-salary aggregation
        String bucket_name = "cus";         // name of the cumulative-sum aggregation
        double interval = 10;               // histogram interval
        HistogramAggregationBuilder histogramAgg = AggregationBuilders.histogram(his_age) // bucket by age
                .field(age)
                .interval(interval)
                .subAggregation(
                        // 2. Average salary per bucket
                        AggregationBuilders.avg(avg_salary).field(salary)
                )
                .subAggregation(
                        // 3. Cumulative sum of the average salary
                        PipelineAggregatorBuilders.cumulativeSum(bucket_name, avg_salary)
                );
        // 4. Only aggregation results are needed, so return no hits
        builder.size(0).aggregation(histogramAgg);
        // 5. Run the search
        SearchResponse search = client.aggregationSearch(builder, INDEX_NAME);
        // Print the returned buckets
        ParsedHistogram hisData = (ParsedHistogram) search.getAggregations().asMap().get(his_age);
        for (Histogram.Bucket his : hisData.getBuckets()) {
            Map<String, Aggregation> aggs = his.getAggregations().asMap();
            System.out.println("histogram: key=" + his.getKeyAsString()
                        + ";doc_count="     + his.getDocCount()
                        + ";avg_value="     + ((ParsedAvg) aggs.get(avg_salary)).getValue()
                        + ";cus_value="     + ((ParsedSimpleValue) aggs.get(bucket_name)).value());
        }
    }

Printed output:
histogram: key=20.0;doc_count=10;avg_value=17900.0;cus_value=17900.0
histogram: key=30.0;doc_count=9;avg_value=29444.444444444445;cus_value=47344.444444444445
histogram: key=40.0;doc_count=1;avg_value=50000.0;cus_value=97344.44444444444
