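Elasticsearch Aggregations

I. Introduction to Aggregation Analysis

1. What is aggregation analysis in ES?

Aggregation analysis is an important database feature: it performs aggregate computations over the data set returned by a query (the result set is analogous to a table in a relational database), for example finding the maximum or minimum of a field (or of a computed expression), or computing a sum or an average. As both a search engine and a data store, ES also provides powerful aggregation capabilities.

Aggregations that compute a metric such as the max, min, sum, or average over a data set are called metric aggregations in ES.

In a relational database, besides aggregate functions, you can also group the queried data with GROUP BY and then compute metrics for each group. In ES, the equivalent of GROUP BY is called bucketing, and such aggregations are called bucket aggregations.

ES also provides matrix aggregations and pipeline aggregations, but these were still being refined at the time of writing.
The aggregation framework consists of many building blocks that can be combined to build complex descriptions or summaries of the data. The basic structure of an aggregation is shown in the next section.

2. How to write an aggregation query in ES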

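Define aggregations in the search request body under the aggregations node, using the following syntax:

"aggregations" : {
   "<aggregation_name>" : {  // name of the aggregation
      "<aggregation_type>" : {   // type of the aggregation
         <aggregation_body>  // aggregation body: which fields to aggregate on
      }
      [,"meta" : { [<meta_data_body>] } ]?   // metadata
      [,"aggregations" : { [<sub_aggregation>]+ } ]?  // sub-aggregations nested inside this aggregation
   }
}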

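Notes:

aggregations can also be abbreviated as aggs.
There are several types of aggregations, each with its own purpose; they are covered in the sections below.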
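
As a rough illustration of the structure above, the following Python sketch builds a request body with one bucket aggregation and a nested metric sub-aggregation, then prints it as JSON. The aggregation names by_age and max_balance are hypothetical, chosen only for this example; the field names age and balance are the ones used by the bank examples in this article.

```python
import json

def build_aggs_body():
    """Build a search body: terms buckets on age, each with a max(balance) sub-aggregation."""
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {   # "aggs" is the short form of "aggregations"
            "by_age": {                      # <aggregation_name> (hypothetical)
                "terms": {"field": "age"},   # <aggregation_type>: <aggregation_body>
                "aggs": {                    # sub-aggregations, computed per bucket
                    "max_balance": {"max": {"field": "balance"}}
                },
            }
        },
    }

body = build_aggs_body()
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a POST /bank/_search request.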

3. Value sources for aggregations

The values an aggregation computes over can come from a field or from the result of a script.

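II. Metric Aggregations

1. max min sum avg

Example 1: find the maximum balance across all customers (max)

POST /bank/_search
{
  "size": 0,
  "aggs": {
    "masssbalance": {  // the maximum is returned under the name "masssbalance"
      "max": {  // compute the maximum of the "balance" field
        "field": "balance"
      }
    }
  }
}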

Result:

{
  "took": 2080,
  "timed_out": false,
  "_shards": {
    "total": 5,   
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "masssbalance": {
      "value": 49989     // the maximum balance is 49989
    }
  }
}

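Example 2: find the maximum balance among customers aged 24

POST /bank/_search
{
  "size": 2,
  "query": {
    "match": {   // query condition: age is 24
      "age": 24
    }
  },
  "sort": [    // sort order of the returned hits
    {
      "balance": {   // sort by the "balance" field in descending order
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "max_balance": {   // the maximum is returned under the name "max_balance"
      "max": {    // compute the maximum of the "balance" field
        "field": "balance"
      }
    }
  }
}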

Result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 42,
    "max_score": null,
    "hits": [
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "697",
        "_score": null,
        "_source": {
          "account_number": 697,
          "balance": 48745,
          "firstname": "Mallory",
          "lastname": "Emerson",
          "age": 24,
          "gender": "F",
          "address": "318 Dunne Court",
          "employer": "Exoplode",
          "email": "malloryemerson@exoplode.com",
          "city": "Montura",
          "state": "LA"
        },
        "sort": [
          48745
        ]
      },
      {
        "_index": "bank",
        "_type": "_doc",
        "_id": "917",
        "_score": null,
        "_source": {
          "account_number": 917,
          "balance": 47782,
          "firstname": "Parks",
          "lastname": "Hurst",
          "age": 24,
          "gender": "M",
          "address": "933 Cozine Avenue",
          "employer": "Pyramis",
          "email": "parkshurst@pyramis.com",
          "city": "Lindcove",
          "state": "GA"
        },
        "sort": [
          47782
        ]
      }
    ]
  },
  "aggregations": {
    "max_balance": {
      "value": 48745    // the maximum of the "balance" field is 48745
    }
  }
}

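Example 3: values from a script. Compute the average age of all customers, and the average of age plus 10

POST /bank/_search?size=0
{
  "aggs": {
    "avg_age": {   // the average age is returned under the name "avg_age"
      "avg": {  // "avg" computes the average
        "script": {
          "source": "doc.age.value"   // average the value of age
        }
      }
    },
    "avg_age10": {  // the average of (age + 10) is returned under the name "avg_age10"
      "avg": {
        "script": {
          "source": "doc.age.value + 10" // add 10 to each age, then average
        }
      }
    }
  }
}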

Result (two values are returned: the average age, and the average of age plus 10):

{
  "took": 86,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "avg_age": {
      "value": 30.171   // the average age
    },
    "avg_age10": {
      "value": 40.171 // the average of age + 10
    }
  }
}

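Example 4: specify a field and use _value in the script to read the field value (sum)

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "sum_balance": { // the sum is returned under the name "sum_balance"
      "sum": { // "sum" computes the sum
        "field": "balance", // the field to read values from
        "script": {
            "source": "_value * 1.03"  // _value is the value of the "balance" field
        }
      }
    }
  }
}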

Result:

{
  "took": 165,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "sum_balance": {  // the sum of balance * 1.03 over all documents is 26486282.11
      "value": 26486282.11
    }
  }
}

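Example 5: supply a value for documents that lack the field. If missing is not specified, documents without a value for the field are ignored.

// the index is bank; _search is the search endpoint, not a type
POST /bank/_search?size=0
{
  "aggs": {
    "avg_age": {  // the average is returned under the name "avg_age"
      "avg": {   // "avg" computes the average
        "field": "age",
        "missing": 18   // a document that has no age value is treated as if its age were 18
      }
    }
  }
}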
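
To make the effect of missing concrete, here is a small Python sketch of the behaviour described above. The three sample documents and the helper name avg_age are made up for illustration; the sketch only mimics the semantics and does not call Elasticsearch.

```python
def avg_age(docs, missing=None):
    """Average the age field; docs without age are skipped unless a default is given."""
    values = []
    for doc in docs:
        if doc.get("age") is not None:
            values.append(doc["age"])
        elif missing is not None:
            values.append(missing)  # treat the document as if age == missing
    return sum(values) / len(values) if values else None

# Hypothetical documents: the third one has no age field.
docs = [{"age": 20}, {"age": 40}, {"firstname": "NoAge"}]

print(avg_age(docs))              # 30.0 (the document without age is ignored)
print(avg_age(docs, missing=18))  # 26.0 (the document without age counts as 18)
```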

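2. Document count (count)

Example 1: count the documents in the bank index whose age is 24

// the index is bank and the type is _doc; _count is the count endpoint
// (on Elasticsearch 7.x and later, where mapping types are deprecated, use POST /bank/_count)
POST /bank/_doc/_count
{
  "query": {       // count the documents that match age = 24
    "match": {
      "age" : 24
    }
  }
}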

Result:

{
  "count": 42,  // 42 documents have age = 24
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  }
}

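3. Value count: count the values of a field

Example 1:

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_count": {  // the result is returned under the name "age_count"
      "value_count": {  // counts the values extracted from the field; for a single-valued field this is the number of documents that have a value
        "field": "age"  // the field to count
      }
    }
  }
}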

Result:

{
  "took": 2022,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_count": {
      "value": 1000  // 1000 documents have a value in the age field
    }
  }
}

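4. cardinality: count of distinct values

Example 1:

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_count": { // the result is returned under the name "age_count"
      "cardinality": {  // number of distinct values of the field (computed approximately)
        "field": "age"  // the field to count distinct values of
      }
    },
    "state_count": {  // the result is returned under the name "state_count"
      "cardinality": {  // number of distinct values of the field (computed approximately)
        "field": "state.keyword"  // use the keyword sub-field of state
      }
    }
  }
}

Note: for state, use its keyword sub-field (state.keyword), because a text field cannot be aggregated on by default.
Result: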

{
  "took": 2074,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "state_count": {  // state has 51 distinct values
      "value": 51
    },
    "age_count": { // age has 21 distinct values
      "value": 21
    }
  }
}
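
The difference between value_count and cardinality can be sketched in a few lines of Python. The sample documents below are hypothetical, and the sketch counts exactly, whereas Elasticsearch computes cardinality with an approximate algorithm; it is meant only to show what each number represents for single-valued fields.

```python
# Hypothetical documents; the last one has no age value.
docs = [
    {"age": 24, "state": "LA"},
    {"age": 24, "state": "GA"},
    {"age": 31, "state": "LA"},
    {"state": "TX"},
]

def value_count(docs, field):
    """Number of documents that carry a value for the field (single-valued fields)."""
    return sum(1 for doc in docs if doc.get(field) is not None)

def cardinality(docs, field):
    """Exact number of distinct values of the field."""
    return len({doc[field] for doc in docs if doc.get(field) is not None})

print(value_count(docs, "age"))    # 3
print(cardinality(docs, "age"))    # 2
print(cardinality(docs, "state"))  # 3
```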

5. stats: returns count, max, min, avg, and sum

The result contains all five values: count, max, min, avg, and sum.
Example 1:

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_stats": {  // the result is returned under the name "age_stats"
      "stats": { // computes count, max, min, avg, and sum in one request
        "field": "age" // the field to compute statistics on
      }
    }
  }
}

Result:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_stats": {   // the five statistics for age
      "count": 1000,
      "min": 20,
      "max": 40,
      "avg": 30.171,
      "sum": 30171
    }
  }
}

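6. Extended stats

Extended statistics. Compared with stats, it returns four additional results: the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations.
Example 1:

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_stats": {  // the result is returned under the name "age_stats"
      "extended_stats": { // adds the sum of squares, variance, standard deviation, and mean +/- two standard deviations to the stats output
        "field": "age" // the field to compute statistics on
      }
    }
  }
}

Result: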

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_stats": { // the extended statistics for age
      "count": 1000,
      "min": 20,
      "max": 40,
      "avg": 30.171,
      "sum": 30171,
      "sum_of_squares": 946393,  // sum of squares
      "variance": 36.10375899999996, // variance
      "std_deviation": 6.008640362012022, // standard deviation
      "std_deviation_bounds": {  // interval of the mean plus/minus two standard deviations
        "upper": 42.18828072402404,
        "lower": 18.153719275975956
      }
    }
  }
}
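
The extra values can be re-derived from count, sum, and sum_of_squares alone. The Python sketch below does this with the population-variance formula, which reproduces the numbers in the result above up to floating-point rounding; the function name and the sigma parameter are illustrative choices, not part of the Elasticsearch API.

```python
import math

def extended_stats(count, total, sum_of_squares, sigma=2.0):
    """Derive avg, variance, std deviation and bounds from three running totals."""
    avg = total / count
    variance = sum_of_squares / count - avg * avg  # population variance
    std = math.sqrt(variance)
    return {
        "avg": avg,
        "variance": variance,
        "std_deviation": std,
        "upper": avg + sigma * std,  # mean plus two standard deviations
        "lower": avg - sigma * std,  # mean minus two standard deviations
    }

# Inputs copied from the result above.
stats = extended_stats(count=1000, total=30171, sum_of_squares=946393)
for name, value in stats.items():
    print(name, value)
```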

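7. Percentiles: the values at given percentiles

Sorts the values of the given field (or script) in ascending order, accumulates the share of documents covered by each value (as a percentage of all matching documents), and returns the value at each requested percentage. By default, it returns the values at the [ 1, 5, 25, 50, 75, 95, 99 ] percentiles. The example below makes this concrete.
Example 1:

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_percents": { // the result is returned under the name "age_percents"
      "percentiles": { // returns the values at the default percentiles
        "field": "age"  // the field to compute percentiles on
      }
    }
  }
}

Result: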

{
  "took": 87,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_percents": { // the values at each percentile
      "values": {
        "1.0": 20,
        "5.0": 21,
        "25.0": 25,
        "50.0": 31, // the median: about 50% of the matching documents have age <= 31
        "75.0": 35.00000000000001,
        "95.0": 39,
        "99.0": 40
      }
    }
  }
}

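Example 2: specify the percentiles to return

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_percents": {
      "percentiles": {
        "field": "age", // the field to compute percentiles on
        "percents" : [95, 99, 99.9]  // the percentiles to return
      }
    }
  }
}

Result: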

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_percents": {  // the values at the requested percentiles
      "values": {
        "95.0": 39,
        "99.0": 40,  // the 99th percentile: at least 99% of the matching documents have age <= 40
        "99.9": 40
      }
    }
  }
}
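
For intuition, the percentile values can be reproduced with an exact nearest-rank computation over the per-age document counts. The counts below are copied from the terms aggregation results later in this article. Note the assumption: Elasticsearch computes percentiles with an approximate algorithm, so this exact method is a stand-in that happens to agree with the (rounded) results shown above, not the method Elasticsearch uses.

```python
import math

# Documents per age, taken from the terms aggregation results in this article.
AGE_COUNTS = {20: 44, 21: 46, 22: 51, 23: 42, 24: 42, 25: 42, 26: 59,
              27: 39, 28: 51, 29: 35, 30: 47, 31: 61, 32: 52, 33: 50,
              34: 49, 35: 52, 36: 52, 37: 42, 38: 39, 39: 60, 40: 45}

def nearest_rank_percentile(counts, percent):
    """Smallest value whose cumulative document count reaches percent% of the total."""
    total = sum(counts.values())
    rank = max(1, math.ceil(percent / 100.0 * total))
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= rank:
            return value
    raise ValueError("counts must not be empty")

for p in (1, 5, 25, 50, 75, 95, 99):
    print(p, nearest_rank_percentile(AGE_COUNTS, p))
```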

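8. Percentile ranks: percentage of documents whose value is at or below a given value

Example 1: compute the percentage of documents whose age is at or below 25 and at or below 30. This is the inverse of item 7.

POST /bank/_search?size=0
{
  "aggs": {
    "gge_perc_rank": {
      "percentile_ranks": {
        "field": "age",
        "values": [
          25,30   // these are age values, not percentages
        ]
      }
    }
  }
}

Result: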

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
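  },
  "aggregations": {
    "gge_perc_rank": {
      "values": {
        "25.0": 26.1,  // about 26.1% of the documents have age at or below 25 (the value is approximate)
        "30.0": 49.2   // about 49.2% of the documents have age at or below 30 (the value is approximate)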
      }
    }
  }
}
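
The inverse relationship to percentiles can be checked with an exact computation over the same per-age counts (copied from the terms aggregation results later in this article). This is only a sketch: Elasticsearch returns approximate ranks, which is why the figures above (26.1 and 49.2) differ slightly from the exact shares computed here.

```python
# Documents per age, taken from the terms aggregation results in this article.
AGE_COUNTS = {20: 44, 21: 46, 22: 51, 23: 42, 24: 42, 25: 42, 26: 59,
              27: 39, 28: 51, 29: 35, 30: 47, 31: 61, 32: 52, 33: 50,
              34: 49, 35: 52, 36: 52, 37: 42, 38: 39, 39: 60, 40: 45}

def percentile_rank(counts, value):
    """Exact percentage of documents whose value is at or below the given value."""
    total = sum(counts.values())
    at_or_below = sum(n for v, n in counts.items() if v <= value)
    return 100.0 * at_or_below / total

print(percentile_rank(AGE_COUNTS, 25))  # 26.7
print(percentile_rank(AGE_COUNTS, 30))  # 49.8
```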

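9. Geo Bounds aggregation: the bounding box of the geo points in the document set

(Not covered here; please look it up on your own.)

10. Geo Centroid aggregation: the centroid of the geo points

(Not covered here; please look it up on your own.)

III. Bucket Aggregations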

1. Terms Aggregation: group by the distinct values of a field

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": { // the result is returned under the name "age_terms"
      "terms": { // group by field value, similar to GROUP BY in MySQL
        "field": "age"  // the field to group by
      }
    }
  }
}

Result:

{
  "took": 2000,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
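  },
  "aggregations": {
    "age_terms": {  // the returned result
      "doc_count_error_upper_bound": 0,  // upper bound on the error of the document counts
      "sum_other_doc_count": 463,  // number of documents in groups that were not returned
      "buckets": [  // by default, the top 10 groups ordered by doc_count, highest first
        {
          "key": 31,   // 61 documents have age 31
          "doc_count": 61
        },
        {
          "key": 39,  // 60 documents have age 39
          "doc_count": 60
        },
        {
          "key": 26,  // 59 documents have age 26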
          "doc_count": 59
        },
        {
          "key": 32,
          "doc_count": 52
        },
        {
          "key": 35,
          "doc_count": 52
        },
        {
          "key": 36,
          "doc_count": 52
        },
        {
          "key": 22,
          "doc_count": 51
        },
        {
          "key": 28,
          "doc_count": 51
        },
        {
          "key": 33,
          "doc_count": 50
        },
        {
          "key": 34,
          "doc_count": 49
        }
      ]
    }
  }
}

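size specifies how many groups to return.
Example 2: return 20 groups

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": { // the result is returned under the name "age_terms"
      "terms": {  // group by field value
        "field": "age",  // the field to group by
        "size": 20  // how many groups to return
      }
    }
  }
}

Result: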

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {  // the returned result
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 35,
      "buckets": [  // 20 groups follow
        {
          "key": 31,   // 61 documents have age 31
          "doc_count": 61
        },
        {
          "key": 39,   // 60 documents have age 39
          "doc_count": 60
        },
        {
          "key": 26,  // 59 documents have age 26
          "doc_count": 59
        },
        {
          "key": 32,
          "doc_count": 52
        },
        {
          "key": 35,
          "doc_count": 52
        },
        {
          "key": 36,
          "doc_count": 52
        },
        {
          "key": 22,
          "doc_count": 51
        },
        {
          "key": 28,
          "doc_count": 51
        },
        {
          "key": 33,
          "doc_count": 50
        },
        {
          "key": 34,
          "doc_count": 49
        },
        {
          "key": 30,
          "doc_count": 47
        },
        {
          "key": 21,
          "doc_count": 46
        },
        {
          "key": 40,
          "doc_count": 45
        },
        {
          "key": 20,
          "doc_count": 44
        },
        {
          "key": 23,
          "doc_count": 42
        },
        {
          "key": 24,
          "doc_count": 42
        },
        {
          "key": 25,
          "doc_count": 42
        },
        {
          "key": 37,
          "doc_count": 42
        },
        {
          "key": 27,
          "doc_count": 39
        },
        {
          "key": 38,
          "doc_count": 39
        }
      ]
    }
  }
}
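
The bucket lists and the sum_other_doc_count values in the two results above can be reproduced from the per-age counts with a short Python sketch. The function name terms_agg is hypothetical, and breaking ties on equal counts by ascending key is an assumption made so that the order matches the output shown above.

```python
# Documents per age, copied from the results in this article (21 distinct ages, 1000 documents).
AGE_COUNTS = {20: 44, 21: 46, 22: 51, 23: 42, 24: 42, 25: 42, 26: 59,
              27: 39, 28: 51, 29: 35, 30: 47, 31: 61, 32: 52, 33: 50,
              34: 49, 35: 52, 36: 52, 37: 42, 38: 39, 39: 60, 40: 45}

def terms_agg(counts, size=10):
    """Return the top `size` buckets by doc_count and the documents left out."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": key, "doc_count": n} for key, n in ranked[:size]]
    other = sum(n for _, n in ranked[size:])
    return {"sum_other_doc_count": other, "buckets": buckets}

top10 = terms_agg(AGE_COUNTS)            # default size of 10
top20 = terms_agg(AGE_COUNTS, size=20)
print(top10["sum_other_doc_count"])      # 463
print(top20["sum_other_doc_count"])      # 35
print([b["key"] for b in top10["buckets"]])
```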

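Example 3: show the count error for each group

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": { // group by field value
        "field": "age",  // the field to group by
        "size": 5,  // how many groups to return
        "shard_size": 20, // how many groups each shard returns
        "show_term_doc_count_error": true  // report an error upper bound for each bucket
      }
    }
  }
}

The result is shown further below; note that it contains only 5 buckets. To understand it, we first need to know what shard_size is: each shard returns its own top shard_size groups, and these per-shard lists are then merged and cut down to the top size groups.
The default value of shard_size is:
① when the index has only one shard: shard_size defaults to size
② when the index has multiple shards: shard_size defaults to size * 1.5 + 10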

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 25,
      "sum_other_doc_count": 716,
      "buckets": [   // only 5 buckets are returned because size is 5
        {
          "key": 31,
          "doc_count": 61,
          "doc_count_error_upper_bound": 0
        },
        {
          "key": 39,
          "doc_count": 60,
          "doc_count_error_upper_bound": 0
        },
        {
          "key": 26,
          "doc_count": 59,
          "doc_count_error_upper_bound": 0
        },
        {
          "key": 32,
          "doc_count": 52,
          "doc_count_error_upper_bound": 0
        },
        {
          "key": 36,
          "doc_count": 52,
          "doc_count_error_upper_bound": 0
        }
      ]
    }
  }
}
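
The following Python sketch illustrates the default shard_size rule stated above and why a small shard_size can distort the merged result. It is a toy model under stated assumptions: the two shards and their terms are invented, the truncation to an integer is an assumption, and the merge logic is a simplification rather than Elasticsearch's actual implementation.

```python
from collections import Counter

def default_shard_size(size, num_shards):
    """Default described in this article: size for one shard, else size * 1.5 + 10."""
    return size if num_shards == 1 else int(size * 1.5 + 10)

def sharded_terms(shards, size, shard_size):
    """Each shard reports only its top shard_size terms; the reports are merged."""
    merged = Counter()
    for shard in shards:
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:size]

# Hypothetical shards. True totals: a=6, b=8, c=8.
shards = [
    ["a"] * 5 + ["b"] * 4 + ["c"] * 3,
    ["c"] * 5 + ["b"] * 4 + ["a"] * 1,
]

print(default_shard_size(10, 5))                     # 25
print(sharded_terms(shards, size=1, shard_size=1))   # [('a', 5)]  wrong winner, wrong count
print(sharded_terms(shards, size=1, shard_size=3))   # [('b', 8)]  every term reported, exact
```

With shard_size=1, term b is never reported by either shard even though it is tied for the highest true count, which is the kind of error that doc_count_error_upper_bound is meant to bound.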

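order specifies how the groups are sorted.
Example 5: sort by document count

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": { // the result is returned under the name "age_terms"
      "terms": { // group by field value
        "field": "age",  // the field to group by
        "order" : { "_count" : "asc" } // sort the buckets by document count in ascending order
      }
    }
  }
}

Result: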

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 584,
      "buckets": [ // buckets are sorted by doc_count in ascending order
        {
          "key": 29,
          "doc_count": 35
        },
        {
          "key": 27,
          "doc_count": 39
        },
        {
          "key": 38,
          "doc_count": 39
        },
        {
          "key": 23,
          "doc_count": 42
        },
        {
          "key": 24,
          "doc_count": 42
        },
        {
          "key": 25,
          "doc_count": 42
        },
        {
          "key": 37,
          "doc_count": 42
        },
        {
          "key": 20,
          "doc_count": 44
        },
        {
          "key": 40,
          "doc_count": 45
        },
        {
          "key": 21,
          "doc_count": 46
        }
      ]
    }
  }
}

Example 6: sort by group key

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {  // group by field value
        "field": "age",  // the field to group by
        "order" : { "_key" : "asc" }  // _key is the group's age value; sort the buckets by age in ascending order
      }
    }
  }
}

Result:

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 549,
      "buckets": [  // buckets are sorted by age in ascending order
        {
          "key": 20,
          "doc_count": 44
        },
        {
          "key": 21,
          "doc_count": 46
        },
        {
          "key": 22,
          "doc_count": 51
        },
        {
          "key": 23,
          "doc_count": 42
        },
        {
          "key": 24,
          "doc_count": 42
        },
        {
          "key": 25,
          "doc_count": 42
        },
        {
          "key": 26,
          "doc_count": 59
        },
        {
          "key": 27,
          "doc_count": 39
        },
        {
          "key": 28,
          "doc_count": 51
        },
        {
          "key": 29,
          "doc_count": 35
        }
      ]
    }
  }
}

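Example 7: sort by a metric computed for each group

POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {
      "terms": {   // group by field value
        "field": "age", // the field to group by
        "order": {
          "max_balance": "asc" // sort the buckets by max_balance in ascending order; max_balance is the sub-aggregation defined below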
        }
      },
      "aggs": {
        "max_balance": {
          "max": {
            "field": "balance"
          }
        },
        "min_balance": {
          "min": {
            "field": "balance"
          }
        }
      }
    }
  }
}

Result:

{
  "took": 28,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 511,
      "buckets": [
        {
          "key": 27,
          "doc_count": 39,
          "min_balance": {
            "value": 1110
          },
          "max_balance": {
            "value": 46868
          }
        },
        {
          "key": 39,
          "doc_count": 60,
          "min_balance": {
            "value": 3589
          },
          "max_balance": {
            "value": 47257
          }
        },
        {
          "key": 37,
          "doc_count": 42,
          "min_balance": {
            "value": 1360
          },
          "max_balance": {
            "value": 47546
          }
        },
        {
          "key": 32,
          "doc_count": 52,
          "min_balance": {
            "value": 1031
          },
          "max_balance": {
            "value": 48294
          }
        },
        {
          "key": 26,
          "doc_count": 59,
          "min_balance": {
            "value": 1447
          },
          "max_balance": {
            "value": 48466
          }
        },
        {
          "key": 33,
          "doc_count": 50,
          "min_balance": {
            "value": 1314
          },
          "max_balance": {
            "value": 48734
          }
        },
        {
          "key": 24,
          "doc_count": 42,
          "min_balance": {
            "value": 1011
          },
          "max_balance": {
            "value": 48745
          }
        },
        {
          "key": 31,
          "doc_count": 61,
          "min_balance": {
            "value": 2384
          },
          "max_balance": {
            "value": 48758
          }
        },
        {
          "key": 34,
          "doc_count": 49,
          "min_balance": {
            "value": 3001
          },
          "max_balance": {
            "value": 48997
          }
        },
        {
          "key": 29,
          "doc_count": 35,
          "min_balance": {
            "value": 3596
          },
          "max_balance": {
            "value": 49119
          }
        }
      ]
    }
  }
}
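
Ordering buckets by a sub-aggregation amounts to grouping, computing the metric per group, and sorting on that metric. A minimal Python sketch follows; the five sample accounts are hypothetical (their balances echo the per-age minimum and maximum values in the result above), and the function name is illustrative.

```python
from collections import defaultdict

# Hypothetical sample accounts, not the full bank data set.
accounts = [
    {"age": 27, "balance": 46868},
    {"age": 27, "balance": 1110},
    {"age": 39, "balance": 47257},
    {"age": 39, "balance": 3589},
    {"age": 29, "balance": 49119},
]

def terms_ordered_by_max_balance(docs, size=10):
    """Group by age, compute min/max balance per group, sort by max ascending."""
    balances = defaultdict(list)
    for doc in docs:
        balances[doc["age"]].append(doc["balance"])
    buckets = [
        {"key": age, "doc_count": len(vals),
         "min_balance": min(vals), "max_balance": max(vals)}
        for age, vals in balances.items()
    ]
    buckets.sort(key=lambda b: (b["max_balance"], b["key"]))
    return buckets[:size]

buckets = terms_ordered_by_max_balance(accounts)
for bucket in buckets:
    print(bucket)
```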

Example 8: filter groups with regular expressions

GET /_search
{
    "aggs" : {
        "tags" : { // the result is returned under the name "tags"
            "terms" : { // group by field value
                "field" : "tags", // the field to group by
                "include" : ".*sport.*", // keep only tags values that contain sport
                "exclude" : "water_.*"  // drop tags values that start with water_
            }
        }
    }
}

Example 9: filter groups with an exact list of values

GET /_search
{
    "aggs" : {
        "JapaneseCars" : { // the result is returned under the name "JapaneseCars"
             "terms" : {   // group by field value
                 "field" : "make", // the field to group by
                 "include" : ["mazda", "honda"]  // keep only these values
             }
         },
        "ActiveCarManufacturers" : { // the result is returned under the name "ActiveCarManufacturers"
             "terms" : {  // group by field value
                 "field" : "make", // the field to group by
                 "exclude" : ["rover", "jensen"]  // drop these values
             }
         }
    }
}
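
The include and exclude rules from Examples 8 and 9 can be approximated in Python as below. This is a sketch under assumptions: it uses Python's re module with whole-string matching as a stand-in for the regular expression syntax Elasticsearch uses, the sample terms are invented, and the helper names are hypothetical.

```python
import re

def _matches(term, spec):
    """spec is either a regular expression (str) or an exact list of values."""
    if isinstance(spec, str):
        return re.fullmatch(spec, term) is not None  # the pattern must match the whole term
    return term in spec

def filter_terms(terms, include=None, exclude=None):
    """Keep terms that match include (if given) and do not match exclude (if given)."""
    kept = []
    for term in terms:
        if include is not None and not _matches(term, include):
            continue
        if exclude is not None and _matches(term, exclude):
            continue
        kept.append(term)
    return kept

tags = ["sport", "water_sports", "motorsport", "chess"]
print(filter_terms(tags, include=".*sport.*", exclude="water_.*"))
# ['sport', 'motorsport']

makes = ["mazda", "honda", "rover", "jensen"]
print(filter_terms(makes, include=["mazda", "honda"]))  # ['mazda', 'honda']
print(filter_terms(makes, exclude=["rover", "jensen"]))  # ['mazda', 'honda']
```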

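2. filter Aggregation: aggregate over the documents that satisfy a filter query

Among the documents matched by the query, select those that satisfy the filter and aggregate over them: filter first, then aggregate.
Example 1:

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_terms": {      // the filtered result is returned under the name "age_terms"
      "filter": {"match":{"gender":"F"}},  // the filter condition is gender = F; the sub-aggregations below run only on documents that pass it
      "aggs": {  // sub-aggregations computed on the filtered documents
        "avg_age": { // the average is returned under the name "avg_age"
          "avg": { // compute the average
            "field": "age"  // of the age field
          }
        }
      }
    }
  }
}

Result: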

{
  "took": 163,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_terms": {
      "doc_count": 493, // number of documents that passed the filter
      "avg_age": {
        "value": 30.3184584178499  // their average age
      }
    }
  }
}

3. Filters Aggregation: aggregate over several filter groups

Example 1:

Prepare the data:

PUT /logs/_doc/_bulk?refresh
{"index":{"_id":1}}
{"body":"warning: page could not be rendered"}
{"index":{"_id":2}}
{"body":"authentication error"}
{"index":{"_id":3}}
{"body":"warning: connection timed out"}

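Get the aggregated result for each filter:

// the index is logs; _search is the search endpoint
GET logs/_search
{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": {
        "filters": {
          "errors": {  // name of the bucket for this filter
            "match": {
              "body": "error"  // the filter condition: body contains the term "error"
            }
          },
          "warnings": { // name of the bucket for this filter
            "match": {
              "body": "warning"  // the filter condition: body contains the term "warning"
            }
          }
        }
      }
    }
  }
}

Result: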

{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "messages": {
      "buckets": {
        "errors": {
          "doc_count": 1   // 1 document has "error" in its body
        },
        "warnings": {
          "doc_count": 2  // 2 documents have "warning" in their body
        }
      }
    }
  }
}

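Example 2: specify a key for the "other" group (documents that match none of the filters are placed in the "other" group)

GET logs/_search
{
  "size": 0,
  "aggs": {
    "messages": {
      "filters": {
        "other_bucket_key": "other_messages", // documents that match none of the filters go into this bucket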
        "filters": {
          "errors": {
            "match": {
              "body": "error"
            }
          },
          "warnings": {
            "match": {
              "body": "warning"
            }
          }
        }
      }
    }
  }
}

Result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "messages": {
      "buckets": {
        "errors": {
          "doc_count": 1
        },
        "warnings": {
          "doc_count": 2
        },
        "other_messages": {  // documents that match none of the filters
          "doc_count": 0
        }
      }
    }
  }
}
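
The bucket counts above, including the "other" bucket, can be reproduced with a small Python sketch over the three log messages indexed earlier. The tokenizer here (lowercase, split on non-alphanumeric characters) is a simplifying assumption standing in for the analyzer behind the match query, and the function names are hypothetical.

```python
import re

# The three message bodies indexed in the bulk request above.
logs = [
    "warning: page could not be rendered",
    "authentication error",
    "warning: connection timed out",
]

def tokens(text):
    """Rough stand-in for text analysis: lowercase alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def filters_agg(bodies, filters, other_bucket_key=None):
    """Count documents per named filter; unmatched documents go to the other bucket."""
    buckets = {name: 0 for name in filters}
    other = 0
    for body in bodies:
        terms = tokens(body)
        matched = False
        for name, word in filters.items():
            if word in terms:
                buckets[name] += 1  # a document may fall into several buckets
                matched = True
        if not matched:
            other += 1
    if other_bucket_key is not None:
        buckets[other_bucket_key] = other
    return buckets

result = filters_agg(logs, {"errors": "error", "warnings": "warning"},
                     other_bucket_key="other_messages")
print(result)  # {'errors': 1, 'warnings': 2, 'other_messages': 0}
```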

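4. Range Aggregation: group by ranges

Example 1:

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_range": {
      "range": {
        "field": "age",  // the field to group on
        "ranges": [
          {              // age below 25 (no lower bound; "to" is exclusive)
            "to": 25
          },
          {
            "from": 25,  // age from 25 (inclusive) up to 35 (exclusive)
            "to": 35
          },
          {
            "from": 35   // age 35 or above ("from" is inclusive)
          }
        ]
      },
      "aggs": {  // sub-aggregations computed for each range
        "bmax": {  // the result is returned under the name "bmax"
          "max": {  // compute the maximum
            "field": "balance"   // of the balance field
          }
        }
      }
    }
  }
}

Result: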

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_range": {
      "buckets": [
        {
          "key": "*-25.0",
          "to": 25,
          "doc_count": 225,
          "bmax": {    // the maximum balance among accounts with age below 25 is 49587
            "value": 49587
          }
        },
        {
          "key": "25.0-35.0",
          "from": 25,
          "to": 35,
          "doc_count": 485,
          "bmax": {  
            "value": 49795
          }
        },
        {
          "key": "35.0-*",
          "from": 35,
          "doc_count": 290,
          "bmax": {
            "value": 49989
          }
        }
      ]
    }
  }
}
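
The doc_count values above follow from treating from as inclusive and to as exclusive. The Python sketch below checks this against the per-age counts copied from the terms aggregation results earlier in this article; the function name range_buckets is hypothetical.

```python
# Documents per age, copied from the terms aggregation results in this article.
AGE_COUNTS = {20: 44, 21: 46, 22: 51, 23: 42, 24: 42, 25: 42, 26: 59,
              27: 39, 28: 51, 29: 35, 30: 47, 31: 61, 32: 52, 33: 50,
              34: 49, 35: 52, 36: 52, 37: 42, 38: 39, 39: 60, 40: 45}

def range_buckets(counts, ranges):
    """Count documents per range; `from` is inclusive and `to` is exclusive."""
    result = []
    for spec in ranges:
        low, high = spec.get("from"), spec.get("to")
        doc_count = sum(
            n for value, n in counts.items()
            if (low is None or value >= low) and (high is None or value < high)
        )
        result.append(doc_count)
    return result

ranges = [{"to": 25}, {"from": 25, "to": 35}, {"from": 35}]
print(range_buckets(AGE_COUNTS, ranges))  # [225, 485, 290]
```

Because the boundaries do not overlap, the three counts add up to the 1000 documents in the index.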

Example 2: specify a key for each group

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {
    "age_range": {
      "range": {
        "field": "age",  // the field to group on
        "keyed": true,   // return the buckets as an object keyed by bucket key instead of an array
        "ranges": [
          {
            "to": 25,  // the bucket for age below 25 gets the key Ld; the result below shows how the key is used
            "key": "Ld"
          },
          {
            "from": 25,
            "to": 35,
            "key": "Md"
          },
          {
            "from": 35,
            "key": "Od"
          }
        ]
      }
    }
  }
}

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "age_range": {
      "buckets": {
        "Ld": {  // Ld is the custom key defined in the request
          "to": 25,
          "doc_count": 225
        },
        "Md": {  // Md is the custom key defined in the request
          "from": 25,
          "to": 35,
          "doc_count": 485
        },
        "Od": { // Od is the custom key defined in the request
          "from": 35,
          "doc_count": 290
        }
      }
    }
  }
}

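5. Date Range Aggregation: group by date ranges

Example 1:

POST /bank/_search?size=0
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "date",  // the field to group on; this assumes the documents have a date field named date
        "format": "MM-yyy",
        "ranges": [
          {               // dates before the start of the month 10 months ago
            "to": "now-10M/M"
          },
          {               // dates on or after the start of the month 10 months ago
            "from": "now-10M/M"
          }
        ]
      }
    }
  }
}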
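
The date math expression now-10M/M means: take the current time, subtract 10 months, then round down to the start of that month. A minimal Python sketch of that arithmetic is shown below; the function name is hypothetical, the reference time is passed in explicitly so the result is reproducible, and time zones are ignored for simplicity.

```python
from datetime import datetime

def months_ago_start_of_month(now, months):
    """Subtract `months` calendar months from `now`, then round down to the month start."""
    index = now.year * 12 + (now.month - 1) - months  # months counted from year 0
    year, month0 = divmod(index, 12)
    return datetime(year, month0 + 1, 1)

now = datetime(2024, 3, 15, 10, 30)
boundary = months_ago_start_of_month(now, 10)
print(boundary)  # 2023-05-01 00:00:00

# The two ranges of the request split documents at this boundary:
# "to" collects dates before it, "from" collects dates at or after it.
```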

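6. Date Histogram Aggregation: histogram (bar chart) over time

Aggregates by day, month, year, and so on. You can aggregate by year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), second (1s), or by a custom time interval.

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
  "aggs": {  // defines the aggregations
    "sales_over_time": {  // the result is returned under the name "sales_over_time"
      "date_histogram": {  // histogram over time
        "field": "date",   // the date field to bucket on
        "interval": "month"  // one bucket per month (on Elasticsearch 7.2 and later, use "calendar_interval": "month" instead)
      }
    }
  }
}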
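
Conceptually, a monthly date histogram rounds each date down to its month and counts the documents per month. The Python sketch below shows this with invented dates; the function name and the "YYYY-MM" key format are illustrative choices, and unlike a real histogram this sketch emits no buckets for months without documents.

```python
from collections import Counter
from datetime import date

def monthly_histogram(dates):
    """Count dates per calendar month, returned in chronological order."""
    counts = Counter((d.year, d.month) for d in dates)
    return [(f"{year:04d}-{month:02d}", n) for (year, month), n in sorted(counts.items())]

# Hypothetical sale dates.
sales = [date(2015, 1, 1), date(2015, 1, 10), date(2015, 2, 1),
         date(2015, 3, 1), date(2015, 3, 15), date(2015, 3, 20)]

print(monthly_histogram(sales))
# [('2015-01', 2), ('2015-02', 1), ('2015-03', 3)]
```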

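7. Missing Aggregation: a bucket for documents without a value

(In other words, it collects the documents that have no value for the age field into one bucket.)

// the index is bank; _search is the search endpoint
POST /bank/_search?size=0
{
    "aggs" : {  // defines the aggregations
        "account_without_a_age" : {  // the result is returned under the name "account_without_a_age"
            "missing" : { "field" : "age" } // the field whose absence is checked
        }
    }
}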
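
In plain Python, the same idea is a count of the documents where the field is absent or null. The sample documents and the function name below are hypothetical; the sketch only mirrors the semantics described above.

```python
def missing_bucket(docs, field):
    """Build a single bucket counting the documents with no value for the field."""
    return {"doc_count": sum(1 for doc in docs if doc.get(field) is None)}

# Hypothetical accounts: two of them have no usable age value.
accounts = [{"age": 24}, {"firstname": "Parks"}, {"age": None}, {"age": 31}]

print(missing_bucket(accounts, "age"))  # {'doc_count': 2}
```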

8. Geo Distance Aggregation: group by distance from a geo point

(Not covered here; please look it up on your own.)

This article is reposted from: https://www.cnblogs.com/leeSmall/p/9215909.html
