Common Metric and Bucket Aggregation Search RESTful APIs in Elasticsearch 7.x

Article link: https://blog.csdn.net/abc123lzf/article/details/103325034

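Elasticsearch aggregations support complex analysis and statistics over your data. They fall into four families: metric aggregations, bucket aggregations, pipeline aggregations, and matrix aggregations. Metric and bucket aggregations are the most commonly used.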

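The examples in this article use the official sample dataset shakespeare, which can be downloaded from the Elasticsearch website. All content follows the official documentation.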

1 Metric Aggregations

1.1 Max Aggregation

Max Aggregation returns the maximum value of a field. For example, to find the largest line_id in the shakespeare index:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "max_line_id": {
      "max": {
        "field": "line_id"
      }
    }
  }
}

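max_line_id is the name of the result and can be any string. The key beneath it selects the aggregation type, here max for Max Aggregation, and field names the document field to aggregate on. Setting size to 0 suppresses the search hits so that only the aggregation result is returned.
This is similar to select max(line_id) from shakespeare in MySQL.
Result: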

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_line_id" : {
      "value" : 111396.0
    }
  }
}

The result is under aggregations: the maximum value is 111396.
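The single-field metric aggregations in this section all share the same request shape, so the body can be generated programmatically. The following is a minimal Python sketch using only the standard library. The helper names build_metric_agg and read_metric_value are hypothetical and are not part of any Elasticsearch client; the sample response is trimmed from the result shown above.

```python
import json


def build_metric_agg(name, agg_type, field):
    """Build the body of a single-field metric aggregation request.

    name is the result name; agg_type is e.g. "max", "min", "avg" or "sum".
    (Hypothetical helper, for illustration only.)
    """
    return {"size": 0, "aggs": {name: {agg_type: {"field": field}}}}


def read_metric_value(response, name):
    """Return the single value of a metric aggregation from a search response."""
    return response["aggregations"][name]["value"]


body = build_metric_agg("max_line_id", "max", "line_id")
print(json.dumps(body, indent=2))

# Response trimmed to the part that matters, copied from the result above.
sample_response = {"aggregations": {"max_line_id": {"value": 111396.0}}}
print(read_metric_value(sample_response, "max_line_id"))
```

Swapping "max" for "min", "avg" or "sum" yields the request bodies used in the following sections.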

1.2 Min Aggregation

The opposite of Max Aggregation, Min Aggregation returns the minimum value of a field. For example, to find the smallest line_id in the shakespeare index:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "min_line_id": {
      "min": {
        "field": "line_id"
      }
    }
  }
}

The result is likewise returned under aggregations.

1.3 Avg Aggregation

Avg Aggregation computes the arithmetic mean. For example, to compute the average of the line_id field in the shakespeare index:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "avg_line_id": {
      "avg": {
        "field": "line_id"
      }
    }
  }
}

The result is likewise returned under aggregations.

1.4 Sum Aggregation

Sum Aggregation computes a total. For example, to compute the sum of the line_id field in the shakespeare index:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "sum_line_id": {
      "sum": {
        "field": "line_id"
      }
    }
  }
}

1.5 Cardinality Aggregation

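Cardinality Aggregation counts distinct values: it works like a SQL distinct followed by a count of the resulting set. The count is approximate (Elasticsearch uses the HyperLogLog++ algorithm), though it is expected to be close to exact for small cardinalities such as this one. For example, the following query counts the distinct plays, that is, the distinct values of play_name (the result name player_sum is only a label):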

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "player_sum": {
      "cardinality": {
        "field": "play_name.keyword"
      }
    }
  }
}

Result:

{
  # other fields omitted
  "aggregations" : {
    "player_sum" : {
      "value" : 36
    }
  }
}

That is, the index contains 36 distinct plays.
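For intuition, the quantity that Cardinality Aggregation estimates can be computed exactly on a small in-memory list. The sketch below is plain Python, not Elasticsearch code: exact_cardinality is a hypothetical helper and the documents are made-up samples.

```python
def exact_cardinality(docs, field):
    """Count the distinct values of field across docs, skipping docs without it."""
    return len({doc[field] for doc in docs if field in doc})


# Made-up sample documents, for illustration only.
docs = [
    {"play_name": "Hamlet"},
    {"play_name": "Hamlet"},
    {"play_name": "Othello"},
    {"speaker": "IAGO"},  # no play_name field, so it is ignored
]
distinct_plays = exact_cardinality(docs, "play_name")
print(distinct_plays)
```

An exact set like this needs memory proportional to the number of distinct values, which is the cost an approximate algorithm avoids on large indices.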

1.6 Stats Aggregation

Stats Aggregation returns basic statistics in a single request: count, min, max, avg, and sum. For example, for line_id:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_stats": {
      "stats": {
        "field": "line_id"
      }
    }
  }
}

Result:

{
  # other fields omitted
  "aggregations" : {
    "line_id_stats" : {
      "count" : 110486,
      "min" : 4.0,
      "max" : 111396.0,
      "avg" : 55715.89386890647,
      "sum" : 6.15582625E9
    }
  }
}

1.7 Extended Stats Aggregation

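Extended Stats Aggregation adds four fields to those of Stats Aggregation: the sum of squares, the variance, the standard deviation, and the interval given by the mean plus or minus two standard deviations. For example: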

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_stats": {
      "extended_stats": {
        "field": "line_id"
      }
    }
  }
}

Result:

{
  # other fields omitted
  "aggregations" : {
    "line_id_stats" : {
      "count" : 110486,
      "min" : 4.0,
      "max" : 111396.0,
      "avg" : 55715.89386890647,
      "sum" : 6.15582625E9,
      "sum_of_squares" : 4.57201930511864E14,
      "variance" : 1.0338374861198297E9,
      "std_deviation" : 32153.34331169668,
      "std_deviation_bounds" : {
        "upper" : 120022.58049229984,
        "lower" : -8590.792754486894
      }
    }
  }
}
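The relationship between these fields can be reproduced locally. The sketch below is plain Python with a hypothetical helper name, extended_stats. It assumes the population variance (dividing by count), which is consistent with the figures above: 55715.89 + 2 × 32153.34 ≈ 120022.58 for the upper bound and 55715.89 − 2 × 32153.34 ≈ −8590.79 for the lower bound.

```python
import math


def extended_stats(values, sigma=2.0):
    """Compute the fields reported by an extended stats result for a list of numbers.

    Assumes population variance (divide by count), as the figures above suggest.
    """
    count = len(values)
    total = sum(values)
    avg = total / count
    variance = sum((v - avg) ** 2 for v in values) / count
    std_deviation = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


# Small made-up sample: mean 5, variance 4, standard deviation 2.
stats = extended_stats([2, 4, 4, 4, 5, 5, 7, 9])
print(stats["std_deviation_bounds"])
```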

1.8 Percentiles Aggregation

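Percentiles Aggregation computes percentiles. Conceptually, the values of a field are sorted in ascending order, and the p-th percentile is the value below which p percent of the values fall. For example: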

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_percent": {
      "percentiles": {
        "field": "line_id",
        "percents": [1, 5, 25, 50, 75, 95, 99]
      }
    }
  }
}

Result:

{
  # other fields omitted
  "aggregations" : {
    "line_id_percent" : {
      "values" : {
        "1.0" : 1115.3600000000001,
        "5.0" : 5575.834045307443,
        "25.0" : 27887.286615736997,
        "50.0" : 55711.257765161325,
        "75.0" : 83561.89545235902,
        "95.0" : 105830.47105865781,
        "99.0" : 110287.32171428572
      }
    }
  }
}
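To make the definition concrete, here is a minimal Python sketch of an exact percentile using linear interpolation between neighbouring ranks. It is an illustration only: Elasticsearch computes percentiles approximately (with the TDigest algorithm by default), so its results on real data may differ slightly from an exact calculation. The function name percentile is hypothetical.

```python
def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 100) of values.

    Sorts ascending, then interpolates linearly between the two closest ranks.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (p / 100) * (len(ordered) - 1)   # fractional index into the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


median = percentile([5, 1, 3, 2, 4], 50)
first_quartile = percentile([10, 20, 30, 40], 25)
print(median, first_quartile)
```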

1.9 Value Count Aggregation

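Value Count Aggregation counts the values extracted from a field. For a single-valued field such as line_id, this equals the number of documents that contain the field: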

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "line_id_count": {
      "value_count": {
        "field": "line_id"
      }
    }
  }
}

Result:

{
  # other fields omitted
  "aggregations" : {
    "line_id_count" : {
      "value" : 110486
    }
  }
}
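The difference between counting values and counting documents only shows up with multi-valued fields. The following plain-Python sketch illustrates it on made-up documents; value_count here is a hypothetical local function, not a client call.

```python
def value_count(docs, field):
    """Count the values of field across docs; a list contributes one per element."""
    count = 0
    for doc in docs:
        if field not in doc:
            continue  # documents without the field contribute nothing
        value = doc[field]
        count += len(value) if isinstance(value, list) else 1
    return count


# Made-up sample documents: one single value, one array of two values, one without the field.
docs = [
    {"line_id": 1},
    {"line_id": [2, 3]},
    {"speaker": "HAMLET"},
]
total_values = value_count(docs, "line_id")
print(total_values)
```

Here two documents contain line_id, but the count is 3 because the array contributes two values.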

2 Bucket Aggregations

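Bucket aggregations are similar to GROUP BY in SQL: Elasticsearch walks through the documents and places each one into a bucket according to its content.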

2.1 Terms Aggregation

Terms Aggregation groups documents by the values of a field. For example, grouping by play_name and counting the documents in each group is equivalent to select play_name, count(*) from shakespeare group by play_name:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "per_player": {
      "terms": {
        "field": "play_name.keyword",
        "size": 10
      }
    }
  }
}

field corresponds to the column after GROUP BY, and size limits the response to the 10 buckets with the most documents.
Result:

{
  # other fields omitted
  "aggregations" : {
    "per_player" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 72631,
      "buckets" : [
        {
          "key" : "Hamlet",
          "doc_count" : 4219
        },
        {
          "key" : "Coriolanus",
          "doc_count" : 3958
        },
        {
          "key" : "Cymbeline",
          "doc_count" : 3927
        },
        {
          "key" : "Richard III",
          "doc_count" : 3911
        },
        {
          "key" : "Antony and Cleopatra",
          "doc_count" : 3815
        },
        {
          "key" : "Othello",
          "doc_count" : 3742
        },
        {
          "key" : "King Lear",
          "doc_count" : 3735
        },
        {
          "key" : "Troilus and Cressida",
          "doc_count" : 3682
        },
        {
          "key" : "A Winters Tale",
          "doc_count" : 3469
        },
        {
          "key" : "Henry VIII",
          "doc_count" : 3397
        }
      ]
    }
  }
}
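The grouping and the meaning of sum_other_doc_count can be sketched in plain Python. In the result above, the ten doc_count values add up to 37855, and 37855 + 72631 = 110486, the total number of documents reported in section 1.6. The function terms_agg below is a hypothetical local helper working on made-up documents; in this sketch ties are broken by key.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Group docs by field and keep the size largest groups."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Largest groups first; ties are broken by key in this sketch.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    top = ordered[:size]
    return {
        # Documents that belong to groups cut off by size.
        "sum_other_doc_count": sum(counts.values()) - sum(c for _, c in top),
        "buckets": [{"key": key, "doc_count": c} for key, c in top],
    }


# Made-up sample documents.
docs = (
    [{"play_name": "Hamlet"}] * 3
    + [{"play_name": "Othello"}] * 2
    + [{"play_name": "King Lear"}]
)
result = terms_agg(docs, "play_name", size=2)
print(result)
```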

2.2 Filter Aggregation

Filter Aggregation defines a single bucket containing all documents that match a filter; sub-aggregations then run only on those documents. For example:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "per_player": {
      "filter": {
        "term": {
          "text_entry": "apple"
        }
      },
      "aggs": {
        "player": {
          "terms": {
            "field": "play_name.keyword",
            "size": 10
          }
        }
      }
    }
  }
}

This query selects the documents whose text_entry contains the word apple, then groups them by play_name and counts each group.
Result:

{
  # other fields omitted
  "aggregations" : {
    "per_player" : {
      "doc_count" : 10,
      "player" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "Taming of the Shrew",
            "doc_count" : 2
          },
          {
            "key" : "Twelfth Night",
            "doc_count" : 2
          },
          {
            "key" : "A Midsummer nights dream",
            "doc_count" : 1
          },
          {
            "key" : "Henry IV",
            "doc_count" : 1
          },
          {
            "key" : "King Lear",
            "doc_count" : 1
          },
          {
            "key" : "Loves Labours Lost",
            "doc_count" : 1
          },
          {
            "key" : "Merchant of Venice",
            "doc_count" : 1
          },
          {
            "key" : "The Tempest",
            "doc_count" : 1
          }
        ]
      }
    }
  }
}
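The two-stage structure (filter first, then run the sub-aggregation on what is left) can be sketched in plain Python. This is only an approximation under stated assumptions: the regular expression stands in very roughly for Elasticsearch's text analysis, the function names are hypothetical, and the documents are made up.

```python
import re
from collections import Counter


def contains_word(text, word):
    """Rough stand-in for text analysis: lowercase the text and split on non-word characters."""
    return word in re.findall(r"\w+", text.lower())


def filter_then_terms(docs, word, group_field):
    """Keep docs whose text_entry contains word, then count them per group_field."""
    matching = [d for d in docs if contains_word(d["text_entry"], word)]
    counts = Counter(d[group_field] for d in matching)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {
        "doc_count": len(matching),  # size of the single filter bucket
        "buckets": [{"key": k, "doc_count": c} for k, c in ordered],
    }


# Made-up sample documents, not taken from the dataset.
docs = [
    {"play_name": "Twelfth Night", "text_entry": "An apple, cut in two"},
    {"play_name": "Twelfth Night", "text_entry": "almost an Apple"},
    {"play_name": "King Lear", "text_entry": "like a crab to an apple"},
    {"play_name": "Hamlet", "text_entry": "no fruit in this line"},
]
result = filter_then_terms(docs, "apple", "play_name")
print(result)
```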

2.3 Filters Aggregation

Filters Aggregation, unlike Filter Aggregation, accepts several filters and returns one bucket per filter. For example:

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "per_player": {
      "filters": {
        "filters": [
          {"match": { "text_entry": "apple" } }
        ]
      }, 
      "aggs": {
        "player": {
          "terms": {
            "field": "play_name.keyword",
            "size": 10
          }
        }
      }
    }
  }
}

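The filters array can hold multiple filters; each one produces its own bucket, and the buckets are returned in the same order as the filters.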
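The request above uses a single filter, so the behaviour with several filters may be easier to see in a small simulation. The Python sketch below models each filter as a predicate and produces one bucket per filter, in order; a document that matches more than one filter is counted in each matching bucket. All names and documents are hypothetical.

```python
import re


def contains_word(text, word):
    """Rough stand-in for a match query on a single word."""
    return word in re.findall(r"\w+", text.lower())


def filters_agg(docs, filters):
    """Return one bucket per filter predicate, in the order the filters are given."""
    return [{"doc_count": sum(1 for d in docs if pred(d))} for pred in filters]


# Made-up sample documents.
docs = [
    {"text_entry": "an apple a day"},
    {"text_entry": "an orange at night"},
    {"text_entry": "apple and orange together"},
    {"text_entry": "no fruit here"},
]
buckets = filters_agg(
    docs,
    [
        lambda d: contains_word(d["text_entry"], "apple"),
        lambda d: contains_word(d["text_entry"], "orange"),
    ],
)
print(buckets)
```

The third document lands in both buckets, so the bucket counts add up to 4 even though only 3 documents match.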

2.4 Range Aggregation

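Range Aggregation buckets documents by value ranges and shows how the data is distributed. Each range includes its from value and excludes its to value. For example, to bucket line_id into 0 to 10000, 10000 to 50000, and 50000 and above: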

GET /shakespeare/_search
{
  "size": 0,
  "aggs": {
    "id_range": {
      "range": {
        "field": "line_id",
        "ranges": [
          { "from": 0, "to": 10000 },
          { "from": 10000, "to": 50000},
          { "from": 50000 }
        ]
      }
    }
  }
}

Result:

{
  # other fields omitted
  "aggregations" : {
    "id_range" : {
      "buckets" : [
        {
          "key" : "0.0-10000.0",
          "from" : 0.0,
          "to" : 10000.0,
          "doc_count" : 9909
        },
        {
          "key" : "10000.0-50000.0",
          "from" : 10000.0,
          "to" : 50000.0,
          "doc_count" : 39664
        },
        {
          "key" : "50000.0-*",
          "from" : 50000.0,
          "doc_count" : 60913
        }
      ]
    }
  }
}
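The boundary handling is the detail most worth checking: a value equal to a range's to falls into the next bucket, not the current one. The plain-Python sketch below reproduces that rule and the key format seen in the result above on made-up values; range_agg is a hypothetical local helper.

```python
def range_agg(values, ranges):
    """Count values per range; from is inclusive, to is exclusive, either may be omitted."""
    buckets = []
    for r in ranges:
        frm = r.get("from")
        to = r.get("to")
        doc_count = sum(
            1
            for v in values
            if (frm is None or v >= frm) and (to is None or v < to)
        )
        # Keys follow the "from-to" pattern of the result above, with * for an open end.
        key = f"{'*' if frm is None else float(frm)}-{'*' if to is None else float(to)}"
        buckets.append({"key": key, "doc_count": doc_count})
    return buckets


# Made-up values chosen to sit on the range boundaries.
values = [0, 9999, 10000, 49999, 50000, 111396]
buckets = range_agg(
    values,
    [{"from": 0, "to": 10000}, {"from": 10000, "to": 50000}, {"from": 50000}],
)
print(buckets)
```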