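Definition of Elasticsearch Aggregations
Aggregations provide summarized data computed over the documents matched by a search query. They are built from simple building blocks, themselves called aggregations, which can be combined to produce complex summaries of the data.
The basic syntax is as follows: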
"aggregations" : {
"<aggregation_name>" : {
"<aggregation_type>" : {
<aggregation_body>
}
[,"meta" : { [<meta_data_body>] } ]?
[,"aggregations" : { [<sub_aggregation>]+ } ]?
}
[,"<aggregation_name_2>" : { ... } ]*
}
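Categories of Elasticsearch Aggregations
Elasticsearch divides aggregations into four main categories:
- Bucket: bucketing aggregations, similar to GROUP BY in SQL
- Metric: metric aggregations, such as computing the maximum, minimum, or average
- Pipeline: pipeline aggregations, which run a further analysis on the results of other aggregations
- Matrix: matrix aggregations
Prepare the data first: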
POST /cars/transactions/_bulk
{ "index": {}}
{ "price" : 10000, "color" : "red", "make" : "honda", "sold" : "2014-10-28" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 30000, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 20000, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 80000, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }
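Metric Aggregations
Metric aggregations fall into two groups, single-value and multi-value:
- Single-value: returns a single result
  - min, max, avg, sum
  - cardinality
- Multi-value: returns multiple results
  - stats, extended stats
  - percentile, percentile rank
  - top hits
min, max, avg, sum
Example: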
get /cars/transactions/_search
{
"size": 0, // do not return the document list
"aggs":{
"price_max":{
"max": {
"field": "price"
}
},
"price_min":{
"min": {
"field": "price"
}
},
"avg_price":{
"avg":{
"field":"price"
}
},
"sum_price":{
"sum":{
"field":"price"
}
}
}
}
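As a rough local cross-check of what the four single-value metrics compute, the same numbers can be derived in plain Python from the prices in the bulk request. This is only a sketch that mimics the aggregation names used above; the variable names are illustrative and none of it is Elasticsearch code.

```python
# Prices copied from the eight documents in the bulk request above.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]

# Each single-value metric reduces the whole field to one number.
result = {
    "price_max": max(prices),
    "price_min": min(prices),
    "avg_price": sum(prices) / len(prices),
    "sum_price": sum(prices),
}

print(result)
```

For this data set the sketch gives a maximum of 80000, a minimum of 10000, an average of 26500.0, and a sum of 212000.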
cardinality
cardinality: the cardinality of a set, that is, the number of distinct values, similar to COUNT(DISTINCT ...) in SQL.
Example:
get /cars/transactions/_search
{
"size": 0, // do not return the document list
"aggs":{
"count_of_make":{
"cardinality": {
"field": "make.keyword"
}
}
}
}
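Conceptually, the cardinality of the make field is just the size of the set of distinct values. The sketch below computes it exactly in plain Python for the sample data. Note that Elasticsearch computes cardinality approximately (it is based on the HyperLogLog++ algorithm), so on large data sets its answer can differ slightly from an exact count like this one.

```python
# The "make" values of the eight sample documents, in bulk-request order.
makes = ["honda", "honda", "ford", "toyota", "toyota", "honda", "bmw", "ford"]

# An exact distinct count: put the values in a set and measure its size.
count_of_make = len(set(makes))

print(count_of_make)  # 4 distinct makes: honda, ford, toyota, bmw
```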
stats,extended stats
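stats: returns a set of statistics for a numeric field: min, max, avg, sum, and count
extended stats: an extension of stats that adds further statistics, such as variance and standard deviation
Example: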
get /cars/transactions/_search
{
"size": 0,
"aggs":{
"stats_price":{
"stats": {
"field": "price"
}
}
}
}
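The request above only shows stats. To illustrate what the extended variant adds, here is a small Python sketch that computes the stats fields plus variance and standard deviation for the sample prices. It assumes the population variance (dividing by n); the dictionary keys are illustrative and are not guaranteed to match the exact response field names.

```python
import math
import statistics

# Prices copied from the eight documents in the bulk request above.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]

# Population variance: mean squared distance from the average (divide by n).
variance = statistics.pvariance(prices)

extended = {
    "count": len(prices),
    "min": min(prices),
    "max": max(prices),
    "avg": sum(prices) / len(prices),
    "sum": sum(prices),
    "variance": variance,
    "std_deviation": math.sqrt(variance),
}

print(extended)
```

The single 80000 outlier dominates the spread: the variance is 447000000, which gives a standard deviation of roughly 21142.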
Percentile,Percentile Rank
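Percentile: percentile statistics, that is, the value at or below which a given percentage of the observations fall.
Percentile Rank: the inverse of a percentile, that is, the percentage of observations at or below a given value.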
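The two definitions are easier to tell apart with numbers. The sketch below uses the nearest-rank definition of a percentile, which is only one of several common definitions, and an exact percentile rank. In Elasticsearch the corresponding aggregations are percentiles and percentile_ranks, and their results are approximate, so the values they return for the same data may differ from this sketch. The helper names are hypothetical.

```python
import math

# Prices copied from the eight documents in the bulk request above.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def percentile_rank(values, v):
    """Percentage of values that are less than or equal to v."""
    return 100 * sum(1 for x in values if x <= v) / len(values)


print(percentile(prices, 50))          # 20000
print(percentile(prices, 95))          # 80000
print(percentile_rank(prices, 20000))  # 62.5
```

Read together: half of the cars sold for 20000 or less, and asking the reverse question about 20000 shows that 62.5% of the prices are at or below it (the tie at 20000 accounts for the difference).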
Top Hits
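Top Hits: typically used after bucketing to fetch the top matching documents within each bucket, that is, the detail records.
For example, group by car make and take the two highest-priced transactions in each group (the aggregation in this request is named group_by_color, although it actually buckets on make.keyword):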
get /cars/transactions/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "make.keyword"
},
"aggs": {
"top_data": {
"top_hits": {
"size": 2,
"_source": [
"price",
"color",
"make"
],
"sort": [
{
"price": {
"order": "desc"
}
}
]
}
}
}
}
}
}
Result:
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_color" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "honda",
"doc_count" : 3,
"top_data" : {
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "cars",
"_type" : "transactions",
"_id" : "js_K120B6sb1aJIMtJKa",
"_score" : null,
"_source" : {
"color" : "red",
"price" : 20000,
"make" : "honda"
},
"sort" : [
20000
]
},
{
"_index" : "cars",
"_type" : "transactions",
"_id" : "ks_K120B6sb1aJIMtJKa",
"_score" : null,
"_source" : {
"color" : "red",
"price" : 20000,
"make" : "honda"
},
"sort" : [
20000
]
}
]
}
}
},
{
"key" : "ford",
"doc_count" : 2,
"top_data" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "cars",
"_type" : "transactions",
"_id" : "j8_K120B6sb1aJIMtJKa",
"_score" : null,
"_source" : {
"color" : "green",
"price" : 30000,
"make" : "ford"
},
"sort" : [
30000
]
},
{
"_index" : "cars",
"_type" : "transactions",
"_id" : "lM_K120B6sb1aJIMtJKa",
"_score" : null,
"_source" : {
"color" : "blue",
"price" : 25000,
"make" : "ford"
},
"sort" : [
25000
]
}
]
}
}
},
{
"key" : "toyota",
"doc_count" : 2,
"top_data" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "cars",
"_type" : "transactions",
"_id" : "kM_K120B6sb1aJIMtJKa",
"_score" : null,
"_source" : {
"color" : "blue",
"price" : 15000,
"make" : "toyota"
},
"sort" : [
15000
]
},
{
"_index" : "cars",
"_type" : "transactions",
"_id" : "kc_K120B6sb1aJIMtJKa",
"_score" : null,
"_source" : {
"color" : "green",
"price" : 12000,
"make" : "toyota"
},
"sort" : [
12000
]
}
]
}
}
},
{
"key" : "bmw",
"doc_count" : 1,
"top_data" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "cars",
"_type" : "transactions",
"_id" : "k8_K120B6sb1aJIMtJKa",
"_score" : null,
"_source" : {
"color" : "red",
"price" : 80000,
"make" : "bmw"
},
"sort" : [
80000
]
}
]
}
}
}
]
}
}
}
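The shape of this result, a bucket per make with its top two documents by price, can be reproduced locally with a few lines of Python. This is a sketch of the idea only; the tuple layout and variable names are illustrative.

```python
from collections import defaultdict

# (price, color, make) for the eight sample documents.
rows = [
    (10000, "red", "honda"), (20000, "red", "honda"),
    (30000, "green", "ford"), (15000, "blue", "toyota"),
    (12000, "green", "toyota"), (20000, "red", "honda"),
    (80000, "red", "bmw"), (25000, "blue", "ford"),
]

# Bucket step: group the documents by make.
by_make = defaultdict(list)
for price, color, make in rows:
    by_make[make].append({"price": price, "color": color, "make": make})

# Top-hits step: inside each bucket, sort by price descending and keep two.
top_data = {
    make: sorted(docs, key=lambda d: d["price"], reverse=True)[:2]
    for make, docs in by_make.items()
}

for make, docs in top_data.items():
    print(make, [d["price"] for d in docs])
```

The prices per make match the response above: honda 20000 and 20000, ford 30000 and 25000, toyota 15000 and 12000, and a single 80000 hit for bmw.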
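Bucket Aggregations
Bucket aggregations build logical groups of documents from the search results: documents that satisfy a given rule are placed in the same bucket, and each bucket is associated with a key.
This is analogous to the GROUP BY operation in MySQL.
The simplest bucketing strategy is terms, which buckets directly by term. If the field is of type text, the buckets are formed from the analyzed tokens (which requires fielddata to be enabled on that field).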
get /cars/transactions/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color.keyword"
}
},
"group_by_make": {
"terms": {
"field": "make.keyword"
}
}
}
}
Note: without the .keyword suffix, the request fails with an error:
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [color] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."
}
],
...
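Since Elasticsearch 5.x, dynamically mapped string fields are indexed as text with a built-in keyword sub-field, which can be used for exact-match queries and aggregations.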
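What the two terms aggregations above return is essentially a count of documents per distinct value. A minimal Python sketch of that counting on the sample data, using illustrative variable names, looks like this:

```python
from collections import Counter

# "color" and "make" values of the eight sample documents.
colors = ["red", "red", "green", "blue", "green", "red", "red", "blue"]
makes = ["honda", "honda", "ford", "toyota", "toyota", "honda", "bmw", "ford"]

# One bucket per distinct term, largest bucket first.
group_by_color = Counter(colors).most_common()
group_by_make = Counter(makes).most_common()

print(group_by_color)
print(group_by_make)
```

Red is the largest color bucket with four documents and honda the largest make bucket with three. The order of buckets with equal counts in this sketch is not meant to match the order Elasticsearch returns.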
Range,Date Range
Range: defines the buckets by specifying numeric ranges
Date Range: defines the buckets by specifying date ranges
Example:
get /cars/transactions/_search
{
"size": 0,
"aggs": {
"range_price": {
"range": {
"field": "price",
"ranges": [
{
"to": 20000
},
{
"from": 20000,
"to": 30000
},
{
"from":50000
}
]
}
}
}
}
Result:
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"range_price" : {
"buckets" : [
{
"key" : "*-20000.0",
"to" : 20000.0,
"doc_count" : 3
},
{
"key" : "20000.0-30000.0",
"from" : 20000.0,
"to" : 30000.0,
"doc_count" : 3
},
{
"key" : "50000.0-*",
"from" : 50000.0,
"doc_count" : 1
}
]
}
}
}
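The counts above only add up to seven of the eight documents. That follows from how each range is evaluated: from is inclusive and to is exclusive, so the 30000 car falls outside 20000-30000 and no range covers 30000 to 50000. The Python sketch below reproduces the counts under that rule; the helper name and the key formatting are illustrative.

```python
# Prices copied from the eight documents in the bulk request above.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]

# (from, to) pairs from the request; None means the bound is open.
ranges = [(None, 20000), (20000, 30000), (50000, None)]


def in_range(value, lower, upper):
    """Range membership: lower bound inclusive, upper bound exclusive."""
    return (lower is None or value >= lower) and (upper is None or value < upper)


buckets = {}
for lower, upper in ranges:
    key = f"{'*' if lower is None else lower}-{'*' if upper is None else upper}"
    buckets[key] = sum(in_range(p, lower, upper) for p in prices)

print(buckets)  # {'*-20000': 3, '20000-30000': 3, '50000-*': 1}
```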
Histogram, Date Histogram
Histogram: splits the data into buckets at a fixed interval
Date Histogram: a histogram over dates, commonly used in time-series analysis
Example:
get /cars/transactions/_search
{
"size": 0,
"aggs": {
"hist_price": {
"histogram": {
"field": "price",
"interval": 20000,
"extended_bounds":
{
"min": 10000,
"max": 80000
}
}
}
}
}
Result:
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"hist_price" : {
"buckets" : [
{
"key" : 0.0,
"doc_count" : 3
},
{
"key" : 20000.0,
"doc_count" : 4
},
{
"key" : 40000.0,
"doc_count" : 0
},
{
"key" : 60000.0,
"doc_count" : 0
},
{
"key" : 80000.0,
"doc_count" : 1
}
]
}
}
}
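The bucket keys in this result come from rounding each value down to a multiple of the interval, that is, key = floor(value / interval) * interval when no offset is set. The sketch below applies that rule to the sample prices and pre-creates the buckets spanned by the extended bounds so that empty buckets are listed too. It is a simplified local model with illustrative names, not Elasticsearch's implementation.

```python
import math

# Prices copied from the eight documents in the bulk request above.
prices = [10000, 20000, 30000, 15000, 12000, 20000, 80000, 25000]

interval = 20000
bounds_min, bounds_max = 10000, 80000  # the extended_bounds from the request


def bucket_key(value):
    """Round a value down to the start of its fixed-width bucket."""
    return math.floor(value / interval) * interval


# Create every bucket between the bounds up front, so empty ones appear too.
counts = {}
key = bucket_key(bounds_min)
while key <= bucket_key(bounds_max):
    counts[key] = 0
    key += interval

# Drop each price into its bucket.
for price in prices:
    k = bucket_key(price)
    counts[k] = counts.get(k, 0) + 1

print(counts)  # {0: 3, 20000: 4, 40000: 0, 60000: 0, 80000: 1}
```

Unlike the range example, 30000 is counted here: it rounds down to the 20000 bucket, which covers 20000 up to but not including 40000.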
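Bucket + Metric Aggregations
A bucket aggregation can contain sub-aggregations that analyze each bucket further. A sub-aggregation can be either a bucket or a metric aggregation, which is what makes Elasticsearch aggregations so powerful.
(1) Bucketing within buckets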
get /cars/transactions/_search
{
"size": 0,
"aggs": {
"group_by_make": {
"terms": {
"field": "make.keyword"
},
"aggs": {
"range_price": {
"range": {
"field": "price",
"ranges": [
{
"to": 20000
},
{
"from": 20000,
"to": 30000
},
{
"from": 50000
}
]
}
}
}
}
}
}
Result:
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_make" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "honda",
"doc_count" : 3,
"range_price" : {
"buckets" : [
{
"key" : "*-20000.0",
"to" : 20000.0,
"doc_count" : 1
},
{
"key" : "20000.0-30000.0",
"from" : 20000.0,
"to" : 30000.0,
"doc_count" : 2
},
{
"key" : "50000.0-*",
"from" : 50000.0,
"doc_count" : 0
}
]
}
},
{
"key" : "ford",
"doc_count" : 2,
"range_price" : {
"buckets" : [
{
"key" : "*-20000.0",
"to" : 20000.0,
"doc_count" : 0
},
{
"key" : "20000.0-30000.0",
"from" : 20000.0,
"to" : 30000.0,
"doc_count" : 1
},
{
"key" : "50000.0-*",
"from" : 50000.0,
"doc_count" : 0
}
]
}
},
{
"key" : "toyota",
"doc_count" : 2,
"range_price" : {
"buckets" : [
{
"key" : "*-20000.0",
"to" : 20000.0,
"doc_count" : 2
},
{
"key" : "20000.0-30000.0",
"from" : 20000.0,
"to" : 30000.0,
"doc_count" : 0
},
{
"key" : "50000.0-*",
"from" : 50000.0,
"doc_count" : 0
}
]
}
},
{
"key" : "bmw",
"doc_count" : 1,
"range_price" : {
"buckets" : [
{
"key" : "*-20000.0",
"to" : 20000.0,
"doc_count" : 0
},
{
"key" : "20000.0-30000.0",
"from" : 20000.0,
"to" : 30000.0,
"doc_count" : 0
},
{
"key" : "50000.0-*",
"from" : 50000.0,
"doc_count" : 1
}
]
}
}
]
}
}
}
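Nesting one bucket aggregation inside another amounts to a two-level grouping: first by make, then by price range within each make. A compact Python sketch of that nesting on the sample data follows, assuming the same inclusive from and exclusive to rule for ranges; the names are illustrative.

```python
from collections import defaultdict

# (price, make) for the eight sample documents.
rows = [
    (10000, "honda"), (20000, "honda"), (30000, "ford"), (15000, "toyota"),
    (12000, "toyota"), (20000, "honda"), (80000, "bmw"), (25000, "ford"),
]

# (key, from, to) for each range; None means the bound is open.
ranges = [("*-20000", None, 20000), ("20000-30000", 20000, 30000), ("50000-*", 50000, None)]

# Outer level: one bucket per make.
prices_by_make = defaultdict(list)
for price, make in rows:
    prices_by_make[make].append(price)

# Inner level: count each make's prices per range (from inclusive, to exclusive).
group_by_make = {
    make: {
        key: sum(
            (lo is None or p >= lo) and (hi is None or p < hi) for p in prices
        )
        for key, lo, hi in ranges
    }
    for make, prices in prices_by_make.items()
}

for make, range_price in group_by_make.items():
    print(make, range_price)
```

Every make gets all three range buckets, including the empty ones, which mirrors the zero counts in the response above.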
(2) Computing metrics within buckets
get /cars/transactions/_search
{
"size": 0,
"aggs": {
"group_by_make": {
"terms": {
"field": "make.keyword"
},
"aggs": {
"stats_price":{
"stats": {
"field": "price"
}
}
}
}
}
}
Result:
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_make" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "honda",
"doc_count" : 3,
"stats_price" : {
"count" : 3,
"min" : 10000.0,
"max" : 20000.0,
"avg" : 16666.666666666668,
"sum" : 50000.0
}
},
{
"key" : "ford",
"doc_count" : 2,
"stats_price" : {
"count" : 2,
"min" : 25000.0,
"max" : 30000.0,
"avg" : 27500.0,
"sum" : 55000.0
}
},
{
"key" : "toyota",
"doc_count" : 2,
"stats_price" : {
"count" : 2,
"min" : 12000.0,
"max" : 15000.0,
"avg" : 13500.0,
"sum" : 27000.0
}
},
{
"key" : "bmw",
"doc_count" : 1,
"stats_price" : {
"count" : 1,
"min" : 80000.0,
"max" : 80000.0,
"avg" : 80000.0,
"sum" : 80000.0
}
}
]
}
}
}
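The per-make statistics in this response can be checked by hand. The following Python sketch groups the sample prices by make and computes the same five figures for each group; it is a local illustration with hypothetical variable names.

```python
from collections import defaultdict

# (price, make) for the eight sample documents.
rows = [
    (10000, "honda"), (20000, "honda"), (30000, "ford"), (15000, "toyota"),
    (12000, "toyota"), (20000, "honda"), (80000, "bmw"), (25000, "ford"),
]

# Bucket step: collect the prices of each make.
prices_by_make = defaultdict(list)
for price, make in rows:
    prices_by_make[make].append(price)

# Metric step: compute the stats fields inside every bucket.
stats_price = {
    make: {
        "count": len(prices),
        "min": min(prices),
        "max": max(prices),
        "avg": sum(prices) / len(prices),
        "sum": sum(prices),
    }
    for make, prices in prices_by_make.items()
}

for make, stats in stats_price.items():
    print(make, stats)
```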
Sorting in Aggregations
Group by make, then sort the buckets by average price in descending order:
get /cars/transactions/_search
{
"size": 0,
"aggs": {
"group_by_make": {
"terms": {
"field": "make.keyword",
"order": {
"avg_price": "desc"
}
},
"aggs": {
"avg_price":{
"avg": {
"field": "price"
}
}
}
}
}
}
Result:
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
"took" : 26,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_make" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "bmw",
"doc_count" : 1,
"avg_price" : {
"value" : 80000.0
}
},
{
"key" : "ford",
"doc_count" : 2,
"avg_price" : {
"value" : 27500.0
}
},
{
"key" : "honda",
"doc_count" : 3,
"avg_price" : {
"value" : 16666.666666666668
}
},
{
"key" : "toyota",
"doc_count" : 2,
"avg_price" : {
"value" : 13500.0
}
}
]
}
}
}
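Ordering buckets by a sub-aggregation means computing the metric for every bucket first and then sorting the buckets on that value. A short Python sketch of the same two steps on the sample data, with illustrative names, could look like this:

```python
from collections import defaultdict

# (price, make) for the eight sample documents.
rows = [
    (10000, "honda"), (20000, "honda"), (30000, "ford"), (15000, "toyota"),
    (12000, "toyota"), (20000, "honda"), (80000, "bmw"), (25000, "ford"),
]

# Bucket step: collect the prices of each make.
prices_by_make = defaultdict(list)
for price, make in rows:
    prices_by_make[make].append(price)

# Metric step: the average price of each bucket.
avg_price = {make: sum(p) / len(p) for make, p in prices_by_make.items()}

# Order step: sort the buckets by that average, highest first.
ordered = sorted(avg_price.items(), key=lambda item: item[1], reverse=True)

for make, avg in ordered:
    print(make, avg)
```

The resulting order is bmw, ford, honda, toyota, the same as in the response above. Note that honda has the most documents but only the third-highest average, so this order differs from the default ordering by document count.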