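Continuing from the previous article, this one covers bucket aggregations. The data is still the kibana_sample_data_ecommerce sample data set.
Terms bucketing
This query buckets all documents by the day of the week on which the order was placed and counts the orders in each bucket, for example how many orders were placed on Monday. By default the buckets come back sorted by doc_count in descending order.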
GET kibana_sample_data_ecommerce/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"terms_currency": {
"terms": {
"field": "day_of_week"
}
}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4675,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"terms_currency" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Thursday",
"doc_count" : 775
},
{
"key" : "Friday",
"doc_count" : 770
},
{
"key" : "Saturday",
"doc_count" : 736
},
{
"key" : "Sunday",
"doc_count" : 614
},
{
"key" : "Tuesday",
"doc_count" : 609
},
{
"key" : "Wednesday",
"doc_count" : 592
},
{
"key" : "Monday",
"doc_count" : 579
}
]
}
}
}
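To make the bucketing concrete, here is a minimal Python sketch of what a terms aggregation computes. It is only an in-memory illustration, not how Elasticsearch implements it; the function name terms_buckets and the four sample orders are made up for the example.

```python
from collections import Counter

def terms_buckets(docs, field, size=10):
    """Count documents per distinct value of `field`, largest bucket first."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Sort by doc_count descending; ties are broken by key ascending.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]

# Hypothetical sample orders, not taken from the real data set.
orders = [
    {"day_of_week": "Monday"},
    {"day_of_week": "Thursday"},
    {"day_of_week": "Thursday"},
    {"day_of_week": "Friday"},
]
buckets = terms_buckets(orders, "day_of_week")
```

Each entry in buckets has the same key and doc_count shape as the buckets array in the response above.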
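aggs terms nested inside aggs terms
The previous query gives the number of orders for each day of the week.
This query goes one level deeper: inside each day-of-week bucket it counts the orders per manufacturer. You can keep nesting further terms aggregations in the same way, but this example stops at two levels.
Think of it as small buckets inside a big bucket: the documents in each first-level bucket are split again into several buckets by a second terms aggregation.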
GET kibana_sample_data_ecommerce/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"terms_currency": {
"terms": {
"field": "day_of_week",
"order": {
"_key": "desc"
}
},
"aggs": {
"terms_manufacturer": {
"terms": {
"field": "manufacturer.keyword",
"size": 1000,
"show_term_doc_count_error": true
}
}
}
}
}
}
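The "buckets inside buckets" idea can be sketched in a few lines of Python. This is a simplified illustration: it assumes each order has a single manufacturer value, and the function name nested_terms and the sample values are only placeholders.

```python
from collections import Counter, defaultdict

def nested_terms(docs, outer_field, inner_field):
    """Group by outer_field, then count inner_field values inside each group."""
    outer = defaultdict(Counter)
    for doc in docs:
        outer[doc[outer_field]][doc[inner_field]] += 1
    # Outer keys in descending order, mirroring "order": {"_key": "desc"}.
    return {key: dict(outer[key]) for key in sorted(outer, reverse=True)}

# Hypothetical sample orders for illustration only.
sample = [
    {"day_of_week": "Monday", "manufacturer": "Elitelligence"},
    {"day_of_week": "Monday", "manufacturer": "Oceanavigations"},
    {"day_of_week": "Monday", "manufacturer": "Elitelligence"},
    {"day_of_week": "Friday", "manufacturer": "Pyramidustries"},
]
nested = nested_terms(sample, "day_of_week", "manufacturer")
```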
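order
Sort the first-level buckets in descending order by the highest order price, which is computed by the second-level max aggregation. The include and exclude parameters filter which terms get a bucket: here every term matches include, and Monday is excluded.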
GET kibana_sample_data_ecommerce/_search
{
"track_total_hits": true,
"size": 0,
"aggs": {
"terms_currency": {
"terms": {
"field": "day_of_week",
"order": {
"max_manufacturer.value": "desc"
},
"include": ".*",
"exclude": "Monday"
},
"aggs": {
"max_manufacturer": {
"max": {
"field": "taxful_total_price"
}
}
}
}
}
}
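As a rough mental model, the query above behaves like the following Python sketch: drop the excluded key, compute the maximum price per bucket, then sort the buckets by that maximum. The function name order_by_max and the sample prices are assumptions for illustration.

```python
def order_by_max(docs, bucket_field, metric_field, exclude=()):
    """Return (key, max metric) pairs sorted by the max metric, highest first."""
    maxima = {}
    for doc in docs:
        key = doc[bucket_field]
        if key in exclude:
            continue  # mirrors "exclude": "Monday"
        maxima[key] = max(maxima.get(key, float("-inf")), doc[metric_field])
    return sorted(maxima.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical sample orders for illustration only.
priced = [
    {"day_of_week": "Monday", "taxful_total_price": 500.0},
    {"day_of_week": "Tuesday", "taxful_total_price": 120.5},
    {"day_of_week": "Tuesday", "taxful_total_price": 80.0},
    {"day_of_week": "Friday", "taxful_total_price": 230.0},
]
ranking = order_by_max(priced, "day_of_week", "taxful_total_price", exclude=("Monday",))
```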
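Histogram - histogram
This histogram groups orders into fixed-width buckets by taxful_total_price. The example uses an interval of 200, so 1000-1200 is one bucket, 1200-1400 is the next, and so on.
Parameters:
query: the documents can be filtered before bucketing, for example with a range query that keeps only prices between 1000 and 2000. The example uses match_all.
interval: the bucket width; adjust as needed.
extended_bounds: fills in missing buckets. For example, if there is no data between 1200 and 1400, this setting still returns that bucket with a doc_count of 0, and it extends the buckets out to min and max even when no document falls there.
hard_bounds: only buckets between min and max are returned.
order: buckets can be sorted by _key or by _count.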
GET kibana_sample_data_ecommerce/_search
{
"query": {
"match_all": {}
},
"track_total_hits": true,
"size": 0,
"aggs": {
"histogram_taxful_total_price": {
"histogram": {
"field": "taxful_total_price",
"interval": 200,
"keyed": true,
"extended_bounds": {
"min": 1000,
"max": 3000
},
"hard_bounds": {
"min": 1000,
"max": 3000
},
"order": {
"_key": "asc"
}
}
}
}
}
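The bucket key of a value is its lower bucket edge, floor(value / interval) * interval. The Python sketch below shows that rule together with the gap filling that extended_bounds adds. It is a simplified model: offset and hard_bounds are left out, and the function name histogram and the sample prices are assumptions.

```python
import math

def histogram(values, interval, extended_bounds=None):
    """Bucket numeric values into fixed-width buckets keyed by their lower edge."""
    def index_of(value):
        return math.floor(value / interval)

    counts = {}
    for value in values:
        i = index_of(value)
        counts[i] = counts.get(i, 0) + 1

    indexes = list(counts)
    if extended_bounds is not None:
        # Force buckets to exist all the way out to min and max.
        indexes += [index_of(extended_bounds[0]), index_of(extended_bounds[1])]
    if not indexes:
        return {}

    # Fill every gap between the first and last bucket with a zero count.
    for i in range(min(indexes), max(indexes) + 1):
        counts.setdefault(i, 0)
    return {i * interval: counts[i] for i in sorted(counts)}

# Hypothetical prices for illustration only.
result = histogram([1010, 1150, 1450], 200, extended_bounds=(1000, 2000))
```

With these sample prices, the buckets from 1600 to 2000 exist only because of extended_bounds.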
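Date histogram - date_histogram
This one comes from a real scenario at work. A range query on time selects the documents, a terms aggregation buckets them by the content field, and inside each keyword bucket a date_histogram counts the documents in 7-day intervals. The bucket_sort pipeline aggregation pages through the keyword buckets with from and size. Here _index stands in for the real index name. Note that shard_size is smaller than size in this request; in that case Elasticsearch resets shard_size to be equal to size.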
GET /_index/_search
{
"size": 0,
"query": {
"range": {
"time": {
"gte": "2022-07-13 00:00:00",
"lte": "2022-07-26 23:59:59"
}
}
},
"aggs": {
"keywords": {
"terms": {
"field": "content.keyword",
"size": 10000000,
"shard_size": 100000,
"show_term_doc_count_error": true
},
"aggs": {
"time": {
"date_histogram": {
"field": "time",
"fixed_interval": "7d",
"min_doc_count": 0,
"format": "yyyy-MM-dd",
"time_zone": "+08:00",
"offset": "-1d",
"extended_bounds": {
"min": "now/d",
"max": "now/d"
}
}
},
"page_sort": {
"bucket_sort": {
"sort": [],
"from": 0,
"size": 10
}
}
}
}
}
}
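auto_date_histogram - automatic interval selection
A date histogram where the interval is chosen automatically. Instead of an interval you give a target number of buckets, and Elasticsearch picks an interval so that the result contains at most that many buckets. The number of buckets actually returned can be well below the target, because the interval is taken from a fixed list of choices.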
GET kibana_sample_data_ecommerce/_search
{
"query": {
"match_all": {}
},
"track_total_hits": true,
"size": 0,
"aggs": {
"date_histogram_taxful_total_price": {
"auto_date_histogram": {
"field": "order_date",
"buckets": 20
}
}
}
}
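To get a feel for how the interval is chosen, here is a deliberately simplified Python sketch. It assumes the candidate intervals are the fixed-length ones listed in the Elasticsearch documentation (1, 5, 10, 30 seconds; 1, 5, 10, 30 minutes; 1, 3, 12 hours; 1 and 7 days) and leaves out the month and year candidates as well as time zones. The real algorithm works differently internally; pick_interval is a made-up name.

```python
# Candidate interval lengths in seconds, smallest first (months and years omitted).
CANDIDATES = [1, 5, 10, 30, 60, 300, 600, 1800,
              3600, 3 * 3600, 12 * 3600, 86400, 7 * 86400]

def pick_interval(min_ts, max_ts, target_buckets):
    """Return (interval, bucket count) for the smallest interval that fits the target."""
    for interval in CANDIDATES:
        # Buckets needed to cover [min_ts, max_ts] when keys are multiples of interval.
        needed = max_ts // interval - min_ts // interval + 1
        if needed <= target_buckets:
            return interval, needed
    raise ValueError("range too wide for the fixed-length candidates in this sketch")

# Thirty days of data, expressed in epoch seconds starting at 0 for simplicity.
interval, bucket_count = pick_interval(0, 30 * 86400 - 1, 20)
```

For thirty days of data and a target of 20, daily buckets would need 30 buckets, so the sketch falls through to 7-day buckets and returns only 5. That jump between candidates is the reason the result often has far fewer buckets than requested.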
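range - range statistics
This aggregation counts documents per user-defined range given by from and to, and each range can be given a custom key. The example defines two ranges on the order date.
from: the start value (inclusive); can be a date or a number.
to: the end value (exclusive); same as above.
key: a custom name for the bucket.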
GET kibana_sample_data_ecommerce/_search
{
"query": {
"match_all": {}
},
"track_total_hits": true,
"size": 0,
"aggs": {
"range_order_date": {
"range": {
"field": "order_date",
"ranges": [
{
"key": "first",
"from": "2022-08-22T09:28:48+00:00",
"to": "2022-08-23T09:28:48+00:00"
},
{
"key": "second",
"from": "2022-08-23T09:28:48+00:00",
"to": "2022-08-24T09:28:48+00:00"
}
]
}
}
}
}
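The same range statistics driven by a script. In this case the bounds have to be converted to epoch-millisecond timestamps; other date formats are not supported.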
GET kibana_sample_data_ecommerce/_search
{
"query": {
"match_all": {}
},
"track_total_hits": true,
"size": 0,
"aggs": {
"range_order_date": {
"range": {
"script": {
"source": """
doc["order_date"].value;
""",
"lang": "painless"
},
"ranges": [
{
"key": "first",
"from": "1661131728000",
"to": "1661218128000"
},
{
"key": "second",
"from": "1661218128000",
"to": "1661304528000"
}
]
}
}
}
}
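The timestamps can be produced with a small helper like the one below. It is a sketch using only the Python standard library; the name to_epoch_millis is my own. The values used in the query above correspond to 09:28:48 at UTC+08:00.

```python
from datetime import datetime

def to_epoch_millis(iso_text):
    """Convert an ISO 8601 date-time with a UTC offset to epoch milliseconds."""
    return int(datetime.fromisoformat(iso_text).timestamp() * 1000)

first_from = to_epoch_millis("2022-08-22T09:28:48+08:00")
first_to = to_epoch_millis("2022-08-23T09:28:48+08:00")
second_to = to_epoch_millis("2022-08-24T09:28:48+08:00")
```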
date_range - date ranges
This achieves the same thing as the range queries above, using an aggregation dedicated to date fields.
GET kibana_sample_data_ecommerce/_search
{
"query": {
"match_all": {}
},
"track_total_hits": true,
"size": 0,
"aggs": {
"date_range_order_date": {
"date_range": {
"field": "order_date",
"ranges": [
{
"key": "first",
"from": "1661131728000",
"to": "1661218128000"
},
{
"key": "second",
"from": "1661218128000",
"to": "1661304528000"
}
]
}
}
}
}
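composite - composite aggregation
Combines several value sources in one aggregation and creates one bucket per combination of their values, here day_of_week and manufacturer. The effect is similar to multi-level nesting, but the buckets come back as a flat list that can be paged through.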
GET kibana_sample_data_ecommerce/_search
{
"query": {
"match_all": {}
},
"track_total_hits": true,
"size": 0,
"aggs": {
"composite_aggs": {
"composite": {
"size": 10,
"sources": [
{
"terms_day_of_week": {
"terms": {
"field": "day_of_week",
"order":"asc"
}
}
},
{
"terms_products_manufacturer": {
"terms": {
"field": "products.manufacturer.keyword"
}
}
}
]
}
}
}
}
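Paging is supported through the after parameter. Pass the after_key returned with the previous page to fetch the next one: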
GET kibana_sample_data_ecommerce/_search
{
"query": {
"match_all": {}
},
"track_total_hits": true,
"size": 0,
"aggs": {
"composite_aggs": {
"composite": {
"size": 10,
"sources": [
{
"terms_day_of_week": {
"terms": {
"field": "day_of_week",
"order": "asc"
}
}
},
{
"terms_products_manufacturer": {
"terms": {
"field": "products.manufacturer.keyword"
}
}
}
],
"after": {
"terms_day_of_week": "Friday",
"terms_products_manufacturer": "Microlutions"
}
}
}
}
}
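Paging through all composite buckets means copying after_key from each response into the after parameter of the next request. The helper below sketches only that step on plain dictionaries. Sending the request is left to whatever client you use, and next_page_request is a hypothetical name.

```python
import copy

def next_page_request(body, response, agg_name):
    """Build the request body for the next page, or return None when there is none."""
    after_key = response["aggregations"][agg_name].get("after_key")
    if after_key is None:
        return None  # no after_key in the response: nothing left to fetch
    next_body = copy.deepcopy(body)  # leave the original request untouched
    next_body["aggs"][agg_name]["composite"]["after"] = after_key
    return next_body

# Trimmed-down request and response used only to exercise the helper.
first_body = {
    "size": 0,
    "aggs": {"composite_aggs": {"composite": {"size": 10, "sources": []}}},
}
first_response = {
    "aggregations": {
        "composite_aggs": {
            "after_key": {
                "terms_day_of_week": "Friday",
                "terms_products_manufacturer": "Microlutions",
            },
            "buckets": [],
        }
    }
}
second_body = next_page_request(first_body, first_response, "composite_aggs")
```

Repeat until the helper returns None, which happens when a response no longer carries an after_key.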
Official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/8.3/search-aggregations-bucket-datehistogram-aggregation.html#search-aggregations-bucket-datehistogram-offset
These are notes from my own experience. Feel free to discuss them, and questions about Elasticsearch aggregations are welcome.