Bucket aggregations in Elasticsearch

This post continues from the previous one and covers bucket aggregations. The data again comes from the kibana_sample_data_ecommerce sample index.

Terms bucketing

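This request puts every order into a bucket according to the day of the week it was placed on, so you can read off, for example, how many orders were placed on Mondays.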

GET kibana_sample_data_ecommerce/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "terms_currency": {
      "terms": {
        "field": "day_of_week"
      }
    }
  }
}

Response

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4675,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "terms_currency" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Thursday",
          "doc_count" : 775
        },
        {
          "key" : "Friday",
          "doc_count" : 770
        },
        {
          "key" : "Saturday",
          "doc_count" : 736
        },
        {
          "key" : "Sunday",
          "doc_count" : 614
        },
        {
          "key" : "Tuesday",
          "doc_count" : 609
        },
        {
          "key" : "Wednesday",
          "doc_count" : 592
        },
        {
          "key" : "Monday",
          "doc_count" : 579
        }
      ]
    }
  }
}
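
Once the response has been parsed, the buckets are easy to turn into a lookup table. The following is a minimal Python sketch, assuming the response body is already available as a dict; the helper name buckets_to_counts is made up for illustration, and the sample dict is a trimmed copy of the response above.

```python
from typing import Dict


def buckets_to_counts(response: dict, agg_name: str) -> Dict[str, int]:
    """Map each bucket key of a terms aggregation to its doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Trimmed copy of the response shown above (only the parts the helper reads).
response = {
    "hits": {"total": {"value": 4675, "relation": "eq"}},
    "aggregations": {
        "terms_currency": {
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "Thursday", "doc_count": 775},
                {"key": "Friday", "doc_count": 770},
                {"key": "Saturday", "doc_count": 736},
                {"key": "Sunday", "doc_count": 614},
                {"key": "Tuesday", "doc_count": 609},
                {"key": "Wednesday", "doc_count": 592},
                {"key": "Monday", "doc_count": 579},
            ],
        }
    },
}

counts = buckets_to_counts(response, "terms_currency")
print(counts["Monday"])  # 579

# sum_other_doc_count is 0, so the seven buckets account for every order.
print(sum(counts.values()) == response["hits"]["total"]["value"])  # True
```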

Nesting a terms aggregation inside a terms aggregation

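The previous query gave the number of orders for each day of the week.
This query goes one step further and counts, for each day, the orders per manufacturer. You could keep nesting more terms aggregations below that, but the example stops here.
Think of it as small buckets placed inside a bigger bucket: the documents in each first-level bucket are split again into several buckets by terms.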

GET kibana_sample_data_ecommerce/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "terms_currency": {
      "terms": {
        "field": "day_of_week",
        "order": {
          "_key": "desc"
        }
      },
      "aggs": {
        "terms_manufacturer": {
          "terms": {
            "field": "manufacturer.keyword",
            "size": 1000,
            "show_term_doc_count_error": true
          }
        }
      }
    }
  }
}
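
The "buckets inside buckets" shape shows up directly in the response: every day bucket carries its own terms_manufacturer object with a buckets list. The Python sketch below flattens that into rows. It is only an illustration: the helper name flatten_nested_terms is hypothetical, and the sample response is invented (the counts are not real query output), keeping only the fields the helper reads.

```python
from typing import Iterator, Tuple


def flatten_nested_terms(
    response: dict, outer: str, inner: str
) -> Iterator[Tuple[str, str, int]]:
    """Yield (outer key, inner key, doc_count) for a two-level terms aggregation."""
    for outer_bucket in response["aggregations"][outer]["buckets"]:
        for inner_bucket in outer_bucket[inner]["buckets"]:
            yield outer_bucket["key"], inner_bucket["key"], inner_bucket["doc_count"]


# Invented sample with made-up counts, shaped like the response of the request above.
sample = {
    "aggregations": {
        "terms_currency": {
            "buckets": [
                {
                    "key": "Wednesday",
                    "doc_count": 5,
                    "terms_manufacturer": {
                        "buckets": [
                            {"key": "Elitelligence", "doc_count": 3},
                            {"key": "Microlutions", "doc_count": 2},
                        ]
                    },
                },
                {
                    "key": "Tuesday",
                    "doc_count": 1,
                    "terms_manufacturer": {
                        "buckets": [{"key": "Microlutions", "doc_count": 1}]
                    },
                },
            ]
        }
    }
}

rows = list(flatten_nested_terms(sample, "terms_currency", "terms_manufacturer"))
for day, manufacturer, count in rows:
    print(day, manufacturer, count)
```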

order

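Sort the day-of-week buckets in descending order by the highest order price, which is computed by the second-level max sub-aggregation. The include and exclude parameters filter the bucket keys; here Monday is left out.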

GET kibana_sample_data_ecommerce/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "terms_currency": {
      "terms": {
        "field": "day_of_week",
        "order": {
          "max_manufacturer.value": "desc"
        },
        "include": ".*",
        "exclude": "Monday"
      },
      "aggs": {
        "max_manufacturer": {
          "max": {
            "field": "taxful_total_price"
          }
        }
      }
    }
  }
}

Histogram: histogram

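This histogram groups orders into fixed-width buckets by a numeric value. With a width of 50, for example, 0-50 is one bucket and 50-100 is the next.
Parameters:
query: the documents can be filtered before bucketing, for example to bucket only orders priced between 1000 and 2000.
interval: the bucket width; adjust as needed.
extended_bounds: forces buckets to be generated from min to max even where no documents fall; those buckets come back with a doc_count of 0.
hard_bounds: only buckets between min and max are returned.
order: sort the buckets by _key or by _count.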

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  }, 
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "histogram_taxful_total_price": {
      "histogram": {
        "field": "taxful_total_price",
        "interval": 200,
        "keyed": true,
        "extended_bounds": {
          "min": 1000,
          "max": 3000
        },
        "hard_bounds": {
          "min": 1000,
          "max": 3000
        },
        "order": {
          "_key": "asc"
        }
      }
    }
  }
}
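
To make the bucketing rule concrete, here is a small Python sketch of how a value ends up in a bucket. It uses the rounding formula given in the Elasticsearch documentation, bucket_key = floor((value - offset) / interval) * interval + offset, and imitates the gap filling and extended_bounds behaviour described above. It is a simplified model rather than Elasticsearch's implementation: hard_bounds is not modelled, and the function names and sample prices are invented for the illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def histogram(
    values: Iterable[float],
    interval: float,
    offset: float = 0.0,
    extended_bounds: Optional[Tuple[float, float]] = None,
) -> Dict[float, int]:
    """Return {bucket_key: doc_count}, with empty buckets filled in as 0."""

    def index(value: float) -> int:
        # Position of the bucket on the interval grid.
        return math.floor((value - offset) / interval)

    counts = Counter(index(v) for v in values)
    indices = list(counts)
    if extended_bounds is not None:
        # extended_bounds stretches the grid to cover min and max.
        indices += [index(extended_bounds[0]), index(extended_bounds[1])]
    if not indices:
        return {}
    return {
        i * interval + offset: counts.get(i, 0)
        for i in range(min(indices), max(indices) + 1)
    }


prices = [1010.0, 1250.5, 1999.99, 1650]  # made-up order totals

plain = histogram(prices, interval=200)
print(list(plain))  # bucket keys in ascending order, like "_key": "asc"

extended = histogram(prices, interval=200, extended_bounds=(1000, 3000))
print(len(extended), extended[3000])
```

Without extended_bounds the buckets stop at the last bucket that contains data; with it, empty buckets are appended up to the bucket that contains max.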

Date histogram: date_histogram

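This example comes from a real scenario at work: it first filters by a time range, then groups by the content field and builds a 7-day histogram for each value. The _index in the request path stands in for the real index name. Note that shard_size is smaller than size here; Elasticsearch resets shard_size to size in that case.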

GET /_index/_search
{
  "size": 0,
  "query": {
    "range": {
      "time": {
        "gte": "2022-07-13 00:00:00",
        "lte": "2022-07-26 23:59:59"
      }
    }
  },
  "aggs": {
    "keywords": {
      "terms": {
        "field": "content.keyword",
        "size": 10000000,
        "shard_size": 100000,
        "show_term_doc_count_error": true
      },
      "aggs": {
        "time": {
          "date_histogram": {
            "field": "time",
            "fixed_interval": "7d",
            "min_doc_count": 0,
            "format": "yyyy-MM-dd",
            "time_zone": "+08:00",
            "offset": "-1d",
            "extended_bounds": {
              "min": "now/d",
              "max": "now/d"
            }
          }
        },
        "page_sort": {
          "bucket_sort": {
            "sort": [],
            "from": 0,
            "size": 10
          }
        }
      }
    }
  }
}

auto_date_histogram: interval chosen automatically

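A date histogram whose interval Elasticsearch picks on its own: you give a target number of buckets with buckets, and the interval is chosen so that at most that many buckets come back. I have not really figured out when to use this one yet.

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  }, 
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "date_histogram_taxful_total_price": {
      "auto_date_histogram": {
        "field": "order_date",
        "buckets": 20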
      }
    }
  }
}

range: counting by range

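This aggregation counts documents in ranges defined by from and to, and each range can be given a custom key. The example defines two ranges on the order date.

from: start of the range (inclusive); can be a date or a number.
to: end of the range (exclusive); same types as from.
key: custom name for the bucket.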

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  },
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "range_order_date": {
      "range": {
        "field": "order_date",
        "ranges": [
          {
            "key": "first", 
            "from": "2022-08-22T09:28:48+00:00",
            "to": "2022-08-23T09:28:48+00:00"
          },
          {
            "key": "second", 
            "from": "2022-08-23T09:28:48+00:00",
            "to": "2022-08-24T09:28:48+00:00"
          }
        ]
      }
    }
  }
}

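The same range aggregation can be driven by a script. In that form the bounds have to be given as epoch-millisecond timestamps; other date formats are not supported.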

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  },
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "range_order_date": {
      "range": {
        "script": {
          "source": """
          doc["order_date"].value;
        """,
          "lang": "painless"
        },
        "ranges": [
          {
            "key": "first",
            "from": "1661131728000",
            "to": "1661218128000"
          },
          {
            "key": "second",
            "from": "1661218128000",
            "to": "1661304528000"
          }
        ]
      }
    }
  }
}
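
The epoch-millisecond bounds in the script-based request can be computed from ISO 8601 timestamps. The Python sketch below shows one way to do that; the helper name to_epoch_millis is made up for the example. Working through the arithmetic, the bounds used above correspond to 09:28:48 at UTC+8, whereas the field-based request earlier writes the same wall-clock time with a +00:00 offset, so the two requests appear to cover windows eight hours apart.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(iso_text: str) -> int:
    """Convert an ISO 8601 timestamp with a UTC offset to epoch milliseconds."""
    moment = datetime.fromisoformat(iso_text)
    if moment.tzinfo is None:
        raise ValueError("timestamp needs an explicit UTC offset")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


print(to_epoch_millis("2022-08-22T09:28:48+08:00"))  # 1661131728000
print(to_epoch_millis("2022-08-22T09:28:48+00:00"))  # 1661160528000
```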

date_range: date ranges

This aggregation does the same job as the range requests above, but is meant specifically for date fields.

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  },
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "date_range_order_date": {
      "date_range": {
        "field": "order_date", 
        "ranges": [
          {
            "key": "first",
            "from": "1661131728000",
            "to": "1661218128000"
          },
          {
            "key": "second",
            "from": "1661218128000",
            "to": "1661304528000"
          }
        ]
      }
    }
  }
}

composite: composite aggregation

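This aggregation builds one bucket for each combination of values from several sources. The effect is similar to multi-level nesting, except that the buckets come back as a flat list that can be paged through.

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  },
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_aggs": {
      "composite": {
        "size": 10,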
        "sources": [
          {
            "terms_day_of_week": {
              "terms": {
                "field": "day_of_week",
                "order":"asc"
              }
            }
          },
          {
            "terms_products_manufacturer": {
              "terms": {
                "field": "products.manufacturer.keyword"
              }
            }
          }
        ]
      }
    }
  }
}

The results can be paged by adding the after parameter, set to the after_key returned with the previous page:

GET kibana_sample_data_ecommerce/_search
{
  "query": {
    "match_all": {}
  },
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "composite_aggs": {
      "composite": {
        "size": 10,
        "sources": [
          {
            "terms_day_of_week": {
              "terms": {
                "field": "day_of_week",
                "order": "asc"
              }
            }
          },
          {
            "terms_products_manufacturer": {
              "terms": {
                "field": "products.manufacturer.keyword"
              }
            }
          }
        ],
        "after": {
          "terms_day_of_week": "Friday",
          "terms_products_manufacturer": "Microlutions"
        }
      }
    }
  }
}
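
Paging through all composite buckets is a loop: send the request, read the buckets, copy after_key into after, and repeat until a page comes back empty. The Python sketch below shows that loop. To keep it runnable without a cluster, the search argument is any function that takes a request body and returns the parsed response; fake_search is an in-memory stand-in with invented keys and counts, not a real client call, and both function names are hypothetical.

```python
import copy
from typing import Callable, Iterator, Optional


def iterate_composite_buckets(
    search: Callable[[dict], dict], body: dict, agg_name: str
) -> Iterator[dict]:
    """Yield every bucket of a composite aggregation, page by page."""
    body = copy.deepcopy(body)  # do not modify the caller's request
    composite = body["aggs"][agg_name]["composite"]
    while True:
        result = search(body)["aggregations"][agg_name]
        if not result["buckets"]:
            return  # an empty page means there is nothing left
        yield from result["buckets"]
        after_key: Optional[dict] = result.get("after_key")
        if after_key is None:
            return
        composite["after"] = after_key  # resume after the last bucket seen


# In-memory stand-in for the cluster: invented composite keys, already sorted.
ALL_KEYS = [
    ("Friday", "Elitelligence"),
    ("Friday", "Microlutions"),
    ("Monday", "Elitelligence"),
]


def fake_search(body: dict) -> dict:
    composite = body["aggs"]["composite_aggs"]["composite"]
    keys = ALL_KEYS
    after = composite.get("after")
    if after is not None:
        marker = (after["terms_day_of_week"], after["terms_products_manufacturer"])
        keys = [key for key in keys if key > marker]
    buckets = [
        {
            "key": {"terms_day_of_week": day, "terms_products_manufacturer": maker},
            "doc_count": 1,
        }
        for day, maker in keys[: composite["size"]]
    ]
    result = {"buckets": buckets}
    if buckets:
        result["after_key"] = buckets[-1]["key"]
    return {"aggregations": {"composite_aggs": result}}


request = {
    "size": 0,
    "aggs": {
        "composite_aggs": {
            "composite": {
                "size": 2,
                "sources": [
                    {"terms_day_of_week": {"terms": {"field": "day_of_week"}}},
                    {
                        "terms_products_manufacturer": {
                            "terms": {"field": "products.manufacturer.keyword"}
                        }
                    },
                ],
            }
        }
    },
}

all_buckets = list(iterate_composite_buckets(fake_search, request, "composite_aggs"))
print(len(all_buckets))  # 3
```

Against a real cluster, search would be whatever function sends the body to the _search endpoint and returns the parsed JSON.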

Official documentation for reference:
https://www.elastic.co/guide/en/elasticsearch/reference/8.3/search-aggregations-bucket-datehistogram-aggregation.html#search-aggregations-bucket-datehistogram-offset

These are notes from my own experience. Feel free to discuss them, and questions about Elasticsearch aggregations are always welcome.