Elasticsearch version: 7.11
These are short notes from learning about rollup.
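Translated literally, "rollup" means rolling data up, which is rather abstract. After studying it, my own understanding is that it pre-aggregates raw data into summaries.
For example, suppose a log records every visit by every user. To get each user's daily visit count, we would have to scan and count each day's visit records row by row, which is very inefficient.
Instead, we can compute each user's total visits for each day ahead of time and store the result, one total per day. When the number is needed, we query the stored daily totals directly and never scan the individual visit records, which is much faster.
In other words, coarse-grained data is derived from fine-grained raw data.
Of course, the summary loses the fine-grained detail (a single visit record at a specific moment). From the daily totals we know only how many visits a user made in a day, not in which hour they happened.
We could also summarize by month, or even by year. Whether to use days, months, or years depends on the requirements.
Ideally, the summaries are stored at a reasonably fine granularity, and when a coarser view is needed we aggregate the stored summaries again. This keeps some of the fine-grained detail while still producing the coarser statistics within an acceptable query time.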
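The idea can be sketched in plain Python, independent of Elasticsearch. This is only an illustration: the `raw_visits` log and the function names `roll_up_daily` and `roll_up_monthly` are hypothetical and not part of any library. It shows raw records being collapsed into daily counts, and the daily counts then being re-aggregated into monthly counts without touching the raw records again.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw visit log: one (user, ISO timestamp) pair per visit.
raw_visits = [
    ("alice", "2021-02-25T09:15:00"),
    ("alice", "2021-02-25T17:40:00"),
    ("bob", "2021-02-25T11:05:00"),
    ("alice", "2021-02-26T08:00:00"),
    ("alice", "2021-03-01T10:30:00"),
]


def roll_up_daily(visits):
    """Collapse raw visits into one count per (user, day)."""
    daily = defaultdict(int)
    for user, ts in visits:
        day = datetime.fromisoformat(ts).date().isoformat()
        daily[(user, day)] += 1
    return dict(daily)


def roll_up_monthly(daily):
    """Re-aggregate daily counts into one count per (user, month)."""
    monthly = defaultdict(int)
    for (user, day), count in daily.items():
        monthly[(user, day[:7])] += count  # "YYYY-MM-DD"[:7] is "YYYY-MM"
    return dict(monthly)


daily_counts = roll_up_daily(raw_visits)
monthly_counts = roll_up_monthly(daily_counts)
```

Once `daily_counts` exists, the hour of each visit is gone, but any coarser total (month, year) can still be derived from it.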
Here I use Kibana's sample data to learn the feature.
The goal is to compute, per day, the total order price for each region (continent).
1 Copy the index
POST _reindex
{
  "source": { "index": "kibana_sample_data_ecommerce" },
  "dest": { "index": "kibana_sample_data_ecommerce_rollup" }
}
2 Aggregate
Query:
GET kibana_sample_data_ecommerce_rollup/_search
{
  "track_total_hits": true,
  "size": 0,
  "aggs": {
    "histogram_": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "1d",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "terms_continent_name": {
          "terms": {
            "field": "geoip.continent_name.keyword",
            "size": 10,
            "order": { "_key": "asc" }
          },
          "aggs": {
            "stats_taxful_total_price": {
              "stats": { "field": "taxful_total_price" }
            }
          }
        }
      }
    }
  }
}
Result:
{ "max_score" : null,"hits" : [ ] ......................
This aggregation is computed by scanning the raw documents one by one.
Using rollup to pre-aggregate
1 Create the rollup job, which stores the aggregated data in the index rollup_job_index_001. The cron expression below triggers the job every second, which is convenient for a demo.
PUT _rollup/job/job01
{
  "index_pattern": "kibana_sample_data_ecommerce_rollup",
  "rollup_index": "rollup_job_index_001",
  "cron": "*/1 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "order_date",
      "fixed_interval": "1d"
    },
    "terms": { "fields": ["geoip.continent_name.keyword"] }
  },
  "metrics": [
    {
      "field": "taxful_total_price",
      "metrics": ["min", "max", "avg", "sum"]
    }
  ]
}
2 Start the rollup job
POST _rollup/job/job01/_start
The data is then aggregated per day and per region and stored in the index rollup_job_index_001.
3 Query the index rollup_job_index_001. The documents it contains are already the per-day, per-region aggregates.
Query (only one document is fetched here to show what it looks like):
GET rollup_job_index_001/_search
{
  "size": 1,
  "query": { "match_all": {} }
}
Result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 154, "relation" : "eq" },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "rollup_job_index_001",
        "_type" : "_doc",
        "_id" : "job01$lEVhYHGJfZX0DKPc6yBDbg",
        "_score" : 1.0,
        "_source" : {
          "taxful_total_price.min.value" : 22.979999542236328,
          "taxful_total_price.avg._count" : 21.0,
          "geoip.continent_name.keyword.terms._count" : 21,
          "taxful_total_price.max.value" : 229.97999572753906,
          "taxful_total_price.sum.value" : 1874.5600242614746,
          "geoip.continent_name.keyword.terms.value" : "Africa",
          "order_date.date_histogram.timestamp" : 1614211200000,
          "order_date.date_histogram.interval" : "1d",
          "order_date.date_histogram.time_zone" : "UTC",
          "_rollup.version" : 2,
          "order_date.date_histogram._count" : 21,
          "taxful_total_price.avg.value" : 1874.5600242614746,
          "_rollup.id" : "job01"
        }
      }
    ]
  }
}
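One detail in the rollup document above is easy to misread: `taxful_total_price.avg.value` is identical to `taxful_total_price.sum.value`, which suggests that for `avg` the rollup stores a sum together with a count (`avg._count`) rather than the average itself. A minimal Python sketch of recovering the average from such a document follows. The helper name `average_from_rollup` is hypothetical, and the values are copied from the `_source` shown above.

```python
# Values copied from the _source of the rollup document shown above.
rollup_doc = {
    "taxful_total_price.min.value": 22.979999542236328,
    "taxful_total_price.max.value": 229.97999572753906,
    "taxful_total_price.sum.value": 1874.5600242614746,
    "taxful_total_price.avg.value": 1874.5600242614746,
    "taxful_total_price.avg._count": 21.0,
}


def average_from_rollup(doc, field):
    """Return sum / count for a rolled-up avg metric (hypothetical helper)."""
    total = doc[f"{field}.avg.value"]   # holds the sum, not the mean
    count = doc[f"{field}.avg._count"]  # number of raw documents in the bucket
    return total / count if count else None


avg_price = average_from_rollup(rollup_doc, "taxful_total_price")
```

Keeping the sum and count separately is what lets coarser buckets be averaged correctly later. Elasticsearch's `_rollup_search` endpoint is meant to do this rewriting for you when you query with the original field names.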
4 Monthly statistics per region, computed from the daily summaries rather than from the raw data
Query:
GET rollup_job_index_001/_search
{
  "size": 0,
  "aggs": {
    "histogram": {
      "date_histogram": {
        "field": "order_date.date_histogram.timestamp",
        "calendar_interval": "month",
        "format": "yyyy-MM"
      },
      "aggs": {
        "terms_continent_name": {
          "terms": {
            "field": "geoip.continent_name.keyword.terms.value",
            "size": 100
          },
          "aggs": {
            "sum_taxful_total_price": { "sum": { "field": "taxful_total_price.sum.value" } },
            "max_taxful_total_price": { "max": { "field": "taxful_total_price.max.value" } }
          }
        }
      }
    }
  }
}
Result:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 154, "relation" : "eq" },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2021-02",
          "key" : 1612137600000,
          "doc_count" : 20,
          "terms_continent_name" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "Africa", "doc_count" : 4, "max_taxful_total_price" : { "value" : 237.9600067138672 }, "sum_taxful_total_price" : { "value" : 6915.9600830078125 } },
              { "key" : "Asia", "doc_count" : 4, "max_taxful_total_price" : { "value" : 277.9599914550781 }, "sum_taxful_total_price" : { "value" : 11235.570068359375 } },
              { "key" : "Europe", "doc_count" : 4, "max_taxful_total_price" : { "value" : 184.97999572753906 }, "sum_taxful_total_price" : { "value" : 9556.0400390625 } },
              { "key" : "North America", "doc_count" : 4, "max_taxful_total_price" : { "value" : 234.97999572753906 }, "sum_taxful_total_price" : { "value" : 11774.460205078125 } },
              { "key" : "South America", "doc_count" : 4, "max_taxful_total_price" : { "value" : 166.97999572753906 }, "sum_taxful_total_price" : { "value" : 1971.4000701904297 } }
            ]
          }
        },
        {
          "key_as_string" : "2021-03",
          "key" : 1614556800000,
          "doc_count" : 134,
          "terms_continent_name" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              { "key" : "Africa", "doc_count" : 27, "max_taxful_total_price" : { "value" : 369.9599914550781 }, "sum_taxful_total_price" : { "value" : 60931.130615234375 } },
              { "key" : "Asia", "doc_count" : 27, "max_taxful_total_price" : { "value" : 2249.919921875 }, "sum_taxful_total_price" : { "value" : 83935.72094726562 } },
              { "key" : "Europe", "doc_count" : 27, "max_taxful_total_price" : { "value" : 242.9600067138672 }, "sum_taxful_total_price" : { "value" : 74276.74084472656 } },
              { "key" : "North America", "doc_count" : 27, "max_taxful_total_price" : { "value" : 343.9599914550781 }, "sum_taxful_total_price" : { "value" : 80079.96154785156 } },
              { "key" : "South America", "doc_count" : 26, "max_taxful_total_price" : { "value" : 175.97999572753906 }, "sum_taxful_total_price" : { "value" : 10182.060108184814 } }
            ]
          }
        }
      ]
    }
  }
}
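The monthly query works because sums and maxima combine cleanly: the monthly sum is the sum of the daily sums, and the monthly max is the max of the daily maxima. The following Python sketch mirrors that logic outside Elasticsearch. The field names follow the rollup document from step 3, but the numeric values in `daily_docs` and the function name `monthly_rollup` are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

TS = "order_date.date_histogram.timestamp"
CONTINENT = "geoip.continent_name.keyword.terms.value"
SUM = "taxful_total_price.sum.value"
MAX = "taxful_total_price.max.value"

# Hypothetical daily rollup documents; the prices are invented.
daily_docs = [
    {TS: 1614211200000, CONTINENT: "Africa", SUM: 100.0, MAX: 60.0},  # 2021-02-25 UTC
    {TS: 1614297600000, CONTINENT: "Africa", SUM: 50.0, MAX: 30.0},   # 2021-02-26 UTC
    {TS: 1614556800000, CONTINENT: "Africa", SUM: 80.0, MAX: 80.0},   # 2021-03-01 UTC
    {TS: 1614211200000, CONTINENT: "Asia", SUM: 40.0, MAX: 25.0},     # 2021-02-25 UTC
]


def monthly_rollup(docs):
    """Combine daily rollup docs into per-(month, continent) sum and max."""
    buckets = defaultdict(lambda: {"sum": 0.0, "max": float("-inf")})
    for doc in docs:
        # Timestamps are epoch milliseconds in UTC.
        month = datetime.fromtimestamp(doc[TS] / 1000, tz=timezone.utc).strftime("%Y-%m")
        bucket = buckets[(month, doc[CONTINENT])]
        bucket["sum"] += doc[SUM]                       # sum of daily sums
        bucket["max"] = max(bucket["max"], doc[MAX])    # max of daily maxima
    return dict(buckets)


monthly = monthly_rollup(daily_docs)
```

Note that in the monthly result above, `doc_count` counts daily rollup documents (for example 4 days for Africa in 2021-02), not orders. The order counts are kept in the `_count` fields of each rollup document.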