Elasticsearch version: 7.11
A short note on cardinality aggregations
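If the concept of cardinality (the number of distinct values of a field) is new to you, look it up first, then work through the sample data below. That is usually enough to make it click.
However many videos and docs I went through, it never really sank in. I only understood it once I indexed some data of my own.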
Create an index named kibana_sample_data_ecommerce_my with the same mapping as kibana_sample_data_ecommerce.
Add the data: March 2021 has 3 customers (1, 2, 3), April 2021 has 3 customers (4, 5, 6), and May 2021 has 5 customers (1, 2, 7, 8, 9).
PUT kibana_sample_data_ecommerce_my/_doc/1
{
  "customer_id": 1,
  "order_date": "2021-03-21T14:16:48+00:00"
}
PUT kibana_sample_data_ecommerce_my/_doc/2
{
  "customer_id": 2,
  "order_date": "2021-03-22T14:16:48+00:00"
}
PUT kibana_sample_data_ecommerce_my/_doc/3
{
  "customer_id": 3,
  "order_date": "2021-03-23T14:16:48+00:00"
}
PUT kibana_sample_data_ecommerce_my/_doc/4
{
  "customer_id": 4,
  "order_date": "2021-04-21T14:16:48+00:00"
}
PUT kibana_sample_data_ecommerce_my/_doc/5
{
  "customer_id": 5,
  "order_date": "2021-04-22T14:16:48+00:00"
}
PUT kibana_sample_data_ecommerce_my/_doc/6
{
  "customer_id": 6,
  "order_date": "2021-04-23T14:16:48+00:00"
}
PUT kibana_sample_data_ecommerce_my/_doc/7
{
  "customer_id": 7,
  "order_date": "2021-05-21T14:16:48+00:00"
}
PUT kibana_sample_data_ecommerce_my/_doc/8
{
  "customer_id": 8,
  "order_date": "2021-05-22T14:16:48+00:00"
}
PUT kibana_sample_data_ecommerce_my/_doc/9
{
  "customer_id": 9,
  "order_date": "2021-05-23T14:16:48+00:00"
}
PUT kibana_sample_data_ecommerce_my/_doc/10
{
  "customer_id": 1,
  "order_date": "2021-05-23T14:16:48+00:00"
}
PUT kibana_sample_data_ecommerce_my/_doc/11
{
  "customer_id": 2,
  "order_date": "2021-05-23T14:16:48+00:00"
}
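As an alternative to issuing eleven separate PUT requests, the same documents could be sent in one request to the `_bulk` endpoint, whose body is newline-delimited JSON: an action line followed by a source line per document, with a trailing newline. The sketch below only builds that body. The names `INDEX`, `ORDERS`, and `build_bulk_body` are made up for this illustration, and actually sending the request (for example with `Content-Type: application/x-ndjson`) is left out.

```python
import json

INDEX = "kibana_sample_data_ecommerce_my"

# (customer_id, order_date) pairs, in the same order as documents 1..11 above.
ORDERS = [
    (1, "2021-03-21T14:16:48+00:00"),
    (2, "2021-03-22T14:16:48+00:00"),
    (3, "2021-03-23T14:16:48+00:00"),
    (4, "2021-04-21T14:16:48+00:00"),
    (5, "2021-04-22T14:16:48+00:00"),
    (6, "2021-04-23T14:16:48+00:00"),
    (7, "2021-05-21T14:16:48+00:00"),
    (8, "2021-05-22T14:16:48+00:00"),
    (9, "2021-05-23T14:16:48+00:00"),
    (1, "2021-05-23T14:16:48+00:00"),
    (2, "2021-05-23T14:16:48+00:00"),
]


def build_bulk_body(index, orders):
    """Return an NDJSON string: one action line plus one source line per order."""
    lines = []
    for doc_id, (customer_id, order_date) in enumerate(orders, start=1):
        # Action line: index the document under an explicit _id (1, 2, 3, ...).
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself.
        lines.append(json.dumps({"customer_id": customer_id, "order_date": order_date}))
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(INDEX, ORDERS)
print(bulk_body)
```

Keeping the data in one list also makes it easy to change the sample (say, add a returning customer in April) and rerun the aggregation to see how the numbers move.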
Query:
GET kibana_sample_data_ecommerce_my/_search
{
  "size": 1,
  "aggs": {
    "order_data_histogram": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month",
        "format": "yyyy-MM"
      },
      "aggs": {
        "customer_id_car": {
          "cardinality": {
            "field": "customer_id"
          }
        },
        "pipeline-cumstomer-id": {
          "cumulative_cardinality": {
            "buckets_path": "customer_id_car"
          }
        }
      }
    }
  }
}
Result:
{
  "took" : 1023,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 11,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "kibana_sample_data_ecommerce_my",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "customer_id" : 4,
          "order_date" : "2021-04-21T14:16:48+00:00"
        }
      }
    ]
  },
  "aggregations" : {
    "order_data_histogram" : {
      "buckets" : [
        {
          "key_as_string" : "2021-03",
          "key" : 1614556800000,
          "doc_count" : 3,
          "customer_id_car" : {
            "value" : 3
          },
          "pipeline-cumstomer-id" : {
            "value" : 3
          }
        },
        {
          "key_as_string" : "2021-04",
          "key" : 1617235200000,
          "doc_count" : 3,
          "customer_id_car" : {
            "value" : 3
          },
          "pipeline-cumstomer-id" : {
            "value" : 6
          }
        },
        {
          "key_as_string" : "2021-05",
          "key" : 1619827200000,
          "doc_count" : 5,
          "customer_id_car" : {
            "value" : 5
          },
          "pipeline-cumstomer-id" : {
            "value" : 9
          }
        }
      ]
    }
  }
}
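To read the two metrics side by side, it can help to flatten the buckets into one row per month. The sketch below does that for a trimmed copy of the response above (only the `aggregations` part, with the `key` field dropped). The `response` variable and the `summarize` helper are hypothetical names for this illustration. The aggregation names, including the `pipeline-cumstomer-id` spelling, are the ones used in the query.

```python
# Trimmed copy of the search response: only the fields needed here.
response = {
    "aggregations": {
        "order_data_histogram": {
            "buckets": [
                {"key_as_string": "2021-03", "doc_count": 3,
                 "customer_id_car": {"value": 3},
                 "pipeline-cumstomer-id": {"value": 3}},
                {"key_as_string": "2021-04", "doc_count": 3,
                 "customer_id_car": {"value": 3},
                 "pipeline-cumstomer-id": {"value": 6}},
                {"key_as_string": "2021-05", "doc_count": 5,
                 "customer_id_car": {"value": 5},
                 "pipeline-cumstomer-id": {"value": 9}},
            ]
        }
    }
}


def summarize(resp):
    """Return (month, doc_count, cardinality, cumulative_cardinality) per bucket."""
    rows = []
    for bucket in resp["aggregations"]["order_data_histogram"]["buckets"]:
        rows.append((
            bucket["key_as_string"],
            bucket["doc_count"],
            bucket["customer_id_car"]["value"],
            bucket["pipeline-cumstomer-id"]["value"],
        ))
    return rows


rows = summarize(response)
for month, docs, distinct, cumulative in rows:
    print(f"{month}: {docs} orders, {distinct} distinct customers, "
          f"{cumulative} distinct customers so far")
```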
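The results show the following:
# "cardinality" is the number of distinct customers within the current month
# "cumulative_cardinality" is the running number of distinct customers across all months up to and including the current one; a customer already seen in an earlier month is not counted again
March 2021: "customer_id_car" : { "value" : 3 }, "pipeline-cumstomer-id" : { "value" : 3 }, that is, cardinality is 3 and cumulative_cardinality is 3.
April 2021: "customer_id_car" : { "value" : 3 }, "pipeline-cumstomer-id" : { "value" : 6 }, that is, cardinality is 3 and cumulative_cardinality is 6, because customers 4, 5, and 6 are all new.
May 2021: "customer_id_car" : { "value" : 5 }, "pipeline-cumstomer-id" : { "value" : 9 }, that is, cardinality is 5 and cumulative_cardinality is 9. Customers 1 and 2 were already counted in March, so only customers 7, 8, and 9 add to the running total.
That is the difference between the per-month cardinality and the cumulative cardinality. The increase in cumulative_cardinality from one month to the next (3, 3, 3 here) is the number of new customers in that month. The same reading applies to the distinct values of any field bucketed along a time axis.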
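The numbers can be reproduced without Elasticsearch by modelling each month as a set of customer ids. This is only a sketch of the idea: Elasticsearch's cardinality aggregation is an approximate count based on HyperLogLog++, while plain Python sets are exact, though for a data set this small the two should agree. The names `ORDERS` and `monthly_cardinality` are made up for this illustration.

```python
from collections import defaultdict

# (customer_id, month) for the eleven sample orders.
ORDERS = [
    (1, "2021-03"), (2, "2021-03"), (3, "2021-03"),
    (4, "2021-04"), (5, "2021-04"), (6, "2021-04"),
    (7, "2021-05"), (8, "2021-05"), (9, "2021-05"),
    (1, "2021-05"), (2, "2021-05"),
]


def monthly_cardinality(orders):
    """Return (month, cardinality, cumulative_cardinality, new_customers) rows."""
    customers_by_month = defaultdict(set)
    for customer_id, month in orders:
        customers_by_month[month].add(customer_id)

    seen = set()  # every customer id observed up to the current month
    rows = []
    for month in sorted(customers_by_month):
        customers = customers_by_month[month]
        new_customers = customers - seen   # ids not seen in any earlier month
        seen |= customers
        rows.append((month, len(customers), len(seen), len(new_customers)))
    return rows


for month, distinct, cumulative, new in monthly_cardinality(ORDERS):
    print(f"{month}: cardinality={distinct}, cumulative={cumulative}, new={new}")
```

The second and third columns match `customer_id_car` and `pipeline-cumstomer-id` in the response, and the last column is the month-over-month increase of the cumulative value.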