背景:通过 clickhouse 统计 pv
现有一张表
create table click_event(
id String comment '点击ID',
cid UInt64 comment '用户ID',
created_time DateTime comment '点击时间'
) ENGINE = MergeTree
order by id;
该表为 click_event ,用户每点击一次,向该表写入一条数据。
写入测试数据
insert into click_event values
('1', 1, '2021-06-22 12:12:12'),
('2', 1, '2021-06-23 12:12:12'),
('3', 1, '2021-06-23 13:12:12');
统计 PV 写法
select toYYYYMMDD(created_time), count()
from click_event
group by toYYYYMMDD(created_time);
以上写法需要在原表上 group by,虽然 ck 查询性能彪悍,但是仍然有性能损耗。
现用 ck 提供的 SummingMergeTree 优化以上的统计查询:
创建视图:
create materialized view view_click_event
engine = SummingMergeTree(value)
order by day1
populate as
select toYYYYMMDD(created_time) as day1, toInt64(1) as value from click_event;
此时:
select * from view_click_event;
复制代码
就是聚合以后过的结果,原表每 insert 一条数据,view_click_event 也会新增一条(此时注意 MergeTree 合并过程)
正确应该是:
select day1,sum(value) from view_click_event group by day1;