1、COUNT(DISTINCT)优化
原始语句运行时间大概215秒
select dt,
count(distinct case when page='2001' then uuid end) as cleanUser
from dw_space_page_orc_dt
where dt>='20170506'
优化之后语句运行时间大概155秒
select sum(t2.uuid_group) cleanUser from
(select t1.tag, count(1) uuid_group from
(select uuid, cast(RAND() * 100 AS BIGINT) tag from dw_space_page_orc_dt where page='2001' and dt>'20170506' group by uuid) t1 group by tag) t2;
- 第一层SELECT:对uuid进行去重,并为去重后的uuid打上整数标记
- 第二层SELECT:按照标记进行分组,统计每个分组下uuid的个数
- 第三层SELECT:对所有分组进行求和
上面这个方法最关键的是为每个uuid进行标记,这样就可以对其进行分组,分别计数,最后去和。如果数据量确实很大,也可以增加分组的个数。例如:CAST(RAND() * 1000 AS BIGINT) tag