应用场景
将所有的维度排列组合在一块
WITH CUBE
会组合所有维度,比如有n个维度,一共排列出2^n种组合查询
缺点:会产生大量无意义的组合
SELECT province, city, devicetype, count(distinct guid) as uv_amt
FROM tmp
GROUP BY province,city,devicetype
WITH CUBE;
GROUPING SETS(...)
可以自定义组合维度,()代表最粗粒度的组合
SELECT province, city, devicetype, count(distinct guid) as uv_amt
FROM tmp
GROUP BY province,city,devicetype
GROUPING SETS((),(province),(province,city),(province,devicetype),(province,city,devicetype));
WITH ROLLUP
如果数据中,所有维度都属于一个层级关系链条,比如省-->市--->区,使用这种函数更简便高效
SELECT province, city, region, street, sum(score) as socre_amt, max(score) as score_max
GROUP BY province,city,region,street
with rollup;
过滤掉不需要查询的维度:
使用coalesce()函数:返回第一个不为空的值,如果所有值都为空,则返回null。
例子:不查询account和score这两个维度
select
*
from dwd.device_account_log t
where COALESCE(t.account,t.score) is null;
设置最大维度基数(默认30个):
set hive.new.job.grouping.set.cardomality=n