HiveSQL 聚合技巧
grouping sets
grouping sets() 在 group by 查询中,根据不同的维度组合进行聚合,等价于将不同维度的 group by 结果集进行 union all。聚合规则在括号中进行指定。
问题1:同时获取用户的性别分布、城市分布、等级分布
select sex,
city,
level,
count(distinct user_id)
from
user_info
group by sex,city,level
grouping sets (sex,city,level);
问题2:同时获取用户的性别分布以及每个性别的城市分布
select sex,
city,
count(distinct user_id),
grouping__id
from user_info
group by sex,city
grouping sets(sex,(sex,city));
grouping__id : (两个下划线) 结果属于哪一个分组集合
运行结果:
sex | city | _c2 | grouping__id |
---|---|---|---|
female | NULL | 177 | 1 |
female | beijing | 26 | 0 |
female | changchun | 32 | 0 |
female | guangzhou | 30 | 0 |
female | hangzhou | 30 | 0 |
female | shanghai | 36 | 0 |
female | shenzhen | 23 | 0 |
male | NULL | 150 | 1 |
male | beijing | 27 | 0 |
male | changchun | 21 | 0 |
male | guangzhou | 25 | 0 |
male | hangzhou | 27 | 0 |
male | shanghai | 25 | 0 |
male | shenzhen | 25 | 0 |
cube
根据 group by 维度的所有组合进行聚合
问题:性别、城市、等级的各种组合的用户分布
select sex,
city,
level,
count(distinct user_id)
from user_info
group by sex,city,level
with cube;
rollup
以最左侧的维度为主,进行层级聚合,是 cube 的子集
问题:同时计算每个月的支付金额,以及每年的总支付金额
select year(dt) as year,
month(dt) as month,
sum(pay_amount) as pay_total
from user_trade
where dt > "0"
group by year(dt),month(dt)
with rollup;
这里的 user_trade 是分区表,分区字段是 dt