业务场景:
一张门店表dim_mty_store ,三个字段分别为 store_code(门店)、director_code(主任)、manager_code(大区),级别从小到大:门店 < 主任 < 大区。
要根据门店、主任、大区这三个粒度分别统计门店数量,最笨的办法是三次group by 然后union all到一起,代码如下:
-- 1. 门店
select
manager_code,
director_code,
store_code,
count(store_code) as cnt
from dim_mty_store
group by
manager_code,
director_code,
store_code
union all
-- 2. 主任
select
manager_code,
director_code,
null as store_code,
count(store_code) as cnt
from dim_mty_store
group by
manager_code,
director_code
union all
-- 3. 大区
select
manager_code,
null as director_code,
null as store_code,
count(store_code) as cnt
from dim_mty_store
group by
manager_code
;
相关的快捷方法如下:
1. cube
功能:按照 所有字段组合的可能组合 分组union all的结果。
执行顺序:先执行全集,再执行子集(从左到右),最后执行空集。
grouping__id:分组字段组合对应的id,从0开始,全集为0,空集时候最大。
举个栗子:
-- cube
set odps.sql.hive.compatible=true; --输出grouping__id时才需要该设置
select
store_code,
director_code,
manager_code,
grouping__id,
count(store_code) as cnt
from dim_mty_store
group by
manager_code,
director_code,
store_code
with cube
;
-- 底层执行的逻辑等同于下面这段逻辑,共8组,这里只写group by部分
-- 1、全集,grouping__id = 0
group by
manager_code,
director_code,
store_code
-- 2、所有子集,这里共有6组,两两组合3组,单个字段3组
union all
group by
manager_code,
director_code -- grouping__id = 1
union all
group by
manager_code,
store_code -- grouping__id = 2
union all
group by
manager_code -- grouping__id = 3
union all
group by
director_code,
store_code -- grouping__id = 4
union all
group by
director_code -- grouping__id = 5
union all
group by
store_code -- grouping__id = 6
-- 3、空集,用111代替
union all
group by 111 -- grouping__id = 7
2. rollup:
功能:cube的子集。
执行顺序:先执行全集,再执行从左到右包含最左边字段的子集(字段个数递减),最后执行空集。
grouping__id:分组字段组合对应的id,从0开始,全集为0,空集时候最大,是cube对应的grouping__id的子集。
用法如下:
-- rollup
set odps.sql.hive.compatible=true; --输出grouping__id时才需要该设置
select
store_code,
director_code,
manager_code,
grouping__id,
count(store_code) as cnt
from dim_mty_store
group by
manager_code,
director_code,
store_code
with rollup
;
-- 底层执行的逻辑等同于下面这段逻辑,这里只写group by部分
-- 1、全集
group by
manager_code,
director_code,
store_code -- grouping__id = 0
--2、全部子集
union all
group by
manager_code,
director_code -- grouping__id = 1
union all
group by
manager_code -- grouping__id = 3
-- 3、空集
union all
group by 111 -- grouping__id = 7
3. grouping sets:
如果觉得cube和rollup执行结果中有一些数据是不需要的,那可以通过grouping sets来灵活的设置分组字段。根据自己的业务需求指定要分组的字段组合,组合字段中没有的字段最终都会置为空,用法如下:
-- grouping sets
set odps.sql.hive.compatible=true; --输出grouping__id时才需要该设置
select
store_code,
director_code,
manager_code,
grouping__id,
count(store_code) as cnt
from dim_mty_store
group by
manager_code,
director_code,
store_code
grouping sets (
manager_code,
(manager_code, director_code),
(manager_code, director_code, store_code)
) ;
--结果等同于如下逻辑,这里只写group by
group by
manager_code, director_code, store_code --grouping__id = 0
union all
group by
manager_code, director_code --grouping__id = 1
union all
group by
manager_code --grouping__id = 3
最后这种情况跑出来的结果,理论上跟开始举例需求的结果是一致的。
新手上手,请多指教,大佬勿喷!