group by、grouping sets、rollup和cube方法详解

在平时的工作中,经常有按照不同维度筛选和统计数据的需求。拿视频会员订单数据来说吧,运营人员要查看深圳市的成功下单数或则深圳市某一种产品的成功下单数或者某一种产品的所有成功下单数时,每天的订单数又很大,现查的话按照不同的维度去查询又很慢。此时本篇文章或许会帮助到你。

  • group by:主要用来对查询的结果进行分组,相同组合的分组条件在结果集中只显示一行记录。可以添加聚合函数。
  • grouping sets:对分组集中指定的组表达式的每个子集执行group by,group by A,B grouping sets(A,B)就等价于 group by A union group by B,其中A和B也可以是一个集合,比如group by A,B,C grouping sets((A,B),(A,C))。
  • rollup:在指定表达式的每个层次级别创建分组集。group by A,B,C with rollup首先会对(A、B、C)进行group by,然后对(A、B)进行group by,然后是(A)进行group by,最后对全表进行group by操作。
  • cube:为指定表达式集的每个可能组合创建分组集。首先会对(A、B、C)进行group by,然后依次是(A、B),(A、C),(A),(B、C),(B),( C),最后对全表进行group by操作。
  • GROUPING_ID(column_name1,column_name2…):GROUPING_ID ( [ ,...n]) 将 GROUPING () 在每个输出行中为其列列表中的每个列返回的对应值作为 0、1 字符串输入。GROUPING_ID 将该字符串解释为二进制数并返回对应的整数。

每个 GROUPING_ID 参数都必须是 GROUP BY 列表的一个元素。GROUPING_ID () 返回一个 integer 位图,其最低 N 位可能为文字。文字 bit 表明对应参数不是给定输出行的分组列。最低顺序 bit 对应于参数 N,第 N-1 个最低顺序 bit 对应于参数 1。

https://docs.microsoft.com/zh-cn/previous-versions/sql/sql-server-2008-r2/bb510624(v=sql.105)

数据库中会员订单的数据映射的对象如下:

insert into orderTempTable 
values ("深圳","钻石会员","钻石会员1个月",25), 
("深圳","钻石会员","钻石会员1个月",25), 
("深圳","钻石会员","钻石会员3个月",70), 
("深圳","钻石会员","钻石会员12个月",300), 
("深圳","铂金会员","铂金会员3个月",60), 
("深圳","铂金会员","铂金会员3个月",60), 
("深圳","铂金会员","铂金会员6个月",120), 
("深圳","黄金会员","黄金会员1个月",15), 
("深圳","黄金会员","黄金会员1个月",15), 
("深圳","黄金会员","黄金会员3个月",45), 
("深圳","黄金会员","黄金会员12个月",180), 
("北京","钻石会员","钻石会员1个月",25), 
("北京","钻石会员","钻石会员1个月",25), 
("北京","铂金会员","铂金会员3个月",60), 
("北京","黄金会员","黄金会员3个月",45), 
("上海","钻石会员","钻石会员1个月",25), 
("上海","钻石会员","钻石会员1个月",25), 
("上海","铂金会员","铂金会员3个月",60), 
("上海","黄金会员","黄金会员3个月",45)

接下来我们通过操作 orderTempTable 来看一下grouping sets、group by、rollup和cube具体如何使用。

1.group by

group by是SELECT语句的从句,用来指定查询分组条件,主要用来对查询的结果进行分组,相同组合的分组条件在结果集中只显示一行记录。使用group by从句时候,通过添加聚合函数(主要有COUNT()、SUM、MAX()、MIN()等)可以使数据聚合。

select area,memberType,product,sum(price) as total from orderTempTable group by area,memberType,product

执行结果如下,可以看到结果中按照area,memberType,product来做聚合sum操作。

+-----+------------+----------+------+
|area |memberType |product |total |
+-----+------------+----------+------+
|深圳 |铂金会员 |铂金会员6个月 |120|
|深圳 |黄金会员 |黄金会员12个月|180|
|深圳 |钻石会员 |钻石会员3个月 |70 |
|深圳 |黄金会员 |黄金会员3个月 |45 |
|深圳 |钻石会员 |钻石会员12个月|300|
|北京 |黄金会员 |黄金会员3个月 |45 |
|深圳 |钻石会员 |钻石会员1个月 |50 |
|深圳 |黄金会员 |黄金会员1个月 |30 |
|深圳 |铂金会员 |铂金会员3个月 |120|
|北京 |钻石会员 |钻石会员1个月 |50 |
|北京 |铂金会员 |铂金会员3个月 |60 |
|上海 |黄金会员 |黄金会员3个月 |45 |
|上海 |钻石会员 |钻石会员1个月 |50 |
|上海 |铂金会员 |铂金会员3个月 |60 |
+----+----------+--------+---+

2.grouping sets

grouping sets是group by子句更进一步的扩展, 它让你能够定义多个数据分组。这样做使聚合更容易, 并且因此使得多维数据分析更容易。

使用grouping sets在同一查询中定义多个分组

select area,memberType,product,sum(price) as total 
from orderTempTable 
group by area,memberType,product 
grouping sets(area,memberType,product)

grouping set相关于多个group by 用union all 拼接起来

上面的语句输出结果如下,可以看到使用grouping sets(area,memberType,product)会分别对这3个维度进行group by,也可以grouping sets ((area,memberType),(area,product)))此时相当于group by (area,memberType) union group by (area,product),也就是说grouping sets 后面可以指定你想要的各种维度组合。

+----+----------+--------+-----+ 
|area|memberType|product |total| 
+----+----------+--------+-----+ 
|null|null |铂金会员3个月 |240 | 
|null|铂金会员 |null |360 | 
|上海 |null |null |155 | 
|null|钻石会员 |null |520 | 
|null|null |钻石会员12个月|300 | 
|null|null |黄金会员12个月|180 | 
|null|null |钻石会员3个月 |70 | 
|null|null |黄金会员3个月 |135 | 
|深圳 |null |null |915 | 
|null|null |钻石会员1个月 |150 | 
|null|null |黄金会员1个月 |30 | 
|null|黄金会员 |null |345 | 
|北京 |null |null |155 | 
|null|null |铂金会员6个月 |120 | 
+----+----------+--------+-----+

3.rollup

rollup 是根据维度在数据结果集中进行的聚合操作。

group by A,B,C with rollup首先会对(A、B、C)进行group by,然后对(A、B)进行group by,然后是(A)进行group by,最后对全表进行group by操作。

group by A,B,C with rollup等价于grouping sets((A,B,C),(A,B),(A),())

select area,memberType,product,sum(price) as total from orderTempTable group by area,memberType,product with rollup 
-- 或者 
select area,memberType,product,sum(price) as total from orderTempTable group by rollup(area,memberType,product)

输出结果中,可以group by A,B,C with rollup,的确是上述几种group by的并集。

+----+----------+--------+-----+ 
|area|memberType|product |total| 
+----+----------+--------+-----+ 
|上海 |null |null |155 | 
|北京 |铂金会员 |null |60 | 
|北京 |钻石会员 |null |50 | 
|上海 |钻石会员 |钻石会员1个月 |50 | 
|深圳 |黄金会员 |黄金会员1个月 |30 | 
|深圳 |钻石会员 |钻石会员12个月|300 | 
|北京 |黄金会员 |黄金会员3个月 |45 | 
|深圳 |钻石会员 |钻石会员3个月 |70 | 
|北京 |铂金会员 |铂金会员3个月 |60 | 
|上海 |铂金会员 |null |60 | 
|上海 |钻石会员 |null |50 | 
|深圳 |黄金会员 |null |255 | 
|深圳 |null |null |915 | 
|上海 |黄金会员 |黄金会员3个月 |45 | 
|深圳 |铂金会员 |铂金会员3个月 |120 | 
|深圳 |钻石会员 |钻石会员1个月 |50 | 
|上海 |铂金会员 |铂金会员3个月 |60 | 
|北京 |黄金会员 |null |45 | 
|深圳 |铂金会员 |null |240 | 
|null|null |null |1225 | 
|深圳 |钻石会员 |null |420 | 
|北京 |null |null |155 | 
|北京 |钻石会员 |钻石会员1个月 |50 | 
|深圳 |黄金会员 |黄金会员12个月|180 | 
|深圳 |铂金会员 |铂金会员6个月 |120 | 
|深圳 |黄金会员 |黄金会员3个月 |45 | 
|上海 |黄金会员 |null |45 | 
+----+----------+--------+-----+

4.cube

group by A,B,C with cube,则首先会对(A、B、C)进行group by,然后依次是(A、B),(A、C),(A),(B、C),(B),( C),最后对全表进行group by操作。

select area,memberType,product,sum(price) as total from orderTempTable group by area,memberType,product with cube 
-- 或者 
select area,memberType,product,sum(price) as total from orderTempTable group by cube(area,memberType,product)
+----+----------+--------+-----+ 
|area|memberType|product |total| 
+----+----------+--------+-----+ 
|深圳 |null |黄金会员12个月|180 | 
|深圳 |null |钻石会员3个月 |70 | 
|深圳 |null |黄金会员3个月 |45 | 
|null|null |铂金会员3个月 |240 | 
|北京 |null |铂金会员3个月 |60 | 
|null|铂金会员 |null |360 | 
|上海 |null |null |155 | 
|北京 |铂金会员 |null |60 | 
|null|钻石会员 |null |520 | 
|北京 |钻石会员 |null |50 | 
|上海 |钻石会员 |钻石会员1个月 |50 | 
|深圳 |黄金会员 |黄金会员1个月 |30 | 
|null|null |钻石会员12个月|300 | 
|深圳 |钻石会员 |钻石会员12个月|300 | 
|null|黄金会员 |黄金会员12个月|180 | 
|null|铂金会员 |铂金会员6个月 |120 | 
|null|黄金会员 |黄金会员3个月 |135 | 
|深圳 |null |钻石会员1个月 |50 | 
|深圳 |null |黄金会员1个月 |30 | 
|北京 |黄金会员 |黄金会员3个月 |45 | 
|null|null |黄金会员12个月|180 | 
|上海 |null |铂金会员3个月 |60 | 
|null|null |钻石会员3个月 |70 | 
|深圳 |钻石会员 |钻石会员3个月 |70 | 
|null|null |黄金会员3个月 |135 | 
|北京 |铂金会员 |铂金会员3个月 |60 | 
|北京 |null |黄金会员3个月 |45 | 
|上海 |铂金会员 |null |60 | 
|上海 |钻石会员 |null |50 | 
|深圳 |黄金会员 |null |255 | 
|null|黄金会员 |黄金会员1个月 |30 | 
|深圳 |null |null |915 | 
|null|钻石会员 |钻石会员12个月|300 | 
|上海 |黄金会员 |黄金会员3个月 |45 | 
|深圳 |铂金会员 |铂金会员3个月 |120 | 
|null|null |钻石会员1个月 |150 | 
|深圳 |钻石会员 |钻石会员1个月 |50 | 
|null|null |黄金会员1个月 |30 | 
|北京 |null |钻石会员1个月 |50 | 
|上海 |铂金会员 |铂金会员3个月 |60 | 
|上海 |null |黄金会员3个月 |45 | 
|null|钻石会员 |钻石会员3个月 |70 | 
|深圳 |null |铂金会员6个月 |120 | 
|null|黄金会员 |null |345 | 
|北京 |黄金会员 |null |45 | 
|深圳 |null |铂金会员3个月 |120 | 
|深圳 |铂金会员 |null |240 | 
|null|null |null |1225 | 
|深圳 |钻石会员 |null |420 | 
|北京 |null |null |155 | 
|null|铂金会员 |铂金会员3个月 |240 | 
|上海 |null |钻石会员1个月 |50 | 
|null|钻石会员 |钻石会员1个月 |150 | 
|深圳 |null |钻石会员12个月|300 | 
|北京 |钻石会员 |钻石会员1个月 |50 | 
|深圳 |黄金会员 |黄金会员12个月|180 | 
|深圳 |铂金会员 |铂金会员6个月 |120 | 
|深圳 |黄金会员 |黄金会员3个月 |45 | 
|null|null |铂金会员6个月 |120 | 
|上海 |黄金会员 |null |45 | 
+----+----------+--------+-----+

grouping的使用,当维度中有null值是,和汇聚结果的null值分不开,可以使用grouping()来处理:

create table temp_city as 
select 'beijing' city,100 as cnt 
union all 
select 'beijing' city,200 as cnt 
union all 
select 'shanghai' city,1000 as cnt 
union all 
select 'shanghai' city,2000 as cnt 
union all 
select null as city,10000 as cnt 
;
-------------
select IF(GROUPING(city)=0,city,'all') as city
    ,sum(cnt)as cnt 
from temp_city
group by cube(city)
;
-------结果------
city	cnt
\N	10000
all	12300
beijing	300
shanghai	2000

 

5.应用

经过group by、grouping sets、rollup和cube处理过后的数据可以直接存储到MySQL等数据库中。

参考链接:

https://www.jianshu.com/p/45cf609f5a61

https://www.cnblogs.com/springy/p/9375236.html

https://blog.csdn.net/BingoXing/article/details/81513891

https://my.oschina.net/u/2000675/blog/2050360

grouping_id : https://docs.microsoft.com/zh-cn/previous-versions/sql/sql-server-2008-r2/bb510624(v=sql.105)

  • 5
    点赞
  • 24
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值