group by 分组汇总

目录

group by:

grouping sets():

group by ...with rollup:

with cube:

rollup()


group by:

数据分析经常会使用到分组汇总的方法。通过汇总的方法可以更快速地看到数据的特征。

例如查看男女生的比例

spark/presto/hive/mysql:

select
    student_sex,
    count(distinct student_id) as stu_num
    from student_info
group by student_sex

或者

spark/presto/hive/mysql:

select
    student_sex,
    count(distinct student_id) as stu_num
    from student_info 
group by 1

又如想查看男女生不同年龄阶段的学生数

spark/presto/hive/mysql:

select
    student_sex,
    student_age,
    count(distinct student_id) as stu_num
    from student_info 
group by student_sex,student_age

此时计算逻辑是想对学生性别进行分类,再对学生年龄分类

想查看男女生不同年龄阶段的年龄最大和最小分别是多少

spark/presto/hive/mysql:

select
    student_sex,
    max(student_age) as max_age,
    min(student_age) as min_age
    from student_info 
group by student_sex

grouping sets():

grouping sets()分层聚合

如果你不仅想看各维度的分组求和,还想看总的求和,那么久可以用grouping sets()

spark:

select
student_sex,
student_age,
count(distinct student_id) as stu_num
from student_info
group by  
grouping sets((student_sex,student_age))


--相当于
/*
select
student_sex,
student_age,
count(distinct student_id) as stu_num
from student_info 
group by student_sex,student_age

union all

select
null as student_sex,
null as student_age,
count(distinct student_id) as stu_num
from student_info 
*/

group by ...with rollup:

group by ...with rollup 不仅可以对字段汇总,并且还可以聚合

例如我们想看到学生的不同性别的人数以及不同性别不同年龄的

spark:

select
student_sex,
student_age,
count(distinct student_id) as stu_num
from student_info 
group by student_sex,student_age with rollup


/*相当于使用grouping by 
student_sex,
student_age,
count(distinct student_id) as stu_num
from student_info 
grouping by ((student_sex,student_age),(student_sex))
*/


/* 也相当于我们这么写
student_sex,
student_age,
count(distinct student_id) as stu_num
from student_info 
group by student_sex,student_age

union all

select
student_sex,
null as student_age,
count(distinct student_id) as stu_num
from student_info 
group by student_sex

union all

select
null student_sex,
null as student_age,
count(distinct student_id) as stu_num
from student_info 
*/



with cube:

spark:

select
student_sex,
student_age,
count(distinct student_id) as stu_num
from student_info 
group by student_sex,student_age with cube

/*相当于
select
student_sex,
student_age,
count(distinct student_id) as stu_num
from student_info 
group by student_sex,student_age with rollup

union all

select
null as student_sex,
student_age,
count(distinct student_id) as stu_num
from student_info 
group by student_age

with cubewith rollup 的区别在于 with cube 比with rollup 在多个字段汇总时  数据条数更多;

因为with rollup 是跟接近的字段往后结和汇总的,但是with cube 相当于是排列组合的方式,将所有结果一一汇总出来。

rollup()

presto:

select
day,
count(distinct stu_id) as stu_num
group by
    rollup (day)

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值