HiveSQL: Average Score and Rank of Students Who Failed More Than 2 Courses

Note: reference article:

"SQL: Average Score and Rank of Students Who Failed More Than 2 Courses - HQL Interview Question 47 [Pinduoduo]" (CSDN blog) https://blog.csdn.net/godlovedaniel/article/details/119858725

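0 Problem Description

   Find the average score, and the rank of that average, for every student who failed more than 2 courses. (A score below 60 counts as a fail.)

1 Data Preparation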

create table scores
(
  sid int,
  score int,
  cid int
)row format delimited
fields terminated by '\t';

insert into scores values
(1, 90, 1),
(1, 59, 2),
(1, 67, 3),
(2, 20, 1),
(2, 30, 2),
(2, 40, 3),
(3, 14, 1),
(3, 13, 2),
(3, 15, 3),
(4, 90, 1),
(4, 90, 2),
(4, 87, 3);

2 Data Analysis

The complete code is as follows:

select
    t3.sid,
    t3.avg_score,
    t3.dr
from (select 
          sid,
          avg_score,
          row_number() over (order by avg_score desc) dr
      from (select
                sid,
                avg(score) as avg_score 
            from scores group by sid) t2) t3
 join (select
              sid
        from scores
        group by sid
        having sum(if(score < 60, 1, 0)) > 2) t1
  on t3.sid = t1.sid;

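Code walkthrough:

Step 1: Flag the data by marking every score below 60 with 1, then keep the students whose failed-course count is greater than 2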

select
    sid
from (select *,
             if(score < 60, 1, 0) as flag
      from scores) t1
group by sid
having sum(flag) > 2;

A simplified version:

select sid
from scores
 group by sid
 having sum(if(score < 60, 1, 0)) > 2;
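
To see what the HAVING clause is actually testing, the step can be replayed outside Hive. The sketch below is only an approximation: it assumes SQLite (through Python's sqlite3 module) as a stand-in engine and swaps Hive's if() for a portable CASE expression. It uses the "more than 2" threshold from the problem statement.

```python
import sqlite3

# Same rows as the article's INSERT statement: (sid, score, cid)
rows = [
    (1, 90, 1), (1, 59, 2), (1, 67, 3),
    (2, 20, 1), (2, 30, 2), (2, 40, 3),
    (3, 14, 1), (3, 13, 2), (3, 15, 3),
    (4, 90, 1), (4, 90, 2), (4, 87, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table scores (sid int, score int, cid int)")
conn.executemany("insert into scores values (?, ?, ?)", rows)

# Failed-course count per student: the value the HAVING clause compares.
failed_counts = conn.execute("""
    select sid, sum(case when score < 60 then 1 else 0 end) as failed
    from scores
    group by sid
    order by sid
""").fetchall()

# Step 1 itself: keep only students with more than 2 failed courses.
failing_sids = [sid for (sid,) in conn.execute("""
    select sid
    from scores
    group by sid
    having sum(case when score < 60 then 1 else 0 end) > 2
    order by sid
""")]

print(failed_counts)  # [(1, 1), (2, 3), (3, 3), (4, 0)]
print(failing_sids)   # [2, 3]
```

Nobody in the sample data fails exactly 2 courses, so "> 2" and ">= 2" happen to select the same students here; the two conditions would diverge on other data.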

Step 2: Compute each student's average score and its rank

select 
     sid,
     avg_score,
     row_number() over (order by avg_score desc) dr
from (select
           sid,
           avg(score) as avg_score
      from scores group by sid) t2;

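PS: The ranking here is over avg_score. Because the inner query groups by sid, each sid appears exactly once, so row_number() is enough and no distinct is needed. If avg_score were instead computed with a window function, avg(score) over (partition by sid), each sid would repeat once per course; in that case the ranking would have to use dense_rank, followed by distinct to remove the duplicate rows. Also note that row_number() gives tied averages different ranks; use dense_rank() if ties should share a rank.

Step 3: Join the result of step 2 with the result of step 1 to filter down to the final result. The final SQL is as follows: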
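
The difference between ranking grouped rows and ranking windowed rows can be checked with a small experiment. This is a sketch under stated assumptions, not the article's code: it runs on SQLite 3.25 or newer via Python's sqlite3 module, and Hive may treat select distinct combined with window functions differently depending on version.

```python
import sqlite3

# Same rows as the article's INSERT statement: (sid, score, cid)
rows = [
    (1, 90, 1), (1, 59, 2), (1, 67, 3),
    (2, 20, 1), (2, 30, 2), (2, 40, 3),
    (3, 14, 1), (3, 13, 2), (3, 15, 3),
    (4, 90, 1), (4, 90, 2), (4, 87, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table scores (sid int, score int, cid int)")
conn.executemany("insert into scores values (?, ?, ?)", rows)

# Variant A: GROUP BY first, so each sid appears once before ranking.
grouped = conn.execute("""
    select sid, avg_score,
           row_number() over (order by avg_score desc) as rn
    from (select sid, avg(score) as avg_score
          from scores group by sid) t
    order by rn
""").fetchall()

# Variant B: a windowed average keeps all 12 rows, so the rank must be
# dense_rank() and the repeated rows are collapsed with DISTINCT.
windowed = conn.execute("""
    select distinct sid, avg_score,
           dense_rank() over (order by avg_score desc) as dr
    from (select sid,
                 avg(score) over (partition by sid) as avg_score
          from scores) t
    order by dr
""").fetchall()

print(grouped)   # [(4, 89.0, 1), (1, 72.0, 2), (2, 30.0, 3), (3, 14.0, 4)]
print(windowed)  # same rows
```

Both variants agree on this data. Using row_number() in variant B would instead number the 12 windowed rows from 1 to 12, which is why that variant needs dense_rank().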

select
    t3.sid,
    t3.avg_score,
    t3.dr
from (select 
          sid,
          avg_score,
          row_number() over (order by avg_score desc) dr
      from (select
                sid,
                avg(score) as avg_score 
            from scores group by sid) t2) t3
 join (select
              sid
        from scores
        group by sid
        having sum(if(score < 60, 1, 0)) > 2) t1
  on t3.sid = t1.sid;
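
The article does not list the expected output, so the sketch below replays the final query on the sample data. It assumes SQLite 3.25 or newer (via Python's sqlite3 module) rather than Hive, replaces if() with CASE, uses the "more than 2" threshold from the problem statement, and adds an ORDER BY only to make the printed order deterministic.

```python
import sqlite3

# Same rows as the article's INSERT statement: (sid, score, cid)
rows = [
    (1, 90, 1), (1, 59, 2), (1, 67, 3),
    (2, 20, 1), (2, 30, 2), (2, 40, 3),
    (3, 14, 1), (3, 13, 2), (3, 15, 3),
    (4, 90, 1), (4, 90, 2), (4, 87, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table scores (sid int, score int, cid int)")
conn.executemany("insert into scores values (?, ?, ?)", rows)

result = conn.execute("""
    select t3.sid, t3.avg_score, t3.dr
    from (select sid, avg_score,
                 row_number() over (order by avg_score desc) as dr
          from (select sid, avg(score) as avg_score
                from scores group by sid) t2) t3
    join (select sid
          from scores
          group by sid
          having sum(case when score < 60 then 1 else 0 end) > 2) t1
      on t3.sid = t1.sid
    order by t3.dr
""").fetchall()

print(result)  # [(2, 30.0, 3), (3, 14.0, 4)]
```

The ranks come out as 3 and 4 rather than 1 and 2 because the ranking is computed over all four students before the join removes the ones who passed.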

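3 Summary

   This example mainly exercises window functions and joins between derived tables. Note that older versions of Hive (before 0.13) do not support IN subqueries, so a join or a similar construct is used in their place.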
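
For comparison, on an engine that does accept IN subqueries the same filter can be written without the join. The sketch below is an assumption-laden illustration: it runs on SQLite 3.25 or newer through Python's sqlite3 module, not on Hive, and uses CASE in place of Hive's if().

```python
import sqlite3

# Same rows as the article's INSERT statement: (sid, score, cid)
rows = [
    (1, 90, 1), (1, 59, 2), (1, 67, 3),
    (2, 20, 1), (2, 30, 2), (2, 40, 3),
    (3, 14, 1), (3, 13, 2), (3, 15, 3),
    (4, 90, 1), (4, 90, 2), (4, 87, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table scores (sid int, score int, cid int)")
conn.executemany("insert into scores values (?, ?, ?)", rows)

# Rank all students first, then filter with an IN subquery instead of a join.
in_variant = conn.execute("""
    select sid, avg_score, dr
    from (select sid, avg_score,
                 row_number() over (order by avg_score desc) as dr
          from (select sid, avg(score) as avg_score
                from scores group by sid) t2) t3
    where sid in (select sid
                  from scores
                  group by sid
                  having sum(case when score < 60 then 1 else 0 end) > 2)
    order by dr
""").fetchall()

print(in_variant)  # [(2, 30.0, 3), (3, 14.0, 4)]
```

The filter is applied outside the ranked subquery in both forms, so the ranks are still computed over all students and the result matches the join version.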
