很多同学可能对窗口函数,但是他确实已经出现在了sqlserver,oracle,mysql 等关系型数据库中
以及hive ,presto等数仓中。使用 窗口函数确实能够为我们解决很多sql难题。
下面我通过一些经典的sql面试题,讲解如何通过窗口函数,解决。
1.用SQL选出每个人成绩的最高的前两条纪录
原表记录:
zs,math,75
zs,chinese,81
lisi,chinese,76
lisi,math,89
wangwu,math,100
wangwu,chinese,81
wangwu,english,99
zhaoliu,math,16
zhaoliu,english,48
zhaoliu,chinese,59
需要对name进行分组,对成绩进行排名,所以考虑使用窗口函数,目前mysql 8.0支持窗口函数,该操作是在hive上执行的查询结果:
select * from (select *, rank() over(partition by name order by fenshu desc) as flag from studentss)t where t.flag<=2;
lisi math 89 1
lisi chinese 76 2
wangwu math 100 1
wangwu english 99 2
zhaoliu chinese 59 1
zhaoliu english 48 2
zs chinese 81 1
zs math 75 2
2.查询成绩表中每一科成绩最高的分数以及这个学生的名字,学科名,面试的时候碰到的问题
select * from (select *, rank() over(partition by kecheng order by fenshu desc) as flag from studentss)t where t.flag<=1;
wangwu chinese 81 1
zs chinese 81 1
wangwu english 99 1
wangwu math 100 1
3.学生成绩表,把每科最高分前两名统计出来学生成绩表,把每科最高分前两名统计出来
select * from (select *, rank() over(partition by kecheng order by fenshu desc) as flag from studentss)t where t.flag<=2;
wangwu chinese 81 1
zs chinese 81 1
wangwu english 99 1
zhaoliu english 48 2
wangwu math 100 1
lisi math 89 2
4.写一条sql删除订单表中重复的记录
sql中常规做法:
delete from studentss where bianhao not in (
select t.no from (
select min(bianhao) no from studentss group by xuehao,name,kechenghao,kecheng,fenshu
)t
)
通过窗口函数的做法:
select * from (select *, ROW_NUMBER() over(partition by name,kecheng,fenshu) as flag from studentss)t where flag>1;
5.一张网页浏览信息表,有两列,一列是网页ip,一列是浏览网页的用户(比如a或者b、c、d直到z),求这些网页被a和b或者a和c或者b和c两两组合访问的次数
这个题目源于,有一张team表,表种有a,b,c,d 四个队伍,求他们之间两两进行白赛的对阵信息。
原表:
a
b
c
d
select * from a,b where a.name>b.name;
b a
c a
d a
c b
d b
d c
6.找出单科成绩高于该科平均成绩的同学名单(无论该学生有多少科,只要有一科满足即可)
我的做法应该是比较笨的做法,
select distinct name from students left join (
select kecheng,avg(fenshu) as avg_score from students group by kecheng
) tmp on students.kecheng=tmp.kecheng where students.fenshu>tmp.avg_score;
下面是使用窗口函数 解决上面的问题:
step1 : 通过窗口函数,对课程分组, 求课程的平均成绩:
select * from (select *, AVG(fenshu) over(partition by kecheng) as flag from studentss)t;
zhaoliu chinese 59 74.25
wangwu chinese 81 74.25
lisi chinese 76 74.25
zs chinese 81 74.25
zhaoliu english 48 73.5
wangwu english 99 73.5
zhaoliu math 16 70.0
wangwu math 100 70.0
lisi math 89 70.0
zs math 75 70.0
step2:过滤出 成绩大于平均成绩的学生
select * from (select *, AVG(fenshu) over(partition by kecheng) as flag from studentss)t where fenshu >t.flag;
wangwu chinese 81 74.25
lisi chinese 76 74.25
zs chinese 81 74.25
wangwu english 99 73.5
wangwu math 100 70.0
lisi math 89 70.0
zs math 75 70.0
7.找出单科成绩高于该科平均成绩的同学名单(该学生所有科都必须满足)
select distinct name from students where name not in(
select distinct name from students left join (
select kecheng,avg(fenshu) as avg_score from students group by kecheng
) tmp on students.kecheng=tmp.kecheng where students.fenshu<tmp.avg_score
);
窗口函数(因为hive是不支持子查询的,所以这里用left outer join 替代)
select distinct a.name from studentss a left outer join (select * from (select *, AVG(fenshu) over(partition by kecheng) as flag from studentss)t where t.flag<=fenshu) b on a.name = b.name and a.kecheng=b.kecheng and a.fenshu=b.fenshu and b.fenshu is not null;
未完待续
之后还会补充。。。。