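Example: find the top-scoring student in each subject for each class.
Sample data:
This is a typical group-wise top-N problem: partition the rows into groups, sort within each group, and keep the first N rows. The rank function can solve it. Its syntax is: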
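rank() over([partition by col1] order by col2) : rows with equal values receive the same rank, and the ranks that follow skip ahead, leaving gaps, e.g. 1,1,3,4,5,5,7
Two related functions share the same syntax:
dense_rank() over([partition by col1] order by col2) : rows with equal values receive the same rank, but the ranks that follow do not skip, e.g. 1,1,2,3,4,4,5
row_number() over([partition by col1] order by col2) : rows with equal values still receive different numbers; every row is numbered consecutively in sort order, e.g. 1,2,3,4,5,6,7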
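The difference between the three functions can be sketched in a few lines of plain Python. This is only an illustration of the numbering rules described above, not how Hive implements them; the helper names and the score list are hypothetical, and the list is assumed to be already sorted the way "order by score desc" would sort it.

```python
def row_number(values):
    # Every row gets its own consecutive number, ties included.
    return list(range(1, len(values) + 1))


def rank(values):
    # Tied rows share a number; the next distinct value takes its
    # 1-based position, which leaves gaps after a tie.
    result = []
    for i, v in enumerate(values):
        if i > 0 and v == values[i - 1]:
            result.append(result[-1])
        else:
            result.append(i + 1)
    return result


def dense_rank(values):
    # Tied rows share a number; the next distinct value is
    # exactly one higher, so there are no gaps.
    result = []
    for i, v in enumerate(values):
        if i == 0:
            result.append(1)
        elif v == values[i - 1]:
            result.append(result[-1])
        else:
            result.append(result[-1] + 1)
    return result


# Hypothetical scores, already sorted in descending order.
scores = [98, 98, 95, 90, 88, 88, 80]

print(rank(scores))        # [1, 1, 3, 4, 5, 5, 7]
print(dense_rank(scores))  # [1, 1, 2, 3, 4, 4, 5]
print(row_number(scores))  # [1, 2, 3, 4, 5, 6, 7]
```

For a "first place" query, rank() and dense_rank() both return every student tied for the top score, while row_number() returns exactly one row per group and breaks ties arbitrarily.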
Create the table:
create external table score (
  name string,
  subject string,
  score string,
  class string
)
row format delimited fields terminated by ' '
location '/score';
HQL:
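select name, subject, score, class
from (
  -- score is declared as string, so cast it to sort numerically rather than lexicographically
  select name, subject, score, class,
         rank() over (partition by class, subject order by cast(score as double) desc) as rk
  from score
) score_tmp
where rk = 1;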
Result: