HQL max+struct实现开窗功能
一、需求
这里有一张班级分数score表:
class | name | score |
---|---|---|
1 | 11 | 88 |
1 | 12 | 92 |
1 | 13 | 99 |
2 | 21 | 66 |
2 | 22 | 98 |
2 | 23 | 89 |
要求查找每个班级分数最高的人及其分数,输出如下:
class | name | score |
---|---|---|
1 | 13 | 99 |
2 | 22 | 98 |
二、实现
1.使用开窗函数实现
select class
, name
, score
from (select class
, name
, score
, row_number() over (partition by class order by score desc) rn
from score
) t
where rn = 1
说明:该方法为开窗函数的基本用法。
2.使用max+struct实现
step1
select class
, max(struct(score, name)) as tmp
from score
group by class
输出:
class | tmp |
---|---|
1 | {“col1”:99,“col2”:“13”} |
2 | {“col1”:98,“col2”:“22”} |
说明:struct里的字段,按顺序比较,若第一个字段(score)的值相等,则比较第二个字段(name),保留最大的一条数据。tmp的类型为struct。
step2
select class
, tmp.col2 as name
, tmp.col1 as score
from (
select class
, max(struct(score, name)) as tmp
from score
group by class
) t
说明:tmp的数据类型为struct,第一个字段(score)在结构体里的别名为col1,所以外层的select对tmp取值时的写法为tmp.co1,其他字段以此类推。
3.两种方法比较
优势一:
减少了排序或关联的操作,有更好的运行效率。
优势二
代码更简洁。尤其是需要同时取最大和最小记录的时候。
如需求改为求每个班的分数最大者和分数最小者,输出如下:
class | name_max | score_max | name_min | score_min |
---|---|---|---|---|
1 | 13 | 99 | 11 | 88 |
2 | 22 | 98 | 21 | 66 |
(1) 使用传统开窗函数
select class
, max(if(rn_max = 1, name, '')) as name_max
, max(if(rn_max = 1, score, 0)) as score_max
, max(if(rn_min = 1, name, '')) as name_min
, max(if(rn_min = 1, score, 0)) as score_min
from (select class
, name
, score
, row_number() over (partition by class order by score desc) rn_max
, row_number() over (partition by class order by score) rn_min
from score
) t
where rn_max = 1
or rn_min = 1
group by class
(2) 使用max/min+struct方法
select class
, tmp_max.col2 as name_max
, tmp_max.col1 as score_max
, tmp_min.col2 as name_min
, tmp_min.col1 as score_min
from (
select class
, max(struct(score, name)) as tmp_max
, min(struct(score, name)) as tmp_min
from score
group by class
) t
三、总结
提供了一种取最大/最小记录的新思路