hive窗口函数使用

hive窗口函数基本面试面sql必考项目,这里画一下重点猴,那现在就操练起来~
窗口函数格式:

function_name(field) over (partition by field order by field rows between unbounded preceding and current row)

窗口说明:
n PRECEDING:往前n行数据
n FOLLOWING:往后n行数据
CURRENT ROW:当前行
UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点 UNBOUNDED FOLLOWING:表示到后面的终点
默认是自起点到当前行

数据准备:

create table study_test
(
student_id bigint comment '学生',
class_name string comment '课程名称',
score double comment '成绩'
);
insert into table study_test values(1,'语文',66.6)
,(1,'数学',33.6) ,(1,'英语',55.6) 
,(2,'语文',66.6)
,(2,'数学',88.6) ,(2,'英语',35.6) 
,(3,'语文',46.6)
,(3,'数学',58.6) ,(3,'英语',95.6) 
,(4,'语文',96.6)
,(4,'数学',98.6) ,(4,'英语',85.6)
; 

1.常用的排名函数 row_number,rank,dense_rank
1.1 row_number(),排名从1开始增加,无重复并列情况
1.2 rank(),有并列,排名相等会在名次中留下空位
1.3 dense_rank(),有并列,排名相等会在名次中不留空位
测试case:

select 
*,
row_number()over(partition by class_name order by score desc) rnk_1,
rank()over(partition by class_name order by score desc) rnk_1,
dense_rank()over(partition by class_name order by score desc) rnk_1
from study_test

结果:


8940211-8cf78115bcd7ff47.png
排序名次

2.sum,avg,min,max等窗口函数
测试case:

select 
*,
row_number()over(partition by class_name order by score desc) rnk , --排名
sum(score)over(partition by class_name order by score desc) as sum_all, -- 分组所有行
sum(score)over(partition by class_name order by score desc) as sum_1, --默认起点到当前行
sum(score)over(partition by class_name order by score desc rows between unbounded preceding and current row)  as sum_2, --同上
sum(score)over(partition by class_name order by score desc rows between 1 preceding and current row) as sum_3, -- 往前一行到当前行
sum(score)over(partition by class_name order by score desc rows between current row and unbounded following) as sum_4, -- 当前行到往后所有行
sum(score)over(partition by class_name order by score desc rows between 1 preceding and 1 following ) as sum_5 -- 往前一行+当前行+往后一行
from study_test
where class_name='数学'

结果:

student_nameclass_namescorernksum_allsum_1sum_2sum_3sum_4sum_5
小小数学98.6198.698.698.698.6279.4187.2
小白数学88.62187.2187.2187.2187.2180.8245.8
小花数学58.63245.8245.8245.8147.292.2180.8
小明数学33.64279.4279.4279.492.233.692.2

3.lead(),lag()函数,first_value(),last_value()

lead(col,n,default) 用于统计窗口内往下第n行值:
第一个参数为列名;
第二个参数为往下第n行(可选,默认为1);
第三个参数为默认值(当往下第n行为NULL时候,取默认值,如不指定,则为NULL);
同理,lag(col,n,default) 用于统计窗口内往上第n行值。
first_value:取分组内排序后,截止到当前行,第一个值
last_value:取分组内排序后,截止到当前行,最后一个值

测试 case:

select 
*,
row_number()over(partition by class_name order by score desc) rnk , --排名
lead(score,1,0.999)over(partition by class_name order by score desc) as lead_1, -- 往后一行,默认0.999
lead(score,1)over(partition by class_name order by score desc) as lead_2, -- 往后一行,默认null
lag(score,1,999.999)over(partition by class_name order by score desc) as lag_1, -- 往前一行,默认999.999
lag(score,1)over(partition by class_name order by score desc) as lag_2, -- 往前一行,默认null
first_value(score)over(partition by class_name order by score desc) as first_data,
last_value(score)over(partition by class_name order by score desc) as last_data,
last_value(score)over(partition by class_name) as last_data_2
from study_test
where class_name='数学'

测试结果:


8940211-7fccfac0a6803f9d.png

8940211-3703b0ae096473a2.png
任乌拉
  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值