Hive的开窗函数

一、窗口函数 ROW_NUMBER,RANK,DENSE_RANK
1、数据准备
 
cookie1,2021-06-10,1
 
cookie1,2021-06-11,5
 
cookie1,2021-06-12,7
 
cookie1,2021-06-13,3
 
cookie1,2021-06-14,2
 
cookie1,2021-06-15,4
 
cookie1,2021-06-16,4
 
cookie2,2021-06-10,2
 
cookie2,2021-06-11,3
 
cookie2,2021-06-12,5
 
cookie2,2021-06-13,6
 
cookie2,2021-06-14,3
 
cookie2,2021-06-15,9
 
cookie2,2021-06-16,7
 
 
 
CREATE TABLE it_t1 (
 
cookieid string,
 
createtime string,   --day
 
pv INT
 
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 
  
 
-- 加载数据:
 
load data local inpath '/export/data/hivedatas/it_t2.txt' into table it_t1;
​​​​​​​2、ROW_NUMBER
ROW_NUMBER()  从1开始,按照顺序,生成分组内记录的序列

SELECT 
 
  cookieid,
 
  createtime,
 
  pv,
 
  ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn
 
  FROM it_t1;
​​​​​​​3、RANK 和 DENSE_RANK
RANK() 生成数据项在分组中的排名,排名相等会在名次中留下空位

DENSE_RANK() 生成数据项在分组中的排名,排名相等会在名次中不会留下空位

SELECT 
 
cookieid,
 
createtime,
 
pv,
 
RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn1,
 
DENSE_RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn2,
 
ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn3
 
FROM it_t1
 
WHERE cookieid = 'cookie1';
​​​​​​​二、Hive分析窗口函数 SUM,AVG,MIN,MAX
​​​​​​​1、数据准备
--建表语句:
 
create table it_t2(
 
cookieid string,
 
createtime string,   --day
 
pv int
 
) row format delimited
 
fields terminated by ',';
 
 
 
--加载数据:
 
load data local inpath '/root/hivedata/ it_t2.txt' into table it_t2;
 
 
 
--开启智能本地模式
 
SET hive.exec.mode.local.auto=true;
​​​​​​​2、SUM(结果和ORDER BY相关,默认为升序)
select cookieid,createtime,pv,
 
sum(pv) over(partition by cookieid order by createtime) as pv1
 
from it_t2;
 
 
 
select cookieid,createtime,pv,
 
sum(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
 
from it_t2;
 
 
 
select cookieid,createtime,pv,
 
sum(pv) over(partition by cookieid) as pv3
 
from it_t2;  --如果没有order  by排序语句  默认把分组内的所有数据进行sum操作
 
 
 
select cookieid,createtime,pv,
 
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as pv4
 
from it_t2;
 
 
 
select cookieid,createtime,pv,
 
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5
 
from it_t2;
 
 
 
select cookieid,createtime,pv,
 
sum(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as pv6
 
from it_t2;
 
 
 
--pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
 
--pv2: 同pv1
 
--pv3: 分组内(cookie1)所有的pv累加
 
--pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号,
 
                       13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
 
--pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
 
--pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,
 
 14号=14号+15号+16号=2+4+4=10
 
 
 
/*
- 如果不指定rows between,默认为从起点到当前行;
- 如果不指定order by,则将分组内所有值累加;
- 关键是理解rows between含义,也叫做window子句:
  - preceding:往前
  - following:往后
  - current row:当前行
  - unbounded:起点
  - unbounded preceding 表示从前面的起点
  - unbounded following:表示到后面的终点
 */ 
​​​​​​​3、AVG,MIN,MAX
AVG,MIN,MAX和SUM用法一样

select cookieid,createtime,pv,
 
avg(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
 
from it_t2;
 
 
 
select cookieid,createtime,pv,
 
max(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
 
from it_t2;
 
 
 
select cookieid,createtime,pv,
 
min(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as pv2
 
from it_t2;

————————————————
版权声明:本文为CSDN博主「Lansonli」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/xiaoweite1/article/details/117933213

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值