本文中介绍Hive中的sum、avg、min和max四种窗口函数。
1、数据准备
数据如下所示:
cookie1 2015-04-10 1
cookie1 2015-04-11 5
cookie1 2015-04-12 7
cookie1 2015-04-13 3
cookie1 2015-04-14 2
cookie1 2015-04-15 4
cookie1 2015-04-16 4
cookie2 2015-04-10 2
cookie2 2015-04-11 3
cookie2 2015-04-12 5
cookie2 2015-04-13 6
cookie2 2015-04-14 3
cookie2 2015-04-15 9
cookie2 2015-04-16 7
2、SUM函数
select cookieid, createtime,pv,
sum(pv) over(partition by cookieid order by createtime) as sumpv1,
sum(pv) over(partition by cookieid order by createtime rows between unbounded preceding and current row) as sumpv2,
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and current row) as sumpv3,
sum(pv) over(partition by cookieid order by createtime rows between 3 preceding and 1 following) as sumpv4,
sum(pv) over(partition by cookieid order by createtime rows between current row and unbounded following) as sumpv5
from cookies;
结果:
2、AVG函数
SELECT cookieid,
createtime,
pv,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS avgpv1,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS avgpv2,
AVG(pv) OVER(PARTITION BY cookieid) AS avgpv3,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS avgpv4,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS avgpv5,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS avgpv6
FROM cookies;
结果:
后面的MIN和MAX函数与前两种方式一样,在这里就不一一演示。