Hive创建的窗口函数使用
1.窗口函数理解
- 窗口函数
- 窗口 :函数运行/计算时 所对应的数据集范围
- 函数 :执行的函数
详情见 Hive窗口函数介绍
2.窗口函数应用
2.1 简单使用
2.2 测试数据
hive (ruozedata)> select * from ruozedata_windows;
OK
domain time traffic
ruozedata 2019-04-10 1
ruozedata 2019-04-11 5
ruozedata 2019-04-12 7
ruozedata 2019-04-13 3
ruozedata 2019-04-14 2
ruozedata 2019-04-15 4
ruozedata 2019-04-16 4
2.3 求每个域名每个时间内的和 已经总计(每个相同的域名在同一个窗口内)
执行sql
select
domain, time, traffic,
sum(traffic) OVER (partition by domain order by time) pv1
from ruozedata_windows;
结果
剖析SQL
函数 sum(traffic)
窗口 OVER (partition by domain order by time) pv1
2.4 进阶使用
SQL
select
domain, time, traffic,
sum(traffic) OVER (partition by domain order by time) pv1,
sum(traffic) OVER (partition by domain order by time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) pv2,
sum(traffic) OVER (partition by domain) pv3,
sum(traffic) OVER (partition by domain order by time ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) pv4,
sum(traffic) OVER (partition by domain order by time ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) pv5,
sum(traffic) OVER (partition by domain order by time ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) pv6
from ruozedata_windows
order by time;
执行结果
剖析SQL
2.4.1 pv2
- 函数 sum(traffic)
- 窗口 OVER (partition by domain order by time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) pv2
- 窗口的范围 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
[当前行所有,当前行]
执行过程
2.4.2 pv3
- 函数 sum(traffic)
- 窗口 OVER (partition by domain) pv3
- 窗口的范围 当前窗口所有内容
2.4.3 pv4
- 函数 sum(traffic)
- 窗口 ** OVER (partition by domain order by time ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) pv4**
- 窗口的范围 【当前行前3条,当前行】
2.4.3 pv5
- 函数 sum(traffic)
- 窗口 ** OVER (partition by domain order by time ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) pv5**
- 窗口的范围 【当前行前3条,当前行后一条】
2.4.3 pv6
- 函数 sum(traffic)
- 窗口 ** OVER (partition by domain order by time ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)**
- 窗口的范围 【当前行,当前行后所有行】