hive中有些分析函数功能确实很强大,在和sum,max等聚合函数结合起来能实现不少功能。
直接上代码演示吧
原始数据
- channel1 2016-11-10 1
- channel1 2016-11-11 3
- channel1 2016-11-12 5
- channel1 2016-11-13 6
- channel1 2016-11-14 2
- channel1 2016-11-15 4
- channel2 2016-11-10 5
- channel2 2016-11-11 3
- channel2 2016-11-12 5
- channel2 2016-11-13 2
创建表
- CREATE TABLE test (
- channel string,
- createtime string, --day
- pv INT
- ) ROW FORMAT DELIMITED
- FIELDS TERMINATED BY '\t'
语句以及结果
- SELECT
- channel,
- createtime,
- pv,
- SUM(pv) OVER(PARTITION BY channel ORDER BY createtime) AS pv1,
- SUM(pv) OVER(PARTITION BY channel ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2,
- SUM(pv) OVER(PARTITION BY channel) AS pv3,
- SUM(pv) OVER(PARTITION BY channel ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,
- SUM(pv) OVER(PARTITION BY channel ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,
- SUM(pv) OVER(PARTITION BY channel ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6
- FROM test;
- pv pv1 pv2 pv3 pv4 pv5 pv6
- channel1 2016-11-15 4 21 21 21 17 17 4
- channel1 2016-11-14 2 17 17 21 16 20 6
- channel1 2016-11-13 6 15 15 21 15 17 12
- channel1 2016-11-12 5 9 9 21 9 15 17
- channel1 2016-11-11 3 4 4 21 4 9 20
- channel1 2016-11-10 1 1 1 21 1 4 21
- channel2 2016-11-13 2 15 15 15 15 15 2
- channel2 2016-11-12 5 13 13 15 13 15 7
- channel2 2016-11-11 3 8 8 15 8 13 10
- channel2 2016-11-10 5 5 5 15 5 8 15
分析结果,以channel1 列
pv1 结果是取起点到当前行的积累,比如:10号=10号 11号=10号+11号 12号=10号+11号+12号 以此类推
pv2 和pv1的结果是一样的。
pv3 就是分组channel的pv的聚合
pv4 就是分组内当前行向前+3 行 15号=15+14+13+12 14号=14+13+12+11 13号=13号+12号+11号 以此类推
pv5 就是分组内当前行往前+3行 往后加+1行 15号=14+13+12+11 14号=14号+13号+12号+11号+15号 以此类推
pv6 就是当前行+往后所有行
如果不指定rows between 默认为从起点到当前行 如果不指定order by 默认则将分组内累加
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点