窗口函数

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

wzj,20200201,1
wzj,20200202,3
wzj,20200203,5
wzj,20200204,7
wzj,20200205,9
wzj,20200206,11
wzj,20200207,13
wzj,20200208,15


OVER with standard aggregates:
COUNT
SUM
MIN
MAX
AVG


hive (wzj)> select name,time,sal,
          > sum(sal) over (partition by name order by time ) sal1,
          > sum(sal) over (partition by name order by time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) sal2,
          > sum(sal) over (partition by name order by time ROWS BETWEEN 3  PRECEDING AND CURRENT ROW ) sal3,
          > sum(sal) over (partition by name order by time ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING ) sal4,
          > sum(sal) over (partition by name order by time ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) sal5
          > from window_data;
Ended Job = job_1585756506667_0004
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.5 sec   HDFS Read: 11867 HDFS Write: 232 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 500 msec
OK
name	time	sal	sal1	sal2	sal3	sal4	sal5
wzj	20200201	1	1	1	1	4	64
wzj	20200202	3	4	4	4	9	63
wzj	20200203	5	9	9	9	16	60
wzj	20200204	7	16	16	16	25	55
wzj	20200205	9	25	25	24	35	48
wzj	20200206	11	36	36	32	45	39
wzj	20200207	13	49	49	40	55	28
wzj	20200208	15	64	64	48	48	15
Time taken: 29.764 seconds, Fetched: 8 row(s)

Analytics functions:

RANK

ROW_NUMBER

DENSE_RANK

CUME_DIST

PERCENT_RANK

NTILE

  • NTILE 分区
wzj,20200201,1
wzj,20200202,3
wzj,20200203,5
wzj,20200204,7
wzj,20200205,9
wzj,20200206,11
wzj,20200207,13
wzj,20200208,15
jerry,20200205,9
jerry,20200206,11
jerry,20200207,13
jerry,20200208,15



hive (wzj)> select name,time,sal,
          > NTILE(2) over (partition by name order by time ) sal1,
          > NTILE(3) over (partition by name order by time ) sal2,
          > NTILE(4) over (partition by name order by time ) sal3
          > from window_rownumber_data;
MapReduce Total cumulative CPU time: 3 seconds 500 msec
Ended Job = job_1585756506667_0005
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.5 sec   HDFS Read: 10145 HDFS Write: 266 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 500 msec
OK
name	time	   sal sal1 sal2	sal3
jerry	20200205	9	1	1	1
jerry	20200206	11	1	1	2
jerry	20200207	13	2	2	3
jerry	20200208	15	2	3	4
wzj	    20200201	1	1	1	1
wzj	    20200202	3	1	1	1
wzj	    20200203	5	1	1	2
wzj	    20200204	7	1	2	2
wzj	    20200205	9	2	2	3
wzj	    20200206	11	2	2	3
wzj	    20200207	13	2	3	4
wzj	    20200208	15	2	3	4
Time taken: 27.902 seconds, Fetched: 12 row(s)

  • ROW_NUMBER()/DENSE_RANK()/RANK()

RANK():分组内排序,名次相同的之后保留空位

DENSE_RANK():分组内排序,名次相同的之后不保留空位

hive (wzj)> select name,time,sal,
          > ROW_NUMBER() over (partition by name order by sal ) sal1,
          > DENSE_RANK() over (partition by name order by sal ) sal2,
          > RANK() over (partition by name order by sal ) sal3
          > from window_rownumber_data;
MapReduce Total cumulative CPU time: 3 seconds 260 msec
Ended Job = job_1585756506667_0009
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.26 sec   HDFS Read: 10553 HDFS Write: 266 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 260 msec
OK
name	time	sal	sal1	sal2	sal3
jerry	20200205	9	1	1	1
jerry	20200206	11	2	2	2
jerry	20200207	13	3	3	3
jerry	20200208	15	4	4	4
wzj	    20200201	1	1	1	1
wzj	    20200202	3	2	2	2
wzj	    20200203	5	3	3	3
wzj	    20200205	7	4	4	4
wzj	    20200204	7	5	4	4
wzj	    20200206	11	6	5	6
wzj	    20200207	13	7	6	7
wzj	    20200208	15	8	7	8
Time taken: 28.946 seconds, Fetched: 12 row(s)

  • CUME_DIST:小于等于当前行值的行数/分组内的总行数

  • PERCENT_RANK:分组内当前行的rank-1/分组内总行数的rank-1

Windowing functions

LEAD(col,n,default)往下取值,取不到则取默认值

The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row.
Returns null when the lead for the current row extends beyond the end of the window.

LAG(col,n,default) 往上取值,取不到则取默认值

hive (wzj)> select name,time,sal,
          > LAG(sal,1) over (partition by name order by sal ) sal1
          > from window_rownumber_data;
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.28 sec   HDFS Read: 9325 HDFS Write: 224 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 280 msec
OK
name	time	sal	sal1
jerry	20200205	9	NULL
jerry	20200206	11	9
jerry	20200207	13	11
jerry	20200208	15	13
wzj	    20200201	1	NULL
wzj	    20200202	3	1
wzj	    20200203	5	3
wzj	    20200205	7	5
wzj	    20200204	7	7
wzj	    20200206	11	7
wzj	    20200207	13	11
wzj	    20200208	15	13
Time taken: 26.286 seconds, Fetched: 12 row(s)


欢迎关注公众号,一起愉快的交流
在这里插入图片描述

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值