https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics
wzj,20200201,1
wzj,20200202,3
wzj,20200203,5
wzj,20200204,7
wzj,20200205,9
wzj,20200206,11
wzj,20200207,13
wzj,20200208,15
OVER with standard aggregates:
COUNT
SUM
MIN
MAX
AVG
hive (wzj)> select name,time,sal,
> sum(sal) over (partition by name order by time ) sal1,
> sum(sal) over (partition by name order by time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) sal2,
> sum(sal) over (partition by name order by time ROWS BETWEEN 3 PRECEDING AND CURRENT ROW ) sal3,
> sum(sal) over (partition by name order by time ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING ) sal4,
> sum(sal) over (partition by name order by time ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) sal5
> from window_data;
Ended Job = job_1585756506667_0004
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.5 sec HDFS Read: 11867 HDFS Write: 232 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 500 msec
OK
name time sal sal1 sal2 sal3 sal4 sal5
wzj 20200201 1 1 1 1 4 64
wzj 20200202 3 4 4 4 9 63
wzj 20200203 5 9 9 9 16 60
wzj 20200204 7 16 16 16 25 55
wzj 20200205 9 25 25 24 35 48
wzj 20200206 11 36 36 32 45 39
wzj 20200207 13 49 49 40 55 28
wzj 20200208 15 64 64 48 48 15
Time taken: 29.764 seconds, Fetched: 8 row(s)
Analytics functions:
RANK
ROW_NUMBER
DENSE_RANK
CUME_DIST
PERCENT_RANK
NTILE
- NTILE 分区
wzj,20200201,1
wzj,20200202,3
wzj,20200203,5
wzj,20200204,7
wzj,20200205,9
wzj,20200206,11
wzj,20200207,13
wzj,20200208,15
jerry,20200205,9
jerry,20200206,11
jerry,20200207,13
jerry,20200208,15
hive (wzj)> select name,time,sal,
> NTILE(2) over (partition by name order by time ) sal1,
> NTILE(3) over (partition by name order by time ) sal2,
> NTILE(4) over (partition by name order by time ) sal3
> from window_rownumber_data;
MapReduce Total cumulative CPU time: 3 seconds 500 msec
Ended Job = job_1585756506667_0005
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.5 sec HDFS Read: 10145 HDFS Write: 266 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 500 msec
OK
name time sal sal1 sal2 sal3
jerry 20200205 9 1 1 1
jerry 20200206 11 1 1 2
jerry 20200207 13 2 2 3
jerry 20200208 15 2 3 4
wzj 20200201 1 1 1 1
wzj 20200202 3 1 1 1
wzj 20200203 5 1 1 2
wzj 20200204 7 1 2 2
wzj 20200205 9 2 2 3
wzj 20200206 11 2 2 3
wzj 20200207 13 2 3 4
wzj 20200208 15 2 3 4
Time taken: 27.902 seconds, Fetched: 12 row(s)
- ROW_NUMBER()/DENSE_RANK()/RANK()
RANK():分组内排序,名次相同的之后保留空位
DENSE_RANK():分组内排序,名次相同的之后不保留空位
hive (wzj)> select name,time,sal,
> ROW_NUMBER() over (partition by name order by sal ) sal1,
> DENSE_RANK() over (partition by name order by sal ) sal2,
> RANK() over (partition by name order by sal ) sal3
> from window_rownumber_data;
MapReduce Total cumulative CPU time: 3 seconds 260 msec
Ended Job = job_1585756506667_0009
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.26 sec HDFS Read: 10553 HDFS Write: 266 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 260 msec
OK
name time sal sal1 sal2 sal3
jerry 20200205 9 1 1 1
jerry 20200206 11 2 2 2
jerry 20200207 13 3 3 3
jerry 20200208 15 4 4 4
wzj 20200201 1 1 1 1
wzj 20200202 3 2 2 2
wzj 20200203 5 3 3 3
wzj 20200205 7 4 4 4
wzj 20200204 7 5 4 4
wzj 20200206 11 6 5 6
wzj 20200207 13 7 6 7
wzj 20200208 15 8 7 8
Time taken: 28.946 seconds, Fetched: 12 row(s)
-
CUME_DIST:小于等于当前行值的行数/分组内的总行数
-
PERCENT_RANK:分组内当前行的rank-1/分组内总行数的rank-1
Windowing functions
LEAD(col,n,default)往下取值,取不到则取默认值
The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row.
Returns null when the lead for the current row extends beyond the end of the window.
LAG(col,n,default) 往上取值,取不到则取默认值
hive (wzj)> select name,time,sal,
> LAG(sal,1) over (partition by name order by sal ) sal1
> from window_rownumber_data;
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.28 sec HDFS Read: 9325 HDFS Write: 224 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 280 msec
OK
name time sal sal1
jerry 20200205 9 NULL
jerry 20200206 11 9
jerry 20200207 13 11
jerry 20200208 15 13
wzj 20200201 1 NULL
wzj 20200202 3 1
wzj 20200203 5 3
wzj 20200205 7 5
wzj 20200204 7 7
wzj 20200206 11 7
wzj 20200207 13 11
wzj 20200208 15 13
Time taken: 26.286 seconds, Fetched: 12 row(s)
欢迎关注公众号,一起愉快的交流