Hive QL: Windowing and Analytics Functions

1.windowing functions

(1)LEAD:The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row. Returns null when the lead for the current row extends beyond the end of the window.

当前行向下多少行可以指定, 如果未指定, 默认为1行. 当向下取超过窗口的范围时, 返回null.

(2)LAG:LAG
The number of rows to lag can optionally be specified. If the number of rows to lag is not specified, the lag is one row.Returns null when the lag for the current row extends before the beginning of the window.

(3)FIRST_VALUE:This takes at most two parameters. The first parameter is the column for which you want the first value, the second (optional) parameter must be a boolean which is false by default. If set to true it skips null values.

FIRST_VALUE取截止到当前行第一个值, 最多接受两个参数, 第一个参数是你想要获取值的列,第二个参数可选, 默认为false, 如果置为true, 则跳过null值.

(4)LAST_VALUE:This takes at most two parameters. The first parameter is the column for which you want the last value, the second (optional) parameter must be a boolean which is false by default. If set to true it skips null values.

2.The OVER clause

2.1 OVER with standard aggregates:

COUNT
SUM
MIN
MAX
AVG

2.2 OVER with a PARTITION BY statement with one or more partitioning columns of any primitive datatype.

带有任何数据类型的一个或多个分区列的partition by语句的OVER

2.3 OVER with PARTITION BY and ORDER BY with one or more partitioning and/or ordering columns of any datatype.

带有任意数据类型的一个或多个分区或排序列的partition by和order by语句的OVER.

  • OVER with a window specification. Windows can be defined separately in a WINDOW clause. Window specifications support the following formats:

带有window定义的OVER, 窗口可以在一个window语句中单独定义, 定义支持一下格式:

(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUND | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING

When ORDER BY is specified with missing WINDOW clause, the WINDOW specification defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

当order by定义没有window语句时, 默认为首行到当前行的范围

When both ORDER BY and WINDOW clauses are missing, the WINDOW specification defaults to ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

当order by和window语句都没有指定时, 默认为首行到末行

The OVER clause supports the following functions, but it does not support a window with them:

Ranking functions: Rank, NTile, DenseRank, CumeDist, PercentRank.

Lead and Lag functions.

OVER语句支持以下函数, 但是当定义了window语句的时候不支持.

3. Analytics functions

RANK
ROW_NUMBER
DENSE_RANK
CUME_DIST
PERCENT_RANK
NTILE

4.Distinct support in Hive 2.1.0 and later

Distinct is supported for aggregation functions including SUM, COUNT and AVG, which aggregate over the distinct values within each partition. Current implementation has the limitation that no ORDER BY or window specification can be supported in the partitioning clause for performance reason. The supported syntax is as follows.

COUNT(DISTINCT a) OVER(PARTITION BY c)

ORDER BY and window specification is supported for distinct in Hive 2.2.0 (see HIVE-13453). An example is as follows.

2.2.0版本支持了带有order by和window子句的distinct

COUNT(DISTINCT a) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)

5.Aggregate functions in OVER clause support in Hive 2.1.0 and later (see HIVE-13475)

2.1.0之后支持了over语句中的聚合函数

Support to reference aggregate functions within the OVER clause has been added. For instance, currently we can use the SUM aggregation function within the OVER clause as follows.

SELECT rank() OVER (ORDER BY sum(b))
FROM T
GROUP BY a;

6.Examples

SELECT a, COUNT(b) OVER (PARTITION BY c)
FROM T;

SELECT a, COUNT(b) OVER (PARTITION BY c, d)
FROM T;

SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d)
FROM T;

SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f)
FROM T;

SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM T;

SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)
FROM T;

SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING)
FROM T;

SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM T;

SELECT 
 a,
 COUNT(b) OVER (PARTITION BY c) AS b_count,
 SUM(b) OVER (PARTITION BY c) b_sum
FROM T;

6.1window clause

SELECT a, SUM(b) OVER w
FROM T
WINDOW w AS (PARTITION BY c ORDER BY d ROWS UNBOUNDED PRECEDING);

6.2LAG and LEAD

SELECT a, LEAD(a) OVER (PARTITION BY b ORDER BY C)
FROM T;

SELECT a, LAG(a, 3, 0) OVER (PARTITION BY b ORDER BY C)
FROM T;

6.3Distinct

SELECT a, COUNT(distinct a) OVER (PARTITION BY b)
FROM T;
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

uncle_mooncake

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值