Do You Really Understand Hive Window Functions? How to Aggregate over a Window

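This article walks through Hive's window functions, including FIRST_VALUE, LAST_VALUE, LEAD, and LAG, and the use of the OVER clause. It covers standard aggregate functions and analytics functions such as COUNT, SUM, and RANK, and explains how to set the window range and define custom windows.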

Contents

1 Windowing functions

FIRST_VALUE(col, bool DEFAULT)

LAST_VALUE(col, bool DEFAULT)

LEAD(col, n, DEFAULT)

LAG(col, n, DEFAULT)

2 The OVER clause in detail

FUNCTION(expr) OVER([PARTITION BY statement] [ORDER BY statement] [window clause])

2.1 Standard aggregate functions

COUNT(expr) OVER()

SUM(expr) OVER()

MIN(expr) OVER()

MAX(expr) OVER()

AVG(expr) OVER()

2.2 Analytics functions

RANK() OVER()

ROW_NUMBER() OVER()

DENSE_RANK() OVER()

CUME_DIST() OVER()

PERCENT_RANK() OVER()

NTILE(INTEGER x) OVER()

2.3 The OVER clause also supports aggregate functions

2.4 An alternative way to write the window clause


1 Windowing functions

FIRST_VALUE(col, bool DEFAULT)

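Returns the value of col from the first row of the window within the partition. The second argument defaults to false; if it is set to true, NULL values are skipped before the value is taken.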

WITH tmp AS
(
  SELECT 1 AS group_id, 'a' AS col 
  UNION ALL SELECT 1 AS group_id,  'b' AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  NULL AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       FIRST_VALUE(col) over(partition by group_id order by col) as col_new
FROM tmp;

Result:

group_id col col_new
1 a a
1 b a
1 c a
2 NULL NULL
2 e NULL

With the second argument set to true, NULLs are skipped:

WITH tmp AS
(
  SELECT 1 AS group_id, NULL AS col 
  UNION ALL SELECT 1 AS group_id,  'b' AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  NULL AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       FIRST_VALUE(col, true) over(partition by group_id order by col) as col_new
FROM tmp;

Result:

group_id col col_new
1 NULL NULL
1 b b
1 c b
2 NULL NULL
2 e e
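
Both results follow from the window that applies when ORDER BY is given without a window clause: Hive then defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so each row only sees the rows from the start of its partition up to itself. The Python sketch below is a small model of that behavior, not Hive's implementation. The function name `first_value` and its `skip_nulls` parameter are made up for illustration, and the model assumes there are no duplicate sort keys.

```python
def first_value(partition, skip_nulls=False):
    """Model FIRST_VALUE(col[, true]) over the default frame.

    `partition` holds the col values of one partition, already sorted the way
    ORDER BY col sorts them (NULLs, written as None, come first).
    Assumption: no duplicate sort keys, so the frame for row i is rows 0..i.
    """
    results = []
    for i in range(len(partition)):
        frame = partition[: i + 1]  # partition start .. current row
        if skip_nulls:
            frame = [v for v in frame if v is not None]
        results.append(frame[0] if frame else None)
    return results


# group_id = 2 in the first query: NULL sorts first, so every row sees NULL.
plain = first_value([None, "e"])
# group_id = 1 in the second query: the NULL row has no non-NULL value in its frame yet.
skipping = first_value([None, "b", "c"], skip_nulls=True)
print(plain)     # [None, None]
print(skipping)  # [None, 'b', 'b']
```

This is why the NULL row still shows NULL even with `true`: skipping NULLs leaves its frame empty.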

LAST_VALUE(col, bool DEFAULT)

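Returns the value of col from the last row of the window within the partition. The second argument defaults to false; if it is set to true, NULL values are skipped before the value is taken.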

WITH tmp AS
(
  SELECT 1 AS group_id, 'a' AS col 
  UNION ALL SELECT 1 AS group_id,  NULL AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  'd' AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       LAST_VALUE(col) over(partition by group_id order by col desc) as col_new
FROM tmp;

Result:

group_id col col_new
1 c c
1 a a
1 NULL NULL
2 e e
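2 d d

With the second argument set to true and an explicit frame running from one row before to one row after the current row:

WITH tmp AS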
(
  SELECT 1 AS group_id, 'a' AS col 
  UNION ALL SELECT 1 AS group_id,  NULL AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  'd' AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       LAST_VALUE(col, true) over(order by group_id,col desc rows between 1 preceding and 1 following) as col_new
FROM tmp;

Result:

group_id col col_new
1 c a
1 a a
1 NULL e
2 e d
2 d d
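
In the first LAST_VALUE query above, col_new simply repeats col. That is what one would expect when the frame ends at the current row, which is the default whenever ORDER BY is present and no window clause is written. To get the last value of the whole partition, the frame has to be extended explicitly. The sketch below compares the two using Python's built-in `sqlite3` module as a local stand-in for Hive. It assumes SQLite 3.25 or later (the first release with window functions) and assumes SQLite's default frame matches Hive's for this query. The sample rows and the column names `running_last` and `partition_last` are illustrative.

```python
import sqlite3

# Illustrative rows in the style of this article's tmp CTE (no NULLs).
LAST_VALUE_QUERY = """
WITH tmp AS (
  SELECT 1 AS group_id, 'a' AS col
  UNION ALL SELECT 1, 'b'
  UNION ALL SELECT 1, 'c'
  UNION ALL SELECT 2, 'd'
  UNION ALL SELECT 2, 'e'
)
SELECT group_id,
       col,
       -- default frame: partition start .. current row
       LAST_VALUE(col) OVER (PARTITION BY group_id ORDER BY col) AS running_last,
       -- explicit frame covering the whole partition
       LAST_VALUE(col) OVER (PARTITION BY group_id ORDER BY col
                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS partition_last
FROM tmp
ORDER BY group_id, col
"""

conn = sqlite3.connect(":memory:")
last_value_rows = conn.execute(LAST_VALUE_QUERY).fetchall()
conn.close()

for row in last_value_rows:
    print(row)
```

On SQLite, `running_last` equals `col` on every row, while `partition_last` is c for group 1 and e for group 2.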

LEAD(col, n, DEFAULT)

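Returns the value of col from the n-th row after the current row within the partition. n defaults to 1; if there is no such row, DEFAULT is returned (DEFAULT itself defaults to NULL).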

WITH tmp AS
(
  SELECT 1 AS group_id, 'a' AS col 
  UNION ALL SELECT 1 AS group_id,  'b' AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  'd' AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       LEAD(col) over(partition by group_id order by col) as col_new
FROM tmp;

This is equivalent to:

WITH tmp AS
(
  SELECT 1 AS group_id, 'a' AS col 
  UNION ALL SELECT 1 AS group_id,  'b' AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  'd' AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       LAST_VALUE(col) over(partition by group_id order by col rows between 1 FOLLOWING and 1 FOLLOWING) as col_new
FROM tmp;

Both queries return:

group_id col col_new
1 a b
1 b c
1 c NULL
2 d e
2 e NULL

With an offset of 2 and a default value of 'z':

WITH tmp AS
(
  SELECT 1 AS group_id, 'a' AS col 
  UNION ALL SELECT 1 AS group_id,  'b' AS col 
  UNION ALL SELECT 1 AS group_id,  'c' AS col 
  UNION ALL SELECT 2 AS group_id,  'd' AS col 
  UNION ALL SELECT 2 AS group_id,  'e' AS col
)
SELECT group_id,
       col,
       LEAD(col, 2, 'z') over(partition by group_id order by col) as col_new
FROM tmp;

Result:

group_id col
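
The three-argument form of LEAD can also be checked locally without a Hive cluster. The sketch below runs the same query through Python's built-in `sqlite3` module. It assumes SQLite 3.25 or later and assumes that SQLite's LEAD(expr, offset, default) behaves like Hive's for this query, so treat it as a cross-check rather than as Hive output. The final ORDER BY is added only to make the row order deterministic.

```python
import sqlite3

# The LEAD(col, 2, 'z') query from the article, run on SQLite as a stand-in for Hive.
LEAD_QUERY = """
WITH tmp AS (
  SELECT 1 AS group_id, 'a' AS col
  UNION ALL SELECT 1 AS group_id, 'b' AS col
  UNION ALL SELECT 1 AS group_id, 'c' AS col
  UNION ALL SELECT 2 AS group_id, 'd' AS col
  UNION ALL SELECT 2 AS group_id, 'e' AS col
)
SELECT group_id,
       col,
       LEAD(col, 2, 'z') OVER (PARTITION BY group_id ORDER BY col) AS col_new
FROM tmp
ORDER BY group_id, col
"""

conn = sqlite3.connect(":memory:")
lead_rows = conn.execute(LEAD_QUERY).fetchall()
conn.close()

for row in lead_rows:
    print(row)
```

On SQLite, only row a has a row two positions ahead in its partition, so it gets c; every other row falls back to the default 'z'.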