Hive函数之窗口(开窗)函数

官方介绍

LanguageManual WindowingAndAnalytics

一句话解释开窗作用

使用开窗函数可以在一个select语句中且不使用子查询的情况下,显示同一指标不同维度的数据。(这么说虽然片面,但是便于理解,实际开窗函数的作用远远超出该描述,具体请看下面案例理解)

开窗函数格式

格式:over(partition by 要分组字段,order by 要排序的字段 window_clause)

其中 partition by ,order by ,window_clause 均为可选参数

window_clause 定义窗口

window_clause格式

(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING

格式:ROWS 或 RANGE BETWEEN 窗口起始关键字 AND 窗口结束关键字

关键字

CURRENT ROW:当前行

n PRECEDING:往前n行数据

n FOLLOWING:往后n行数据

UNBOUNDED:起点

  • UNBOUNDED PRECEDING 表示partition by 后各组数据起始
  • UNBOUNDED FOLLOWING 表示partition by 后各组数据末尾

相关函数

  1. Windowing functions
  • LEAD
    • The number of rows to lead can optionally be specified. If the number of rows to lead is not specified, the lead is one row.
    • Returns null when the lead for the current row extends beyond the end of the window.
  • LAG
    • The number of rows to lag can optionally be specified. If the number of rows to lag is not specified, the lag is one row.
    • Returns null when the lag for the current row extends before the beginning of the window.
  • FIRST_VALUE
    • This takes at most two parameters. The first parameter is the column for which you want the first value, the second (optional) parameter must be a boolean which is false by default. If set to true it skips null values.
  • LAST_VALUE
    • This takes at most two parameters. The first parameter is the column for which you want the last value, the second (optional) parameter must be a boolean which is false by default. If set to true it skips null values.

LAG(col,n):往前第n行数据

LEAD(col,n):往后第n行数据

FIRST_VALUE(col,[flase]):返回当前窗口指定列的第一个值,参数二若为true跳过为null的行

LAST_VALUE(col,[flase]):返回当前窗口指定列的第最后一个值,参数二若为true跳过为null的行

NTILE(n):把有序分区中的行分发到指定数据的组中,各个组有编号,编号从1开始,对于每一行,NTILE返回此行所属的组的编号。注意:n必须为int类型。

The OVER clause supports the following functions, but it does not support a window with them (see HIVE-4797):

Ranking functions: Rank, NTile, DenseRank, CumeDist, PercentRank.

Lead and Lag functions.

Lead 和 Lag 函数以及所有排名函数 over中不支持写window_clause

关于默认参数

When ORDER BY is specified with missing WINDOW clause, the WINDOW specification defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

When both ORDER BY and WINDOW clauses are missing, the WINDOW specification defaults to ROW BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

  1. 当指定了ORDER BY 未指定WINDOW 字句时 窗口未上无边界 到当前行
  2. 当未定义ORDER BY也未指定WINDOW 字句时 窗口为上无边界 到下无边界

案例

案例准备

/*
表中列依次为 姓名,消费日期,消费金额

数据集
jack,2017-01-01,10
tony,2017-01-02,15
jack,2017-02-03,23
tony,2017-01-04,29
jack,2017-01-05,46
jack,2017-04-06,42
tony,2017-01-07,50
jack,2017-01-08,55
mart,2017-04-08,62
mart,2017-04-09,68
neil,2017-05-10,12
mart,2017-04-11,75
neil,2017-06-12,80
mart,2017-04-13,94
*/

-- 创建表
create table business(
name string, 
orderdate string,
cost int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- 导入数据
load data local inpath "/opt/module/datas/business.txt" into table business;

案例一

/*
需求:增加一列加入月消费金额
*/

select name,
       orderdate,
       cost,
       sum(cost)
           over (partition by name,substr(orderdate, 1, 7) rows between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING)
from business;

-- 简写
select name,
       orderdate,
       cost,
       sum(cost)
           over (partition by name,substr(orderdate, 1, 7))
from business;


/*
显示结果
jack	2017-01-01	10	111
jack	2017-01-05	46	111
jack	2017-01-08	55	111
jack	2017-02-03	23	23
jack	2017-04-06	42	42
mart	2017-04-13	94	299	
mart	2017-04-08	62	299
... ...
*/

案例二

/*
需求:新增一列显示累加消费金额
*/

select name,
       orderdate,
       cost,
       sum(cost)
           over (partition by name order by orderdate rows between UNBOUNDED PRECEDING and CURRENT ROW )
from business;

--简写
select name,
       orderdate,
       cost,
       sum(cost)
           over (partition by name order by orderdate)
from business;


/*
显示结果 
jack	2017-01-01	10	10
jack	2017-01-05	46	56
jack	2017-01-08	55	111
jack	2017-02-03	23	134
jack	2017-04-06	42	176
mart	2017-04-08	62	62
mart	2017-04-09	68	130
... ...
*/

案例三:LAG、LEAD 函数

/*
需求: 新增一列显示上/下次购买时间,若没有显示"无数据"
*/

-- 上次
select name,
       orderdate,
       cost,
       lag(orderdate,1,'无数据')
           over (partition by name order by orderdate )
from business;

-- 下次
select name,
       orderdate,
       cost,
       LEAD(orderdate,1,'无数据')
           over (partition by name order by orderdate )
from business;


案例四:FIRST_VALUE、LAST_VALUE 函数

/*
需求:新增两列显示顾客本月第一次和最后一次购买时间
*/

select name,
       orderdate,
       cost,
       FIRST_VALUE(orderdate,true)
           over (partition by name,substring(orderdate,1,7) order by orderdate ),
       LAST_VALUE(orderdate,true)
           over (partition by name,substring(orderdate,1,7) order by orderdate rows between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING)
from business;

案例五:

/*
需求:查询顾客最近三次cost花费
*/

select name,
       orderdate,
       cost,
       sum(cost) over (partition by name order by orderdate rows between 2 preceding and current row )
from business
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
HiveSQL中,开窗函数是一种特殊的函数,用于在查询结果中添加一个新的窗口函数值列。开窗函数可以分为排序开窗函数和聚合开窗函数两类。常用的排序开窗函数包括row_number()和rank()等。这些排序函数在over()子句中的order by语句只起到窗口内部排序的作用。开窗函数的基本用法包括设置窗口的方法,例如使用window_name、partition by和order by子句来指定窗口的大小和排序方式。另外,开窗函数还可以用于计算序号函数和分布函数。序号函数包括row_number()、rank()和dense_rank()等,用于计算每个行的序号。分布函数包括percent_rank()和cume_dist()等,用于计算某个值在整个结果集中的位置。此外,还有lag()和lead()函数用于获取前后指定行的值,以及first_value()和last_value()函数用于获取窗口内的第一个和最后一个值。在HiveSQL中,开窗函数可以与聚合函数结合使用,以便进行更复杂的计算和分析。<span class="em">1</span><span class="em">2</span><span class="em">3</span> #### 引用[.reference_title] - *1* [hive sql常用开窗函数](https://blog.csdn.net/a822631129/article/details/124672228)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_1"}}] [.reference_item style="max-width: 50%"] - *2* *3* [hiveSQL开窗函数详解](https://blog.csdn.net/weixin_62759952/article/details/129269434)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_1"}}] [.reference_item style="max-width: 50%"] [ .reference_list ]
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值