窗口函数的基本用法如下:
函数名()over(partition by <分组的列> order by <排序的列> rows between<数据范围>)
其中,over是关键字,用来指定函数执行的窗口范围,包含三个分析子句:
分组(partition by)子 句,排序(order by)子句,窗口(rows)子句,如果后面括号中什么都不写,则意味着窗口包含满足where条件的所有行,窗口函数基于所有行进行计算;
数据范围格式:
rows between 2 preceding and current row # 取当前行和前面两行
rows between unbounded preceding and current row # 包括本行和之前所有的行
rows between current row and unbounded following # 包括本行和之后所有的行
rows between 3 preceding and current row # 包括本行和前面三行
rows between 3 preceding and 1 following # 从前面三行和下面一行,总共五行
# 当order by后面缺少窗口从句条件,窗口规范默认是rows between unbounded
preceding and current row.
# 当order by和窗口从句都缺失, 窗口规范默认是 rows between unbounded preceding
and unbounded following
聚合类窗口函数:
sum() count() avg() max() min()
专有窗口函数:
rank() dense_rank() row_number() untile(n) lag(...) lead(...)
举例:
现有2018~2020某电商平台订单信息表user_trade,表结构如下:
列名 | 释义 |
---|---|
user_name | 用户名 |
piece | 购买数量 |
price | 价格 |
pay_amount | 支付金额 |
goods_catagory | 商品品类 |
pay_time | 支付日期 |
- 聚合类函数
要求:查询出2019年每月的支付总额和当年累计支付总额
SELECT
a.MONTH,
a.pay_amount,
sum( a.pay_amount ) over ( ORDER BY a.MONTH )
FROM(
SELECT MONTH
( pay_time ) MONTH,
sum( pay_amount ) pay_amount
FROM
user_trade
WHERE
YEAR ( pay_time )= 2019
GROUP BY
MONTH ( pay_time )) a
结果:
- 排序函数 rank() dense_rank() row_number()
要求:2020年一月,购买商品品类数的用户排名
SELECT
user_name,
count( DISTINCT goods_category ) category_count,
row_number() over ( ORDER BY count( DISTINCT goods_category ) ) order1,-- row_number生成了行的编号从1开始
rank() over ( ORDER BY count( DISTINCT goods_category ) ) order2,
dense_rank() over ( ORDER BY count( DISTINCT goods_category ) ) order3
FROM
user_trade
WHERE
substring( pay_time, 1, 7 ) = '2020-01'
GROUP BY
user_name;
结果:
- ntile(n) 分组
要求:查询出将2020年2月的支付用户,按照支付金额分成5组后的结果
SELECT user_name,
sum( pay_amount ) pay_amount,
ntile( 5 ) over ( ORDER BY sum( pay_amount ) DESC ) LEVEL
FROM
user_trade
WHERE
substr( pay_time, 1, 7 )= '2020-02'
GROUP BY
user_name;
结果:
- 偏移分析函数 lag(...) lead(...)
要求:查询出King和West的时间偏移(前N行)
SELECT
user_name,
pay_time,
lag( pay_time, 1, pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lag1,
lag( pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lag1_s,
lag( pay_time, 2, pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lag2,
lag( pay_time, 2 ) over ( PARTITION BY user_name ORDER BY pay_time ) lag2_s
FROM
user_trade
WHERE
user_name IN ( 'King', 'West' );
结果:
要求:King和West的时间偏移(后N行)
SELECT
user_name,
pay_time,
lead( pay_time, 1, pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lead1,
lead( pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lead2,
lead( pay_time, 2, pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lead3,
lead( pay_time, 2 ) over ( PARTITION BY user_name ORDER BY pay_time ) lead4
FROM
user_trade
WHERE
user_name IN ( 'King', 'West' );
结果:
***(以上数据可私信获取)***