窗口函数
MySQL 8.0 开始支持窗口函数,对于查询中的每一行,它使用与该行相关的行执行计算。 以下部分讨论如何使用窗口函数,包括 OVER 和 WINDOW 子句的描述。第一部分提供非聚合窗口函数的描述。
窗口函数语法
window_function_name(expression)
OVER (
[partition_defintion]
[order_definition]
[frame_definition]
)
在这个语法中:
- 首先,指定窗口函数名称,后跟表达式。
- 其次,指定OVER具有三个可能元素的子句:分区定义,排序定义和帧定义。
OVER子句后面的开括号和右括号是强制性的,若OVER子句是空的,它将整个查询行集视为一个分区。因此窗口函数会对每一行产生一个全局和,例如:
window_function_name(expression) OVER()
窗口函数
非聚合窗口函数
名称 | 描述 |
---|---|
CUME_DIST() | 计算一组值中值的累积分布 |
DENSE_RANK() | 根据ORDER BY子句为其分区中的每一行分配一个排名。它为具有相同值的行分配相同的排名。如果两行或更多行具有相同的等级,则排序值序列中将没有间隙。 |
FIRST_VALUE() | 返回指定表达式相对于窗口框架中第一行的值。 |
LAG() | 返回分区中当前行之前的第N行的值。 如果不存在前一行,则返回NULL。 |
LAST_VALUE() | 返回指定表达式相对于窗口框架中最后一行的值。 |
LEAD() | 返回分区中当前行之后的第N行的值。 如果不存在后续行,则返回NULL。 |
NTH_VALUE() | 返回窗口框架第N行的参数值 |
NTILE() | 将每个窗口分区的行分配到指定数量的已排名组中。 |
PERCENT_RANK() | 计算分区或结果集中行的百分位数 |
RANK() | 与DENSE_RANK()函数类似,只是当两行或更多行具有相同的排名时,排序值序列中存在间隙 |
ROW_NUMBER() | 为其分区中的每一行分配一个连续整数 |
聚合函数
名称 | 描述 |
---|---|
AVG() | 计算一组值或表达式的平均值。 |
COUNT() | 计算表中的行数。 |
INSTR() | 返回字符串中第一次出现的子字符串的位置。 |
SUM() | 计算一组值或表达式的总和。 |
MIN() | 在一组值中找到最小值 |
MAX() | 在一组值中找到最大值 |
… | … |
句法
over_clause 句法
over_clause:
{OVER (window_spec) | OVER window_name}
- 在第一种情况下,window_spec窗口规范直接出现在OVER子句中的括号之间。
- 在第二种情况下,是由查询中其他地方的子句窗口规范定义的window_name窗口名称。
对于OVER (window_spec)语法,窗口规范有几个部分,都是可选的:
window_spec:
[window_name] [partition_clause] [order_clause] [frame_clause]
window_name
window_name:由查询中其他地方的子句定义的窗口的名称。
- 如果命名窗口定义包含partition_clause、order_clause或frame_clause属性,则OVER引用窗口名称的子句后续也不能包含相同类型的属性。
## 允许的语法 使用窗口 w + order_clause
OVER (w ORDER BY country)
...WINDOW w AS (PARTITION BY country)
## 不允许的语法 使用窗口 w + partition_clause
OVER (w PARTITION BY year)
...WINDOW w AS (PARTITION BY country)
- 命名窗口的定义本身可以以 window_name 开头。在这种情况下,允许向前和向后引用,但不允许循环。
## 允许的语法
WINDOW w1 AS (w2), w2 AS (), w3 AS (w1)
## 错误的语法
WINDOW w1 AS (w2), w2 AS (w3), w3 AS (w1)
partition_clause 句法
PARTITION BY子句指示如何将查询行划分为组。给定行的窗口函数结果基于包含该行的分区的行。如果省略PARTITION BY,则由所有查询行组成的一个分区。partition_clause语法如下所示:
partition_clause:
PARTITION BY expr [, expr] ...
标准 SQL 要求PARTITION BY后跟列名。MySQL扩展是允许表达式,而不仅仅是列名。例如,如果一个表包含一个名为 ts 的TIMESTAMP列,则标准 SQL 允许PARTITION BY ts,但不允许PARTITION BY HOUR(ts),而 MySQL 允许两者。
order_clause 句法
ORDER BY子句指定行在分区中的排序方式。可以在多个键上的分区内对数据进行排序,每个键由表达式指定。多个表达式用逗号分隔。根据 ORDER BY 子句,相等的分区行被视为对等项。如果省略 ORDER BY,则分区行是无序的,不隐含任何处理顺序,并且所有分区行都是对等的。order_clause具有以下语法:
order_clause:
ORDER BY expr [ASC|DESC] [, expr [ASC|DESC]] ...
frame_clause 句法
frame_clause是当前分区的子集。frame是相对于当前行确定的,这使得frame能够根据当前行在其分区中的位置在分区内移动。比如:
- 通过将frame定义为从分区开始到当前行的所有行,可以计算每行的运行总计。
- 通过将frame定义为在当前行的任一侧扩展 N 行,可以计算滚动平均值。
frame_clause(如果给定)具有以下语法:
frame_clause:
frame_units frame_extent
frame_units:
{ROWS | RANGE}
-
frame_units值指示当前行和框架行之间的关系类型:
- ROWS:框架由开始行和结束行位置定义。偏移量是行号与当前行号的差异。
- RANGE:框架由值范围内的行定义。偏移量是行值与当前行值的差异。
-
frame_extent值指示帧的起点和终点。可以仅指定帧的开头(在这种情况下,当前行隐式指定结尾),也可以使用 BETWEEN 指定两个帧端点:
frame_extent:
{frame_start | frame_between}
frame_between:
BETWEEN frame_start AND frame_end
frame_start, frame_end: {
CURRENT ROW
| UNBOUNDED PRECEDING
| UNBOUNDED FOLLOWING
| expr PRECEDING
| expr FOLLOWING
}
使用 BETWEEN 语法时,frame_start不得晚于frame_end.
允许frame_start和frame_end值具有以下含义:
- CURRENT ROW: 当前计算的行。
- UNBOUNDED PRECEDING: 开始于分区的第一行。
- UNBOUNDED FOLLOWING: 结束于分区的最后一行。
- expr PRECEDING: 对于ROWS,当前行之前的expr行。expr可以是文字数字或计算结果的表达式。对于 RANGE,值等于当前行值减去 expr 的行;如果当
前行值为 NULL,则是该行的对等项。对于 expr PRECEDING(和 expr FOLLOWING),expr 可以是 ? 参数标记(用于预编译语句)、非负数字文本或INTERVAL val unit的时间间隔。对于 INTERVAL 表达式,val 指定非负间隔值,unit 是指示应在其中解释值的单位的关键字。 - expr FOLLOWING: 对于ROWS,是当前行之后的 expr 行。对于 RANGE,值等于当前行值加 expr 的行;如果当前行值为 NULL,则是该行的对等项。
在没有frame_clause子句的情况下,默认框架取决于是否存在 ORDER BY 子句:
- 使用 ORDER BY: 默认frame_clause包括从分区开始到当前行的行,包括当前行的所有对等项。默认值等效于:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- 不使用ORDER BY: 默认frame_clause包括所有分区行(因为如果没有ORDER BY则所有分区行都是对等的)。默认值等效于:
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
由于默认frame_claus因是否存在 ORDER BY 而异,因此将 ORDER BY 添加到查询可能会更改最终的结果。
当当前行值为 NULL 时,frame_claus的含义可能不同。假设是这种情况,这些示例说明了各种框架规范如何应用:
1. ORDER BY X ASC RANGE BETWEEN 10 FOLLOWING AND 15 FOLLOWING
帧从NULL开始,以NULL结束,因此仅包含值为 NULL 的行.
2. ORDER BY X ASC RANGE BETWEEN 10 FOLLOWING AND UNBOUNDED FOLLOWING
帧从 NULL 开始,在分区的末尾结束。因为 ASC 排序将 NULL 值放在第一位,所以帧就是整个分区。
3. ORDER BY X DESC RANGE BETWEEN 10 FOLLOWING AND UNBOUNDED FOLLOWING
帧从 NULL 开始,在分区的末尾停止。由于 DESC 排序将NULL值放在最后,因此帧只是NULL值。
4. ORDER BY X ASC RANGE BETWEEN 10 PRECEDING AND UNBOUNDED FOLLOWING
帧从 NULL 开始,在分区的末尾停止。因为 ASC 排序将 NULL 值放在第一位,所以帧就是整个分区。
5. ORDER BY X ASC RANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING
帧从NULL开始,以NULL结束,因此仅包含值为 NULL 的行.
6. ORDER BY X ASC RANGE BETWEEN 10 PRECEDING AND 1 PRECEDING
帧从NULL开始,以NULL结束,因此仅包含值为 NULL 的行.
7. ORDER BY X ASC RANGE BETWEEN UNBOUNDED PRECEDING AND 10 FOLLOWING
帧从分区的开头开始,在值为 NULL 的行处停止。由于 ASC 排序将NULL值放在首位,因此帧只是NULL值。
示例
序号函数- ROW_NUMBER、RANK、DENSE_RANK
select goods_name,
goods_id,
pay_price,
-- 根据pay_price为每个商品的每一行分配一个连续整数
row_number() over (partition by goods_id order by pay_price desc) as goods_row_number,
-- 根据pay_price为每个商品的每一行分配一个排名。具有相同值的行分配相同的排名。当两行或更多行具有相同的排名时,排序值序列中存在间隙
rank() over (w1 order by pay_price desc ) as goods_rank,
-- 根据pay_price为每个商品的每一行分配一个排名。具有相同值的行分配相同的排名。当两行或更多行具有相同的排名时,排序值序列中不存在间隙
dense_rank() over (w2) as goods_dense_rank
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0
WINDOW w1 AS (partition by goods_id)
, w2 as (w1 order by pay_price desc);
-- 根据商品总销量为每个商品分配一个排名
select goods_id,
sum(pay_price) goods_sales,
row_number() over (w3) as goods_row_number,
rank() over (w3) as goods_rank,
dense_rank() over (w3) as goods_dense_rank
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0
group by goods_id
WINDOW w3 as (order by sum(pay_price) desc rows between unbounded preceding and unbounded following);
-- 根据商品总销量为每一行分配一个排名
with temp_goods_sales as (select id,
goods_name,
goods_id,
pay_price,
sum(pay_price)
over (partition by goods_id order by pay_price desc rows between unbounded preceding and unbounded following) goods_sales
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0)
select *,
row_number() over (order by goods_sales desc) as goods_row_number,
rank() over (order by goods_sales desc) as goods_rank,
dense_rank() over (order by goods_sales desc) as goods_dense_rank
from temp_goods_sales
order by goods_row_number;
聚合函数- SUM,AVG,MIN,MAX
-- sum()
select id,
pay_price,
goods_id,
goods_name,
-- 统计截止到当前,商品的销售额
sum(pay_price) over (partition by goods_id order by id rows unbounded preceding) as goods_current_sales,
-- 统计商品的总销售额
sum(pay_price)
over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_total_sale
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0;
-- avg()
select id,
pay_price,
market_id,
market_name,
-- 统计截止到当前,商户的平均销售额
avg(pay_price) over (partition by market_id order by id rows unbounded preceding) as market_current_avg_sales,
-- 统计商户的平均销售额
avg(pay_price)
over (partition by market_id rows between unbounded preceding and unbounded following) as market_total_avg_sales
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0;
-- max() min() 获取每个商户 商品最高的销量额和商品最低的销量额
with temp_goods_sale as (select id,
pay_price,
goods_id,
goods_name,
market_id,
sum(pay_price)
over (partition by goods_id order by id rows unbounded preceding) as goods_current_sales,
sum(pay_price)
over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_total_sale
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0)
select id,
pay_price,
goods_id,
goods_name,
market_id,
goods_current_sales,
goods_total_sale,
max(goods_current_sales)
over (partition by market_id order by goods_id, id rows unbounded preceding) as shop_current_max_price,
max(goods_total_sale)
over (partition by market_id rows between unbounded preceding and unbounded following) as shop_total_max_price,
min(goods_current_sales)
over (partition by market_id order by goods_id, id rows unbounded preceding) as shop_current_min_price,
min(goods_total_sale)
over (partition by market_id rows between unbounded preceding and unbounded following) as shop_current_min_price
from temp_goods_sale
order by market_id, id;
分布函数- CUME_DIST
分组内小于、等于当前rank值的行数 / 分组内总行数
select id,
pay_price,
goods_id,
goods_name,
rank() over (partition by goods_id order by id ) as rn,
CUME_DIST() over (partition by goods_id order by id ) as rn2
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0;
前后函数-LAG和LEAD
返回位于当前行的前n行(LAG(expr,n))或后n行(LEAD(expr,n))的expr的值
with temp_goods_sale as (select id,
goods_id,
goods_name,
market_id,
sum(pay_price)
over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_sales
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0)
select *,
lag(goods_sales, 1) over (partition by market_id order by goods_sales, goods_id, id) as last_goods_sales,
lead(goods_sales, 1) over (partition by market_id order by goods_sales, goods_id, id) as next_goods_sales
from temp_goods_sale;
头尾函数-FIRST_VALUE和LAST_VALUE
返回第一个(FIRST_VALUE(expr))或最后一个(LAST_VALUE(expr))expr的值
with temp_goods_sale as (select id,
goods_id,
goods_name,
market_id,
sum(pay_price)
over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_sales
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0)
select *,
first_value(goods_sales)
over (partition by market_id order by goods_sales RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as min_goods_sales,
last_value(goods_sales)
over (partition by market_id order by goods_sales RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as max_goods_sales
from temp_goods_sale;
其他函数-NTH_VALUE(expr, n)、NTILE(n)
将分区中的有序数据分为n个等级,记录等级数
with temp_goods_sale as (select id,
goods_id,
goods_name,
market_id,
sum(pay_price)
over (partition by goods_id rows between unbounded preceding and unbounded following) as goods_sales
from ods_goods_order
where order_status between 0 and 2
and pay_price > 0)
select *,
ntile(3) over (partition by market_id order by goods_sales desc) as ntile
from temp_goods_sale;