文章目录
随着项目框架的升级,mysql也从5.7升级到了8.0,特地跑去官网瞅了瞅新特性,然后发现了一个好东西12.20.1 窗口函数
非聚合窗口函数
函数名称 | 描述 | 表达式参数 |
---|---|---|
CUME_DIST() | 累计分布值,<=当前rank值的行数/总行数 | CUME_DIST() over_clause |
DENSE_RANK() | 按分区顺序排序,不跳过重复序号,如1,1,1,2 | DENSE_RANK() over_clause |
FIRST_VALUE() | 按分区,截止到目前行的第一行的expr值 | FIRST_VALUE(expr) [null_treatment] over_clause |
LAG() | 按分区,当前行的前一行的expr值 | LAG(expr [, N[, default]]) [null_treatment] over_clause |
LAST_VALUE() | 按分区,截止到目前行的最后一行的expr值 | LAST_VALUE(expr) [null_treatment] over_clause |
LEAD() | 按分区,当前行的(紧接着)后一行的expr值 | LEAD(expr [, N[, default]]) [null_treatment] over_clause |
NTH_VALUE() | 按分区,截止到目前行的第n行的expr值 若取不到则为null | NTH_VALUE(expr, N) [from_first_last] [null_treatment] over_clause |
NTILE() | Bucket number of current row within its partition. | NTILE(N) over_clause |
PERCENT_RANK() | (rank - 1) / (rows - 1),其中rank为rank()排序值 | PERCENT_RANK() over_clause |
RANK() | 并列排序,跳过重复序号,如1,1,1,4 | RANK() over_clause |
ROW_NUMBER() | 按分区排序,顺序排序 | ROW_NUMBER() over_clause |
cume_dist()/percent_rank() over_clause
CUME_DIST;>=当前rank值的行数/总行数
NOTE:ROW_NUMBER
没有指定order by
,则排序结果是不确定的。
mysql> SELECT
val,
ROW_NUMBER() OVER w AS 'row_number',
CUME_DIST() OVER w AS 'cume_dist',
PERCENT_RANK() OVER w AS 'percent_rank'
FROM numbers
WINDOW w AS (ORDER BY val);
+------+------------+--------------------+--------------+------+------------+
| val | row_number | cume_dist | percent_rank | rank | dense_rank |
+------+------------+--------------------+--------------+------+------------+
| 1 | 1 | 0.2222222222222222 | 0 | 1 | 1 |-->小于等于val=1值的行数为2,cume_dist=2/9
| 1 | 2 | 0.2222222222222222 | 0 | 1 | 1 |-->此行rank=1, percent_rank=(1-1)/(9-1)
| 2 | 3 | 0.3333333333333333 | 0.25 | 3 | 2 |-->cume_dist=3/9, percent_rank=(3-1)/(9-1)
| 3 | 4 | 0.6666666666666666 | 0.375 | 4 | 3 |-->cume_dist=6/9, percent_rank=(4-1)/(9-1)
| 3 | 5 | 0.6666666666666666 | 0.375 | 4 | 3 |
| 3 | 6 | 0.6666666666666666 | 0.375 | 4 | 3 |
| 4 | 7 | 0.8888888888888888 | 0.75 | 7 | 4 |-->cume_dist=8/9, percent_rank=(7-1)/(9-1)
| 4 | 8 | 0.8888888888888888 | 0.75 | 7 | 4 |
| 5 | 9 | 1 | 1 | 9 | 5 |-->cume_dist=9/9, percent_rank=(9-1)/(9-1)
+------+------------+--------------------+--------------+------+------------+
first_value(expr)/last_value(expr)/nth_value(expr) over_clause
以下案例为:按subject字段分组,按time升序排序。
- subject:
st113
分组内,以第三行数据为例| 07:30:00 | st113 | 25 | 10 | 25 | 9 | NULL |
- 截止到当前行的第一行的val值,first=10
- 截止到当前行的最后一行的val值,last=25
- 截止到当前行的第n=2行的val值,NTH_VALUE(val, 2)=9
- 截止到当前行的第n=4行的val值,NTH_VALUE(val, 4)=NULL
mysql> SELECT
time, subject, val,
FIRST_VALUE(val) OVER w AS 'first',
LAST_VALUE(val) OVER w AS 'last',
NTH_VALUE(val, 2) OVER w AS 'second',
NTH_VALUE(val, 4) OVER w AS 'fourth'
FROM observations
WINDOW w AS (PARTITION BY subject ORDER BY time
ROWS UNBOUNDED PRECEDING);
+----------+---------+------+-------+------+--------+--------+
| time | subject | val | first | last | second | fourth |
+----------+---------+------+-------+------+--------+--------+
| 07:00:00 | st113 | 10 | 10 | 10 | NULL | NULL | -->该分组内的first=10(第一个val值)
| 07:15:00 | st113 | 9 | 10 | 9 | 9 | NULL |
| 07:30:00 | st113 | 25 | 10 | 25 | 9 | NULL |
| 07:45:00 | st113 | 20 | 10 | 20 | 9 | 20 |
| 07:00:00 | xh458 | 0 | 0 | 0 | NULL | NULL |
| 07:15:00 | xh458 | 10 | 0 | 10 | 10 | NULL |
| 07:30:00 | xh458 | 5 | 0 | 5 | 10 | NULL |
| 07:45:00 | xh458 | 30 | 0 | 30 | 10 | 30 |
| 08:00:00 | xh458 | 25 | 0 | 25 | 10 | 30 |
+----------+---------+------+-------+------+--------+--------+
lag()/lead() over_clause
LAG(expr [, N[, default]]) [null_treatment] over_clause
- 按分区有n行,返回当前行(第i行)的上一行(第i-1行)的expr值,若没有这样的行,则返回default默认值
- 若果n为3,那么返回值是第三行的expr值;如果n或者default默认值未指定,那么返回值就是1和null
- 若n=0,那么返回值为当前行的expr值
LEAD(expr [, N[, default]]) [null_treatment] over_clause
- 按分区有n行,返回当前行(第i行)的下一行(第i+1行)的expr值,若没有这样的行,则返回default默认值
NOTE:从8.0.22版本后,n不能为null了,必须定义在1-263范围内,具体限制请看官方说明
Beginning with MySQL 8.0.22, N cannot be NULL. In addition, it must now be an integer in the range 1 to 263, inclusive, in any of the following forms:
an unsigned integer constant literal
a positional parameter marker (?)
a user-defined variable
a local variable in a stored routine
以下案例:根据字段n升序排列。
mysql> SELECT
n,
LAG(n, 1, 0) OVER w AS 'lag',
LEAD(n, 1, 0) OVER w AS 'lead',
n + LAG(n, 1, 0) OVER w AS 'next_n',
n + LEAD(n, 1, 0) OVER w AS 'next_next_n'
FROM fib
WINDOW w AS (ORDER BY n);
+------+------+------+--------+-------------+
| n | lag | lead | next_n | next_next_n |
+------+------+------+--------+-------------+
| 1 | 0 | 1 | 1 | 2 |-->第1行,lag=第(1-1)行的n值,没有则取默认值0;lead=第(1+1)行的n值
| 1 | 1 | 2 | 2 | 3 |
| 2 | 1 | 3 | 3 | 5 |
| 3 | 2 | 5 | 5 | 8 |-->第4行,第3行的n值,没有则取默认值0,lag=2,第5行的n值,lead=5
| 5 | 3 | 8 | 8 | 13 |
| 8 | 5 | 0 | 13 | 8 |
+------+------+------+--------+-------------+
ntile(n) over_clause
NTILE(N) over_clause
- 首先分区,在将分区分为n份
- n必须为正整数 Bucket number return values range from 1 to N.
NOTE:从8.0.22开始, NTILE(NULL)不在允许使用
如下案例:按val升序排列,分区后该区有9行
- ntile(2),将9行数据分为2组
- 第一组:第1-5行
- 第二组:第6-9行
- ntile(4),将9行数据分为4组
- 第一组:第1-3行
- 第二组:第4-5行
- 第三组:第6-7行
- 第四组:第8-9行
mysql> SELECT
val,
ROW_NUMBER() OVER w AS 'row_number',
NTILE(2) OVER w AS 'ntile2',
NTILE(4) OVER w AS 'ntile4'
FROM numbers
WINDOW w AS (ORDER BY val);
+------+------------+--------+--------+
| val | row_number | ntile2 | ntile4 |
+------+------------+--------+--------+
| 1 | 1 | 1 | 1 |
| 1 | 2 | 1 | 1 |
| 2 | 3 | 1 | 1 |
| 3 | 4 | 1 | 2 | -->改行属于分割为2组后的第1组,属于分割为4组后的第2组
| 3 | 5 | 1 | 2 |
| 3 | 6 | 2 | 3 | -->改行属于分割为2组后的第2组,属于分割为4组后的第3组
| 4 | 7 | 2 | 3 |
| 4 | 8 | 2 | 4 |
| 5 | 9 | 2 | 4 |
+------+------------+--------+--------+
over_clause
over_clause:
{OVER (window_spec) | OVER window_name}
- 若over()没有参数值,则窗口函数将对所有行进行计算
- 若有则按partition by分区
以下案例:按country分组
- ROW_NUMBER() OVER(PARTITION BY country),没有order by排序字段,则分组内数据乱序,row_number是不确定的
- ROW_NUMBER() OVER(PARTITION BY country ORDER BY year, product),按year、product升序排列
mysql> SELECT
year, country, product, profit,
ROW_NUMBER() OVER(PARTITION BY country) AS row_num1,
ROW_NUMBER() OVER(PARTITION BY country ORDER BY year, product) AS row_num2
FROM sales;
+------+---------+------------+--------+----------+----------+
| year | country | product | profit | row_num1 | row_num2 |
+------+---------+------------+--------+----------+----------+
| 2000 | Finland | Computer | 1500 | 2 | 1 |
| 2000 | Finland | Phone | 100 | 1 | 2 |
| 2001 | Finland | Phone | 10 | 3 | 3 |
| 2000 | India | Calculator | 75 | 2 | 1 |
| 2000 | India | Calculator | 75 | 3 | 2 |
| 2000 | India | Computer | 1200 | 1 | 3 |
| 2000 | USA | Calculator | 75 | 5 | 1 |
| 2000 | USA | Computer | 1500 | 4 | 2 |
| 2001 | USA | Calculator | 50 | 2 | 3 |
| 2001 | USA | Computer | 1500 | 3 | 4 |
| 2001 | USA | Computer | 1200 | 7 | 5 |
| 2001 | USA | TV | 150 | 1 | 6 |
| 2001 | USA | TV | 100 | 6 | 7 |
+------+---------+------------+--------+----------+----------+
window_spec
window_spec:
[window_name] [partition_clause] [order_clause] [frame_clause]
window_name
partition_clause
- partition_clause:
PARTITION BY expr [, expr] …
NOTE:partition by 字段/表达式,但表达式有些是不被允许的!例如字段ts(类型TIMESTAMP ),可以使用:PARTITION BY ts
不能使用:PARTITION BY HOUR(ts)
order_clause
- order_clause:
ORDER BY expr [ASC|DESC] [, expr [ASC|DESC]] …
NOTE:字段值为null,在ASC时排在第一位,在DESC时排最后一位。