窗口函数 OVER(PARTITION BY ...)
MySQL窗口函数,12.21.1 窗口功能说明 https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html
开窗函数的语法结构: 分析函数() over(partition by 分组列名 order by 排序列名 rows between 开始位置 and 结束位置)
over()函数 中包括三个函数:包括分区partition by 列名、排序order by 列名、指定窗口范围rows between 开始位置 and 结束位置
rows between … and … 用得较少,将在最后一节分析
我们知道聚合函数对一组值执行计算并返回单一的值,如sum(),count(),max(),min(), avg()等,这些函数常与group by子句连用。除了 COUNT 以外,聚合函数忽略空值。
但有时候一组数据只返回一组值是不能满足需求的,如我们经常想知道各个地区的前几名、各个班或各个学科的前几名。这时候需要每一组返回多个值。用开窗函数解决这类问题非常方便。它和聚合函数的不同之处是:对于每个组返回多行,而聚合函数对于每个组只返回一行。
建表
<span style="background-color:#282c34"><span style="color:#abb2bf"><span style="color:#c678dd">DROP</span> <span style="color:#c678dd">TABLE</span> <span style="color:#c678dd">IF</span> <span style="color:#c678dd">EXISTS</span> <span style="color:#c678dd">temp</span>
<span style="color:#c678dd">CREATE</span> <span style="color:#c678dd">TABLE</span> <span style="color:#c678dd">temp</span>(
id <span style="color:#d19a66">INT</span>,
<span style="color:#d19a66">name</span> <span style="color:#d19a66">VARCHAR</span>(<span style="color:#d19a66">10</span>),
<span style="color:#c678dd">class</span> <span style="color:#d19a66">VARCHAR</span>(<span style="color:#d19a66">10</span>),
score <span style="color:#d19a66">INT</span>
);
<span style="color:#c678dd">INSERT</span> <span style="color:#c678dd">INTO</span> <span style="color:#c678dd">temp</span> (id, <span style="color:#d19a66">name</span>, <span style="color:#c678dd">class</span>, score) <span style="color:#c678dd">VALUES</span> (<span style="color:#d19a66">1</span>,<span style="color:#98c379">'公孙衍'</span>, <span style="color:#98c379">'2'</span>, <span style="color:#d19a66">81</span>);
<span style="color:#c678dd">INSERT</span> <span style="color:#c678dd">INTO</span> <span style="color:#c678dd">temp</span> (id, <span style="color:#d19a66">name</span>, <span style="color:#c678dd">class</span>, score) <span style="color:#c678dd">VALUES</span> (<span style="color:#d19a66">2</span>,<span style="color:#98c379">'廉颇'</span>, <span style="color:#98c379">'3'</span>, <span style="color:#d19a66">55</span>);
<span style="color:#c678dd">INSERT</span> <span style="color:#c678dd">INTO</span> <span style="color:#c678dd">temp</span> (id, <span style="color:#d19a66">name</span>, <span style="color:#c678dd">class</span>, score) <span style="color:#c678dd">VALUES</span> (<span style="color:#d19a66">3</span>,<span style="color:#98c379">'李牧'</span>, <span style="color:#98c379">'3'</span>, <span style="color:#d19a66">55</span>);
<span style="color:#c678dd">INSERT</span> <span style="color:#c678dd">INTO</span> <span style="color:#c678dd">temp</span> (id, <span style="color:#d19a66">name</span>, <span style="color:#c678dd">class</span>, score) <span style="color:#c678dd">VALUES</span> (<span style="color:#d19a66">4</span>,<span style="color:#98c379">'王翦'</span>, <span style="color:#98c379">'1'</span>, <span style="color:#d19a66">96</span>);
<span style="color:#c678dd">INSERT</span> <span style="color:#c678dd">INTO</span> <span style="color:#c678dd">temp</span> (id, <span style="color:#d19a66">name</span>, <span style="color:#c678dd">class</span>, score) <span style="color:#c678dd">VALUES</span> (<span style="color:#d19a66">5</span>,<span style="color:#98c379">'王贲'</span>, <span style="color:#98c379">'1'</span>, <span style="color:#d19a66">92</span>);
<span style="color:#c678dd">INSERT</span> <span style="color:#c678dd">INTO</span> <span style="color:#c678dd">temp</span> (id, <span style="color:#d19a66">name</span>, <span style="color:#c678dd">class</span>, score) <span style="color:#c678dd">VALUES</span> (<span style="color:#d19a66">6</span>,<span style="color:#98c379">'白起'</span>, <span style="color:#98c379">'1'</span>, <span style="color:#d19a66">96</span>);
<span style="color:#c678dd">INSERT</span> <span style="color:#c678dd">INTO</span> <span style="color:#c678dd">temp</span> (id, <span style="color:#d19a66">name</span>, <span style="color:#c678dd">class</span>, score) <span style="color:#c678dd">VALUES</span> (<span style="color:#d19a66">7</span>,<span style="color:#98c379">'蔺相如'</span>, <span style="color:#98c379">'3'</span>, <span style="color:#d19a66">90</span>);
<span style="color:#c678dd">INSERT</span> <span style="color:#c678dd">INTO</span> <span style="color:#c678dd">temp</span> (id, <span style="color:#d19a66">name</span>, <span style="color:#c678dd">class</span>, score) <span style="color:#c678dd">VALUES</span> (<span style="color:#d19a66">8</span>,<span style="color:#98c379">'赵胜'</span>, <span style="color:#98c379">'3'</span>, <span style="color:#d19a66">81</span>);
<span style="color:#c678dd">INSERT</span> <span style="color:#c678dd">INTO</span> <span style="color:#c678dd">temp</span> (id, <span style="color:#d19a66">name</span>, <span style="color:#c678dd">class</span>, score) <span style="color:#c678dd">VALUES</span> (<span style="color:#d19a66">9</span>,<span style="color:#98c379">'赵雍'</span>, <span style="color:#98c379">'3'</span>, <span style="color:#d19a66">93</span>);
<span style="color:#c678dd">INSERT</span> <span style="color:#c678dd">INTO</span> <span style="color:#c678dd">temp</span> (id, <span style="color:#d19a66">name</span>, <span style="color:#c678dd">class</span>, score) <span style="color:#c678dd">VALUES</span> (<span style="color:#d19a66">10</span>,<span style="color:#98c379">'魏无忌'</span>, <span style="color:#98c379">'2'</span>, <span style="color:#d19a66">92</span>);
</span></span>
OVER(PARTITION BY … ORDER BY … DESC)
无分组排序 | 分组排序(对班级) |
---|---|
SELECT name,class,score, ROW_NUMBER() OVER(ORDER BY score DESC) mm FROM temp | SELECT name,class,score, ROW_NUMBER() OVER(PARTITION BY class ORDER BY score DESC) mm FROM temp |
![]() | ![]() |
<span style="background-color:#282c34"><span style="color:#abb2bf"><span style="color:#5c6370"><em>--这个也能实现无分组的排序,只是它没有排序后的次序(也就是没有上面的mm列)</em></span>
<span style="color:#c678dd">SELECT</span> * <span style="color:#c678dd">FROM</span> <span style="color:#c678dd">temp</span> <span style="color:#c678dd">ORDER</span> <span style="color:#c678dd">BY</span> sroce <span style="color:#c678dd">DESC</span>
</span></span>
实例:
查询每个班的第一名的成绩 | 查询每个班的最后一名的成绩 |
---|---|
SELECT name,class,score FROM (SELECT name,class,score, RANK() OVER(PARTITION BY class ORDER BY score DESC) mm FROM TEMP ) a WHERE mm = 1; | SELECT name,class,score FROM ( SELECT name,class,score, RANK() OVER(PARTITION BY class ORDER BY score) mm FROM temp ) a WHERE mm = 1; |
![]() | ![]() |
在求第一名成绩的时候,不能用row_number(),因为如果同班有两个并列第一,mm=1就只返回一个结果
不加desc,排序就默认升序,取mm=1,就是最后一名
分组排序函数row_number()、rank() 、dense_rank()、ntile()的区别
- select ROW_NUMBER()over(order by name) as 排序,* from temp
— 排序,即使值一样,也不会重复排序。例如1,2,3,4,5 - select RANK()over(order by name) as 排序,* from temp
— 排序,值一样,就重复排序,有间隙。例如1,1,3,4 - select DENSE_RANK()over(order by name) as 排序,* from temp
— 排序,值一样,就重复排序,没有间隙。例如1,1,2,2,3,4,5 - select NTILE(2)over(order by name) as 排序,* from temp
— 排序,分成2组。此函数一般用于取表中前百分之几的数据。例如,取数据的前25%就将数据分4组,然后字段的条件是等于1。
SELECT *, 函数名 OVER(ORDER BY sroce DESC) AS 排序 FROM temp
ROW_NUMBER() | RANK() | DENSE_RANK() | NTILE(2) |
---|---|---|---|
![]() | ![]() | ![]() | ![]() |
偏移分析窗口函数 lag()、lead()
lag和lead分析函数可以在同一次查询中,取出同一字段的 前N行的数据(lag)和 后N行的数据(lead)作为独立的列。
在实际应用中,若要用到取今天和昨天的某字段差值时,lag和lead函数的应用就显得尤为重要。当然,这种操作可以用表的自连接实现,但是lag和lead与left join、right join 等自连接相比,效率更高,SQL更简洁。
lag(exp_str, offset, defval) over(partition by … order by …)
lead(exp_str, offset, defval) over(partition by … order by …)
- exp_str 是字段名称
- offset 是偏移量, 即是上1个或上N个的值,假设当前行在表中排在第5行,则offset 为3,则表示我们所要找的数据行就是表中的第2行(即5-3=2) 。offset默认值为1。
- defval 默认值, 当两个函数取上N/下N个值,当在表中从当前行位置向前数N行已经超出了表的范围时,lag()函数将defval这个参数值作为函数的返回值,若没有指定默认值,则返回NUL,那么在数学运算中,总要给一个默认值才不会出错。
lag() 实例
<span style="background-color:#282c34"><span style="color:#abb2bf">SELECT id,score,
<span style="color:#61aeee">LAG</span>(score,<span style="color:#d19a66">1</span>,<span style="color:#d19a66">0</span>)<span style="color:#61aeee">OVER</span>() AS n1,
<span style="color:#61aeee">LAG</span>(score,<span style="color:#d19a66">1</span>) <span style="color:#e6c07b">OVER</span>() AS n2,
<span style="color:#61aeee">LAG</span>(score,<span style="color:#d19a66">2</span>,<span style="color:#d19a66">0</span>)<span style="color:#61aeee">OVER</span>() AS n6,
<span style="color:#61aeee">LAG</span>(score,<span style="color:#d19a66">2</span>) <span style="color:#e6c07b">OVER</span>() AS n7
FROM temp
</span></span>
lead() 实例
<span style="background-color:#282c34"><span style="color:#abb2bf">SELECT id,score,
<span style="color:#61aeee">LEAD</span>(score,<span style="color:#d19a66">1</span>,<span style="color:#d19a66">0</span>)<span style="color:#61aeee">OVER</span>() AS n1,
<span style="color:#61aeee">LEAD</span>(score,<span style="color:#d19a66">1</span>) <span style="color:#e6c07b">OVER</span>() AS n2,
<span style="color:#61aeee">LEAD</span>(score,<span style="color:#d19a66">2</span>,<span style="color:#d19a66">0</span>)<span style="color:#61aeee">OVER</span>() AS n6,
<span style="color:#61aeee">LEAD</span>(score,<span style="color:#d19a66">2</span>) <span style="color:#e6c07b">OVER</span>() AS n7
FROM temp
</span></span>
其他聚合函数
名称 | 描述 |
---|---|
CUME_DIST() | 累计分配值 |
DENSE_RANK() | 当前行在其分区内的排名,没有间隙 |
FIRST_VALUE() | 窗口框架第一行的参数值 |
LAG() | 来自分区内滞后当前行的行的参数值 |
LAST_VALUE() | 窗口框架最后一行的参数值 |
LEAD() | 分区内行前导当前行的参数值 |
NTH_VALUE() | 来自第 N 行窗口框架的参数值 |
NTILE() | 其分区内当前行的桶数 |
PERCENT_RANK() | 百分比排名值 |
RANK() | 当前行在其分区内的排名,有间隙 |
ROW_NUMBER() | 其分区内的当前行数 |
group by是对检索结果的保留行进行单纯分组,一般和聚合函数一起使用例如max、min、sum、avg、count等一块用。 partition by虽然也具有分组功能,但同时也具有其他的高级功能。
sum() over()的使用
显示全部字段是为了方便查看,当有明确目标的时候可以适当选择相应字段。
SELECT t.*, SUM(t.score) s_sum FROM temp t GROUP BY t.class | SELECT t.*, SUM(t.score) OVER(PARTITION BY t.class ORDER BY t.score DESC) s_sum FROM TEMP t | SELECT t.*, SUM(t.score) OVER(ORDER BY t.id) s_sum FROM temp t |
![]() | ![]() | ![]() |
avg() over()的使用
计算移动平均值,一般用于股票变化。
n个数值 { x 1 , x 2 , x 3 , . . . , x n } \{ {x_1,x_2,x_3,...,x_n}\} {x1,x2,x3,...,xn},按顺序取一定个数所做的全部算术平均值。例如 x 1 + x 2 + x 3 3 , x 2 + x 3 + x 4 3 , x 3 + x 4 + x 5 3 , x 4 + x 5 + x 6 3 , . . . . . \frac{x_1+x_2+x_3}{3},\frac{x_2+x_3+x_4}{3},\frac{x_3+x_4+x_5}{3},\frac{x_4+x_5+x_6}{3},..... 3x1+x2+x3,3x2+x3+x4,3x3+x4+x5,3x4+x5+x6,..... 等就是移动平均值。
<span style="background-color:#282c34"><span style="color:#abb2bf"><span style="color:#c678dd">SELECT</span> id, score, AVG(score) <span style="color:#c678dd">OVER</span>(<span style="color:#c678dd">ORDER</span> <span style="color:#c678dd">BY</span> id <span style="color:#c678dd">ROWS</span> <span style="color:#c678dd">BETWEEN</span> <span style="color:#d19a66">2</span> <span style="color:#c678dd">PRECEDING</span> <span style="color:#c678dd">AND</span> <span style="color:#c678dd">CURRENT</span> <span style="color:#c678dd">ROW</span>) <span style="color:#c678dd">FROM</span> <span style="color:#c678dd">temp</span>
</span></span>
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
限制计算移动平均值的范围,本语句包含本行和前两行。
语法总结:
avg(…A…) over(partition by …b… order by …C… rows between …D1… and …D2…)
sum(…A…) over(partition by …b… order by …C… rows between …D1… and …D2…)
- A:需要被加工的字段名称
- B:分组的字段名称
- C:排序的字段名称
- D:计算的行数范围
窗口范围说明:
- preceding:往前
- following:往后
- current row:当前行
- unbounded:起点(一般结合preceding,following使用)
- unbounded preceding表示该窗口最前面的行(起点)
- unbounded following:表示该窗口最后面的行(终点)
比如:
- rows between unbounded preceding and current row - - 表示本行和之前所有的行
- rows between current row and unbounded following - - 表示本行和之后所有的行
- rows between 3 preceding and current row - - 表示往前3行到本行(共计4行)
- rows between 3 preceding and 1 following - - 表示往前3行到往后1行(共计5行)
下面还有很多用法,就不逐一列举了,简单介绍一下,和上面用法类似:
count() over(partition by … order by …):求分组后的总数。
max() over(partition by … order by …):求分组后的最大值。
min() over(partition by … order by …):求分组后的最小值。
avg() over(partition by … order by …):求分组后的平均值。
lag() over(partition by … order by …):取出前n行数据。
lead() over(partition by … order by …):取出后n行数据。
first_value() over(partition by … order by …):取出第一个数据。
last_value() over(partition by … order by …):取出最后一个数据。
ratio_to_report() over(partition by … order by …):Ratio_to_report() 括号中就是分子,over() 括号中就是分母。
percent_rank() over(partition by … order by …)