MySQL高级窗口函数
分析函数有3个基本组成
1、分区子句
2、排序子句
3、开窗子句
function1(arg1,arg2,…argn)
over w
window w as ([partition by col] [order by col] [windowing clause])
窗口说明子句的语法:
默认的窗口子句是rows between unbounded preceding and current row。如果你没有显示声明窗口,就将会使用默认窗口。
并不是所有的分析函数都支持开窗子句[rows | range] between and [end expr]
is [unbounded preceding | current row | n preceding | n following]
is [unbounded following | current row | n preceding | n following]
1、排序:row_number,rank,dense_rank
语法:
row_number() over w – 不关注重复,直接排名 例如:1-2-3-4
rank() over w – 重复排名,会跳过 例如:1-1-3-4
dense_rank() over w – 重复排名,不会跳过 例如:1-1-2-3
window w (partition by col order by col)
要求:不同部门内工资排名,从大到小排序
select t.empno,
t.ename,
t.deptno,
t.sal,
row_number() over w as ‘rowNum’,
rank() over w as ‘rank’,
dense_rank() over w as ‘denserank’
from emp t
window w as (partition by t.deptno order by t.sal desc);
2、前一个,后一个:lag,lead
语法:
lag(expr, offset,default) over w
lead(expr,offset,default) over w
window w as (partition by col order by col)
要求:根据分组,取上n条和下n条,如果是第一条或最后一条,就给默认值
select t.empno,
t.ename,
t.hiredate,
t.sal,
lag(sal,1,0) over w as ‘pre_sal’,
lead(sal,1,0) over w as ‘next_sal’,
lag(sal,2,0) over w as ‘pre_sal’,
lead(sal,2,0) over w as ‘next_sal’
from emp t
window w as (partition by t.deptno order by t.hiredate desc);
select deptno,
ename,
lag(ename,1,‘AAA’) over w as ‘pre_name’,
lead(ename,1,‘ZZZ’) over w as ‘next_name’
from emp
window w as (partition by deptno order by ename);
3、取值:first_value,last_value,nth_value
语法:
first_value(expr) over w
last_value(expr) over w
nth_value(measure,n) [from first | from last [respect nulls | ignore nulls]] over w
window w as (partition by col order by col windowing-clause)
需求:求每个部门工资最高和最低以及第二高的
select t.ename,
t.deptno,
t.sal,
first_value(t.sal) over w as ‘top_1’,
last_value(t.sal) over w as ‘last’,
nth_value(t.sal,1) over w as ‘top_1’,
nth_value(t.sal,2) over w as ‘top_2’
from emp t
window w as (partition by t.deptno order by t.sal desc)
rows between unbounded preceding and current row 从第一行到当前行
select t.ename,
t.deptno,
t.sal,
first_value(t.sal) over w as ‘top_1’,
last_value(t.sal) over w as ‘last’,
nth_value(t.sal,1) over w as ‘top_1’,
nth_value(t.sal,2) over w as ‘top_2’
from emp t
window w as (partition by t.deptno order by t.sal desc rows between unbounded preceding and current row)
rows between unbounded preceding and unbounded following 从第一行到最后一行
select t.ename,
t.deptno,
t.sal,
first_value(t.sal) over w as ‘top_1’,
last_value(t.sal) over w as ‘last’,
nth_value(t.sal,1) over w as ‘top_1’,
nth_value(t.sal,2) over w as ‘top_2’
from emp t
window w as (partition by t.deptno order by t.sal desc rows between unbounded preceding and unbounded following)
rows between 1 preceding and 1 following 当前行的前一行到当前行的后一行
select t.ename,
t.deptno,
t.sal,
first_value(t.sal) over w as ‘top_1’,
last_value(t.sal) over w as ‘last’,
nth_value(t.sal,1) over w as ‘top_1’,
nth_value(t.sal,2) over w as ‘top_2’
from emp t
window w as (partition by t.deptno order by t.sal desc rows between 1 preceding and 1 following)
4、分区排名占比:percent_rank,cume_dist
语法:
percent_rank() over w
cume_dist() over w
window w as (partition by col order by col)
percent_rank函数以0-1之间的分数形式返回,某个值在数据分区中的排名
percent_rank函数的计算公式:(rank-1)/ (n-1)
cume_dist函数返回累计分布值,如一个5行的组中,返回累计分布值:0.2,0.4,0.6,0.8,1.0
重复行,计算时取重复行的最后一行。
select t.empno,
t.ename,
t.deptno,
t.sal,
percent_rank() over w as ‘percent’,
cume_dist() over w as ‘cume’
from emp t
window w as (partition by t.deptno order by t.sal desc)
5、分桶:ntile
语法:
ntile(expr) over w
window w as ([partition by col] order by col)
ntile 把数据分为多个桶,每个桶有相同的行数,正负误差1
分成2个桶
select ename,
sal,
ntile(2) over w as ‘n’
from emp
window w as (order by sal asc)
分成3个桶
select ename,
sal,
ntile(3) over w as ‘n’
from emp
window w as (order by sal asc)