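One of SQL's most powerful tools is the over() window function. Until now I mostly used row_number() over(partition by rowName order by rowName) and was not familiar with the rest, so here is a short summary for future reference.
Ordering and ranking data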
- row_number() over(partition by rowName ORDER BY rowName)
- row_number assigns each row an increasing number, starting from 1 at the first row of each partition
- rank() over(partition by prov_name ORDER BY val_cnt)
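- rank is a ranking: rows with equal values get the same number, and the numbering then skips ahead, so after two rows ranked 1 the third row is ranked 3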
- dense_rank() over(partition by prov_name ORDER BY val_cnt)
- dense_rank differs from rank in that the numbering never skips after ties: rows ranked 1 are followed by 2, not 3
- sum/avg/count/min/max() over([partition by prov_name [ORDER BY val_cnt]])
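- With order by: a running aggregate accumulated in sort order. Without order by: the aggregate over each partition by group. With no partition by inside over(): the aggregate over all rows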
- first_value/last_value/lag/lead() over([partition by prov_name [ORDER BY val_cnt]])
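- first_value: returns the first value in the window
- last_value: returns the last value in the window
- lag returns the value from the row N rows before the current row, and a default value can be supplied
- lead returns the value from the row N rows after the current row, and a default value can be supplied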
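
To make the list above concrete, here is a minimal sketch that runs these functions side by side. The `sales` data is hypothetical and made up for illustration; only the column names prov_name and val_cnt come from the signatures above. It assumes a database that accepts a SELECT without FROM (for example PostgreSQL, MySQL 8 or SQLite); on Oracle you would likely need to add `FROM dual`.

```sql
-- Hypothetical sample data: province A has a tie on val_cnt = 10.
WITH sales (prov_name, city, val_cnt) AS (
    SELECT 'A', 'a1', 10 UNION ALL
    SELECT 'A', 'a2', 10 UNION ALL
    SELECT 'A', 'a3', 30 UNION ALL
    SELECT 'B', 'b1', 5  UNION ALL
    SELECT 'B', 'b2', 7
)
SELECT prov_name, city, val_cnt,
       -- city is added as a tie-breaker so rn, prev_val and next_val are deterministic
       ROW_NUMBER() OVER (PARTITION BY prov_name ORDER BY val_cnt, city) AS rn,
       RANK()       OVER (PARTITION BY prov_name ORDER BY val_cnt)       AS rnk,
       DENSE_RANK() OVER (PARTITION BY prov_name ORDER BY val_cnt)       AS dense_rnk,
       -- with ORDER BY: running sum (tied rows share the same value)
       SUM(val_cnt) OVER (PARTITION BY prov_name ORDER BY val_cnt)       AS running_sum,
       -- without ORDER BY: sum over the whole partition
       SUM(val_cnt) OVER (PARTITION BY prov_name)                        AS prov_sum,
       -- empty OVER(): sum over all rows
       SUM(val_cnt) OVER ()                                              AS total_sum,
       -- lag looks one row back, lead one row ahead; 0 is the default value
       LAG(val_cnt, 1, 0)  OVER (PARTITION BY prov_name ORDER BY val_cnt, city) AS prev_val,
       LEAD(val_cnt, 1, 0) OVER (PARTITION BY prov_name ORDER BY val_cnt, city) AS next_val
FROM sales
ORDER BY prov_name, val_cnt, city;

-- Expected output:
-- prov city val rn rnk dense running prov_sum total prev next
-- A    a1   10  1  1   1     20      50       62    0    10
-- A    a2   10  2  1   1     20      50       62    10   30
-- A    a3   30  3  3   2     50      50       62    10   0
-- B    b1   5   1  1   1     5       12       62    0    7
-- B    b2   7   2  2   2     12      12       62    5    0
```

Note how the two tied rows in province A get rnk 1, 1 and then 3, while dense_rnk goes 1, 1, 2, and how running_sum is already 20 on both tied rows because the default window treats ties as one group.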
Finally, here is an example borrowed from an expert:
SELECT ID_1,
       SUM(ID_1) OVER (ORDER BY ID_2) default_sum,
       SUM(ID_1) OVER (ORDER BY ID_2 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) range_unbound_sum,
       SUM(ID_1) OVER (ORDER BY ID_2 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) rows_unbound_sum,
       SUM(ID_1) OVER (ORDER BY ID_2 RANGE BETWEEN 1 PRECEDING AND 2 FOLLOWING) range_sum,
       SUM(ID_1) OVER (ORDER BY ID_2 ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING) rows_sum
FROM (
    -- ids 1, 2 become 1 and ids 4, 5 become 6, which creates ties to sort on
    SELECT (CASE
                WHEN id IN (1, 2) THEN 1
                WHEN id IN (4, 5) THEN 6
                ELSE id
            END) AS ID_1,
           (CASE
                WHEN id IN (1, 2) THEN 1
                WHEN id IN (4, 5) THEN 6
                ELSE id
            END) AS ID_2
    FROM (
        SELECT 1 AS id
        UNION ALL SELECT 2 AS id
        UNION ALL SELECT 3 AS id
        UNION ALL SELECT 4 AS id
        UNION ALL SELECT 5 AS id
        UNION ALL SELECT 6 AS id
        UNION ALL SELECT 7 AS id
        UNION ALL SELECT 8 AS id
        UNION ALL SELECT 9 AS id
        UNION ALL SELECT 10 AS id
    ) AS a
) AS b
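Result (worked out by hand from the query above, rows shown in ID_2 order):
ID_1  default_sum  range_unbound_sum  rows_unbound_sum  range_sum  rows_sum
1     2            2                  1                 5          5
1     2            2                  2                 5          11
3     5            5                  5                 3          16
6     23           23                 11                33         21
6     23           23                 17                33         25
6     23           23                 23                33         27
7     30           30                 30                42         30
8     38           38                 38                34         34
9     47           47                 47                27         27
10    57           57                 57                19         19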
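From the example above we can see:
1. The window clause has to be used together with an ORDER BY clause (this is how Oracle behaves; some databases also accept a ROWS window without ORDER BY). If ORDER BY is specified without a window clause, the window defaults to RANGE BETWEEN unbounded preceding AND CURRENT ROW, which is why default_sum equals range_unbound_sum in the result set above;
2. If the analytic function has no ORDER BY clause, no ROWS/RANGE window is computed either, and the function is evaluated over the whole partition;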
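3. range is a logical window: it selects rows by a range around the current row's value. The number of rows is not fixed; every row whose value falls inside the range is included. Take the range_sum column above (range 1 preceding and 2 following), where id means the ID_2 sort value:
When id=1, it sums the rows with 1-1<=id<=1+2, so sum=1+1+3=5 (ids 1, 1, 3);
When id=3, it sums the rows with 3-1<=id<=3+2, so sum=3 (only id 3);
When id=6, it sums the rows with 6-1<=id<=6+2, so sum=6+6+6+7+8=33 (ids 6, 6, 6, 7, 8);
The remaining rows follow the same pattern and give the results of the example above.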
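4. rows is a physical window: after sorting by the order by clause, it takes N rows before and M rows after the current row (this depends only on row position after sorting, not on the current row's value). The rows_sum column above sums the 1 preceding row, the current row and the 2 following rows:
When id=1 (the first of the two 1s), there is no preceding row and the next two rows are 1 and 3, so sum=1+1+3=5;
When id=3, the preceding row is id=1 and the next two rows are both id=6, so sum=1+3+6+6=16;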
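
The four points above can be checked on a much smaller data set. The following is a minimal sketch with a hypothetical table `t` holding the values 1, 2, 2, 5 (not part of the original example). It assumes a database that supports RANGE with a numeric offset, for example PostgreSQL 11+, MySQL 8 or SQLite 3.28+; SQL Server, as far as I know, does not accept the range_1_1 column.

```sql
-- Hypothetical data: four values with one tie (2, 2).
WITH t (v) AS (
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 2 UNION ALL
    SELECT 5
)
SELECT v,
       -- point 1: ORDER BY with no window clause behaves like
       -- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (ties are summed together)
       SUM(v) OVER (ORDER BY v) AS default_sum,
       -- physical running sum: one row at a time, ties are not grouped
       SUM(v) OVER (ORDER BY v ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_running,
       -- point 3: logical window, every row with a value in [v-1, v+1]
       SUM(v) OVER (ORDER BY v RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS range_1_1,
       -- point 4: physical window, previous row + current row + next row
       SUM(v) OVER (ORDER BY v ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS rows_1_1,
       -- point 2: no ORDER BY, so the sum covers all rows
       SUM(v) OVER () AS no_order_sum
FROM t
ORDER BY v;

-- Expected output (the two v = 2 rows may be listed in either order):
-- v  default_sum  rows_running  range_1_1  rows_1_1  no_order_sum
-- 1  1            1             5          3         10
-- 2  5            3             5          5         10
-- 2  5            5             5          9         10
-- 5  10           10            5          7         10
```

The v = 5 row shows the difference most clearly: range_1_1 only finds values between 4 and 6, so it sums just 5, while rows_1_1 takes the neighbouring row regardless of its value and sums 2+5=7.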