MySQL窗口函数
MySQL窗口函数在我们处理SQL需求时有很大的帮助,特别是处理一些排名问题以及累计问题,今天给大家整理常用的窗口函数.
以如下scores表为例来说明窗口函数
Scores表:
+-------------+--------+------------+--------------+
| player_name | gender | day | score_points |
+-------------+--------+------------+--------------+
| Aron | F | 2020-01-01 | 17 |
| Alice | F | 2020-01-07 | 23 |
| Bajrang | M | 2020-01-07 | 7 |
| Khali | M | 2019-12-25 | 11 |
| Slaman | M | 2019-12-30 | 13 |
| Joe | M | 2019-12-31 | 3 |
| Jose | M | 2019-12-18 | 2 |
| Priya | F | 2019-12-31 | 23 |
| Priyanka | F | 2019-12-30 | 17 |
+-------------+--------+------------+--------------+
row_number() over(…)
row_number会根据over后的条件进行分组和排序(可以只分组,也可以只排序)
我们现在可以根据score_points对所有人进行排序
select *,row_number() over(order by score_points) as `rank` from Scores;
#结果
| player_name | gender | day | score_points | rank |
| ----------- | ------ | ---------- | ------------ | ---- |
| Jose | M | 2019-12-18 | 2 | 1 |
| Joe | M | 2019-12-31 | 3 | 2 |
| Bajrang | M | 2020-01-07 | 7 | 3 |
| Khali | M | 2019-12-25 | 11 | 4 |
| Slaman | M | 2019-12-30 | 13 | 5 |
| Aron | F | 2020-01-01 | 17 | 6 |
| Priyanka | F | 2019-12-30 | 17 | 7 |
| Alice | F | 2020-01-07 | 23 | 8 |
| Priya | F | 2019-12-31 | 23 | 9 |
大家发现其实可以直接用order by呀,为什么用窗口函数多此一举呢。以刚才的排序需求来讲确实是这样的,但是窗口函数的核心是分组之后会保留所有数据,我们用group by和窗口函数做一个比较
group by
select * from Scores group by gender;
#结果
| player_name | gender | day | score_points |
| ----------- | ------ | ---------- | ------------ |
| Aron | F | 2020-01-01 | 17 |
| Bajrang | M | 2020-01-07 | 7 |
窗口函数
select *,row_number() over(partition by gender order by score_points) as `rank` from Scores;
#结果
| player_name | gender | day | score_points | rank |
| ----------- | ------ | ---------- | ------------ | ---- |
| Aron | F | 2020-01-01 | 17 | 1 |
| Priyanka | F | 2019-12-30 | 17 | 2 |
| Alice | F | 2020-01-07 | 23 | 3 |
| Priya | F | 2019-12-31 | 23 | 4 |
| Jose | M | 2019-12-18 | 2 | 1 |
| Joe | M | 2019-12-31 | 3 | 2 |
| Bajrang | M | 2020-01-07 | 7 | 3 |
| Khali | M | 2019-12-25 | 11 | 4 |
| Slaman | M | 2019-12-30 | 13 | 5 |
从上面的对比很明显可以看到窗口函数的优点,特别是处理前几名的信息类似问题非常强大。
除了row_number之外还有两种与之类似的窗口函数,但是有所区别
rank() over(…)
select *,rank() over(order by score_points) as `rank` from Scores;
#结果
| player_name | gender | day | score_points | rank |
| ----------- | ------ | ---------- | ------------ | ---- |
| Jose | M | 2019-12-18 | 2 | 1 |
| Joe | M | 2019-12-31 | 3 | 2 |
| Bajrang | M | 2020-01-07 | 7 | 3 |
| Khali | M | 2019-12-25 | 11 | 4 |
| Slaman | M | 2019-12-30 | 13 | 5 |
| Aron | F | 2020-01-01 | 17 | 6 |
| Priyanka | F | 2019-12-30 | 17 | 6 |
| Alice | F | 2020-01-07 | 23 | 8 |
| Priya | F | 2019-12-31 | 23 | 8 |
很明显的可以看出,当排序的数值相同时,他们的排名是一致的,直到下个数值时会变成它的行号
dense_rank() over(…)
select *,dense_rank() over(order by score_points) as `rank` from Scores;
#结果
| player_name | gender | day | score_points | rank |
| ----------- | ------ | ---------- | ------------ | ---- |
| Jose | M | 2019-12-18 | 2 | 1 |
| Joe | M | 2019-12-31 | 3 | 2 |
| Bajrang | M | 2020-01-07 | 7 | 3 |
| Khali | M | 2019-12-25 | 11 | 4 |
| Slaman | M | 2019-12-30 | 13 | 5 |
| Aron | F | 2020-01-01 | 17 | 6 |
| Priyanka | F | 2019-12-30 | 17 | 6 |
| Alice | F | 2020-01-07 | 23 | 7 |
| Priya | F | 2019-12-31 | 23 | 7 |
很明显的可以看出,当排序的数值相同时,他们的排名是一致的,直到下个数值时会以上一个排名+1
这三种窗口函数在特定的场景下看情况进行使用,来满足不同的需求。
聚合窗口函数
- sum() over(order by col rows n preceding) 表示根据col列排序,然后计算
- max() over(…)
- min() over(…)
- count() over(…)
- avg() over(…)
这里的聚合窗口函数用于求累积值,累计计数等,并且可以限制计算的行数,以sum为例,其他函数同理
select *,sum(score_points) over(order by day rows 1 preceding) as rk from Scores;
#结果
| player_name | gender | day | score_points | rk |
| ----------- | ------ | ---------- | ------------ | -- |
| Jose | M | 2019-12-18 | 2 | 2 |
| Khali | M | 2019-12-25 | 11 | 13 |
| Slaman | M | 2019-12-30 | 13 | 24 |
| Priyanka | F | 2019-12-30 | 17 | 30 |
| Joe | M | 2019-12-31 | 3 | 20 |
| Priya | F | 2019-12-31 | 23 | 26 |
| Aron | F | 2020-01-01 | 17 | 40 |
| Alice | F | 2020-01-07 | 23 | 40 |
| Bajrang | M | 2020-01-07 | 7 | 30 |
每一行的rk值都是由这一行与前一行(这里做了rows 1 preceding限制)的score_points和。同样可以根据需求来使用