窗口函数介绍
窗口函数的引入是为了解决想要既显示聚集前的数据,又要显示聚集后的数据;窗口数对一组值进行操作,不需要使用GROUP BY子句对数据进行分组,能够在同一行中同时返回基础行的列和聚合列。
强调:使用MySQL 8.0版本方可实现
基本语法
函数名(列) over(选项) 选项为partition by 列 order by 列
解释:
- over(partition by xxx) 按xxx分组的所有行进行分组
- over(partition by xxx order by aaa) 按列xxx分组,按列aaa排序
- over(order by aaa) 按aaa列排序
- over括号中的partition by和order by的使用根据具体情况选择
-- 需求:计算每个学生的及格科目数
-- 使用聚合函数,类似数据透视表,原有表结构已发生变化
SELECT student_id,count( sid ) FROM score WHERE num >= 60 GROUP BY student_id;
-- 使用窗口函数,不会更改原表结构
SELECT student_id,count( sid ) over ( PARTITION BY student_id ORDER BY student_id ) AS 及格数
FROM score WHERE num >= 60;
聚合窗口函数
语法:聚合函数(列) over(partition by 列 order by 列)
常见的聚合函数:sum() count() avg() max() min()
排序窗口函数
- row_number():仅仅根据行号进行排序,相同结果则排序按照顺序依次排
- rank():排名,与row_number函数不同的是,rank函数考虑到了over子句中排序字段值相同的情况,over子句中排序字段值相同的排序结果是一样的,后面字段值不相同的序号将跳过相同的排名号排下一个。如:11335
- dense_rank():密集排序,用法跟rank类似,唯一不同是当排序结果相同时,排序不跳跃,而是紧跟排下一个。如:11223
- ntile():桶排序,首先,ntile会先根据你的分组依据,然后把每个组的总记录数进行按照你给的ntile(n)里的数字n进行均分,这个数字就是桶数,例如一个组内总共12条记录,若n=6,则等划分成6桶,然后按照num的排序等级划分,12/6=2则每个桶两条记录,也就是112233445566的排序结果,常用于提取前百分之多少的应用场景。
都是排名函数,不同之处在对于名次相同的数据处理方式
-- 对每门课程进行排序
SELECT
s.sid,
s1.sname,
s1.gender,
c.cname,
s.num,
row_number() over ( PARTITION BY c.cname ORDER BY num DESC ) AS row_number排名,
rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS rank排名,
dense_rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS dense_rank排名,
ntile( 6 ) over ( PARTITION BY c.cname ORDER BY num DESC ) AS ntile排名
FROM
score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid
-- 计算每门课程前三,考虑排名相同的情况
SELECT * FROM (
SELECT
s.sid,
s1.sname,
s1.gender,
c.cname,
s.num,
dense_rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS dense_rank排名
FROM
score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid
) AS a
WHERE
dense_rank排名 <=3
位置移动窗口函数
- lag(col,n):col列名,n行数,用于统计窗口内往上第n行值
- lead(col,n):col列名,n行数,用于统计窗口内往下第n行值
这两个函数可以用于同列中相邻行的数据计算
应用场景:
- 计算作弊次数
-- 需求:对于下面的数据,对于同一用户(uid)如果在2分钟之内重新登陆,则判断为作弊,统计哪些用户有作弊行为,并计算作弊次数
-- 数据代码
CREATE TABLE lead_table (
id INT PRIMARY KEY,
uid INT NOT NULL,
login_time datetime NOT NULL );
INSERT INTO lead_table
VALUES
( 1, 1, "2020-8-26 12:59:00" ),
( 2, 1, "2020-8-26 13:02:23" ),
( 3, 1, "2020-8-26 13:03:34" ),
( 4, 1, "2020-8-26 13:09:00" ),
( 5, 2, "2020-8-26 13:57:00" ),
( 6, 2, "2020-8-26 13:59:00" ),
( 7, 2, "2020-8-26 13:59:45" );
思路:根据题目要求,如果能把相邻两列的下面一列与上面那一列变成同一行,不久能实现相减了么,因此我们可以多生成一列,例如:把uid都为1的第二行记录生成到第一行,以此类推,这就可以用到lead往下移动的操作了
-- 第一步
SELECT id,uid,login_time,
LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time
FROM lead_table;
--第二步
SELECT id,uid,login_time,
LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time,
TIMESTAMPDIFF(
SECOND,login_time,
LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time )) AS 相差秒数
FROM lead_table;
-- 最终代码
SELECT uid,COUNT( 1 ) AS 作弊次数
FROM
(
SELECT id,uid,login_time,
LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time,
TIMESTAMPDIFF(
SECOND,login_time,
lead( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time )) AS 相差秒数
FROM lead_table ) AS e
WHERE 相差秒数 <= 120
GROUP BY uid;
- 计算次日留存率
其他窗口函数
- first_value(column):取分组排序后第一个值
SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
FIRST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC) AS first_value用法
FROM score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid
- last_value(column):取分组排序后最后一个值
SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
LAST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC) AS last_value用法
FROM score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid
为什么和想要的结果不一样呢?
实际上,窗口函数默认统计范围是rows between unbounded preceding and current row,也就是取当前行数据与当前行之前的数据的比较。
修改SQL,在order by条件的后面加上语句:rows between unbounded preceding and unbounded following,可以理解为:当前分组数据中的所有数据进行比较,取最后一条记录。
SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
LAST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC rows between unbounded preceding and unbounded following) AS last_value用法
FROM score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid
详细介绍:
- rows between XXX and XXX
- unbounded 无限制的
- preceding 分区的当前记录的向前偏移量
- current 当前
- following 分区的当前记录的向后偏移量
示例:累计计算每个月的销售额
-- 示例数据
CREATE TABLE sale (
id INT PRIMARY KEY auto_increment,
年份 INT,
月份 INT,
money FLOAT ( 10, 2 ));
INSERT INTO sale ( 年份, 月份, money )
VALUES
( 2020, 1, 5840 ),
( 2020, 2, 5780 ),
( 2020, 3, 4300 ),
( 2020, 4, 4760 ),
( 2020, 5, 3630 ),
( 2020, 6, 4130 ),
( 2020, 7, 4350 );
-- 语句
SELECT *,
sum( money ) over ( ORDER BY 月份
rows between unbounded preceding and current row) AS 累计销售额
FROM sale;
本章示例数据
CREATE DATABASE school;
USE school;
CREATE TABLE class (
cid INT ( 11 ) NOT NULL auto_increment,
caption VARCHAR ( 32 ) NOT NULL,
PRIMARY KEY ( cid )
) ENGINE = INNODB charset = utf8;
INSERT INTO class
VALUES
( 1, '三年二班' ),
( 2, '三年三班' ),
( 3, '二年二班' ),
( 4, '一年二班' ),
( 5, '二年五班' );
CREATE TABLE teacher (
tid INT ( 11 ) NOT NULL auto_increment,
tname VARCHAR ( 32 ) NOT NULL,
PRIMARY KEY ( tid )
) ENGINE = INNODB DEFAULT charset = utf8;
INSERT INTO teacher
VALUES
( 1, '张磊老师' ),
( 2, '李平老师' ),
( 3, '刘兰老师' ),
( 4, '朱朱老师' ),
( 5, '李杰老师' );
CREATE TABLE course (
cid INT ( 11 ) NOT NULL auto_increment,
cname VARCHAR ( 32 ) NOT NULL,
teacher_id INT ( 11 ) NOT NULL,
PRIMARY KEY ( cid ),
KEY fk_couurse_teacher ( teacher_id ),
CONSTRAINT fk_course_teacher FOREIGN KEY ( teacher_id ) REFERENCES teacher ( tid )
) ENGINE = INNODB DEFAULT charset = utf8;
INSERT INTO course
VALUES
( 1, '生物', 1 ),
( 2, '物理', 2 ),
( 3, '体育', 3 ),
( 4, '美术', 2 );
CREATE TABLE student (
sid INT ( 11 ) NOT NULL auto_increment,
gender CHAR ( 1 ) NOT NULL,
class_id INT ( 11 ) NOT NULL,
sname VARCHAR ( 32 ) NOT NULL,
PRIMARY KEY ( sid ),
KEY fk_class ( class_id ),
CONSTRAINT fk_class FOREIGN KEY ( class_id ) REFERENCES class ( cid )
) ENGINE = INNODB DEFAULT charset = utf8;
INSERT INTO student
VALUES
( 1, '男', 1, '理解' ),
( 2, '女', 1, '钢蛋' ),
( 3, '男', 1, '张三' ),
( 4, '男', 1, '张思' ),
( 5, '女', 1, '网易' ),
( 6, '男', 1, '王二' ),
( 7, '男', 2, '铁道' ),
( 8, '男', 2, '李武' ),
( 9, '男', 2, '刘三' ),
( 10, '女', 2, '刘一' ),
( 11, '男', 2, '刘思' ),
( 12, '男', 3, '王三' ),
( 13, '男', 3, '小五' ),
( 14, '男', 3, '小七' ),
( 15, '女', 3, '如花' ),
( 16, '男', 3, '张四' );
CREATE TABLE score (
sid INT ( 11 ) NOT NULL auto_increment,
student_id INT ( 11 ) NOT NULL,
course_id INT ( 11 ) NOT NULL,
num INT ( 11 ) NOT NULL,
PRIMARY KEY ( sid ),
KEY fk_score_student ( student_id ),
KEY fk_score_course ( course_id ),
CONSTRAINT fk_score_course FOREIGN KEY ( course_id ) REFERENCES course ( cid ),
CONSTRAINT fk_score_student FOREIGN KEY ( student_id ) REFERENCES student ( sid )
) ENGINE = INNODB DEFAULT charset = utf8;
INSERT INTO score
VALUES
( 1, 1, 1, 10 ),
( 2, 1, 2, 9 ),
( 5, 1, 4, 66 ),
( 6, 2, 1, 8 ),
( 8, 2, 3, 68 ),
( 9, 2, 4, 99 ),
( 10, 3, 1, 77 ),
( 11, 3, 2, 66 ),
( 12, 3, 3, 87 ),
( 13, 3, 4, 99 ),
( 14, 4, 1, 79 ),
( 15, 4, 2, 11 ),
( 16, 4, 3, 67 ),
( 17, 4, 4, 100 ),
( 18, 5, 1, 79 ),
( 19, 5, 2, 11 ),
( 20, 5, 3, 67 ),
( 21, 5, 4, 100 );
来自: 学习MySQL(五):窗口函数