学习MySQL(五):窗口函数

​​​​​​​窗口函数介绍

窗口函数的引入是为了解决想要既显示聚集前的数据,又要显示聚集后的数据;窗口数对一组值进行操作,不需要使用GROUP BY子句对数据进行分组,能够在同一行中同时返回基础行的列和聚合列。

强调:使用MySQL 8.0版本方可实现

基本语法

函数名(列) over(选项) 选项为partition by 列 order by 列

解释:

  • over(partition by xxx) 按xxx分组的所有行进行分组
  • over(partition by xxx order by aaa) 按列xxx分组,按列aaa排序
  • over(order by aaa) 按aaa列排序
  • over括号中的partition by和order by的使用根据具体情况选择
-- 需求:计算每个学生的及格科目数
-- 使用聚合函数,类似数据透视表,原有表结构已发生变化
SELECT student_id,count( sid ) FROM score WHERE    num >= 60 GROUP BY student_id;


-- 使用窗口函数,不会更改原表结构
SELECT student_id,count( sid ) over ( PARTITION BY student_id ORDER BY student_id ) AS 及格数
FROM score WHERE num >= 60;

聚合窗口函数

语法:聚合函数(列) over(partition by 列 order by 列)

常见的聚合函数:sum() count() avg() max() min()

排序窗口函数

  • row_number():仅仅根据行号进行排序,相同结果则排序按照顺序依次排
  • rank():排名,与row_number函数不同的是,rank函数考虑到了over子句中排序字段值相同的情况,over子句中排序字段值相同的排序结果是一样的,后面字段值不相同的序号将跳过相同的排名号排下一个。如:11335
  • dense_rank():密集排序,用法跟rank类似,唯一不同是当排序结果相同时,排序不跳跃,而是紧跟排下一个。如:11223
  • ntile():桶排序,首先,ntile会先根据你的分组依据,然后把每个组的总记录数进行按照你给的ntile(n)里的数字n进行均分,这个数字就是桶数,例如一个组内总共12条记录,若n=6,则等划分成6桶,然后按照num的排序等级划分,12/6=2则每个桶两条记录,也就是112233445566的排序结果,常用于提取前百分之多少的应用场景。

都是排名函数,不同之处在对于名次相同的数据处理方式

-- 对每门课程进行排序
SELECT
    s.sid,
    s1.sname,
    s1.gender,
    c.cname,
    s.num,
    row_number() over ( PARTITION BY c.cname ORDER BY num DESC ) AS row_number排名,
    rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS rank排名,
    dense_rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS dense_rank排名,
    ntile( 6 ) over ( PARTITION BY c.cname ORDER BY num DESC ) AS ntile排名 
FROM
    score AS s
    JOIN student AS s1 ON s.student_id = s1.sid
    LEFT JOIN course AS c ON s.course_id = c.cid


-- 计算每门课程前三,考虑排名相同的情况
SELECT * FROM (
    SELECT 
        s.sid,
        s1.sname,
        s1.gender,
        c.cname,
        s.num,
        dense_rank() over ( PARTITION BY c.cname ORDER BY num DESC ) AS dense_rank排名 
    FROM
        score AS s
        JOIN student AS s1 ON s.student_id = s1.sid
        LEFT JOIN course AS c ON s.course_id = c.cid 
    ) AS a 
WHERE
    dense_rank排名 <=3

位置移动窗口函数

  • lag(col,n):col列名,n行数,用于统计窗口内往上第n行值
  • lead(col,n):col列名,n行数,用于统计窗口内往下第n行值

这两个函数可以用于同列中相邻行的数据计算

应用场景:

  • 计算作弊次数
-- 需求:对于下面的数据,对于同一用户(uid)如果在2分钟之内重新登陆,则判断为作弊,统计哪些用户有作弊行为,并计算作弊次数
-- 数据代码
CREATE TABLE lead_table (
    id INT PRIMARY KEY,
    uid INT NOT NULL,
    login_time datetime NOT NULL );
INSERT INTO lead_table
VALUES
    ( 1, 1, "2020-8-26 12:59:00" ),
    ( 2, 1, "2020-8-26 13:02:23" ),
    ( 3, 1, "2020-8-26 13:03:34" ),
    ( 4, 1, "2020-8-26 13:09:00" ),
    ( 5, 2, "2020-8-26 13:57:00" ),
    ( 6, 2, "2020-8-26 13:59:00" ),
    ( 7, 2, "2020-8-26 13:59:45" );

思路:根据题目要求,如果能把相邻两列的下面一列与上面那一列变成同一行,不久能实现相减了么,因此我们可以多生成一列,例如:把uid都为1的第二行记录生成到第一行,以此类推,这就可以用到lead往下移动的操作了

-- 第一步
SELECT id,uid,login_time,
LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time
FROM lead_table;


--第二步
SELECT id,uid,login_time,
LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time,
TIMESTAMPDIFF(
    SECOND,login_time,
    LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time )) AS 相差秒数 
FROM lead_table;
    
-- 最终代码
SELECT uid,COUNT( 1 ) AS 作弊次数
FROM
    (
    SELECT id,uid,login_time,
 LEAD( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time ) lead_time,
 TIMESTAMPDIFF(
     SECOND,login_time,
     lead( login_time, 1 ) OVER ( PARTITION BY uid ORDER BY login_time )) AS 相差秒数 
FROM lead_table ) AS e 
WHERE 相差秒数 <= 120 
GROUP BY uid;
  • 计算次日留存率

其他窗口函数

  • first_value(column):取分组排序后第一个值
SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
FIRST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC) AS first_value用法
FROM score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid
  • last_value(column):取分组排序后最后一个值
SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
LAST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC) AS last_value用法
FROM score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid

为什么和想要的结果不一样呢?

实际上,窗口函数默认统计范围是rows between unbounded preceding and current row,也就是取当前行数据与当前行之前的数据的比较。

修改SQL,在order by条件的后面加上语句:rows between unbounded preceding and unbounded following,可以理解为:当前分组数据中的所有数据进行比较,取最后一条记录。

SELECT s.sid,s1.sname,s1.gender,c.cname,s.num,
LAST_VALUE(num) OVER(PARTITION by c.cname ORDER BY num DESC rows between unbounded preceding and unbounded following) AS last_value用法
FROM score AS s
JOIN student AS s1 ON s.student_id = s1.sid
LEFT JOIN course AS c ON s.course_id = c.cid

详细介绍:

  • rows between XXX and XXX
  • unbounded 无限制的
  • preceding 分区的当前记录的向前偏移量
  • current 当前
  • following 分区的当前记录的向后偏移量

示例:累计计算每个月的销售额

-- 示例数据
CREATE TABLE sale (
    id INT PRIMARY KEY auto_increment,
    年份 INT,
    月份 INT,
money FLOAT ( 10, 2 ));
INSERT INTO sale ( 年份, 月份, money )
VALUES
    ( 2020, 1, 5840 ),
    ( 2020, 2, 5780 ),
    ( 2020, 3, 4300 ),
    ( 2020, 4, 4760 ),
    ( 2020, 5, 3630 ),
    ( 2020, 6, 4130 ),
    ( 2020, 7, 4350 );
 
-- 语句
SELECT *,
    sum( money ) over ( ORDER BY 月份
    rows between unbounded preceding and current row) AS 累计销售额 
FROM sale;

本章示例数据

CREATE DATABASE school;
USE school;
CREATE TABLE class (
    cid INT ( 11 ) NOT NULL auto_increment,
    caption VARCHAR ( 32 ) NOT NULL,
    PRIMARY KEY ( cid ) 
) ENGINE = INNODB charset = utf8;


INSERT INTO class
VALUES
    ( 1, '三年二班' ),
    ( 2, '三年三班' ),
    ( 3, '二年二班' ),
    ( 4, '一年二班' ),
    ( 5, '二年五班' );
 
CREATE TABLE teacher (
    tid INT ( 11 ) NOT NULL auto_increment,
    tname VARCHAR ( 32 ) NOT NULL,
    PRIMARY KEY ( tid )
) ENGINE = INNODB DEFAULT charset = utf8;


INSERT INTO teacher
VALUES
    ( 1, '张磊老师' ),
    ( 2, '李平老师' ),
    ( 3, '刘兰老师' ),
    ( 4, '朱朱老师' ),
    ( 5, '李杰老师' );
 
CREATE TABLE course (
    cid INT ( 11 ) NOT NULL auto_increment,
    cname VARCHAR ( 32 ) NOT NULL,
    teacher_id INT ( 11 ) NOT NULL,
    PRIMARY KEY ( cid ),
    KEY fk_couurse_teacher ( teacher_id ),
    CONSTRAINT fk_course_teacher FOREIGN KEY ( teacher_id ) REFERENCES teacher ( tid ) 
) ENGINE = INNODB DEFAULT charset = utf8;
INSERT INTO course
VALUES
    ( 1, '生物', 1 ),
    ( 2, '物理', 2 ),
    ( 3, '体育', 3 ),
    ( 4, '美术', 2 );


CREATE TABLE student (
    sid INT ( 11 ) NOT NULL auto_increment,
    gender CHAR ( 1 ) NOT NULL,
    class_id INT ( 11 ) NOT NULL,
    sname VARCHAR ( 32 ) NOT NULL,
    PRIMARY KEY ( sid ),
    KEY fk_class ( class_id ),
    CONSTRAINT fk_class FOREIGN KEY ( class_id ) REFERENCES class ( cid ) 
) ENGINE = INNODB DEFAULT charset = utf8;


INSERT INTO student
VALUES
    ( 1, '男', 1, '理解' ),
    ( 2, '女', 1, '钢蛋' ),
    ( 3, '男', 1, '张三' ),
    ( 4, '男', 1, '张思' ),
    ( 5, '女', 1, '网易' ),
    ( 6, '男', 1, '王二' ),
    ( 7, '男', 2, '铁道' ),
    ( 8, '男', 2, '李武' ),
    ( 9, '男', 2, '刘三' ),
    ( 10, '女', 2, '刘一' ),
    ( 11, '男', 2, '刘思' ),
    ( 12, '男', 3, '王三' ),
    ( 13, '男', 3, '小五' ),
    ( 14, '男', 3, '小七' ),
    ( 15, '女', 3, '如花' ),
    ( 16, '男', 3, '张四' );
 
CREATE TABLE score (
    sid INT ( 11 ) NOT NULL auto_increment,
    student_id INT ( 11 ) NOT NULL,
    course_id INT ( 11 ) NOT NULL,
    num INT ( 11 ) NOT NULL,
    PRIMARY KEY ( sid ),
    KEY fk_score_student ( student_id ),
    KEY fk_score_course ( course_id ),
    CONSTRAINT fk_score_course FOREIGN KEY ( course_id ) REFERENCES course ( cid ),
    CONSTRAINT fk_score_student FOREIGN KEY ( student_id ) REFERENCES student ( sid ) 
) ENGINE = INNODB DEFAULT charset = utf8;


INSERT INTO score
VALUES
    ( 1, 1, 1, 10 ),
    ( 2, 1, 2, 9 ),
    ( 5, 1, 4, 66 ),
    ( 6, 2, 1, 8 ),
    ( 8, 2, 3, 68 ),
    ( 9, 2, 4, 99 ),
    ( 10, 3, 1, 77 ),
    ( 11, 3, 2, 66 ),
    ( 12, 3, 3, 87 ),
    ( 13, 3, 4, 99 ),
    ( 14, 4, 1, 79 ),
    ( 15, 4, 2, 11 ),
    ( 16, 4, 3, 67 ),
    ( 17, 4, 4, 100 ),
    ( 18, 5, 1, 79 ),
    ( 19, 5, 2, 11 ),
    ( 20, 5, 3, 67 ),
    ( 21, 5, 4, 100 );

来自: 学习MySQL(五):窗口函数

  • 7
    点赞
  • 7
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

Linux技术宅

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值