SQL(进阶实战03)

1. 第二快/慢用时之差试卷时长一半的试卷

现有试卷信息表examination_info(exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间):

idexam_idtagdifficultydurationrelease_time
19001SQLhard602021-09-01 06:00:00
29002C++hard602021-09-01 06:00:00
39003算法medium802021-09-01 10:00:00

试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):

iduidexam_idstart_timesubmit_timescore
1100190012021-09-01 09:01:012021-09-01 09:51:0178
2100190022021-09-01 09:01:012021-09-01 09:31:0081
3100290022021-09-01 12:01:012021-09-01 12:31:0181
4100390012021-09-01 19:01:012021-09-01 19:59:0186
5100390022021-09-01 12:01:012021-09-01 12:31:5189
6100490022021-09-01 19:01:012021-09-01 19:30:0185
7100590012021-09-01 12:01:012021-09-01 12:31:0285
8100690012021-09-07 10:01:012021-09-07 10:21:0184
9100390012021-09-08 12:01:012021-09-08 12:11:0140
10100390022021-09-01 14:01:01(NULL)(NULL)
11100590012021-09-01 14:01:01(NULL)(NULL)
12100390032021-09-08 15:01:01(NULL)(NULL)

找到第二快和第二慢用时之差大于试卷时长的一半的试卷信息,按试卷ID降序排序。由示例数据结果输出如下:

exam_iddurationrelease_time
9001602021-09-01 06:00:00

解释:试卷9001被作答用时有50分钟、50分钟、30分1秒、11分钟、10分钟,第二快和第二慢用时之差为50分钟-11分钟=39分钟,试卷时长为60分钟,因此满足大于试卷时长一半的条件,输出试卷ID、时长、发布时间。

示例1
drop table if exists examination_info,exam_record;
CREATE TABLE examination_info (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
    tag varchar(32) COMMENT '类别标签',
    difficulty varchar(8) COMMENT '难度',
    duration int NOT NULL COMMENT '时长',
    release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

CREATE TABLE exam_record (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid int NOT NULL COMMENT '用户ID',
    exam_id int NOT NULL COMMENT '试卷ID',
    start_time datetime NOT NULL COMMENT '开始时间',
    submit_time datetime COMMENT '提交时间',
    score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
  (9001, 'SQL', 'hard', 60, '2021-09-01 06:00:00'),
  (9002, 'C++', 'hard', 60, '2021-09-01 06:00:00'),
  (9003, '算法', 'medium', 80, '2021-09-01 10:00:00');

INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2021-09-01 09:01:01', '2021-09-01 09:51:01', 78),
(1001, 9002, '2021-09-01 09:01:01', '2021-09-01 09:31:00', 81),
(1002, 9002, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 81),
(1003, 9001, '2021-09-01 19:01:01', '2021-09-01 19:59:01', 86),
(1003, 9002, '2021-09-01 12:01:01', '2021-09-01 12:31:51', 89),
(1004, 9002, '2021-09-01 19:01:01', '2021-09-01 19:30:01', 85),
(1005, 9001, '2021-09-01 12:01:01', '2021-09-01 12:31:02', 85),
(1006, 9001, '2021-09-07 10:01:01', '2021-09-07 10:12:01', 84),
(1003, 9001, '2021-09-08 12:01:01', '2021-09-08 12:11:01', 40),
(1003, 9002, '2021-09-01 14:01:01', null, null),
(1005, 9001, '2021-09-01 14:01:01', null, null),
(1003, 9003, '2021-09-08 15:01:01', null, null);
输出

9001|60|2021-09-01 06:00:00

思路

第一步:
1.找到每份试卷的ID、限制时间、发布时间以及每份试卷被完成的耗时、耗时降序排名和增序排名。这里的不用去重,也不用分组,因为每一份都要计算耗时:

2.试卷信息与做题信息分布在两个表中,因此要将两个表以exam_id连接。知识点:join...on...

3.只查询有提交时间的时间,筛掉没做完的试卷。知识点:where

4.试卷ID、限制时间、发布时间直接获取,
完成的耗时使用timestampdiff函数根据开始时间和提交时间计算分钟数差值。
知识点:timestampdiff()

5.利用分组聚合排名对每一种试卷的完成耗时分别进行增序排名和降序排名。
知识点:row_number() over partition by

第二步:
1.根据上面筛选出来的信息查询每份试卷的限制时间、发布时间及第二快与第二慢的差值:
2.每张不同的试卷都会有一个数据,因此以试卷ID作为分组。知识点:group by
3.每组试卷的ID、限制时间、发布时间都可以由table1直接查询到。
4.将每组试卷的完成时间累加,只有当最慢排名为2时才加正值,最快排名为2时加负值,其余情况加0.这样刚好是用时第二多减去用时第二少。 
5.sum(case when rank1=2 then time
        when rank2=2 then -time
        else 0 
        end
       ) as sub
知识点:sum()case when...then...when...then...else...end

题解
select 
 exam_id,duration,release_time
from (
	select exam_id,duration,release_time,
	sum(case when rank1=2 then time
        when rank2=2 then -time
        else 0 
        end
       ) as sub
	from 
	(
    select 
	info.exam_id,info.duration,info.release_time,
	row_number()over(partition by record.exam_id order by timestampdiff(minute,start_time,submit_time)desc) as rank1,      
	row_number()over(partition by record.exam_id order by timestampdiff(minute,start_time,submit_time)) as rank2,
timestampdiff(minute,record.start_time,record.submit_time) as time
	from examination_info info
	inner join exam_record record
	on info.exam_id=record.exam_id
	where record.submit_time is not null 
	)t1
	group by exam_id
)t2
where sub>=duration/2
order by exam_id desc

2.连续两次作答试卷的最大时间窗

现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):

iduidexam_idstart_timesubmit_timescore
1100690032021-09-07 10:01:012021-09-07 10:21:0284
2100690012021-09-01 12:11:012021-09-01 12:31:0189
3100690022021-09-06 10:01:012021-09-06 10:21:0181
4100590022021-09-05 10:01:012021-09-05 10:21:0181
5100590012021-09-05 10:31:012021-09-05 10:51:0181

请计算在2021年至少有两天作答过试卷的人中,计算该年连续两次作答试卷的最大时间窗days_window,那么根据该年的历史规律他在days_window天里平均会做多少套试卷,按最大时间窗和平均做答试卷套数倒序排序。由示例数据结果输出如下:

uiddays_windowavg_exam_cnt
100662.57

解释:用户1006分别在20210901、20210906、20210907作答过3次试卷,连续两次作答最大时间窗为6天(1号到6号),他1号到7号这7天里共做了3张试卷,平均每天3/7=0.428571张,那么6天里平均会做0.428571*6=2.57张试卷(保留两位小数);

用户1005在20210905做了两张试卷,但是只有一天的作答记录,过滤掉。

示例1
drop table if exists exam_record;
CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1006, 9003, '2021-09-07 10:01:01', '2021-09-07 10:21:02', 84),
(1006, 9001, '2021-09-01 12:11:01', '2021-09-01 12:31:01', 89),
(1006, 9002, '2021-09-06 10:01:01', '2021-09-06 10:21:01', 81),
(1005, 9002, '2021-09-05 10:01:01', '2021-09-05 10:21:01', 81),
(1005, 9001, '2021-09-05 10:31:01', '2021-09-05 10:51:01', 81),
(1001, 9003, '2021-09-01 09:01:01', '2021-09-01 09:51:11', 78),
(1001, 9002, '2021-09-01 09:01:01', '2021-09-01 09:31:00', 81),
(1002, 9002, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 81),
(1003, 9003, '2021-09-01 19:01:01', '2021-09-01 19:59:01', 86),
(1004, 9002, '2021-09-01 19:01:01', '2021-09-01 19:10:01', 85),
(1005, 9003, '2021-09-01 12:01:01', '2021-09-01 12:31:02', 85),
(1003, 9003, '2021-09-07 12:01:01', '2021-09-07 12:11:01', 40),
(1003, 9003, '2021-09-08 15:01:01', null, null),
(1003, 9002, '2020-09-01 13:01:01', '2020-09-01 13:58:01', 81);
输出
100372.63
100662.57
100553.00
思路
1.统计2021年每人总作答数、最早最晚相隔天数、最大连续作答间隔,生成子表 
2.生成2021年每次作答试卷的下次作答时间,
筛选2021年的作答记录:WHERE YEAR(start_time)=2021
生成下次作答时间,按用户分区按作答时间升序:
lead(start_time) over(PARTITION BY uid ORDER BY start_time) as next_start_time
按用户分组:GROUP BY uid
统计此人作答的总试卷数:count(start_time) as exam_cnt
统计最早一次作答和最晚一次作答的相差天数:DATEDIFF(max(start_time), min(start_time))+1 as diff_days
统计两次作答的最大时间窗:max(DATEDIFF(next_start_time, start_time))+1 as days_window
筛选最早最晚相差天数大于1,即至少活跃两天的记录:WHERE diff_days > 1
计算平均能做多少套试卷:ROUND(days_window * exam_cnt / diff_days, 2) as avg_exam_cnt


了解lead()函数,以及实战(计算用户的平均次日留存率,拓展)
https://blog.csdn.net/weixin_44464850/article/details/124334316?spm=1001.2014.3001.5502

题解

select uid,days_window,round((days_count*days_window/avg_exam_cnt),2) avg_exam_cnt
from(
	select 
	uid,
	max(datediff(days2,days1)+1) as days_window	,
	count(days1)as days_count,
	datediff(max(days1),min(days1))+1 as avg_exam_cnt
	from 
	(
	select 
	uid,
	exam_id,
	date_format(start_time,"%Y-%m-%d") as days1,
	lead(start_time,1)over (partition by uid order by start_time) as days2
    from exam_record
	where year(start_time)='2021'
	)t
	group by uid
)t1
ORDER BY days_window DESC, avg_exam_cnt DESC;

3.近三个月未完成试卷数为0的用户完成情况

现有试卷作答记录表exam_record(uid:用户ID, exam_id:试卷ID, start_time:开始作答时间, submit_time:交卷时间,为空的话则代表未完成, score:得分):

iduidexam_idstart_timesubmit_timescore
1100690032021-09-06 10:01:012021-09-06 10:21:0284
2100690012021-08-02 12:11:012021-08-02 12:31:0189
3100690022021-06-06 10:01:012021-06-06 10:21:0181
4100690022021-05-06 10:01:012021-05-06 10:21:0181
5100690012021-05-01 12:01:01(NULL)(NULL)
6100190012021-09-05 10:31:012021-09-05 10:51:0181
7100190032021-08-01 09:01:012021-08-01 09:51:1178
8100190022021-07-01 09:01:012021-07-01 09:31:0081
9100190022021-07-01 12:01:012021-07-01 12:31:0181
10100190022021-07-01 12:01:01(NULL)(NULL)

找到每个人近三个有试卷作答记录的月份中没有试卷是未完成状态的用户的试卷作答完成数,按试卷完成数和用户ID降序排名。由示例数据结果输出如下:

uidexam_complete_cnt
10063

解释:用户1006近三个有作答试卷的月份为202109、202108、202106,作答试卷数为3,全部完成;用户1001近三个有作答试卷的月份为202109、202108、202107,作答试卷数为5,完成试卷数为4,因为有未完成试卷,故过滤掉。

示例1
drop table if exists exam_record;
CREATE TABLE exam_record (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid int NOT NULL COMMENT '用户ID',
    exam_id int NOT NULL COMMENT '试卷ID',
    start_time datetime NOT NULL COMMENT '开始时间',
    submit_time datetime COMMENT '提交时间',
    score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1006, 9003, '2021-09-06 10:01:01', '2021-09-06 10:21:02', 84),
(1006, 9001, '2021-08-02 12:11:01', '2021-08-02 12:31:01', 89),
(1006, 9002, '2021-06-06 10:01:01', '2021-06-06 10:21:01', 81),
(1006, 9002, '2021-05-06 10:01:01', '2021-05-06 10:21:01', 81),
(1006, 9001, '2021-05-01 12:01:01', null, null),
(1001, 9001, '2021-09-05 10:31:01', '2021-09-05 10:51:01', 81),
(1001, 9003, '2021-08-01 09:01:01', '2021-08-01 09:51:11', 78),
(1001, 9002, '2021-07-01 09:01:01', '2021-07-01 09:31:00', 81),
(1001, 9002, '2021-07-01 12:01:01', '2021-07-01 12:31:01', 81),
(1001, 9002, '2021-07-01 12:01:01', null, null);
输出

1006|3

备注

按试卷完成数和用户ID降序排名

思路
1.统计每人每次试卷作答记录的月份排名
2.生成当次作答的月份排名,按用户ID分区按年月排序进行窗口连续排名:
DENSE_RANK() OVER
(PARTITION BY uid ORDER BY DATE_FORMAT(start_time, "%Y%m") DESC) as month_rank
3.筛选月份排名小于等于3的记录(每个人近三个月的):WHERE start_month_rank <= 3
4.按用户ID分组:GROUP BY uid
5.筛选都完成了的分组(无未完成试卷):(count(1)不会除去为空的成绩,而是统计共有多少数据)
HAVING COUNT(1) = COUNT(score)
题解
select uid,count(score)as exam_complete_cnt
from 
(
	select 
	uid,
	score,
	start_time,
	submit_time,
	dense_rank()
	over (partition by uid 
			order by date_format(start_time,"%Y%m")desc
		 )recent_month
	from exam_record
)t1
	where recent_month<=3
	group by uid
	having count(1)=count(score)
	order by exam_complete_cnt desc,uid desc

4. 未完成率较高的50%用户近三个月答卷情况 (重点)

现有用户信息表user_info(uid用户ID,nick_name昵称, achievement成就值, level等级, job职业方向, register_time注册时间):

iduidnick_nameachievementleveljobregister_time
11001牛客1号32007算法2020-01-01 10:00:00
21002牛客2号25006算法2020-01-01 10:00:00
31003牛客3号♂22005算法2020-01-01 10:00:00

试卷信息表:

idexam_idtagdifficultydurationrelease_time
19001SQLhard602020-01-01 10:00:00
29002SQLhard802020-01-01 10:00:00
39003算法hard802020-01-01 10:00:00
49004PYTHONmedium702020-01-01 10:00:00

试卷作答记录表:

iduidexam_idstart_timesubmit_timescore
1100190012020-01-01 09:01:012020-01-01 09:21:5990
15100290012020-01-01 18:01:012020-01-01 18:59:0290
13100190012020-01-02 10:01:012020-01-02 10:31:0189
2100290012020-01-20 10:01:01
3100290012020-02-01 12:11:01
5100190012020-03-01 12:01:01
6100290012020-03-01 12:01:012020-03-01 12:41:0190
4100390012020-03-01 19:01:01
7100290012020-05-02 19:01:012020-05-02 19:32:0090
14100190022020-01-01 12:11:01
8100190022020-01-02 19:01:012020-01-02 19:59:0169
9100190022020-02-02 12:01:012020-02-02 12:20:0199
10100290022020-02-02 12:01:01
11100290022020-02-02 12:01:012020-02-02 12:43:0181
12100290022020-03-02 12:11:01
17100190022020-05-05 18:01:01
16100290032020-05-06 12:01:01

请统计SQL试卷上未完成率较高的50%用户中,6级和7级用户在有试卷作答记录的近三个月中,

由示例数据结果输出如下:

uidstart_monthtotal_cntcomplete_cnt
100220200231
100220200321
100220200521

解释:各个用户对SQL试卷的未完成数、作答总数、未完成率如下:

uidincomplete_cnttotal_cntincomplete_rate
1001370.4286
1002480.5000
1003111.0000

1001、1002、1003分别排在1.0、0.5、0.0的位置,因此较高的50%用户(排位<=0.5)为1002、1003;

1003不是6级或7级;

有试卷作答记录的近三个月为202005、202003、202002;

这三个月里1002的作答题数分别为3、2、2,完成数目分别为1、1、1。

示例1
drop table if exists examination_info,user_info,exam_record;
CREATE TABLE examination_info (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
    tag varchar(32) COMMENT '类别标签',
    difficulty varchar(8) COMMENT '难度',
    duration int NOT NULL COMMENT '时长',
    release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

CREATE TABLE user_info (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid int UNIQUE NOT NULL COMMENT '用户ID',
    `nick_name` varchar(64) COMMENT '昵称',
    achievement int COMMENT '成就值',
    level int COMMENT '用户等级',
    job varchar(32) COMMENT '职业方向',
    register_time datetime COMMENT '注册时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

CREATE TABLE exam_record (
    id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid int NOT NULL COMMENT '用户ID',
    exam_id int NOT NULL COMMENT '试卷ID',
    start_time datetime NOT NULL COMMENT '开始时间',
    submit_time datetime COMMENT '提交时间',
    score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO user_info(uid,`nick_name`,achievement,`level`,job,register_time) VALUES
  (1001, '牛客1', 3200, 7, '算法', '2020-01-01 10:00:00'),
  (1002, '牛客2号', 2500, 6, '算法', '2020-01-01 10:00:00'),
  (1003, '牛客3号♂', 2200, 5, '算法', '2020-01-01 10:00:00');

INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
  (9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),
  (9002, 'SQL', 'hard', 80, '2020-01-01 10:00:00'),
  (9003, '算法', 'hard', 80, '2020-01-01 10:00:00'),
  (9004, 'PYTHON', 'medium', 70, '2020-01-01 10:00:00');
	
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-01-01 09:01:01', '2020-01-01 09:21:59', 90),
(1002, 9001, '2020-01-20 10:01:01', null, null),
(1002, 9001, '2020-02-01 12:11:01', null, null),
(1003, 9001, '2020-03-01 19:01:01', null, null),
(1001, 9001, '2020-03-01 12:01:01', null, null),
(1002, 9001, '2020-03-01 12:01:01', '2020-03-01 12:41:01', 90),
(1002, 9001, '2020-05-02 19:01:01', '2020-05-02 19:32:00', 90),
(1001, 9002, '2020-01-02 19:01:01', '2020-01-02 19:59:01', 69),
(1001, 9002, '2020-02-02 12:01:01', '2020-02-02 12:20:01', 99),
(1002, 9002, '2020-02-02 12:01:01', null, null),
(1002, 9002, '2020-02-02 12:01:01', '2020-02-02 12:43:01', 81),
(1002, 9002, '2020-03-02 12:11:01', null, null),
(1001, 9001, '2020-01-02 10:01:01', '2020-01-02 10:31:01', 89),
(1001, 9002, '2020-01-01 12:11:01', null, null),
(1002, 9001, '2020-01-01 18:01:01', '2020-01-01 18:59:02', 90),
(1002, 9003, '2020-05-06 12:01:01', null, null),
(1001, 9002, '2020-05-05 18:01:01', null, null);
输出1

1002|202002|3|1
1002|202003|2|1
1002|202005|2|1

示例2
drop table if exists examination_info,user_info,exam_record;
CREATE TABLE examination_info (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
tag varchar(32) COMMENT '类别标签',
difficulty varchar(8) COMMENT '难度',
duration int NOT NULL COMMENT '时长',
release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

CREATE TABLE user_info (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int UNIQUE NOT NULL COMMENT '用户ID',
`nick_name` varchar(64) COMMENT '昵称',
achievement int COMMENT '成就值',
level int COMMENT '用户等级',
job varchar(32) COMMENT '职业方向',
register_time datetime COMMENT '注册时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;

INSERT INTO user_info(uid,`nick_name`,achievement,`level`,job,register_time) VALUES
(1001, '牛客1', 3200, 7, '算法', '2020-01-01 10:00:00'),
(1002, '牛客2号', 2500, 6, '算法', '2020-01-01 10:00:00'),
(1003, '牛客3号', 2500, 6, '算法', '2020-01-01 10:00:00'),
(1004, '牛客4号', 2300, 5, '算法', '2020-01-01 10:00:00');

INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
(9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),
(9002, 'SQL', 'hard', 80, '2020-01-01 10:00:00'),
(9003, '算法', 'hard', 80, '2020-01-01 10:00:00'),
(9004, 'PYTHON', 'medium', 70, '2020-01-01 10:00:00');

INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-01-01 09:01:01', '2020-01-01 09:21:59', 90),
(1002, 9001, '2020-01-20 10:01:01', null, null),
(1002, 9001, '2020-02-01 12:11:01', null, null),
(1003, 9001, '2020-03-01 19:01:01', null, null),
(1004, 9001, '2020-03-01 19:01:01', null, null),
(1001, 9001, '2020-03-01 12:01:01', null, null),
(1002, 9001, '2020-03-01 12:01:01', '2020-03-01 12:41:01', 90),
(1002, 9001, '2020-05-02 19:01:01', '2020-05-02 19:32:00', 90),
(1001, 9002, '2020-01-02 19:01:01', '2020-01-02 19:59:01', 69),
(1001, 9002, '2020-02-02 12:01:01', '2020-02-02 12:20:01', 99),
(1002, 9002, '2020-02-02 12:01:01', null, null),
(1002, 9002, '2020-02-02 12:01:01', '2020-02-02 12:43:01', 81),
(1002, 9002, '2020-03-02 12:11:01', null, null),
(1001, 9001, '2020-01-02 10:01:01', '2020-01-02 10:31:01', 89),
(1001, 9002, '2020-01-01 12:11:01', null, null),
(1002, 9001, '2020-01-01 18:01:01', '2020-01-01 18:59:02', 90),
(1002, 9003, '2020-05-06 12:01:01', null, null),
(1001, 9002, '2020-05-05 18:01:01', null, null);

预期输出

1003| 202003 |1 | 0

思路
1.统计每次试卷作答记录的月份编号,按年月降序进行窗口连续排名:
dense_rank() over ( PARTITION BY uid ORDER BY DATE_FORMAT( start_time, '%Y%m' ) DESC ) months_rank
2.筛选月份排名小于等于3的记录(近三个月的):WHERE start_month_rank <= 3
3.筛选SQL试卷上未完成率较高的50%用户:
	(1)计算每个用户在SQL试卷上的未完成率百分比排名:
	(2)计算每个用户在SQL试卷上的未完成率:
	(3)筛选SQL试卷
4.统计百分比排名,按未完成率降序uid升序进行百分比排名:
PERCENT_RANK() over(ORDER BY incomplete_rate DESC, uid) as incomp_rate_rank
5.筛选未完成率较高的50%用户(因为按未完成率倒序了,所以<=):WHERE incomp_rate_rank <= 0.5
6.筛选67级用户
7.按用户ID、月份分组:GROUP BY uid, start_month
解题流程
1.各个用户对SQL试卷的未完成数、作答总数、未完成率如下
	SELECT
    uid,
    sum(IF(score IS NULL,1,0)) AS incomplete_cnt,
    count( uid ) AS total_cnt,
    sum(IF(score IS NULL,1,0))/ count( uid ) AS incomplete_rate 
    FROM examination_info
    INNER JOIN exam_record USING ( exam_id ) 
    WHERE tag = 'SQL' 
    GROUP BY uid 

2.未完成率 和 筛选67级用户
  SELECT
				uid
			FROM
				user_info
			INNER JOIN (
				SELECT
					uid,
					PERCENT_RANK () OVER (

						ORDER BY
							incomplete_rate DESC,
							uid
					) AS rank1
				FROM
					(
						SELECT
							uid,
							sum(IF(score IS NULL, 1, 0)) AS incomplete_cnt,
							count(uid) AS total_cnt,
							sum(IF(score IS NULL, 1, 0)) / count(uid) AS incomplete_rate
						FROM
							examination_info
						INNER JOIN exam_record USING (exam_id)
						WHERE
							tag = 'SQL'
						GROUP BY
							uid
					) t1
			) t2 USING (uid)
			WHERE
				`level` >= 6
			AND rank1 <= 0.5
    
3.试卷作答记录的近三个月
SELECT
  * 
 FROM
  (
  SELECT
   uid,
   DATE_FORMAT( start_time, '%Y%m' ) start_month,
   dense_rank() 
   over(PARTITION BY uid
    ORDER BY DATE_FORMAT(  start_time, '%Y%m') DESC
       )months_rank,
   start_time,
   score 
  FROM
   exam_record 
  ) t_record 
 WHERE
  months_rank <= 3 
题解
方法一:
SELECT
	uid,
	start_month,
	count(start_time) total_cnt,
	count(score) complete_cnt
FROM
	(
		SELECT *
		FROM
			(
			SELECT
			uid,
			DATE_FORMAT(start_time, '%Y%m') start_month,
			dense_rank()over (PARTITION BY uid ORDER BY
			DATE_FORMAT(start_time, '%Y%m') DESC) months_rank,
			start_time,
			score
			FROM
			exam_record
			) t_record
		WHERE
			months_rank <= 3
		AND uid IN (
			SELECT
				uid
			FROM
				user_info
			INNER JOIN (
	SELECT
	uid,
	PERCENT_RANK()OVER(ORDER BY incomplete_rate DESC,uid) AS rank1
	FROM
		(
	SELECT
	uid,
	sum(IF(score IS NULL, 1, 0)) AS incomplete_cnt,
	count(uid) AS total_cnt,
	sum(IF(score IS NULL, 1, 0)) / count(uid) AS incomplete_rate
	FROM
	examination_info
	INNER JOIN exam_record USING (exam_id)
	WHERE tag = 'SQL'
	GROUP BY uid
		) t1
	) t2 USING (uid)
	WHERE`level` >= 6 AND rank1 <= 0.5
		)
	) t3
GROUP BY
	uid,start_month
ORDER BY
	uid,start_month


方式二:
    SELECT 
    uid,
    start_month,
    COUNT(1) as exam_cnt, 
    COUNT(submit_time) as complete_cnt
    FROM 
    (
    SELECT
        uid, 
        submit_time,
        DATE_FORMAT(start_time, "%Y%m") as start_month,
        DENSE_RANK()
         over(ORDER BY DATE_FORMAT(start_time, "%Y%m")DESC) as start_month_rank -- 按作答月份降序编号
    FROM exam_record
	) as t_exam_record_month_rank
	WHERE start_month_rank <= 3
	AND uid IN (
    SELECT uid 
        FROM (
        SELECT uid,
            PERCENT_RANK() over(order BY incomp_rate DESC, uid) as incomp_rate_rank
        FROM 
            (
      SELECT 
      uid,
      1 - COUNT(submit_time)/COUNT(1) as incomp_rate -- 此人未完成率
      FROM exam_record
      WHERE exam_id in 
      (SELECT exam_id FROM examination_info WHERE tag='SQL')
            GROUP BY uid
        	) as t_exam_incom_rate -- 在SQL试卷上的未完成率
   		 )as t_exam_incom_rate_rank -- 在SQL试卷上的未完成率百分比排名
    WHERE incomp_rate_rank <= 0.5
) -- 在SQL试卷上未完成率最高的50%用户
AND uid IN (SELECT uid FROM user_info WHERE `level`>=6)
GROUP BY uid, start_month
ORDER BY uid, start_month;
拓展
PERCENT_RANK()函数计算分区或结果集中行的百分位数排名

这PERCENT_RANK()是一个窗口函数,
用于计算分区或结果集中行的百分位数。
以下显示了PERCENT_RANK()函数的语法:

PERCENT_RANK()
    OVER (
        PARTITION BY expr,...
        ORDER BY expr [ASC|DESC],...
    ) 
    
PERCENT_RANK()函数返回一个从01的数字。
对于指定的行,PERCENT_RANK()计算行的等级减1,
除以评估的分区或查询结果集中的行数减1(rank - 1) / (total_rows - 1) 


在此公式中,rank是指定行的等级,total_rows是要计算的行数。

PERCENT_RANK()对于分区或结果集中的第一行,
函数始终返回零。重复的列值将接收相同的PERCENT_RANK()值。

与其他窗口函数类似,PARTITION BY子句将行分配到分区中,
ORDER BY子句指定每个分区中行的逻辑顺序。
PERCENT_RANK()为每个有序分区独立计算函数。

两个PARTITION BYORDER BY子句都是可选项。
但是,它PERCENT_RANK()是一个顺序敏感函数,
因此,您应始终使用ORDER BY子句。


MySQL PERCENT_RANK() 函数示例

让我们创建一个名为新表productLineSales基础上的ordersorderDetails以及products从表中示例数据库

CREATE TABLE productLineSales
SELECT
    productLine,
    YEAR(orderDate) orderYear,
    quantityOrdered * priceEach orderValue
FROM
    orderDetails
        INNER JOIN
    orders USING (orderNumber)
        INNER JOIN
    products USING (productCode)
GROUP BY
    productLine ,
    YEAR(orderDate); 

productLineSales表存储销售数据的摘要,包括产品系列,订单年份和订单值。

+------------------+-----------+------------+
| productLine      | orderYear | orderValue |
+------------------+-----------+------------+
| Vintage Cars     |      2013 |    4080.00 |
| Classic Cars     |      2013 |    5571.80 |
| Trucks and Buses |      2013 |    3284.28 |
| Trains           |      2013 |    2770.95 |
| Ships            |      2013 |    5072.71 |
| Planes           |      2013 |    4825.44 |
| Motorcycles      |      2013 |    2440.50 |
| Classic Cars     |      2014 |    8124.98 |
| Vintage Cars     |      2014 |    2819.28 |
| Trains           |      2014 |    4646.88 |
| Ships            |      2014 |    4301.15 |
| Planes           |      2014 |    2857.35 |
| Motorcycles      |      2014 |    2598.77 |
| Trucks and Buses |      2014 |    4615.64 |
| Motorcycles      |      2015 |    4004.88 |
| Classic Cars     |      2015 |    5971.35 |
| Vintage Cars     |      2015 |    5346.50 |
| Trucks and Buses |      2015 |    6295.03 |
| Trains           |      2015 |    1603.20 |
| Ships            |      2015 |    3774.00 |
| Planes           |      2015 |    4018.00 |
+------------------+-----------+------------+
21 rows in set (0.02 sec)
PERCENT_RANK()在查询结果集上使用MySQL

以下查询按订单值查找每个产品系列的百分位数排名:

WITH t AS (
    SELECT
        productLine,
        SUM(orderValue) orderValue
    FROM
        productLineSales
    GROUP BY
        productLine
)
SELECT
    productLine,
    orderValue,
    ROUND(
       PERCENT_RANK() OVER (
          ORDER BY orderValue
       )
    ,2) percentile_rank
FROM
    t; 

在这个例子中:

  • 首先,我们使用公用表表达式按产品线汇总订单值。
  • 其次,我们用它PERCENT_RANK()来计算每种产品的订单价值的百分等级。此外,我们使用ROUND()函数将值舍入为2十进制,以获得更好的表示。

这是输出:

+------------------+------------+-----------------+
| productLine      | orderValue | percentile_rank |
+------------------+------------+-----------------+
| Trains           |    9021.03 |            0.00 |
| Motorcycles      |    9044.15 |            0.17 |
| Planes           |   11700.79 |            0.33 |
| Vintage Cars     |   12245.78 |            0.50 |
| Ships            |   13147.86 |            0.67 |
| Trucks and Buses |   14194.95 |            0.83 |
| Classic Cars     |   19668.13 |            1.00 |
+------------------+------------+-----------------+
7 rows in set (0.01 sec)

以下是输出中的一些分析:

  • 订单价值Trains并不比任何其他产品线更好,后者用零表示。
  • Vintage Cars 表现优于50%的其他产品。
  • Classic Cars 表现优于任何其他产品系列,因此其百分比等级为1或100%
percent_rank()在分区上使用MySQL

以下语句按年度中的订单值返回产品系列的百分位数排名:

SELECT
    productLine,
    orderYear,
    orderValue,
    ROUND(
    PERCENT_RANK()
    OVER (
        PARTITION BY orderYear
        ORDER BY orderValue
    ),2) percentile_rank
FROM
    productLineSales; 

这是输出:

+------------------+-----------+------------+-----------------+
| productLine      | orderYear | orderValue | percentile_rank |
+------------------+-----------+------------+-----------------+
| Motorcycles      |      2013 |    2440.50 |            0.00 |
| Trains           |      2013 |    2770.95 |            0.17 |
| Trucks and Buses |      2013 |    3284.28 |            0.33 |
| Vintage Cars     |      2013 |    4080.00 |            0.50 |
| Planes           |      2013 |    4825.44 |            0.67 |
| Ships            |      2013 |    5072.71 |            0.83 |
| Classic Cars     |      2013 |    5571.80 |            1.00 |
| Motorcycles      |      2014 |    2598.77 |            0.00 |
| Vintage Cars     |      2014 |    2819.28 |            0.17 |
| Planes           |      2014 |    2857.35 |            0.33 |
| Ships            |      2014 |    4301.15 |            0.50 |
| Trucks and Buses |      2014 |    4615.64 |            0.67 |
| Trains           |      2014 |    4646.88 |            0.83 |
| Classic Cars     |      2014 |    8124.98 |            1.00 |
| Trains           |      2015 |    1603.20 |            0.00 |
| Ships            |      2015 |    3774.00 |            0.17 |
| Motorcycles      |      2015 |    4004.88 |            0.33 |
| Planes           |      2015 |    4018.00 |            0.50 |
| Vintage Cars     |      2015 |    5346.50 |            0.67 |
| Classic Cars     |      2015 |    5971.35 |            0.83 |
| Trucks and Buses |      2015 |    6295.03 |            1.00 |
+------------------+-----------+------------+-----------------+
21 rows in set (0.01 sec)

在此示例中,我们按订单年划分产品线的订单值。将PERCENT_RANK()然后被施加到每个分区。

例如,2013年Vintage Cars 表现优于其他产品线的50%,而在2014年,船舶表现优于其他产品的50%。

在本教程中,您学习了如何使用MySQL PERCENT_RANK()函数计算分区或结果集中行的百分位数。

  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值