1.试卷完成数同比2020年的增长率及排名变化
现有试卷信息表examination_info(exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间):
id | exam_id | tag | difficulty | duration | release_time |
---|---|---|---|---|---|
1 | 9001 | SQL | hard | 60 | 2021-01-01 10:00:00 |
2 | 9002 | C++ | hard | 80 | 2021-01-01 10:00:00 |
3 | 9003 | 算法 | hard | 80 | 2021-01-01 10:00:00 |
4 | 9004 | PYTHON | medium | 70 | 2021-01-01 10:00:00 |
试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):
id | uid | exam_id | start_time | submit_time | score |
---|---|---|---|---|---|
1 | 1001 | 9001 | 2020-08-02 10:01:01 | 2020-08-02 10:31:01 | 89 |
2 | 1002 | 9001 | 2020-04-01 18:01:01 | 2020-04-01 18:59:02 | 90 |
3 | 1001 | 9001 | 2020-04-01 09:01:01 | 2020-04-01 09:21:59 | 80 |
5 | 1002 | 9001 | 2021-03-02 19:01:01 | 2021-03-02 19:32:00 | 20 |
8 | 1003 | 9001 | 2021-05-02 12:01:01 | 2021-05-02 12:31:01 | 98 |
13 | 1003 | 9001 | 2020-01-02 10:01:01 | 2020-01-02 10:31:01 | 89 |
9 | 1001 | 9002 | 2020-02-02 12:01:01 | 2020-02-02 12:20:01 | 99 |
10 | 1002 | 9002 | 2021-02-02 12:01:01 | 2020-02-02 12:43:01 | 81 |
11 | 1001 | 9002 | 2020-01-02 19:01:01 | 2020-01-02 19:59:01 | 69 |
16 | 1002 | 9002 | 2020-02-02 12:01:01 | ||
17 | 1002 | 9002 | 2020-03-02 12:11:01 | ||
18 | 1001 | 9002 | 2021-05-05 18:01:01 | ||
4 | 1002 | 9003 | 2021-01-20 10:01:01 | 2021-01-20 10:10:01 | 81 |
6 | 1001 | 9003 | 2021-04-02 19:01:01 | 2021-04-02 19:40:01 | 89 |
15 | 1002 | 9003 | 2021-01-01 18:01:01 | 2021-01-01 18:59:02 | 90 |
7 | 1004 | 9004 | 2020-05-02 12:01:01 | 2020-05-02 12:20:01 | 99 |
12 | 1001 | 9004 | 2021-09-02 12:11:01 | ||
14 | 1002 | 9004 | 2020-01-01 12:11:01 | 2020-01-01 12:31:01 | 83 |
请计算2021年上半年各类试卷的做完次数相比2020年上半年同期的增长率(百分比格式,保留1位小数),以及做完次数排名变化,按增长率和21年排名降序输出。
由示例数据结果输出如下:
tag | exam_cnt_20 | exam_cnt_21 | growth_rate | exam_cnt_rank_20 | exam_cnt_rank_21 | rank_delta |
---|---|---|---|---|---|---|
SQL | 3 | 2 | -33.3% | 1 | 2 | 1 |
解释:2020年上半年有3个tag有作答完成的记录,分别是C++、SQL、PYTHON,它们被做完的次数分别是3、3、2,做完次数排名为1、1(并列)、3;
2021年上半年有2个tag有作答完成的记录,分别是算法、SQL,它们被做完的次数分别是3、2,做完次数排名为1、2;具体如下:
tag | start_year | exam_cnt | exam_cnt_rank |
---|---|---|---|
C++ | 2020 | 3 | 1 |
SQL | 2020 | 3 | 1 |
PYTHON | 2020 | 2 | 3 |
算法 | 2021 | 3 | 1 |
SQL | 2021 | 2 | 2 |
因此能输出同比结果的tag只有SQL,从2020到2021年,做完次数3=>2,减少33.3%(保留1位小数);排名1=>2,后退1名。
示例1
drop table if exists examination_info,exam_record;
CREATE TABLE examination_info (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
tag varchar(32) COMMENT '类别标签',
difficulty varchar(8) COMMENT '难度',
duration int NOT NULL COMMENT '时长',
release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
(9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),
(9002, 'C++', 'hard', 80, '2020-01-01 10:00:00'),
(9003, '算法', 'hard', 80, '2020-01-01 10:00:00'),
(9004, 'PYTHON', 'medium', 70, '2020-01-01 10:00:00');
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-08-02 10:01:01', '2020-08-02 10:31:01', 89),
(1002, 9001, '2020-04-01 18:01:01', '2020-04-01 18:59:02', 90),
(1001, 9001, '2020-04-01 09:01:01', '2020-04-01 09:21:59', 80),
(1002, 9003, '2021-01-20 10:01:01', '2021-01-20 10:10:01', 81),
(1002, 9001, '2021-03-02 19:01:01', '2021-03-02 19:32:00', 20),
(1001, 9003, '2021-04-02 19:01:01', '2021-04-02 19:40:01', 89),
(1004, 9004, '2020-05-02 12:01:01', '2020-05-02 12:20:01', 99),
(1003, 9001, '2021-05-02 12:01:01', '2021-05-02 12:31:01', 98),
(1001, 9002, '2020-02-02 12:01:01', '2020-02-02 12:20:01', 99),
(1002, 9002, '2020-02-02 12:01:01', '2020-02-02 12:43:01', 81),
(1001, 9002, '2020-01-02 19:01:01', '2020-01-02 19:59:01', 69),
(1001, 9004, '2021-09-02 12:11:01', null, null),
(1003, 9001, '2020-01-02 10:01:01', '2020-01-02 10:31:01', 89),
(1002, 9004, '2020-01-01 12:11:01', '2020-01-01 12:31:01', 83),
(1002, 9003, '2021-01-01 18:01:01', '2021-01-01 18:59:02', 90),
(1002, 9002, '2020-02-02 12:01:01', null, null),
(1002, 9002, '2020-03-02 12:11:01', null, null),
(1001, 9002, '2021-05-05 18:01:01', null, null);
输出
SQL|3|2|-33.3%|1|2|1
思路
1.找到所有tag在2020,2021上半年的完成数
2.计算growth_rate,并生成各tag完成数排名
3.取出所有需要的字段,并计算排名变化(2021-2020年,无需取绝对值)
4.筛选出2020和2021年均有完成记录的tag,并按题目要求排序
题解
方式一:
SELECT #第三步:取出所有需要的字段,并计算排名变化(2021-2020年,无需取绝对值)
tag,
exam_cnt_20,
exam_cnt_21,
growth_rate,
exam_cnt_rank_20,
exam_cnt_rank_21,
CAST(exam_cnt_rank_21 as SIGNED) - CAST(exam_cnt_rank_20 AS SIGNED) rank_delta
FROM (
SELECT #第二步:计算growth_rate,并生成各tag完成数排名
tag,
exam_cnt_20,
exam_cnt_21,
IFNULL(
CONCAT(
ROUND((exam_cnt_21 - exam_cnt_20)/exam_cnt_20 * 100, 1),'%'),0) growth_rate,
RANK() OVER (ORDER BY exam_cnt_20 DESC) exam_cnt_rank_20,
RANK() OVER (ORDER BY exam_cnt_21 DESC) exam_cnt_rank_21
FROM (
SELECT #第一步:找到所有tag在2020,2021上半年的完成数
tag,
SUM(IF(DATE_FORMAT(submit_time, '%Y-%m')
BETWEEN '2020-01' AND '2020-06', 1, 0)) exam_cnt_20,
SUM(IF(DATE_FORMAT(submit_time, '%Y-%m')
BETWEEN '2021-01' AND '2021-06', 1, 0)) exam_cnt_21
FROM exam_record
LEFT JOIN examination_info ei USING(exam_id)
GROUP BY 1) t1
) t2
# 第四步:筛选出2020和2021年均有完成记录的tag,并按题目要求排序
WHERE exam_cnt_20 != 0 AND exam_cnt_21 != 0
ORDER BY 4 desc, 6 desc
方式二:
select
tag,
exam_cnt_20,
exam_cnt_21,
concat(round((exam_cnt_21-exam_cnt_20)/exam_cnt_20*100,1),"%") growth_rate,
exam_cnt_rank_20,
exam_cnt_rank_21,
cast(exam_cnt_rank_21 as SIGNED)-cast(exam_cnt_rank_20 as SIGNED) rank_delta
from
(
select #根据去年获取下一年的试卷次数和排名
start_year,
tag,
exam_cnt as exam_cnt_20,
lead(exam_cnt)over(partition by tag order by start_year) exam_cnt_21,
exam_cnt_rank as exam_cnt_rank_20,
lead(exam_cnt_rank)over(partition by tag order by start_year)exam_cnt_rank_21
from
(
select #每类试卷2020/2021完成数及排名统计
start_year,
tag,
exam_cnt,
rank()over(partition by start_year order by exam_cnt desc) as exam_cnt_rank
from
(
select #获取每年上半年试卷次数
date_format(start_time,"%Y") start_year,
tag,
count(*)exam_cnt
from examination_info
inner join exam_record
using(exam_id)
where score is not null and date_format(start_time,"%m")<=6
group by tag,date_format(start_time,"%Y")
)t
group by tag,start_year
)t1
)t2
WHERE exam_cnt_21 IS NOT NULL
ORDER BY growth_rate DESC, exam_cnt_rank_21 DESC;
拓展
Cast(字段名 as 转换的类型 ),其中类型可以为:
CHAR[(N)] 字符型
DATE 日期型
DATETIME 日期和时间型
DECIMAL float型
SIGNED int
TIME 时间型
示例1:
作用: 四舍五入
SELECT CAST('9.0' AS DECIMAL) FROM DUAL; -- 9
SELECT CAST('9.5' AS DECIMAL) FROM DUAL; -- 10
SELECT CAST('9.6' AS DECIMAL) FROM DUAL; -- 10
SELECT CAST('9.3' AS DECIMAL) FROM DUAL; -- 9
2.对试卷得分做min-max归一化
现有试卷信息表examination_info(exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间):
id | exam_id | tag | difficulty | duration | release_time |
---|---|---|---|---|---|
1 | 9001 | SQL | hard | 60 | 2020-01-01 10:00:00 |
2 | 9002 | C++ | hard | 80 | 2020-01-01 10:00:00 |
3 | 9003 | 算法 | hard | 80 | 2020-01-01 10:00:00 |
4 | 9004 | PYTHON | medium | 70 | 2020-01-01 10:00:00 |
试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分)
id | uid | exam_id | start_time | submit_time | score |
---|---|---|---|---|---|
6 | 1003 | 9001 | 2020-01-02 12:01:01 | 2020-01-02 12:31:01 | 68 |
9 | 1001 | 9001 | 2020-01-02 10:01:01 | 2020-01-02 10:31:01 | 89 |
1 | 1001 | 9001 | 2020-01-01 09:01:01 | 2020-01-01 09:21:59 | 90 |
12 | 1002 | 9002 | 2021-05-05 18:01:01 | (NULL) | (NULL) |
3 | 1004 | 9002 | 2020-01-01 12:01:01 | 2020-01-01 12:11:01 | 60 |
2 | 1003 | 9002 | 2020-01-01 19:01:01 | 2020-01-01 19:30:01 | 75 |
7 | 1001 | 9002 | 2020-01-02 12:01:01 | 2020-01-02 12:43:01 | 81 |
10 | 1002 | 9002 | 2020-01-01 12:11:01 | 2020-01-01 12:31:01 | 83 |
4 | 1003 | 9002 | 2020-01-01 12:01:01 | 2020-01-01 12:41:01 | 90 |
5 | 1002 | 9002 | 2020-01-02 19:01:01 | 2020-01-02 19:32:00 | 90 |
11 | 1002 | 9004 | 2021-09-06 12:01:01 | (NULL) | (NULL) |
8 | 1001 | 9005 | 2020-01-02 12:11:01 | (NULL) | (NULL) |
在物理学及统计学数据计算时,有个概念叫min-max标准化,也被称为离差标准化,是对原始数据的线性变换,使结果值映射到[0 - 1]之间。
转换函数为:
请你将用户作答高难度试卷的得分在每份试卷作答记录内执行min-max归一化后缩放到[0,100]区间,并输出用户ID、试卷ID、归一化后分数平均值;最后按照试卷ID升序、归一化分数降序输出。(注:得分区间默认为[0,100],如果某个试卷作答记录中只有一个得分,那么无需使用公式,归一化并缩放后分数仍为原分数)。
由示例数据结果输出如下:
uid | exam_id | avg_new_score |
---|---|---|
1001 | 9001 | 98 |
1003 | 9001 | 0 |
1002 | 9002 | 88 |
1003 | 9002 | 75 |
1001 | 9002 | 70 |
1004 | 9002 | 0 |
解释:高难度试卷有9001、9002、9003;
作答了9001的记录有3条,分数分别为68、89、90,按给定公式归一化后分数为:0、95、100,而后两个得分都是用户1001作答的,因此用户1001对试卷9001的新得分为(95+100)/2≈98(只保留整数部分),用户1003对于试卷9001的新得分为0。最后结果按照试卷ID升序、归一化分数降序输出。
示例1
drop table if exists examination_info,exam_record;
CREATE TABLE examination_info (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
exam_id int UNIQUE NOT NULL COMMENT '试卷ID',
tag varchar(32) COMMENT '类别标签',
difficulty varchar(8) COMMENT '难度',
duration int NOT NULL COMMENT '时长',
release_time datetime COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;
CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO examination_info(exam_id,tag,difficulty,duration,release_time) VALUES
(9001, 'SQL', 'hard', 60, '2020-01-01 10:00:00'),
(9002, 'C++', 'hard', 80, '2020-01-01 10:00:00'),
(9003, '算法', 'hard', 80, '2020-01-01 10:00:00'),
(9004, 'PYTHON', 'medium', 70, '2020-01-01 10:00:00'),
(9005, 'WEB', 'hard', 80, '2020-01-01 10:00:00'),
(9006, 'PYTHON', 'hard', 80, '2020-01-01 10:00:00'),
(9007, 'web', 'hard', 80, '2020-01-01 10:00:00'),
(9008, 'Web', 'medium', 70, '2020-01-01 10:00:00'),
(9009, 'WEB', 'medium', 70, '2020-01-01 10:00:00'),
(9010, 'SQL', 'medium', 70, '2020-01-01 10:00:00');
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-01-01 09:01:01', '2020-01-01 09:21:59', 90),
(1003, 9002, '2020-01-01 19:01:01', '2020-01-01 19:30:01', 75),
(1004, 9002, '2020-01-01 12:01:01', '2020-01-01 12:11:01', 60),
(1003, 9002, '2020-01-01 12:01:01', '2020-01-01 12:41:01', 90),
(1002, 9002, '2020-01-02 19:01:01', '2020-01-02 19:32:00', 90),
(1003, 9001, '2020-01-02 12:01:01', '2020-01-02 12:31:01', 68),
(1001, 9002, '2020-01-02 12:01:01', '2020-01-02 12:43:01', 81),
(1001, 9005, '2020-01-02 12:11:01', null, null),
(1001, 9001, '2020-01-02 10:01:01', '2020-01-02 10:31:01', 89),
(1002, 9002, '2020-01-01 12:11:01', '2020-01-01 12:31:01', 83),
(1002, 9004, '2021-09-06 12:01:01', null, null),
(1002, 9002, '2021-05-05 18:01:01', null, null);
说明:分数归一化
假设集合A 收入的最小值和最大值分别为 12,000 美元和 98,000 美元。我们想将收入映射到范围 [1, 100]。那么73,600 美元映射到多少?
思路
1.统计每次高难度试卷被完成的分数以及该卷的最高最低分
2.关联试卷作答表和试卷信息表:exam_record JOIN examination_info USING(exam_id)
3.筛选做完了高难度的记录:WHERE score IS NOT NULL AND difficulty="hard"
4.按用户ID和试卷ID分组:GROUP BY uid, exam_id
5.计算归一化后分数,需分情况讨论:
如果最高分和最低分相等,即只有一个分值:直接返回原分值
否则归一化后取整:ROUND(AVG((score - min_score) / (max_score - min_score) * 100))
题解
方式一:
select
uid,
exam_id,
if(is_score=1,mix_score,
round(avg(((score-mix_score)/(max_score-mix_score))*100),0)) avg_new_score
from exam_record
inner join
(
select exam_id, min(score) mix_score,max(score) max_score, if(min(score)=max(score),1,0) is_score
from exam_record
inner join examination_info
using(exam_id)
where difficulty='hard' and score IS NOT NULL
group by exam_id
) t1
using(exam_id)
where score is not null
group by uid,exam_id
order by exam_id,avg_new_score desc
方式二:
SELECT uid,exam_id,
ROUND(SUM(max_min)/COUNT(max_min),0) avg_new_score
FROM (
SELECT exam_id,uid,score,
IF(min_x=max_x,score,(score-min_x)*100/(max_x-min_x))max_min/*如果某试卷只有一个得分*/
FROM (
SELECT uid,a.exam_id,score,
MIN(score) OVER(PARTITION BY exam_id) min_x, #求每类试卷的得分最小值
MAX(score)OVER(PARTITION BY exam_id) max_x #求每类试卷的得分最大值
FROM exam_record a
LEFT JOIN examination_info b ON a.exam_id=b.exam_id
WHERE difficulty='hard' #难度为'hard'
AND score IS NOT NULL #分数不为空
)t1
)t2
GROUP BY exam_id,uid
ORDER BY exam_id,avg_new_score DESC;#按照试卷ID升序,评价分降序排序
3.每份试卷每月作答数和截止当月的作答总数
现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):
id | uid | exam_id | start_time | submit_time | score |
---|---|---|---|---|---|
1 | 1001 | 9001 | 2020-01-01 09:01:01 | 2020-01-01 09:21:59 | 90 |
2 | 1002 | 9001 | 2020-01-20 10:01:01 | 2020-01-20 10:10:01 | 89 |
3 | 1002 | 9001 | 2020-02-01 12:11:01 | 2020-02-01 12:31:01 | 83 |
4 | 1003 | 9001 | 2020-03-01 19:01:01 | 2020-03-01 19:30:01 | 75 |
5 | 1004 | 9001 | 2020-03-01 12:01:01 | 2020-03-01 12:11:01 | 60 |
6 | 1003 | 9001 | 2020-03-01 12:01:01 | 2020-03-01 12:41:01 | 90 |
7 | 1002 | 9001 | 2020-05-02 19:01:01 | 2020-05-02 19:32:00 | 90 |
8 | 1001 | 9002 | 2020-01-02 19:01:01 | 2020-01-02 19:59:01 | 69 |
9 | 1004 | 9002 | 2020-02-02 12:01:01 | 2020-02-02 12:20:01 | 99 |
10 | 1003 | 9002 | 2020-02-02 12:01:01 | 2020-02-02 12:31:01 | 68 |
11 | 1001 | 9002 | 2020-02-02 12:01:01 | 2020-02-02 12:43:01 | 81 |
12 | 1001 | 9002 | 2020-03-02 12:11:01 | (NULL) | (NULL) |
请输出每份试卷每月作答数和截止当月的作答总数。
示例1
drop table if exists exam_record;
CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-01-01 09:01:01', '2020-01-01 09:21:59', 90),
(1002, 9001, '2020-01-20 10:01:01', '2020-01-20 10:10:01', 89),
(1002, 9001, '2020-02-01 12:11:01', '2020-02-01 12:31:01', 83),
(1003, 9001, '2020-03-01 19:01:01', '2020-03-01 19:30:01', 75),
(1004, 9001, '2020-03-01 12:01:01', '2020-03-01 12:11:01', 60),
(1003, 9001, '2020-03-01 12:01:01', '2020-03-01 12:41:01', 90),
(1002, 9001, '2020-05-02 19:01:01', '2020-05-02 19:32:00', 90),
(1001, 9002, '2020-01-02 19:01:01', '2020-01-02 19:59:01', 69),
(1004, 9002, '2020-02-02 12:01:01', '2020-02-02 12:20:01', 99),
(1003, 9002, '2020-02-02 12:01:01', '2020-02-02 12:31:01', 68),
(1001, 9002, '2020-02-02 12:01:01', '2020-02-02 12:43:01', 81),
(1001, 9002, '2020-03-02 12:11:01', null, null);
输出
9001|202001|2|2
9001|202002|1|3
9001|202003|3|6
9001|202005|1|7
9002|202001|1|1
9002|202002|3|4
9002|202003|1|5
思路
1.求每份试卷每个月的作答数,要以exam_id,开始作答时间的月份进行分组。直接获取试卷ID、作答月份。
2.对每组开始作答时间计数得到每份试卷每个月的作答数
3.以试卷分组,月份递增获取每份试卷每个月的累计和。sum() over() as cum_exam_cnt
题解
方式一:
select
exam_id,
start_month,
month_cnt,
SUM(month_cnt) over(
partition by exam_id
order by start_month) as cum_exam_cnt
from
(
select exam_id,
DATE_FORMAT(start_time, "%Y%m")as start_month,
count(start_time) month_cnt
FROM exam_record
group by exam_id, DATE_FORMAT(start_time, "%Y%m")-- 每份试卷被作答月份
)t
方式二:
SELECT DISTINCT
exam_id,
DATE_FORMAT(start_time,'%Y%m') start_month,
count(start_time) over(
partition by exam_id,DATE_FORMAT(start_time,'%Y%m')
) month_cnt,
count(start_time) over(
partition by exam_id
order by DATE_FORMAT(start_time,'%Y%m')
) cum_exam_cnt
FROM
exam_record
4. 每月及截止当月的答题情况
现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):
id | uid | exam_id | start_time | submit_time | score |
---|---|---|---|---|---|
1 | 1001 | 9001 | 2020-01-01 09:01:01 | 2020-01-01 09:21:59 | 90 |
2 | 1002 | 9001 | 2020-01-20 10:01:01 | 2020-01-20 10:10:01 | 89 |
3 | 1002 | 9001 | 2020-02-01 12:11:01 | 2020-02-01 12:31:01 | 83 |
4 | 1003 | 9001 | 2020-03-01 19:01:01 | 2020-03-01 19:30:01 | 75 |
5 | 1004 | 9001 | 2020-03-01 12:01:01 | 2020-03-01 12:11:01 | 60 |
6 | 1003 | 9001 | 2020-03-01 12:01:01 | 2020-03-01 12:41:01 | 90 |
7 | 1002 | 9001 | 2020-05-02 19:01:01 | 2020-05-02 19:32:00 | 90 |
8 | 1001 | 9002 | 2020-01-02 19:01:01 | 2020-01-02 19:59:01 | 69 |
9 | 1004 | 9002 | 2020-02-02 12:01:01 | 2020-02-02 12:20:01 | 99 |
10 | 1003 | 9002 | 2020-02-02 12:01:01 | 2020-02-02 12:31:01 | 68 |
11 | 1001 | 9002 | 2020-01-02 19:01:01 | 2020-02-02 12:43:01 | 81 |
12 | 1001 | 9002 | 2020-03-02 12:11:01 | (NULL) | (NULL) |
请输出自从有用户作答记录以来,每月的试卷作答记录中月活用户数、新增用户数、截止当月的单月最大新增用户数、截止当月的累积用户数。结果按月份升序输出。
由示例数据结果输出如下:
start_month | mau | month_add_uv | max_month_add_uv | cum_sum_uv |
---|---|---|---|---|
202001 | 2 | 2 | 2 | 2 |
202002 | 4 | 2 | 2 | 4 |
202003 | 3 | 0 | 2 | 4 |
202005 | 1 | 0 | 2 | 4 |
说明
month | 1001 | 1002 | 1003 | 1004 |
---|---|---|---|---|
202001 | 1 | 1 | ||
202002 | 1 | 1 | 1 | 1 |
202003 | 1 | 1 | 1 | |
202005 | 1 |
由上述矩阵可以看出,2020年1月有2个用户活跃(mau=2),当月新增用户数为2;
2020年2月有4个用户活跃,当月新增用户数为2,最大单月新增用户数为2,当前累积用户数为4。
实例1
drop table if exists exam_record;
CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-01-01 09:01:01', '2020-01-01 09:21:59', 90),
(1002, 9001, '2020-01-20 10:01:01', '2020-01-20 10:10:01', 89),
(1002, 9001, '2020-02-01 12:11:01', '2020-02-01 12:31:01', 83),
(1003, 9001, '2020-03-01 19:01:01', '2020-03-01 19:30:01', 75),
(1004, 9001, '2020-03-01 12:01:01', '2020-03-01 12:11:01', 60),
(1003, 9001, '2020-03-01 12:01:01', '2020-03-01 12:41:01', 90),
(1002, 9001, '2020-05-02 19:01:01', '2020-05-02 19:32:00', 90),
(1001, 9002, '2020-01-02 19:01:01', '2020-01-02 19:59:01', 69),
(1004, 9002, '2020-02-02 12:01:01', '2020-02-02 12:20:01', 99),
(1003, 9002, '2020-02-02 12:01:01', '2020-02-02 12:31:01', 68),
(1001, 9002, '2020-02-02 12:01:01', '2020-02-02 12:43:01', 81),
(1001, 9002, '2020-03-02 12:11:01', null, null);
输出
202001|2|2|2|2
202002|4|2|2|4
202003|3|0|2|4
202005|1|0|2|4
思路
1.统计每月用户第一次出现的月份 min()over()
2.统计用户作答的次数和最大的用户作答数
count(distinct uid) as mau,
count(distinct case when start_month=first_month then uid else null end)as month_add_uv
3.计算用户作答月份最大的数和截止当月的累积用户数
max(month_add_uv)over(order by start_month) max_month_add_uv,
sum(month_add_uv)over(order by start_month) cum_sum_uv
题解
select
start_month,mau,month_add_uv,
max(month_add_uv)over(order by start_month) max_month_add_uv,
sum(month_add_uv)over(order by start_month) cum_sum_uv
from (
select
start_month,
count(distinct uid) as mau,
count(distinct case
when start_month=first_month then uid
else null end) as month_add_uv
from (
select
uid,
date_format(start_time,"%Y%m")as start_month,
min(date_format(start_time,"%Y%m"))
over(partition by uid
order by date_format(start_time,"%Y%m"))first_month
from exam_record
)t
group by start_month
)t1
5.统计有未完成状态的试卷的未完成数和未完成率
现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分),数据如下:
id | uid | exam_id | start_time | submit_time | score |
---|---|---|---|---|---|
1 | 1001 | 9001 | 2020-01-02 09:01:01 | 2020-01-02 09:21:01 | 80 |
2 | 1001 | 9001 | 2021-05-02 10:01:01 | 2021-05-02 10:30:01 | 81 |
3 | 1001 | 9001 | 2021-09-02 12:01:01 | (NULL) | (NULL) |
请统计有未完成状态的试卷的未完成数incomplete_cnt和未完成率incomplete_rate。由示例数据结果输出如下:
exam_id | incomplete_cnt | complete_rate |
---|---|---|
9001 | 1 | 0.333 |
解释:试卷9001有3次被作答的记录,其中两次完成,1次未完成,因此未完成数为1,未完成率为0.333(保留3位小数)
示例1
drop table if exists exam_record;
CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2020-01-02 09:01:01', '2020-01-02 09:21:01', 80),
(1001, 9001, '2021-05-02 10:01:01', '2021-05-02 10:30:01', 81),
(1001, 9001, '2021-09-02 12:01:01', null, null);
题解
select exam_id,
sum(if(submit_time is null,1,0))incomplete_cnt,
round(sum(if(submit_time is null,1,0))/count(exam_id),3) complete_rate
from exam_record
group by exam_id
having incomplete_cnt>0