牛客SQL题 SQL126 平均活跃天数和月活人数
题目
用户在牛客试卷作答区作答记录存储在表exam_record中,内容如下:
exam_record表(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分)
(和原题的数据相比我加了一条数据)
请计算2021年每个月里试卷作答区用户平均月活跃天数avg_active_days和月度活跃人数mau,上面数据的示例输出如下:
month | avg_active_days | mau |
---|---|---|
202107 | 1.50 | 2 |
2021009 | 1.25 | 4 |
解释:2021年7月有2人活跃,共活跃了3天,其中1001活跃1天(2021-07-02),1002活跃2天(2021-07-02和2021-07-05,2021-07-02当天活跃两次),平均活跃天数1.5;2021年9月有4人活跃,共活跃了5天,平均活跃天数1.25,结果保留2位小数。
注:此处活跃指有交卷行为。
建表语句放在这里,方便大家测试
drop table if exists exam_record;
CREATE TABLE exam_record (
id int PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
uid int NOT NULL COMMENT '用户ID',
exam_id int NOT NULL COMMENT '试卷ID',
start_time datetime NOT NULL COMMENT '开始时间',
submit_time datetime COMMENT '提交时间',
score tinyint COMMENT '得分'
)CHARACTER SET utf8 COLLATE utf8_general_ci;
INSERT INTO exam_record(uid,exam_id,start_time,submit_time,score) VALUES
(1001, 9001, '2021-07-02 09:01:01', '2021-07-02 09:21:01', 80),
(1002, 9001, '2021-09-05 19:01:01', '2021-09-05 19:40:01', 81),
(1002, 9002, '2021-09-02 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', null, null),
(1002, 9001, '2021-07-02 19:01:01', '2021-07-02 19:30:01', 82),
(1002, 9002, '2021-07-05 18:01:01', '2021-07-05 18:59:02', 90),
(1003, 9002, '2021-07-06 12:01:01', null, null),
(1003, 9003, '2021-09-07 10:01:01', '2021-09-07 10:31:01', 86),
(1004, 9003, '2021-09-06 12:01:01', null, null),
(1002, 9003, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 81),
(1005, 9001, '2021-09-01 12:01:01', '2021-09-01 12:31:01', 88),
(1006, 9002, '2021-09-02 12:11:01', '2021-09-02 12:31:01', 89),
(1007, 9002, '2020-09-02 12:11:01', '2020-09-02 12:31:01', 89),
(1002, 9003, '2021-07-02 20:01:01', '2021-07-02 20:30:01', 60);
分析
根据题目要求需要筛选出2021年的记录;活跃指有交卷行为,所以分数不为null
先写个大致结构
select
as month,
as avg_active_days,
as mau
from exam_record
where year(submit_time) = 2021 and submit_time is not null
查询结果是2021年每个月里试卷作答区用户平均月活跃天数avg_active_days和月度活跃人数mau,所以要分组group by
,查询结果第一列是年月可以使用date_format
函数
select
date_format(submit_time, '%Y%m') as month,
as avg_active_days,
as mau
from exam_record
where year(submit_time) = 2021 and submit_time is not null
group by date_format(submit_time, '%Y%m')
查询结果第三列是月度活跃人数,count
去重后的uid即可
select
date_format(submit_time, '%Y%m') as month,
as avg_active_days,
count(distinct uid) as mau
from exam_record
where year(submit_time) = 2021 and submit_time is not null
group by date_format(submit_time, '%Y%m')
剩下第二列用户平均月活跃天数,需要知道月度活跃人数和共计活跃天数。
月度活跃人数结果的第三列count(distinct uid)
可以求出
共计活跃天数可以使用uid和时间两个约束来求
count(distinct uid, date_format(submit_time, '%Y%m%d'))
原本满足条件的有4条数据:
(1001, 2021-07-02)
(1002, 2021-07-02)
(1002, 2021-07-05)
(1002, 2021-07-02)
distinct
的作用是对数据库表中一个或者多个字段重复的数据进行过滤,只返回其中的一条数据给用户,distinct
uid和年月日之后剩下3条数据:
(1001, 2021-07-02)
(1002, 2021-07-02)
(1002, 2021-07-05)
用户平均月活跃天数 = 共计活跃天数 / 月度活跃人数
结果保留两位小数
round(count(distinct uid, date_format(submit_time, '%Y%m%d'))/count(distinct uid), 2)
最终代码:
select
date_format(submit_time, '%Y%m') as month,
round(count(distinct uid, date_format(submit_time, '%Y%m%d'))/count(distinct uid), 2) as avg_active_days,
count(distinct uid) as mau
from exam_record
where year(submit_time) = 2021 and submit_time is not null
group by date_format(submit_time, '%Y%m')