描述
现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分),示例数据如下:
id | uid | exam_id | start_time | submit_time | score |
1 | 1001 | 9001 | 2021-07-02 09:01:01 | 2021-07-02 09:21:01 | 80 |
2 | 1002 | 9001 | 2021-09-05 19:01:01 | 2021-09-05 19:40:01 | 81 |
3 | 1002 | 9002 | 2021-09-02 12:01:01 | (NULL) | (NULL) |
4 | 1002 | 9003 | 2021-09-01 12:01:01 | (NULL) | (NULL) |
5 | 1002 | 9001 | 2021-07-02 19:01:01 | 2021-07-02 19:30:01 | 82 |
6 | 1002 | 9002 | 2021-07-05 18:01:01 | 2021-07-05 18:59:02 | 90 |
7 | 1003 | 9002 | 2021-07-06 12:01:01 | (NULL) | (NULL) |
8 | 1003 | 9003 | 2021-09-07 10:01:01 | 2021-09-07 10:31:01 | 86 |
9 | 1004 | 9003 | 2021-09-06 12:01:01 | (NULL) | (NULL) |
10 | 1002 | 9003 | 2021-09-01 12:01:01 | 2021-09-01 12:31:01 | 81 |
11 | 1005 | 9001 | 2021-09-01 12:01:01 | 2021-09-01 12:31:01 | 88 |
12 | 1005 | 9002 | 2021-09-01 12:01:01 | 2021-09-01 12:31:01 | 88 |
13 | 1006 | 9002 | 2021-09-02 12:11:01 | 2021-09-02 12:31:01 | 89 |
还有一张试卷信息表examination_info(exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间),示例数据如下:
id | exam_id | tag | difficulty | duration | release_time |
1 | 9001 | SQL | hard | 60 | 2020-01-01 10:00:00 |
2 | 9002 | SQL | easy | 60 | 2020-02-01 10:00:00 |
3 | 9003 | 算法 | medium | 80 | 2020-08-02 10:00:00 |
请统计2021年每个未完成试卷作答数大于1的有效用户的数据(有效用户指完成试卷作答数至少为1且未完成数小于5),输出用户ID、未完成试卷作答数、完成试卷作答数、作答过的试卷tag集合,按未完成试卷数量由多到少排序。示例数据的输出结果如下:
uid | incomplete_cnt | complete_cnt | detail |
1002 | 2 | 4 | 2021-09-01:算法;2021-07-02:SQL;2021-09-02:SQL;2021-09-05:SQL;2021-07-05:SQL |
解释:2021年的作答记录中,除了1004,其他用户均满足有效用户定义,但只有1002未完成试卷数大于1,因此只输出1002,detail中是1002作答过的试卷{日期:tag}集合,日期和tag间用:连接,多元素间用;连接。
问题分解:
- 关联作答记录和试卷信息:left join examination_info on using(exam_id);(题中exam_record中的exam_id在examination_info均存在,所以用left join和inner join效果一样)
- 筛选2021年的记录:where year(start_time)=2021
- 获取各用户的tag,start_time及未完成标记和已完成标记,如果该作答记录交卷了则已完成标记为1,未完成标记为0,否则相反:if(submit_time is null, 1, null) as incomplete
- 按用户分组:group by uid
- 统计未完成试卷作答数和已完成试卷作答数:count(incomplete) as incomplete_cnt
- 统计作答过的tag集合:
- 对于每条作答tag,用:连接日期和tag:
concat_ws(':',date(start_time),tag)
- 对于一个人(组内)的多条作答,用;连接去重后的作答记录:
group_concat(distinct concat_ws(':',date(start_time), tag) separator ';')
- 对于每条作答tag,用:连接日期和tag:
- 筛选未完成试卷作答数大于1的有效用户:
having complete_cnt >= 1 and incomplete_cnt between 2 and 4
- 完成试卷作答数至少为1:complete_cnt > 0
- 未完成数小于5:incomplete_cnt < 5
- 未完成试卷作答数大于1:incomplete_cnt > 1
with cte as(
select
uid,exam_id,tag,start_time,
if(submit_time is null,0,1) as sub_num
from
exam_record left join examination_info
using(exam_id)
where left(start_time,4)=2021
)
select
uid,
(count(sub_num)-sum(sub_num)) as incomplete_cnt,
sum(sub_num) as complete_cnt,
group_concat(distinct concat_ws(':', date(start_time), tag) separator ';') as detail
from cte
group by uid
having incomplete_cnt between 2 and 4 and complete_cnt>0
order by incomplete_cnt desc