描述
现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):
id | uid | exam_id | start_time | submit_time | score |
1 | 1006 | 9003 | 2021-09-07 10:01:01 | 2021-09-07 10:21:02 | 84 |
2 | 1006 | 9001 | 2021-09-01 12:11:01 | 2021-09-01 12:31:01 | 89 |
3 | 1006 | 9002 | 2021-09-06 10:01:01 | 2021-09-06 10:21:01 | 81 |
4 | 1005 | 9002 | 2021-09-05 10:01:01 | 2021-09-05 10:21:01 | 81 |
5 | 1005 | 9001 | 2021-09-05 10:31:01 | 2021-09-05 10:51:01 | 8 |
请计算在2021年至少有两天作答过试卷的人中,计算该年连续两次作答试卷的最大时间窗days_window,那么根据该年的历史规律他在days_window天里平均会做多少套试卷,按最大时间窗和平均做答试卷套数倒序排序。由示例数据结果输出如下:
uid | days_window | avg_exam_cnt |
1006 | 6 | 2.57 |
解释:用户1006分别在20210901、20210906、20210907作答过3次试卷,连续两次作答最大时间窗为6天(1号到6号),他1号到7号这7天里共做了3张试卷,平均每天3/7=0.428571张,那么6天里平均会做0.428571*6=2.57张试卷(保留两位小数);
用户1005在20210905做了两张试卷,但是只有一天的作答记录,过滤掉。
with cte1 as (
select
uid,date_format(start_time,'%Y%m%d') as act_time,
dense_rank() over(partition by uid order by date_format(start_time,'%Y%m%d') desc) as rnk
from
exam_record
where uid in(
select uid from exam_record
where left(start_time,4)=2021
group by uid
having count(distinct left(start_time,10))>=2)
and left(start_time,4)=2021
),
cte2 as (
select
uid,count(act_time)/(datediff(max(act_time),min(act_time))+1) as rate
from cte1
group by uid
),
cte3 as (
select
a.uid as uid,
round(max(datediff(a.act_time,b.act_time)+1),2) as days_window
from
cte1 a left join cte1 b
on a.uid=b.uid and a.rnk+1=b.rnk
group by a.uid
)
select
uid,days_window,round(days_window*rate,2) as avg_exam_cnt
from cte2 left join cte3 using(uid)
order by days_window desc,avg_exam_cnt desc
ps:在做连接查询的时候使用cte2 left join cte3没有问题,使用cte3 left join cte2会报错,有知道这是什么问题的友友可以在评论区留言,感谢😭