1. 每月及截止当月的答题情况
【题目】:
现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分)
请输出自从有用户作答记录以来,每月的试卷作答记录中月活用户数、新增用户数、截止当月的单月最大新增用户数、截止当月的累积用户数:
【解题思路】:
Keywords:活跃用户数、新增用户数、截止当月单月新增最大、截止当月的累积用户数
- 活跃用户数:每月有提交记录的用户
COUNT(DISTINCT uid) AS mau
- 新增用户数:到该月为用户最早记录所在月
# 最早记录
MIN(DATE_FORMAT(start_time,'%Y%m')) OVER(PARTITION BY uid ORDER BY DATE_FORMAT(start_time,'%Y%m')) AS first_month
# 最早记录等于开始作答即为该月新增
COUNT(DISTINCT CASE WHEN first_month = start_time THEN uid ELSE Null END) AS month_add_uv
- 截止当月单月新增最大和累积:窗口函数的排序参数,不需要分组参数加
partition by
MAX(month_add_uv) OVER(ORDER BY start_month) AS max_month_add_uv
SUM(month_add_uv) OVER(ORDER BY start_month) AS cum_sum_uv
【问题拆解】:
反向思考由结果出发,要想计算截止当月的最大新增用户数和累计用户数:
- 需要先计算每月的新增用户数,然后进行窗口函数排序操作实现截止效果
- 新增用户 = 用户最早记录月份就为当月,就需要计算用户最早答题记录月份
因此,顺序为:
最早答题记录月份 —> 计算新增用户(最早答题记录等于当月) —> 计算截止当月的(窗口函数排序操作) —> 拼接
最早答题记录月份,记为t1
SELECT *,
DATE_FORMAT(start_time,'%Y%m') AS start_month,
MIN(DATE_FORMAT(start_time,'%Y%m')) OVER(PARTITION BY uid ORDER BY DATE_FORMTA(start_time,'%Y%m')) AS first_month
FROM exam_record
计算新增用户,记为t2
SELECT start_month,
COUNT(DISTINCT uid) AS mau,
COUNT(DISTINCT CASE WHEN first_month = start_month THEN uid ELSE Null END) AS month_add_uv
FROM t1
GROUP BY start_month
计算截止当月的
SELECT start_month,mau,month_add_uv,
MAX(month_add_uv) OVER(ORDER BY start_month) AS max_month_add_uv,
SUM(month_add_uv) OVER(ORDER BY start_month) AS cum_sum_uv
FROM t2
拼接
WITH t1 AS
(
SELECT *,
DATE_FORMAT(start_time,'%Y%m') AS start_month,
MIN(DATE_FORMAT(start_time,'%Y%m')) OVER(PARTITION BY uid ORDER BY DATE_FORMTA(start_time,'%Y%m')) AS first_month
FROM exam_record
)
SELECT start_month,mau,month_add_uv,
MAX(month_add_uv) OVER(ORDER BY start_month) AS max_month_add_uv,
SUM(month_add_uv) OVER(ORDER BY start_month) AS cum_sum_uv
FROM(
SELECT start_month,
COUNT(DISTINCT uid) AS mau,
COUNT(DISTINCT CASE WHEN first_month = start_month THEN uid ELSE Null END) AS month_add_uv
FROM t1
GROUP BY start_month
)
2.试卷完成数同比2020年的增长率及排名变化
【题目】:
请计算2021年上半年各类试卷的做完次数相比2020年上半年同期的增长率(百分比格式,保留1位小数),以及做完次数排名变化,按增长率和21年排名降序输出
结果输出如下:
【解题思路】:
Keywords:上半年、增长率、排名变化
- 上半年的筛选:将日期转换成月份,筛选1月到6月的数据
日期函数DATE_FORMAT(submit,'%Y-%m) BETWEEN '2020-01' AND '2020-06
- 增长率的计算:
growth_rate = (exam_cnt_21 - exam_cnt_20) - exam_cnt_20
- 排名变化的计算:
rank_delta = exam_cnt_rank_21 - exam_cnt_rank_20
问题拆解:
- 增长率和排名变化的计算都需要用到
exam_cnt_21
和exam_cnt_20
(20年和21年各类试卷完成次数),因此需要先筛选出来 - 筛选完之后,计算各类试卷增长率及排名
- 通过排名计算排名变化,并形成最终输出
细节:
- 计算排名需要用到窗口函数
RANK() OVER(ORDER BY exam_cnt_20 DESC)
,这里是对整个数据集范围内排名,因此不需要分组参数PARTITION BY
- 增长率为百分数,小数位为1,这里需要用到
CONCAT
()和COUNT
():
ROUND((exam_cnt_21 - exam_cnt_20) / exam_cnt_20 * 100) # 记为a
CONCAT(a,'%') # 拼接,记为b
- 两年完成次数相同时,会产生空值,需要
IFNULL()
进行处理
IFNULL(b,0)
【完整代码】:
第一步 筛选出各类试卷完成次数
SELECT tag,
SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2020-01' AND '2020-06',1,0)) AS exam_cnt_20,
SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2021-01' AND '2021-06',1,0)) AS exam_cn_21
FROM exam_record
LEFT JOIN examination_info USING(exam_id)
GROUP BY tag
第二步,计算各类试卷增长率及排名,记为t1
SELECT tag,
exam_cnt_20, exam_cnt_21,
IFNULL(CONCAT(ROUND((exam_cnt_21-exam_cnt_20)/exam_cnt_20 * 100),'%'),0) AS growth_rate,
RANK() OVER(ORDER BY exam_cnt_20 DESC) AS exam_cnt_rank_20,
RANK() OVER(ORDER BY exam_cnt_21 DESC) AS exam_cnt_rank_21
FROM(
SELECT tag,
SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2020-01' AND '2020-06',1,0)) AS exam_cnt_20,
SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2021-01' AND '2021-06',1,0)) AS exam_cn_21
FROM exam_record
LEFT JOIN examination_info USING(exam_id)
GROUP BY tag
) t1
通过排名计算降序,形成最终输出
SELECT tag,
exam_cnt_20, exam_cnt_21,
growth_rate,
exam_cnt_rank_20, exam_cnt_rank_21,
CAST(exam_cnt_rank_21 AS SIGNED) - CAST(exam_cnt_rank_20 AS SIGNED) AS rank_delta
FROM(
SELECT tag,
exam_cnt_20, exam_cnt_21,
IFNULL(CONCAT(ROUND((exam_cnt_21-exam_cnt_20)/exam_cnt_20 * 100),'%'),0) AS growth_rate,
RANK() OVER(ORDER BY exam_cnt_20 DESC) AS exam_cnt_rank_20,
RANK() OVER(ORDER BY exam_cnt_21 DESC) AS exam_cnt_rank_21
FROM(
SELECT tag,
SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2020-01' AND '2020-06',1,0)) AS exam_cnt_20,
SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2021-01' AND '2021-06',1,0)) AS exam_cn_21
FROM exam_record
LEFT JOIN examination_info USING(exam_id)
GROUP BY tag
) t1
) t2
WHERE exam_cnt_20 != 0 AND exam_cnt_21 != 0
ORDER by growth_rate DESC, exam_cnt_21 DESC
3. 最大同时在线人数
【题目】:
统计每篇文章同一时刻最大在看人数,如果同一时刻有进入也有离开时,先记录用户数增加再记录减少,结果按最大人数降序
【解题思路】:
先算出每组artical_id
的所有时刻的观看人数,再group by
找出其中最大观看人数
求每个时刻的观看人数
- 开始时刻记为 +1,表示进入
- 结束时刻记为 -1,表示离开
union all
进行拼接:时刻dt有两种,in_time和out_time;人数变化tag也有两种,+1和-1- 人数变化量按时间顺序的累加就可以算出每个时刻在看的人数:采用窗口函数
有个细节:题目要求先记录用户数增加再记录减少,所以窗口函数中order by
后还要加个tag desc
字段
【完整代码如下】:
SELECT artical_id,
MAX(uv) AS max_uv
FROM(SELECT artical_id,
SUM(tag) OVER(PARTITION BY artical_id ORDER BY dt,tag DESC) AS uv
FROM(SELECT artical_id,uid,
in_time AS dt,
1 AS tag
FROM tb_user_log
WHERE artical_id != 0
UNION ALL
SELECT artical_id,uid,
out_time AS dt,
-1 AS tag
FROM tb_user_log
) AS t1
) AS t2
GROUP BY artical_id
ORDER BY max_uv DESC
5. 连续签到
【题目】:
计算每个用户2021年7月以来每月获得的金币数(该活动到10月底结束,11月1日开始的签到不再获得金币)。结果按月份、ID升序排序。
注:如果签到记录的in_time-进入时间和out_time-离开时间跨天了,也只记作in_time对应的日期签到了
- 从2021年7月7日0点开始,用户每天签到可以领1金币,并可以开始累积签到天数,连续签到的第3、7天分别可额外领2、6金币。
- 每连续签到7天后重新累积签到天数(即重置签到天数:连续第8天签到时记为新的一轮签到的第一天,领1金币)
【解题思路】:
- 首先获得签到表
- 签到日期为2021-07-07 0点到2021-10-31 24点
- artrical_id = 0
- sign_in = 1
SELECT uid, DATE(in_time) dt
FROM tb_user_log
WHERE DATE(in_time) BETWEEN '2021-07-07' AND '2021-10-31' -- 条件1
AND artical_id = 0 -- 条件2
AND sign_in = 1 -- 条件3
- 其次计算连续签到天数
连续签到 = (时间 - 排序号)相同
SELECT uid,dt,
MOD(ROW_NUMBER() OVER(PARTITION BY uid, start_day ORDER BY dt),7) days -- 日期对应的连续签到天数
FROM(SELECT uid, DATE(in_time) dt,
DATE_SUB(DATE(in_time), INTERVAL ROW_NUMBER() OVER(PARTITION BY uid ORDER BY DATE(in_time)) DAY)
AS start_day, -- 连续签到 = (时间 - 排序号) is the same
FROM tb_user_log
WHERE DATE(in_time) BETWEEN '2021-07-07' AND '2021-10-31' -- 条件1
AND artical_id = 0 -- 条件2
AND sign_in = 1 -- 条件3
) t1
- 最后根据连续签到天数计算金币数
WITH t2 AS
SELECT(SELECT uid,dt,
MOD(ROW_NUMBER() OVER(PARTITION BY uid, start_day ORDER BY dt),7) days -- 日期对应的连续签到天数
FROM(SELECT uid, DATE(in_time) dt,
DATE_SUB(DATE(in_time), INTERVAL ROW_NUMBER() OVER(PARTITION BY uid ORDER BY DATE(in_time)) DAY)
AS start_day, -- 连续签到 = (时间 - 排序号) is the same
FROM tb_user_log
WHERE DATE(in_time) BETWEEN '2021-07-07' AND '2021-10-31' -- 条件1
AND artical_id = 0 -- 条件2
AND sign_in = 1 -- 条件3
) t1
)
SELECT uid,DATE_FORMAT(dt, '%Y%m') month,
SUM(CASE WHEN days = 0 THEN 7
WHEN days = 3 THEN 3
ELSE 1 END)
AS coin
FROM t2
GROUP BY uid, month
ORDER BY month, uid
这里,按照连续签到天数7的余数mod()
来处理即可
- 当签到天数%7=3 则领取3金币
- 当签到天数%7=0 则领取7金币
- 其余情况,领取1金币