Sql之面试题总结

1. 每月及截止当月的答题情况

【题目】:

现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分)
请输出自从有用户作答记录以来,每月的试卷作答记录中月活用户数新增用户数截止当月的单月最大新增用户数、截止当月的累积用户数:
在这里插入图片描述
【解题思路】

Keywords:活跃用户数、新增用户数、截止当月单月新增最大、截止当月的累积用户数

  1. 活跃用户数:每月有提交记录的用户
COUNT(DISTINCT uid) AS mau
  1. 新增用户数:到该月为用户最早记录所在月
# 最早记录
MIN(DATE_FORMAT(start_time,'%Y%m')) OVER(PARTITION BY uid ORDER BY DATE_FORMAT(start_time,'%Y%m'))	AS first_month

# 最早记录等于开始作答即为该月新增
COUNT(DISTINCT CASE WHEN first_month = start_time THEN uid ELSE Null END) AS month_add_uv
  1. 截止当月单月新增最大和累积:窗口函数的排序参数,不需要分组参数加partition by
MAX(month_add_uv) OVER(ORDER BY start_month) AS max_month_add_uv

SUM(month_add_uv) OVER(ORDER BY start_month) AS cum_sum_uv

【问题拆解】

反向思考由结果出发,要想计算截止当月的最大新增用户数和累计用户数:

  • 需要先计算每月的新增用户数,然后进行窗口函数排序操作实现截止效果
  • 新增用户 = 用户最早记录月份就为当月,就需要计算用户最早答题记录月份

因此,顺序为

最早答题记录月份 —> 计算新增用户(最早答题记录等于当月) —> 计算截止当月的(窗口函数排序操作) —> 拼接

最早答题记录月份,记为t1

SELECT *,
	   DATE_FORMAT(start_time,'%Y%m') AS start_month,
	   MIN(DATE_FORMAT(start_time,'%Y%m')) OVER(PARTITION BY uid ORDER BY DATE_FORMTA(start_time,'%Y%m')) AS 		first_month
FROM exam_record

计算新增用户,记为t2

SELECT start_month,
	   COUNT(DISTINCT uid) AS mau,
	   COUNT(DISTINCT CASE WHEN first_month = start_month THEN uid ELSE Null END) AS month_add_uv
FROM t1
GROUP BY start_month

计算截止当月的

SELECT start_month,mau,month_add_uv,
	   MAX(month_add_uv) OVER(ORDER BY start_month) AS max_month_add_uv,
	   SUM(month_add_uv) OVER(ORDER BY start_month) AS cum_sum_uv
FROM t2

拼接

WITH t1 AS
(
    SELECT *,
	   DATE_FORMAT(start_time,'%Y%m') AS start_month,
	   MIN(DATE_FORMAT(start_time,'%Y%m')) OVER(PARTITION BY uid ORDER BY DATE_FORMTA(start_time,'%Y%m')) AS 		first_month
FROM exam_record
)

SELECT start_month,mau,month_add_uv,
	   MAX(month_add_uv) OVER(ORDER BY start_month) AS max_month_add_uv,
	   SUM(month_add_uv) OVER(ORDER BY start_month) AS cum_sum_uv
FROM(
    SELECT start_month,
           COUNT(DISTINCT uid) AS mau,
           COUNT(DISTINCT CASE WHEN first_month = start_month THEN uid ELSE Null END) AS month_add_uv
    FROM t1
    GROUP BY start_month
	)

2.试卷完成数同比2020年的增长率及排名变化

【题目】:

请计算2021年上半年各类试卷的做完次数相比2020年上半年同期的增长率(百分比格式,保留1位小数),以及做完次数排名变化,按增长率和21年排名降序输出

结果输出如下:
在这里插入图片描述

【解题思路】

Keywords:上半年、增长率、排名变化

  1. 上半年的筛选:将日期转换成月份,筛选1月到6月的数据

日期函数DATE_FORMAT(submit,'%Y-%m) BETWEEN '2020-01' AND '2020-06

  1. 增长率的计算:growth_rate = (exam_cnt_21 - exam_cnt_20) - exam_cnt_20
  2. 排名变化的计算:rank_delta = exam_cnt_rank_21 - exam_cnt_rank_20

问题拆解

  • 增长率和排名变化的计算都需要用到exam_cnt_21exam_cnt_20(20年和21年各类试卷完成次数),因此需要先筛选出来
  • 筛选完之后,计算各类试卷增长率及排名
  • 通过排名计算排名变化,并形成最终输出

细节

  • 计算排名需要用到窗口函数RANK() OVER(ORDER BY exam_cnt_20 DESC),这里是对整个数据集范围内排名,因此不需要分组参数PARTITION BY
  • 增长率为百分数,小数位为1,这里需要用到CONCAT()和COUNT():
ROUND((exam_cnt_21 - exam_cnt_20) / exam_cnt_20 * 100) # 记为a
CONCAT(a,'%')  # 拼接,记为b
  • 两年完成次数相同时,会产生空值,需要IFNULL()进行处理
IFNULL(b,0) 

【完整代码】

第一步 筛选出各类试卷完成次数

SELECT tag,
	   SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2020-01' AND '2020-06',1,0)) AS exam_cnt_20,
	   SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2021-01' AND '2021-06',1,0)) AS exam_cn_21
FROM exam_record
LEFT JOIN  examination_info USING(exam_id)
GROUP BY tag

第二步,计算各类试卷增长率及排名,记为t1

SELECT tag,
	   exam_cnt_20, exam_cnt_21,
	   IFNULL(CONCAT(ROUND((exam_cnt_21-exam_cnt_20)/exam_cnt_20 * 100),'%'),0) AS growth_rate,
	   RANK() OVER(ORDER BY exam_cnt_20 DESC) AS exam_cnt_rank_20,
	   RANK() OVER(ORDER BY exam_cnt_21 DESC) AS exam_cnt_rank_21
FROM(
	SELECT tag,
           SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2020-01' AND '2020-06',1,0)) AS exam_cnt_20,
           SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2021-01' AND '2021-06',1,0)) AS exam_cn_21
    FROM exam_record
    LEFT JOIN  examination_info USING(exam_id)
    GROUP BY tag	
	) t1

通过排名计算降序,形成最终输出

SELECT tag,
	   exam_cnt_20, exam_cnt_21,
	   growth_rate,
	   exam_cnt_rank_20, exam_cnt_rank_21,
	   CAST(exam_cnt_rank_21 AS SIGNED) - CAST(exam_cnt_rank_20 AS SIGNED) AS rank_delta
FROM(
	SELECT tag,
           exam_cnt_20, exam_cnt_21,
           IFNULL(CONCAT(ROUND((exam_cnt_21-exam_cnt_20)/exam_cnt_20 * 100),'%'),0) AS growth_rate,
           RANK() OVER(ORDER BY exam_cnt_20 DESC) AS exam_cnt_rank_20,
           RANK() OVER(ORDER BY exam_cnt_21 DESC) AS exam_cnt_rank_21
    FROM(
        SELECT tag,
               SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2020-01' AND '2020-06',1,0)) AS exam_cnt_20,
               SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2021-01' AND '2021-06',1,0)) AS exam_cn_21
        FROM exam_record
        LEFT JOIN  examination_info USING(exam_id)
        GROUP BY tag	
        ) t1
	) t2 
WHERE exam_cnt_20 != 0 AND exam_cnt_21 != 0
ORDER by growth_rate DESC, exam_cnt_21 DESC

3. 最大同时在线人数

【题目】:

统计每篇文章同一时刻最大在看人数,如果同一时刻有进入也有离开时,先记录用户数增加再记录减少,结果按最大人数降序
在这里插入图片描述

【解题思路】:

先算出每组artical_id的所有时刻的观看人数,再group by找出其中最大观看人数

求每个时刻的观看人数

  • 开始时刻记为 +1,表示进入
  • 结束时刻记为 -1,表示离开
  • union all进行拼接:时刻dt有两种,in_time和out_time;人数变化tag也有两种,+1和-1
  • 人数变化量按时间顺序的累加就可以算出每个时刻在看的人数:采用窗口函数

有个细节:题目要求先记录用户数增加再记录减少,所以窗口函数中order by后还要加个tag desc字段

【完整代码如下】:

SELECT artical_id,
	   MAX(uv) AS max_uv
FROM(SELECT artical_id,
            SUM(tag) OVER(PARTITION BY artical_id ORDER BY dt,tag DESC) AS uv
     FROM(SELECT artical_id,uid,
                 in_time AS dt,
                 1 AS tag
          FROM tb_user_log
          WHERE artical_id != 0
          UNION ALL
          SELECT artical_id,uid,
                 out_time AS dt,
                 -1 AS tag
          FROM tb_user_log
          ) AS t1
      ) AS t2
GROUP BY artical_id
ORDER BY max_uv DESC

5. 连续签到

【题目】:

计算每个用户2021年7月以来每月获得的金币数(该活动到10月底结束,11月1日开始的签到不再获得金币)。结果按月份、ID升序排序。

:如果签到记录的in_time-进入时间和out_time-离开时间跨天了,也只记作in_time对应的日期签到了

  • 从2021年7月7日0点开始,用户每天签到可以领1金币,并可以开始累积签到天数,连续签到的第3、7天分别可额外领2、6金币。
  • 每连续签到7天后重新累积签到天数(即重置签到天数:连续第8天签到时记为新的一轮签到的第一天,领1金币)

【解题思路】:

  1. 首先获得签到表
  • 签到日期为2021-07-07 0点到2021-10-31 24点
  • artrical_id = 0
  • sign_in = 1
SELECT uid, DATE(in_time) dt
FROM tb_user_log
WHERE DATE(in_time) BETWEEN '2021-07-07' AND '2021-10-31'    -- 条件1
AND artical_id = 0											 -- 条件2
AND sign_in = 1												 -- 条件3
  1. 其次计算连续签到天数

连续签到 = (时间 - 排序号)相同

SELECT uid,dt,
	   MOD(ROW_NUMBER() OVER(PARTITION BY uid, start_day ORDER BY dt),7) days       -- 日期对应的连续签到天数
FROM(SELECT uid, DATE(in_time) dt,
            DATE_SUB(DATE(in_time), INTERVAL ROW_NUMBER() OVER(PARTITION BY uid ORDER BY DATE(in_time)) DAY) 
            AS start_day,                                            -- 连续签到 = (时间 - 排序号) is the same 
     FROM tb_user_log
     WHERE DATE(in_time) BETWEEN '2021-07-07' AND '2021-10-31'       -- 条件1
     AND artical_id = 0											     -- 条件2
     AND sign_in = 1												 -- 条件3
     ) t1
  1. 最后根据连续签到天数计算金币数
WITH t2 AS
SELECT(SELECT uid,dt,
           MOD(ROW_NUMBER() OVER(PARTITION BY uid, start_day ORDER BY dt),7) days  -- 日期对应的连续签到天数
       FROM(SELECT uid, DATE(in_time) dt,
            DATE_SUB(DATE(in_time), INTERVAL ROW_NUMBER() OVER(PARTITION BY uid ORDER BY DATE(in_time)) DAY) 
            AS start_day,                            -- 连续签到 = (时间 - 排序号) is the same 
         FROM tb_user_log
         WHERE DATE(in_time) BETWEEN '2021-07-07' AND '2021-10-31'       -- 条件1
         AND artical_id = 0											     -- 条件2
         AND sign_in = 1	 											 -- 条件3
         ) t1
     )

SELECT uid,DATE_FORMAT(dt, '%Y%m') month,
	   SUM(CASE WHEN days = 0 THEN 7
           		WHEN days = 3 THEN 3
          		ELSE 1 END)
       AS coin
FROM t2
GROUP BY uid, month
ORDER BY month, uid

这里,按照连续签到天数7的余数mod()来处理即可

  • 当签到天数%7=3 则领取3金币
  • 当签到天数%7=0 则领取7金币
  • 其余情况,领取1金币
  • 1
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值