【数据库】窗口函数实战（二）

G皮T

已于 2023-10-06 20:20:41 修改

阅读量291

点赞数 4

分类专栏： # 数据库文章标签：数据库 sql mysql 窗口函数 SQL编程牛客面试

于 2022-06-12 21:26:55 首次发布

本文链接：https://blog.csdn.net/Be_racle/article/details/125249085

版权

数据库专栏收录该内容

18 篇文章 10 订阅

订阅专栏

本系列包含：

窗口函数实战（二）

本篇文章是笔者在牛客网上摘选的几道比较有挑战性的SQL窗口函数编程题。

1.每月及截止当月的答题情况（原题链接）

我们首先来分析一下题目的要求：

用户有作答记录，所以 submit_time 即使为空也是满足要求的。
月活跃用户数。这个比较好实现，注意去重。
月新增用户数。这个要求是这一题的难点。我们可以生成一列，用于记录该用户是否为新增用户，其判断条件就是对于某用户的某条试卷作答记录，该用户是否有更早的作答时间。
截止当月的单月最大新增用户数、截止当月的累积用户数。这两个都可以根据月新增用户数得到，比较容易实现。

下面提供一种解法。

select start_month, count(distinct uid) as mau,
       sum(new_customer) as month_add_uv,
       max(sum(new_customer)) over (order by start_month) as max_month_add_uv,
       sum(sum(new_customer)) over (order by start_month) as cum_sum_uv
from (
    select uid, date_format(start_time,'%Y%m') as start_month,
           if(start_time = min(start_time) over (partition by uid), 1, 0) as new_customer
    from exam_record
) a
group by start_month
order by start_month

2.每份试卷每月作答数和截止当月的作答总数（原题链接）

这一题只给了一个表，问法也比较简单：每份试卷每月作答数和截止当月的作答总数。

有作答记录的月份，start_time 不为空即可。
每月的试卷作答数。
截至当月的作答总数。

以下是我第一次解答时的做法。

with b as (
    select exam_id, start_month,
           count(start_month) over (partition by exam_id, start_month order by exam_id, start_month) as count_,
           row_number() over (partition by exam_id, start_month) as row_number_
    from (
        select exam_id, date_format(start_time,'%Y%m') as start_month
        from exam_record
    ) a
)
select exam_id, start_month, count_,
       sum(count_) over (partition by exam_id order by exam_id, start_month)
from b
where row_number_ = 1

思考了一下，觉得还是比较繁（因为包含了一个删除重复行的操作），应该在一开始就使用 group by，改写后如下。

select exam_id, 
       date_format(start_time,'%Y%m') as start_month,
       count(start_time) as month_cnt,
       sum(count(start_time)) over (partition by exam_id order by exam_id, date_format(start_time,'%Y%m'))
from exam_record
group by exam_id, date_format(start_time,'%Y%m')

3.对试卷得分做min-max归一化（原题链接）

这一道题的查询条件比较多，我们一起来分析一下。

试卷的难度为 difficult。
将 score 标准化， $\frac{x-min(x_i)}{max(x_i)-min(x_i)}$ ，所以 score 不能为空。
标准化的条件是：在每份试卷作答记录内进行标准化，说明要对 exam_id 分组。
如果某个试卷作答记录中只有一个得分，那么无需使用公式，归一化并缩放后分数仍为原分数。
输出用户ID、试卷ID、归一化后分数平均值（某用户可能多次作答同一份试卷，计算的是这个平均值）。

下面给出一种解法。

select uid, exam_id, round(avg(standard_score)) as avg_new_score
from(
    select *, ( case when max_score != min_score then (score - min_score)/(max_score - min_score) * 100
                else score end ) as standard_score
    from (
        select uid, exam_id, score,
               max(score) over (partition by exam_id) as max_score,
               min(score) over (partition by exam_id) as min_score
        from exam_record
        left outer join examination_info using(exam_id)
        where difficulty = 'hard' and score is not null
    ) a
) b
group by uid, exam_id
order by exam_id, avg_new_score desc

判断是否 “某个试卷作答记录中只有一个得分” ，除了可以用 case when，还可以像下面这么写。

if( max_score = min_score, score, (score - min_score)/(max_score - min_score) * 100 ) as standard_score

当我们练过一些牛客上的 SQL 题会发现，即使是困难的题目，只要充分理解题意，罗列清楚查询条件，通过窗口函数、多层嵌套、表连接、条件语句等知识组合，我们也能很顺利地取到想要的数。

除了准，还要快，所以当我们能确保取数的正确性后，就要进一步考虑如何能更快速地取数，也就是如何优化你的 SQL。道阻且长，与君共勉！

G皮T

关注

4
点赞
踩
8

收藏

觉得还不错? 一键收藏
打赏
1
评论
【数据库】窗口函数实战（二）

本篇文章是笔者在牛客网上摘选的几道比较有挑战性的SQL窗口函数编程题。
复制链接

扫一扫

专栏目录