补上「窗口函数」的知识空缺,整理「哔哩哔哩一面」的SQL操作。
数据是自己生成的,代码是自己码的。如有错误,请评论指正,感谢!
❤️「完整真题」
哔哩哔哩2020届秋招数据分析师面试第一轮(2019.8.8)
一、第一题
已知日志表log,有3个字段:Date、user_ID(ID都是唯一的)、Flag(会员与否,0/1变量)。
问:统计每天新增会员数量。
- 「思路」
- 先过滤,只考虑是会员的记录;
- 每日新增会员,是指当日的该会员不在以往出现过。所以,用NOT IN排除以往的user_id。
- SQL实现
step1:
CREATE VIEW flag1 AS
SELECT * FROM log
WHERE flag = 1;
step2 :
SELECT t1.date, COUNT(user_id)
FROM flag1 AS t1
WHERE t1.user_id NOT IN
(SELECT t2.user_id FROM flag1 AS t2
WHERE t2.date < t1.date)
GROUP BY t1.date;
- 完整SQL代码
CREATE VIEW flag1 AS
SELECT * FROM log
WHERE flag = 1;
SELECT t1.date, COUNT(user_id)
FROM flag1 AS t1
WHERE t1.user_id NOT IN
(SELECT t2.user_id FROM flag1 AS t2
WHERE t2.date < t1.date)
GROUP BY t1.date;
发散思维
- 如果要每日新增会员表呢?
SELECT t1.date, user_id
FROM flag1 AS t1
WHERE t1.user_id NOT IN
(SELECT t2.user_id FROM flag1 AS t2
WHERE t2.date < t1.date);
- 如果要把每日新增会员名字和数量合并在一张表呢?
SELECT t1.date, user_id,
COUNT(user_id) OVER (PARTITION BY t1.date) AS cum_count
FROM flag1 AS t1
WHERE t1.user_id NOT IN
(SELECT t2.user_id FROM flag1 AS t2
WHERE t2.date < t1.date);
二、第二题
已知节目表,3个字段:节目id,节目分类,播放量。
输出:每个分类的播放量前3的节目。
- SQL 实现
SELECT category, show_id, video_view FROM
(SELECT *,
rank() OVER (PARTITION BY category ORDER BY video_view DESC) AS t
FROM show_table) new_table
WHERE new_table.t <= 3;