题目描述
问题:
统计2021年国庆头3天每类视频每天的近一周总点赞量和一周内最大单天转发量,结果按视频类别降序、日期升序排序。假设数据库中数据足够多,至少每个类别下国庆头3天及之前一周的每天都有播放记录。
国庆头3天(10.01 ~ 10.03)里10.01的近7天(9.25 ~ 10.01)总点赞量为5次,单天最大转发量为2次(9月25那天最大);同理可得10.02和10.03的两个指标。
用户-视频互动表tb_user_video_log:
id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
---|---|---|---|---|---|---|---|---|
1 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 0 | 1 | 1 | NULL |
2 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0 | 0 | 1 | NULL |
3 | 103 | 2001 | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 0 | 1 | 0 | 1732526 |
4 | 102 | 2002 | 2021-09-10 11:00:00 | 2021-09-10 11:00:30 | 1 | 0 | 1 | NULL |
5 | 103 | 2002 | 2021-10-01 10:59:05 | 2021-10-01 11:00:05 | 1 | 0 | 0 | NULL |
6 | 102 | 2002 | 2021-09-28 11:00:00 | 2021-09-28 11:00:30 | 1 | 0 | 1 | NULL |
7 | 103 | 2002 | 2021-09-29 11:00:00 | 2021-10-02 11:00:30 | 1 | 0 | 1 | NULL |
8 | 102 | 2002 | 2021-09-30 11:00:00 | 2021-09-30 11:00:30 | 1 | 1 | 1 | NULL |
9 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 1 | 1 | 0 | NULL |
10 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0 | 0 | 1 | NULL |
11 | 103 | 2001 | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 1 | 1 | 0 | 1732526 |
12 | 106 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 2 | 0 | 1 | NULL |
13 | 107 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 0 | 1 | NULL |
14 | 108 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 1 | 1 | NULL |
15 | 109 | 2002 | 2021-10-03 10:59:05 | 2021-10-03 11:00:05 | 0 | 1 | 0 | NULL |
- uid:用户ID
- video_id:视频ID
- start_time:开始观看时间
- end_time:结束观看时间
- if_follow:是否关注
- if_like:是否点赞
- if_retweet:是否转发
- comment_id:评论ID
短视频信息表tb_video_info:
id | video_id | author | tag | duration | release_time |
---|---|---|---|---|---|
1 | 2001 | 901 | 影视 | 30 | 2021-01-01 07:00:00 |
2 | 2002 | 901 | 美食 | 60 | 2021-01-01 07:00:00 |
3 | 2003 | 902 | 旅游 | 90 | 2020-01-01 07:00:00 |
- video_id:视频ID
- author:创作者ID
- tag:类别标签
- duration:视频时长
- release_time:发布时间
最终输出样式:
tag | dt | sum_like_cnt_7d | max_retweet_cnt_7d |
---|---|---|---|
旅游 | 2021-10-01 | 5 | 2 |
旅游 | 2021-10-02 | 5 | 3 |
旅游 | 2021-10-03 | 6 | 3 |
题目分析
题目理解:
- 统计纬度:2021年国庆头3天每类视频每天的数据
- 2021年内:筛选start_time 在2021年内;
- 国庆头3天:筛选start_time 在 [ 2021-10-01,2021-10-03];
- 每类视频每天:按视频类别和日期分组; - 统计指标:近一周总点赞量和一周内最大单天转发量
- 近一周总点赞量:统计日+前6天的点赞量
- 一周内最大单天转发量:统计日+前6天的最大转发量 - 排序格式:
- 按视频类别降序、日期升序排序
实现方式:
- 统计纬度:
- 用YEAR()函数获取日期中年份作为筛选条件:WHERE YEAR(start_time) = 2021
- 用BETWEEN…AND…筛选国庆头3天的数据:BETWEEN 2021-10-01 AND 2021-10-03
- 用GROUP BY 分组视频类别和日期:GROUP BY tap,start_time
- 统计指标:
- 近一周总点赞量:滑动窗口函数,按视频类别分类,日期排序滑动统计前6天+统计日的点赞量求和:SUM (like_cnt ) OVER ( PARTITION BY tag ORDER BY date1 ROWS BETWEEN 6 PRECEDING AND CURRENT ROW )
- 一周内最大单天转发量:滑动窗口函数,按视频类别分类,日期排序滑动统计前6天+统计日的最大转发量:MAX (retweet_cnt ) OVER ( PARTITION BY tag ORDER BY date1 ROWS BETWEEN 6 PRECEDING AND CURRENT ROW )
- 排序格式:
-ORDER BY tag DESC, dt ASC
SQL实现
SELECT tag,dt,sum_like_cnt_7d,max_retweet_cnt_7d
FROM
(SELECT tag,date_start AS dt,
SUM(like_cnt) OVER(PARTITION BY tag ORDER BY date_start ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) sum_like_cnt_7d,
MAX(retweet1) OVER(PARTITION BY tag ORDER BY date_start ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) max_retweet_cnt_7d
FROM(
SELECT t2.tag,
DATE_FORMAT(t1.start_time,'%Y-%m-%d') AS date_start,
SUM(if_like) AS like_cnt,
SUM(a.if_retweet) AS retweet_cnt
FROM tb_user_video_log t1
LEFT JOIN tb_video_info t2
USING(video_id)
WHERE YEAR(t1.start_time)=2021
GROUP BY t2.tag,date_start) table1
GROUP BY tag,date1) table2
WHERE dt BETWEEN '2021-10-01' AND '2021-10-03'
GROUP BY tag,dt
ORDER BY tag DESC,dt ASC