SQL Practice Problems (1)

A certain short-video app

Description

User-video interaction table tb_user_video_log

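| id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
|---|---|---|---|---|---|---|---|---|
| 1 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 0 | 1 | 1 | NULL |
| 2 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0 | 0 | 1 | NULL |
| 3 | 103 | 2001 | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 0 | 1 | 0 | 1732526 |
| 4 | 102 | 2002 | 2021-09-10 11:00:00 | 2021-09-10 11:00:30 | 1 | 0 | 1 | NULL |
| 5 | 103 | 2002 | 2021-10-01 10:59:05 | 2021-10-01 11:00:05 | 1 | 0 | 0 | NULL |

(uid: user ID, video_id: video ID, start_time: time viewing started, end_time: time viewing ended, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user retweeted, comment_id: comment ID)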

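Short-video information table tb_video_info

| id | video_id | author | tag | duration | release_time |
|---|---|---|---|---|---|
| 1 | 2001 | 901 | 影视 | 30 | 2021-01-01 07:00:00 |
| 2 | 2002 | 901 | 美食 | 60 | 2021-01-01 07:00:00 |
| 3 | 2003 | 902 | 旅游 | 90 | 2020-01-01 07:00:00 |

(video_id: video ID, author: creator ID, tag: category tag, duration: video length in seconds, release_time: release time. Tag values are kept as they appear in the data: 影视 = film/TV, 美食 = food, 旅游 = travel.)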

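SQL 156 Average completion rate of each video

Problem: for every video that has play records in 2021, compute the completion rate (rounded to three decimal places) and sort by completion rate in descending order.

Note: the completion rate is the share of completed plays among all plays. For simplicity, a play counts as completed when end_time minus start_time is >= the video duration.

Output example

The result for the sample data is:

| video_id | avg_comp_play_rate |
|---|---|
| 2001 | 0.667 |
| 2002 | 0.000 |

Explanation:

Video 2001 has 3 play records in October 2021, with watch times of 30 s, 24 s and 34 s. The video lasts 30 s, so two of the plays count as completed and the completion rate is 0.667.

Video 2002 has 2 play records in September and October 2021, with watch times of 42 s and 30 s. The video lasts 60 s, so the completion rate is 0.000.

Sample data
Input: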
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
  (101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:30', 0, 1, 1, null),
  (102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:24', 0, 0, 1, null),
  (103, 2001, '2021-10-01 11:00:00', '2021-10-01 11:00:34', 0, 1, 0, 1732526),
  (101, 2002, '2021-09-01 10:00:00', '2021-09-01 10:00:42', 1, 0, 1, null),
  (102, 2002, '2021-10-01 11:00:00', '2021-10-01 11:00:30', 1, 0, 1, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
  (2001, 901, '影视', 30, '2021-01-01 7:00:00'),
  (2002, 901, '美食', 60, '2021-01-01 7:00:00'),
  (2003, 902, '旅游', 90, '2021-01-01 7:00:00');
Output:
2001|0.667
2002|0.000
Solution
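select a.video_id,
       -- share of plays whose watch time (in seconds) reaches the video duration
       round(sum(if(timestampdiff(second, start_time, end_time) >= duration, 1, 0)) / count(*), 3) as avg_comp_play_rate
    from tb_user_video_log a
    left join tb_video_info tvi on a.video_id = tvi.video_id
    -- filter in WHERE: inside the ON clause of a LEFT JOIN this condition would not drop any log rows
    where year(start_time) = 2021
    group by a.video_id
    order by avg_comp_play_rate desc;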
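Key points

if(condition, 1, 0) turns a condition into a flag that can be summed. Subtracting two DATETIME values directly does produce a number, but MySQL first converts each value to the form YYYYMMDDHHMMSS, so the difference equals the elapsed seconds only when both timestamps fall within the same minute. TIMESTAMPDIFF(SECOND, start_time, end_time) returns the elapsed seconds reliably.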
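
A small sketch of the pitfall, assuming MySQL as in the rest of these notes. The two timestamps come from the sample table and are 25 seconds apart, but they straddle a minute boundary; the CAST calls are only there so that the string literals behave like DATETIME columns.

```sql
-- Direct subtraction works on the numeric forms 20211001110115 and 20211001110050.
SELECT CAST('2021-10-01 11:01:15' AS DATETIME)
     - CAST('2021-10-01 11:00:50' AS DATETIME) AS raw_diff,       -- 65, not the elapsed seconds
       TIMESTAMPDIFF(SECOND,
                     '2021-10-01 11:00:50',
                     '2021-10-01 11:01:15')     AS seconds_diff;  -- 25
```

With a 30-second video, the first expression would wrongly mark this play as completed.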

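SQL 157 Video categories with an average play progress above 60%

Problem: compute the average play progress of each video category and output the categories whose progress is above 60%.

  • Play progress = watch time ÷ video duration * 100%; when the watch time exceeds the video duration, the play progress is recorded as 100%.
  • Round the result to two decimal places and sort by play progress in descending order.

Output example

The output for the sample data is:

| tag | avg_play_progress |
|---|---|
| 影视 | 90.00% |
| 美食 | 75.00% |

Explanation:

The film/TV (影视) video 2001 was watched by users 101, 102 and 103, with play progress of 30 s (100%), 21 s (70%) and 30 s (100%), so the average play progress is 90.00% (two decimal places).

The food (美食) video 2002 was watched by users 102 and 103, with play progress of 30 s (50%) and 60 s (100%), so the average play progress is 75.00% (two decimal places).

Sample data
Input: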
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
  (101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:30', 0, 1, 1, null),
  (102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:21', 0, 0, 1, null),
  (103, 2001, '2021-10-01 11:00:50', '2021-10-01 11:01:20', 0, 1, 0, 1732526),
  (102, 2002, '2021-10-01 11:00:00', '2021-10-01 11:00:30', 1, 0, 1, null),
  (103, 2002, '2021-10-01 10:59:05', '2021-10-01 11:00:05', 1, 0, 1, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
  (2001, 901, '影视', 30, '2021-01-01 7:00:00'),
  (2002, 901, '美食', 60, '2021-01-01 7:00:00'),
  (2003, 902, '旅游', 90, '2020-01-01 7:00:00');
Output:
影视|90.00%
美食|75.00%
Solution
select tag,
       -- cap each play at 100%, average per tag, then format as a percentage string
       concat(round((sum(case when TIMESTAMPDIFF(SECOND,start_time,end_time) >= duration then 1
           else TIMESTAMPDIFF(second ,start_time,end_time)/duration end)/count(*))*100,2),'%') as avg_play_progress
    from tb_user_video_log a
    left join tb_video_info b on a.video_id = b.video_id
    group by tag
    having SUBSTRING_INDEX(avg_play_progress,'%',1) > 60
    -- sort on the numeric value: as strings, '100.00%' would sort below '90.00%'
    order by SUBSTRING_INDEX(avg_play_progress,'%',1) + 0 desc ;
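Key points

TIMESTAMPDIFF function

TIMESTAMPDIFF(unit, begin, end), where begin and end are DATE or DATETIME expressions. The arguments may have mixed types: for example, begin can be a DATE value while end is a DATETIME value. A DATE value is treated as a DATETIME whose time part is "00:00:00".

The unit argument sets the unit of the result (end - begin), which is returned as an integer. The valid units are:

MICROSECOND  microseconds
SECOND  seconds
MINUTE  minutes
HOUR  hours
DAY  days
WEEK  weeks
MONTH  months
QUARTER  quarters
YEAR  years

Example

select TIMESTAMPDIFF(day ,'2020-01-02','2020-01-04') as result;

The result is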

2
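
A few more calls may help to show how the unit and the argument order affect the result. This is only an illustrative sketch in MySQL; the timestamps are taken from the sample data or made up for the purpose.

```sql
SELECT TIMESTAMPDIFF(SECOND, '2021-10-01 10:59:05', '2021-10-01 11:00:05') AS in_seconds,  -- 60
       TIMESTAMPDIFF(MINUTE, '2021-10-01 10:59:05', '2021-10-01 11:00:05') AS in_minutes,  -- 1
       TIMESTAMPDIFF(MINUTE, '2021-10-01 10:00:00', '2021-10-01 10:00:59') AS truncated,   -- 0: the partial minute is dropped
       TIMESTAMPDIFF(DAY,    '2020-01-04',          '2020-01-02')          AS negative;    -- -2: begin is later than end
```

Because the result is truncated to whole units, SECOND is the safe choice when comparing a watch time with a duration stored in seconds.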

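SUBSTRING_INDEX function

SUBSTRING_INDEX(str, delim, count)

str: the string to process; delim: the delimiter; count: how many delimiters to count. For a positive count, the function returns everything to the left of the count-th delimiter, counting from the left.

Example

select SUBSTRING_INDEX('2020-02-01','-',1) as result;
select SUBSTRING_INDEX('2020-02-01','-',2) as result;

The results are, respectively,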

2020  
2020-02
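
As a further sketch (MySQL assumed), a negative count takes the part to the right of the delimiter, and the value stripped of its '%' sign can be turned into a number with + 0. The last column shows why percentage strings should not be compared or sorted as text.

```sql
SELECT SUBSTRING_INDEX('2020-02-01', '-', -1) AS last_part,       -- '01'
       SUBSTRING_INDEX('90.00%', '%', 1)      AS digits,          -- '90.00'
       SUBSTRING_INDEX('90.00%', '%', 1) + 0  AS as_number,       -- 90 as a number
       '100.00%' > '90.00%'                   AS string_compare;  -- 0: strings compare character by character
```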
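
SQL 158 Retweet count and retweet rate of each video category over the last month

Problem: within the most recent month that has user interactions (counted as the last 30 days including the current day; for example, the last 30 days for October 31 cover the data from 10.02 to 10.31), compute the retweet count and retweet rate (three decimal places) of each video category.

Note: retweet rate = retweet count ÷ play count. Sort the result by retweet rate in descending order.

Output example

The output for the sample data is:

| tag | retweet_cut | retweet_rate |
|---|---|---|
| 影视 | 2 | 0.667 |
| 美食 | 1 | 0.500 |

Explanation:

From the data in tb_user_video_log, the day of the data dump is October 1, 2021. Within the last 30 days, the film/TV (影视) video 2001 has 3 play records and was retweeted 2 times, giving a retweet rate of 0.667; the food (美食) video 2002 has 2 play records and was retweeted once, giving a retweet rate of 0.500.

Sample data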
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
   (101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:20', 0, 1, 1, null)
  ,(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:15', 0, 0, 1, null)
  ,(103, 2001, '2021-10-01 11:00:50', '2021-10-01 11:01:15', 0, 1, 0, 1732526)
  ,(102, 2002, '2021-09-10 11:00:00', '2021-09-10 11:00:30', 1, 0, 1, null)
  ,(103, 2002, '2021-10-01 10:59:05', '2021-10-01 11:00:05', 1, 0, 0, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
   (2001, 901, '影视', 30, '2021-01-01 7:00:00')
  ,(2002, 901, '美食', 60, '2021-01-01 7:00:00')
  ,(2003, 902, '旅游', 90, '2020-01-01 7:00:00');
Output:
影视|2|0.667
美食|1|0.500
Solution
select tag,
       sum(if(if_retweet =1,1,0)) as retweet_cut,
        round(sum(if(if_retweet =1,1,0)) /count(if_retweet),3) as retweet_rate
    from tb_user_video_log a
    left join tb_video_info b on a.video_id = b.video_id
    where datediff(date((select max(start_time) from tb_user_video_log)),date(start_time)) <= 29
    group by tag
    order by retweet_rate desc ;
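Key points

datediff function

The problem restricts the calculation to the most recent month with user interactions, so the retweet count and retweet rate must be computed under a date filter.

The datediff function returns the number of days between two dates (date1 minus date2); only the date parts are used.

datediff(date1,date2)

SELECT DATEDIFF('2010-6-30','2010-6-26') AS DiffDate;

The result is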

4
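
The 30-day window from the problem statement can be checked with the same function. The sketch below (MySQL assumed) uses the October 31 example from the problem text: a row belongs to the window when the difference is at most 29.

```sql
SELECT DATEDIFF('2021-10-31', '2021-10-02')                   AS first_day_in_window,  -- 29: still inside
       DATEDIFF('2021-10-31', '2021-10-01')                   AS day_before_window,    -- 30: outside
       DATEDIFF('2021-10-01 23:59:59', '2021-09-30 00:00:01') AS time_ignored;         -- 1: time parts are not used
```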
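
SQL 159 Monthly fan growth rate and running total of fans for each creator

User-video interaction table tb_user_video_log

| id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
|---|---|---|---|---|---|---|---|---|
| 1 | 101 | 2001 | 2021-09-01 10:00:00 | 2021-09-01 10:00:20 | 0 | 1 | 1 | NULL |
| 2 | 105 | 2002 | 2021-09-10 11:00:00 | 2021-09-10 11:00:30 | 1 | 0 | 1 | NULL |
| 3 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 1 | 1 | 1 | NULL |
| 4 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0 | 0 | 1 | NULL |
| 5 | 103 | 2001 | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 1 | 1 | 0 | 1732526 |
| 6 | 106 | 2002 | 2021-10-01 10:59:05 | 2021-10-01 11:00:05 | 2 | 0 | 0 | NULL |

(uid: user ID, video_id: video ID, start_time: time viewing started, end_time: time viewing ended, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user retweeted, comment_id: comment ID)

Short-video information table tb_video_info

| id | video_id | author | tag | duration | release_time |
|---|---|---|---|---|---|
| 1 | 2001 | 901 | 影视 | 30 | 2021-01-01 07:00:00 |
| 2 | 2002 | 901 | 美食 | 60 | 2021-01-01 07:00:00 |
| 3 | 2003 | 902 | 旅游 | 90 | 2020-01-01 07:00:00 |
| 4 | 2004 | 902 | 美女 | 90 | 2020-01-01 08:00:00 |

(video_id: video ID, author: creator ID, tag: category tag, duration: video length in seconds, release_time: release time)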

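Problem: for 2021, compute each creator's monthly fan growth rate and the total number of fans up to that month.

  • Fan growth rate = (fans gained - fans lost) / play count. Sort the result by creator ID and total fans in ascending order.
  • if_follow = 1 means the user followed the creator while watching, 0 means the follow status did not change during this interaction, and 2 means the user unfollowed during this viewing.

Output example

The output for the sample data is:

| author | month | fans_growth_rate | total_fans |
|---|---|---|---|
| 901 | 2021-09 | 0.500 | 1 |
| 901 | 2021-10 | 0.250 | 2 |

Explanation:

In the sample data, tb_user_video_log only contains plays of videos 2001 and 2002, both by creator 901, in September and October 2021. In September, 1 fan was gained, 0 were lost and there were 2 plays, so the growth rate is 0.500 (three decimal places). In October, 2 fans were gained, 1 was lost and there were 4 plays, so the growth rate is 0.250, and the total number of fans so far is 2.

Sample data
Input: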
DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
   (101, 2001, '2021-09-01 10:00:00', '2021-09-01 10:00:20', 0, 1, 1, null)
  ,(105, 2002, '2021-09-10 11:00:00', '2021-09-10 11:00:30', 1, 0, 1, null)
  ,(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:20', 1, 1, 1, null)
  ,(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:15', 0, 0, 1, null)
  ,(103, 2001, '2021-10-01 11:00:50', '2021-10-01 11:01:15', 1, 1, 0, 1732526)
  ,(106, 2002, '2021-10-01 10:59:05', '2021-10-01 11:00:05', 2, 0, 0, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
   (2001, 901, '影视', 30, '2021-01-01 7:00:00')
  ,(2002, 901, '影视', 60, '2021-01-01 7:00:00')
  ,(2003, 902, '旅游', 90, '2020-01-01 7:00:00')
  ,(2004, 902, '美女', 90, '2020-01-01 8:00:00');
Output:
901|2021-09|0.500|1
901|2021-10|0.250|2
Solution 1
select author,substring(start_time,1,7) month,
        -- net fan change (+1 for a follow, -1 for an unfollow) divided by the play count
        round(sum(case when if_follow =1 then 1
                 when if_follow =2 then -1
                 else 0 end ) /count(t1.author),3) as fans_growth_rate,
        -- running total of the monthly net change, per author
        sum(sum(case when if_follow =1 then 1
                     when if_follow =2 then -1
                     else 0 end )) over (partition by author order by substring(start_time,1,7)) total_fans
from tb_video_info t1
left join tb_user_video_log t2 on t1.video_id = t2.video_id
where substring(start_time,1,4) = 2021
and substring(end_time,1,4) = 2021
group by author,month
order by author,total_fans;
Solution 2
SELECT B.AUTHOR AS AUTHOR,DATE_FORMAT(A.start_time,'%Y-%m') AS MONTH ,
-- COUNT ignores the NULL that CASE yields when no branch matches
ROUND((COUNT(CASE WHEN A.if_follow=1 THEN 1 END ) - COUNT(CASE WHEN A.if_follow=2 THEN 1 END ))/COUNT(1) ,3) AS FANS_GROWTH_RATE,
sum(sum(case when A.if_follow = 1 then 1
         when A.if_follow = 2 then -1
         else 0 end) ) over (partition by B.author order by date_format(A.start_time,'%Y-%m')) total_fans
FROM tb_user_video_log A
LEFT JOIN tb_video_info B
ON A.VIDEO_ID=B.video_id
WHERE year(A.start_time)=2021
and year(A.end_time)=2021
GROUP BY B.AUTHOR,DATE_FORMAT(A.start_time,'%Y-%m')
ORDER BY AUTHOR,total_fans;
Key points

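date_format(start_time, '%Y-%m')

select date_format('2021-09-01 10:00:00','%Y-%m');
The result is:
2021-09
select date_format('2021-09-01 10:00:00','%Y-%m-%d %H:%i:%s');
The result is:
2021-09-01 10:00:00

substring(str, pos, len) returns len characters of str starting at position pos (counted from 1), so substring(start_time,1,7) gives '2021-09'.

case when syntax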

case when xxx then xx
	 when xxx then xxx
	 else xx end
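
Applied to this problem, the syntax maps if_follow to a fan change. The sketch below assumes MySQL; the three-row derived table f is made up for illustration and simply lists the three possible values.

```sql
SELECT if_follow,
       CASE WHEN if_follow = 1 THEN 1    -- followed during this play
            WHEN if_follow = 2 THEN -1   -- unfollowed during this play
            ELSE 0 END AS fan_delta      -- no change
FROM (SELECT 0 AS if_follow UNION ALL SELECT 1 UNION ALL SELECT 2) AS f;
-- 0 -> 0, 1 -> 1, 2 -> -1
```

Summing fan_delta over a month gives fans gained minus fans lost, which is the numerator of the growth rate.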

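over (partition by)

When an aggregate runs over a partition that also has an order by, be careful: it accumulates row by row. Without an explicit frame, each row sees the rows from the start of its partition up to the current row, so the result is a running total.

An aggregate applied to a group by result, in contrast, covers all the records of each group.

The problem asks for the total number of fans up to October, which includes the fans gained in September. With group by alone we must group by month, so we can only get the fan change within each month. If we open a window with partition by author and accumulate over the months, this problem does not arise.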
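
The contrast can be sketched on a tiny data set. The example assumes MySQL 8.0 or later (for the WITH clause and window functions); the CTE plays and its column ym are hypothetical stand-ins for the joined tables, holding one fan change per play of author 901 from the sample data.

```sql
WITH plays AS (
    SELECT '2021-09' AS ym,  0 AS delta UNION ALL
    SELECT '2021-09',        1          UNION ALL
    SELECT '2021-10',        1          UNION ALL
    SELECT '2021-10',        0          UNION ALL
    SELECT '2021-10',        1          UNION ALL
    SELECT '2021-10',       -1
)
SELECT ym,
       SUM(delta)                         AS monthly_change,  -- GROUP BY: one value per month
       SUM(SUM(delta)) OVER (ORDER BY ym) AS total_fans       -- window: running total over the months
FROM plays
GROUP BY ym
ORDER BY ym;
-- 2021-09 | 1 | 1
-- 2021-10 | 1 | 2
```

The inner SUM is evaluated per group first; the window SUM then adds up those per-month results in month order.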

The difference between group by and partition by was shown here in two screenshots (605.png and 606.png), which are not available.

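SQL 160 Likes and retweets of each video category during the National Day holiday

Description

User-video interaction table tb_user_video_log

| id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
|---|---|---|---|---|---|---|---|---|
| 1 | 101 | 2001 | 2021-09-24 10:00:00 | 2021-09-24 10:00:20 | 1 | 1 | 0 | NULL |
| 2 | 105 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 0 | 0 | 1 | NULL |
| 3 | 102 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 1 | 1 | 1 | NULL |
| 4 | 101 | 2002 | 2021-09-26 11:00:00 | 2021-09-26 11:00:30 | 1 | 0 | 1 | NULL |
| 5 | 101 | 2002 | 2021-09-27 11:00:00 | 2021-09-27 11:00:30 | 1 | 1 | 0 | NULL |
| 6 | 102 | 2002 | 2021-09-28 11:00:00 | 2021-09-28 11:00:30 | 1 | 0 | 1 | NULL |
| 7 | 103 | 2002 | 2021-09-29 11:00:00 | 2021-10-02 11:00:30 | 1 | 0 | 1 | NULL |
| 8 | 102 | 2002 | 2021-09-30 11:00:00 | 2021-09-30 11:00:30 | 1 | 1 | 1 | NULL |
| 9 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:20 | 1 | 1 | 0 | NULL |
| 10 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:15 | 0 | 0 | 1 | NULL |
| 11 | 103 | 2001 | 2021-10-01 11:00:50 | 2021-10-01 11:01:15 | 1 | 1 | 0 | 1732526 |
| 12 | 106 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 2 | 0 | 1 | NULL |
| 13 | 107 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 0 | 1 | NULL |
| 14 | 108 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 1 | 1 | NULL |
| 15 | 109 | 2002 | 2021-10-03 10:59:05 | 2021-10-03 11:00:05 | 0 | 1 | 0 | NULL |

(uid: user ID, video_id: video ID, start_time: time viewing started, end_time: time viewing ended, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user retweeted, comment_id: comment ID)

Short-video information table tb_video_info

| id | video_id | author | tag | duration | release_time |
|---|---|---|---|---|---|
| 1 | 2001 | 901 | 旅游 | 30 | 2020-01-01 07:00:00 |
| 2 | 2002 | 901 | 旅游 | 60 | 2021-01-01 07:00:00 |
| 3 | 2003 | 902 | 影视 | 90 | 2020-01-01 07:00:00 |
| 4 | 2004 | 902 | 美女 | 90 | 2020-01-01 08:00:00 |

(video_id: video ID, author: creator ID, tag: category tag, duration: video length in seconds, release_time: release time)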

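Problem: for each of the first 3 days of the 2021 National Day holiday, compute for every video category the total number of likes over the past week and the largest single-day retweet count within that week. Sort the result by category in descending order, then by date in ascending order. Assume the database holds enough data: every category has play records on each of the first 3 holiday days and on each day of the week before.

Output example

The output for the sample data is:

| tag | dt | sum_like_cnt_7d | max_retweet_cnt_7d |
|---|---|---|---|
| 旅游 | 2021-10-01 | 5 | 2 |
| 旅游 | 2021-10-02 | 5 | 3 |
| 旅游 | 2021-10-03 | 6 | 3 |

Explanation:

According to tb_user_video_log, only travel (旅游) videos were played. The daily like and retweet counts from September 25 to October 3, 2021 are:

| tag | dt | like_cnt | retweet_cnt |
|---|---|---|---|
| 旅游 | 2021-09-25 | 1 | 2 |
| 旅游 | 2021-09-26 | 0 | 1 |
| 旅游 | 2021-09-27 | 1 | 0 |
| 旅游 | 2021-09-28 | 0 | 1 |
| 旅游 | 2021-09-29 | 0 | 1 |
| 旅游 | 2021-09-30 | 1 | 1 |
| 旅游 | 2021-10-01 | 2 | 1 |
| 旅游 | 2021-10-02 | 1 | 3 |
| 旅游 | 2021-10-03 | 1 | 0 |

So, within the first 3 days of the holiday (10.01~10.03), the past 7 days for 10.01 (9.25~10.01) contain 5 likes in total, and the largest single-day retweet count is 2 (on September 25). The two metrics for 10.02 and 10.03 are obtained in the same way.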
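
The roll-up described in the explanation can be sketched directly on the daily figures. The example assumes MySQL 8.0 or later; the CTE daily is a hand-typed copy of the daily table from the explanation, not a query against the real tables.

```sql
WITH daily AS (
    SELECT '2021-09-25' AS dt, 1 AS like_cnt, 2 AS retweet_cnt UNION ALL
    SELECT '2021-09-26', 0, 1 UNION ALL
    SELECT '2021-09-27', 1, 0 UNION ALL
    SELECT '2021-09-28', 0, 1 UNION ALL
    SELECT '2021-09-29', 0, 1 UNION ALL
    SELECT '2021-09-30', 1, 1 UNION ALL
    SELECT '2021-10-01', 2, 1 UNION ALL
    SELECT '2021-10-02', 1, 3 UNION ALL
    SELECT '2021-10-03', 1, 0
)
SELECT dt,
       SUM(like_cnt)    OVER w AS sum_like_cnt_7d,
       MAX(retweet_cnt) OVER w AS max_retweet_cnt_7d
FROM daily
-- current row plus the 6 rows before it: 7 days when every day has exactly one row
WINDOW w AS (ORDER BY dt ROWS 6 PRECEDING)
ORDER BY dt;
-- last three rows: 2021-10-01 | 5 | 2, 2021-10-02 | 5 | 3, 2021-10-03 | 6 | 3
```

Rows before 2021-10-01 have fewer than 7 rows in their frame, which is why the real query filters the holiday dates only after the window has been computed.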

Sample data

Input:

DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
   (101, 2001, '2021-09-24 10:00:00', '2021-09-24 10:00:20', 1, 1, 0, null)
  ,(105, 2002, '2021-09-25 11:00:00', '2021-09-25 11:00:30', 0, 0, 1, null)
  ,(102, 2002, '2021-09-25 11:00:00', '2021-09-25 11:00:30', 1, 1, 1, null)
  ,(101, 2002, '2021-09-26 11:00:00', '2021-09-26 11:00:30', 1, 0, 1, null)
  ,(101, 2002, '2021-09-27 11:00:00', '2021-09-27 11:00:30', 1, 1, 0, null)
  ,(102, 2002, '2021-09-28 11:00:00', '2021-09-28 11:00:30', 1, 0, 1, null)
  ,(103, 2002, '2021-09-29 11:00:00', '2021-09-29 11:00:30', 1, 0, 1, null)
  ,(102, 2002, '2021-09-30 11:00:00', '2021-09-30 11:00:30', 1, 1, 1, null)
  ,(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:20', 1, 1, 0, null)
  ,(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:15', 0, 0, 1, null)
  ,(103, 2001, '2021-10-01 11:00:50', '2021-10-01 11:01:15', 1, 1, 0, 1732526)
  ,(106, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 2, 0, 1, null)
  ,(107, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 1, 0, 1, null)
  ,(108, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 1, 1, 1, null)
  ,(109, 2002, '2021-10-03 10:59:05', '2021-10-03 11:00:05', 0, 1, 0, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
   (2001, 901, '旅游', 30, '2020-01-01 7:00:00')
  ,(2002, 901, '旅游', 60, '2021-01-01 7:00:00')
  ,(2003, 902, '影视', 90, '2020-01-01 7:00:00')
  ,(2004, 902, '美女', 90, '2020-01-01 8:00:00');

Output:

旅游|2021-10-01|5|2
旅游|2021-10-02|5|3
旅游|2021-10-03|6|3
Solution 1
select tag, dt, sum_like_cnt_7d, max_retweet_cnt_7d
    from (
             select tag,
                    dt,
                    SUM(like_cnt) OVER w    sum_like_cnt_7d,
                    MAX(retweet_cnt) OVER w max_retweet_cnt_7d
             from (
                      -- daily like and retweet totals per tag
                      select tag,
                             date_format(start_time, '%Y-%m-%d') dt,
                             sum(if_like)                        like_cnt,
                             sum(if_retweet)                     retweet_cnt
                      from tb_video_info t1
                               join tb_user_video_log t2 on t1.video_id = t2.video_id
                      where date_format(start_time, '%Y-%m-%d') between '2021-09-25' and '2021-10-03'
                      group by tag, date_format(start_time, '%Y-%m-%d')
                  ) a1
                 -- sorted by date descending, the current row and the 6 following rows are the last 7 days
                 WINDOW w AS (PARTITION BY tag ORDER BY dt DESC ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING)
         )a2
where dt between '2021-10-01' and '2021-10-03'
order by tag desc,dt asc ;
Solution 2
select t2.*
from (select t1.tag,t1.d as dt
      ,sum(t1.if_like_sum)over(partition by t1.tag order by t1.d rows 6 preceding) as sum_like_cnt_7d
      ,max(t1.if_retweet_sum)over(partition by t1.tag order by t1.d rows 6 preceding) as max_retweet_cnt_7d
      from (select tag,date(start_time) d
            ,sum(if_like) if_like_sum
            ,sum(if_retweet) if_retweet_sum
            from tb_user_video_log tvl
            join tb_video_info tvi on tvl.video_id=tvi.video_id
            group by tag,d) as t1
     ) as t2
where t2.dt between '2021-10-01' and '2021-10-03'
order by t2.tag desc,t2.dt;
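Key points

1. partition by xx order by xxx rows 6 preceding

Partition by xx and sort by xxx. preceding means the frame reaches back 6 rows from the current row in the order by order. The frame counts rows, not calendar days, so it covers one week of likes only because every day has exactly one row here.

2. window w as (same as 1)

This defines a named window once. Wherever the window is needed, over w applies it, and any suitable aggregate function can be used with it.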

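3. DATE

select DATE('2020-11-11 11:12:13');
The result is 2020-11-11

It is similar to DATE_FORMAT(time, '%Y-%m-%d'): both extract the year, month and day from a full timestamp. DATE returns a DATE value, whereas DATE_FORMAT returns a string.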

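SQL 161 Top 3 hottest videos among those released in the last month

Description

User-video interaction table tb_user_video_log

| id | uid | video_id | start_time | end_time | if_follow | if_like | if_retweet | comment_id |
|---|---|---|---|---|---|---|---|---|
| 1 | 101 | 2001 | 2021-09-24 10:00:00 | 2021-09-24 10:00:30 | 1 | 1 | 1 | NULL |
| 2 | 101 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:31 | 1 | 1 | 0 | NULL |
| 3 | 102 | 2001 | 2021-10-01 10:00:00 | 2021-10-01 10:00:35 | 0 | 0 | 1 | NULL |
| 4 | 103 | 2001 | 2021-10-03 11:00:50 | 2021-10-03 10:00:35 | 1 | 1 | 0 | 1732526 |
| 5 | 106 | 2002 | 2021-10-02 11:00:05 | 2021-10-02 11:01:04 | 2 | 0 | 1 | NULL |
| 6 | 107 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:06 | 1 | 0 | 0 | NULL |
| 7 | 108 | 2002 | 2021-10-02 10:59:05 | 2021-10-02 11:00:05 | 1 | 1 | 1 | NULL |
| 8 | 109 | 2002 | 2021-10-03 10:59:05 | 2021-10-03 11:00:01 | 0 | 1 | 0 | NULL |
| 9 | 105 | 2002 | 2021-09-25 11:00:00 | 2021-09-25 11:00:30 | 1 | 0 | 1 | NULL |
| 10 | 101 | 2003 | 2021-09-26 11:00:00 | 2021-09-26 11:00:30 | 1 | 0 | 0 | NULL |
| 11 | 101 | 2003 | 2021-09-30 11:00:00 | 2021-09-30 11:00:30 | 1 | 1 | 0 | NULL |

(uid: user ID, video_id: video ID, start_time: time viewing started, end_time: time viewing ended, if_follow: whether the user followed, if_like: whether the user liked, if_retweet: whether the user retweeted, comment_id: comment ID)

Short-video information table tb_video_info

| id | video_id | author | tag | duration | release_time |
|---|---|---|---|---|---|
| 1 | 2001 | 901 | 旅游 | 30 | 2021-09-05 07:00:00 |
| 2 | 2002 | 901 | 旅游 | 60 | 2021-09-05 07:00:00 |
| 3 | 2003 | 902 | 影视 | 90 | 2021-09-05 07:00:00 |
| 4 | 2004 | 902 | 影视 | 90 | 2021-09-05 08:00:00 |

(video_id: video ID, author: creator ID, tag: category tag, duration: video length in seconds, release_time: release time)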

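Problem: find the top 3 hottest videos among those released in the last month.

  • Hot index = (a * completion rate + b * likes + c * comments + d * retweets) * freshness;
  • Freshness = 1 / (days since the last play + 1);
  • The parameters a, b, c and d are currently set to 100, 5, 3 and 2.
  • The most recent play date is taken from end_time (the time viewing ended). Calling it T, the last month is the closed interval [T-29, T].
  • Keep the hot index as an integer and sort by hot index in descending order.

Output example

The output for the sample data is:

| video_id | hot_index |
|---|---|
| 2001 | 122 |
| 2002 | 56 |
| 2003 | 1 |

Explanation:

The most recent play date is 2021-10-03, which is taken as the current date. The videos released in the last month (on or after 2021-09-04) are 2001, 2002, 2003 and 2004, although 2004 has no play records yet.

Video 2001 has a completion rate of 1.0 (4 plays, all 4 completed), 3 likes, 1 comment and 2 retweets, and 0 days since its last play, so its hot index is (100*1.0+5*3+3*1+2*2)/(0+1)=122.

In the same way, video 2003 has a completion rate of 0, 1 like, no comments and no retweets, and 3 days since its last play, so its hot index is (100*0+5*1+3*0+2*0)/(3+1)=1 (1.25 rounded to an integer).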
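
The three hot indexes can be reproduced as plain arithmetic. In this MySQL sketch the figures for 2001 and 2003 come from the explanation above; those for 2002 (completion rate 0.4, 2 likes, 0 comments, 3 retweets, last played on 2021-10-03) are my own tally from the sample INSERT statements, so treat them as an assumption to check against the data.

```sql
SELECT ROUND((100 * 1.0 + 5 * 3 + 3 * 1 + 2 * 2) / (0 + 1), 0) AS hot_2001,  -- 122
       ROUND((100 * 0.4 + 5 * 2 + 3 * 0 + 2 * 3) / (0 + 1), 0) AS hot_2002,  -- 56
       ROUND((100 * 0   + 5 * 1 + 3 * 0 + 2 * 0) / (3 + 1), 0) AS hot_2003;  -- 1, from 1.25
```

Dividing by (days since the last play + 1) is the same as multiplying by the freshness factor.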

Sample data

Input:

DROP TABLE IF EXISTS tb_user_video_log, tb_video_info;
CREATE TABLE tb_user_video_log (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    uid INT NOT NULL COMMENT '用户ID',
    video_id INT NOT NULL COMMENT '视频ID',
    start_time datetime COMMENT '开始观看时间',
    end_time datetime COMMENT '结束观看时间',
    if_follow TINYINT COMMENT '是否关注',
    if_like TINYINT COMMENT '是否点赞',
    if_retweet TINYINT COMMENT '是否转发',
    comment_id INT COMMENT '评论ID'
) CHARACTER SET utf8 COLLATE utf8_bin;

CREATE TABLE tb_video_info (
    id INT PRIMARY KEY AUTO_INCREMENT COMMENT '自增ID',
    video_id INT UNIQUE NOT NULL COMMENT '视频ID',
    author INT NOT NULL COMMENT '创作者ID',
    tag VARCHAR(16) NOT NULL COMMENT '类别标签',
    duration INT NOT NULL COMMENT '视频时长(秒数)',
    release_time datetime NOT NULL COMMENT '发布时间'
)CHARACTER SET utf8 COLLATE utf8_bin;

INSERT INTO tb_user_video_log(uid, video_id, start_time, end_time, if_follow, if_like, if_retweet, comment_id) VALUES
   (101, 2001, '2021-09-24 10:00:00', '2021-09-24 10:00:30', 1, 1, 1, null)
  ,(101, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:31', 1, 1, 0, null)
  ,(102, 2001, '2021-10-01 10:00:00', '2021-10-01 10:00:35', 0, 0, 1, null)
  ,(103, 2001, '2021-10-03 11:00:50', '2021-10-03 11:01:35', 1, 1, 0, 1732526)
  ,(106, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:04', 2, 0, 1, null)
  ,(107, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:06', 1, 0, 0, null)
  ,(108, 2002, '2021-10-02 10:59:05', '2021-10-02 11:00:05', 1, 1, 1, null)
  ,(109, 2002, '2021-10-03 10:59:05', '2021-10-03 11:00:01', 0, 1, 0, null)
  ,(105, 2002, '2021-09-25 11:00:00', '2021-09-25 11:00:30', 1, 0, 1, null)
  ,(101, 2003, '2021-09-26 11:00:00', '2021-09-26 11:00:30', 1, 0, 0, null)
  ,(101, 2003, '2021-09-30 11:00:00', '2021-09-30 11:00:30', 1, 1, 0, null);

INSERT INTO tb_video_info(video_id, author, tag, duration, release_time) VALUES
   (2001, 901, '旅游', 30, '2021-09-05 7:00:00')
  ,(2002, 901, '旅游', 60, '2021-09-05 7:00:00')
  ,(2003, 902, '影视', 90, '2021-09-05 7:00:00')
  ,(2004, 902, '影视', 90, '2021-09-05 8:00:00');

Output:

2001|122
2002|56
2003|1
Solution 1
select
       a1.video_id,
       round((wanbo_rate*100+like_sum*5+comment_sum*3+retweet_sum*2)/(fresh+1),0) hot_index
from (    select t1.video_id,
               -- completion rate: completed plays / all plays
               sum(if(TIMESTAMPDIFF(SECOND,start_time,end_time)>=duration,1,0))/count(t1.video_id) wanbo_rate,
               sum(t1.if_like) like_sum,
               sum(if(t1.comment_id is not null,1,0)) comment_sum,
               sum(t1.if_retweet) retweet_sum,
               -- days between the latest play overall and this video's latest play
               datediff(date((select max(end_time) from tb_user_video_log)),max(date(t1.end_time))) fresh
        from tb_user_video_log t1
            join tb_video_info t2 on t1.video_id = t2.video_id
        -- keep only videos released within the last 30 days
        where datediff(date((select max(end_time) from tb_user_video_log)),date(t2.release_time)) <=29
        group by t1.video_id) a1
order by hot_index desc
limit 3;
Solution 2
SELECT video_id,
    ROUND((100 * comp_play_rate + 5 * like_cnt + 3 * comment_cnt + 2 * retweet_cnt)
        / (TIMESTAMPDIFF(DAY, recently_end_date, cur_date) + 1), 0) as hot_index
FROM (
    SELECT video_id,
        AVG(IF(
            TIMESTAMPDIFF(SECOND, start_time, end_time)>=duration, 1, 0
        )) as comp_play_rate,
        SUM(if_like) as like_cnt,
        COUNT(comment_id) as comment_cnt,
        SUM(if_retweet) as retweet_cnt,
        MAX(DATE(end_time)) as recently_end_date,  -- date of the most recent play
        MAX(DATE(release_time)) as release_date,  -- release date
        MAX(cur_date) as cur_date  -- not a grouping column, so wrap it in MAX to avoid a syntax error
    FROM tb_user_video_log
    JOIN tb_video_info USING(video_id)
    LEFT JOIN (
        SELECT MAX(DATE(end_time)) as cur_date FROM tb_user_video_log
    ) as t_max_date ON 1
    GROUP BY video_id
    HAVING TIMESTAMPDIFF(DAY, release_date, cur_date) < 30
) as t_video_info
ORDER BY hot_index DESC
LIMIT 3;
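Key points

When I first counted the comments with sum(), I tested comment_id with != null. That is wrong: the single comment on video 2001 (worth 3 points) was never counted, so the result came out as 119 instead of 122. This is what I found after looking it up:

In SQL, NULL is a special marker meaning that no value is present or that the value is unknown. It is different from 0, from the empty string and from a space. By default, a condition such as WHERE xx != NULL returns 0 rows every time, yet no syntax error is reported.

Under non-ANSI behaviour, data = NULL is equivalent to data IS NULL, and data <> NULL is equivalent to data IS NOT NULL. MySQL follows the ANSI behaviour, in which both comparisons evaluate to NULL.
So: by default, use the keywords "is null" and "is not null" in comparison conditions.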
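
The behaviour is easy to check in MySQL. In the sketch below, the derived table c is made up for illustration and holds one comment ID and two NULLs.

```sql
SELECT NULL = NULL   AS eq_null,       -- NULL (unknown), not 1
       1 <> NULL     AS ne_null,       -- NULL, so a WHERE on it keeps no rows
       NULL IS NULL  AS is_null,       -- 1
       1 IS NOT NULL AS is_not_null,   -- 1
       NULL <=> NULL AS null_safe_eq;  -- 1: MySQL's NULL-safe equality operator

-- COUNT(col) skips NULLs, which is why COUNT(comment_id) counts comments directly.
SELECT COUNT(*) AS all_rows, COUNT(comment_id) AS comments
FROM (SELECT 1732526 AS comment_id UNION ALL SELECT NULL UNION ALL SELECT NULL) AS c;
-- 3 | 1
```

Solution 2 relies on the second behaviour, while Solution 1 spells the same test out with is not null.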
