sql_day06(知乎问答最大连续回答问题天数大于等于3天的用户及其对应等级)

描述:知乎问答最大连续回答问题天数大于等于3天的用户及对应等级

现有某乎问答创作者信息表author_tb如下(其中author_id表示创作者编号、author_level表示创作者级别,共1-6六个级别、sex表示创作者性别):

author_id   

author_level

sex

101

6

m

102

1

f

103

1

m

104

3

m

105

4

f

106

2

f

107

2

m

108

5

f

109

6

f

110

5

m

创作者回答情况表answer_tb如下(其中answer_date表示创作日期、author_id指创作者编号、issue_id指回答问题编号、char_len表示回答字数):

请你统计最大连续回答问题的天数大于等于3天的用户及其等级(若有多条符合条件的数据,按author_id升序排序),以上例子的输出结果如下

author_id

author_level

days_cnt

101

6

3

 数据准备:

drop database if exists db_1;
create database db_1;
use db_1;

drop table if exists author_tb;
CREATE TABLE author_tb
(
    author_id    int(10) NOT NULL,
    author_level int(10) NOT NULL,
    sex          char(10) NOT NULL
);
INSERT INTO author_tb
VALUES
    (101, 6, 'm'),
    (102, 1, 'f'),
    (103, 1, 'm'),
    (104, 3, 'm'),
    (105, 4, 'f'),
    (106, 2, 'f'),
    (107, 2, 'm'),
    (108, 5, 'f'),
    (109, 6, 'f'),
    (110, 5, 'm');

drop table if exists answer_tb;
CREATE TABLE answer_tb
(
    answer_date date     NOT NULL,
    author_id   int(10) NOT NULL,
    issue_id    char(10) NOT NULL,
    char_len    int(10) NOT NULL
);

INSERT INTO answer_tb
VALUES
    ('2021-11-1', 101, 'E001', 150),
    ('2021-11-1', 101, 'E002', 200),
    ('2021-11-1', 102, 'C003', 50),
    ('2021-11-1', 103, 'P001', 35),
    ('2021-11-1', 104, 'C003', 120),
    ('2021-11-1', 105, 'P001', 125),
    ('2021-11-1', 102, 'P002', 105),
    ('2021-11-2', 101, 'P001', 201),
    ('2021-11-2', 110, 'C002', 200),
    ('2021-11-2', 110, 'C001', 225),
    ('2021-11-2', 110, 'C002', 220),
    ('2021-11-3', 101, 'C002', 180),
    ('2021-11-4', 109, 'E003', 130),
    ('2021-11-4', 109, 'E001', 123),
    ('2021-11-5', 108, 'C001', 160),
    ('2021-11-5', 108, 'C002', 120),
    ('2021-11-5', 110, 'P001', 180),
    ('2021-11-5', 106, 'P002', 45),
    ('2021-11-5', 107, 'E003', 56),
    ('2021-11-7', 101, 'C002', 200),
    ('2021-11-8', 101, 'C001', 225);

 分析:

①通过读题发现我们主要用answer_tb表,分析该表发现有重复数据(同一用户在同一天可能回答多次),那么第一步可以先去重,同时方便查看,我们只取author_id,answer_date列

这里提供三种去重方法:

1.1

select distinct answer_date, author_id from answer_tb

1.2

select
    answer_date, author_id
from answer_tb
group by answer_date, author_id
;

1.3

with t1 as (
    select
        answer_date, author_id, issue_id, char_len,
        row_number() over (partition by answer_date, author_id) as rn
    from answer_tb
)
select * from t1
where rn=1
;

怎样判断是回答连续?此处很巧妙,我们需要根据author_id分组,根据answe_date排序,求排名,利用answer_date和这个排名做差(利用等差数列相关知识),如果得出的日期是相同的那么就是连续的日期,如下图

with t1 as(
select  answer_date, author_id,
                row_number() over (partition by author_id order by answer_date)r1
from answer_tb
group by author_id, answer_date)
select *,date_sub(answer_date,interval r1 day) r2 from t1 

③计算各用户连续回答天数:根据author_id和r2分组,利用count()求出用户及连续登录天数

select author_id,count(r2)r4 from (with t1 as(
select answer_date, author_id,
                row_number() over (partition by author_id order by answer_date)r1
from answer_tb
group by author_id, answer_date)
select *,date_sub(answer_date,interval r1 day) r2 from t1 )t3
group by r2,author_id

④对结果进行完善,过滤数据,通过author_id分组,排序,取连续回答天数大于等于三的,用max()取最大的连续回答天数,通过连接等级表,查出author_id对应的等级。

select author_id,
       (select author_level
        from author_tb
        where author_id = t5.author_id) as author_level,
       max(r4)                             days_cnt
from (select author_id, count(r2) r4
      from (with t1 as
                     (select answer_date,
                             author_id,
                             row_number() over (partition by author_id order by answer_date) r1
                      from answer_tb
                      group by author_id, answer_date)
            select *, date_sub(answer_date, interval r1 day) r2
            from t1) t3
      group by r2, author_id
      having r4 >= 3) t5
group by author_id
order by author_id;

 代码:

#美化整理后

# 去重
with t1 as (
    select distinct answer_date, author_id from answer_tb
)
# 排名 根据人分组, 再根据日期 排名
, t2 as (
    select
        author_id, answer_date,
        row_number() over (partition by author_id order by answer_date) as r1
    from t1
)
# 求差值 = 回复日期 - 排名
, t3 as (
    select
        author_id, answer_date, r1,
 # date_sub 求日期 减 N天: date_sub(日期, 单位) 如:date_sub('2024-08-01', interval 1 day)
        date_sub(answer_date, interval r1 day) as r2
    from t2
)
,
# 求连续登录的天数
# 根据 作者编号, 日期差 分组
# 求行数 即 连续登录的天数
t4 as (
    select
        author_id, r2,
        count(r2) days
    from t3
    group by author_id, r2
)
,
# 求每个人连续登录的最大天数
t5 as (
    select
        author_id,
        max(days) days_cnt
    from t4
    where days>=3
    group by author_id
    order by author_id
)
select
    author_id,
    (select author_level from author_tb a where a.author_id=t5.author_id) as author_level,
    days_cnt
from t5
;

 总结:

连续问题中连续天数可以使用date_sub(日期,internal n day)得出一个日期,如果同一id的这个日期相同就说明这几天是连续
  • 13
    点赞
  • 11
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值