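Problem
For each id, find the maximum number of consecutive login days, where a one-day gap is allowed. For example, if the user with id 1 logs in on January 1, does not log in on January 2, and logs in again on January 3, this counts as 3 consecutive login days.
Data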
create table if not exists login(id int,dt varchar(20));
insert into login values(1,'2024-01-01'),
(1,'2024-01-02'),
(1,'2024-01-04'),
(2,'2024-01-02'),
(2,'2024-01-03'),
(1,'2024-01-07'),
(1,'2024-01-08'),
(1,'2024-01-10'),
(1,'2024-01-11'),
(3,'2024-01-01'),
(3,'2024-01-03'),
(4,'2024-01-12');
With this sample data, the maximum number of consecutive login days for the user with id 1 should be 5 (2024-01-07 through 2024-01-11).
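To make the rule concrete, here is a minimal brute-force sketch in Python that applies the problem statement directly to the sample rows. The function name `max_streak` and the parameter `max_step` are illustrative assumptions, not part of the original post; the sketch is only meant as a reference to compare the SQL result against.

```python
from datetime import date

# Sample rows copied from the insert statement above: (id, dt).
LOGINS = [
    (1, "2024-01-01"), (1, "2024-01-02"), (1, "2024-01-04"),
    (2, "2024-01-02"), (2, "2024-01-03"),
    (1, "2024-01-07"), (1, "2024-01-08"), (1, "2024-01-10"), (1, "2024-01-11"),
    (3, "2024-01-01"), (3, "2024-01-03"),
    (4, "2024-01-12"),
]


def max_streak(dates, max_step=2):
    """Longest span in days (both endpoints counted) in which neighbouring
    logins are at most `max_step` days apart; 2 tolerates one missed day."""
    days = sorted({date.fromisoformat(d) for d in dates})
    best, start, prev = 0, None, None
    for d in days:
        if prev is None or (d - prev).days > max_step:
            start = d  # streak broken: open a new window
        best = max(best, (d - start).days + 1)
        prev = d
    return best


by_id = {}
for user_id, dt in LOGINS:
    by_id.setdefault(user_id, []).append(dt)

expected = {user_id: max_streak(dts) for user_id, dts in sorted(by_id.items())}
print(expected)  # {1: 5, 2: 2, 3: 3, 4: 1}
```

For id 1 the two windows are 2024-01-01 to 2024-01-04 (4 days) and 2024-01-07 to 2024-01-11 (5 days), so the maximum is 5.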
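Approach
The standard "N consecutive days" problem can be solved by subtracting each row's rank from its date: rows in the same streak end up with the same result.
1. Rank: row_number() over(partition by id order by dt) as rn
2. Group: by id and date_sub(dt,interval rn day)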
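The two steps above can be sketched in Python to show which groups the standard trick produces. This is an illustration only: `strict_streaks` is a hypothetical helper name, and the dates are the id 1 rows from the sample data.

```python
from datetime import date, timedelta
from itertools import groupby


def strict_streaks(dates):
    """Group strictly consecutive days using the dt - rn key."""
    days = sorted(date.fromisoformat(d) for d in dates)
    # rn starts at 1 like row_number(); the key mirrors date_sub(dt, interval rn day).
    keyed = [(d - timedelta(days=rn), d) for rn, d in enumerate(days, start=1)]
    return [
        [d.isoformat() for _, d in group]
        for _, group in groupby(keyed, key=lambda pair: pair[0])
    ]


id1 = ["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-07",
       "2024-01-08", "2024-01-10", "2024-01-11"]
streaks = strict_streaks(id1)
for streak in streaks:
    print(streak)
# ['2024-01-01', '2024-01-02']
# ['2024-01-04']
# ['2024-01-07', '2024-01-08']
# ['2024-01-10', '2024-01-11']
```

Every missed day starts a new group here, so the longest group for id 1 has only 2 days, not the 5 the problem expects.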
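This approach no longer works once a one-day gap is allowed, because every missed day changes the grouping key. Before solving the problem, first rank the data.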
select id,
       dt,
       row_number() over(partition by id order by dt) as rn
from login;
Looking at the ranked data, we can start with the traditional approach: subtract rn (in days) from dt and call the resulting date sub1.
with t1 as (
    select id,
           dt,
           row_number() over(partition by id order by dt) as rn
    from login
)
# select * from t1;
,
t2 as (
    select id,
           dt,
           rn,
           date_sub(dt,interval rn day) as sub1
    from t1
)
select * from t2;
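Since the query output is not reproduced in the text, the following Python sketch traces rn and sub1 for id 1 by hand. It is a trace of the arithmetic, not of MySQL itself; the `jump` column is an added illustration showing how far sub1 moves from one row to the next.

```python
from datetime import date, timedelta

id1 = ["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-07",
       "2024-01-08", "2024-01-10", "2024-01-11"]

rows = []
prev_sub1 = None
for rn, dt in enumerate(sorted(id1), start=1):
    # Mirrors date_sub(dt, interval rn day).
    sub1 = date.fromisoformat(dt) - timedelta(days=rn)
    # Days that sub1 moved compared with the previous row (0 for the first row).
    jump = 0 if prev_sub1 is None else (sub1 - prev_sub1).days
    rows.append((dt, rn, sub1.isoformat(), jump))
    prev_sub1 = sub1

for row in rows:
    print(row)
# ('2024-01-01', 1, '2023-12-31', 0)
# ('2024-01-02', 2, '2023-12-31', 0)
# ('2024-01-04', 3, '2024-01-01', 1)
# ('2024-01-07', 4, '2024-01-03', 2)
# ('2024-01-08', 5, '2024-01-03', 0)
# ('2024-01-10', 6, '2024-01-04', 1)
# ('2024-01-11', 7, '2024-01-04', 0)
```

A jump of 0 means the next calendar day, a jump of 1 means exactly one missed day (still allowed), and a jump of 2 or more means the streak is really broken.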
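Looking at the output, it is easy to see that the two groups of rows boxed in red should count as one streak even though their sub1 values differ. A single missed day moves sub1 forward by exactly one day, so we can rank a second time to cancel that shift. row_number is not suitable here; use dense_rank instead, so that rows sharing the same sub1 receive the same rank. Then subtract that rank (in days) from sub1. The resulting date, sub2, can be used as the group by key.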
with t1 as (
    select id,
           dt,
           row_number() over(partition by id order by dt) as rn
    from login
)
# select * from t1;
,
t2 as (
    select id,
           dt,
           rn,
           date_sub(dt,interval rn day) as sub1
    from t1
)
# select * from t2;
,
t3 as (
    select id,
           dt,
           rn,
           sub1,
           dense_rank() over(partition by id order by sub1) as rk
    from t2
),
t4 as (
    select id,
           dt,
           rn,
           sub1,
           rk,
           date_sub(sub1,interval rk day) as sub2
    from t3
)
select * from t4;
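The second ranking step can be traced the same way. The sketch below computes rk and sub2 for id 1 and, for comparison, what would happen if row_number were used for the second ranking. It assumes at most one row per id and day, as in the sample data; the variable names are illustrative.

```python
from datetime import date, timedelta

id1 = ["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-07",
       "2024-01-08", "2024-01-10", "2024-01-11"]

days = [date.fromisoformat(d) for d in sorted(id1)]
sub1 = [d - timedelta(days=rn) for rn, d in enumerate(days, start=1)]

# dense_rank(): equal sub1 values share a rank and ranks have no holes.
rank_of = {value: i for i, value in enumerate(sorted(set(sub1)), start=1)}
dense = [rank_of[value] for value in sub1]
sub2_dense = [(v - timedelta(days=rk)).isoformat() for v, rk in zip(sub1, dense)]

# row_number() over sub1, for comparison: every row gets its own number.
# sub1 is already in ascending order here, so enumerate matches that ranking.
sub2_rownum = [(v - timedelta(days=n)).isoformat()
               for n, v in enumerate(sub1, start=1)]

print(dense)        # [1, 1, 2, 3, 3, 4, 4]
print(sub2_dense)   # three rows of 2023-12-30, then four rows of 2023-12-31
print(sub2_rownum)  # keys no longer line up with the real windows
```

With dense_rank, the first three dates share sub2 = 2023-12-30 and the last four share 2023-12-31, which are exactly the two windows. With row_number, 2024-01-01 and 2024-01-07 both end up with 2023-12-30, so unrelated rows would be grouped together.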
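At this point, grouping by id and sub2 yields the windows the problem asks for. To get the number of consecutive login days, take the latest and earliest dates in each window and compute their difference plus one, since both endpoints count. The same id may have several windows, so group by id and take the largest value. The complete code is as follows: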
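# Step 1: number each user's logins in date order
with t1 as (
    select id,
           dt,
           row_number() over(partition by id order by dt) as rn
    from login
)
# select * from t1;
,
# Step 2: classic key, constant within strictly consecutive days
t2 as (
    select id,
           dt,
           rn,
           date_sub(dt,interval rn day) as sub1
    from t1
)
# select * from t2;
,
# Step 3: rank the distinct sub1 values per user
t3 as (
    select id,
           dt,
           rn,
           sub1,
           dense_rank() over(partition by id order by sub1) as rk
    from t2
),
# Step 4: second key, constant while gaps are at most one day
t4 as (
    select id,
           dt,
           rn,
           sub1,
           rk,
           date_sub(sub1,interval rk day) as sub2
    from t3
)
# select * from t4;
,
# Step 5: length of each window, counting both endpoints
t5 as (
    select id,
           datediff(max(dt),min(dt)) + 1 as days
    from t4
    group by id,sub2
)
# Step 6: longest window per user
select id,
       max(days) as max_days
from t5
group by id;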
The result is shown in the figure:
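As a cross-check on the full query, the sketch below replays steps t1 through t5 in Python for every id in the sample data. It is a hand trace under the same assumption as the SQL (at most one row per id and day), and `max_days_per_id` is a hypothetical name chosen for this illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

LOGINS = [
    (1, "2024-01-01"), (1, "2024-01-02"), (1, "2024-01-04"),
    (2, "2024-01-02"), (2, "2024-01-03"),
    (1, "2024-01-07"), (1, "2024-01-08"), (1, "2024-01-10"), (1, "2024-01-11"),
    (3, "2024-01-01"), (3, "2024-01-03"),
    (4, "2024-01-12"),
]


def max_days_per_id(rows):
    by_id = defaultdict(list)
    for user_id, dt in rows:
        by_id[user_id].append(date.fromisoformat(dt))

    result = {}
    for user_id, days in by_id.items():
        days.sort()
        # t1 + t2: rn and sub1 = dt - rn days
        sub1 = [d - timedelta(days=rn) for rn, d in enumerate(days, start=1)]
        # t3: dense_rank over sub1
        rank_of = {v: i for i, v in enumerate(sorted(set(sub1)), start=1)}
        # t4: sub2 = sub1 - rk days
        sub2 = [v - timedelta(days=rank_of[v]) for v in sub1]
        # t5: one window per (id, sub2), length = max - min + 1
        windows = defaultdict(list)
        for key, d in zip(sub2, days):
            windows[key].append(d)
        result[user_id] = max((max(w) - min(w)).days + 1
                              for w in windows.values())
    return result


result = max_days_per_id(LOGINS)
print(result)  # {1: 5, 2: 2, 3: 3, 4: 1}
```

If the table could contain the same id and date more than once, the row numbers would no longer track calendar days and windows could be merged incorrectly, so deduplicating first (for example with select distinct id, dt) would likely be needed.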