解题思路
利用等差数列的特性,先对用户登陆日期进行row number 排序,得到一个等差序列;使用登陆日期减去等差序列,如果是连续的登陆日期,得到的值是一样的;最后分组汇总,得到结果
数据准备:
user_id | login_date |
A | 2022-5-1 |
A | 2022-5-2 |
A | 2022-5-3 |
A | 2022-5-10 |
A | 2022-5-11 |
B | 2022-5-1 |
B | 2022-5-2 |
C | 2022-5-7 |
C | 2022-5-9 |
C | 2022-5-10 |
C | 2022-5-11 |
C | 2022-5-12 |
C | 2022-5-13 |
PS:实际工作中,用户登陆实际是精确到 .sss;需要提前对数据进行清洗加工;比如转为-dd,且进行分组;
1、对数据进行row_number 排序
select
user_id
,login_date
,row_number() over(partition by user_id order by cast(login_date as date)) as login_row
from dt.login
order by user_Id, cast(login_date as date)
--我这里login_date 为字符串,先进行类型转换
执行结果如下:
2、使用等差数列的特性,计算结果
select
user_id
,login_date
,row_number() over(partition by user_id order by cast(login_date as date)) as login_row
,date_sub(cast(login_date as date),row_number() over(partition by user_id order by cast(login_date as date))) as dt_sub
from dt.login
order by user_Id, cast(login_date as date)
执行结果如下:
3、根据用户,dt sub 分组汇总
select user_id, dt_sub, count(user_id) as cnt
from (
select
user_id
,login_date
,row_number() over(partition by user_id order by cast(login_date as date)) as login_row
,date_sub(cast(login_date as date),row_number() over(partition by user_id order by cast(login_date as date))) as dt_sub
from dt.login
-- order by user_Id, cast(login_date as date)
) z
group by user_id, dt_sub
执行结果如下:
最终结果
select user_id, max(cnt) as max_login_days
from (
select user_id, dt_sub, count(user_id) as cnt
from (
select
user_id
,login_date
,row_number() over(partition by user_id order by cast(login_date as date)) as login_row
,date_sub(cast(login_date as date),row_number() over(partition by user_id order by cast(login_date as date))) as dt_sub
from dt.login
-- order by user_Id, cast(login_date as date)
) z
group by user_id, dt_sub
) x
group by user_id
执行结果如下:
PS:问题延伸,如何得到最大连续登陆天数的日期区间