最近面试的时候被面试官问到“近30天用户连续登陆的最大天数”的SQL,没有答上来,在网上看到其他人的答案后醍醐灌顶,解法竟然如此美妙,分享给大家。
假设我们有一张用户登录日志表login,有用户(user_id)和登陆日期(login_date)两列数据如下:
user_id | login_date |
---|---|
A | 2022-05-01 |
A | 2022-05-02 |
A | 2022-05-03 |
A | 2022-05-10 |
A | 2022-05-11 |
B | 2022-05-01 |
B | 2022-05-02 |
C | 2022-05-07 |
C | 2022-05-09 |
我们先对每个用户按访问时间进行排序生成login_rank,然后将login_rank列和login_date列做差,如果是连续登陆,这个差值会是固定的。按用户和差值进行分组计数再取最大值,就得到了每个用户连续登陆的天数的最大值。
user_id | login_date | login_rank |
---|---|---|
A | 2022-05-01 | 1 |
A | 2022-05-02 | 2 |
A | 2022-05-03 | 3 |
A | 2022-05-10 | 4 |
A | 2022-05-11 | 5 |
B | 2022-05-01 | 1 |
B | 2022-05-02 | 2 |
C | 2022-05-07 | 1 |
C | 2022-05-09 | 2 |
select
user_id
,max(days) as max_days
from
(
select
user_id
,diff
,count(distinct login_date) as days
from
(
select
user_id
,login_date
,row_number() over(partition by user_id order by login_date) as login_rank
,row_number() over(partition by user_id order by login_date)-login_date as diff
from login
where datediff(current_date,login_date)<=30
)a
group by
user_id
,diff
)b
group by
user_id