用户连续登录最大天数–【MySQL以及pandas解题思路】
常见面试题:用户连续登录天数问题
常见解题思路:
- 首先对用户重复登录数据进行去重操作,
- 根据用户id进行分组,对登录日期排序,
- 计算登录日期与组内排序的差值,
- 统计得到的差值次数即为登录天数。
数据类型
字段名称 | 字段类型 |
---|---|
id | INT |
time | DATETIME |
MySQL【row_number】
select id, continuous_days
from (select id, continuous_days, row_number() over (partition by id order by continuous_days desc) rn2
# 选出登录天数最大值
from (select id, count(*) as continuous_days
# 统计差值次数【若统计登录天数>=N,使用 having continuous_days >=2】
from (select time
, id
, subdate(time, row_number() over (partition by id order by time )) as rn1
# 计算差值
from (select distinct date(time) as time
, id
# 删除重复数据
from continous) t1) t2
group by rn1, id) t3) t4
where rn2 = 1;
Python-pandas
# 去重去除掉每日重复登录数据
df.drop_duplicates(inplace=True)
# 对用户登录日期进行排序操作
df['rn1'] = df['date'].groupby(df['id']).rank()
# 计算登录日期与组内排序的差值
df['date_sub'] = df['date'] - pd.to_timedelta(df['rn1'], unit='d')
# 统计差值次数得到连续登录天数
data = df.groupby(['id', 'date_sub']).count().reset_index()
data = data[['id', 'date_sub', 'rn1']].rename(columns={'rn1': '连续登录天数'})
# 排序得到的第一个值即为最大登录天数
data = data.sort_values(by='连续登录天数', ascending=False).groupby('id').first().reset_index()
> data
【补充MySQL使用lead函数计算连续登录N天】
以下为使用lead函数计算连续登录三天的信息,使用lag函数同理。
select id,
time,
rn1,
# 计算的是两个日期之间的天数差
case when rn1 is not null then datediff(rn1, time) else null end as rn2
from (select id,
time,
# 获取每个日期的第三个后续日期
lead(time, 2, null) over (partition by id order by time) as rn1
from (select distinct date(time) as time,
id
from continous
group by id, DATE(time)) t1) t2
having rn2 = 2;