计算连续登陆最大天数
原始数据:
u0001 2019-10-10
u0001 2019-10-11
u0001 2019-10-12
u0001 2019-10-14
u0001 2019-10-15
u0001 2019-10-17
u0001 2019-10-18
u0001 2019-10-19
u0001 2019-10-20
u0002 2019-10-20
说明:数据是简化版,两列分别是user_id,log_in_date。现实情况需要从采集数据经过去重,转换得到以上形式数据
算法:
核心是按访问时间排序,登陆时间列减去排序后的序列号,得到一个日期值,按这个值分组计数即可。
Step1:排序
select
user_id,
log_in_date,
row_number() over(partitioned by user_id order by log_in_date desc) as rank
from
user_log
结果:
u0001 2019-10-10 1
u0001 2019-10-11 2
u0001 2019-10-12 3
u0001 2019-10-14 4
u0001 2019-10-15 5
u0001 2019-10-17 6
u0001 2019-10-18 7
u0001 2019-10-19 8
u0001 2019-10-20 9
u0002 2019-10-20 1
Step2 :第二列与第三列做日期差值
select
user_id
, date_sub(log_in_date, rank) dts
from
(
select
user_id
, log_in_date
, row_number() over(
partitioned by user_id
order by
log_in_date desc
) as rank
from
user_log
) t
结果:
u0001 2019-10-09
u0001 2019-10-09
u0001 2019-10-09
u0001 2019-10-10
u0001 2019-10-10
u0001 2019-10-11
u0001 2019-10-11
u0001 2019-10-11
u0001 2019-10-11
u0002 2019-10-19
Step3:按第二列分组求和
select
user_id
, count(1) as num
from
(
select
user_id
, date_sub(log_in_date, rank) dts
from
(
select
user_id
, log_in_date
, row_number() over(
partitioned by user_id
order by
log_in_date desc
) as rank
from
user_log
) t
) a
group by dts
结果:
u0001 2019-10-09 3
u0001 2019-10-10 2
u0001 2019-10-11 4
u0002 2019-10-19 1
Step4:求最大次数
略