题目:
这张表记录了用户id为1001和1002的登陆日期,
求:用户各自连续登陆的天数最大值?
思路:
- 类似求连续3天访问的用户id,
需要增加以 user_id开窗,计算时间的排名(如果时间有重复用DENSE_RANK
),将结果作为新的字段 rank - 用访问日期减去排名rank,得到一个时间
tmp
,如果用户是连续访问的,那么这个时间差 tmp就是一样的! - 对 tmp 进行 count 记数
- 找到最大的 count 即为答案。
(1)先按照用户id(usr_id)对访问日期(log_date)进行排名
SELECT user_id, log_date, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY log_date) AS rank
FROM user_logging_format;
(2) 得到排名后,用log_date 减去 rank序号!得到差值 tmp
,
tmp相同则代表是连续的!;
SELECT user_id,
log_date,
DATE_SUB(log_date,cast(rn AS INT) AS tmp
FROM(
SELECT user_id, log_date, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY log_date) AS rank FROM user_logging_format)as q1;
(3)根据user_id 和 tmp 联合分组并统计tmp,tmp相同的被统计,即得到连续的天数
SELECT
user_id,
tmp,
count(*) AS count
FROM
(SELECT user_id,
log_date,
DATE_SUB(log_date,cast(rn AS INT)) AS tmp
FROM(SELECT user_id, log_date, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY log_date) AS rank FROM user_logging_format)q1)q2 GROUP BY user_id, tmp;
(4) 最后求出每个user_id 连续登录的最大天数,即对相同tmp记数后的最大值;
SELECT user_id,MAX(cc)
FROM
(SELECT
user_id,
tmp,
count(*) AS cc
FROM
(SELECT user_id,
log_date,
DATE_SUB(log_date,cast(rn AS INT)) AS tmp
FROM(SELECT user_id, log_date, ROW_NUMBER() OVER(PARTITION BY user_id ORDER BY log_date) AS rank FROM user_logging_format)a1)q2 GROUP BY user_id, tmp)q3
GROUP BY user_id;
注意:如果原始数据有重复,则使用DENSE_RANK
而不是 ROW_NUMBER
;