大数据SQL题43 间断连续登录用户问题

原题链接:http://practice.atguigu.cn/#/question/43/desc?qType=SQL

题目需求

现有各用户的登录记录表(login_events)如下,表中每行数据表达的信息是一个用户何时登录了平台。

user_idlogin_datetime
1002021-12-01 19:00:00
1002021-12-01 19:30:00
1002021-12-02 21:01:00

现要求统计各用户最长的连续登录天数,间断一天也算作连续,例如:一个用户在1,3,5,6登录,则视为连续6天登录。期望结果如下:

user_id(用户id)max_day_count(最大连续天数)
1003
1016
1023
1043
1051

解题思路

  1. 使用类似于42题的区间划分方法
SELECT  user_id,
        MAX(day_count) AS max_day_count
FROM
(
	SELECT  user_id,
	        DATEDIFF(MAX(login_date),MIN(login_date)) + 1 AS day_count
	FROM
	(
		SELECT  user_id,
		        login_date,
		        SUM(is_first) OVER (PARTITION BY user_id ORDER BY  login_date) AS interval_id
		FROM
		(
			SELECT  user_id,
			        login_date,
			        IF(DATEDIFF(login_date,last_login_date) > 2,1,0) AS is_first
			FROM
			(
				SELECT  user_id,
				        login_date,
				        LAG(login_date,1,'1970-01-01') OVER (PARTITION BY user_id ORDER BY  login_date) AS last_login_date
				FROM
				(
					SELECT  user_id,
					        date(login_datetime) AS login_date
					FROM login_events
					GROUP BY  user_id,
					          date(login_datetime)
				) t1
			) t2
		) t3
	) t4
	GROUP BY  user_id,
	          interval_id
) t5
GROUP BY  user_id
  1. 将符合条件的间断天数补齐
-- 将符合条件的间断天数补齐
WITH new_login AS
(
	SELECT  user_id,
	        new_login_date AS login_date
	FROM
	(
		SELECT  user_id,
		        login_date,
		        lead(login_date,1,'9999-12-31') OVER (PARTITION BY user_id ORDER BY  login_date) AS next_login_date
		FROM
		(
			SELECT  user_id,
			        date_format(login_datetime,'yyyy-MM-dd') login_date
			FROM login_events
			GROUP BY  user_id,
			          date_format(login_datetime,'yyyy-MM-dd') --按照用户和日期去重
		) t1
	) t2 LATERAL VIEW explode(IF(DATEDIFF(next_login_date, login_date) = 2, array(login_date, date_add(login_date, 1)), array(login_date))) tmp AS new_login_date
) 
-- 随后再使用其他连续天数划分的处理方法
SELECT  user_id,
        MAX(day_count) AS max_day_count
FROM
(
	SELECT  user_id,
	        DATEDIFF(MAX(login_date),MIN(login_date)) + 1 AS day_count
	FROM
	(
		SELECT  user_id,
		        login_date,
		        SUM(is_first) OVER (PARTITION BY user_id ORDER BY  login_date) AS interval_id
		FROM
		(
			SELECT  user_id,
			        login_date,
			        IF(DATEDIFF(login_date,last_login_date) > 1,1,0) AS is_first
			FROM
			(
				SELECT  user_id,
				        login_date,
				        LAG(login_date,1,'1970-01-01') OVER (PARTITION BY user_id ORDER BY  login_date) AS last_login_date
				FROM new_login
			) t1
		) t2
	) t3
	GROUP BY  user_id,
	          interval_id
) t4
GROUP BY  user_id
  • 6
    点赞
  • 5
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值