Big Data SQL Problem 28: For each day with new registrations, find the number of new users and their day-1 retention rate

Original problem: http://practice.atguigu.cn/#/question/28/desc?qType=SQL

Problem requirements

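In the user login detail table (user_login_detail), a user's first login counts as a new user on that day. If the same user also logs in on the following day, that counts as day-1 retention.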

The expected result is:

first_login (registration date) | register (new users) | retention (retention rate, decimal(16,2))
2021-09-21 | 1 | 0.00
2021-09-22 | 1 | 0.00
2021-09-23 | 1 | 0.00
2021-09-24 | 1 | 0.00
2021-09-25 | 1 | 0.00
2021-09-26 | 1 | 0.00
2021-09-27 | 1 | 0.00
2021-10-04 | 2 | 0.50
2021-10-06 | 1 | 0.00

Table used:

User login detail table: user_login_detail

user_id (user ID) | ip_address (IP address) | login_ts (login time) | logout_ts (logout time)
101 | 180.149.130.161 | 2021-09-21 08:00:00 | 2021-09-27 08:30:00
102 | 120.245.11.2 | 2021-09-22 09:00:00 | 2021-09-27 09:30:00
103 | 27.184.97.3 | 2021-09-23 10:00:00 | 2021-09-27 10:30:00

Approach

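This problem is similar to Problem 05. We need the number of new users per day and their day-1 retention rate. So we first look up each user's first login date, then compare every login date with that first login date. A difference of 1 day means the user was retained on day 1. Solutions 1 and 2 follow this idea: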
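Before the SQL, a minimal plain-Python sketch of this idea may help as a cross-check. It assumes the input is a list of `(user_id, login_ts)` tuples. The function name `day1_retention` is hypothetical. Every row after the first three sample rows is invented, only to cover retained and non-retained users.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def day1_retention(logins):
    """Return [(first_login, register, retention)] sorted by first_login.

    Hypothetical helper: `logins` is an iterable of (user_id, login_ts)
    tuples with login_ts formatted as 'YYYY-MM-DD HH:MM:SS'.
    """
    # Distinct login dates per user (same role as GROUP BY user_id, date(login_ts)).
    days = defaultdict(set)
    for user_id, ts in logins:
        days[user_id].add(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date())

    # first_login -> [register, retained]
    cohorts = defaultdict(lambda: [0, 0])
    for dates in days.values():
        first = min(dates)
        cohorts[first][0] += 1
        # Retained iff the user also logged in on first_login + 1 day.
        if first + timedelta(days=1) in dates:
            cohorts[first][1] += 1

    # Python's round() breaks exact ties to even (e.g. 0.125 -> 0.12),
    # which may differ from the SQL engine's decimal(16,2) cast.
    return [
        (d.isoformat(), reg, round(ret / reg, 2))
        for d, (reg, ret) in sorted(cohorts.items())
    ]


# The first three rows come from the sample table; the rest are hypothetical.
sample = [
    (101, "2021-09-21 08:00:00"),
    (102, "2021-09-22 09:00:00"),
    (103, "2021-09-23 10:00:00"),
    (101, "2021-09-22 10:00:00"),
    (104, "2021-10-04 08:00:00"),
    (104, "2021-10-05 08:00:00"),
    (105, "2021-10-04 09:00:00"),
    (105, "2021-10-04 18:00:00"),
    (105, "2021-10-06 09:00:00"),
]
print(day1_retention(sample))
# [('2021-09-21', 1, 1.0), ('2021-09-22', 1, 0.0), ('2021-09-23', 1, 0.0), ('2021-10-04', 2, 0.5)]
```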

1. Self-join calculation

SELECT  t1.first_login,
        COUNT(DISTINCT t1.user_id)                                                   AS register,
        cast(COUNT(DISTINCT t2.user_id)/COUNT(DISTINCT t1.user_id) AS decimal(16,2)) AS retention
FROM
(
	SELECT  user_id,
	        MIN(date(login_ts)) AS first_login
	FROM user_login_detail
	GROUP BY  user_id
) t1
LEFT JOIN
(
	SELECT  user_id,
	        date(login_ts) AS login_date
	FROM user_login_detail
	GROUP BY  user_id,
	          date(login_ts)
) t2
ON t1.user_id = t2.user_id AND DATEDIFF(t2.login_date, t1.first_login) = 1
GROUP BY  t1.first_login
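To try solution 1 without a Hive environment, the sketch below ports it to SQLite through Python's built-in `sqlite3` module. It approximates the original query rather than reproducing it:

- SQLite has no `DATEDIFF`, so the day difference uses `julianday()`.
- `ROUND(..., 2)` stands in for `cast(... AS decimal(16,2))`.
- `1.0 *` avoids SQLite's integer division.

The table keeps only the two columns the query reads. All rows beyond the three sample rows are hypothetical.

```python
import sqlite3

# The first three rows come from the sample table; the rest are hypothetical.
ROWS = [
    (101, "2021-09-21 08:00:00"),
    (102, "2021-09-22 09:00:00"),
    (103, "2021-09-23 10:00:00"),
    (101, "2021-09-22 10:00:00"),  # 101 returns the next day
    (104, "2021-10-04 08:00:00"),
    (104, "2021-10-05 08:00:00"),  # 104 returns the next day
    (105, "2021-10-04 09:00:00"),
    (105, "2021-10-04 18:00:00"),  # same-day repeat, collapsed by GROUP BY
    (105, "2021-10-06 09:00:00"),  # skips a day, so not retained
]

# Solution 1 ported to SQLite (julianday() replaces DATEDIFF, ROUND replaces the cast).
QUERY = """
SELECT t1.first_login,
       COUNT(DISTINCT t1.user_id) AS register,
       ROUND(1.0 * COUNT(DISTINCT t2.user_id) / COUNT(DISTINCT t1.user_id), 2) AS retention
FROM (
    SELECT user_id, MIN(date(login_ts)) AS first_login
    FROM user_login_detail
    GROUP BY user_id
) t1
LEFT JOIN (
    SELECT user_id, date(login_ts) AS login_date
    FROM user_login_detail
    GROUP BY user_id, date(login_ts)
) t2
  ON t1.user_id = t2.user_id
 AND julianday(t2.login_date) - julianday(t1.first_login) = 1
GROUP BY t1.first_login
ORDER BY t1.first_login
"""


def run(query):
    """Load ROWS into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_login_detail (user_id INTEGER, login_ts TEXT)")
    conn.executemany("INSERT INTO user_login_detail VALUES (?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(QUERY))
# [('2021-09-21', 1, 1.0), ('2021-09-22', 1, 0.0), ('2021-09-23', 1, 0.0), ('2021-10-04', 2, 0.5)]
```

The `LEFT JOIN` keeps users with no next-day login. They still count toward `register`, while their `t2.user_id` is NULL and is ignored by `COUNT(DISTINCT ...)`.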

2. Use a window function to get the first login date, then diff every record against it

SELECT  first_login,
        COUNT(DISTINCT IF(login_date = first_login,user_id,NULL))                                                             AS register,
        cast(COUNT(DISTINCT IF(DATEDIFF(login_date,first_login) = 1,user_id,NULL)) /COUNT(DISTINCT user_id) AS decimal(16,2)) AS retention
FROM
(
	SELECT  user_id,
	        login_date,
	        MIN(login_date) OVER (PARTITION BY user_id ORDER BY  login_date) AS first_login -- first_value(login_date) would also work
	FROM
	(
		SELECT  user_id,
		        date(login_ts) AS login_date
		FROM user_login_detail
		GROUP BY  user_id,
		          date(login_ts)
	) t1
) t2
GROUP BY  first_login
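Below is a hedged SQLite port of solution 2, run through Python's built-in `sqlite3` module for local testing. The translation makes these substitutions:

- `IF()` becomes `CASE WHEN`.
- `DATEDIFF` becomes a `julianday()` difference.
- `ROUND(1.0 * ..., 2)` replaces the decimal cast.

Window functions need SQLite 3.25 or later; the sketch assumes the bundled library is at least that version. Rows other than the three sample rows are hypothetical.

```python
import sqlite3

# The first three rows come from the sample table; the rest are hypothetical.
ROWS = [
    (101, "2021-09-21 08:00:00"),
    (102, "2021-09-22 09:00:00"),
    (103, "2021-09-23 10:00:00"),
    (101, "2021-09-22 10:00:00"),  # 101 returns the next day
    (104, "2021-10-04 08:00:00"),
    (104, "2021-10-05 08:00:00"),  # 104 returns the next day
    (105, "2021-10-04 09:00:00"),
    (105, "2021-10-04 18:00:00"),  # same-day repeat
    (105, "2021-10-06 09:00:00"),  # skips a day, so not retained
]

# Solution 2 ported to SQLite: MIN() OVER (...) attaches first_login to every row.
QUERY = """
SELECT first_login,
       COUNT(DISTINCT CASE WHEN login_date = first_login THEN user_id END) AS register,
       ROUND(1.0 * COUNT(DISTINCT CASE WHEN julianday(login_date) - julianday(first_login) = 1
                                       THEN user_id END)
             / COUNT(DISTINCT user_id), 2) AS retention
FROM (
    SELECT user_id, login_date,
           MIN(login_date) OVER (PARTITION BY user_id ORDER BY login_date) AS first_login
    FROM (
        SELECT user_id, date(login_ts) AS login_date
        FROM user_login_detail
        GROUP BY user_id, date(login_ts)
    ) t1
) t2
GROUP BY first_login
ORDER BY first_login
"""


def run(query):
    """Load ROWS into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_login_detail (user_id INTEGER, login_ts TEXT)")
    conn.executemany("INSERT INTO user_login_detail VALUES (?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(QUERY))
# [('2021-09-21', 1, 1.0), ('2021-09-22', 1, 0.0), ('2021-09-23', 1, 0.0), ('2021-10-04', 2, 0.5)]
```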

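Beyond the straightforward approach, this problem is at heart a consecutive-range (retention) problem. The two consecutive-range techniques in solutions 3 and 4 therefore also apply.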

3. Use lead()/lag() windows to fetch the row n positions ahead or behind, then take the difference

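There are two equivalent ways to pair up the first two logins:

- Take each user's first record and use lead() to fetch the second one.
- Take each user's second record and use lag() to fetch the first one.

If the first two logins fall on consecutive days, the date difference is 1. With the lag() variant, users who logged in only once have no second record, so the new-user count must still come from the first record.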

SELECT  login_date                                                                                                          AS first_login,
        COUNT(DISTINCT user_id)                                                                                             AS register,
        cast(COUNT(DISTINCT IF(DATEDIFF(next_date,login_date) = 1,user_id,NULL)) /COUNT(DISTINCT user_id) AS decimal(16,2)) AS retention
FROM
(
	SELECT  user_id,
	        login_date,
	        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY  login_date ASC)                   AS rn,
	        lead(login_date,1,'9999-12-31') OVER (PARTITION BY user_id ORDER BY login_date ASC) AS next_date
	FROM
	(
		SELECT  user_id,
		        date(login_ts) AS login_date
		FROM user_login_detail
		GROUP BY  user_id,
		          date(login_ts)
	) t1
) t2
WHERE rn = 1
GROUP BY  login_date
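Below is a hedged SQLite port of solution 3 (the lead() variant), run through Python's `sqlite3` module. SQLite's `LEAD(expr, offset, default)` accepts the same `'9999-12-31'` default as the original. The port substitutes `julianday()` for `DATEDIFF` and `ROUND(1.0 * ..., 2)` for the decimal cast. It assumes SQLite 3.25+ for window functions. Rows other than the three sample rows are hypothetical.

```python
import sqlite3

# The first three rows come from the sample table; the rest are hypothetical.
ROWS = [
    (101, "2021-09-21 08:00:00"),
    (102, "2021-09-22 09:00:00"),
    (103, "2021-09-23 10:00:00"),
    (101, "2021-09-22 10:00:00"),  # 101 returns the next day
    (104, "2021-10-04 08:00:00"),
    (104, "2021-10-05 08:00:00"),  # 104 returns the next day
    (105, "2021-10-04 09:00:00"),
    (105, "2021-10-04 18:00:00"),  # same-day repeat
    (105, "2021-10-06 09:00:00"),  # skips a day, so not retained
]

# Solution 3 ported to SQLite: keep rn = 1 and compare it with the next login date.
QUERY = """
SELECT login_date AS first_login,
       COUNT(DISTINCT user_id) AS register,
       ROUND(1.0 * COUNT(DISTINCT CASE WHEN julianday(next_date) - julianday(login_date) = 1
                                       THEN user_id END)
             / COUNT(DISTINCT user_id), 2) AS retention
FROM (
    SELECT user_id, login_date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS rn,
           LEAD(login_date, 1, '9999-12-31') OVER (PARTITION BY user_id ORDER BY login_date) AS next_date
    FROM (
        SELECT user_id, date(login_ts) AS login_date
        FROM user_login_detail
        GROUP BY user_id, date(login_ts)
    ) t1
) t2
WHERE rn = 1
GROUP BY login_date
ORDER BY login_date
"""


def run(query):
    """Load ROWS into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_login_detail (user_id INTEGER, login_ts TEXT)")
    conn.executemany("INSERT INTO user_login_detail VALUES (?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(QUERY))
# [('2021-09-21', 1, 1.0), ('2021-09-22', 1, 0.0), ('2021-09-23', 1, 0.0), ('2021-10-04', 2, 0.5)]
```

Users with a single login get the `'9999-12-31'` default as `next_date`. The difference is then far larger than 1, so they count as new but not retained.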

4. Use a row_number() window as a marker to identify consecutive ranges

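row_number() gives each of a user's login dates a sequence number rn. Subtracting rn - 1 days from the login date produces a base date, flag. Consecutive dates share the same flag. Counting the rows in each flag group therefore shows whether the user logged in on consecutive days.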

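This problem also depends on the registration date, so the flag must additionally be compared with it. Only the run whose flag equals the first login date starts on the registration day.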
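The flag trick from the two paragraphs above can be sketched in plain Python for a single user's login dates. This is an illustration of the idea, not the SQL itself. The helper names `flag_groups` and `retained_day1` are hypothetical, and the input is assumed to be a list of `datetime.date` values.

```python
from datetime import date, timedelta


def flag_groups(login_dates):
    """Group one user's distinct login dates by flag = login_date - (rn - 1) days."""
    groups = {}
    # enumerate(..., start=1) plays the role of ROW_NUMBER() ordered by login_date.
    for rn, d in enumerate(sorted(set(login_dates)), start=1):
        flag = d - timedelta(days=rn - 1)  # like DATE_SUB(login_date, rn - 1)
        groups.setdefault(flag, []).append(d)
    return groups


def retained_day1(login_dates):
    """Day-1 retained iff the run whose flag equals the first login date has >= 2 days."""
    first_login = min(login_dates)
    return len(flag_groups(login_dates).get(first_login, [])) >= 2


dates = [date(2021, 10, 4), date(2021, 10, 6), date(2021, 10, 7)]
groups = flag_groups(dates)
print({f.isoformat(): [d.isoformat() for d in ds] for f, ds in groups.items()})
# {'2021-10-04': ['2021-10-04'], '2021-10-05': ['2021-10-06', '2021-10-07']}
print(retained_day1(dates))                                   # False
print(retained_day1([date(2021, 10, 4), date(2021, 10, 5)]))  # True
```

The run 10-06/10-07 is consecutive, but its flag (10-05) is not the first login date, so it does not count as day-1 retention.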

SELECT  t1.first_login,
        COUNT(DISTINCT t1.user_id)                                                   AS register,
        cast(COUNT(DISTINCT t4.user_id)/COUNT(DISTINCT t1.user_id) AS decimal(16,2)) AS retention
FROM
(
	SELECT  user_id,
	        MIN(date(login_ts)) AS first_login
	FROM user_login_detail
	GROUP BY  user_id
) t1
LEFT JOIN
(
	SELECT  user_id,
	        DATE_SUB(login_date,rn - 1) AS flag
	FROM
	(
		SELECT  user_id,
		        login_date,
		        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY  login_date ASC) AS rn
		FROM
		(
			SELECT  user_id,
			        date(login_ts) AS login_date
			FROM user_login_detail
			GROUP BY  user_id,
			          date(login_ts)
		) t2
	) t3
	GROUP BY  user_id,
	          DATE_SUB(login_date,rn - 1)
	HAVING COUNT(1) >= 2
) t4
ON t1.user_id = t4.user_id AND t1.first_login = t4.flag
GROUP BY  t1.first_login
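Below is a hedged SQLite port of solution 4 for local testing through Python's `sqlite3` module. SQLite has no `DATE_SUB`, so the flag is computed as `date(julianday(login_date) - (rn - 1))`. The port also substitutes `ROUND(1.0 * ..., 2)` for the decimal cast and assumes SQLite 3.25+ for `ROW_NUMBER()`. Rows other than the three sample rows are hypothetical.

```python
import sqlite3

# The first three rows come from the sample table; the rest are hypothetical.
ROWS = [
    (101, "2021-09-21 08:00:00"),
    (102, "2021-09-22 09:00:00"),
    (103, "2021-09-23 10:00:00"),
    (101, "2021-09-22 10:00:00"),  # 101 returns the next day
    (104, "2021-10-04 08:00:00"),
    (104, "2021-10-05 08:00:00"),  # 104 returns the next day
    (105, "2021-10-04 09:00:00"),
    (105, "2021-10-04 18:00:00"),  # same-day repeat
    (105, "2021-10-06 09:00:00"),  # skips a day, so not retained
]

# Solution 4 ported to SQLite: runs of >= 2 days whose flag equals first_login.
QUERY = """
SELECT t1.first_login,
       COUNT(DISTINCT t1.user_id) AS register,
       ROUND(1.0 * COUNT(DISTINCT t4.user_id) / COUNT(DISTINCT t1.user_id), 2) AS retention
FROM (
    SELECT user_id, MIN(date(login_ts)) AS first_login
    FROM user_login_detail
    GROUP BY user_id
) t1
LEFT JOIN (
    SELECT user_id,
           date(julianday(login_date) - (rn - 1)) AS flag
    FROM (
        SELECT user_id, login_date,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS rn
        FROM (
            SELECT user_id, date(login_ts) AS login_date
            FROM user_login_detail
            GROUP BY user_id, date(login_ts)
        ) t2
    ) t3
    GROUP BY user_id, date(julianday(login_date) - (rn - 1))
    HAVING COUNT(*) >= 2
) t4
  ON t1.user_id = t4.user_id AND t1.first_login = t4.flag
GROUP BY t1.first_login
ORDER BY t1.first_login
"""


def run(query):
    """Load ROWS into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_login_detail (user_id INTEGER, login_ts TEXT)")
    conn.executemany("INSERT INTO user_login_detail VALUES (?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(QUERY))
# [('2021-09-21', 1, 1.0), ('2021-09-22', 1, 0.0), ('2021-09-23', 1, 0.0), ('2021-10-04', 2, 0.5)]
```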

5. Use a row_number() window to take the first two login dates and diff them

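Solution 5 simplifies solution 4. Keep only each user's first n login records (n = 2 here), then subtract the 1st date from the nth. If the difference is n-1, the user logged in on n consecutive days starting from the registration date. For n > 2, also check that the user actually has n records. Otherwise a user with fewer logins, whose last date happens to fall n-1 days after the first, would be counted. With n = 2 this cannot happen, because a single login gives a difference of 0.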
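A plain-Python sketch of the generalized n-day check, including the record-count guard, is shown below. It assumes the input is one user's list of `datetime.date` values; `consecutive_from_first` is a hypothetical helper name.

```python
from datetime import date


def consecutive_from_first(login_dates, n):
    """True if the user logged in on each of the n days starting at the first login.

    Hypothetical helper: login_dates is one user's list of datetime.date values.
    """
    first_n = sorted(set(login_dates))[:n]  # like ROW_NUMBER() ... WHERE rn <= n
    if len(first_n) < n:
        # Count guard: without it, n = 3 would accept logins on day 0 and day 2 only.
        return False
    # MAX - MIN of n distinct dates equals n - 1 only when no day is skipped.
    return (first_n[-1] - first_n[0]).days == n - 1


print(consecutive_from_first([date(2021, 10, 4), date(2021, 10, 5)], 2))  # True
print(consecutive_from_first([date(2021, 10, 4), date(2021, 10, 6)], 3))  # False
print(consecutive_from_first(
    [date(2021, 10, 4), date(2021, 10, 5), date(2021, 10, 6), date(2021, 10, 9)], 3))  # True
```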

SELECT  first_date                                                                                                           AS first_login,
        COUNT(DISTINCT user_id)                                                                                              AS register,
        cast(COUNT(DISTINCT IF(DATEDIFF(second_date,first_date) = 1,user_id,NULL))/COUNT(DISTINCT user_id) AS decimal(16,2)) AS retention
FROM
(
	SELECT  user_id,
	        MIN(login_date) AS first_date,
	        MAX(login_date) AS second_date
	FROM
	(
		SELECT  user_id,
		        login_date,
		        ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY  login_date ASC) AS rn
		FROM
		(
			SELECT  user_id,
			        date(login_ts) AS login_date
			FROM user_login_detail
			GROUP BY  user_id,
			          date(login_ts)
		) t1
	) t2
	WHERE rn <= 2
	GROUP BY  user_id
) t3
GROUP BY  first_date
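Below is a hedged SQLite port of solution 5, run through Python's `sqlite3` module. The port substitutes `CASE WHEN` for `IF()`, `julianday()` for `DATEDIFF`, and `ROUND(1.0 * ..., 2)` for the decimal cast. It assumes SQLite 3.25+ for `ROW_NUMBER()`. Rows other than the three sample rows are hypothetical.

```python
import sqlite3

# The first three rows come from the sample table; the rest are hypothetical.
ROWS = [
    (101, "2021-09-21 08:00:00"),
    (102, "2021-09-22 09:00:00"),
    (103, "2021-09-23 10:00:00"),
    (101, "2021-09-22 10:00:00"),  # 101 returns the next day
    (104, "2021-10-04 08:00:00"),
    (104, "2021-10-05 08:00:00"),  # 104 returns the next day
    (105, "2021-10-04 09:00:00"),
    (105, "2021-10-04 18:00:00"),  # same-day repeat
    (105, "2021-10-06 09:00:00"),  # skips a day, so not retained
]

# Solution 5 ported to SQLite: MIN/MAX over each user's first two distinct dates.
QUERY = """
SELECT first_date AS first_login,
       COUNT(DISTINCT user_id) AS register,
       ROUND(1.0 * COUNT(DISTINCT CASE WHEN julianday(second_date) - julianday(first_date) = 1
                                       THEN user_id END)
             / COUNT(DISTINCT user_id), 2) AS retention
FROM (
    SELECT user_id, MIN(login_date) AS first_date, MAX(login_date) AS second_date
    FROM (
        SELECT user_id, login_date,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS rn
        FROM (
            SELECT user_id, date(login_ts) AS login_date
            FROM user_login_detail
            GROUP BY user_id, date(login_ts)
        ) t1
    ) t2
    WHERE rn <= 2
    GROUP BY user_id
) t3
GROUP BY first_date
ORDER BY first_date
"""


def run(query):
    """Load ROWS into an in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_login_detail (user_id INTEGER, login_ts TEXT)")
    conn.executemany("INSERT INTO user_login_detail VALUES (?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(QUERY))
# [('2021-09-21', 1, 1.0), ('2021-09-22', 1, 0.0), ('2021-09-23', 1, 0.0), ('2021-10-04', 2, 0.5)]
```

In the hypothetical data, user 105 logs in twice on 2021-10-04, and the date-level `GROUP BY` collapses those into one row. Its first two distinct dates are 10-04 and 10-06, so the difference is 2 and the user is not retained.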