首先明确DAU和留存的定义
DAU即Daily Active User,每日活跃的用户,简称日活
留存通常指用户在某段时间内开始使用应用,经过一段时间后,仍然继续使用该应用的用户,也就是“有多少用户留下来了”。留存用户和留存率体现了应用的质量和保留用户的能力
明确了定义,那我们来尝试解决DAU和次日留存率的问题
运行环境:MySQL8.0版本、SQLyog
首先添加一些虚拟的数据
建表:
CREATE TABLE IF NOT EXISTS yhlc(
dt DATE NOT NULL,
id INT NOT NULL
);
插入数据:
INSERT INTO yhlc VALUES
('2020/8/1','1'),
('2020/8/2','1'),
('2020/8/3','1'),
('2020/8/4','1'),
('2020/8/6','1'),
('2020/8/7','1'),
('2020/8/8','1'),
('2020/8/10','1'),
('2020/8/11','1'),
('2020/8/14','1'),
('2020/8/1','2'),
('2020/8/3','2'),
('2020/8/5','2'),
('2020/8/7','2'),
('2020/8/8','2'),
('2020/8/11','2'),
('2020/8/12','2'),
('2020/8/14','2'),
('2020/8/1','3'),
('2020/8/3','3'),
('2020/8/5','3'),
('2020/8/6','3'),
('2020/8/7','3'),
('2020/8/9','3'),
('2020/8/10','3'),
('2020/8/11','3'),
('2020/8/12','3'),
('2020/8/13','3'),
('2020/8/1','4'),
('2020/8/7','4'),
('2020/8/11','4'),
('2020/8/14','4'),
('2020/8/1','5'),
('2020/8/5','5'),
('2020/8/6','5'),
('2020/8/8','5'),
('2020/8/10','5'),
('2020/8/11','5'),
('2020/8/14','5'),
('2020/8/1','6'),
('2020/8/6','6'),
('2020/8/14','6'),
('2020/8/1','7'),
('2020/8/2','7'),
('2020/8/3','7'),
('2020/8/4','7'),
('2020/8/5','7'),
('2020/8/6','7'),
('2020/8/7','7'),
('2020/8/8','7'),
('2020/8/9','7'),
('2020/8/10','7'),
('2020/8/11','7'),
('2020/8/12','7'),
('2020/8/13','7'),
('2020/8/14','7');
数据只有两列,dt代表登陆日期,id代表用户id,显示了7位用户在8.1到8.14的登录情况
1.计算DAU
用到count函数,按dt分组,计算去重id
SELECT dt,COUNT(DISTINCT id) AS c
FROM yhlc
GROUP BY dt;
2.计算次日的用户留存数
这里可以有两种写法
写法一:用到timestampdiff和count
SELECT t1dt,COUNT(DISTINCT id) AS c
FROM(
SELECT t1.id
,t1.dt AS t1dt
,t2.dt AS t2dt
FROM yhlc t1
JOIN yhlc t2
ON t1.id=t2.id
WHERE TIMESTAMPDIFF(DAY,t1.dt,t2.dt)=1
) t
GROUP BY t1dt;
写法二:用到timestampdiff和count、case when
SELECT t1dt,COUNT(DISTINCT CASE WHEN tsd=1 THEN id ELSE NULL END) AS c
FROM(
SELECT t1.id
,t1.dt AS t1dt
,t2.dt AS t2dt
,TIMESTAMPDIFF(DAY,t1.dt,t2.dt) AS tsd
FROM yhlc t1
JOIN yhlc t2
ON t1.id=t2.id
) t
GROUP BY t1dt;
两种写法的区别不大,写法二显示出了8.14的次日用户数,因为数据截至到8.14,因此8.14的次日用户数为0
而写法一在筛选条件“TIMESTAMPDIFF(DAY,t1.dt,t2.dt)=1”的筛选下,剔除了8.14这一行
3.计算次日留存率
接着上面的两种写法
写法一:
SELECT ttt.dt,c/ca
FROM(
SELECT t1dt
,COUNT(DISTINCT id) AS c
FROM(
SELECT t1.id
,t1.dt AS t1dt
,t2.dt AS t2dt
FROM yhlc t1
JOIN yhlc t2
ON t1.id=t2.id
WHERE TIMESTAMPDIFF(DAY,t1.dt,t2.dt)=1
) t
GROUP BY t1dt
) tt
JOIN(
SELECT dt,COUNT(DISTINCT id) AS ca
FROM yhlc
GROUP BY dt
) ttt
ON tt.t1dt=ttt.dt;
写法二:
SELECT t1dt
,COUNT(DISTINCT CASE WHEN tsd=1 THEN id ELSE NULL END)/COUNT(DISTINCT id) AS c
FROM(
SELECT t1.id
,t1.dt AS t1dt
,t2.dt AS t2dt
,TIMESTAMPDIFF(DAY,t1.dt,t2.dt) AS tsd
FROM yhlc t1
JOIN yhlc t2
ON t1.id=t2.id
) t
GROUP BY t1dt;
3日/7日/N日用户留存率同理
另外如果留存率希望以百分数形式呈现,则需要用到concat和round函数进行嵌套使用
写法一:CONCAT(ROUND((c/ca)*100,2),'%') AS '次日留存率'
写法二:CONCAT(ROUND((COUNT(DISTINCT CASE WHEN tsd=1 THEN id ELSE NULL END)
/COUNT(DISTINCT id))*100,2),'%') AS '次日留存率'写法一的效果
写法二的效果
完成!
如有错误欢迎指正喔!有更好的解法也可以在评论区交流探讨!谢谢!