一段sql完成一个用户留存率计算报表,这就是clickhouse的魅力

在这里插入图片描述

今天有个需求,需要完成这个报表功能,咱们先看看这个报表其实就是需要计算出用户不同时间段的用户留存率,何为留存率呢,先给大家科普一下,就是当天注册的用户,在接下来的时间段里面,还有继续登录,称为留存用户,留存率 = 继续登录的用户/注册的总用户数,结合图解释次留就是今天注册的这批用户,明天还有继续登录的,两者相除就是次留率,同理三留就是今天注册的用户,第三天还有登录的用户数,四留五留等等都是以此类推。知道怎么计算了之后,大致思路就是,先统计出当天注册的总人数,然后在计算这批注册用户每天登录的人数。so easy!

在这里插入图片描述

如果是一年前的我,那我肯定是写个定时任务脚本,每天统计出当天的注册人数,然后在后续每天统计出留存的用户数,最后把数据存到一张留存表,这样查询起来可以直接显示数据,快不少,但是这段时间我们开始使用了clickhouse,我们会在用户注册,登录时候,将数据用队列,落地到clickhouse一份,clickhouse字段比较多,就不全部显示出来了:

在这里插入图片描述

看到这里,估计就很多人都心里有了答案了,知道前面留存率怎么计算了,那我们只需要计算login_date 与register_date 相差的天数,那就是对应的留存数,直接上主要代码:
$keep_day = [1, 3, 4, 5, 6, 7, 14, 21, 28, 30, 60, 90, 120, 150, 180, 210, 240, 270, 
300, 330, 365];
$field = "count(distinct(username)) as register";
foreach ($keep_day as $v) {
         $field .= ",round(count(distinct if(dateDiff('day',register_date,login_date
         )={$v},username,null))/register,4)*100 as day{$v}";
}
$keep_data = DB::connection('stat')->select("select register_date,{$field} from 
raw_logins where {$where_sql} group by register_date order by register_date asc");

#####count(distinct(username)) as register 我们计算出当天注册的人数
#####username因为一天可能多次登录,所以要去重。
#####dateDiff(‘day’,date1,date2) = num dateDiff是clickhouse求时间差的函数,day代表date2与date1相差num 天,注意date2要大于date1,不然num会为负数
#####count(distinct if(cond,username,null)) 这里为啥这么写,因为cond代表条件,为true的时候获取username,否则为空,因为username要去重,所以这样我们就能计算出各个时间段的留存数了。

执行sql:

 - select register_date,count(distinct(username)) as
   register,round(count(distinct 
   if(dateDiff('day',register_date,login_date)=1,username,null))/register,4)*100
   as  day1,round(count(distinct
   if(dateDiff('day',register_date,login_date)=3,username,
   null))/register,4)*100 as day3,round(count(distinct
   if(dateDiff('day',register_date,
   login_date)=4,username,null))/register,4)*100 as
   day4,round(count(distinct 
   if(dateDiff('day',register_date,login_date)=5,username,null))/register,4)*100
   as day5,round(count(distinct
   if(dateDiff('day',register_date,login_date)=6,username,null))/register,4)*100
   as day6,round(count(distinct
   if(dateDiff('day',register_date,login_date)=7,username,null))/register,4)*100
   as day7,round(count(distinct
   if(dateDiff('day',register_date,login_date)=14,username,null))/register,4)*100
   as day14,round(count(distinct
   if(dateDiff('day',register_date,login_date)=21,username,null))/register,4)*100
   as day21,round(count(distinct
   if(dateDiff('day',register_date,login_date)=28,username,null))/register,4)*100
   as day28,round(count(distinct
   if(dateDiff('day',register_date,login_date)=30,username,null))/register,4)*100
   as day30,round(count(distinct
   if(dateDiff('day',register_date,login_date)=60,username,null))/register,4)*100
   as day60,round(count(distinct
   if(dateDiff('day',register_date,login_date)=90,username,null))/register,4)*100
   as day90,round(count(distinct
   if(dateDiff('day',register_date,login_date)=120,username,null))/register,4)*100
   as day120,round(count(distinct
   if(dateDiff('day',register_date,login_date)=150,username,null))/register,4)*100
   as day150,round(count(distinct
   if(dateDiff('day',register_date,login_date)=180,username,null))/register,4)*100
   as day180,round(count(distinct
   if(dateDiff('day',register_date,login_date)=210,username,null))/register,4)*100
   as day210,round(count(distinct
   if(dateDiff('day',register_date,login_date)=240,username,null))/register,4)*100
   as day240,round(count(distinct
   if(dateDiff('day',register_date,login_date)=270,username,null))/register,4)*100
   as day270,round(count(distinct
   if(dateDiff('day',register_date,login_date)=300,username,null))/register,4)*100
   as day300,round(count(distinct
   if(dateDiff('day',register_date,login_date)=330,username,null))/register,4)*100
   as day330,round(count(distinct
   if(dateDiff('day',register_date,login_date)=365,username,null))/register,4)*100
   as day365 from raw_logins where 1 and 1 and 1 and 1 and register_date
   >= '2022-03-16' and register_date <= '2022-08-19' group by register_date order by register_date asc

这样就一段sql,就将整个报表的数据全部计算出来了,相比之前用定时脚本去统计mysql,效率也高了不少。个人感觉还是蛮实用的,希望能对大家有帮助。
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

大力水手z

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值