Computing User Retention Rate in Hive
SQL Data Analysis, 1 week ago
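Goal: compute user retention rate, covering multiple modules and multiple dates in a single query.
Previously I used the approach from "SQL: Monthly User Retention Rate", which puts an inequality condition in the left join.
However, the Hive I am using does not support inequality conditions in the ON clause of a left join (older Hive releases only accept equality joins there).
So retention is computed a different way: self-join on user_id only, then use datediff() inside a conditional count(distinct ...) to pick out users who came back N days later.
The code is as follows: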
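with da_user as (
    -- one row per (day, user, project): deduplicated daily active users
    select
        -- convert yyyyMMdd to yyyy-MM-dd so that datediff() can parse it
        from_unixtime(unix_timestamp(ds,'yyyyMMdd'),'yyyy-MM-dd') as ds
        ,user_id
        -- \\d+ must be double-escaped inside a Hive string literal
        ,regexp_extract(args,'project_id=(\\d+)',1) as project_id
    from ods_view_ypp.ods_all_mobile_log log
    where ds between '20190523' and '20190621'
        and app_id = 100
        -- regexp_extract returns a string, so compare against a string
        and regexp_extract(args,'project_id=(\\d+)',1) = '1034'
    group by
        from_unixtime(unix_timestamp(ds,'yyyyMMdd'),'yyyy-MM-dd')
        ,user_id
        ,regexp_extract(args,'project_id=(\\d+)',1)
)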
-- a..d: day-1 to day-4 retention, formatted as "<rate>% | <retained users>"
select ds
    ,total_cnt
    ,concat_ws('% | ', cast(round(diff_1cnt*100/total_cnt, 2) as string), cast(diff_1cnt as string)) a
    ,concat_ws('% | ', cast(round(diff_2cnt*100/total_cnt, 2) as string), cast(diff_2cnt as string)) b
    ,concat_ws('% | ', cast(round(diff_3cnt*100/total_cnt, 2) as string), cast(diff_3cnt as string)) c
    ,concat_ws('% | ', cast(round(diff_4cnt*100/total_cnt, 2) as string), cast(diff_4cnt as string)) d
from (
    select
        t1.ds
        -- users active on day t1.ds
        ,count(distinct t1.user_id) as total_cnt
        -- of those, users also active exactly N days later
        ,count(distinct if(datediff(t2.ds,t1.ds)=1,t1.user_id,null)) as diff_1cnt
        ,count(distinct if(datediff(t2.ds,t1.ds)=2,t1.user_id,null)) as diff_2cnt
        ,count(distinct if(datediff(t2.ds,t1.ds)=3,t1.user_id,null)) as diff_3cnt
        ,count(distinct if(datediff(t2.ds,t1.ds)=4,t1.user_id,null)) as diff_4cnt
    from da_user t1
    -- equality-only self-join: pairs each active day with all of the same user's active days
    left join da_user t2
        on (t1.user_id = t2.user_id)
    group by t1.ds
) t
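
To see how the equality join plus datediff() counting behaves without access to the log table, here is a minimal sketch on inline toy data. The users (u1, u2) and their active dates are made up for illustration, and the sketch assumes a Hive version that allows select without a from clause inside union all.

```sql
-- Toy data (hypothetical): one row per user per active day.
with da_user as (
    select '2019-05-23' as ds, 'u1' as user_id
    union all select '2019-05-23', 'u2'
    union all select '2019-05-24', 'u1'
    union all select '2019-05-25', 'u2'
)
select
    t1.ds
    ,count(distinct t1.user_id) as total_cnt
    -- users active on t1.ds who are also active exactly 1 day later
    ,count(distinct if(datediff(t2.ds, t1.ds) = 1, t1.user_id, null)) as diff_1cnt
    -- users active on t1.ds who are also active exactly 2 days later
    ,count(distinct if(datediff(t2.ds, t1.ds) = 2, t1.user_id, null)) as diff_2cnt
from da_user t1
left join da_user t2
    on t1.user_id = t2.user_id
group by t1.ds;
-- Expected rows (row order is not guaranteed):
-- 2019-05-23  2  1  1
-- 2019-05-24  1  0  0
-- 2019-05-25  1  0  0
```

On 2019-05-23 both users are active; u1 returns one day later and u2 two days later, so each counter is 1. Because the join pairs every active day of a user with every other active day of that user, the intermediate result grows with the square of active days per user, which is worth keeping in mind for long date ranges.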
The result of the query is shown in the figure below:
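
The query above is restricted to a single project_id. For the multi-module case mentioned at the top, one possible extension (a sketch, not the original author's query) is to add project_id to both the join condition and the group by. The toy rows and the second project id 1035 below are hypothetical.

```sql
-- Toy data (hypothetical users, dates, and project ids).
with da_user as (
    select '2019-05-23' as ds, 'u1' as user_id, '1034' as project_id
    union all select '2019-05-24', 'u1', '1034'
    union all select '2019-05-23', 'u1', '1035'
    union all select '2019-05-23', 'u2', '1035'
    union all select '2019-05-24', 'u2', '1035'
)
select
    t1.project_id
    ,t1.ds
    ,count(distinct t1.user_id) as total_cnt
    ,count(distinct if(datediff(t2.ds, t1.ds) = 1, t1.user_id, null)) as diff_1cnt
from da_user t1
left join da_user t2
    on t1.user_id = t2.user_id
    and t1.project_id = t2.project_id   -- keep retention within the same module
group by t1.project_id, t1.ds;
-- Expected rows (row order is not guaranteed):
-- 1034  2019-05-23  1  1
-- 1034  2019-05-24  1  0
-- 1035  2019-05-23  2  1
-- 1035  2019-05-24  1  0
```

Without the extra project_id condition in the join, u1's visit to project 1034 on 2019-05-24 would be counted as day-1 retention for project 1035, inflating that row's diff_1cnt to 2.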
References:
[Hive: computing user retention rate - chenpe32cp's blog - CSDN Blog](https://blog.csdn.net/chenpe32cp/article/details/85068184)
[[Hive] Computing user retention rate - zzhangyuhang - cnblogs](https://www.cnblogs.com/zzhangyuhang/p/9884967.html)