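cast() in Hive: computing user retention rate with Hive

Computing user retention rate with Hive
SQL数据分析 (SQL Data Analysis), 1 week ago
Goal: compute user retention rates, covering multiple modules and multiple dates in a single query.
Previously I used the approach from "SQL-用户月留存率" (SQL: monthly user retention rate), which puts an inequality condition in the left join.
However, Hive does not support inequality conditions in a left join (versions before 2.2.0 accept only equality conditions in the ON clause),
so retention is computed a different way here: join on user_id alone, then pick out each day offset with datediff() inside the aggregate.
The code is as follows: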
-- Step 1: one row per (day, user, project) for the events of interest.
with da_user as (
    select
        -- convert 'yyyyMMdd' to 'yyyy-MM-dd' so that datediff() can parse it
        from_unixtime(unix_timestamp(ds,'yyyyMMdd'),'yyyy-MM-dd') as ds
        ,user_id
        -- the backslash must be doubled inside a Hive string literal
        ,regexp_extract(args,'project_id=(\\d+)',1) as project_id
    from ods_view_ypp.ods_all_mobile_log log
    where ds between '20190523' and '20190621'
        and app_id = 100
        -- regexp_extract returns a string, so compare against a string
        and regexp_extract(args,'project_id=(\\d+)',1) = '1034'
    -- group by is used only to deduplicate the rows
    group by
        from_unixtime(unix_timestamp(ds,'yyyyMMdd'),'yyyy-MM-dd')
        ,user_id
        ,regexp_extract(args,'project_id=(\\d+)',1)
)
-- Step 3: format each column as "<retention rate>% | <retained users>".
select ds
    ,total_cnt
    ,concat_ws('% | ', cast(round(diff_1cnt*100/total_cnt, 2) as string), cast(diff_1cnt as string)) a
    ,concat_ws('% | ', cast(round(diff_2cnt*100/total_cnt, 2) as string), cast(diff_2cnt as string)) b
    ,concat_ws('% | ', cast(round(diff_3cnt*100/total_cnt, 2) as string), cast(diff_3cnt as string)) c
    ,concat_ws('% | ', cast(round(diff_4cnt*100/total_cnt, 2) as string), cast(diff_4cnt as string)) d
from(
    -- Step 2: self-join on user_id only (an equality condition), then count
    -- the users who were also active exactly N days after t1.ds.
    select
        t1.ds
        ,count(distinct t1.user_id) as total_cnt
        ,count(distinct if(datediff(t2.ds,t1.ds)=1,t1.user_id,null)) as diff_1cnt
        ,count(distinct if(datediff(t2.ds,t1.ds)=2,t1.user_id,null)) as diff_2cnt
        ,count(distinct if(datediff(t2.ds,t1.ds)=3,t1.user_id,null)) as diff_3cnt
        ,count(distinct if(datediff(t2.ds,t1.ds)=4,t1.user_id,null)) as diff_4cnt
    from da_user t1
    left join da_user t2
        on (t1.user_id = t2.user_id)
    group by t1.ds
)t
The result is shown in the figure below:

01ade30eda67091e936897f79020fa20.png
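
To make the self-join logic easier to check by hand, here is a minimal sketch that runs the same inner aggregation on a few inline rows. The sample rows are hypothetical (they are not from the original log table), and the sketch assumes a Hive version that allows `select` without a `from` clause (0.13 or later). Only the day-1 and day-2 columns are kept to stay short.

```sql
-- Hypothetical sample data: one row per (ds, user_id), already deduplicated.
with da_user as (
    select '2019-05-23' as ds, 1 as user_id
    union all select '2019-05-24', 1
    union all select '2019-05-25', 1
    union all select '2019-05-23', 2
    union all select '2019-05-25', 2
    union all select '2019-05-24', 3
)
select
    t1.ds
    ,count(distinct t1.user_id) as total_cnt
    -- users active on t1.ds who were also active exactly 1 day later
    ,count(distinct if(datediff(t2.ds,t1.ds)=1,t1.user_id,null)) as diff_1cnt
    -- users active on t1.ds who were also active exactly 2 days later
    ,count(distinct if(datediff(t2.ds,t1.ds)=2,t1.user_id,null)) as diff_2cnt
from da_user t1
left join da_user t2
    on (t1.user_id = t2.user_id)
group by t1.ds
order by ds
-- Expected rows (ds, total_cnt, diff_1cnt, diff_2cnt):
-- 2019-05-23  2  1  2
-- 2019-05-24  2  1  0
-- 2019-05-25  2  0  0
```

For 2019-05-23, users 1 and 2 are active; only user 1 returns the next day (day-1 retention 1/2 = 50%), while both return two days later (day-2 retention 2/2 = 100%). Note that the join pairs every active day of a user with every other active day of the same user, so the intermediate result grows with the square of each user's active days; keeping the date window narrow, as the `where ds between ...` filter does, keeps that manageable.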


References:
[Hive: computing user retention rate - chenpe32cp's blog - CSDN Blog](https://blog.csdn.net/chenpe32cp/article/details/85068184)
[[hive] Computing user retention rate - zzhangyuhang - cnblogs](https://www.cnblogs.com/zzhangyuhang/p/9884967.html)
