使用hive计算用户留存率。
问题描述:留存率是一个常见的数据分析需求,即某日新增用户,计算这些用户在后5天内的获取数。
使用到的hive函数:
datediff:返回两个日期相差的天数,日期格式为 yyyy-MM-dd
unix_timestamp:将指定格式的字符串类型日期转换成时间戳
from_unixtime:将时间戳转换成指定格式的字符串日期
参数博客:https://blog.csdn.net/chenpe32cp/article/details/85068184
代码实现:
select
dt,
count(1) as total_new_user,
sum(rank_1) as rank_1,
sum(rank_2) as rank_2,
sum(rank_3) as rank_3,
sum(rank_4) as rank_4,
sum(rank_5) as rank_5
from
(select
a.device_id,
a.dt,
if(datediff(b.dt,a.dt)=1,1,0) as rank_1,
if(datediff(b.dt,a.dt)=2,1,0) as rank_2,
if(datediff(b.dt,a.dt)=3,1,0) as rank_3,
if(datediff(b.dt,a.dt)=4,1,0) as rank_4,
if(datediff(b.dt,a.dt)=5,1,0) as rank_5
from
(select device_id,from_unixtime(unix_timestamp(dt, 'yyyyMMdd'), 'yyyy-MM-dd') as dt from new_user_table where dt>='20200801' and dt<='20200805' and is_new_user='1') a --新增用户
left join
(select device_id,from_unixtime(unix_timestamp(dt, 'yyyyMMdd'), 'yyyy-MM-dd') as dt from active_user_table where dt>='20200802' and dt<='20200810' and is_day_active='1') b --每日活跃用户
on a.device_id=b.device_id) a
group by dt order by dt;