day 1
https://zhuanlan.zhihu.com/p/80905376
1、用户行为分析
(1) 取某一天查看用户资料行为(event_name=profile.index)的20,50,80分位点
思路:分位数采用 percentile_approx(),where定位某一天某一用户
实现:
select percentile_approx(event_num_map, array(0.2,0.5,0.8),9999)
from event_summary
where Partition_date = ${
date}
and event_name = profile.index
(2) 查看用户资料行为的不同行为频次的用户分布
思路:
event_summary: user_id 一个行为 行为次数
连接表,主表是某时间区间的用户id,一个事件为一条记录,所以id会有大量重复;
次表定义每个用户每个行为的行为频次字段。之所以用到两个表是因为占比字段需要用到主表
实现:
select t2."行为频次", count(distinct t2.user_id),
count(distinct t2.user_id)/count(t1.user_id),
sum(count(distinct t2.user_id)) over(partition by t2."行为频次")
from (select User_id from event_summary
where Partition_date between ${begin_date} and ${end_date}) t1
left join (select user_id,
(case when event_num_map between 0 and 5 then "0-5"
case when event_num_map between 5 and 15 then "5-15"
else "其他" end ) as "行为频次"
from event_summary
where Partition_date between ${begin_date} and ${end_date}) t2
on t1.user_id = t2.user_id
group by t2.行为频次
2、流失用户分析
思路:复写表,表1输出某一天活跃的用户表,表2输出不活跃用户表,两者连接后,按日期group by,其日期差值就是流失的天数
题目感觉条件不足,先不考虑了
代码:
select Partition_date,case when datediff(t2.Partition_date,t1.Partition_date)=1 then count(distinct t2.user_id) end) as '流失1天',
case when datediff(t2.Partition_date,t1.Partition_date)=2 then count(distinct t1.user_id) end) as '流失2天',
case when datediff(t2.Partition_date,t1.Partition_date)=3 then count(distinct t1.user_id) end) as '流失3天',
case when datediff(t2.Partition_date,t1.Partition_date)>=30 then count(distinct t1.user_id) end) as '流失30天以上',
from (
select user_id,Partition_date
from usre_active
where daily_active_status_map=1
)t1
left join (
select user_id,Partition_date
from