表:
用户行为日志表:dwd.event_app_detail_test;
需求:
定义一个漏斗分析出每个层级的人数
漏斗模型定义:
步骤1 ==> event_a,p1=v1
步骤2 ==> event_c,p2=v2
步骤3 ==> event_e,p3=v3
思路:
将用户的所有事件按照时间先后顺序,拼接成一个字符串,然后用正则模式去匹配,它满足到了第几个漏斗步骤
实现:
先将用户的事件按照时间顺序拼接到一起
SELECT
guid,
concat_ws(':',sort_array(collect_list(concat_ws(',',cast(ts as string),eventid)))) as events
FROM dwd.event_app_detail_test
WHERE (eventid='event_a' AND properties['p1']='v1') OR (eventid='event_c' AND properties['p2']='v2') OR (eventid='event_e' AND properties['p3']='v3')
GROUP BY guid
最终SQL
with tmp as(
SELECT
guid,
concat_ws(':',sort_array(collect_list(concat_ws(',',cast(ts as string),eventid)))) as events
FROM dwd.event_app_detail_test
WHERE (eventid='event_a' AND properties['p1']='v1') OR (eventid='event_c' AND properties['p2']='v2') OR (eventid='event_e' AND properties['p3']='v3')
GROUP BY guid
)
select
guid,
case
when (regexp_extract(events,'.*?(event_a).*?(event_c).*?(event_e).*?',3)='event_e') then 3
when (regexp_extract(events,'.*?(event_a).*?(event_c).*?',2)='event_c') then 2
when (regexp_extract(events,'.*?(event_a).*?',1)='event_a') then 1
end as step_num --执行到漏斗的步骤
from tmp
;
结果:
注意:case里面匹配步骤的顺序