需求背景:
业务方要求算出沉浸式用户和列表式用户,其他用户,在做的过程中发现日志中很多用户标识都对应多个类型,例如,一个用户既有沉浸式打点,也有列表式打点,还有其他的打点。现在要求对表中的数据进行归一化处理。
要求:
一个id 只能对应一个detail_page_type,处理规则:main_immersive对应沉浸式用户,main 对应列表式用户,剩下的都属于其他用户。如果一个id 同时存在main_immersive和main 就属于沉浸式用户。
测试数据:
id detail_page_type
1 main_immersive
1 main
1 other
1 lange
2 main
3 main
2 main_immersive
3 other
4 other
4 lange
我的处理方案:
使用concat_ws 先把列转行,一行一个id对应多个detail_page_type,然后使用instr() 函数做判断是否存在该字符。
代码:
select id,case when instr(detail_page_type,"main_immersive") > 0 then 'main_immersive'
when instr(detail_page_type,"main_immersive") = 0 and instr(detail_page_type,"main") > 0 then 'main'
when instr(detail_page_type,"main_immersive") = 0 and instr(detail_page_type,"main") = 0 then 'other'
end as detail_page_type
from (
--- 使用concat_ws 列转行
select id,concat_ws(',',collect_set(detail_page_type)) as detail_page_type
from (
select
id,
detail_page_type
FROM
tmp
WHERE
event_day = '20220829’
group by id,detail_page_type
) res
group by id
) t