42、规定若同一用户的相邻两次访问记录时间间隔小于60s,则认为两次浏览记录属于同一会话。现有如下需求,为属于同一会话的访问记录增加一个相同的会话id字段,会话id格式为"user_id-number",其中number从1开始,用于区分同一用户的不同会话
期望结果如下:
user_id | page_id | view_timestamp | session_id |
100 | home | 1659950435 | 100-1 |
100 | good_search | 1659950446 | 100-1 |
100 | good_list | 1659950457 | 100-1 |
100 | home | 1659950541 | 100-2 |
100 | good_detail | 1659950552 | 100-2 |
100 | cart | 1659950563 | 100-2 |
101 | home | 1659950435 | 101-1 |
101 | good_search | 1659950446 | 101-1 |
101 | good_list | 1659950457 | 101-1 |
101 | home | 1659950541 | 101-2 |
101 | good_detail | 1659950552 | 101-2 |
101 | cart | 1659950563 | 101-2 |
102 | home | 1659950435 | 102-1 |
102 | good_search | 1659950446 | 102-1 |
102 | good_list | 1659950457 | 102-1 |
103 | home | 1659950541 | 103-1 |
103 | good_detail | 1659950552 | 103-1 |
现有页面浏览记录表(page_view_events)如下,表中有每个用户的每次页面访问记录。
user_id | page_id | view_timestamp |
100 | home | 1659950435 |
100 | good_search | 1659950446 |
100 | good_list | 1659950457 |
100 | home | 1659950541 |
100 | good_detail | 1659950552 |
100 | cart | 1659950563 |
101 | home | 1659950435 |
101 | good_search | 1659950446 |
101 | good_list | 1659950457 |
101 | home | 1659950541 |
101 | good_detail | 1659950552 |
101 | cart | 1659950563 |
102 | home | 1659950435 |
102 | good_search | 1659950446 |
102 | good_list | 1659950457 |
103 | home | 1659950541 |
103 | good_detail | 1659950552 |
103 | cart | 1659950563 |
代码
with t as ( select -- 用户 页面 时间戳 上一个时间戳 两个时间戳之差 user_id ,page_id ,view_timestamp ,lag(view_timestamp,1,view_timestamp) over(partition by user_id order by view_timestamp) as lag_view_timestamp ,cast(view_timestamp - lag(view_timestamp,1,view_timestamp) over(partition by user_id order by view_timestamp) as int) as time_diff from page_view_events ) , t2 as ( -- 用户 页面 时间戳 上一个时间戳 时间戳之差 会话分组标志 累加会话分组标志 select user_id ,page_id ,view_timestamp ,lag_view_timestamp ,time_diff ,if(time_diff >=60,1,0) as group_flag ,sum(if(time_diff >=60,1,0)) over(partition by user_id order by view_timestamp)+1 as number from t ) select user_id ,page_id ,view_timestamp ,concat(user_id,'-',number) as session_id from t2