Raw data: the field is the referer (column http_referer)
1. Split the URL in the table (stored as a quoted string) and parse out fields such as PROTOCOL, HOST, PATH and QUERY
create table t_ods_tmp_referurl as
select a.*, b.*
from ods_weblog_origin as a
lateral view parse_url_tuple(
    regexp_replace(http_referer, "\"", ""),   -- strip the double quotes before parsing
    'HOST', 'PATH', 'QUERY', 'QUERY:id'        -- parts to extract; QUERY:id = value of the id parameter
) b as host, path, query, query_id;
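Explanation:
lateral view parse_url_tuple(url, 'HOST', 'PATH', 'QUERY', 'QUERY:user_id', 'QUERY:platform') b as host, path, query, user_id, platform: parses several parts of a URL in a single pass, one output column per requested part
regexp_replace(column, pattern, replacement): regex-based string replacement
Steps: first remove the double quotes (replace " with an empty string), because parse_url_tuple() cannot parse a URL that is wrapped in quotes; then parse all the URL parts in one pass.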
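To see why the quotes have to go before parsing, here is a small Python sketch that emulates the two steps locally. It is only an approximation: urllib.parse stands in for the URL parser Hive uses, so edge cases such as ports, percent-encoding and empty parts may differ. The function names parse_url_parts and parse_referer and the sample URL are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def parse_url_parts(url, query_keys=("id",)):
    """Approximate parse_url_tuple(url, 'HOST', 'PATH', 'QUERY', 'QUERY:<key>', ...).

    Assumption: urllib.parse is used as a stand-in for Hive's parser,
    so results are illustrative, not identical to Hive's output.
    """
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        # Not a well-formed absolute URL: Hive yields NULLs, mimicked here with None.
        return dict.fromkeys(["host", "path", "query"] + [f"query_{k}" for k in query_keys])
    params = parse_qs(parts.query)
    row = {"host": parts.hostname, "path": parts.path, "query": parts.query or None}
    for key in query_keys:
        # QUERY:<key> -> first value of that query parameter, or None if absent
        values = params.get(key)
        row[f"query_{key}"] = values[0] if values else None
    return row


def parse_referer(http_referer, query_keys=("id",)):
    # Step 1: strip the double quotes, like regexp_replace(http_referer, "\"", "")
    cleaned = http_referer.replace('"', "")
    # Step 2: extract all requested parts from the cleaned URL in one call
    return parse_url_parts(cleaned, query_keys)
```

For a raw value such as "http://www.example.com/course/list?id=42&page=2" (quotes included), parse_url_parts returns only None values because the leading quote hides the scheme, while parse_referer yields host www.example.com, path /course/list and query_id 42.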
2. Split the time field
Field name: time_local
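-- assumes time_local is formatted as yyyy-MM-dd HH:mm:ss; Hive substring positions are 1-based
select b.*,
       substring(time_local, 1, 10) as daystr,  -- yyyy-MM-dd
       substring(time_local, 12)    as tmstr,   -- HH:mm:ss (position 11 is the space separator)
       substring(time_local, 6, 2)  as month,
       substring(time_local, 9, 2)  as day,
       substring(time_local, 12, 2) as hour
from t_ods_tmp_referurl b;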
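Because Hive's substring positions are 1-based, these offsets are easy to get wrong. The Python sketch below reproduces them locally. hive_substr is a hypothetical helper that mimics Hive's rule for non-negative positions (a start position of 0 behaves like 1), and the sample value assumes the yyyy-MM-dd HH:mm:ss format. The time part is taken from position 12, since position 11 is the space separator.

```python
def hive_substr(s, pos, length=None):
    """Mimic Hive substring(s, pos[, length]) for 0 <= pos <= len(s) (1-based positions)."""
    if pos < 0:
        # Hive counts negative positions from the end; not covered by this sketch.
        raise ValueError("negative positions are not handled here")
    start = max(pos - 1, 0)  # Hive treats position 0 like position 1
    end = len(s) if length is None else start + length
    return s[start:end]


def split_time(time_local):
    # Assumption: time_local looks like "yyyy-MM-dd HH:mm:ss"
    return {
        "daystr": hive_substr(time_local, 1, 10),
        "tmstr": hive_substr(time_local, 12),
        "month": hive_substr(time_local, 6, 2),
        "day": hive_substr(time_local, 9, 2),
        "hour": hive_substr(time_local, 12, 2),
    }
```

For example, split_time("2013-09-18 06:49:18") gives daystr 2013-09-18, tmstr 06:49:18, month 09, day 18 and hour 06, whereas hive_substr("2013-09-18 06:49:18", 11) returns " 06:49:18" with a leading space.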