insert overwrite table dwd_base_event_log partition(dt='2020-03-10')
select
base_analizer(line,'mid') as mid_id,
base_analizer(line,'uid') as user_id,
base_analizer(line,'vc') as version_code,
base_analizer(line,'vn') as version_name,
base_analizer(line,'l') as lang,
base_analizer(line,'sr') as source,
base_analizer(line,'os') as os,
base_analizer(line,'ar') as area,
base_analizer(line,'md') as model,
base_analizer(line,'ba') as brand,
base_analizer(line,'sv') as sdk_version,
base_analizer(line,'g') as gmail,
base_analizer(line,'hw') as height_width,
base_analizer(line,'t') as app_time,
base_analizer(line,'nw') as network,
base_analizer(line,'ln') as lng,
base_analizer(line,'la') as lat,
event_name,
event_json,
base_analizer(line,'st') as server_time
from ods_event_log lateral view flat_analizer(base_analizer(line,'et')) tmp_flat as event_name,event_json
where dt='2020-03-10' and base_analizer(line,'et')<>'';
以此为例,将ods_event_log表的‘2020-03-10’分区数据的 并且通过UDF base_analizer()解析key为 ‘et’的字段不为空 的数据插入 表dwd_base_event_log 的2020-03-10分区。
base_analizer(): UDF 目的将原始数据解析成json,再根据传入的参数解析出公共字段mid uid cv 等
flat_analizer(): UDTF 目的解析出数据中含有的事件字段en ,即一条ods_event_log字段包含由多个en字段组成的json集合。而此处是将其全部解析成单个。[1,2,3]-> 1,2,3
lateral view flat_analizer(base_analizer(line,‘et’)) tmp_flat as event_name,event_json
lateral view : lateral view用于和split, explode等UDTF一起使用,它能够将一行数据拆成多行数据,在此基础上可以对拆分后的数据进行聚合。lateral view首先为原始表的每行调用UDTF,UTDF会把一行拆分成一或者多行,lateral view再把结果组合,产生一个支持别名表的虚拟表。
此句话目的是将
NAME | SCORE |
---|---|
小明 | [1,2,3] |
小花 | [7,7,7] |
转成
NAME | SCORE |
---|---|
小明 | 1 |
小明 | 2 |
小明 | 3 |
小花 | 7 |
小花 | 7 |
小花 | 7 |