UDTF(User-Defined Table-Generating Functions) 用来解决输入一行输出多行(On-to-many maping) 的需求。Explode也是拆列函数,比如Explode (ARRAY) ,array中的每个元素生成一行。
**explode的使用**
001,allen,usa|china|japan,1|3|6
002,kobe,usa|england|japan,2|3|4
创建表
create table test_message(id int, name string, location array<string>,city array<int>) row format delimited fields terminated by "," collection items terminated by '|';
导入数据
hive> load data local inpath '/export/servers/hive/explode.txt' into table test_message;
执行查询
hive> select * from test_message;
1 allen ["usa","china","japan"] [1,3,6]
2 kobe ["usa","england","japan"] [2,3,4]
使用explode
hive> select explode(location) from test_message;
OK
usa
china
japan
usa
england
japan
Time taken: 0.126 seconds, Fetched: 6 row(s)
问题:当使用UDTF函数的时候,hive只允许对拆分字段进行查询。
解决这个问题就需要使用lateral view来解决了。lateral view 也叫explode的好搭档
了解lateral view(侧视图)
lateral view为侧视图,意义是为了配合UDTF来使用,把某一行数据拆分成多行数据。不加lateral view的UDTF只能提取单个拆分的字段,并不能与原来的表字段一起查询。加上lateral view就可以将拆分的字段与原始表关联上。
在使用lateral view的时候需要指定试图别名和生成的新的列别名
语法
tableA lateral view UDTF(xxx) 试图别名(虚拟表) as a,b,c
hive> select subview.* from test_message lateral view explode(location) subview as locations
OK
usa
china
japan
usa
england
japan
Time taken: 0.312 seconds, Fetched: 6 row(s)
hive> select name,subview.* from test_message lateral view explode(location) subview as locations;
OK
allen usa
allen china
allen japan
kobe usa
kobe england
kobe japan
Time taken: 0.57 seconds, Fetched: 6 row(s)
一行拆多行实现
insert into table dw_weblog_detail partition(datestr='20181101')
select c.valid,c.remote_addr,c.remote_user,c.time_local,
substring(c.time_local,0,10) as daystr,
substring(c.time_local,12) as tmstr,
substring(c.time_local,6,2) as month,
substring(c.time_local,9,2) as day,
substring(c.time_local,12,2) as hour,
c.request,c.status,c.body_bytes_sent,c.http_referer,c.ref_host,c.ref_path,c.ref_query,c.ref_query_id,c.http_user_agent
from
(SELECT
a.valid,a.remote_addr,a.remote_user,a.time_local,
a.request,a.status,a.body_bytes_sent,a.http_referer,a.http_user_agent,b.ref_host,b.ref_path,b.ref_query,b.ref_query_id
FROM ods_weblog_origin a LATERAL VIEW parse_url_tuple(regexp_replace(http_referer, "\"", ""), 'HOST', 'PATH','QUERY', 'QUERY:id') b as ref_host, ref_path, ref_query, ref_query_id) c;
实现查询功能,问题完美解决,有帮助请不要吝啬你的赞!