hive 中通过lateral view 与explode 实现行转列功能
explode作用是处理map结构的字段,使用案例如下(hive自带map,struct,array字段类型):
drop table if exists XX;
create table XX(
area string
,goods_id string
,sale_info string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
-- NULL DEFINED AS 'null'
STORED AS textfile
;
导入数据:
area|goods_id|sale_info
a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]
1. explode的使用:
1.1 拆解array字段(goods_id)
select explode(
split(goods_id,',')
) as goods_id
from XX;
结果如下
1.2 拆解map字段(area)
select explode(
split(area,',')
) as area
from XX;
结果如下:
1.3 拆解json字段(sale_info)
这个时候要配合一下get_json_object
我们想获取所有的monthSales,第一步我们先把这个字段拆成list,并且拆成行展示:
select explode(
split(
regexp_replace(
regexp_replace(sale_info,'\\[\\{',''), --删除开头的[{
'}]',''), --删除结尾的}]
'},\\{') --每个元素以},{作为分隔符进行分割
) as sale_info
from XX;
2. LATERAL VIEW的使用:
侧视图的意义是配合explode(或者其他的UDTF),一个语句生成把单行数据拆解成多行后的数据结果集。
select goods_id2
,sale_info
from XX
LATERAL VIEW explode(
split(goods_id,',')
)goods as goods_id2;
其中LATERAL VIEW explode(split(goods_id,','))goods相当于一个虚拟表,与原表XX笛卡尔积关联。
也可以多重使用
select goods_id2
,sale_info
,area2
from XX
LATERAL VIEW explode(
split(goods_id,',')
)goods as goods_id2
LATERAL VIEW explode(
split(area,',')
)area as area2;
也是三个表笛卡尔积的结果
现在我们从sale_info字段中找出所有的monthSales并进行展示
select get_json_object(
concat('{',sale_info_r,'}'),'$.monthSales') as monthSales
from XX
LATERAL VIEW explode(
split(
regexp_replace(
regexp_replace(sale_info,'\\[\\{',''),'}]',''),
'},\\{')
)sale_info
as sale_info_r;
最后,把这个json格式的一行数据,完全转换成二维表的方式展现
select get_json_object(
concat('{',sale_info_1,'}'),'$.source') as source,
get_json_object(concat('{',sale_info_1,'}'),'$.monthSales') as monthSales,
get_json_object(concat('{',sale_info_1,'}'),'$.userCount') as monthSales,
get_json_object(concat('{',sale_info_1,'}'),'$.score') as monthSales
from XX
LATERAL VIEW explode(
split(
regexp_replace(
regexp_replace(sale_info,'\\[\\{',''),
'}]',''),
'},\\{')
)sale_info as sale_info_1;
-----------------------------
拓展:
uid string
value_list array<string>
2979156750204928 ["aa-1-2.0-4-4","bb-0-1.0-2","cc-1-0.22-4"]
select a.uid
,split(b.value_info,'-')[0] as value0
,split(b.value_info,'-')[2] as value2
from XX a
-- 可直接将array数组形式的value_list 展开,形成虚拟表b, 并新字段命名为value_info
lateral view explode(a.value_list)b as value_info
;