hive explode和LateralView的使用,创建包含数组类型字段的表
LateralView
点击查看LateralView用法
lateralview 与用户定义的表生成功能(例如)结合使用explode()
。如内置表生成函数中所述,UDTF为每个输入行生成零个或多个输出行。lateralview 首先将UDTF应用于基础表的每一行,然后将结果输出行与输入行连接起来以形成具有提供的表别名的虚拟表。
Lateral View 用法
lateralView: LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*
fromClause: FROM baseTable (lateralView)*
explode (array)
select explode(array('A','B','C'));
select explode(array('A','B','C')) as col;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf;
select tf.* from (select 0) t lateral view explode(array('A','B','C')) tf as col;
col |
---|
A |
B |
C |
explode (map)
select explode(map('A',10,'B',20,'C',30));
select explode(map('A',10,'B',20,'C',30)) as (key,value);
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf;
select tf.* from (select 0) t lateral view explode(map('A',10,'B',20,'C',30)) tf as key,value;
key | value |
---|---|
A | 10 |
B | 20 |
C | 30 |
创造测试数据测试Lateral View和explode
创建一个包含数组类型字段的表,且格式为textfile
字段间分隔符为空格,数组元素间分隔符为逗号
create table xtable(name string,age string,subject array<string>) row format delimited fields terminated by ' ' collection items terminated by ',' stored as textfile;
查看表所在位置
0: jdbc:hive2://hadoop91:10000> desc formatted xtable;
OK
| Location: | hdfs://hadoop90:9000/user/hive/warehouse/xtable | NULL |
创造数据
vi xtable.txt
# 存入以下数据
xhx 15 math,english,history
bjx 20 physical,biological
将数据加载到表中
[root@hadoop91 ~]# hdfs dfs -put /root/xtable.txt hdfs://hadoop90:9000/user/hive/warehouse/xtable/
查看表
0: jdbc:hive2://hadoop91:10000> select * from xtable;
+--------------+-------------+-------------------------------+--+
| xtable.name | xtable.age | xtable.subject |
+--------------+-------------+-------------------------------+--+
| xhx | 15 | ["math","english","history"] |
| bjx | 20 | ["physical","biological"] |
+--------------+-------------+-------------------------------+--+
查询中添加一个explode
0: jdbc:hive2://hadoop91:10000> select explode(subject) from xtable;
+-------------+--+
| col |
+-------------+--+
| math |
| english |
| history |
| physical |
| biological |
+-------------+--+
5 rows selected (0.402 seconds)
如果想要把name和id也查出来,则结果如下,报错了
0: jdbc:hive2://hadoop91:10000> select name,age,explode(subject) from xtable;
Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)
所以这个时候就需要用到Lateral View了
0: jdbc:hive2://hadoop91:10000> select name,age,subcol from xtable lateral view explode(subject) subtable as subcol;
+-------+------+-------------+--+
| name | age | subcol |
+-------+------+-------------+--+
| xhx | 15 | math |
| xhx | 15 | english |
| xhx | 15 | history |
| bjx | 20 | physical |
| bjx | 20 | biological |
+-------+------+-------------+--+
5 rows selected (0.302 seconds)
outer 的使用
但是有一个问题,如果explode的列如果为空,没有值,那么在这种情况下,源行永远也不会出现在结果中, 可以使用outer来防止这种情况,并且将生成带有NULL的值。
重新上传两行数据,其中新上传的两行数据中subject值时空的
select * from xtable;
+--------------+-------------+-------------------------------+
| xtable.name | xtable.age | xtable.subject |
+--------------+-------------+-------------------------------+
| xhx | 15 | ["math","english","history"] |
| bjx | 20 | ["physical","biological"] |
| ldh | 60 | NULL |
| lsp | 30 | NULL |
+--------------+-------------+-------------------------------+
不带outer时
select name,age,subcol from xtable lateral view explode(subject) subtable as subcol;
+-------+------+-------------+
| name | age | subcol |
+-------+------+-------------+
| xhx | 15 | math |
| xhx | 15 | english |
| xhx | 15 | history |
| bjx | 20 | physical |
| bjx | 20 | biological |
+-------+------+-------------+
带outer时
select name,age,subcol from xtable lateral view outer explode(subject) subtable as subcol;
+-------+------+-------------+
| name | age | subcol |
+-------+------+-------------+
| xhx | 15 | math |
| xhx | 15 | english |
| xhx | 15 | history |
| bjx | 20 | physical |
| bjx | 20 | biological |
| ldh | 60 | NULL |
| lsp | 30 | NULL |
+-------+------+-------------+