Impala & Hive 使用复杂数据类型

1. 环境

CDH 5.16.1

2. Hive 使用复杂数据类型

2.1 数据格式
1       zhangsan:man    football,basketball
2       lisi:female     sing,dance
2.2 Hive 建表
create table studentInfo(
    id int,
    info map<string,string>  comment 'map<姓名,性别>',
    favorite array<string> comment 'array[football,basketball]'
)
row format delimited fields terminated by '\t'    --列分隔符
collection items terminated by ','   --array中各个item之间的分隔符
map keys terminated 
by ':'        --map中key和value之间的分隔符
lines terminated by '\n';       --行分隔符
2.3 导入数据
load data local inpath '/opt/module/jobs/student.txt' into table studentInfo;
2.3 执行查询
select *  from studentInfo;

+-----------------+---------------------+----------------------------+--+
| studentinfo.id  |  studentinfo.info   |    studentinfo.favorite    |
+-----------------+---------------------+----------------------------+--+
| 1               | {"zhangsan":"man"}  | ["football","basketball"]  |
| 2               | {"lisi":"female"}   | ["sing","dance"]           |
+-----------------+---------------------+----------------------------+--+




-- 对于map查询,map[key]
--对于array查询,array[index]
select id, info['zhangsan'],favorite[1] from studentInfo;

+-----+-------+-------------+--+
| id  |  sex  |  favorite   |
+-----+-------+-------------+--+
| 1   | man   | basketball  |
| 2   | NULL  | dance       |
+-----+-------+-------------+--+

3. Impala 使用复杂类型

注意:Impala 只用parquet格式存储时,才能使用复杂数据类型

3.1 Hive中建表(parquet格式,导入数据
create table student_parquet(
    id int,
    info map<string,string>  comment 'map<姓名,性别>',
    favorite array<string> comment 'array[football,basketball]'
)
stored as parquet

insert overwrite table student_parquet select id,info,favorite from studentInfo;
3.2 刷新impala元数据
refresh default.student_parquet;
3.3 执行查询
select 
    id ,favorite_array.item,info_map.key,info_map.value
from student_parquet,
    student_parquet.info as info_map,
    student_parquet.favorite as favorite_array;

+----+------------+----------+--------+
| id | item       | key      | value  |
+----+------------+----------+--------+
| 1  | football   | zhangsan | man    |
| 1  | basketball | zhangsan | man    |
| 2  | sing       | lisi     | female |
| 2  | dance      | lisi     | female |
+----+------------+----------+--------+




select 
    id ,favorite_array.item
from student_parquet,
    student_parquet.info as info_map,
    student_parquet.favorite as favorite_array
where favorite_array.POS = 0;

+----+----------+
| id | item     |
+----+----------+
| 1  | football |
| 2  | sing     |
+----+----------+




select 
    id ,favorite_array.item,info_map.value
from student_parquet,
    student_parquet.info as info_map,
    student_parquet.favorite as favorite_array
where favorite_array.item = 'sing'
and info_map.key = 'lisi';

+----+------+--------+
| id | item | value  |
+----+------+--------+
| 2  | sing | female |
+----+------+--------+

总结:

  1. array 类型视为 一张表, 其列名为 item

  2. map类型有两个列, 一个是key, 一个是value

参考:

  1. https://blog.csdn.net/rav009/article/details/86750850
  2. https://docs.cloudera.com/documentation/enterprise/5-5-x/topics/impala_complex_types.html
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值