Hive之复杂数据类型-array、map和struct使用案例
复杂数据类型在Hive中使用场景非常多
具体数据类型的详细介绍这里就不说了,官网和博客介绍多的是,这里直接上例子,更直观。
一、array类型
array是数组类型,类似于python中的list类型:
array[] = ['aaa','bbb','ccc']
直接上例子:
1.1 数据格式
说明:
name与locations之间制表符分隔,locations中元素之间逗号分隔
zhangsan beijing,shanghai,tianjin,hangzhou
lisi changchu,chengdu,wuhan,beijing
1.2 建表语句
create table hive_array(name string, work_locations array<string>)
row format delimited fields terminated by '\t'
COLLECTION ITEMS TERMINATED BY ',';
1.3 导入数据(从本地导入,同样支持从HDFS导入)
load data local inpath '/home/zhangfei/data/work_locations.txt' overwrite into table hive_array;
1.4 常用查询:
select * from hive_array;
zhangsan ["beijing","shanghai","tianjin","hangzhou"]
lisi ["changchu","chengdu","wuhan","beijing"]
array_contains常与where子句连用
1. select name, work_locations[0] location from hive_array;
2. select name, size(work_locations) location from hive_array;
3. select * from hive_array where array_contains(work_locations,'tianjin');
二、map类型
直接上例子:
2.1 数据格式
说明:字段与字段分隔符: “,”;需要map字段之间的分隔符:"#";map内部k-v分隔符:":"
1,zhangsan,father:xiaoming#mother:xiaohuang#brother:xiaoxu,28
2,lisi,father:mayun#mother:huangyi#brother:guanyu,22
3,wangwu,father:wangjianlin#mother:ruhua#sister:jingtian,29
4,mayun,father:mayongzhen#mother:angelababy,26
2.2 建表语句
create table hive_map(
id int, name string, members map<string,string>, age int
)
row format delimited
fields terminated by ','
COLLECTION ITEMS TERMINATED BY '#' 条目分隔符
MAP KEYS TERMINATED BY ':'; k-v分隔符
2.3 导入数据
load data local inpath '/home/zhangfei/data/hive_map.txt' overwrite into table hive_map;
2.4 常用查询
select * from hive_map;
1 zhangsan {"father":"xiaoming","mother":"xiaohuang","brother":"xiaoxu"} 28
2 lisi {"father":"mayun","mother":"huangyi","brother":"guanyu"} 22
3 wangwu {"father":"wangjianlin","mother":"ruhua","sister":"jingtian"} 29
4 mayun {"father":"mayongzhen","mother":"angelababy"} 26
Time taken: 0.048 seconds, Fetched: 4 row(s)
注意:members[‘father’], map_keys(), map_va
lues(), ==array_contains()==常与where子句连用
1. select id, name, members['father'] father, members['mother'] mother, age from hive_map;
2. select id, name, map_keys(members) as relation from hive_map;
3. select id, name, map_values(members) as relation from hive_map;
4. select id,name,size(members) num from hive_map;
5. select * from hive_map where array_contains(map_keys(members), 'brother');
6. select id,name, members['brother'] brother from hive_map where array_contains(map_keys(members), 'brother');
三、struct类型
直接上例子
3.1 数据格式
说明:字段之间#分割,第二个字段之间冒号分割
192.168.1.1#zhangsan:40
192.168.1.2#lisi:50
192.168.1.3#wangwu:60
192.168.1.4#zhaoliu:70
3.2 建表语句
create table hive_struct(
ip string, info struct<name:string, age:int>
)
row format delimited
fields terminated by '#'
COLLECTION ITEMS TERMINATED BY ':';
3.3 导入数据
load data local inpath '/home/zhangfei/data/hive_struct.txt' into table hive_struct;
3.4 常用查询
select * from hive_struct;
hive_struct.ip hive_struct.info
192.168.1.1 {"name":"zhangsan","age":40}
192.168.1.2 {"name":"lisi","age":50}
192.168.1.3 {"name":"wangwu","age":60}
192.168.1.4 {"name":"zhaoliu","age":70}
Time taken: 0.244 seconds, Fetched: 4 row(s)
可直接通过.访问数据:
select ip, info.name from hive_struct;
192.168.1.1 zhangsan
192.168.1.2 lisi
192.168.1.3 wangwu
192.168.1.4 zhaoliu