In addition to the primitive data types, Hive columns also support three collection data types: STRUCT, MAP, and ARRAY.
Suppose a table contains the following row; we use JSON notation to illustrate its data structure as it would be accessed in Hive:
{
  "name": "John Doe",
  "salary": 100000.0,
  "subordinates": ["Mary Smith", "Todd Jones"],   // ARRAY
  "deductions": {                                 // MAP
    "Federal Taxes": 0.2,
    "State Taxes": 0.05,
    "Insurance": 0.1
  },
  "address": {                                    // STRUCT
    "street": "1 Michigan Ave.",
    "city": "Chicago",
    "state": "IL",
    "zip": 60600
  }
}
Based on this structure, let's create the corresponding table in Hive and load data into it.
First, create a local test file, test.txt:
John Doe,100000.0,Mary Smith_Todd Jones,Federal Taxes:0.2_State Taxes:0.05_Insurance:0.1,1 Michigan Ave._Chicago_IL_60600
Tom Smith,90000.0,Jan_Hello Ketty,Federal Taxes:0.2_State Taxes:0.05_Insurance:0.1,Guang dong._China_0.5L_60661
Note: the separator between elements of a MAP, STRUCT, or ARRAY is one and the same character; here we use "_".
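To make the file format concrete, here is a plain-Python sketch (an illustration only, not part of Hive) that parses one line of test.txt using the same three delimiters the table definition below declares: ',' between fields, '_' between collection items, and ':' between map keys and values. The variable names are ours.

```python
# One row of test.txt, serialized with the delimiters used above.
line = ("John Doe,100000.0,Mary Smith_Todd Jones,"
        "Federal Taxes:0.2_State Taxes:0.05_Insurance:0.1,"
        "1 Michigan Ave._Chicago_IL_60600")

# ',' splits top-level fields.
name, salary, subs, deds, addr = line.split(",")

# '_' splits collection items; ':' splits map keys from values.
subordinates = subs.split("_")                               # ARRAY<STRING>
deductions = {k: float(v)                                    # MAP<STRING, FLOAT>
              for k, v in (item.split(":") for item in deds.split("_"))}
street, city, state, zip_code = addr.split("_")              # STRUCT, by position

print(subordinates[1])               # Todd Jones
print(deductions["Federal Taxes"])   # 0.2
print(city)                          # Chicago
```

Note that the STRUCT fields carry no names in the file; Hive assigns them to `street`, `city`, `state`, and `zip` purely by position.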
Next, create the test table employees in Hive:
CREATE TABLE learn.employees(
name STRING,
salary FLOAT,
subordinates ARRAY<STRING>,
deductions MAP<STRING, FLOAT>,
address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','            -- field (column) separator
COLLECTION ITEMS TERMINATED BY '_'  -- separator for ARRAY, MAP and STRUCT items
MAP KEYS TERMINATED BY ':'          -- separator between MAP keys and values
LINES TERMINATED BY '\n';           -- row separator
Load the text file into the test table:
load data local inpath "/home/hadoop/files/input/test.txt" overwrite into table learn.employees;
Data in the three collection columns is accessed as follows: ARRAY by index, MAP by key, and STRUCT by field name.
hive> select subordinates[1], deductions['Federal Taxes'],address.city from learn.employees;
OK
Todd Jones 0.2 Chicago
Hello Ketty 0.2 China
Time taken: 0.123 seconds, Fetched: 2 row(s)
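One semantic difference worth remembering: Hive returns NULL for an out-of-range ARRAY index or a missing MAP key, rather than raising an error as Python would. The sketch below is a hypothetical helper mimicking that behavior, for illustration only.

```python
def hive_index(collection, key):
    """Mimic Hive's lookup semantics: return None (Hive's NULL) instead of
    raising when an array index is out of range or a map key is missing."""
    try:
        return collection[key]
    except (IndexError, KeyError):
        return None

subordinates = ["Mary Smith", "Todd Jones"]
deductions = {"Federal Taxes": 0.2}

print(hive_index(subordinates, 5))            # None (Hive: NULL)
print(hive_index(deductions, "Local Taxes"))  # None (Hive: NULL)
print(hive_index(deductions, "Federal Taxes"))  # 0.2
```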