hive中的复杂类型struct、array、map

创建一个员工表:

create table if not exists employees( 
  	name	string, 
  	salary	float, 
  	subordinates	array<string>, 
  	deductions	map<string,float>, 
  	address	struct<street:string, city:string, state:string, zip:int> 
  ) row format delimited 
  fields terminated by '\001' 
  collection items terminated by '\002' 
  map keys terminated by '\003' 
  lines terminated by '\n' 
  stored as textfile;

其中subordinates下属员工字段为array类型,deductions扣除字段为map类型,address地址字段为struct类型。

每个列之间通过'\001'分割,array、struct、map中每组元素通过'\002'分割,map的key、value之间通过'\003'分割。

下面来导入数据:

load data local inpath './data/employees.txt' overwrite into table employees;

employeets.txt文件的内容如下:

John Doe100000.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600
Mary Smith80000.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601
Todd Jones70000.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700
Bill King60000.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100
Boss Man200000.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500
Fred Finance150000.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500
Stacy Accountant60000.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563

因为分隔符是不可见的,所以在这上面看不出来,可以直接复制。

array类型的数据可以通过'数组名[index]'的方式访问,index从0开始:

select subordinates[0],subordinates[1] from employees limit 1;

查询结果为:

c0	c1
Mary Smith	Todd Jones

第一行是字段的默认列名。

struct类型的数据可以通过'列名.字段名'的方式访问,如下:

select address.street, address.city, address.state, address.zip from employees limit 1;

查询结果为:

street	city	state	zip
1 Michigan Ave.	Chicago	IL	60600

map类型的数据可以通过'列名['key']'的方式访问,如下:

select deductions['Federal Taxes'],deductions['State Taxes'],deductions['Insurance'] from employees limit 1;

查询结果为:

c0	c1	c2
0.2	0.05	0.1

如果没有查到结果可以使用if判断:

select if(deductions['Federal Taxes'] is null, 0, deductions['Federal Taxes']) from employees limit 1;

我们也可以用UDTF把结果变成多行,比如:

select explode(deductions) from employees;

查询结果为:

key	value
Federal Taxes	0.2
State Taxes	0.05
Insurance	0.1
Federal Taxes	0.2
State Taxes	0.05
Insurance	0.1
Federal Taxes	0.15
State Taxes	0.03
Insurance	0.1
Federal Taxes	0.15
State Taxes	0.03
Insurance	0.1
Federal Taxes	0.3
State Taxes	0.07
Insurance	0.05
Federal Taxes	0.3
State Taxes	0.07
Insurance	0.05
Federal Taxes	0.15
State Taxes	0.03
Insurance	0.1

有时候我们需要把name也查询出来:

select name, explode(deductions) from employees;

发现执行语句会报错:

FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

hive不支持这种语法。

可以用下面语句查询:

select name,dekey,devalue from employees LATERAL VIEW explode(deductions) dedView as dekey,devalue;

查询结果为:

name	dekey	devalue
John Doe	Federal Taxes	0.2
John Doe	State Taxes	0.05
John Doe	Insurance	0.1
Mary Smith	Federal Taxes	0.2
Mary Smith	State Taxes	0.05
Mary Smith	Insurance	0.1
Todd Jones	Federal Taxes	0.15
Todd Jones	State Taxes	0.03
Todd Jones	Insurance	0.1
Bill King	Federal Taxes	0.15
Bill King	State Taxes	0.03
Bill King	Insurance	0.1
Boss Man	Federal Taxes	0.3
Boss Man	State Taxes	0.07
Boss Man	Insurance	0.05
Fred Finance	Federal Taxes	0.3
Fred Finance	State Taxes	0.07
Fred Finance	Insurance	0.05
LATERAL VIEW关键字把explode产生的多行或多列变成一个视图,然后再与其他的数据聚合在一起。


  • 2
    点赞
  • 7
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值