Using ARRAY, MAP, and STRUCT

  1. ARRAY

Create a file named employees.tsv under the local path /root/runjar/ with the following contents:

Frances	Operations,2012-03-01
Greg	Engineering,2003-11-18
Harry	Intern,2012-05-15
Iris	Executive,2001-04-08
Jan	Support,2009-03-30

The fields are (name STRING, dept STRING, date STRING), i.e., each employee's name, department, and hire date. The first and second fields are separated by '\t', while the second and third are separated by ','. This lets us store the last two columns in a single field named profile, declared as an ARRAY.
Save the following to a script named createTable.hql:

DROP TABLE IF EXISTS employees;
CREATE TABLE employees(name STRING, profile ARRAY<STRING>)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/root/runjar/employees.tsv' OVERWRITE INTO TABLE employees;
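
One way to run the script is to pass it to the Hive CLI with the -f option (this assumes the hive command is on your PATH and the script was saved in the current directory):

hive -f createTable.hql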

After running the script, query the table to check the data:

hive> select * from employees;
OK
Frances	["Operations","2012-03-01"]
Greg	["Engineering","2003-11-18"]
Harry	["Intern","2012-05-15"]
Iris	["Executive","2001-04-08"]
Jan	["Support","2009-03-30"]
Time taken: 0.909 seconds, Fetched: 5 row(s)

Elements of an ARRAY are accessed by index, starting at 0:

hive> select name, profile[0] AS dept, profile[1] AS date from employees;

Query ID = hdfs_20180720182020_893ac207-6ef9-47fe-9ed5-8da18c77904b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1531978846208_6289, Tracking URL = http://master.hxdi.com:8088/proxy/application_1531978846208_6289/
Kill Command = /home/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop/bin/hadoop job  -kill job_1531978846208_6289
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-07-20 18:20:47,864 Stage-1 map = 0%,  reduce = 0%
2018-07-20 18:20:54,162 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.22 sec
MapReduce Total cumulative CPU time: 2 seconds 220 msec
Ended Job = job_1531978846208_6289
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.22 sec   HDFS Read: 3518 HDFS Write: 131 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 220 msec
OK
Frances	Operations	2012-03-01
Greg	Engineering	2003-11-18
Harry	Intern	2012-05-15
Iris	Executive	2001-04-08
Jan	Support	2009-03-30
Time taken: 16.528 seconds, Fetched: 5 row(s)
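
Besides indexing, Hive ships built-in functions that operate on arrays. As a small illustration against the same table: size() returns the number of elements, an out-of-range index such as profile[2] returns NULL rather than raising an error, and array_contains() tests membership:

-- Number of elements, plus an out-of-range lookup (returns NULL):
SELECT name, size(profile), profile[2] FROM employees;
-- Rows whose profile array contains the value 'Intern':
SELECT name FROM employees WHERE array_contains(profile, 'Intern');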

  2. MAP

Create a file named deductions.txt under the local path /root/runjar/ with the following contents:

Kim	income:2745,consumption:1032,medicare:321
Russell	income:1945,consumption:847,medicare:286
Ron	income:3490,consumption:1749,medicare:501

Each row holds one person's tax categories and the amount levied for each. For example, Kim's income tax is 2745, her consumption tax is 1032, and her medicare tax is 321. The name in the first column and the second column are separated by '\t'.

Save the following to a script named createTable.hql:

DROP TABLE IF EXISTS deductions;
CREATE TABLE deductions(name STRING, deduction MAP<STRING, INT>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';
LOAD DATA LOCAL INPATH '/root/runjar/deductions.txt' OVERWRITE INTO TABLE deductions;

Query each person's income tax amount:

hive> select name, deduction['income'] from deductions;

Query ID = hdfs_20180721012626_69acb052-e2c1-448d-9060-72550691b5c3
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1531978846208_7827, Tracking URL = http://master.hxdi.com:8088/proxy/application_1531978846208_7827/
Kill Command = /home/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop/bin/hadoop job  -kill job_1531978846208_7827
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-07-21 01:26:55,244 Stage-1 map = 0%,  reduce = 0%
2018-07-21 01:27:01,565 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.19 sec
MapReduce Total cumulative CPU time: 2 seconds 190 msec
Ended Job = job_1531978846208_7827
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.19 sec   HDFS Read: 3492 HDFS Write: 31 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 190 msec
OK
Kim	2745
Russell	1945
Ron	3490
Time taken: 14.385 seconds, Fetched: 3 row(s)
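
Hive also provides map-oriented built-ins: map_keys() and map_values() return the keys and values as arrays, size() counts the entries, and looking up an absent key (here a hypothetical 'property' key) simply returns NULL. A map can also be flattened into one row per entry with LATERAL VIEW explode():

-- Keys, entry count, and a lookup on a key that does not exist (returns NULL):
SELECT name, map_keys(deduction), size(deduction), deduction['property'] FROM deductions;
-- Flatten each map entry into its own (name, tax_type, amount) row:
SELECT name, tax_type, amount
FROM deductions LATERAL VIEW explode(deduction) d AS tax_type, amount;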

  3. STRUCT

Create a file named address.txt under the local path /root/runjar/ with the following contents:

Andrew	Beverly Hills,Los Angeles,90001
Ben	Madison Avenue,New York,10047
Mark	Pioneer Square,Seattle,90876

Each row holds a name, street address, city, and zip code. The name in the first column and the address in the second are separated by '\t'; the remaining values are separated by ','.

Save the following to a script named createTable.hql:

DROP TABLE IF EXISTS addresses;
CREATE TABLE addresses(name STRING, address STRUCT<street:STRING, city:STRING, zip:INT>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/root/runjar/address.txt' OVERWRITE INTO TABLE addresses;

Fields of a STRUCT are referenced with dot notation (column.field). Retrieve the name, city, and zip code; note that the query below also selects addresses.name, the table-qualified reference to the name column, so the name appears twice in the output:

hive> select name, addresses.name, address.city, address.zip from addresses;

Query ID = hdfs_20180721020000_d5498cba-8351-463d-8cdd-c2615b307d4d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1531978846208_7911, Tracking URL = http://master.hxdi.com:8088/proxy/application_1531978846208_7911/
Kill Command = /home/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop/bin/hadoop job  -kill job_1531978846208_7911
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-07-21 02:00:42,470 Stage-1 map = 0%,  reduce = 0%
2018-07-21 02:00:47,738 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.54 sec
MapReduce Total cumulative CPU time: 1 seconds 540 msec
Ended Job = job_1531978846208_7911
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.54 sec   HDFS Read: 3552 HDFS Write: 79 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 540 msec
OK
Andrew	Andrew	Los Angeles	90001
Ben	Ben	New York	10047
Mark	Mark	Seattle	90876
Time taken: 12.951 seconds, Fetched: 3 row(s)
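
STRUCT fields can be used anywhere an ordinary column can, including in a WHERE clause. For example, to find everyone in a given city:

SELECT name, address.street FROM addresses WHERE address.city = 'Seattle';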
