Using ARRAY, MAP, and STRUCT

  1. ARRAY

Create a file named employees.tsv under the local path /root/runjar/ with the following contents:

Frances	Operations,2012-03-01
Greg	Engineering,2003-11-18
Harry	Intern,2012-05-15
Iris	Executive,2001-04-08
Jan	Support,2009-03-30

The fields are (name STRING, dept STRING, date STRING), i.e., each employee's name, department, and hire date. The first and second fields are separated by '\t', while the second and third are separated by ','. This lets us store the last two columns in a single field named profile, declared as an ARRAY.
Save the following to a script named createTable.hql:

DROP TABLE IF EXISTS employees;
CREATE TABLE employees(name STRING, profile ARRAY<STRING>)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/root/runjar/employees.tsv' OVERWRITE INTO TABLE employees;
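
One way to run the script is to pass it to the Hive CLI with the -f option (this assumes the hive command is on your PATH and the script was saved in the current directory):

hive -f createTable.hql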

After running the script, query the table to check the data:

hive> select * from employees;
OK
Frances	["Operations","2012-03-01"]
Greg	["Engineering","2003-11-18"]
Harry	["Intern","2012-05-15"]
Iris	["Executive","2001-04-08"]
Jan	["Support","2009-03-30"]
Time taken: 0.909 seconds, Fetched: 5 row(s)

Elements of an ARRAY are accessed by index, starting at 0:

hive> select name, profile[0] AS dept, profile[1] AS date from employees;

Query ID = hdfs_20180720182020_893ac207-6ef9-47fe-9ed5-8da18c77904b
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1531978846208_6289, Tracking URL = http://master.hxdi.com:8088/proxy/application_1531978846208_6289/
Kill Command = /home/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop/bin/hadoop job  -kill job_1531978846208_6289
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-07-20 18:20:47,864 Stage-1 map = 0%,  reduce = 0%
2018-07-20 18:20:54,162 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.22 sec
MapReduce Total cumulative CPU time: 2 seconds 220 msec
Ended Job = job_1531978846208_6289
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.22 sec   HDFS Read: 3518 HDFS Write: 131 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 220 msec
OK
Frances	Operations	2012-03-01
Greg	Engineering	2003-11-18
Harry	Intern	2012-05-15
Iris	Executive	2001-04-08
Jan	Support	2009-03-30
Time taken: 16.528 seconds, Fetched: 5 row(s)
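
Besides indexing, Hive ships built-in functions that operate on arrays. As a small illustration against the same table: size() returns the number of elements, an out-of-range index such as profile[2] returns NULL rather than raising an error, and array_contains() tests membership:

-- Number of elements, plus an out-of-range lookup (returns NULL):
SELECT name, size(profile), profile[2] FROM employees;
-- Rows whose profile array contains the value 'Intern':
SELECT name FROM employees WHERE array_contains(profile, 'Intern');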

  2. MAP

Create a file named deductions.txt under the local path /root/runjar/ with the following contents:

Kim	income:2745,consumption:1032,medicare:321
Russell	income:1945,consumption:847,medicare:286
Ron	income:3490,consumption:1749,medicare:501

Each row holds one person's tax categories and the amount levied for each. For example, Kim's income tax is 2745, her consumption tax is 1032, and her medicare tax is 321. The name in the first column and the second column are separated by '\t'.

Save the following to a script named createTable.hql:

DROP TABLE IF EXISTS deductions;
CREATE TABLE deductions(name STRING, deduction MAP<STRING, INT>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':';
LOAD DATA LOCAL INPATH '/root/runjar/deductions.txt' OVERWRITE INTO TABLE deductions;

Query each person's income tax amount:

hive> select name, deduction['income'] from deductions;

Query ID = hdfs_20180721012626_69acb052-e2c1-448d-9060-72550691b5c3
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1531978846208_7827, Tracking URL = http://master.hxdi.com:8088/proxy/application_1531978846208_7827/
Kill Command = /home/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop/bin/hadoop job  -kill job_1531978846208_7827
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-07-21 01:26:55,244 Stage-1 map = 0%,  reduce = 0%
2018-07-21 01:27:01,565 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.19 sec
MapReduce Total cumulative CPU time: 2 seconds 190 msec
Ended Job = job_1531978846208_7827
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.19 sec   HDFS Read: 3492 HDFS Write: 31 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 190 msec
OK
Kim	2745
Russell	1945
Ron	3490
Time taken: 14.385 seconds, Fetched: 3 row(s)
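
Hive also provides map-oriented built-ins: map_keys() and map_values() return the keys and values as arrays, size() counts the entries, and looking up an absent key (here a hypothetical 'property' key) simply returns NULL. A map can also be flattened into one row per entry with LATERAL VIEW explode():

-- Keys, entry count, and a lookup on a key that does not exist (returns NULL):
SELECT name, map_keys(deduction), size(deduction), deduction['property'] FROM deductions;
-- Flatten each map entry into its own (name, tax_type, amount) row:
SELECT name, tax_type, amount
FROM deductions LATERAL VIEW explode(deduction) d AS tax_type, amount;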

  3. STRUCT

Create a file named address.txt under the local path /root/runjar/ with the following contents:

Andrew	Beverly Hills,Los Angeles,90001
Ben	Madison Avenue,New York,10047
Mark	Pioneer Square,Seattle,90876

Each row holds a name, street address, city, and zip code. The name in the first column and the address in the second are separated by '\t'; the remaining values are separated by ','.

Save the following to a script named createTable.hql:

DROP TABLE IF EXISTS addresses;
CREATE TABLE addresses(name STRING, address STRUCT<street:STRING, city:STRING, zip:INT>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/root/runjar/address.txt' OVERWRITE INTO TABLE addresses;

Fields of a STRUCT are referenced with dot notation (column.field). Retrieve the name, city, and zip code; note that the query below also selects addresses.name, the table-qualified reference to the name column, so the name appears twice in the output:

hive> select name, addresses.name, address.city, address.zip from addresses;

Query ID = hdfs_20180721020000_d5498cba-8351-463d-8cdd-c2615b307d4d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1531978846208_7911, Tracking URL = http://master.hxdi.com:8088/proxy/application_1531978846208_7911/
Kill Command = /home/cloudera/parcels/CDH-5.8.2-1.cdh5.8.2.p0.3/lib/hadoop/bin/hadoop job  -kill job_1531978846208_7911
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-07-21 02:00:42,470 Stage-1 map = 0%,  reduce = 0%
2018-07-21 02:00:47,738 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.54 sec
MapReduce Total cumulative CPU time: 1 seconds 540 msec
Ended Job = job_1531978846208_7911
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.54 sec   HDFS Read: 3552 HDFS Write: 79 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 540 msec
OK
Andrew	Andrew	Los Angeles	90001
Ben	Ben	New York	10047
Mark	Mark	Seattle	90876
Time taken: 12.951 seconds, Fetched: 3 row(s)
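
STRUCT fields can be used anywhere an ordinary column can, including in a WHERE clause. For example, to find everyone in a given city:

SELECT name, address.street FROM addresses WHERE address.city = 'Seattle';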
