Operations on Hive Complex Data Types: Array, Map, and Struct
The following functions construct instances of complex types.
Constructor functions:
- map(key1, value1, key2, value2, ...): creates a map with the given key/value pairs.
- struct(val1, val2, val3, ...): creates a struct with the given field values; the field names will be col1, col2, ....
- named_struct(name1, val1, name2, val2, ...): creates a struct with the given field names and values (as of Hive 0.8.0).
- array(val1, val2, ...): creates an array with the given elements.
- create_union(tag, val1, val2, ...): creates a union type with the value that is being pointed to by the tag parameter.
1. Using array
Sample data array.txt: names and work locations
zhangsan beijing,shanghai,hangzhou
lisi chengdu,wuhan,suzhou,jilin
wangwu nanjing,kunsan
Create the table:
hive> create table person_array(name string,work_locations array<string>)
> row format delimited fields terminated by '\t'
> collection items terminated by ',';
Load the data into the table:
hive> load data local inpath '/root/data/array.txt' into table person_array;
Some queries:
hive> select * from person_array;
OK
zhangsan ["beijing","shanghai","hangzhou"]
lisi ["chengdu","wuhan","suzhou","jilin"]
wangwu ["nanjing","kunsan"]
hive> select name,work_locations[0],size(work_locations) from person_array;
OK
zhangsan beijing 3
lisi chengdu 4
wangwu nanjing 2
hive> select name from person_array where array_contains(work_locations,'nanjing');
OK
wangwu
hive> select work_locations[3],work_locations[4] from person_array;
OK
NULL NULL
jilin NULL
NULL NULL
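How Hive splits one line of array.txt into the name and the array column can be sketched in plain Java. This is only an illustration of the `'\t'` field terminator and `','` collection-item terminator from the DDL above, not Hive's actual SerDe code:

```java
import java.util.Arrays;
import java.util.List;

public class ArrayRowSketch {
    // Splits one line of array.txt into the work_locations array, mirroring
    // FIELDS TERMINATED BY '\t' and COLLECTION ITEMS TERMINATED BY ','.
    static List<String> parseLocations(String line) {
        String[] fields = line.split("\t");          // field delimiter
        return Arrays.asList(fields[1].split(","));  // collection item delimiter
    }

    public static void main(String[] args) {
        List<String> locs = parseLocations("zhangsan\tbeijing,shanghai,hangzhou");
        // Equivalent to work_locations[0] and size(work_locations) in HiveQL
        System.out.println(locs.get(0) + " " + locs.size()); // beijing 3
    }
}
```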
For details, see the official Hive documentation.
Collection Functions
The following built-in collection functions are supported in Hive:
- int size(Map): returns the number of elements in the map type.
- int size(Array): returns the number of elements in the array type.
- array map_keys(Map): returns an unordered array containing the keys of the input map.
- array map_values(Map): returns an unordered array containing the values of the input map.
- boolean array_contains(Array, value): returns TRUE if the array contains value.
- array sort_array(Array): sorts the input array in ascending order according to the natural ordering of the array elements and returns it (as of version 0.9.0).
2. Using map
Sample data map.txt: names and exam scores
zhangsan Math:90,Chinese:92,English:78
lisi Chinese:80,English:94,Math:69,Elective:70
wangwu Math:58,Chinese:95
Create the table:
hive> create table score_map(name string,score map<string,int>)
> row format delimited fields terminated by '\t'
> collection items terminated by ','
> map keys terminated by ':';
Load the data into the table:
hive> load data local inpath '/root/data/map.txt' into table score_map;
Some queries:
hive> select * from score_map;
OK
zhangsan {"Math":90,"Chinese":92,"English":78}
lisi {"Chinese":80,"English":94,"Math":69,"Elective":70}
wangwu {"Math":58,"Chinese":95}
hive> select name,score['English'],size(score) from score_map;
OK
zhangsan 78 3
lisi 94 4
wangwu NULL 2
hive> select name from score_map where array_contains(map_keys(score),'Elective');
OK
lisi
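The map column adds one more delimiter level: `','` separates entries and `':'` separates each key from its value. A minimal Java sketch of that parsing (again an illustration, not Hive's SerDe):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapRowSketch {
    // Splits one line of map.txt into the score map, mirroring
    // COLLECTION ITEMS TERMINATED BY ',' and MAP KEYS TERMINATED BY ':'.
    static Map<String, Integer> parseScores(String line) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        String[] fields = line.split("\t");         // name \t score entries
        for (String entry : fields[1].split(",")) { // one key:value per item
            String[] kv = entry.split(":");
            scores.put(kv[0], Integer.valueOf(kv[1]));
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Integer> s = parseScores("zhangsan\tMath:90,Chinese:92,English:78");
        // Equivalent to score['English'] and size(score) in HiveQL
        System.out.println(s.get("English") + " " + s.size()); // 78 3
    }
}
```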
3. Using struct
Sample data struct.txt: name, course, and score
zhangsan Math,90
lisi Chinese,80
wangwu Elective,70
Create the table:
hive> create table course_struct(name string,course struct<course:string,score:int>)
> row format delimited fields terminated by '\t'
> collection items terminated by ',';
Load the data into the table:
hive> load data local inpath '/root/data/struct.txt' into table course_struct;
Some queries:
hive> select * from course_struct;
OK
zhangsan {"course":"Math","score":90}
lisi {"course":"Chinese","score":80}
wangwu {"course":"Elective","score":70}
hive> select name,course.score,course.course from course_struct;
OK
zhangsan 90 Math
lisi 80 Chinese
wangwu 70 Elective
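A struct maps naturally to a plain Java class with one field per struct member; the sketch below shows how `struct<course:string,score:int>` values are split out of a line of struct.txt (an illustration only, with a hypothetical Course class standing in for the struct):

```java
public class StructRowSketch {
    // A plain class standing in for struct<course:string,score:int>.
    static final class Course {
        final String course;
        final int score;
        Course(String course, int score) { this.course = course; this.score = score; }
    }

    // Mirrors FIELDS TERMINATED BY '\t' and COLLECTION ITEMS TERMINATED BY ','.
    static Course parseCourse(String line) {
        String[] parts = line.split("\t")[1].split(",");
        return new Course(parts[0], Integer.parseInt(parts[1]));
    }

    public static void main(String[] args) {
        Course c = parseCourse("zhangsan\tMath,90");
        // Equivalent to course.course and course.score in HiveQL
        System.out.println(c.course + " " + c.score); // Math 90
    }
}
```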
4. Using nested complex types
Sample data combination.txt: name, course, and multiple scores
zhangsan Math:90,92,78
lisi Chinese:80,94,69,70
wangwu Math:58,95
① Create a temporary table course_combination_temp:
hive> create table course_combination_temp (name string,course_score string)
> row format delimited fields terminated by '\t';
② Load the data into the temporary table:
hive> load data local inpath '/root/data/combination.txt' into table
> course_combination_temp;
Temporary table contents:
hive> select * from course_combination_temp;
OK
zhangsan Math:90,92,78
lisi Chinese:80,94,69,70
wangwu Math:58,95
③ Write a UDF:
package com.sl.hive;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hive.ql.exec.UDF;
/**
 * Input format: "course:score1,score2,..." (e.g. "Math:90,92,78").
 * Returns a Map from the course name to the list of scores.
 */
public class StringToMapUDF extends UDF {
    public Map<String, ArrayList<Integer>> evaluate(String course_score) {
        // Guard against null or blank input. Note: String contents must be
        // compared with isEmpty()/equals(), not ==, which compares references.
        if (course_score == null || course_score.trim().isEmpty()) {
            return null;
        }
        Map<String, ArrayList<Integer>> map = new HashMap<String, ArrayList<Integer>>();
        ArrayList<Integer> array = new ArrayList<Integer>();
        String[] mapSplit = course_score.split(":");
        if (mapSplit.length > 1) {
            String[] arraySplit = mapSplit[1].split(",");
            for (String score : arraySplit) {
                array.add(Integer.valueOf(score));
            }
        }
        map.put(mapSplit[0], array);
        return map;
    }
}
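Before packaging, the parsing logic can be sanity-checked in a plain JVM. The sketch below duplicates the body of evaluate without the Hive UDF dependency, so it compiles standalone; it is a convenience check, not a replacement for testing the actual jar in Hive:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StringToMapCheck {
    // Same logic as StringToMapUDF.evaluate, minus the Hive dependency.
    static Map<String, List<Integer>> toMap(String courseScore) {
        if (courseScore == null || courseScore.trim().isEmpty()) {
            return null;
        }
        Map<String, List<Integer>> map = new HashMap<>();
        List<Integer> scores = new ArrayList<>();
        String[] mapSplit = courseScore.split(":");
        if (mapSplit.length > 1) {
            for (String score : mapSplit[1].split(",")) {
                scores.add(Integer.valueOf(score));
            }
        }
        map.put(mapSplit[0], scores);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(toMap("Math:90,92,78")); // {Math=[90, 92, 78]}
    }
}
```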
④ Package the program into a jar and upload it to the server.
⑤ Enter the Hive CLI and add the jar:
hive> add jar /root/jar/StringToMapUDF.jar;
Added [/root/jar/StringToMapUDF.jar] to class path
Added resources: [/root/jar/StringToMapUDF.jar]
⑥ Create a temporary function:
hive> create temporary function string_to_map as 'com.sl.hive.StringToMapUDF';
⑦ Test:
hive> select name,string_to_map(course_score) from course_combination_temp;
OK
zhangsan {"Math":[90,92,78]}
lisi {"Chinese":[80,94,69,70]}
wangwu {"Math":[58,95]}
⑧ Create the final table, populating it from the temporary table:
hive> create table course_combination as
> select name,string_to_map(course_score) as score from course_combination_temp;
Query the final table:
hive> select * from course_combination;
OK
zhangsan {"Math":[90,92,78]}
lisi {"Chinese":[80,94,69,70]}
wangwu {"Math":[58,95]}
⑨ Create the table first, then insert the data
Above, the table was created and populated in one step by selecting from the temporary table; alternatively, you can create the table first and insert the data afterwards.
1. Create the table:
hive> create table course_combination2 (name string,score map<string,array<int>>)
> row format delimited fields terminated by '\t';
2. Insert the data:
hive> insert into table course_combination2
> select name,string_to_map(course_score) from course_combination_temp;
3. Query the data:
hive> select * from course_combination2;
OK
zhangsan {"Math":[90,92,78]}
lisi {"Chinese":[80,94,69,70]}
wangwu {"Math":[58,95]}
⑩ Exercise: store the same data in struct format, i.e. with a table schema of
(name string,course_score struct<course:string,score:array<int>>)
References:
http://blog.csdn.net/wf1982/article/details/7474601
http://www.cnblogs.com/ggjucheng/archive/2013/01/31/2868941.html