Operations on Hive Complex Data Types: Array, Map, and Struct
The following functions construct instances of complex types.
Constructor functions:
- map(key1, value1, key2, value2, ...): creates a map with the given key/value pairs.
- struct(val1, val2, val3, ...): creates a struct with the given field values; the field names will be col1, col2, ....
- named_struct(name1, val1, name2, val2, ...): creates a struct with the given field names and values (as of Hive 0.8.0).
- array(val1, val2, ...): creates an array with the given elements.
- create_union(tag, val1, val2, ...): creates a union type with the value that is being pointed to by the tag parameter.
1. Using array
Sample data array.txt: names and work locations
zhangsan beijing,shanghai,hangzhou
lisi chengdu,wuhan,suzhou,jilin
wangwu nanjing,kunsan
Create the table:
hive> create table person_array(name string,work_locations array<string>)
> row format delimited fields terminated by '\t'
> collection items terminated by ',';
Load the data into the table:
hive> load data local inpath '/root/data/array.txt' into table person_array;
Some queries:
hive> select * from person_array;
OK
zhangsan ["beijing","shanghai","hangzhou"]
lisi ["chengdu","wuhan","suzhou","jilin"]
wangwu ["nanjing","kunsan"]
hive> select name,work_locations[0],size(work_locations) from person_array;
OK
zhangsan beijing 3
lisi chengdu 4
wangwu nanjing 2
hive> select name from person_array where array_contains(work_locations,'nanjing');
OK
wangwu
hive> select work_locations[3],work_locations[4] from person_array;
OK
NULL NULL
jilin NULL
NULL NULL
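How Hive splits one line of array.txt into the name and the array column can be sketched in plain Java. This is only an illustration of the `'\t'` field terminator and `','` collection-item terminator from the DDL above, not Hive's actual SerDe code:

```java
import java.util.Arrays;
import java.util.List;

public class ArrayRowSketch {
    // Splits one line of array.txt into the work_locations array, mirroring
    // FIELDS TERMINATED BY '\t' and COLLECTION ITEMS TERMINATED BY ','.
    static List<String> parseLocations(String line) {
        String[] fields = line.split("\t");          // field delimiter
        return Arrays.asList(fields[1].split(","));  // collection item delimiter
    }

    public static void main(String[] args) {
        List<String> locs = parseLocations("zhangsan\tbeijing,shanghai,hangzhou");
        // Equivalent to work_locations[0] and size(work_locations) in HiveQL
        System.out.println(locs.get(0) + " " + locs.size()); // beijing 3
    }
}
```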
For details, see the official Hive documentation.
Collection Functions
The following built-in collection functions are supported in Hive:
- int size(Map): returns the number of elements in the map type.
- int size(Array): returns the number of elements in the array type.
- array map_keys(Map): returns an unordered array containing the keys of the input map.
- array map_values(Map): returns an unordered array containing the values of the input map.
- boolean array_contains(Array, value): returns TRUE if the array contains value.
- array sort_array(Array): sorts the input array in ascending order according to the natural ordering of the array elements and returns it (as of version 0.9.0).
2. Using map
Sample data map.txt: names and exam scores
zhangsan Math:90,Chinese:92,English:78
lisi Chinese:80,English:94,Math:69,Elective:70
wangwu Math:58,Chinese:95
Create the table:
hive> create table score_map(name string,score map<string,int>)
> row format delimited fields terminated by '\t'
> collection items terminated by ','
> map keys terminated by ':';
Load the data into the table:
hive> load data local inpath '/root/data/map.txt' into table score_map;
Some queries:
hive> select * from score_map;
OK
zhangsan {"Math":90,"Chinese":92,"English":78}
lisi {"Chinese":80,"English":94,"Math":69,"Elective":70}
wangwu {"Math":58,"Chinese":95}
hive> select name,score['English'],size(score) from score_map;
OK
zhangsan 78 3
lisi 94 4
wangwu NULL 2
hive> select name from score_map where array_contains(map_keys(score),'Elective');
OK
lisi
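The map column adds one more delimiter level: `','` separates entries and `':'` separates each key from its value. A minimal Java sketch of that parsing (again an illustration, not Hive's SerDe):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapRowSketch {
    // Splits one line of map.txt into the score map, mirroring
    // COLLECTION ITEMS TERMINATED BY ',' and MAP KEYS TERMINATED BY ':'.
    static Map<String, Integer> parseScores(String line) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        String[] fields = line.split("\t");         // name \t score entries
        for (String entry : fields[1].split(",")) { // one key:value per item
            String[] kv = entry.split(":");
            scores.put(kv[0], Integer.valueOf(kv[1]));
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Integer> s = parseScores("zhangsan\tMath:90,Chinese:92,English:78");
        // Equivalent to score['English'] and size(score) in HiveQL
        System.out.println(s.get("English") + " " + s.size()); // 78 3
    }
}
```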
3. Using struct
Sample data struct.txt: name, course, and score
zhangsan Math,90
lisi Chinese,80
wangwu Elective,70
Create the table:
hive> create table course_struct(name string,course struct<course:string,score:int>)
> row format delimited fields terminated by '\t'
> collection items terminated by ',';
Load the data into the table:
hive> load data local inpath '/root/data/struct.txt' into table course_struct;
Some queries:
hive> select * from course_struct;
OK
zhangsan {"course":"Math","score":90}
lisi {"course":"Chinese","score":80}
wangwu {"course":"Elective","score":70}
hive> select name,course.score,course.course from course_struct;
OK
zhangsan 90 Math
lisi 80 Chinese
wangwu 70 Elective
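A struct maps naturally to a plain Java class with one field per struct member; the sketch below shows how `struct<course:string,score:int>` values are split out of a line of struct.txt (an illustration only, with a hypothetical Course class standing in for the struct):

```java
public class StructRowSketch {
    // A plain class standing in for struct<course:string,score:int>.
    static final class Course {
        final String course;
        final int score;
        Course(String course, int score) { this.course = course; this.score = score; }
    }

    // Mirrors FIELDS TERMINATED BY '\t' and COLLECTION ITEMS TERMINATED BY ','.
    static Course parseCourse(String line) {
        String[] parts = line.split("\t")[1].split(",");
        return new Course(parts[0], Integer.parseInt(parts[1]));
    }

    public static void main(String[] args) {
        Course c = parseCourse("zhangsan\tMath,90");
        // Equivalent to course.course and course.score in HiveQL
        System.out.println(c.course + " " + c.score); // Math 90
    }
}
```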
4. Using nested complex types
Sample data combination.txt: name, course, and multiple scores
zhangsan Math:90,92,78
lisi Chinese:80,94,69,70
wangwu Math:58,95
① Create a temporary table course_combination_temp:
hive> create table course_combination_temp (name string,course_score string)
> row format delimited fields terminated by '\t';
② Load the data into the temporary table:
hive> load data local inpath '/root/data/combination.txt' into table
> course_combination_temp;
Temporary table contents:
hive> select * from course_combination_temp;
OK
zhangsan Math:90,92,78
lisi Chinese:80,94,69,70
wangwu Math:58,95
③ Write a UDF:
package com.sl.hive;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hive.ql.exec.UDF;
/**
 * Input format: "course:score1,score2,..." (e.g. "Math:90,92,78").
 * Returns a Map from the course name to the list of scores.
 */
public class StringToMapUDF extends UDF {
    public Map<String, ArrayList<Integer>> evaluate(String course_score) {
        // Guard against null or blank input. Note: String contents must be
        // compared with isEmpty()/equals(), not ==, which compares references.
        if (course_score == null || course_score.trim().isEmpty()) {
            return null;
        }
        Map<String, ArrayList<Integer>> map = new HashMap<String, ArrayList<Integer>>();
        ArrayList<Integer> array = new ArrayList<Integer>();
        String[] mapSplit = course_score.split(":");
        if (mapSplit.length > 1) {
            String[] arraySplit = mapSplit[1].split(",");
            for (String score : arraySplit) {
                array.add(Integer.valueOf(score));
            }
        }
        map.put(mapSplit[0], array);
        return map;
    }
}
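Before packaging, the parsing logic can be sanity-checked in a plain JVM. The sketch below duplicates the body of evaluate without the Hive UDF dependency, so it compiles standalone; it is a convenience check, not a replacement for testing the actual jar in Hive:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StringToMapCheck {
    // Same logic as StringToMapUDF.evaluate, minus the Hive dependency.
    static Map<String, List<Integer>> toMap(String courseScore) {
        if (courseScore == null || courseScore.trim().isEmpty()) {
            return null;
        }
        Map<String, List<Integer>> map = new HashMap<>();
        List<Integer> scores = new ArrayList<>();
        String[] mapSplit = courseScore.split(":");
        if (mapSplit.length > 1) {
            for (String score : mapSplit[1].split(",")) {
                scores.add(Integer.valueOf(score));
            }
        }
        map.put(mapSplit[0], scores);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(toMap("Math:90,92,78")); // {Math=[90, 92, 78]}
    }
}
```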
④ Package the program into a jar and upload it to the server.
⑤ Enter the Hive CLI and add the jar:
hive> add jar /root/jar/StringToMapUDF.jar;
Added [/root/jar/StringToMapUDF.jar] to class path
Added resources: [/root/jar/StringToMapUDF.jar]
⑥ Create a temporary function:
hive> create temporary function string_to_map as 'com.sl.hive.StringToMapUDF';
⑦ Test:
hive> select name,string_to_map(course_score) from course_combination_temp;
OK
zhangsan {"Math":[90,92,78]}
lisi {"Chinese":[80,94,69,70]}
wangwu {"Math":[58,95]}
⑧ Create the final table, populating it from the temporary table:
hive> create table course_combination as
> select name,string_to_map(course_score) as score from course_combination_temp;
Query the final table:
hive> select * from course_combination;
OK
zhangsan {"Math":[90,92,78]}
lisi {"Chinese":[80,94,69,70]}
wangwu {"Math":[58,95]}
⑨ Create the table first, then insert the data
Above, the table was created and populated in one step by selecting from the temporary table; alternatively, you can create the table first and insert the data afterwards.
1. Create the table:
hive> create table course_combination2 (name string,score map<string,array<int>>)
> row format delimited fields terminated by '\t';
2. Insert the data:
hive> insert into table course_combination2
> select name,string_to_map(course_score) from course_combination_temp;
3. Query the data:
hive> select * from course_combination2;
OK
zhangsan {"Math":[90,92,78]}
lisi {"Chinese":[80,94,69,70]}
wangwu {"Math":[58,95]}
⑩ Exercise: store the same data in struct format, i.e. with a table schema of
(name string,course_score struct<course:string,score:array<int>>)
References:
http://blog.csdn.net/wf1982/article/details/7474601
http://www.cnblogs.com/ggjucheng/archive/2013/01/31/2868941.html