Hive复合数据类型array,map,struct的使用

Hive中复合数据类型Array,Map,Struct的操作

The following functions construct instances of complex types.

Constructor FunctionOperandsDescription
map(key1, value1, key2, value2, …)Creates a map with the given key/value pairs.
struct(val1, val2, val3, …)Creates a struct with the given field values. Struct field names will be col1, col2, ….
named_struct(name1, val1, name2, val2, …)Creates a struct with the given field names and values. (As of Hive 0.8.0.)
array(val1, val2, …)Creates an array with the given elements.
create_union(tag, val1, val2, …)Creates a union type with the value that is being pointed to by the tag parameter.

1.array的用法

实例数据array.txt:姓名和工作地点

zhangsan    beijing,shanghai,hangzhou
lisi    chengdu,wuhan,suzhou,jilin
wangwu  nanjing,kunsan

创建数据库表

hive> create table person_array(name string,work_locations array<string>)
    > row format delimited fields terminated by '\t'
    > collection items terminated by ',';

数据加载到数据库

hive> load data local inpath '/root/data/array.txt' into table person_array;

一些查询操作

hive> select * from person_array;
OK
zhangsan    ["beijing","shanghai","hangzhou"]
lisi    ["chengdu","wuhan","suzhou","jilin"]
wangwu  ["nanjing","kunsan"]
hive> select name,work_locations[0],size(work_locations) from person_array;
OK
zhangsan    beijing 3
lisi    chengdu 4
wangwu  nanjing 2
hive> select name from person_array where array_contains(work_locations,'nanjing');
OK
wangwu
hive> select work_locations[3],work_locations[4] from person_array;
OK
NULL    NULL
jilin   NULL
NULL    NULL

具体可参考Hive的官方文档

Collection Functions

The following built-in collection functions are supported in Hive:

Return TypeName(Signature)Description
intsize(Map)Returns the number of elements in the map type.
intsize(Array)Returns the number of elements in the array type.
arraymap_keys(Map)Returns an unordered array containing the keys of the input map.
arraymap_values(Map)Returns an unordered array containing the values of the input map.
booleanarray_contains(Array, value)Returns TRUE if the array contains value.
arraysort_array(Array)Sorts the input array in ascending order according to the natural ordering of the array elements and returns it (as of version 0.9.0).

2.map的用法

实例数据map.txt:姓名和学习成绩

zhangsan    Math:90,Chinese:92,English:78
lisi    Chinese:80,English:94,Math:69,Elective:70
wangwu  Math:58,Chinese:95

创建数据库表

hive> create table score_map(name string,score map<string,int>)
    > row format delimited fields terminated by '\t'
    > collection items terminated by ','
    > map keys terminated by ':';

数据加载到数据库

hive> load data local inpath '/root/data/map.txt' into table score_map;

一些查询操作

hive> select * from score_map;
OK
zhangsan    {"Math":90,"Chinese":92,"English":78}
lisi    {"Chinese":80,"English":94,"Math":69,"Elective":70}
wangwu  {"Math":58,"Chinese":95}
hive> select name,score['English'],size(score) from score_map;
OK
zhangsan    78  3
lisi    94  4
wangwu  NULL    2
hive> select name from score_map where array_contains(map_keys(score),'Elective');
OK
lisi

3.struct的用法

实例数据struct.txt:姓名、课程及得分

zhangsan    Math,90
lisi    Chinese,80
wangwu  Elective,70

创建数据库表

hive> create table course_struct(name string,course struct<course:string,score:int>)
    > row format delimited fields terminated by '\t'
    > collection items terminated by ',';

数据加载到数据库

hive> load data local inpath '/root/data/struct.txt' into table course_struct;

一些查询操作

hive> select * from course_struct;
OK
zhangsan    {"course":"Math","score":90}
lisi    {"course":"Chinese","score":80}
wangwu  {"course":"Elective","score":70}
hive> select name,course.score,course.course from course_struct;
OK
zhangsan    90  Math
lisi    80  Chinese
wangwu  70  Elective

4.嵌套复合类型的用法

实例数据combination.txt:姓名、课程及多个得分

zhangsan    Math:90,92,78
lisi    Chinese:80,94,69,70
wangwu  Math:58,95

①创建临时表course_combination_temp

hive> create table course_combination_temp (name string,course_score string)
    > row format delimited fields terminated by '\t';

②数据加载到临时表中

hive> load data local inpath '/root/data/combination.txt' into table 
    > course_combination_temp;

临时表数据

hive> select * from course_combination_temp;
OK
zhangsan    Math:90,92,78
lisi    Chinese:80,94,69,70
wangwu  Math:58,95

③编写UDF

package com.sl.hive;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.hive.ql.exec.UDF;
/** 
 * @description 传入参数格式String:Integer1,Integer2...
 * @return Map<String,ArrayList>
 */
public class StringToMapUDF extends UDF {

    public Map<String, ArrayList<Integer>> evaluate(String course_score) {
        Map<String, ArrayList<Integer>> map = new HashMap<String, ArrayList<Integer>>();
        if (course_score == null || course_score.trim() == "") {
            return null;
        }
        ArrayList<Integer> array = new ArrayList<Integer>();
        String[] mapSplit = course_score.split("\\:");
        if (mapSplit.length > 1) {
            String[] arraySplit = mapSplit[1].split("\\,");
            for (String score : arraySplit) {
                array.add(Integer.valueOf(score));
            }
        }
        map.put(mapSplit[0], array);
        return map;
    }

}

④程序打成jar包放到服务器上
⑤进入hive客户端,添加jar包

hive> add jar /root/jar/StringToMapUDF.jar;
Added [/root/jar/StringToMapUDF.jar] to class path
Added resources: [/root/jar/StringToMapUDF.jar]

⑥创建临时函数

hive> create temporary function string_to_map as 'com.sl.hive.StringToMapUDF';

⑦测试

hive> select name,string_to_map(course_score) from course_combination_temp;
OK
zhangsan    {"Math":[90,92,78]}
lisi    {"Chinese":[80,94,69,70]}
wangwu  {"Math":[58,95]}

⑧创建正式表,临时表数据导入到正式表

hive> create table course_combination as 
    > select name,string_to_map(course_score) as score from course_combination_temp;    

搜索查看正式表数据

hive> select * from course_combination;
OK
zhangsan    {"Math":[90,92,78]}
lisi    {"Chinese":[80,94,69,70]}
wangwu  {"Math":[58,95]}

⑨先创建表,再插入数据
上述是在创建表的时候通过从临时表中查询出相应的记录并插入到所创建的表中,可以先创建表,再插入数据
1.创建表

hive> create table course_combination2 (name string,score map<string,array<int>>)
    > row format delimited fields terminated by '\t';

2.插入数据

hive> insert into table course_combination2 
    > select name,string_to_map(course_score) from course_combination_temp;

3.查看数据

hive> select * from course_combination2;
OK
zhangsan    {"Math":[90,92,78]}
lisi    {"Chinese":[80,94,69,70]}
wangwu  {"Math":[58,95]}

⑩思考:上述数据使用struct格式存储,即表的格式为

(name string,course_score struct<course:string,score:array<int>>)

本文参考:
http://blog.csdn.net/wf1982/article/details/7474601
http://www.cnblogs.com/ggjucheng/archive/2013/01/31/2868941.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值