Hive Study Notes (2): Hive Data Types and Storage Formats

Latest recommended article published on 2019-12-04 17:06:16

水墨之白

Category: BigData. Tags: Hive, data types, storage formats

Article link: https://blog.csdn.net/LJJZJ/article/details/101776198

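Hive supports most of the primitive data types found in relational databases, and adds three complex data types on top of them: array, map, and struct.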

1. Data Types

array

Analogous to an array in Java: an ordered collection of elements of the same type.

create table test(
    id int,
    name string,
    hobby array<string>
)
row format delimited
fields terminated by '\t'
collection items terminated by ',';

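The default delimiter between array elements is \002 (Ctrl+B); in a shell you can type it by pressing Ctrl+V and then Ctrl+B. This example uses a comma instead.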

When loading data, each row should be formatted as follows (fields are separated by tabs):

1 张三 read,run

To reference an array element, use arrayName[index]; indexes start at 0.
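
As a minimal sketch of how the array column might be loaded and queried, assuming the table test defined above and a tab-delimited file at the hypothetical path /tmp/test_array.txt (the path and the column aliases are illustrative only, not from the original notes):

```sql
-- Load a tab-delimited file into the array version of table test.
-- '/tmp/test_array.txt' is a hypothetical path used only for illustration.
LOAD DATA LOCAL INPATH '/tmp/test_array.txt' INTO TABLE test;

-- Read the first element and the number of elements of the array.
SELECT name,
       hobby[0]    AS first_hobby,
       size(hobby) AS hobby_count
FROM test;

-- Produce one output row per array element.
SELECT name, h AS hobby_item
FROM test
LATERAL VIEW explode(hobby) t AS h;
```

An index beyond the end of the array returns NULL rather than raising an error, so size() is a convenient way to check the length first.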

map

Analogous to a Map in Java: a set of key-value pairs. Keys must be of a primitive type; values can be of any type.

create table test(
    id int,
    name string,
    score map<string, float> comment "this is score"
) row format delimited 
fields terminated by '\t'
collection items terminated by ','
map keys terminated by '=';

Given the definition above, the data to load should be formatted as follows:

1 张三 Chinese=102,Math=121,English=124

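The default delimiter between a map key and its value is \003 (Ctrl+C); in a shell you can type it by pressing Ctrl+V and then Ctrl+C.
To read a specific value, use columnName['key'], for example score['Math']. Keys are case-sensitive, so the key must match the loaded data exactly.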
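
A short query sketch for the map column, assuming the map version of table test above has been created (in place of the earlier array table of the same name) and loaded with the sample row; the column aliases are illustrative only:

```sql
-- Look up one key, and list all keys and values of the map.
SELECT name,
       score['Math']     AS math_score,
       map_keys(score)   AS subjects,
       map_values(score) AS scores,
       size(score)       AS subject_count
FROM test;
```

Looking up a key that is not present in the map returns NULL.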

struct

Analogous to an object in Java: a collection of named fields whose types may differ.

create table t5_struct (
    id int,
    name string,
    address struct<province:string, city:string, zip:int>
) row format delimited 
fields terminated by '\t'
collection items terminated by ',';

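Given the definition above, the data to load should be formatted as follows (fields are tab-separated; the first line below only labels the columns and is not part of the data file):

id name address(province:string, city:string, zip:int)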
1 小陈 bj,chaoyang,100002
2 老王 hb,shijiazhuang,052260
3 小何 hn,huaiyang,466000
4 小马 hlj,harbin,10000

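To access a field, use columnName.fieldName, for example address.province. Note that because zip is declared as int, a value such as 052260 is read as 52260; declare the field as string if leading zeros must be preserved.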
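
A brief sketch of querying struct fields, assuming the table t5_struct above has been loaded with the sample rows; the filter value 'bj' is taken from the sample data:

```sql
-- Select individual struct fields and filter on one of them.
SELECT name,
       address.province,
       address.city,
       address.zip
FROM t5_struct
WHERE address.province = 'bj';
```

Struct fields can be used anywhere an ordinary column can, including WHERE, GROUP BY, and ORDER BY clauses.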