hive的复杂数据类型和分区表

复杂数据类型
针对复杂数据类型,要了解
1)如何存放
2)如何查询

数组类型
Array(1,2,3,4) 要求装的数据类型是一样的

create table hive_array(name string, work_locations array<string>)
row format delimited fields terminated by '\t'
COLLECTION ITEMS TERMINATED BY ',';

取数据
select name, work_locations[0] from hive.array; 取第0个
select name, work_locations[3] from hive.array; 取第3个 

查找是否有指定数据
select * from hive_array where array_contains(work_locations,'tianjin');

求长度
select name,size(work_locations) as num from hive_array;

map
Map(‘a’,1,‘b’,c) 要求key的类型一样

create table hive_map(id int,name string, members map<string,string>, age int)
row format delimited fields terminated by ','
COLLECTION ITEMS TERMINATED BY '#'
MAP KEYS TERMINATED BY ':';

取数据(家庭成员关系的个数)
select name,size(members) as size,members['father'] as father,members['brother'] as brother,map_keys(members) as keys,map_values(members) as values from hive_map;

选有兄弟的
select name,size(members) as size,members['father'] as father,members['brother'] as brother,map_keys(members) as keys,map_values(members) as values from hive_map where array_contains(map_keys(members),'brother');

struct

create table hive_struct(ip string, userinfo struct<name:string, age:int>)
row format delimited fields terminated by '#'
COLLECTION ITEMS TERMINATED BY ':';

取数据
select ip,userinfo.name,userinfo.age from hive_struct;

分区表
hive的分区表是一张表,只是不同的分区在不同的目录里面。真正的表的字段是不包括分区字段的,分区字段只是hdfs上的文件夹的名称。在HDFS上的数据存储目录:tablename/partition_column=partition_value

1.一级分区

create table order_partiton(
order_no string,
order_time string
)
PARTITIONED BY (event_month string)
row format delimited fields terminated by '\t';

加载数据,加载之后,hdfs存储路径为order_partiton/event_month=2014-05
load data local inpath '/home/hadoop/data/order_created.txt' into table order_partiton PARTITION (event_month='2014-05');	

2.多级分区

create table order_mulit_partiton(
order_no string,
order_time string
)
PARTITIONED BY (event_month string, step string)
row format delimited fields terminated by '\t';

load data local inpath '/home/hadoop/data/order_created.txt' into table order_mulit_partiton PARTITION (event_month='2014-05', step='1');	

在生产上,一般:数据经过清洗后存放到hdfs目录上,然后将目录的数据加载到分区表中,分为静态分区和动态分区

load data inpath '' into table order_partiton PARTITION (event_month='2014-06');

例题:将相同部门的人写到一个分区表里面去

1)静态分区

CREATE TABLE emp_partition(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double
)
PARTITIONED BY (deptno int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

insert into table emp_partition partition(deptno=30)
select empno,ename,job,mgr,hiredate,sal,comm from ruozedata_emp where deptno=10;

2)动态分区

create table emp_dynamic_partition(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double
)
PARTITIONED BY (deptno int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

insert overwrite table emp_dynamic_partition partition(deptno)
select empno,ename,job,mgr,hiredate,sal,comm,deptno from ruozedata_emp;

动态分区不需要指定特定值,即partition后不需跟value

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值