Some Hive Commands in a CDH Environment (Part 2)

Copyright: CC 4.0 BY-SA

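This article covers Hive collection data types (their text format and table-creation syntax), external tables, and partitioned tables, and shows how the latter help organize data storage and speed up queries. Worked examples show how to create tables, load data, query them, and manage partitions.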

1. Collection Data Types
Data format (sample file contents):

xiaoming,basketball_volleyball,phone1:18_phone2:19,x_xxx
xiaoli,basketball_badminton,phone1:18_phone2:19,xx_xxx

Table creation syntax:

create table demo(
name string,
`like` array<string>,
phone map<string, int>,
body struct<height:string, weight:string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';

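row format delimited fields terminated by ',': separator between columns (fields)
collection items terminated by '_': separator between ARRAY elements, MAP entries, and STRUCT fields
map keys terminated by ':': separator between a key and its value inside a MAP
lines terminated by '\n': row separator (Hive currently supports only '\n' here)
Note that LIKE is a Hive keyword, so a column named like should be quoted with backticks (`like`) in DDL and in queries.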
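To make the delimiter rules concrete, here is a simplified Python model of how one line of the sample file maps onto the four columns of demo. It is only an illustration and not Hive's actual SerDe. The function name parse_demo_row is hypothetical. The sketch assumes every line is well-formed and raises an exception on bad input, whereas Hive typically yields NULL for values it cannot parse.

```python
from typing import Dict, List, Tuple

# Delimiters declared in the CREATE TABLE statement above.
FIELD_DELIM = ","       # fields terminated by ','
COLLECTION_DELIM = "_"  # collection items terminated by '_'
MAP_KEY_DELIM = ":"     # map keys terminated by ':'


def parse_demo_row(line: str) -> Tuple[str, List[str], Dict[str, int], Dict[str, str]]:
    """Split one text line into the demo table's columns (simplified model)."""
    name, like_raw, phone_raw, body_raw = line.split(FIELD_DELIM)
    # ARRAY<STRING>: elements separated by the collection delimiter.
    like = like_raw.split(COLLECTION_DELIM)
    # MAP<STRING, INT>: entries separated by the collection delimiter,
    # key and value separated by the map-key delimiter.
    phone: Dict[str, int] = {}
    for entry in phone_raw.split(COLLECTION_DELIM):
        key, value = entry.split(MAP_KEY_DELIM)
        phone[key] = int(value)
    # STRUCT<height, weight>: field values separated by the collection delimiter.
    height, weight = body_raw.split(COLLECTION_DELIM)
    body = {"height": height, "weight": weight}
    return name, like, phone, body


row = parse_demo_row("xiaoming,basketball_volleyball,phone1:18_phone2:19,x_xxx")
print(row)
# ('xiaoming', ['basketball', 'volleyball'], {'phone1': 18, 'phone2': 19}, {'height': 'x', 'weight': 'xxx'})
```

The same collection delimiter serves ARRAY, MAP, and STRUCT. A single '_' therefore cannot appear inside an element value.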

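Query syntax (array elements are indexed from 0, map values are looked up by key, and struct fields are accessed with a dot):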

select `like`[0], phone['phone1'], body.height from demo
where name="xiaoming";
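The query above returns basketball, 18 and x for xiaoming. Two edge cases are worth knowing: an array index past the end and a map key that does not exist both give NULL in Hive rather than an error. The Python sketch below models those access rules, using None for NULL. The helper names array_index and map_lookup are hypothetical.

```python
from typing import Dict, List, Optional

# Column values of the 'xiaoming' row, already split into Python structures.
like: List[str] = ["basketball", "volleyball"]
phone: Dict[str, int] = {"phone1": 18, "phone2": 19}
body: Dict[str, str] = {"height": "x", "weight": "xxx"}


def array_index(arr: List[str], i: int) -> Optional[str]:
    """Mimic like[i]: 0-based, None (NULL) when the index is out of range."""
    return arr[i] if 0 <= i < len(arr) else None


def map_lookup(m: Dict[str, int], key: str) -> Optional[int]:
    """Mimic phone['key']: None (NULL) when the key is absent."""
    return m.get(key)


print(array_index(like, 0), map_lookup(phone, "phone1"), body["height"])
# basketball 18 x
print(array_index(like, 5), map_lookup(phone, "phone3"))
# None None
```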

2. External Tables
Table creation syntax:

create external table if not exists test(
id int,
name string,
age int
)
row format delimited fields terminated by '\t';

When an external table is dropped, Hive deletes only its metadata; the data files themselves are not deleted.
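The difference between dropping a managed (internal) table and an external table can be sketched with a toy model. The model keeps a metastore and a file system as two Python dicts. ToyWarehouse, the table managed_demo, and all paths are hypothetical and do not reflect Hive's real storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyWarehouse:
    """Hypothetical model: metastore and file system as plain dicts."""
    metastore: Dict[str, dict] = field(default_factory=dict)  # table -> metadata
    files: Dict[str, List[str]] = field(default_factory=dict)  # directory -> files

    def create_table(self, name: str, location: str, external: bool) -> None:
        self.metastore[name] = {"location": location, "external": external}
        self.files.setdefault(location, [])

    def drop_table(self, name: str) -> None:
        meta = self.metastore.pop(name)  # metadata is always removed
        if not meta["external"]:
            # Managed (internal) table: the data directory is deleted as well.
            self.files.pop(meta["location"], None)
        # External table: the data directory is left as it is.


wh = ToyWarehouse()
wh.create_table("test", "/data/test", external=True)
wh.files["/data/test"].append("f.txt")
wh.create_table("managed_demo", "/warehouse/managed_demo", external=False)
wh.files["/warehouse/managed_demo"].append("g.txt")

wh.drop_table("test")
wh.drop_table("managed_demo")
print(wh.metastore)  # {}
print(wh.files)      # {'/data/test': ['f.txt']}
```

Because the files survive, re-creating a non-partitioned external table like this one over the same location makes the data queryable again.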

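3. Partitioned Tables
A Hive SELECT query normally scans the entire table, spending a lot of time on data the query does not need. Often only part of the table is relevant, which is why Hive lets you define partitions when a table is created.
A partitioned table is one declared with PARTITIONED BY at creation time; each partition is stored as its own subdirectory of the table's directory.
Internet applications store large volumes of log files every day, anywhere from a few GB to tens of GB or more. Every log record has the date on which it was produced, so the logs can be partitioned by that date column, with each day's logs forming one partition.
Organizing data into partitions mainly speeds up queries: a query that filters on the partition column reads only the matching partition directories instead of the whole table.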
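Partition pruning can be pictured as a directory filter. Hive names each partition directory <column>=<value> under the table directory. The Python sketch below models a query on month that opens only the directories whose value matches. The rows and the function scan_with_pruning are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical partition directories of table test: "<column>=<value>" per partition.
table_dirs: Dict[str, List[str]] = {
    "test/month=201906": ["1\tTom\t20"],
    "test/month=201907": ["2\tAmy\t22"],
    "test/month=201908": ["3\tBob\t25"],
}


def scan_with_pruning(month: str) -> Tuple[List[str], List[str]]:
    """Return (rows, directories read) for `where month = <month>`."""
    rows: List[str] = []
    dirs_read: List[str] = []
    for path, lines in table_dirs.items():
        partition_value = path.rsplit("=", 1)[1]  # "test/month=201906" -> "201906"
        if partition_value != month:
            continue  # pruned: this directory is never opened
        dirs_read.append(path)
        rows.extend(lines)
    return rows, dirs_read


rows, dirs_read = scan_with_pruning("201906")
print(dirs_read)  # ['test/month=201906']
```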

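Table creation syntax (the partition column month is declared only in PARTITIONED BY, not in the regular column list):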

create table test(
id int, name string, age int
)
partitioned by (month string)
row format delimited fields terminated by '\t';

Loading data into a partitioned table:

load data local inpath '/run1/f.txt' into table test partition(month='201906');
load data local inpath '/run1/f.txt' into table test partition(month='201907');

A partition is created automatically the first time data is loaded into it.
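The following Python sketch models what the two LOAD statements do to table test. With LOCAL, Hive copies the file into the partition directory, and it creates that partition first if it does not yet exist. The dict-based model and load_into_partition are hypothetical. The sketch also ignores what Hive does when a file of the same name is already in the partition.

```python
from typing import Dict, List

# Hypothetical model of table test: partition directory -> files it contains.
partitions: Dict[str, List[str]] = {}


def load_into_partition(local_file: str, month: str) -> str:
    """Model `load data local inpath ... into table test partition(month=...)`."""
    path = f"test/month={month}"
    # First load into this month: the partition (directory + metadata) is created.
    files = partitions.setdefault(path, [])
    # LOCAL copies the file into the partition directory; INTO keeps existing files.
    files.append(local_file.rsplit("/", 1)[-1])
    return path


load_into_partition("/run1/f.txt", "201906")
load_into_partition("/run1/f.txt", "201907")
print(sorted(partitions))  # ['test/month=201906', 'test/month=201907']
```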

Single-partition query

select * from test where month='201906';

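Multi-partition query (UNION ALL works on all Hive versions; plain UNION, which removes duplicate rows, requires Hive 1.2.0 or later)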

select * from test where month='201906'
union all
select * from test where month='201907';
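Each row belongs to exactly one partition. UNION ALL over two single-partition queries therefore returns the same rows as one query with `where month in ('201906','201907')`, and Hive can prune that query to the same two partitions. The Python check below uses hypothetical rows to confirm the two forms agree, ignoring row order.

```python
from typing import List, Tuple

Row = Tuple[int, str, int, str]  # (id, name, age, month); month is the partition column

# Hypothetical rows of table test; select * also returns the partition column.
rows: List[Row] = [
    (1, "Tom", 20, "201906"),
    (2, "Amy", 22, "201907"),
    (3, "Bob", 25, "201908"),
]


def where_month(month: str) -> List[Row]:
    """Rows of a single partition, i.e. `select * from test where month=<month>`."""
    return [r for r in rows if r[3] == month]


# Two single-partition queries combined with UNION ALL ...
union_all = where_month("201906") + where_month("201907")
# ... versus one query filtering on both partition values.
in_filter = [r for r in rows if r[3] in ("201906", "201907")]
print(sorted(union_all) == sorted(in_filter))  # True
```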

Adding a single partition

alter table test add partition(month='201908');

Adding multiple partitions (partition specs are separated by spaces)

alter table test add partition(month='201909') partition(month='201910');

Dropping a partition (for a managed table such as test, this also deletes the partition's data)

alter table test drop partition (month='201910');

Dropping multiple partitions (partition specs are separated by commas)

alter table test drop partition (month='201908'), partition (month='201909');
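Separating partition specs is easy to get wrong: ADD takes spaces between specs and DROP takes commas. The hypothetical Python helpers below generate both statements for a list of months and reproduce the two statements above exactly. They interpolate the values without escaping, so they assume trusted month strings such as '201909'.

```python
from typing import List


def add_partitions_sql(table: str, months: List[str]) -> str:
    """Build an ADD statement; partition specs are separated by spaces."""
    specs = " ".join(f"partition(month='{m}')" for m in months)
    return f"alter table {table} add {specs};"


def drop_partitions_sql(table: str, months: List[str]) -> str:
    """Build a DROP statement; partition specs are separated by commas."""
    specs = ", ".join(f"partition (month='{m}')" for m in months)
    return f"alter table {table} drop {specs};"


print(add_partitions_sql("test", ["201909", "201910"]))
# alter table test add partition(month='201909') partition(month='201910');
print(drop_partitions_sql("test", ["201908", "201909"]))
# alter table test drop partition (month='201908'), partition (month='201909');
```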