Hive Partitioned Tables (Key Topic)

梦里Coding

Category: Hive. Tags: hive, big data

Article link: https://blog.csdn.net/weixin_43586713/article/details/116482226

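1. Partitioned tables:

A partition corresponds to a separate directory on HDFS, and that directory holds all of the partition's data files. In Hive, partitioning simply means splitting data into directories: a large data set is divided into smaller ones according to business needs. At query time, the where clause is used to locate the matching partition directories and only those are read, which avoids a full-table scan and improves query efficiency.
Note: the partition column also behaves as a column of the table, and it is listed after all the regular columns. So if, later in a project, a query result shows one more column than the data files contain, that extra column is the partition column.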

2. Basic operations

(1) Syntax for creating a partitioned table

create table dept_partition(
deptno int, 
dname string, 
loc string
)
partitioned by (no string)
row format delimited fields terminated by '\t';
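The note about the partition column can be checked once the table exists. A minimal sketch, assuming the dept_partition table from the statement above has been created in the current database; the partition column no should appear after the regular columns:

```sql
-- List the columns of the table; partition columns are reported
-- after the regular columns, in a separate partition section.
describe dept_partition;

-- Show more detail, including the table's storage location on HDFS.
describe formatted dept_partition;
```

The location reported by describe formatted is the directory under which the partition directories are created.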

(2) Loading data into the partitioned table

load data local inpath '/opt/module/datas/dept1.txt' 
into table dept_partition 
partition (no='2021-05-07');

load data local inpath '/opt/module/datas/dept2.txt' 
into table dept_partition 
partition (no='2021-05-08');

load data local inpath '/opt/module/datas/dept3.txt' 
into table dept_partition 
partition (no='2021-05-09');

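Browse to the dept_partition table's directory to see how it is laid out:
[Screenshot: directory layout of dept_partition]

Three separate partition directories were created, no=2021-05-07, no=2021-05-08, and no=2021-05-09, and each one contains its corresponding data file.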
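The same layout can also be checked from the Hive command line. A sketch, assuming the three load statements above have run; the warehouse path below is Hive's common default and is an assumption here, so substitute the location reported for your own table:

```sql
-- List the partitions registered in the metastore for this table.
show partitions dept_partition;

-- List the partition directories on HDFS from inside the Hive CLI.
-- ASSUMPTION: the table lives under the default warehouse directory
-- of the default database; adjust the path for your environment.
dfs -ls /user/hive/warehouse/dept_partition;
```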

(3) Querying data in the partitioned table:

Single-partition query

select * from dept_partition where no='2021-05-07';

Multi-partition query

select * from dept_partition where no='2021-05-07'
union
select * from dept_partition where no='2021-05-08'
union
select * from dept_partition where no='2021-05-09';

A multi-partition query written with union runs as a MapReduce job.
To query all partitions, use select * from dept_partition; directly, which does not run MapReduce and is faster.
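Several partitions can also be selected in a single statement by filtering on the partition column, without union. A minimal sketch using the same table and partition values as above; whether such a query launches a job depends on the Hive version and its fetch settings, so nothing is claimed here about the execution plan:

```sql
-- Select two specific partitions with an in list on the partition column.
select * from dept_partition
where no in ('2021-05-07', '2021-05-08');

-- Select a contiguous range of partitions. The values are strings, so
-- the comparison is lexicographic, which matches date order for the
-- yyyy-MM-dd format used here.
select * from dept_partition
where no between '2021-05-07' and '2021-05-09';
```

Note that union removes duplicate rows, while these filters return every matching row, so the results can differ if the data files contain repeated rows.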

(4) Adding partitions

Add a single partition:

alter table dept_partition add partition(no='2021-05-10');

Add multiple partitions:

alter table dept_partition add 
partition(no='2021-05-11') 
partition(no='2021-05-12');

[Screenshot: the newly added partitions]

(5) Dropping partitions

Drop a single partition:

alter table dept_partition drop partition(no='2021-05-10');

Drop multiple partitions:

alter table dept_partition drop
partition(no='2021-05-11'), 
partition(no='2021-05-12');

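Note: when dropping several partitions, the partition specs after drop must be separated by commas, whereas when adding several partitions they are separated only by spaces.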
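When these statements are re-run, for example from a scheduled script, adding a partition that already exists raises an error. A hedged sketch using the optional if not exists and if exists clauses of Hive's alter table syntax, with the same partition values as above:

```sql
-- Add two partitions, skipping any that already exist.
-- Partition specs are separated by whitespace when adding.
alter table dept_partition add if not exists
partition(no='2021-05-11')
partition(no='2021-05-12');

-- Drop two partitions, ignoring any that do not exist.
-- Partition specs are separated by commas when dropping.
alter table dept_partition drop if exists
partition(no='2021-05-11'),
partition(no='2021-05-12');

-- Confirm which partitions remain.
show partitions dept_partition;
```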

Two-level partitions

Create a table with two-level partitions:

create table dept_partition2(
deptno int, dname string, loc string
)
partitioned by (month string,day string)
row format delimited fields terminated by '\t';

The table structure is as follows:
[Screenshot: structure of dept_partition2]

Load data into a second-level partition directory:

load data local inpath '/opt/module/datas/dept1.txt' 
into table dept_partition2
partition(month='2021-05',day='07');

Query the data in a partition:

select * from dept_partition2 where month='2021-05' and day ='07';
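With two partition columns, a filter on the first level alone selects every second-level partition beneath it. A small sketch, assuming the load statement above has run; the day value '08' is illustrative only and does not come from the original:

```sql
-- List the two-level partitions; each entry has the form month=.../day=...
show partitions dept_partition2;

-- Read all days of one month by filtering on the first-level column only.
select * from dept_partition2 where month='2021-05';

-- Register a further second-level partition under the same month.
-- ASSUMPTION: '08' is a made-up value used only for illustration.
alter table dept_partition2 add partition(month='2021-05', day='08');
```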
