Hive分区表

最新推荐文章于 2024-07-29 20:39:23 发布

雲小妖

最新推荐文章于 2024-07-29 20:39:23 发布

阅读量239

点赞数

分类专栏： Hive 文章标签：分区表 hive

本文链接：https://blog.csdn.net/fengjinghong/article/details/109715994

版权

Hive 专栏收录该内容

9 篇文章 0 订阅

订阅专栏

本文详细介绍了Hive中的分区表概念，包括如何创建、加载数据、查询、增加和删除分区。通过示例展示了单个和多分区查询，以及如何处理二级分区。此外，还探讨了将数据直接上传到HDFS分区目录并使其与分区表关联的三种方法，包括使用`MSCK REPAIR TABLE`、手动添加分区和`LOAD DATA`命令。

摘要由CSDN通过智能技术生成

1、分区表：

分区表实际上就是对应一个HDFS文件系统上的独立的文件夹，该文件夹下是该分区所有的数据文件。Hive中的分区就是分目录，把一个大的数据集根据业务需要分割成小的数据集。在查询时通过WHERE子句中的表达式选择查询所需要的指定的分区，这样的查询效率会提高很多。(分区，“伪列”)。
源数据：

10	ACCOUNTING	bj
20	RESEARCH	sh
30	SALES	cq
40	OPERATIONS	gz

创建分区表语法：

0: jdbc:hive2://bigdata01:10000> create table dept_partition(id int,name string, loc string)                                                                                
0: jdbc:hive2://bigdata01:10000> partitioned by (month string)
0: jdbc:hive2://bigdata01:10000> row format delimited fields terminated by '\t';

加载数据到分区表中(注意：分区表加载数据时，必须指定分区)：


0: jdbc:hive2://bigdata01:10000> load data local inpath '/opt/module/datas/stu1.txt' into table dept_partition partition(month='202009');
0: jdbc:hive2://bigdata01:10000> load data local inpath '/opt/module/datas/stu1. txt' into table dept_partition partition(month='202010');                                                                                       
0: jdbc:hive2://bigdata01:10000> load data local inpath '/opt/module/datas/stu1.txt' into table dept_partition partition(month='202011');

查询所有分区表的数据：

0: jdbc:hive2://bigdata01:10000> select * from dept_partition;

在这里插入图片描述

2、查询数据

查询单个分区数据：

0: jdbc:hive2://bigdata01:10000> select * from dept_partition where month='202009';

在这里插入图片描述

多分区联合查询（方式一）

 select * from dept_partition where month='202009' union select * from dept_partition where month='202010';

在这里插入图片描述

多分区联合查询（方式二）：

0: jdbc:hive2://bigdata01:10000> select * from dept_partition where month='202010' or month='202009';

在这里插入图片描述

3、增加分区

创建单个分区

0: jdbc:hive2://bigdata01:10000> alter table dept_partition add partition(month='202012');

在这里插入图片描述

同时创建多个分区

0: jdbc:hive2://bigdata01:10000> alter table dept_partition add partition(month='202001') partition(month='202002');

在这里插入图片描述

4、删除分区

删除单个分区

0: jdbc:hive2://bigdata01:10000> alter table dept_partition drop partition(month='202012');

同时删除多个分区

0: jdbc:hive2://bigdata01:10000> alter table dept_partition drop partition(month='202001'),partition(month='202002');

5、查看分区表有多少分区

 show partitions dept_partition;

在这里插入图片描述

6、查看分区表结构

0: jdbc:hive2://bigdata01:10000> desc formatted dept_partition;

在这里插入图片描述

7、分区表注意事项

创建二级分区表

0: jdbc:hive2://bigdata01:10000> create table dept_partition2(id int,name string,loc string) partition                                                                 ed by (month string,day string) row format delimited fields terminated by '\t';

加载数据到二级分区表中

0: jdbc:hive2://bigdata01:10000> load data local inpath '/opt/module/datas/stu1.txt' into table dept_partition2 partition(month='202001',day='1');

查询分区数据

0: jdbc:hive2://bigdata01:10000> select * from dept_partition2 where month='202001' and day='1';

8、把数据直接上传到（HDFS）分区目录上，让分区表和数据产生关联的三种方式

方式一：上传数据后修复
1、上传数据

0: jdbc:hive2://bigdata01:10000> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=202001/day=2;
0: jdbc:hive2://bigdata01:10000> dfs -put /opt/module/datas/stu1.txt /user/hive/warehouse/dept_partition2/month=202001/day=2;

查询结果为空
在这里插入图片描述

2、执行修复命令

0: jdbc:hive2://bigdata01:10000> msck repair table dept_partition2;

3、查询数据

select * from dept_partition2 where month='202001' and d                                                                              ay='2';

在这里插入图片描述

方式二：上传数据后添加分区

上传数据

0: jdbc:hive2://bigdata01:10000> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=202001/day=3;
0: jdbc:hive2://bigdata01:10000> dfs -put /opt/module/datas/stu1.txt /user/hive/warehouse/dept_partition2/month=202001/day=3;

执行添加分区

0: jdbc:hive2://bigdata01:10000> alter table dept_partition2 add partition(month='202001',day='3');

查询数据

0: jdbc:hive2://bigdata01:10000> select * from dept_partition2 where month='202001' and day='3';

在这里插入图片描述

方式三：创建文件夹后load数据到分区

创建目录

0: jdbc:hive2://bigdata01:10000> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=202001/day=4;

上传数据

0: jdbc:hive2://bigdata01:10000> load data local inpath '/opt/module/datas/stu1.txt' into table dept_partition2 partition(month='202001',day='10');

查询数据

0: jdbc:hive2://bigdata01:10000> select * from dept_partition2 where month='202001' and day='10';

在这里插入图片描述

浏览器访问：

在这里插入图片描述

雲小妖

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
打赏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

专栏目录