Hive Partitioned Tables: Basic Operations and Structure

Author: 大数据点滴

Article link: https://blog.csdn.net/m0_48283915/article/details/107672279

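I. Overview of Partitioned Tables

Traditional DBMSs generally support table partitioning. Partitioning lets a query search only a specific region of the data, which reduces scan cost and improves query efficiency to some degree. Building indexes on the partitions can improve it further, but that is not covered here.

The Hive data warehouse also has the concepts of partitioning and bucketing. Logically, a partitioned table is no different from an unpartitioned one. Physically, a partitioned table stores its data in subdirectories of the table directory according to the partition key values, with each directory named "partition key=value". Note that a partition key value does not have to come from a column (field) of the table data: it can be set to any value, as long as queries specify the corresponding partition key. Partitions can be added, dropped, renamed, and truncated. Because a query on a partition searches only a specific region (a subdirectory), partitioning serves the same purpose as it does in a DBMS: reducing scan cost.

II. Basic Operations on Partitioned Tables

1. Syntax for creating a partitioned table (using date-based management of logs as the scenario)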

create table dept_partition(
deptno int, dname string, loc string
)
partitioned by (month string)
row format delimited fields terminated by '\t';

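By default, the table directory is created under /user/hive/warehouse.

Note: the partition column cannot be a column that already exists in the table. Think of it as a pseudo-column of the table.

2. Loading data into a partitioned table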

load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201709');
load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201708');
load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201707');

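Note: when loading data into a partitioned table this way, the target partition must be specified.

After loading, each partition appears as a directory named "partition key=value" under the table directory (for example, month=201709), and the loaded data file is stored inside it.

3. Querying the table contents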

select * from dept_partition;

4. Querying data in the partitioned table
Single-partition query:

select * from dept_partition where month='201707';

[Figure: query result]

Multi-partition union query:

select * from dept_partition where month='201707'
union
select * from dept_partition where month='201708';

[Figure: query result]
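
Since month is the partition column, the two partitions can also be selected with a single predicate instead of a union. A minimal sketch, assuming the three partitions loaded in step 2 exist:

```sql
-- Read two partitions in one statement by filtering on the partition column.
select * from dept_partition where month in ('201707', '201708');

-- union removes duplicate rows from the combined result;
-- union all keeps every row from both branches.
select * from dept_partition where month='201707'
union all
select * from dept_partition where month='201708';
```

Either form restricts the scan to the named partition directories, because the filter is on the partition key.
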
5. Adding partitions

Add a single partition:

alter table dept_partition add partition(month='201706');

Add multiple partitions at once:

alter table dept_partition add partition(month='201705') partition(month='201704');

6. Dropping partitions
Drop a single partition:

alter table dept_partition drop partition(month='201704');

Drop multiple partitions at once:

 alter table dept_partition drop partition (month='201705'), partition (month='201706');

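Note: when dropping several partitions, the partition specs are separated by commas; when adding several, they are separated by spaces.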
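
When a script may run more than once, the existence checks in Hive's partition DDL can help. A minimal sketch; month='201706' is chosen only for illustration:

```sql
-- Add the partition only if it is not already registered.
alter table dept_partition add if not exists partition(month='201706');

-- Drop the partition only if it exists.
alter table dept_partition drop if exists partition(month='201706');
```

Without if not exists, adding a partition that already exists fails, so the guarded form is the safer one for repeated runs.
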

7. Listing the partitions of a partitioned table

show partitions dept_partition;

8. Viewing the structure of a partitioned table

desc  dept_partition;

desc formatted dept_partition;

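The overview mentions renaming and truncating partitions, which the steps in this section do not demonstrate. A minimal sketch, assuming partitions month='201707' and month='201708' still exist; the new value '201707_bak' is made up for illustration:

```sql
-- Rename a partition by giving it a new partition key value.
alter table dept_partition partition(month='201707') rename to partition(month='201707_bak');

-- Remove all rows from one partition while keeping the partition itself.
truncate table dept_partition partition(month='201708');

-- Check the resulting partition list.
show partitions dept_partition;
```
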

III. Two-Level Partitioned Tables: Basic Operations and Notes

1. Creating a two-level partitioned table

create table dept_partition2(
deptno int, dname string, loc string
)
partitioned by (month string, day string)
row format delimited fields terminated by '\t';

[Figure: table path after creation]

2. Loading data the normal way

load data local inpath '/opt/module/datas/dept.txt' into table
 default.dept_partition2 partition(month='201709', day='13');

[Figure: directory structure after loading]

3. Querying partition data

select * from dept_partition2 where month='201709' and day='13';

[Figure: query result]
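
With two partition levels, a query may also filter on the first level only, which reads every day directory under that month. A minimal sketch, assuming the data loaded in step 2:

```sql
-- Filter on the first-level partition key only.
select * from dept_partition2 where month='201709';

-- List only the partitions under one month (partial partition spec).
show partitions dept_partition2 partition(month='201709');
```
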

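4. Three ways to upload data directly into a partition directory and associate it with the partitioned table
Method 1: upload the data, then repair
1. Upload the data: first create the partition directory (day=12), then upload the file dept.txt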

dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=12;

dfs -put /opt/module/datas/dept.txt  /user/hive/warehouse/dept_partition2/month=201709/day=12;

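[Figure: path after creating the partition directory, and path of the uploaded file]

2. Query the data (the newly uploaded data is not found, because Hive looks up partitions in the metastore and nothing has registered day=12 there yet)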

select * from dept_partition2 where month='201709' and day='12';

[Figure: query result]

3. Run the repair command

msck repair table dept_partition2;

Query again:

select * from dept_partition2 where month='201709' and day='12';

[Figure: query result]
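
To see what the repair changes in the metastore, the partition list can be checked before and after it. A sketch that uses only statements already introduced in this article:

```sql
-- Before the repair, month=201709/day=12 is not listed even though the directory exists on HDFS.
show partitions dept_partition2;

-- Scan the table directory and register partition directories missing from the metastore.
msck repair table dept_partition2;

-- After the repair, month=201709/day=12 appears in the list.
show partitions dept_partition2;
```
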
Method 2: upload the data, then add the partition
1. Upload the data: first create the partition directory (day=11), then upload the file dept.txt

dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=11;

dfs -put /opt/module/datas/dept.txt  /user/hive/warehouse/dept_partition2/month=201709/day=11;

[Figure: path after creating the partition directory]

2. Add the partition

alter table dept_partition2 add partition(month='201709',
 day='11');

Query the data:

select * from dept_partition2 where month='201709' and day='11';

[Figure: query result]
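
add partition also accepts an explicit location clause, which can help when the data directory does not follow the "partition key=value" naming. A minimal sketch; the partition day='09' and the path /data/dept/20170909 are hypothetical and not part of this walkthrough:

```sql
-- Register a partition whose data lives at an arbitrary HDFS path (hypothetical path).
alter table dept_partition2 add partition(month='201709', day='09')
location '/data/dept/20170909';
```
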
Method 3: create the directory, then load data into the partition
1. Create the directory

dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=10;

[Figure: path after creating the partition directory]

2. Load the data

load data local inpath '/opt/module/datas/dept.txt' into table
 dept_partition2 partition(month='201709',day='10');

[Figure: result of the load]

3. Query the data

select * from dept_partition2 where month='201709' and day='10';

Note: all of the above is my own work. If you find shortcomings, please point them out so we can improve together.
