Partitioned Tables in Hive

章鱼哥TuNan&Z

Category: Hive. Tags: hive, big data

Article link: https://blog.csdn.net/qq_43528451/article/details/120511990

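Partitioned tables

Basic operations

One of the most common ideas in big data is divide and conquer: a large file is split into many small files, so that each operation only has to touch one small file. Hive supports the same idea. A large data set can be split by day or by hour into separate partitions, each stored as its own directory of smaller files, which are much easier to work with.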

Syntax for creating a partitioned table

create table score(s_id string,c_id string, s_score int) partitioned by (month string) row format delimited fields terminated by '\t';

Creating a table with multiple partition columns

create table score2 (s_id string,c_id string, s_score int) partitioned by (year string,month string,day string) row format delimited fields terminated by '\t';

Loading data into a partitioned table

load data local inpath '/export/servers/hivedatas/score.txt' into table score partition (month='202006');
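To illustrate how the partition is used once the data is loaded, here is a minimal sketch of a query against the `score` table above. The partition column `month` is not stored in score.txt; Hive derives it from the partition directory, but it can be selected and filtered like an ordinary column. The filter value simply reuses the partition loaded above.

```sql
-- Read only the month=202006 partition; other partition directories are skipped.
select s_id, c_id, s_score, month
from score
where month = '202006';
```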

Loading data into a table with multiple partition columns

load data local inpath '/export/servers/hivedatas/score.txt' into table score2 partition(year='2020',month='06',day='01');
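With several partition columns, a query can filter on all of them or only on some. The following sketch, which assumes the `score2` partition loaded above, shows both cases.

```sql
-- Exact partition: reads only year=2020/month=06/day=01.
select * from score2
where year = '2020' and month = '06' and day = '01';

-- Partial filter: reads every day partition under year=2020/month=06.
select count(*) from score2
where year = '2020' and month = '06';
```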

Viewing partitions

show partitions score;
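For a table with several partition columns, `show partitions` also accepts a partial partition spec, which narrows the listing. A small sketch using the `score2` table defined earlier:

```sql
-- List only the partitions of score2 that belong to year 2020.
show partitions score2 partition (year = '2020');
```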

Adding a partition

alter table score add partition(month='202005');

Adding several partitions at once

alter table score add partition(month='202004') partition(month='202003');

Note: after a partition is added, a new folder for it appears under the table's directory in HDFS.
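One way to find that folder without guessing the warehouse path is to ask Hive for the partition's metadata. This is a sketch that assumes the `month='202005'` partition added above; the actual path depends on the cluster's warehouse configuration, so it is not shown here.

```sql
-- Print the partition's metadata, including its Location (an HDFS path).
describe formatted score partition (month = '202005');
```

The reported Location can then be listed with `hadoop fs -ls` to confirm that the directory exists.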

Dropping a partition

alter table score drop partition(month='202006');
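Several partitions can also be dropped in one statement. The sketch below assumes the `202004` and `202003` partitions added earlier; `if exists` keeps the statement from failing if one of them is already gone.

```sql
-- Drop two partitions at once; note the comma between partition specs.
alter table score drop if exists
  partition (month = '202004'),
  partition (month = '202003');
```

Because `score` is a managed table, dropping a partition also deletes that partition's data in HDFS. For an external table only the metadata is removed.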

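Exercise: an external partitioned table

Requirements: a file named score.txt is stored on the cluster under the directory /scoredatas/month=202006. A new file is generated every day and placed in the folder for the corresponding date. Other users also need these files, so they must not be moved. Create a matching Hive table, load the data into it for statistical analysis, and make sure the data is not deleted when the table is dropped (that is, use an external table).

1. Prepare the data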

hadoop fs -mkdir -p /scoredatas/month=202006
hadoop fs -put score.txt /scoredatas/month=202006/

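2. Create the external partitioned table and specify the directory where the data files are stored

create external table score4(s_id string,c_id string,s_score int) partitioned by (month string) row format delimited fields terminated by '\t' location '/scoredatas';

3. Repair the table. Put simply, this establishes the mapping between the table and its data files.

msck repair table score4;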

Once the repair succeeds, all of the data is visible in the table.
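A quick way to check the result, sketched here under the assumption that only the directory from step 1 exists under /scoredatas:

```sql
-- The repaired table should now list the month=202006 partition.
show partitions score4;

-- Count the rows per partition to confirm the files are readable.
select month, count(*) as row_count
from score4
group by month;
```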

Second approach: after uploading the data, add the partition manually

Prepare the data:

hadoop fs -mkdir -p /scoredatas/month=202005
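A minimal sketch of the remaining step, assuming score.txt has then been uploaded into this directory with `hadoop fs -put` as in the first approach: instead of running `msck repair`, the partition is registered explicitly. The `location` clause is optional here, because the path matches the default location under the table's directory.

```sql
-- Register the existing directory as the month=202005 partition of score4.
alter table score4 add partition (month = '202005')
  location '/scoredatas/month=202005';

-- Confirm that the partition is now known to the metastore.
show partitions score4;
```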
