Hive Partitioned Tables: A Summary

Author: 攻城狮Kevin

Article link: https://blog.csdn.net/wx1528159409/article/details/89641989

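A partition in Hive is really just a subdirectory: the table's data is split into multiple parts along some dimension (for example, time). Each partition corresponds to its own directory on HDFS, and that directory holds all of the partition's data files.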

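At query time, a where clause on the partition column selects only the specified partitions, so Hive does not have to scan the whole table, which makes queries over very large datasets more efficient.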

(1) Create a partitioned table

hive (hive_db1)> create table stu_par(id int , name string)
               > partitioned by (month string)
               > row format delimited fields terminated by '\t';

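The table's columns are id and name; the partition column is month.

Note: a partition column must not duplicate a column already defined in the table, otherwise Hive reports the error "Column repeated in partitioning columns". This also shows that a partition column is not stored as a column in the table's data files; it is a pseudo-column whose value corresponds to a partition directory on HDFS.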
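As a small illustration of this rule, the sketch below first shows a definition that Hive should reject and then inspects the working table. The table name stu_par_bad is hypothetical and used only for this example.

```sql
-- Hypothetical table: month is declared both as a regular column and as a
-- partition column, so Hive is expected to reject the statement with
-- "Column repeated in partitioning columns".
create table stu_par_bad(id int, name string, month string)
partitioned by (month string)
row format delimited fields terminated by '\t';

-- For the valid table, desc lists month after id and name and repeats it
-- in a separate partition information section.
desc stu_par;
```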

(2) Load data into the partitioned table (load data local inpath '~' into table ~ partition(partition column = '~'))

hive (hive_db1)> load data local inpath '/opt/module/datas/student.txt' into table stu_par partition(month = '12');

In the same way, load student.txt into the partitions month=11 and month=10.
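Spelled out, the two additional loads would look roughly like the statements below. The last statement is an optional variant, not part of the original walkthrough, that uses the overwrite keyword.

```sql
-- Same local file, different target partitions.
load data local inpath '/opt/module/datas/student.txt' into table stu_par partition(month = '11');
load data local inpath '/opt/module/datas/student.txt' into table stu_par partition(month = '10');

-- Optional variant: overwrite replaces the partition's existing files
-- instead of adding another file next to them.
load data local inpath '/opt/module/datas/student.txt' overwrite into table stu_par partition(month = '12');
```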

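PS: because no location was specified when the partitioned table was created, it is stored by default under the path of the current database. After the partitions are created, the directory /hive_db1/stu_par contains three month directories (month=10, month=11, and month=12).

/hive_db1 is the path of the current database, stu_par is the storage path of the partitioned table, and the three month directories are the storage paths of the three partitions; each of them contains a student.txt file.

This is how a partitioned table is laid out on HDFS.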

PS: if the partitioned table is created in the default database, its default storage location is /user/hive/warehouse/stu_par.
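One way to check this layout without leaving the Hive CLI is sketched below. It assumes the /hive_db1 database path used in this walkthrough; adjust the paths if your database lives elsewhere.

```sql
-- List the partition directories of the table.
dfs -ls /hive_db1/stu_par;

-- List the data files inside one partition directory.
dfs -ls /hive_db1/stu_par/month=12;

-- Show the metadata of one partition, including its Location.
desc formatted stu_par partition(month = '12');
```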

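(3) Create a two-level partitioned table

This works like creating a single-level partitioned table; the difference is that two partition columns are declared, for example: partitioned by(month string , day string)

Loading data is also the same, except that both partition levels must be specified: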

hive (hive_db1)> load data local inpath '/opt/module/datas/dept.txt' into table
 dept_partition2 partition(month='201709',day='10');
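For completeness, a full definition of such a table might look like the sketch below. The regular columns (deptno, dname, loc) and the tab delimiter are assumptions made for illustration; the source only gives the partition clause.

```sql
-- Assumed columns: deptno, dname, loc. Only the partitioned by clause
-- comes from the text above.
create table dept_partition2(deptno int, dname string, loc string)
partitioned by (month string, day string)
row format delimited fields terminated by '\t';

-- Filtering on both levels reads only the month=201709/day=10 directory.
select * from dept_partition2 where month = '201709' and day = '10';
```

With two partition columns, each day directory is nested inside its month directory under the table path.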

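(4) Query a partitioned table (select * from ~ where partition column = '~')

For more readable output, the following queries are run from a JDBC client connected to Hive:

Single-partition query: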

0: jdbc:hive2://hadoop100:10000> select * from stu_par where month = '12';
+-------------+---------------+----------------+--+
| stu_par.id  | stu_par.name  | stu_par.month  |
+-------------+---------------+----------------+--+
| 1001        | lisi          | 12             |
| 1002        | wangwu        | 12             |
+-------------+---------------+----------------+--+

Query across multiple partitions with union:

0: jdbc:hive2://hadoop100:10000> select * from stu_par where month = '12'
0: jdbc:hive2://hadoop100:10000> union
0: jdbc:hive2://hadoop100:10000> select * from stu_par where month = '11'
0: jdbc:hive2://hadoop100:10000> union
0: jdbc:hive2://hadoop100:10000> select * from stu_par where month = '10';

+---------+-----------+------------+--+
| _u3.id  | _u3.name  | _u3.month  |
+---------+-----------+------------+--+
| 1       | one       | 11         |
| 2       | two       | 11         |
| 3       | three     | 11         |
| 101     | kevin     | 10         |
| 102     | john      | 10         |
| 103     | daniel    | 10         |
| 104     | lee       | 10         |
| 1001    | lisi      | 12         |
| 1002    | wangwu    | 12         |
+---------+-----------+------------+--+
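As an alternative sketch, the same three partitions can be read with a single in-list on the partition column. Keep in mind that plain union removes duplicate rows, while the in-list and union all keep them, so the results match only when no partition contains repeated rows.

```sql
-- Read three partitions in one select with an in-list.
select * from stu_par where month in ('10', '11', '12');

-- union all keeps duplicate rows, whereas plain union removes them.
select * from stu_par where month = '12'
union all
select * from stu_par where month = '11';
```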

Multi-level partition query (here table stands for the name of a table partitioned by dt and proj_id):

hive> select * from table where concat_ws('-',dt,proj_id)='20190506-8535';
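A simpler way to express the same filter, assuming neither partition value itself contains a '-', is to compare each partition column directly. The name some_table is hypothetical and stands for a table partitioned by dt and proj_id.

```sql
-- some_table is a hypothetical table partitioned by (dt string, proj_id string).
-- Each partition column is compared on its own, without a function call.
select * from some_table where dt = '20190506' and proj_id = '8535';
```

Plain equality predicates on the partition columns are the most direct form for Hive to match against partition directories.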

(5) Add partitions

Add a single partition:

hive (hive_db1)> alter table stu_par add partition(month = '1');

Add several partitions at once (the partition clauses are separated by spaces):

hive (hive_db1)> alter table stu_par add partition(month = '2') partition(month = '3');
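If a script may run more than once, a hedged variant is to add if not exists, so that an already existing partition does not cause an error:

```sql
-- if not exists turns the statement into a no-op for partitions that already exist.
alter table stu_par add if not exists partition(month = '1');
alter table stu_par add if not exists partition(month = '2') partition(month = '3');
```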

(6) Drop partitions

Drop a single partition:

hive (hive_db1)> alter table stu_par drop partition(month = '1');
Dropped the partition month=1
OK

Drop several partitions at once (the partition clauses are separated by commas):

hive (hive_db1)> alter table stu_par drop partition(month = '2'),partition(month = '3');
Dropped the partition month=2
Dropped the partition month=3
OK
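The drop statements accept if exists in the same spirit, which avoids an error when a partition is already gone. A minimal sketch:

```sql
-- if exists skips partitions that are not present instead of failing.
alter table stu_par drop if exists partition(month = '1');
alter table stu_par drop if exists partition(month = '2'), partition(month = '3');
```

Because stu_par is a managed (non-external) table, dropping a partition removes its data directory on HDFS as well as its metadata, so it is worth checking the partition spec before running the statement.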

(7) List a table's partitions

hive (hive_db1)> show partitions stu_par;
OK
partition
month=10
month=11
month=12
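When a table has many partitions, the listing can be narrowed with a partial partition spec. The second statement assumes the two-level table dept_partition2 from section (3) exists.

```sql
-- Show only the partition matching the given spec.
show partitions stu_par partition(month = '12');

-- For a two-level table, fix the first level and list the day partitions under it.
show partitions dept_partition2 partition(month = '201709');
```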
