1、分区表:
分区表实际上就是对应一个HDFS文件系统上的独立的文件夹,该文件夹下是该分区所有的数据文件。Hive中的分区就是分目录,把一个大的数据集根据业务需要分割成小的数据集。在查询时通过WHERE子句中的表达式选择查询所需要的指定的分区,这样的查询效率会提高很多。(分区,“伪列”)。
源数据:
10 ACCOUNTING bj
20 RESEARCH sh
30 SALES cq
40 OPERATIONS gz
- 创建分区表语法:
0: jdbc:hive2://bigdata01:10000> create table dept_partition(id int,name string, loc string)
0: jdbc:hive2://bigdata01:10000> partitioned by (month string)
0: jdbc:hive2://bigdata01:10000> row format delimited fields terminated by '\t';
- 加载数据到分区表中(注意:分区表加载数据时,必须指定分区):
0: jdbc:hive2://bigdata01:10000> load data local inpath '/opt/module/datas/stu1.txt' into table dept_partition partition(month='202009');
0: jdbc:hive2://bigdata01:10000> load data local inpath '/opt/module/datas/stu1. txt' into table dept_partition partition(month='202010');
0: jdbc:hive2://bigdata01:10000> load data local inpath '/opt/module/datas/stu1.txt' into table dept_partition partition(month='202011');
- 查询所有分区表的数据:
0: jdbc:hive2://bigdata01:10000> select * from dept_partition;
2、查询数据
- 查询单个分区数据:
0: jdbc:hive2://bigdata01:10000> select * from dept_partition where month='202009';
- 多分区联合查询(方式一)
select * from dept_partition where month='202009' union select * from dept_partition where month='202010';
- 多分区联合查询(方式二):
0: jdbc:hive2://bigdata01:10000> select * from dept_partition where month='202010' or month='202009';
3、增加分区
- 创建单个分区
0: jdbc:hive2://bigdata01:10000> alter table dept_partition add partition(month='202012');
- 同时创建多个分区
0: jdbc:hive2://bigdata01:10000> alter table dept_partition add partition(month='202001') partition(month='202002');
4、删除分区
- 删除单个分区
0: jdbc:hive2://bigdata01:10000> alter table dept_partition drop partition(month='202012');
- 同时删除多个分区
0: jdbc:hive2://bigdata01:10000> alter table dept_partition drop partition(month='202001'),partition(month='202002');
5、查看分区表有多少分区
show partitions dept_partition;
6、查看分区表结构
0: jdbc:hive2://bigdata01:10000> desc formatted dept_partition;
7、分区表注意事项
创建二级分区表
0: jdbc:hive2://bigdata01:10000> create table dept_partition2(id int,name string,loc string) partition ed by (month string,day string) row format delimited fields terminated by '\t';
加载数据到二级分区表中
0: jdbc:hive2://bigdata01:10000> load data local inpath '/opt/module/datas/stu1.txt' into table dept_partition2 partition(month='202001',day='1');
查询分区数据
0: jdbc:hive2://bigdata01:10000> select * from dept_partition2 where month='202001' and day='1';
8、把数据直接上传到(HDFS)分区目录上,让分区表和数据产生关联的三种方式
- 方式一:上传数据后修复
1、上传数据
0: jdbc:hive2://bigdata01:10000> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=202001/day=2;
0: jdbc:hive2://bigdata01:10000> dfs -put /opt/module/datas/stu1.txt /user/hive/warehouse/dept_partition2/month=202001/day=2;
查询结果为空
2、执行修复命令
0: jdbc:hive2://bigdata01:10000> msck repair table dept_partition2;
3、查询数据
select * from dept_partition2 where month='202001' and d ay='2';
- 方式二:上传数据后添加分区
上传数据
0: jdbc:hive2://bigdata01:10000> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=202001/day=3;
0: jdbc:hive2://bigdata01:10000> dfs -put /opt/module/datas/stu1.txt /user/hive/warehouse/dept_partition2/month=202001/day=3;
执行添加分区
0: jdbc:hive2://bigdata01:10000> alter table dept_partition2 add partition(month='202001',day='3');
查询数据
0: jdbc:hive2://bigdata01:10000> select * from dept_partition2 where month='202001' and day='3';
- 方式三:创建文件夹后load数据到分区
创建目录
0: jdbc:hive2://bigdata01:10000> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=202001/day=4;
上传数据
0: jdbc:hive2://bigdata01:10000> load data local inpath '/opt/module/datas/stu1.txt' into table dept_partition2 partition(month='202001',day='10');
查询数据
0: jdbc:hive2://bigdata01:10000> select * from dept_partition2 where month='202001' and day='10';