在HIVE中,有时候我们需要根据动态分区插入数据,默认情况下,动态分区是关闭的,这个时候执行SQL会报错,这里我们简单记录如何设置动态分区。
-- overwrite
insert overwrite table XXX partition(report_date)
select XXX, report_date from XXX where report_date>='20200701' and report_date <= '20200930';
set hive.exec.dynamic.partition=true;#开启动态分区
set hive.exec.dynamic.partition.mode=nonstrict;#允许所有分区都是动态的
因为比较简单,这里我们主要关注一下平时使用时候的注意事项。
- 仅有一个动态分区字段
保持字段个数一致,如果不一致,会直接报错。需要注意的是,如果字段个数保持一致,但是类型不一致,此时不会报错,但是对应数据会被null填充。 - 有多个分区字段
同仅有一个动态分区字段一样,只需要保持字段一致就可以了 - 包含静态分区字段和动态分区字段
这里没有明确要求,对数据插入没有影响,建议动态分区字段放在后面
下面我们进行简单测试,这里新建三张测试表。
其中test_01为非分区表,用来存放基础数据。test_02为单个分区表,test_03为多个分区表。建表语句如下:
CREATE TABLE test_01(
id INT,name STRING,sex STRING,dataday STRING);
CREATE TABLE test_02(
id INT,name STRING,sex STRING)
PARTITIONED BY(dataday STRING);
CREATE TABLE test_03(
id INT,name STRING)
PARTITIONED BY(sex STRING,dataday STRING);
我们往test_01表中插入一些数据用作测试。
hive> select * from test_01;
OK
1 Mary female 20190101
2 Emma female 20190101
3 Lucy female 20190102
4 Kevin male 20190102
5 Jack male 20190103
6 Rose male 20190103
首先需要打开动态分区,否则会报错,错误如下
hive> insert into table test_02 partition (dataday)
> select id,name,sex from test_01;
FAILED: SemanticException [Error 10096]:
Dynamic partition strict mode requires at least one static partition column.
To turn this off set hive.exec.dynamic.partition.mode=nonstrict
- Test01:正常插入数据
insert into table test_02 partition (dataday)
select id,name,sex,dataday from test_01;
--查看分区
hive> show partitions test_02;
OK
dataday=20190101
dataday=20190102
dataday=20190103
Time taken: 0.146 seconds, Fetched: 3 row(s)
/*事实上,我们在插入数据的时候,只需要保持字段数量和类型一致即可,
*系统会默认为以最后一个字段作为分区,所以使用时需要确保分区字段位置正确*/
insert into table test_03 partition (sex,dataday)
select id,name,sex,dataday from test_01;
--查看分区
hive> show partitions test_03;
OK
sex=female/dataday=20190101
sex=female/dataday=20190102
sex=male/dataday=20190102
sex=male/dataday=20190103
- Test02:字段个数不一致插入数据
--这里我们少一个字段
insert into table test_02 partition (dataday)
select id,name,sex from test_01;
--这里会直接报错,错误如下
FAILED: SemanticException [Error 10044]: Line 1:18
Cannot insert into target table because column number/types
are different 'dataday':
Table insclause-0 has 4 columns, but query has 3 columns.
--这里我们多一个字段
insert into table test_02 partition (dataday)
select id,name,sex,'aaa',dataday from test_01;
--这里直接报错,错误如下
FAILED: SemanticException [Error 10044]: Line 1:18
Cannot insert into target table because column number/types
are different 'dataday':
Table insclause-0 has 4 columns, but query has 5 columns.
- Test03:字段个数一致,类型不一致插入数据
--这里我们把id和name换个位置
insert into table test_02 partition (dataday)
select name,id,sex,dataday from test_01;
--插入后查询数据,需要注意的是这里int类型能转string类型进行存储
hive> select * from test_02;
OK
NULL 1 female 20190101
NULL 2 female 20190101
NULL 3 female 20190102
NULL 4 male 20190102
NULL 5 male 20190103
NULL 6 male 20190103
- Test04:动态分区在前,静态分区在后插入数据
insert into table test_03 partition (dataday,sex='male')
select id,name,dataday from test_01;
--查看插入结果
hive> select * from test_03;
OK
1 Mary male 20190101
2 Emma male 20190101
3 Lucy male 20190102
4 Kevin male 20190102
5 Jack male 20190103
6 Rose male 20190103
- Test05:动态分区在后,静态分区在前插入数据
insert into table test_03 partition (sex='male',dataday)
select id,name,dataday from test_01;
--查看插入结果
hive> select * from test_03;
OK
1 Mary male 20190101
2 Emma male 20190101
3 Lucy male 20190102
4 Kevin male 20190102
5 Jack male 20190103
6 Rose male 20190103