#1. 启用动态分区
SET hive.exec.dynamic.partition=true; --开启动态分区
SET hive.exec.dynamic.partition.mode=nonstrict; --默认值是strict,默认要求分区字段必须有一个是静态的分区值
##1.1 概念介绍
在HIVE建表时可以创建分区表,如:
create table db_working.singe_partition_table(
order_id string COMMENT ‘订单编号’,
create_time string COMMENT ‘订单创建时间’
)
COMMENT ‘单分区表’
PARTITIONED by (dt string COMMENT ‘分区日期’)
STORED AS orc
;
###1.1.1 静态分区SP
static partition,在插入数据(有insert和load两种插入方式)时必须明确指定明确的分区值,如:
insert into db_working.singe_partition_table partition(dt=‘2022-10-25’)
select ‘order00001’,‘2022-02-24 17:36:18.0’
;
###1.1.2 动态分区DP
dynamic partition,在插入数据(只能用insert插入方式)时根据分区字段的实际值,动态进行分区,如:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
insert into table db_working.singe_partition_table partition(dt)
select order_no,create_time,substr(create_time,1,10)
from tb_test
where pt = ‘2022-04-10’
;
直接执行会把如下错误:
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
#2. 分区表-分区最大值设置
使用动态分区DP方式向分区表插入数据时,如果要插入的数据的分区数超过HIVE中设置的表的默认分区数最大值时会报分区数过多的错误:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to 100 partitions per node, number of dynamic partitions on this node: 101
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:951)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:722)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
… 9 more
set hive.exec.max.dynamic.partitions=100000; --表分区总数最大值设置
加入参数后:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=100000;
insert into table db_working.singe_partition_table partition(dt)
select order_no,create_time,substr(create_time,1,10)
from tb_test
where pt = ‘2022-04-10’
;
#3. 分区表-每个mapper或reducer可以创建的最大动态分区设置
使用动态分区DP方式向分区表插入数据时,如果要插入的数据在单节点分区数超过HIVE中设置的默认最大值时会报节点分区数过多的错误:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to 100 partitions per node, number of dynamic partitions on this node: 101
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:951)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:722)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
… 9 more
set hive.exec.max.dynamic.partitions.pernode=100000; --每个节点分区数最大值设置
hive.exec.max.dynamic.partitions.pernode (default value being 100) is the maximum dynamic partitions that can be created by each mapper or reducer. If one mapper or reducer created more than that the threshold, a fatal error will be raised from the mapper/reducer (through counter) and the whole job will be killed. 每个mapper或reducer可以创建的最大动态分区。
加入参数后:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.dynamic.partitions.pernode=100000;
insert into table db_working.singe_partition_table partition(dt)
select order_no,create_time,substr(create_time,1,10)
from tb_test
where pt = ‘2022-04-10’
;
加入后执行成功
select count(1) from (
select dt from db_working.singe_partition_table
group by dt
) aa
;
1057
#4. 动态分区列将全局排序设置
SET hive.optimize.sort.dynamic.partition=true; --指定是否开启自动分区
Default Value: true in Hive 0.13.0 and 0.13.1; false in Hive 0.14.0 and later (HIVE-8151)
Added In: Hive 0.13.0 with HIVE-6455
When enabled, dynamic partitioning column will be globally sorted. This way we can keep only one record writer open for each partition value in the reducer thereby reducing the memory pressure on reducers.
启用时,动态分区列将全局排序。通过这种方式,我们可以为减速器中的每个分区值只保留一个记录写入器,从而减少减速器上的内存压力。
如果不出现OOM,尽量注释掉这个参数
使用建议:
a. 进行历史数据初始化时,可以开启这个(特别是在insert动态分区的时候,当数据量达到一定程度的时候,经常出现OOM时,可以加入这个参数)
b. 正常的任务,可以关闭这个