Common Parameter Settings for Hive SQL Submission

#1. Enabling Dynamic Partitioning
SET hive.exec.dynamic.partition=true; -- enable dynamic partitioning
SET hive.exec.dynamic.partition.mode=nonstrict; -- the default is strict, which requires at least one partition column to be given a static value

##1.1 Concepts
When creating a table in Hive, you can define it as a partitioned table, for example:
create table db_working.singe_partition_table(
order_id string COMMENT '订单编号',
create_time string COMMENT '订单创建时间'
)
COMMENT '单分区表'
PARTITIONED BY (dt string COMMENT '分区日期')
STORED AS orc
;

###1.1.1 Static Partition (SP)
Static partition: when inserting data (supported by both the INSERT and LOAD syntaxes), the partition value must be specified explicitly, for example (a LOAD-based sketch follows the INSERT below):
insert into db_working.singe_partition_table partition(dt='2022-10-25')
select 'order00001','2022-02-24 17:36:18.0'
;
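
The example above covers only the INSERT path. As a hedged sketch of the LOAD path, loading a local file into a static partition could look like this (the file path is a hypothetical placeholder, and because the table is STORED AS orc the file must already be in ORC format):

LOAD DATA LOCAL INPATH '/tmp/orders.orc' -- hypothetical path; file must match the table's ORC storage format
INTO TABLE db_working.singe_partition_table PARTITION (dt='2022-10-25');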

###1.1.2 Dynamic Partition (DP)
Dynamic partition: when inserting data (only the INSERT syntax is supported), partitions are derived dynamically from the actual values of the partition column, for example:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
insert into table db_working.singe_partition_table partition(dt)
select order_no,create_time,substr(create_time,1,10)
from tb_test
where pt = '2022-04-10'
;

Running the INSERT without the two SET statements (i.e. under the default strict mode) raises the following error:
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
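
Under strict mode the requirement can also be met by keeping at least one leading partition column static. As a hedged sketch using a hypothetical two-level partitioned table db_working.multi_partition_table (not part of the original post):

-- hypothetical two-level partitioned table, for illustration only
create table db_working.multi_partition_table(
order_id string,
create_time string
)
PARTITIONED BY (dt string, hr string)
STORED AS orc
;

-- dt is static, hr is dynamic: allowed even under the default strict mode
SET hive.exec.dynamic.partition=true;
insert into table db_working.multi_partition_table partition(dt='2022-04-10', hr)
select order_no, create_time, substr(create_time, 12, 2)
from tb_test
where pt = '2022-04-10'
;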

#2. Partitioned Tables - Maximum Total Number of Dynamic Partitions
When inserting into a partitioned table with dynamic partitioning (DP), if the number of partitions to be created exceeds the maximum configured in Hive, the job fails with a "too many dynamic partitions" error:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to 100 partitions per node, number of dynamic partitions on this node: 101
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:951)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:722)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
… 9 more

set hive.exec.max.dynamic.partitions=100000; -- maximum total number of dynamic partitions that one statement may create (default 1000)
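
As a quick check (not shown in the original post), running SET with just the property name in the Hive CLI or Beeline prints the value currently in effect:

-- print the current dynamic-partition limits before overriding them
SET hive.exec.max.dynamic.partitions;
SET hive.exec.max.dynamic.partitions.pernode;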

With the parameter added:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=100000;
insert into table db_working.singe_partition_table partition(dt)
select order_no,create_time,substr(create_time,1,10)
from tb_test
where pt = '2022-04-10'
;

#3. Partitioned Tables - Maximum Dynamic Partitions per Mapper/Reducer
When inserting into a partitioned table with dynamic partitioning (DP), if the number of partitions created on a single node exceeds the per-node maximum configured in Hive, the job fails with a "too many dynamic partitions on this node" error:
Caused by: org.apache.hadoop.hive.ql.metadata.HiveFatalException: [Error 20004]: Fatal error occurred when node tried to create too many dynamic partitions. The maximum number of dynamic partitions is controlled by hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode. Maximum was set to 100 partitions per node, number of dynamic partitions on this node: 101
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:951)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:722)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:882)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:146)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:484)
… 9 more

set hive.exec.max.dynamic.partitions.pernode=100000; -- maximum number of dynamic partitions per node (mapper/reducer)
hive.exec.max.dynamic.partitions.pernode (default value 100) is the maximum number of dynamic partitions that can be created by each mapper or reducer. If a mapper or reducer creates more than this threshold, a fatal error is raised from that mapper/reducer (through a counter) and the whole job is killed.
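
Besides raising the per-node limit, a complementary technique worth considering (my suggestion, not from the original post) is to DISTRIBUTE BY the partition expression, so that rows of the same partition are routed to the same reducer and each reducer writes fewer partitions:

-- sketch: route rows by partition value so each reducer handles fewer partitions
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
insert into table db_working.singe_partition_table partition(dt)
select order_no, create_time, substr(create_time, 1, 10)
from tb_test
where pt = '2022-04-10'
distribute by substr(create_time, 1, 10)
;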

With both parameters added:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.dynamic.partitions.pernode=100000;
insert into table db_working.singe_partition_table partition(dt)
select order_no,create_time,substr(create_time,1,10)
from tb_test
where pt = '2022-04-10'
;

With these settings in place, the insert succeeds. Counting the resulting partitions:

select count(1) from (
select dt from db_working.singe_partition_table
group by dt
) aa
;

1057

#4. Global Sorting of the Dynamic Partition Columns
SET hive.optimize.sort.dynamic.partition=true; -- whether to globally sort the dynamic partition columns

Default Value: true in Hive 0.13.0 and 0.13.1; false in Hive 0.14.0 and later (HIVE-8151)
Added In: Hive 0.13.0 with HIVE-6455
When enabled, dynamic partitioning column will be globally sorted. This way we can keep only one record writer open for each partition value in the reducer thereby reducing the memory pressure on reducers.

If OOM errors do not occur, it is better to leave this parameter commented out (disabled).

Usage recommendations:
a. When initializing historical data, consider enabling it; in particular, for dynamic-partition inserts where the data volume is large enough that OOM errors occur frequently, adding this parameter helps (see the sketch after this list).
b. For regular day-to-day jobs, keep it disabled.
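
As a hedged sketch of recommendation a, a historical-data backfill could combine the settings from this post as follows (table and column names reuse the earlier examples; the date range is a hypothetical placeholder):

-- sketch: one-off backfill of historical data with dynamic partitions
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;
SET hive.optimize.sort.dynamic.partition=true; -- enable only for large backfills prone to OOM
insert into table db_working.singe_partition_table partition(dt)
select order_no, create_time, substr(create_time, 1, 10)
from tb_test
where pt between '2022-01-01' and '2022-04-10' -- hypothetical backfill range
;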
