Spark - the dynamicAllocation pitfall: executors keep increasing

Reference articles:

Solving the problem of executors growing endlessly after a CDH Spark Streaming job starts, where the num-executors setting has no effect (杨五五's CSDN blog)

Dynamic Allocation and the num-executors problem in Spark (EnterPine's CSDN blog)

An analysis of Spark Dynamic Allocation (简书)

Spark runtime environment: the default Spark version shipped with CDH 5.14

Recently I ran into a problem when submitting a job: the number of executors kept increasing on its own.

After some investigation, I found the root cause: in this environment Spark sets spark.dynamicAllocation.enabled to true by default, and that setting makes the executor count change dynamically with the workload.

One more point worth noting: the problem can also be solved by bounding the minimum and maximum executor counts, so it is not strictly necessary to turn the feature off.
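As a minimal sketch of that bounded approach (assuming a YARN cluster where the external shuffle service is already enabled, as CDH does by default; the min/max values here are illustrative, not recommendations):

    # Keep dynamic allocation on, but bound how far it can scale:
    spark2-submit \
        --master yarn \
        --conf spark.dynamicAllocation.enabled=true \
        --conf spark.shuffle.service.enabled=true \
        --conf spark.dynamicAllocation.minExecutors=2 \
        --conf spark.dynamicAllocation.maxExecutors=6 \
        --class ${class_name} \
        ${ROOT_PATH}/libs/${APP_NAME}-${libVersion}-SNAPSHOT.jar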

Configuration parameters related to dynamic allocation:

Source: Configuration - Spark Documentation, Dynamic Allocation section. The table below follows the Spark 2.4.0 docs (this environment uses spark2-submit, i.e. Spark 2.x); the same options also appear in later releases such as 3.4.1.

Dynamic Allocation

spark.dynamicAllocation.enabled
    Default: false
    Whether to use dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. This requires spark.shuffle.service.enabled to be set. The following configurations are also relevant: spark.dynamicAllocation.minExecutors, spark.dynamicAllocation.maxExecutors, spark.dynamicAllocation.initialExecutors, and spark.dynamicAllocation.executorAllocationRatio.

spark.dynamicAllocation.executorIdleTimeout
    Default: 60s
    If dynamic allocation is enabled and an executor has been idle for more than this duration, the executor will be removed.

spark.dynamicAllocation.cachedExecutorIdleTimeout
    Default: infinity
    If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed.

spark.dynamicAllocation.initialExecutors
    Default: spark.dynamicAllocation.minExecutors
    Initial number of executors to run if dynamic allocation is enabled. If `--num-executors` (or `spark.executor.instances`) is set and larger than this value, it will be used as the initial number of executors.

spark.dynamicAllocation.maxExecutors
    Default: infinity
    Upper bound for the number of executors if dynamic allocation is enabled.

spark.dynamicAllocation.minExecutors
    Default: 0
    Lower bound for the number of executors if dynamic allocation is enabled.

spark.dynamicAllocation.executorAllocationRatio
    Default: 1
    By default, dynamic allocation requests enough executors to maximize parallelism according to the number of tasks to process. While this minimizes job latency, with small tasks it can waste a lot of resources due to executor allocation overhead, since some executors might not do any work at all. This setting specifies a ratio used to reduce the number of executors relative to full parallelism: 1.0 gives maximum parallelism, while 0.5 halves the target number of executors. The target computed this way can still be overridden by spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors.

spark.dynamicAllocation.schedulerBacklogTimeout
    Default: 1s
    If dynamic allocation is enabled and there have been pending tasks backlogged for more than this duration, new executors will be requested.

spark.dynamicAllocation.sustainedSchedulerBacklogTimeout
    Default: schedulerBacklogTimeout
    Same as spark.dynamicAllocation.schedulerBacklogTimeout, but used only for subsequent executor requests.
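The initialExecutors entry above also explains the "num-executors has no effect" symptom from the referenced articles: with dynamic allocation enabled, --num-executors only seeds the initial executor count, and the allocator is then free to grow past it toward maxExecutors, which defaults to infinity. A short illustration (values are made up):

    # Dynamic allocation ON: starts at 6 executors, but the count can
    # keep climbing because maxExecutors defaults to infinity.
    spark2-submit --num-executors 6 ...

    # To make 6 a real ceiling without disabling the feature:
    spark2-submit --num-executors 6 \
        --conf spark.dynamicAllocation.maxExecutors=6 ...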

Solution:

Set spark.dynamicAllocation.enabled to false, and explicitly specify driver-cores, executor-cores, and the other resource properties:

    --driver-memory 2G \
    --driver-cores 1 \
    --num-executors 6 \
    --executor-cores 2 \
    --executor-memory 2G \

Full submit command:

nohup /usr/bin/spark2-submit \
    --class ${class_name} \
    --name ${JOB_NAME} \
    --files ${config} \
    --master yarn \
    --conf spark.dynamicAllocation.enabled=false \
    --driver-memory 2G \
    --driver-cores 1 \
    --num-executors 6 \
    --executor-cores 2 \
    --executor-memory 2G \
    --jars ${classpath} \
    ${ROOT_PATH}/libs/${APP_NAME}-${libVersion}-SNAPSHOT.jar online ${config} \
    > ${ROOT_PATH}/logs/start.log 2> ${ROOT_PATH}/logs/start.error &
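Once the job is resubmitted, the fix can be verified while it runs. One option is the Spark monitoring REST API served by the driver UI (the host, port, and application id below are placeholders; 4040 is only the default UI port):

    # Lists the executors of a running application; the array should stay
    # at the size set by --num-executors (plus the driver entry):
    curl http://<driver-host>:4040/api/v1/applications/<app-id>/executors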
 
