Fixing a Spark on YARN job whose executor count keeps growing after startup, with num-executors apparently ignored


灵佑666


Article link: https://blog.csdn.net/onway_goahead/article/details/110139839


# spark.dynamicAllocation.maxExecutors caps the maximum number of executors
spark2-submit --class SparkKafka \
--master yarn \
--executor-memory 1G \
--num-executors 6 \
--driver-memory 1g \
--conf spark.driver.supervise=true \
--conf spark.dynamicAllocation.maxExecutors=6 \
--conf spark.streaming.kafka.maxRatePerPartition=100 recommend-1.0-SNAPSHOT.jar

The root cause comes down to the spark.dynamicAllocation.maxExecutors setting.

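In CDH, dynamic resource allocation is enabled by default: whenever the cluster has spare capacity, Spark Streaming automatically claims resources according to its parallelism (the number of blocks processed in parallel). As a real-time processing system, however, Spark Streaming does not need that many resources most of the time.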

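To limit how many executors Spark Streaming can be allocated, set spark.dynamicAllocation.maxExecutors as the upper bound for dynamic allocation. num-executors is used as the initial number of executors when resources are first requested, so it does still have an effect.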
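The interplay between num-executors and the dynamic allocation bounds can be sketched in a few lines of Python. This is only an illustration and not Spark's code: the function names are hypothetical, and the rule that the initial count is the largest of minExecutors, initialExecutors and num-executors is my understanding of Spark 2.x behaviour, so treat it as an assumption.

```python
def initial_executor_count(min_executors, initial_executors=None, num_executors=None):
    """Assumed rule: start with the largest of the configured candidates."""
    candidates = [min_executors]
    if initial_executors is not None:
        candidates.append(initial_executors)
    if num_executors is not None:
        candidates.append(num_executors)
    return max(candidates)


def clamp_target(desired, min_executors, max_executors):
    """Keep the requested executor count inside [min_executors, max_executors]."""
    return max(min_executors, min(desired, max_executors))


if __name__ == "__main__":
    # --num-executors 6 only decides where the application starts.
    print("initial executors:", initial_executor_count(min_executors=0, num_executors=6))
    # With maxExecutors=6, a backlog that would like 40 executors is still capped.
    print("capped target:", clamp_target(40, min_executors=0, max_executors=6))
    # With no explicit cap, the same backlog is free to grow.
    print("uncapped target:", clamp_target(40, min_executors=0, max_executors=2**31 - 1))
```

In this model, the growth described in the title is simply the uncapped case: the starting point is honoured, but nothing stops the target from rising afterwards.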

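Note that open-source Spark does not enable dynamic allocation by default; you turn it on with spark.dynamicAllocation.enabled=true. If you do, you also need to enable the external shuffle service with spark.shuffle.service.enabled=true, so that reclaiming an executor that is no longer working does not interrupt shuffles that depend on data written by that executor.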
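A quick sanity check of these two settings, plus the executor cap discussed earlier, might look like the sketch below. It works on a plain dict of settings; the function name and the wording of the messages are hypothetical, and it is not part of any Spark API.

```python
def check_dynamic_allocation(conf):
    """Return a list of human-readable problems found in a dict of Spark settings."""
    problems = []
    enabled = conf.get("spark.dynamicAllocation.enabled", "false").lower() == "true"
    shuffle = conf.get("spark.shuffle.service.enabled", "false").lower() == "true"
    if enabled and not shuffle:
        problems.append(
            "dynamic allocation is on but spark.shuffle.service.enabled is not true"
        )
    if enabled and "spark.dynamicAllocation.maxExecutors" not in conf:
        problems.append(
            "spark.dynamicAllocation.maxExecutors is unset, so executor growth is uncapped"
        )
    return problems


if __name__ == "__main__":
    settings = {"spark.dynamicAllocation.enabled": "true"}
    for problem in check_dynamic_allocation(settings):
        print("WARNING:", problem)
```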

=====================================================================

1. Edit yarn-site.xml on every NodeManager:

<!-- Modify this property -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle,spark_shuffle</value>
</property>
<!-- Add this property -->
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>

On an Ambari-managed cluster, the settings above should already be in place by default.
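To confirm that a given yarn-site.xml really contains both properties, a small check along these lines can help. It is a sketch using only the Python standard library; the embedded sample mirrors the snippet above wrapped in a configuration root element, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample content mirroring the two properties shown in the post.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
</configuration>"""


def read_properties(xml_text):
    """Collect name/value pairs from Hadoop-style <property> elements."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def spark_shuffle_configured(props):
    """True when spark_shuffle is listed as an aux service and its class is set."""
    services = [s.strip() for s in props.get("yarn.nodemanager.aux-services", "").split(",")]
    cls = props.get("yarn.nodemanager.aux-services.spark_shuffle.class")
    return "spark_shuffle" in services and cls == "org.apache.spark.network.yarn.YarnShuffleService"


if __name__ == "__main__":
    print("spark_shuffle configured:", spark_shuffle_configured(read_properties(SAMPLE)))
```

To check a real node, read the file's contents and pass them to read_properties instead of SAMPLE.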

2. In spark-defaults.conf, set:

# Enable the external shuffle service
spark.shuffle.service.enabled true
# Shuffle service port; must match the port configured in yarn-site.xml
spark.shuffle.service.port 7337
# Enable dynamic resource allocation
spark.dynamicAllocation.enabled true
# Minimum number of executors per application
spark.dynamicAllocation.minExecutors 1
# Maximum number of executors allocated concurrently per application
spark.dynamicAllocation.maxExecutors 30
spark.dynamicAllocation.schedulerBacklogTimeout 1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5s

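Additional notes (quoted from another post):

The following is a basic reference configuration:
spark.shuffle.service.enabled true: enables the external shuffle service (this must be enabled)
spark.shuffle.service.port 7337
spark.dynamicAllocation.enabled true: enables dynamic resource allocation
spark.dynamicAllocation.minExecutors 3: minimum number of executors per application
spark.dynamicAllocation.maxExecutors 8: maximum number of executors per application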
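Optional parameters (setting: description; default value):
spark.dynamicAllocation.minExecutors: minimum number of executors; default 0
spark.dynamicAllocation.initialExecutors: initial number of executors; defaults to spark.dynamicAllocation.minExecutors
spark.dynamicAllocation.maxExecutors: maximum number of executors; default Integer.MAX_VALUE
spark.dynamicAllocation.schedulerBacklogTimeout: how long tasks must be backlogged before the first request; default 1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout: backlog interval for the second and later requests; defaults to spark.dynamicAllocation.schedulerBacklogTimeout
spark.dynamicAllocation.executorIdleTimeout: idle timeout for ordinary executors; default 60s
spark.dynamicAllocation.cachedExecutorIdleTimeout: idle timeout for executors holding cached blocks; twice spark.dynamicAllocation.executorIdleTimeout according to the quoted source (the Apache Spark documentation lists the default as infinity)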
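Notes:
1. Dynamic allocation requires the external shuffle service. Without it, the shuffle files are lost when an executor is killed.
2. According to the quoted source, once dynamic allocation is configured you can no longer set the number of executors separately, or the job fails and exits. This appears to describe the older Spark version that source was written for; as noted earlier in this post, num-executors is now treated as the initial executor count.
3. Dynamic allocation guarantees a minimum number of executors (spark.dynamicAllocation.minExecutors).
Source: https://blog.csdn.net/dandykang/article/details/48160953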

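Dynamic allocation request policy:
With dynamic allocation enabled, an application requests more resources when tasks are left pending for lack of resources, which means its current executors cannot run all tasks in parallel. Spark requests resources in rounds. The first request is triggered once tasks have been pending for spark.dynamicAllocation.schedulerBacklogTimeout (default 1s). After that, as long as the backlog persists, another request is made every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout (which defaults to the value of schedulerBacklogTimeout, so 1s unless overridden) until enough resources have been obtained. The number of executors requested per round grows exponentially: 1, 2, 4, 8, and so on.
Exponential growth is used for two reasons. First, starting small allows for the case where a few extra executors are enough to satisfy the application right away. Second, doubling each round means that an application which needs a lot of resources can still get them within a small number of requests.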
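The ramp-up can be illustrated with a short simulation. This is a simplified model rather than Spark's scheduler code: it assumes the backlog never clears until the target is reached, that every request is granted immediately, and that the application starts from zero executors. The function name and the tuple layout are hypothetical.

```python
def request_rounds(needed, max_executors, backlog_timeout=1.0, sustained_timeout=1.0):
    """Return (elapsed_seconds, added, total) for each request round.

    The batch size doubles each round (1, 2, 4, 8, ...) and the total never
    exceeds min(needed, max_executors).
    """
    target = min(needed, max_executors)
    rounds = []
    total, batch, elapsed = 0, 1, backlog_timeout
    while total < target:
        added = min(batch, target - total)
        total += added
        rounds.append((elapsed, added, total))
        batch *= 2
        elapsed += sustained_timeout
    return rounds


if __name__ == "__main__":
    # Timeouts taken from the spark-defaults.conf example in this post: 1s and 5s.
    for elapsed, added, total in request_rounds(20, 30, 1.0, 5.0):
        print(f"t={elapsed:>4.1f}s  +{added:<2d} -> {total} executors")
```

Under these assumptions, an application that needs 20 executors reaches them in five rounds, which shows why a low maxExecutors matters for a streaming job: without it the target can climb quickly whenever a backlog appears.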

Resource reclamation policy:
An executor that has been idle for longer than spark.dynamicAllocation.executorIdleTimeout (default 60s) is reclaimed.
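The reclamation rule, combined with the minExecutors floor mentioned in the quoted notes, can be sketched as follows. The executor ids, the function name, and the choice to release the longest-idle executors first are illustrative assumptions, not Spark's actual implementation.

```python
def executors_to_remove(idle_seconds, idle_timeout=60.0, min_executors=0):
    """Pick idle executors to release without dropping below min_executors.

    idle_seconds maps a (hypothetical) executor id to how long it has been idle.
    """
    expired = sorted(
        (eid for eid, idle in idle_seconds.items() if idle > idle_timeout),
        key=lambda eid: idle_seconds[eid],
        reverse=True,  # longest-idle first; an arbitrary choice for this sketch
    )
    removable = max(0, len(idle_seconds) - min_executors)
    return expired[:removable]


if __name__ == "__main__":
    idle = {"exec-1": 5.0, "exec-2": 75.0, "exec-3": 120.0}
    print(executors_to_remove(idle))                   # both expired executors
    print(executors_to_remove(idle, min_executors=2))  # only one may be released
```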

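A reminder: these settings really do need to go into spark-defaults.conf. The same settings also appear under spark2-thrift-sparkconf, but that is the configuration for the Spark2 Thrift server, which is accessed through remote calls, and it has no effect on a local spark-shell or pyspark session. It took me quite a while to work this out.
When you add the settings, a prompt warns that they are duplicates. Ignore it and add them anyway: the two sets of settings live in different configuration files, so they do not interfere with each other.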
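After configuring, I started spark-sql to test (pyspark and spark-shell behave the same way).
With no job submitted, the session occupies very few cluster resources. Once a job is submitted, cluster resources are adjusted dynamically so that the cluster is used to its fullest.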