Why do we need dynamic resource allocation?
a) By default Spark allocates resources in a coarse-grained way: resources are reserved up front, before computation begins. A Spark Streaming application, however, has peak and off-peak periods that need very different amounts of resources; if we provision for the peak, a large share of the resources sits idle most of the time.
b) A Spark Streaming application runs continuously, so ongoing resource consumption and management are also factors we have to consider.
Dynamically adjusting resources for Spark Streaming faces a challenge:
Spark Streaming runs in batches of a fixed Batch Duration. One batch may need a lot of resources while the next batch needs far fewer; by the time a resource adjustment completes, the batch that triggered it may already be over. The adjustment interval therefore has to be tuned relative to the Batch Duration.
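In practice, the knobs that govern how quickly scaling reacts are the dynamic-allocation timeouts, which should be chosen with the Batch Duration in mind. A minimal sketch (these are real Spark configuration keys; the values shown are illustrative only, not recommendations):

```scala
import org.apache.spark.SparkConf

// Illustrative values only: the timeouts should be chosen relative to the
// application's Batch Duration, so that a scaling decision does not outlive
// the batch that triggered it.
val conf = new SparkConf()
  // how long a backlog of pending tasks must exist before new executors are requested
  .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
  // how long an executor may sit idle before it is released
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
```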
Dynamic resource requests in Spark Streaming
1. SparkContext does not enable dynamic resource allocation by default, but it can be enabled manually through SparkConf.
// Optionally scale number of executors dynamically based on workload. Exposed for testing.
val dynamicAllocationEnabled = Utils.isDynamicAllocationEnabled(_conf)
// check whether dynamic resource allocation is enabled via configuration
if (!dynamicAllocationEnabled &&
    _conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
  logWarning(
    "Dynamic Allocation and num executors both set, thus dynamic allocation disabled.")
}

_executorAllocationManager =
  if (dynamicAllocationEnabled) {
    Some(new ExecutorAllocationManager(this, listenerBus, _conf))
  } else {
    None
  }
_executorAllocationManager.foreach(_.start())
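As a sketch of how to turn this path on, dynamic allocation can be enabled in SparkConf before the SparkContext is created. The application name and executor bounds below are illustrative; note that dynamic allocation also requires the external shuffle service:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("dynamic-allocation-demo") // hypothetical application name
  .set("spark.dynamicAllocation.enabled", "true")
  // the external shuffle service must be running for dynamic allocation
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.minExecutors", "1")  // illustrative bounds
  .set("spark.dynamicAllocation.maxExecutors", "10")
val sc = new SparkContext(conf)
```

With this configuration, the `dynamicAllocationEnabled` check above passes and an ExecutorAllocationManager is created and started.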
2. ExecutorAllocationManager: a timer periodically scans the state of the executors and the currently running stages, and decides whether to add or remove executors accordingly.
3. The schedule method in ExecutorAllocationManager is triggered periodically to perform dynamic resource adjustment.
/**
 * This is called at a fixed interval to regulate the number of pending executor requests
 * and number of executors running.
 *
 * First, adjust our requested executors based on the add time and our current needs.
 * Then, if the remove time for an existing executor has expired, kill the executor.
 *
 * This is factored out into its own method for testing.
 */
private def schedule(): Unit = synchronized {
  val now = clock.getTimeMillis
  updateAndSyncNumExecutorsTarget(now)
  removeTimes.retain { case (executorId, expireTime) =>
    val expired = now >= expireTime
    if (expired) {
      initializing = false
      removeExecutor(executorId)
    }
    !expired
  }
}
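The core of schedule() is the retain-based expiry pattern: entries whose remove time has passed are acted on and dropped, everything else is kept. A simplified, self-contained sketch of just that pattern (the executor IDs and timestamps are made up):

```scala
import scala.collection.mutable

// Hypothetical map of executorId -> expire timestamp (millis), mirroring removeTimes.
val removeTimes = mutable.HashMap("exec-1" -> 1000L, "exec-2" -> 5000L)
val now = 2000L

// retain keeps only the entries for which the predicate is true;
// expired entries are removed from the map, after we act on them.
removeTimes.retain { case (executorId, expireTime) =>
  val expired = now >= expireTime
  if (expired) {
    println(s"removing idle executor $executorId") // stands in for removeExecutor()
  }
  !expired
}
// removeTimes now holds only "exec-2", whose expiry is still in the future
```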
4. In ExecutorAllocationManager, a timer running in a thread pool invokes schedule repeatedly.
/**
 * Register for scheduler callbacks to decide when to add and remove executors,
 * and start the scheduling task.
 */
def start(): Unit = {
  listenerBus.addListener(listener)
  val scheduleTask = new Runnable() {
    override def run(): Unit = {
      try {
        schedule()
      } catch {
        case ct: ControlThrowable =>
          throw ct
        case t: Throwable =>
          logWarning(s"Uncaught exception in thread ${Thread.currentThread().getName}", t)
      }
    }
  }
  // intervalMillis is the period at which the timer fires
  executor.scheduleAtFixedRate(scheduleTask, 0, intervalMillis, TimeUnit.MILLISECONDS)
}
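The timer machinery here is the standard java.util.concurrent fixed-rate scheduler. A standalone sketch of the same pattern, outside of Spark (the 100 ms interval is illustrative, standing in for intervalMillis):

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Single-threaded scheduled executor, analogous to the one
// ExecutorAllocationManager uses for its scheduling loop.
val executor = Executors.newSingleThreadScheduledExecutor()
val intervalMillis = 100L // illustrative period

val scheduleTask = new Runnable {
  override def run(): Unit = {
    // in Spark this body calls schedule(); here we just log a tick
    println("schedule tick")
  }
}

// Fire immediately (initial delay 0), then every intervalMillis thereafter.
executor.scheduleAtFixedRate(scheduleTask, 0, intervalMillis, TimeUnit.MILLISECONDS)
// ... on shutdown: executor.shutdown()
```

Note that an uncaught exception thrown from the task would cancel future runs of a fixed-rate task, which is why the real start() wraps schedule() in a try/catch and only rethrows ControlThrowable.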
Dynamically controlling the consumption rate:
Spark Streaming also provides an elasticity mechanism that relates the rate at which data flows in to the rate at which it is processed, i.e. whether processing can keep up with ingestion. If it cannot, Spark Streaming automatically throttles the incoming data rate; this behavior is controlled by the spark.streaming.backpressure.enabled parameter.
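A minimal configuration sketch for enabling backpressure (both keys are real Spark configuration parameters; the initial-rate value is illustrative):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // let Spark Streaming throttle the ingestion rate to match processing speed
  .set("spark.streaming.backpressure.enabled", "true")
  // optional: cap the rate (records/sec per receiver) before the
  // backpressure rate estimator has warmed up; value is illustrative
  .set("spark.streaming.backpressure.initialRate", "1000")
```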
- Source: DT_大数据梦工厂 (Spark release customization series)
- DT 大数据梦工厂 WeChat public account: DT_Spark
- Sina Weibo: http://www.weibo.com/ilovepains
- Wang Jialin's free big-data practice sessions, nightly at 20:00, YY live stream: 68917580