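Background
While SparkContext initializes, its SchedulerBackend sends a RegisterApplication message to the Master. Once registration succeeds, the Master calls schedule() to allocate cores to the application and notifies the workers to launch executors.
Walkthrough
1. Starting point - receive()
Because the Master is a message loop, its receive method handles the application registration request coming from the client. Once registration succeeds, it calls schedule() to schedule and allocate resources. The code is as follows: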
case RegisterApplication(description, driver) =>
  // TODO Prevent repeated registrations from some driver
  if (state == RecoveryState.STANDBY) {
    // the Master must be ALIVE to register applications; a STANDBY Master ignores the request
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, driver)
    // register the application in the Master's in-memory state
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    // persist the application so it can be recovered
    persistenceEngine.addApplication(app)
    // reply to the driver that registration succeeded
    driver.send(RegisteredApplication(app.id, self))
    // schedule resources
    schedule()
  }
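The dispatch above can be illustrated with a small, self-contained sketch. It is not Spark's code: ToyMaster, the simplified message classes, the app ID format and the scheduleCalls counter are all hypothetical stand-ins, chosen only to show the shape of "ignore when STANDBY, otherwise register, reply, then schedule".

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for the Master's messages and state (not Spark's classes).
object MasterReceiveSketch {
  sealed trait Message
  final case class RegisterApplication(name: String) extends Message
  final case class RegisteredApplication(appId: String) extends Message

  sealed trait RecoveryState
  case object Alive extends RecoveryState
  case object Standby extends RecoveryState

  final class ToyMaster(var state: RecoveryState) {
    val waitingApps = mutable.ArrayBuffer.empty[String]
    // Replies that a real Master would send back to the driver endpoint.
    val replies = mutable.ArrayBuffer.empty[Message]
    var scheduleCalls = 0
    private var nextId = 0

    // Placeholder for the real scheduling pass; here it only counts invocations.
    private def schedule(): Unit = scheduleCalls += 1

    def receive(msg: Message): Unit = msg match {
      case RegisterApplication(name) =>
        if (state == Standby) {
          // A standby master ignores the request and sends no response.
        } else {
          val appId = f"app-$nextId%04d" // assumed ID format, for illustration only
          nextId += 1
          waitingApps += appId                    // register
          replies += RegisteredApplication(appId) // reply to the driver
          schedule()                              // then schedule resources
        }
      case _ => // other message types are out of scope for this sketch
    }
  }
}
```

For example, sending RegisterApplication("demo") to a ToyMaster in the Alive state records one waiting app, one reply and one schedule call, while the same message sent to a Standby master changes nothing.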
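2. Core method - schedule()
Purpose: allocate resources to waiting drivers, then allocate worker resources to applications.
When it is called: when a new application registers, or when worker resources change.
Note: in this run no resources are allocated to a driver, because the driver was already running when the application registered. This pass mainly allocates worker resources.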
private def schedule(): Unit = {
  // state check omitted
  // shuffle the order of the alive workers
  val shuffledAliveWorkers = Random.shuffle(workers.toSeq.filter(_.state == WorkerState.ALIVE))
  val numWorkersAlive = shuffledAliveWorkers.size
  var curPos = 0
  // allocate resources to drivers and launch them; not exercised in this run
  for (driver <- waitingDrivers.toList) { // iterate over a copy of waitingDrivers
    // We assign workers to each waiting driver in a round-robin fashion. For each driver, we
    // start from the last worker that was assigned a driver, and continue onwards until we have
    // explored all alive workers.
    var launched = false
    var numWorkersVisited = 0
    while (numWorkersVisited < numWorkersAlive && !launched) {
      val worker = shuffledAliveWorkers(curPos)
      numWorkersVisited += 1
      if (worker.memoryFree >= driver.desc.mem && worker.coresFree >= driver.desc.cores) {
        // the worker has enough memory and cores: launch the driver on it
        launchDriver(worker, driver)
        waitingDrivers -= driver
        launched = true
      }
      curPos = (curPos + 1) % numWorkersAlive
    }
  }
  // core method that launches executors on the workers
  startExecutorsOnWorkers()
}
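To make the round-robin driver placement easier to follow in isolation, here is a minimal sketch under a few assumptions: Worker and Driver are hypothetical simplified types, the initial shuffle is left out so the result is deterministic, and the sketch deducts the worker's resources itself, whereas the code above delegates that bookkeeping to launchDriver.

```scala
// Hypothetical simplified model of round-robin driver placement (not Spark's classes).
object DriverPlacementSketch {
  final case class Worker(id: String, var memoryFree: Int, var coresFree: Int)
  final case class Driver(id: String, mem: Int, cores: Int)

  /** Returns driverId -> workerId for every driver that could be placed. */
  def placeDrivers(aliveWorkers: Seq[Worker], waiting: Seq[Driver]): Map[String, String] = {
    var placements = Map.empty[String, String]
    val numWorkers = aliveWorkers.size
    var curPos = 0 // shared across drivers, so each search resumes where the last one stopped
    for (driver <- waiting) {
      var launched = false
      var visited = 0
      // Visit each worker at most once per driver.
      while (visited < numWorkers && !launched) {
        val worker = aliveWorkers(curPos)
        visited += 1
        if (worker.memoryFree >= driver.mem && worker.coresFree >= driver.cores) {
          // Assumption: resource bookkeeping is done inline for this sketch.
          worker.memoryFree -= driver.mem
          worker.coresFree -= driver.cores
          placements += (driver.id -> worker.id)
          launched = true
        }
        curPos = (curPos + 1) % numWorkers
      }
    }
    placements
  }
}
```

A driver that fits on no worker is simply left unplaced, which corresponds to it staying in waitingDrivers until a later scheduling pass.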
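3. Core method for launching executors on workers - startExecutorsOnWorkers()
scheduleExecutorsOnWorkers uses the spreadOutApps setting to decide whether to consolidate an application's cores onto as few workers as possible or to spread them across as many workers as possible. It returns the number of cores assigned on each worker.
allocateWorkerResourceToExecutors takes the cores already assigned on a worker and, depending on coresPerExecutor, launches executors on that worker. Launching is done through an RpcEndpoint call, that is, by sending a message.
Steps:
a. Filter the usable workers
b. Assign a number of cores to each worker
c. Launch executors according to the cores assigned on each worker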
private def startExecutorsOnWorkers(): Unit = {
  for (app <- waitingApps if app.coresLeft > 0) {
    val coresPerExecutor: Option[Int] = app.desc.coresPerExecutor
    // keep only alive workers with enough memory and cores for at least one executor,
    // sorted by free cores in descending order
    val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
      .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
        worker.coresFree >= coresPerExecutor.getOrElse(1))
      .sortBy(_.coresFree).reverse
    // decide how many cores to assign on each worker
    val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)
    // Now that we've decided how many cores to allocate on each worker, let's allocate them
    // launch executors according to the assigned cores
    /*
     How executors are launched here depends on coresPerExecutor:
     if coresPerExecutor is defined, several executors may be launched on the worker, each with coresPerExecutor cores;
     if coresPerExecutor is not defined, a single executor is launched and it holds all the cores assigned on that worker
     */
    for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
      allocateWorkerResourceToExecutors(
        app, assignedCores(pos), coresPerExecutor, usableWorkers(pos))
    }
  }
}
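The effect of spreadOutApps on the per-worker core counts can be sketched as follows. This is a simplified model, not the body of scheduleExecutorsOnWorkers: the function name assignCores is hypothetical, cores are handed out one at a time, and memory limits and coresPerExecutor are ignored.

```scala
// Simplified model of spread-out vs. consolidated core assignment (assumptions noted above).
object CoreAssignmentSketch {
  /** coresFree(i) is the free core count of worker i; returns the cores assigned per worker. */
  def assignCores(coresFree: Array[Int], coresWanted: Int, spreadOut: Boolean): Array[Int] = {
    val assigned = Array.fill(coresFree.length)(0)
    var toAssign = math.min(coresWanted, coresFree.sum)

    // A worker can take another core if the app still needs one and the worker has one left.
    def canTake(pos: Int): Boolean = toAssign > 0 && coresFree(pos) - assigned(pos) > 0

    var free = coresFree.indices.filter(canTake)
    while (free.nonEmpty) {
      free.foreach { pos =>
        var keepGoing = true
        while (keepGoing && canTake(pos)) {
          assigned(pos) += 1
          toAssign -= 1
          // Spreading out: take one core, then move on to the next worker.
          // Consolidating: keep filling this worker until it is full or the app is satisfied.
          if (spreadOut) keepGoing = false
        }
      }
      free = free.filter(canTake)
    }
    assigned
  }
}
```

With three workers that each have 4 free cores and an application that wants 6, spreading out assigns 2 cores on each worker, while consolidating assigns 4, 2 and 0.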
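4. After the executors are launched
After the executors are launched, what remains is the communication between the ExecutorBackend and the driver: the backend registers its executor with the driver. The details are recorded in an earlier post.
Summary
Whenever a new application arrives or worker information changes, the resource scheduling method schedule() is called.
When cores are allocated, the spreadOutApps parameter decides whether they are spread across many workers or consolidated onto a few. Spreading out usually favors data locality (data-intensive jobs), while consolidating favors compute-intensive jobs.
The coresPerExecutor parameter determines how many executors are launched on one worker.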
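The last point can be made concrete with a small sketch of how the cores assigned on one worker might be split into executors. The helper executorCores is hypothetical and only mirrors the rule described in the comments of startExecutorsOnWorkers(); it does not launch anything.

```scala
// Hypothetical helper: how many executors a worker gets and how many cores each one holds.
object ExecutorSplitSketch {
  def executorCores(assignedCores: Int, coresPerExecutor: Option[Int]): Seq[Int] = {
    // With coresPerExecutor defined: one executor per full group of that many cores.
    // Without it: a single executor.
    val numExecutors = coresPerExecutor.map(assignedCores / _).getOrElse(1)
    // With coresPerExecutor defined: each executor gets exactly that many cores.
    // Without it: the single executor takes everything assigned on this worker.
    val coresEach = coresPerExecutor.getOrElse(assignedCores)
    Seq.fill(numExecutors)(coresEach)
  }
}
```

For instance, 8 assigned cores with coresPerExecutor set to 2 yields four executors of 2 cores each, while 8 assigned cores with no coresPerExecutor yields one executor holding all 8.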