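Master Failover
Spark's native standalone mode supports failover between the active Master and standby Masters. The walkthrough below starts at the point where a failover has happened and a new Leader Master has been elected.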
Master
case ElectedLeader =>
  // Once this Master learns that it has been elected Leader, it reads the persisted
  // app, driver and worker information from the persistence engine.
  val (storedApps, storedDrivers, storedWorkers) = persistenceEngine.readPersistedData(rpcEnv)
  // If all three are empty there is nothing to recover, so the state goes straight to ALIVE.
  // Otherwise the state is set to RECOVERING.
  state = if (storedApps.isEmpty && storedDrivers.isEmpty && storedWorkers.isEmpty) {
    RecoveryState.ALIVE
  } else {
    RecoveryState.RECOVERING
  }
  logInfo("I have been elected leader! New state: " + state)
  // In the RECOVERING state, run beginRecovery first, then use another thread to send
  // CompleteRecovery after a delay of WORKER_TIMEOUT_MS, which triggers completeRecovery.
  if (state == RecoveryState.RECOVERING) {
    beginRecovery(storedApps, storedDrivers, storedWorkers)
    recoveryCompletionTask = forwardMessageThread.schedule(new Runnable {
      override def run(): Unit = Utils.tryLogNonFatalError {
        self.send(CompleteRecovery)
      }
    }, WORKER_TIMEOUT_MS, TimeUnit.MILLISECONDS)
  }
case CompleteRecovery => completeRecovery()
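The handler above combines two ideas: choosing the next state from what was persisted, and arming a one-shot timer so that recovery finishes even if some workers or applications never reply. The following is a minimal standalone sketch of that pattern, not Spark's code. The names ElectedLeaderSketch, stateAfterElection and scheduleCompletion are hypothetical, and plain strings stand in for the persisted records.

```scala
import java.util.concurrent.{Executors, ScheduledFuture, TimeUnit}

object ElectedLeaderSketch {
  sealed trait RecoveryState
  case object Recovering extends RecoveryState
  case object Alive extends RecoveryState

  // Same decision as in the handler: if nothing was persisted, there is nothing to recover.
  def stateAfterElection(
      apps: Seq[String],
      drivers: Seq[String],
      workers: Seq[String]): RecoveryState =
    if (apps.isEmpty && drivers.isEmpty && workers.isEmpty) Alive else Recovering

  // Hypothetical helper: runs `onTimeout` once after `timeoutMs` milliseconds,
  // playing the role of the delayed CompleteRecovery message.
  def scheduleCompletion(timeoutMs: Long)(onTimeout: () => Unit): ScheduledFuture[_] = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val task = scheduler.schedule(new Runnable {
      override def run(): Unit = onTimeout()
    }, timeoutMs, TimeUnit.MILLISECONDS)
    // With the default executor policy, a delayed task that is already scheduled
    // still runs after shutdown(), and the thread exits afterwards.
    scheduler.shutdown()
    task
  }
}
```

In this sketch the returned ScheduledFuture can be cancelled if recovery finishes before the timeout, which is presumably why the real handler keeps its task in recoveryCompletionTask.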
First, look at beginRecovery():
private def beginRecovery(storedApps: Seq[ApplicationInfo], storedDrivers: Seq[DriverInfo],
    storedWorkers: Seq[WorkerInfo]) {
  // Re-register the application, driver and worker information read from the persistence
  // engine in the Master's in-memory structures, and set every application and worker to UNKNOWN.
  // Then send a MasterChanged message to each application's driver and to each worker.
  for (app <- storedApps) {
    logInfo("Trying to recover app: " + app.id)
    try {
      registerApplication(app)
      app.state = ApplicationState.UNKNOWN
      app.driver.send(MasterChanged(self, masterWebUiUrl))
    } catch {
      case e: Exception => logInfo("App " + app.id + " had exception on reconnect")
    }
  }
  for (driver <- storedDrivers) {
    // Here we just read in the list of drivers. Any drivers associated with now-lost workers
    // will be re-launched when we detect that the worker is missing.
    drivers += driver
  }
  for (worker <- storedWorkers) {
    logInfo("Trying to recover worker: " + worker.id)
    try {
      registerWorker(worker)
      worker.state = WorkerState.UNKNOWN
      worker.endpoint.send(MasterChanged(self, masterWebUiUrl))
    } catch {
      case e: Exception => logInfo("Worker " + worker.id + " had exception on reconnect")
    }
  }
}
Next, see what the driver and the worker each do when they receive the MasterChanged message.
Worker
case MasterChanged(masterRef, masterWebUiUrl) =>
  logInfo("Master has changed, new master is at " + masterRef.address.toSparkURL)
  // Update the locally cached information about the master, then report the executors
  // and drivers running on this worker to the new Master.
  changeMaster(masterRef, masterWebUiUrl, masterRef.address)
  val execs = executors.values.
    map(e => new ExecutorDescription(e.appId, e.execId, e.cores, e.state))
  masterRef.send(WorkerSchedulerStateResponse(workerId, execs.toList, drivers.keys.toSeq))
Master
case WorkerSchedulerStateResponse(workerId, executors, driverIds) =>
  idToWorker.get(workerId) match {
    case Some(worker) =>
      logInfo("Worker has been re-registered: " + workerId)
      // On receiving the worker's response, the Master marks the worker ALIVE.
      worker.state = WorkerState.ALIVE
      // It then restores the executor information (only for applications it knows about)
      // and marks the drivers running on this worker as RUNNING.
      val validExecutors = executors.filter(exec => idToApp.get(exec.appId).isDefined)
      for (exec <- validExecutors) {
        val app = idToApp.get(exec.appId).get
        val execInfo = app.addExecutor(worker, exec.cores, Some(exec.execId))
        worker.addExecutor(execInfo)
        execInfo.copyState(exec)
      }
      for (driverId <- driverIds) {
        drivers.find(_.id == driverId).foreach { driver =>
          driver.worker = Some(worker)
          driver.state = DriverState.RUNNING
          worker.addDriver(driver)
        }
      }
    case None =>
      logWarning("Scheduler state from unknown worker: " + workerId)
  }
  if (canCompleteRecovery) { completeRecovery() }
Next, see how the driver handles the message.
receive() of ClientEndpoint in StandaloneAppClient
case MasterChanged(masterRef, masterWebUiUrl) =>
  logInfo("Master has changed, new master is at " + masterRef.address.toSparkURL)
  // Simply point the client at the new master.
  master = Some(masterRef)
  alreadyDisconnected = false
  // Reply to the Master with a MasterChangeAcknowledged message.
  masterRef.send(MasterChangeAcknowledged(appId.get))
Master
case MasterChangeAcknowledged(appId) =>
  idToApp.get(appId) match {
    case Some(app) =>
      logInfo("Application has been re-registered: " + appId)
      // Mark the application as WAITING.
      app.state = ApplicationState.WAITING
    case None =>
      logWarning("Master change ack from unknown app: " + appId)
  }
  if (canCompleteRecovery) { completeRecovery() }
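Both Master handlers end by checking canCompleteRecovery, whose body is not part of the excerpts here. Judging from how it is used, it presumably reports whether any worker or application is still UNKNOWN, so that recovery can finish as soon as the last reply arrives instead of waiting for the timeout. A minimal sketch under that assumption, using simplified stand-in types rather than Spark's WorkerInfo and ApplicationInfo:

```scala
object CanCompleteRecoverySketch {
  sealed trait EntityState
  case object Unknown extends EntityState   // no reply yet since the new Master took over
  case object Responded extends EntityState // stands in for ALIVE (worker) or WAITING (app)

  // Assumed rule: recovery may complete early once nothing is left in the Unknown state.
  def canCompleteRecovery(workers: Seq[EntityState], apps: Seq[EntityState]): Boolean =
    !workers.contains(Unknown) && !apps.contains(Unknown)
}
```

Under this rule a single silent worker keeps the check false, and the delayed CompleteRecovery message is what eventually ends the wait.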
Next, look at completeRecovery():
/**
 * Completes the Master's recovery.
 * Each worker or application that did not respond is cleaned up in three places:
 * 1. It is removed from the Master's in-memory structures.
 * 2. It is removed from the in-memory structures of related components.
 * 3. It is removed from the persistence engine.
 */
private def completeRecovery() {
  // Ensure "only-once" recovery semantics using a short synchronization period.
  if (state != RecoveryState.RECOVERING) { return }
  state = RecoveryState.COMPLETING_RECOVERY
  // Kill off any workers and apps that didn't respond to us.
  // beginRecovery marked every worker and application UNKNOWN, and those that replied have
  // since changed state, so anything still UNKNOWN never replied.
  // Remove the workers that did not reply.
  workers.filter(_.state == WorkerState.UNKNOWN).foreach(
    removeWorker(_, "Not responding for recovery"))
  // Remove the applications that did not reply.
  apps.filter(_.state == ApplicationState.UNKNOWN).foreach(finishApplication)
  // Update the state of recovered apps to RUNNING
  // (applications that replied were set to WAITING by the MasterChangeAcknowledged handler).
  apps.filter(_.state == ApplicationState.WAITING).foreach(_.state = ApplicationState.RUNNING)
  // Reschedule drivers which were not claimed by any workers
  // Select the drivers that have no worker, for example because their worker never replied.
  // Whether such a driver is relaunched automatically depends on its supervise setting.
  drivers.filter(_.worker.isEmpty).foreach { d =>
    logWarning(s"Driver ${d.id} was not found after master recovery")
    // The driver was submitted with supervise enabled, so the Master relaunches it.
    if (d.desc.supervise) {
      logWarning(s"Re-launching ${d.id}")
      relaunchDriver(d)
    } else {
      // Not supervised, so the driver is removed instead.
      removeDriver(d.id, DriverState.ERROR, None)
      logWarning(s"Did not re-launch ${d.id} because it was not supervised")
    }
  }
  // Set the Master's state to ALIVE.
  state = RecoveryState.ALIVE
  // Resume scheduling.
  schedule()
  logInfo("Recovery complete - resuming operations!")
}
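The driver handling at the end of completeRecovery() comes down to a three-way decision per driver. The sketch below isolates that decision in a small pure function so each case can be checked on its own. Driver, Action and decide are hypothetical names for illustration, not Spark types, and a worker is represented only by its id.

```scala
object DriverRecoverySketch {
  // Simplified stand-in for a recovered driver record.
  final case class Driver(id: String, worker: Option[String], supervise: Boolean)

  sealed trait Action
  final case class Keep(id: String) extends Action     // a worker reported this driver
  final case class Relaunch(id: String) extends Action // unclaimed, but supervised
  final case class Remove(id: String) extends Action   // unclaimed and not supervised

  // Mirrors the branch structure: only drivers without a worker need any action,
  // and the supervise flag picks between relaunching and removing them.
  def decide(drivers: Seq[Driver]): Seq[Action] = drivers.map {
    case Driver(id, Some(_), _) => Keep(id)
    case Driver(id, None, supervise) => if (supervise) Relaunch(id) else Remove(id)
  }
}
```

For example, decide applied to one claimed driver, one unclaimed supervised driver and one unclaimed unsupervised driver yields Keep, Relaunch and Remove in that order.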
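Summary of the flow:
(1) When a Master is elected Leader during failover, it reads the application, worker and driver information from the persistence engine.
(2) It sets every application and worker to UNKNOWN and sends each of them a MasterChanged message.
(3) Workers and applications that reply are marked ALIVE and WAITING respectively. The Master removes the applications and workers that did not reply from memory and from the persistence engine, and performs the related cleanup.
(4) If a driver has no worker, the Master relaunches it when it was submitted with the supervise option, and removes it otherwise.
(5) The Master's state is set to ALIVE and scheduling resumes. At this point the failover is complete.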
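The steps above can be condensed into a toy model that tracks only ids and states. It is a sketch for reasoning about the flow, not a reproduction of the Master: FailoverFlowSketch, Recovered and recover are hypothetical names, replies are given up front as sets of ids, and the timeout is modelled simply as "whoever is not in the reply set never answered".

```scala
object FailoverFlowSketch {
  sealed trait State
  case object Unknown extends State
  case object Alive extends State

  // What survives recovery in this toy model.
  final case class Recovered(workers: Set[String], apps: Set[String])

  def recover(
      persistedWorkers: Set[String],
      persistedApps: Set[String],
      repliedWorkers: Set[String],
      repliedApps: Set[String]): Recovered = {
    // Steps (1)-(2): everything read back from persistence starts out as Unknown.
    def initial(ids: Set[String]): Map[String, State] =
      ids.map(id => (id, Unknown: State)).toMap

    // Step (3), first half: a reply flips the state. Replies from ids that were
    // never persisted are ignored, like the "unknown worker/app" warnings.
    def applyReplies(states: Map[String, State], replied: Set[String]): Map[String, State] =
      states.map { case (id, s) => id -> (if (replied(id)) Alive else s) }

    // Step (3), second half: whatever is still Unknown is dropped.
    def survivors(states: Map[String, State]): Set[String] =
      states.collect { case (id, Alive) => id }.toSet

    Recovered(
      workers = survivors(applyReplies(initial(persistedWorkers), repliedWorkers)),
      apps = survivors(applyReplies(initial(persistedApps), repliedApps)))
  }
}
```

The model deliberately leaves out drivers and the WAITING to RUNNING transition. It only shows that the set of survivors is the intersection of what was persisted and what replied.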