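A Spark cluster can actually be configured with two Masters. Standalone mode also supports active/standby failover: when the Active Master goes down, the Standby Master takes over as the Active Master.
Two mechanisms are commonly used for Spark Master failover: one based on the file system and one based on ZooKeeper. With the file-system-based mechanism, after the active Master fails you have to switch over to the Standby node manually; with ZooKeeper the switch happens automatically.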
Let's look at what the Master does during a failover by walking through the main flow in the source code:
When a failover occurs, the Master receives an ElectedLeader message:
override def receive: PartialFunction[Any, Unit] = {
  case ElectedLeader => {
    // Read storedApps, storedDrivers and storedWorkers from the persistence engine
    val (storedApps, storedDrivers, storedWorkers) = persistenceEngine.readPersistedData(rpcEnv)
    // Decide the new state from these three collections
    state = if (storedApps.isEmpty && storedDrivers.isEmpty && storedWorkers.isEmpty) {
      // Nothing was persisted, so there is nothing to recover: go straight to ALIVE
      RecoveryState.ALIVE
    } else {
      // Otherwise enter the RECOVERING state
      RecoveryState.RECOVERING
    }
    logInfo("I have been elected leader! New state: " + state)
    if (state == RecoveryState.RECOVERING) {
      // Start recovering the persisted state
      beginRecovery(storedApps, storedDrivers, storedWorkers)
      // Schedule a CompleteRecovery message to ourselves after WORKER_TIMEOUT_MS
      recoveryCompletionTask = forwardMessageThread.schedule(new Runnable {
        override def run(): Unit = Utils.tryLogNonFatalError {
          self.send(CompleteRecovery)
        }
      }, WORKER_TIMEOUT_MS, TimeUnit.MILLISECONDS)
    }
  }
  // On CompleteRecovery, call completeRecovery() to drop the nodes that did not respond,
  // then call schedule() to redo resource scheduling
  case CompleteRecovery => completeRecovery()
  // Remaining cases omitted
  // ...
}
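As the source above shows, when a failover occurs the Standby Master receives an ElectedLeader message. It then reads the App, Driver and Worker information from the persistence engine. If any of it is present, the previous Active Master left state behind that must be recovered, so the Standby Master changes its state to RECOVERING and begins recovery. If all three collections are empty, there is nothing to recover and the state goes straight to ALIVE.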
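The decision made on ElectedLeader can be isolated in a few lines. The following is a minimal sketch, not Spark's code: `ElectionSketch`, its `Alive`/`Recovering` objects and `stateAfterElection` are hypothetical names, and the persisted entries are reduced to plain strings.

```scala
// Toy model of the state decision on ElectedLeader; these are not Spark's classes.
object ElectionSketch {
  sealed trait RecoveryState
  case object Alive extends RecoveryState
  case object Recovering extends RecoveryState

  // Mirrors the check in the ElectedLeader handler: with nothing persisted there is
  // nothing to recover, so the newly elected Master can go straight to Alive.
  def stateAfterElection(
      storedApps: Seq[String],
      storedDrivers: Seq[String],
      storedWorkers: Seq[String]): RecoveryState =
    if (storedApps.isEmpty && storedDrivers.isEmpty && storedWorkers.isEmpty) Alive
    else Recovering
}
```

For example, `ElectionSketch.stateAfterElection(Nil, Nil, Nil)` yields `Alive`, while a single persisted app such as `Seq("app-1")` is enough to yield `Recovering`.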
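It first calls beginRecovery() to recover the Apps, Workers and Drivers: it sends a MasterChanged message to each App's Driver and to each Worker, and sets the state of every App and Worker to UNKNOWN.
After the WORKER_TIMEOUT_MS delay scheduled in the ElectedLeader handler, the Master sends itself a CompleteRecovery message. On receiving it, the Master calls completeRecovery(), which removes the Apps and Workers that are still UNKNOWN (those that failed or already finished and therefore never responded), and then calls schedule() to redo resource scheduling.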
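The delayed self-send is an ordinary one-shot scheduled task. Here is a small sketch of that pattern using only `java.util.concurrent`; `TimeoutSketch` and `scheduleCompletion` are hypothetical names, and the callback stands in for `self.send(CompleteRecovery)`.

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Illustrative only: a one-shot delayed callback, similar in spirit to how the
// Master schedules CompleteRecovery after WORKER_TIMEOUT_MS.
object TimeoutSketch {
  def scheduleCompletion(timeoutMs: Long)(onTimeout: () => Unit): Unit = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.schedule(new Runnable {
      // Run the callback once, then release the scheduler thread.
      override def run(): Unit = try onTimeout() finally scheduler.shutdown()
    }, timeoutMs, TimeUnit.MILLISECONDS)
  }
}
```

The point of the delay is to give Workers and Drivers a window in which to answer MasterChanged before the Master decides who is gone.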
Below are the beginRecovery() and completeRecovery() methods, with comments to help explain them:
private def beginRecovery(storedApps: Seq[ApplicationInfo], storedDrivers: Seq[DriverInfo],
    storedWorkers: Seq[WorkerInfo]) {
  // Iterate over the persisted Applications
  for (app <- storedApps) {
    logInfo("Trying to recover app: " + app.id)
    try {
      // Re-register the app so its information is tracked by this Master
      registerApplication(app)
      // Mark it UNKNOWN until its driver responds
      app.state = ApplicationState.UNKNOWN
      // Send MasterChanged to the app's driver, carrying this Master's reference and web UI URL
      app.driver.send(MasterChanged(self, masterWebUiUrl))
    } catch {
      case e: Exception => logInfo("App " + app.id + " had exception on reconnect")
    }
  }
  // Drivers only need to be added to the in-memory set
  for (driver <- storedDrivers) {
    // Here we just read in the list of drivers. Any drivers associated with now-lost workers
    // will be re-launched when we detect that the worker is missing.
    drivers += driver
  }
  // Iterate over the persisted Workers: re-register, mark UNKNOWN and notify each one
  for (worker <- storedWorkers) {
    logInfo("Trying to recover worker: " + worker.id)
    try {
      registerWorker(worker)
      worker.state = WorkerState.UNKNOWN
      worker.endpoint.send(MasterChanged(self, masterWebUiUrl))
    } catch {
      case e: Exception => logInfo("Worker " + worker.id + " had exception on reconnect")
    }
  }
}
That is beginRecovery(): it sets the Apps and Workers to UNKNOWN and sends the new Master's reference and web UI URL to the corresponding Workers and Application Drivers.
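The bookkeeping behind the UNKNOWN state can be modelled with a small registry. This is a sketch under assumptions: `AckSketch`, `Registry`, `acknowledge` and the `Responded` state are hypothetical and only illustrate the idea that a node leaves UNKNOWN once it answers MasterChanged; they are not Spark's actual types or message handlers.

```scala
// Toy registry: every recovered node starts as Unknown and is flipped when it responds.
object AckSketch {
  sealed trait NodeState
  case object Unknown extends NodeState
  case object Responded extends NodeState

  final class Registry(ids: Seq[String]) {
    private val states = scala.collection.mutable.LinkedHashMap[String, NodeState]()
    // beginRecovery step: register every persisted node and mark it Unknown.
    ids.foreach(id => states.update(id, Unknown))

    // Called when a node answers the MasterChanged notification; unknown ids are ignored.
    def acknowledge(id: String): Unit =
      if (states.contains(id)) states.update(id, Responded)

    // Nodes still Unknown at completion time are the ones that would be removed.
    def unresponsive: List[String] =
      states.toList.collect { case (id, Unknown) => id }
  }
}
```

With workers `w1`, `w2` and `w3` registered and only `w2` acknowledging, `unresponsive` returns `List("w1", "w3")`.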
After re-registration, a CompleteRecovery message is sent; when the newly elected Master receives it, it calls the following method:
private def completeRecovery() {
  // Ensure "only-once" recovery semantics using a short synchronization period.
  if (state != RecoveryState.RECOVERING) { return }
  // Move to the intermediate state
  state = RecoveryState.COMPLETING_RECOVERY
  // Kill off any workers and apps that didn't respond to us.
  // Select the Apps and Workers that are still UNKNOWN and remove them:
  // these are failed or dead Workers and finished Applications.
  // Cleanup happens in three places: 1. the Master's in-memory structures;
  // 2. the in-memory caches of related components; 3. persistent storage.
  workers.filter(_.state == WorkerState.UNKNOWN).foreach(removeWorker)
  apps.filter(_.state == ApplicationState.UNKNOWN).foreach(finishApplication)
  // Reschedule drivers which were not claimed by any workers
  // A Driver with no Worker is relaunched if it was submitted with supervise;
  // otherwise it is removed.
  drivers.filter(_.worker.isEmpty).foreach { d =>
    logWarning(s"Driver ${d.id} was not found after master recovery")
    if (d.desc.supervise) {
      logWarning(s"Re-launching ${d.id}")
      relaunchDriver(d)
    } else {
      removeDriver(d.id, DriverState.ERROR, None)
      logWarning(s"Did not re-launch ${d.id} because it was not supervised")
    }
  }
  // The Master is now the active Master
  state = RecoveryState.ALIVE
  // Redo resource scheduling
  schedule()
  logInfo("Recovery complete - resuming operations!")
}
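When completeRecovery() is called, it first sets the Master state to COMPLETING_RECOVERY and then removes the Apps and Workers that are still UNKNOWN. Removal happens in three places: the in-memory cache, the caches of related components, and persistent storage. Next it looks for Drivers that have no Worker; such a Driver is relaunched if supervise is set and removed otherwise.
Finally the Master's state is set to ALIVE, which completes the failover, and schedule() is called to redo resource scheduling.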
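The per-Driver decision in completeRecovery() reduces to a three-way choice. A minimal sketch follows, with a hypothetical simplified `Driver` record and `Action` type that are not part of Spark:

```scala
// Toy model of the driver cleanup rule in completeRecovery(); names are illustrative.
object CleanupSketch {
  // The worker the driver is running on (if any) and whether it was submitted with supervise.
  final case class Driver(id: String, worker: Option[String], supervise: Boolean)

  sealed trait Action
  case object Keep extends Action
  case object Relaunch extends Action
  case object Remove extends Action

  def decide(d: Driver): Action =
    if (d.worker.nonEmpty) Keep      // claimed by a worker: leave it alone
    else if (d.supervise) Relaunch   // unclaimed but supervised: try to start it again
    else Remove                      // unclaimed and unsupervised: drop it
}
```

For instance, `CleanupSketch.decide(CleanupSketch.Driver("d1", None, supervise = true))` gives `Relaunch`, and the same driver with `supervise = false` gives `Remove`.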