During SparkContext initialization, the backend sends an appDesc to the master to register the application. Before the master can handle that registration, two questions have to be answered: 1. which engine does the master use for persistence? 2. what happens if the master goes down? Let's walk through the source code.
The Master class lives at core\src\main\scala\org\apache\spark\deploy\master\Master.scala
First, some basic parameters:
// a worker is considered lost if the master hears no heartbeat from it for 60s
private val WORKER_TIMEOUT_MS = conf.getLong("spark.worker.timeout", 60) * 1000
// the master retains at most 200 (completed) applications
private val RETAINED_APPLICATIONS = conf.getInt("spark.deploy.retainedApplications", 200)
// drivers map one-to-one to applications, so at most 200 drivers are retained as well
private val RETAINED_DRIVERS = conf.getInt("spark.deploy.retainedDrivers", 200)
// the master's recovery (failover) mode
private val RECOVERY_MODE = conf.get("spark.deploy.recoveryMode", "NONE")
val workers = new HashSet[WorkerInfo] // the master's in-memory cache of workers
val apps = new HashSet[ApplicationInfo] // in-memory cache of applications
private val drivers = new HashSet[DriverInfo] // in-memory cache of drivers
// by default applications are spread out evenly across workers
private val spreadOutApps = conf.getBoolean("spark.deploy.spreadOut", true)
// by default an application may claim an unbounded number of cores
private val defaultCores = conf.getInt("spark.deploy.defaultCores", Int.MaxValue)
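The defaulting pattern above can be sketched without Spark at all: each `spark.deploy.*` key falls back to a hard-coded default when unset. In this toy version a plain `Map` stands in for the real SparkConf; the key names match the ones used above.

```scala
// Minimal sketch of how the Master reads its tunables: a missing key
// resolves to the default passed at the call site.
object ConfDefaults {
  // stand-in for SparkConf: only the recovery mode is explicitly set
  val settings = Map("spark.deploy.recoveryMode" -> "ZOOKEEPER")

  def getInt(key: String, default: Int): Int =
    settings.get(key).map(_.toInt).getOrElse(default)

  def get(key: String, default: String): String =
    settings.getOrElse(key, default)
}
```

With only the recovery mode set, `retainedApplications` resolves to its default of 200 while the recovery mode comes back as ZOOKEEPER.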
The master's choice of persistence engine and its leader-election mechanism are both set up in the onStart() method:
val serializer = new JavaSerializer(conf) // grab the serializer configuration
// match on the master's recovery mode; each mode pairs a persistence engine with a leader-election mechanism
val (persistenceEngine_, leaderElectionAgent_) = RECOVERY_MODE match {
case "ZOOKEEPER" => // ZooKeeper-based persistence engine and election mechanism
logInfo("Persisting recovery state to ZooKeeper")
val zkFactory =
new ZooKeeperRecoveryModeFactory(conf, serializer)
(zkFactory.createPersistenceEngine(), zkFactory.createLeaderElectionAgent(this))
case "FILESYSTEM" => // filesystem-based persistence engine and election mechanism
val fsFactory =
new FileSystemRecoveryModeFactory(conf, serializer)
// FileSystemPersistenceEngine stores the data in a single directory on disk, one file per app and worker; a file is deleted when its app or worker is removed
// the election agent is the single-node implementation of LeaderElectionAgent - the leader is always the leader
(fsFactory.createPersistenceEngine(), fsFactory.createLeaderElectionAgent(this))
case "CUSTOM" =>
val clazz = Utils.classForName(conf.get("spark.deploy.recoveryMode.factory"))
val factory = clazz.getConstructor(classOf[SparkConf], classOf[Serializer])
.newInstance(conf, serializer)
.asInstanceOf[StandaloneRecoveryModeFactory]
(factory.createPersistenceEngine(), factory.createLeaderElectionAgent(this))
case _ =>
// black-hole persistence engine: nothing is persisted at all; "monarchy" election - the leader is always the leader
(new BlackHolePersistenceEngine(), new MonarchyLeaderAgent(this))
}
persistenceEngine = persistenceEngine_
leaderElectionAgent = leaderElectionAgent_
}
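The "CUSTOM" branch above instantiates a user-supplied factory reflectively through a (SparkConf, Serializer) constructor. Here is a self-contained sketch of that pattern, where ToyConf and ToyFactory are stand-ins for SparkConf and StandaloneRecoveryModeFactory, and classOf[...] replaces Utils.classForName so the sketch needs no runtime class-name lookup:

```scala
// Stand-ins for SparkConf and the user's StandaloneRecoveryModeFactory.
class ToyConf
class ToyFactory(conf: ToyConf) {
  def createPersistenceEngine(): String = "custom-engine"
}

object CustomModeDemo {
  // mirrors: clazz.getConstructor(...).newInstance(...) in the CUSTOM branch
  def build(): ToyFactory =
    classOf[ToyFactory]
      .getConstructor(classOf[ToyConf])
      .newInstance(new ToyConf)
}
```

The real code additionally casts the result to StandaloneRecoveryModeFactory, since Utils.classForName only yields a Class[_].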
Let's take the ZooKeeper persistence engine and election mechanism as the example. The ZK engine is called ZooKeeperPersistenceEngine.
What a PersistenceEngine does: 1. when an app, worker, or driver registers with the master, persist that information; 2. when the master fails, read the persisted app, driver, and worker information back.
// serializes an object and writes it to the given path under a ZK node
private def serializeIntoFile(path: String, value: AnyRef) {
val serialized = serializer.newInstance().serialize(value)
val bytes = new Array[Byte](serialized.remaining())
serialized.get(bytes)
zk.create().withMode(CreateMode.PERSISTENT).forPath(path, bytes)
}
// the ZK working directory
private val WORKING_DIR = conf.get("spark.deploy.zookeeper.dir", "/spark") + "/master_status"
// create the ZK client
private val zk: CuratorFramework = SparkCuratorUtil.newClient(conf)
// persisted objects are written under the /spark/master_status directory
override def persist(name: String, obj: Object): Unit = {
serializeIntoFile(WORKING_DIR + "/" + name, obj)
}
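The first half of serializeIntoFile() - serialize a value into a ByteBuffer, then drain it into a byte array with remaining()/get() - can be reproduced without ZooKeeper. In this sketch plain Java serialization stands in for Spark's JavaSerializer, and the resulting bytes would be what the engine hands to zk.create():

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.nio.ByteBuffer

object SerializeDemo {
  def toBytes(value: AnyRef): Array[Byte] = {
    // stand-in for serializer.newInstance().serialize(value)
    val bos = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bos)
    out.writeObject(value)
    out.close()
    val serialized = ByteBuffer.wrap(bos.toByteArray)
    // same drain pattern as serializeIntoFile()
    val bytes = new Array[Byte](serialized.remaining())
    serialized.get(bytes)
    bytes
  }
}
```

Draining via remaining()/get() copies exactly the readable region of the buffer, which matters because a real serializer may hand back a buffer larger than its payload.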
ZK's election mechanism lives in ZooKeeperLeaderElectionAgent:
// the ZK directory used for leader election
val WORKING_DIR = conf.get("spark.deploy.zookeeper.dir", "/spark") + "/leader_election"
private def start() {
logInfo("Starting ZooKeeper LeaderElection agent")
// create the ZK client
zk = SparkCuratorUtil.newClient(conf)
// register a LeaderLatch that watches the /spark/leader_election directory
leaderLatch = new LeaderLatch(zk, WORKING_DIR)
leaderLatch.addListener(this)
leaderLatch.start()
}
// when leadership changes, update the leader status - with ZK election, master failover is therefore automatic
private def updateLeadershipStatus(isLeader: Boolean) {
if (isLeader && status == LeadershipStatus.NOT_LEADER) {
status = LeadershipStatus.LEADER
masterInstance.electedLeader()
} else if (!isLeader && status == LeadershipStatus.LEADER) {
status = LeadershipStatus.NOT_LEADER
masterInstance.revokedLeadership()
}
}
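The state machine above only fires a callback on an actual transition, so duplicate notifications from the latch are ignored. A self-contained model, where an events list stands in for the electedLeader()/revokedLeadership() calls:

```scala
object LeadershipDemo {
  private var isLeaderNow = false          // plays the role of `status`
  var events: List[String] = Nil           // records which callbacks fired

  def updateLeadershipStatus(isLeader: Boolean): Unit = {
    if (isLeader && !isLeaderNow) {        // NOT_LEADER -> LEADER
      isLeaderNow = true
      events = events :+ "electedLeader"
    } else if (!isLeader && isLeaderNow) { // LEADER -> NOT_LEADER
      isLeaderNow = false
      events = events :+ "revokedLeadership"
    }
    // same-state notifications fall through: no callback
  }
}
```

Feeding it true, true, false produces exactly one electedLeader followed by one revokedLeadership.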
Now let's see what master failover actually does.
// Master extends LeaderElectable; when elected it sends itself an ElectedLeader message, which the master's receive() method picks up and handles
override def electedLeader() {
self.send(ElectedLeader)
}
The failover itself happens in the receive() method:
// pattern-match on the ElectedLeader message
case ElectedLeader =>
// use the persistence engine to read back the persisted app, driver, and worker information
val (storedApps, storedDrivers, storedWorkers) = persistenceEngine.readPersistedData(rpcEnv)
// decide the master's next state:
// if any of these collections is non-empty, something was registered with a previous master, so a failover is in progress and the state must become RECOVERING
state = if (storedApps.isEmpty && storedDrivers.isEmpty && storedWorkers.isEmpty) {
RecoveryState.ALIVE
} else {
RecoveryState.RECOVERING
}
logInfo("I have been elected leader! New state: " + state)
if (state == RecoveryState.RECOVERING) {
// if the master is in RECOVERING state, call beginRecovery() to start the failover
beginRecovery(storedApps, storedDrivers, storedWorkers)
recoveryCompletionTask = forwardMessageThread.schedule(new Runnable {
override def run(): Unit = Utils.tryLogNonFatalError {
// send itself a CompleteRecovery message, which the master's receive() method will pick up next
self.send(CompleteRecovery)
}
}, WORKER_TIMEOUT_MS, TimeUnit.MILLISECONDS)
}
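The ALIVE-vs-RECOVERING decision is just an emptiness check over the three recovered collections. A toy version of it:

```scala
// Toy version of the ElectedLeader decision: a master elected with no
// persisted state comes up ALIVE; anything left behind by a previous
// master forces a RECOVERING pass first.
object RecoveryDecision {
  def nextState(apps: Seq[Any], drivers: Seq[Any], workers: Seq[Any]): String =
    if (apps.isEmpty && drivers.isEmpty && workers.isEmpty) "ALIVE"
    else "RECOVERING"
}
```

Note that a single leftover entry in any one collection is enough to trigger recovery.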
Now the core method, beginRecovery():
private def beginRecovery(storedApps: Seq[ApplicationInfo], storedDrivers: Seq[DriverInfo],
storedWorkers: Seq[WorkerInfo]) {
// iterate over all applications read back by the persistence engine
for (app <- storedApps) {
logInfo("Trying to recover app: " + app.id)
try {
// re-register the app with this master
registerApplication(app)
// and reset the app's state to UNKNOWN
app.state = ApplicationState.UNKNOWN
// tell the app's driver that the master changed, including the standby master's address; a live driver will respond to this message (how it responds and flips the app state back is handled on the driver side)
app.driver.send(MasterChanged(self, masterWebUiUrl))
} catch {
case e: Exception => logInfo("App " + app.id + " had exception on reconnect")
}
}
for (driver <- storedDrivers) {
// Here we just read in the list of drivers. Any drivers associated with now-lost workers will be re-launched when we detect that the worker is missing.
// so drivers are only re-added to the in-memory cache here; any driver tied to a lost worker will be relaunched automatically once that worker is detected as missing
drivers += driver
}
// iterate over all workers read back by the persistence engine
for (worker <- storedWorkers) {
logInfo("Trying to recover worker: " + worker.id)
try {
// re-register the worker
registerWorker(worker)
// reset its state to UNKNOWN
worker.state = WorkerState.UNKNOWN
// tell the worker the master changed, including the new master's address; a live worker responds and its state is then updated
worker.endpoint.send(MasterChanged(self, masterWebUiUrl))
} catch {
case e: Exception => logInfo("Worker " + worker.id + " had exception on reconnect")
}
}
}
After kicking off the failover, the master sends itself a CompleteRecovery message, which the receive() method handles:
case CompleteRecovery => completeRecovery()
completeRecovery() mainly filters the apps, workers, and drivers that were re-registered above:
private def completeRecovery() {
// Ensure "only-once" recovery semantics using a short synchronization period.
// if the master is not in RECOVERING state, bail out immediately; otherwise mark it as completing recovery
if (state != RecoveryState.RECOVERING) { return }
state = RecoveryState.COMPLETING_RECOVERY
// Kill off any workers and apps that didn't respond to us.
// failover step 1: remove every worker and application that never responded
workers.filter(_.state == WorkerState.UNKNOWN).foreach(
removeWorker(_, "Not responding for recovery"))
apps.filter(_.state == ApplicationState.UNKNOWN).foreach(finishApplication)
// Update the state of recovered apps to RUNNING
// step 2: promote every re-registered application still WAITING to RUNNING
apps.filter(_.state == ApplicationState.WAITING).foreach(_.state = ApplicationState.RUNNING)
// Reschedule drivers which were not claimed by any workers
// step 3: for each driver not claimed by any worker, relaunch it if it was supervised, otherwise remove it
drivers.filter(_.worker.isEmpty).foreach { d =>
logWarning(s"Driver ${d.id} was not found after master recovery")
if (d.desc.supervise) {
logWarning(s"Re-launching ${d.id}")
relaunchDriver(d)
} else {
removeDriver(d.id, DriverState.ERROR, None)
logWarning(s"Did not re-launch ${d.id} because it was not supervised")
}
}
// failover finished - flip the master's state back to ALIVE
state = RecoveryState.ALIVE
// schedule() is the master's core resource-scheduling method
schedule()
logInfo("Recovery complete - resuming operations!")
}
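The filtering steps in completeRecovery() can be modeled over toy records: entities still UNKNOWN (they never answered MasterChanged) are dropped, WAITING apps are promoted to RUNNING, and a driver with no worker is relaunched only if it was submitted with supervise. The field names here are simplified stand-ins for the real ApplicationInfo/DriverInfo fields:

```scala
object RecoveryFilter {
  case class App(id: String, state: String)
  case class Driver(id: String, supervise: Boolean, hasWorker: Boolean)

  def complete(apps: Seq[App], drivers: Seq[Driver]): (Seq[App], Seq[String]) = {
    // step 1 + 2: drop non-responders, promote WAITING apps to RUNNING
    val liveApps = apps
      .filterNot(_.state == "UNKNOWN")
      .map(a => if (a.state == "WAITING") a.copy(state = "RUNNING") else a)
    // step 3: drivers unclaimed by any worker are relaunched or removed
    val driverActions = drivers.filterNot(_.hasWorker).map { d =>
      if (d.supervise) s"relaunch:${d.id}" else s"remove:${d.id}"
    }
    (liveApps, driverActions)
  }
}
```

An UNKNOWN app and a WAITING app go in; only the WAITING one survives, now RUNNING, while an unsupervised orphan driver is removed and a supervised one relaunched.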
Summary:
Master failover: standalone mode natively supports running two masters, so when the active master goes down the standby master can take over as active.
The failover mechanism is chosen via configuration: with the filesystem-based mode you must switch to the standby master manually after the active one dies; with the ZooKeeper-based mode failover is automatic.
Master persistence engines:
BlackHolePersistenceEngine (default): persists nothing, so state lives only in memory and is lost on failure
FileSystemPersistenceEngine: stores the data in a single directory on disk
ZooKeeperPersistenceEngine: persists the data into ZooKeeper
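To actually enable the ZooKeeper mode walked through above, the standalone HA documentation has you pass the `spark.deploy.*` properties to the master daemon via SPARK_DAEMON_JAVA_OPTS; the ZK hostnames below are placeholders:

```shell
# spark-env.sh on each master node: select ZOOKEEPER recovery and point
# it at the ZK ensemble; the dir matches the WORKING_DIR default above.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```

With this in place, any number of masters can be started against the same ensemble; one wins the LeaderLatch and the rest stand by.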