I. Master HA Analysis
1. In production, ZooKeeper is typically used for HA, with three Masters; ZooKeeper automatically manages Master failover.
2. With ZooKeeper-based HA, ZooKeeper stores the metadata of the running Spark cluster: Workers, Drivers, Applications, and Executors.
3. When the current Active Master fails, ZooKeeper elects one of the Standby Masters as the new Active Master. Note, however, that between being elected and actually becoming the Active Master, the new leader must first fetch the cluster's current metadata from ZooKeeper and recover it.
4. During a Master switchover, all programs that are already running keep running normally, because a Spark Application obtains its compute resources from the Cluster Manager before it runs; the scheduling and processing of Jobs at runtime have nothing to do with the Master.
5. The only impact during a Master switchover is that Jobs cannot be submitted: on one hand, no new application can be submitted to the cluster, because only the Active Master can accept new submission requests; on the other hand, already-running programs cannot trigger new Job submissions through Action operations either.
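As a concrete reference, ZooKeeper-based HA is normally enabled through spark-env.sh on every Master node. A minimal sketch, assuming a three-node ZooKeeper ensemble; the host names zk1/zk2/zk3 and the /spark znode path are placeholders:

# spark-env.sh -- minimal ZooKeeper HA sketch; hosts and znode path are placeholders
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"

Each Master started with these settings registers with ZooKeeper; exactly one becomes Active and the rest stand by.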
II. The Four Master HA Modes
1. The four Master HA recovery modes: ZOOKEEPER, FILESYSTEM, CUSTOM, and NONE.
2. Details (a config sketch for the FILESYSTEM mode follows this list):
(1) ZOOKEEPER: Masters are managed automatically.
(2) FILESYSTEM: after the Master fails, it must be restarted manually; once restarted, it immediately becomes the Active Master and serves requests (accepting new Jobs, i.e., new application submission requests).
(3) CUSTOM: lets users plug in their own Master HA implementation, which matters for advanced users.
(4) NONE: the default. A freshly downloaded and installed Spark cluster runs in this mode; it does not persist any cluster data, and the Master takes over the cluster immediately after starting.
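For comparison, a minimal spark-env.sh sketch for the FILESYSTEM mode; the recovery directory is a placeholder (the property name spark.deploy.recoveryDirectory can be seen in the factory source in the next section):

# spark-env.sh -- FILESYSTEM recovery sketch; the directory path is a placeholder
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM \
  -Dspark.deploy.recoveryDirectory=/var/spark/recovery"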
III. Source Code Analysis
1. The four HA modes when the Master starts
(1) The onStart method of the Master class (the RECOVERY_MODE it matches on is shown right after the method):
override def onStart(): Unit = {
  logInfo("Starting Spark master at " + masterUrl)
  logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}")
  webUi = new MasterWebUI(this, webUiPort)
  webUi.bind()
  masterWebUiUrl = "http://" + masterPublicAddress + ":" + webUi.boundPort
  checkForWorkerTimeOutTask = forwardMessageThread.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = Utils.tryLogNonFatalError {
      self.send(CheckForWorkerTimeOut)
    }
  }, 0, WORKER_TIMEOUT_MS, TimeUnit.MILLISECONDS)

  if (restServerEnabled) {
    val port = conf.getInt("spark.master.rest.port", 6066)
    restServer = Some(new StandaloneRestServer(address.host, port, conf, self, masterUrl))
  }
  restServerBoundPort = restServer.map(_.start())

  masterMetricsSystem.registerSource(masterSource)
  masterMetricsSystem.start()
  applicationMetricsSystem.start()
  // Attach the master and app metrics servlet handler to the web ui after the metrics systems are
  // started.
  masterMetricsSystem.getServletHandlers.foreach(webUi.attachHandler)
  applicationMetricsSystem.getServletHandlers.foreach(webUi.attachHandler)

  val serializer = new JavaSerializer(conf)
  // Pick the persistence engine and leader election agent from the configured recovery mode
  val (persistenceEngine_, leaderElectionAgent_) = RECOVERY_MODE match {
    case "ZOOKEEPER" =>
      logInfo("Persisting recovery state to ZooKeeper")
      val zkFactory =
        new ZooKeeperRecoveryModeFactory(conf, serializer)
      (zkFactory.createPersistenceEngine(), zkFactory.createLeaderElectionAgent(this))
    case "FILESYSTEM" =>
      val fsFactory =
        new FileSystemRecoveryModeFactory(conf, serializer)
      (fsFactory.createPersistenceEngine(), fsFactory.createLeaderElectionAgent(this))
    case "CUSTOM" =>
      val clazz = Utils.classForName(conf.get("spark.deploy.recoveryMode.factory"))
      val factory = clazz.getConstructor(classOf[SparkConf], classOf[Serializer])
        .newInstance(conf, serializer)
        .asInstanceOf[StandaloneRecoveryModeFactory]
      (factory.createPersistenceEngine(), factory.createLeaderElectionAgent(this))
    case _ =>
      // "NONE": nothing is persisted, and this Master is immediately the leader
      (new BlackHolePersistenceEngine(), new MonarchyLeaderAgent(this))
  }
  persistenceEngine = persistenceEngine_
  leaderElectionAgent = leaderElectionAgent_
}
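The RECOVERY_MODE matched above is read from configuration when the Master is constructed; in the same Master class it is defined as:

// Defaults to "NONE" when spark.deploy.recoveryMode is not set
private val RECOVERY_MODE = conf.get("spark.deploy.recoveryMode", "NONE")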
2. The ZOOKEEPER mode
(1) The zkFactory is created from the ZooKeeperRecoveryModeFactory class:
private[master] class ZooKeeperRecoveryModeFactory(conf: SparkConf, serializer: Serializer)
  extends StandaloneRecoveryModeFactory(conf, serializer) {

  // Persistence: cluster metadata is written to ZooKeeper
  def createPersistenceEngine(): PersistenceEngine = {
    new ZooKeeperPersistenceEngine(conf, serializer)
  }

  // Leader election: delegated to ZooKeeper
  def createLeaderElectionAgent(master: LeaderElectable): LeaderElectionAgent = {
    new ZooKeeperLeaderElectionAgent(master, conf)
  }
}
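Both this factory and the FILESYSTEM one below extend the same abstract factory, whose contract is just the two creation methods (abridged, with the original doc comments dropped):

abstract class StandaloneRecoveryModeFactory(conf: SparkConf, serializer: Serializer) {
  def createPersistenceEngine(): PersistenceEngine
  def createLeaderElectionAgent(master: LeaderElectable): LeaderElectionAgent
}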
(2) The methods of PersistenceEngine (an abridged look at the ZooKeeper-backed implementation follows this block):
abstract class PersistenceEngine {

  /**
   * Defines how the object is serialized and persisted. Implementation will
   * depend on the store used.
   */
  def persist(name: String, obj: Object): Unit

  /**
   * Defines how the object referred by its name is removed from the store.
   */
  def unpersist(name: String): Unit

  /**
   * Gives all objects, matching a prefix. This defines how objects are
   * read/deserialized back.
   */
  def read[T: ClassTag](prefix: String): Seq[T]

  final def addApplication(app: ApplicationInfo): Unit = {
    persist("app_" + app.id, app)
  }

  final def removeApplication(app: ApplicationInfo): Unit = {
    unpersist("app_" + app.id)
  }

  final def addWorker(worker: WorkerInfo): Unit = {
    persist("worker_" + worker.id, worker)
  }

  final def removeWorker(worker: WorkerInfo): Unit = {
    unpersist("worker_" + worker.id)
  }

  final def addDriver(driver: DriverInfo): Unit = {
    persist("driver_" + driver.id, driver)
  }

  final def removeDriver(driver: DriverInfo): Unit = {
    unpersist("driver_" + driver.id)
  }

  /**
   * Returns the persisted data sorted by their respective ids (which implies that they're
   * sorted by time of creation).
   */
  // The newly elected Master recovers the cluster metadata through this method
  final def readPersistedData(
      rpcEnv: RpcEnv): (Seq[ApplicationInfo], Seq[DriverInfo], Seq[WorkerInfo]) = {
    rpcEnv.deserialize { () =>
      (read[ApplicationInfo]("app_"), read[DriverInfo]("driver_"), read[WorkerInfo]("worker_"))
    }
  }

  def close() {}
}
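For the ZOOKEEPER mode, the concrete engine stores one serialized znode per object under spark.deploy.zookeeper.dir. An abridged sketch of ZooKeeperPersistenceEngine (imports and the serializeIntoFile/deserializeFromFile helpers are elided):

private[master] class ZooKeeperPersistenceEngine(conf: SparkConf, val serializer: Serializer)
  extends PersistenceEngine with Logging {

  // All cluster metadata lives under this znode, e.g. /spark/master_status
  private val WORKING_DIR = conf.get("spark.deploy.zookeeper.dir", "/spark") + "/master_status"
  private val zk: CuratorFramework = SparkCuratorUtil.newClient(conf)

  override def persist(name: String, obj: Object): Unit = {
    serializeIntoFile(WORKING_DIR + "/" + name, obj)   // helper elided
  }

  override def unpersist(name: String): Unit = {
    zk.delete().forPath(WORKING_DIR + "/" + name)
  }

  override def read[T: ClassTag](prefix: String): Seq[T] = {
    zk.getChildren.forPath(WORKING_DIR).asScala
      .filter(_.startsWith(prefix)).flatMap(deserializeFromFile[T])  // helper elided
  }
}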
(3) Leader election happens in the ZooKeeperLeaderElectionAgent class (field declarations are elided here; the listener callbacks are sketched after this block):
private[master] class ZooKeeperLeaderElectionAgent(val masterInstance: LeaderElectable,
    conf: SparkConf) extends LeaderLatchListener with LeaderElectionAgent with Logging {

  // ... field declarations (WORKING_DIR, zk, leaderLatch, status) elided ...

  start()

  private def start() {
    logInfo("Starting ZooKeeper LeaderElection agent")
    zk = SparkCuratorUtil.newClient(conf)
    leaderLatch = new LeaderLatch(zk, WORKING_DIR)  // Curator's leader latch performs the election
    leaderLatch.addListener(this)                   // this agent is notified of leadership changes
    leaderLatch.start()
  }
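When leadership changes, Curator invokes the LeaderLatchListener callbacks implemented by this same class; abridged, they re-check the latch and then notify the Master:

  override def isLeader() {
    synchronized {
      // we could have lost leadership by the time this is delivered
      if (!leaderLatch.hasLeadership) {
        return
      }
      logInfo("We have gained leadership")
      updateLeadershipStatus(true)   // ends up calling masterInstance.electedLeader()
    }
  }

  override def notLeader() {
    synchronized {
      if (leaderLatch.hasLeadership) {
        return
      }
      logInfo("We have lost leadership")
      updateLeadershipStatus(false)  // ends up calling masterInstance.revokedLeadership()
    }
  }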
3. The FILESYSTEM mode
(1) The recovery-mode factory (a simplified sketch of the file-backed engine follows the factory code):
/**
 * LeaderAgent in this case is a no-op. Since leader is forever leader as the actual
 * recovery is made by restoring from filesystem.
 */
private[master] class FileSystemRecoveryModeFactory(conf: SparkConf, serializer: Serializer)
  extends StandaloneRecoveryModeFactory(conf, serializer) with Logging {

  val RECOVERY_DIR = conf.get("spark.deploy.recoveryDirectory", "")

  def createPersistenceEngine(): PersistenceEngine = {
    logInfo("Persisting recovery state to directory: " + RECOVERY_DIR)
    new FileSystemPersistenceEngine(RECOVERY_DIR, serializer)
  }

  // "Leader election" here simply makes the passed-in Master the leader
  def createLeaderElectionAgent(master: LeaderElectable): LeaderElectionAgent = {
    new MonarchyLeaderAgent(master)
  }
}
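The file-backed engine writes one serialized file per object into RECOVERY_DIR. A minimal, self-contained sketch of the idea using plain Java serialization; this is a simplification, not Spark's actual FileSystemPersistenceEngine, which goes through its Serializer abstraction and safer IO handling:

// Simplified sketch of a file-backed PersistenceEngine
private[master] class FileSystemPersistenceEngineSketch(dir: String) extends PersistenceEngine {
  new java.io.File(dir).mkdirs()

  override def persist(name: String, obj: Object): Unit = {
    val out = new java.io.ObjectOutputStream(
      new java.io.FileOutputStream(new java.io.File(dir, name)))
    try out.writeObject(obj) finally out.close()
  }

  override def unpersist(name: String): Unit = {
    new java.io.File(dir, name).delete()
  }

  override def read[T: scala.reflect.ClassTag](prefix: String): Seq[T] = {
    val files = Option(new java.io.File(dir).listFiles()).getOrElse(Array.empty)
    files.filter(_.getName.startsWith(prefix)).toSeq.map { f =>
      val in = new java.io.ObjectInputStream(new java.io.FileInputStream(f))
      try in.readObject().asInstanceOf[T] finally in.close()
    }
  }
}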
(2) The MonarchyLeaderAgent makes the Master the leader directly:
/** Single-node implementation of LeaderElectionAgent -- we're initially and always the leader. */
private[spark] class MonarchyLeaderAgent(val masterInstance: LeaderElectable)
  extends LeaderElectionAgent {
  masterInstance.electedLeader()
}
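On the Master side, electedLeader() does nothing but send the Master itself a message; from Master.scala (Master implements LeaderElectable):

override def electedLeader() {
  self.send(ElectedLeader)   // handled in Master.receive, where recovery may begin
}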
4. In the NONE mode, the passed-in Master likewise becomes the leader directly
(1) Leader election: NONE reuses the exact MonarchyLeaderAgent shown above, so the Master is elected leader immediately on startup.
(2) Persistence: an empty (no-op) implementation. Keeping a PersistenceEngine here, even one that persists nothing, keeps the code uniform and extensible:
private[master] class BlackHolePersistenceEngine extends PersistenceEngine {
  override def persist(name: String, obj: Object): Unit = {}
  override def unpersist(name: String): Unit = {}
  override def read[T: ClassTag](name: String): Seq[T] = Nil
}
IV. Master Recovery
1. A newly elected Master first reads the persisted metadata and, if there is anything to restore, enters the RECOVERING state; completeRecovery then finalizes the switchover. The election-to-recovery hand-off is sketched first.
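Abridged from Master.receive, this is the handler for the ElectedLeader message sent by electedLeader() above: the Master reads back the persisted data through the PersistenceEngine and, if recovery is needed, calls beginRecovery and schedules a CompleteRecovery message after the worker timeout:

case ElectedLeader =>
  val (storedApps, storedDrivers, storedWorkers) = persistenceEngine.readPersistedData(rpcEnv)
  state = if (storedApps.isEmpty && storedDrivers.isEmpty && storedWorkers.isEmpty) {
    RecoveryState.ALIVE       // nothing to recover, serve immediately
  } else {
    RecoveryState.RECOVERING  // wait for workers/apps to re-register before going ALIVE
  }
  logInfo("I have been elected leader! New state: " + state)
  if (state == RecoveryState.RECOVERING) {
    beginRecovery(storedApps, storedDrivers, storedWorkers)
    recoveryCompletionTask = forwardMessageThread.schedule(new Runnable {
      override def run(): Unit = Utils.tryLogNonFatalError {
        self.send(CompleteRecovery)
      }
    }, WORKER_TIMEOUT_MS, TimeUnit.MILLISECONDS)
  }

When CompleteRecovery fires, or once every recovered Worker and Application has responded, the Master runs completeRecovery: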
private def completeRecovery() {
  // Ensure "only-once" recovery semantics using a short synchronization period.
  if (state != RecoveryState.RECOVERING) { return }
  state = RecoveryState.COMPLETING_RECOVERY

  // Kill off any workers and apps that didn't respond to us.
  workers.filter(_.state == WorkerState.UNKNOWN).foreach(removeWorker)
  apps.filter(_.state == ApplicationState.UNKNOWN).foreach(finishApplication)

  // Reschedule drivers which were not claimed by any workers
  drivers.filter(_.worker.isEmpty).foreach { d =>
    logWarning(s"Driver ${d.id} was not found after master recovery")
    if (d.desc.supervise) {
      logWarning(s"Re-launching ${d.id}")
      relaunchDriver(d)
    } else {
      removeDriver(d.id, DriverState.ERROR, None)
      logWarning(s"Did not re-launch ${d.id} because it was not supervised")
    }
  }

  state = RecoveryState.ALIVE
  schedule()
  logInfo("Recovery complete - resuming operations!")
}