34. Spark Kernel Source Code Deep Dive: Master Active/Standby Failover, Principles and Source Analysis

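Two Masters can be configured: Spark's native standalone mode supports Master active/standby failover. That is, when the Active Master node goes down, the Standby Master can be promoted to Active Master.
Master failover can be built on one of two mechanisms: the file system or ZooKeeper. With the file-system-based mechanism, once the Active Master has died we have to switch over to the Standby Master manually, whereas the ZooKeeper-based mechanism switches Masters automatically.
So the "Master failover mechanism" discussed here really means the set of operations performed when switching to the Standby Master after the Active Master has died.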

Flow diagram

[Figure: how the Master active/standby failover mechanism works]

Source code walkthrough

The available persistence engines include ZooKeeperPersistenceEngine and FileSystemPersistenceEngine.
The persistence engine is created in the preStart() method:

    // Create the persistence engine and the leader election agent
    val (persistenceEngine_, leaderElectionAgent_) = RECOVERY_MODE match {
      // ZooKeeper-backed persistence engine
      case "ZOOKEEPER" =>
        logInfo("Persisting recovery state to ZooKeeper")
        val zkFactory =
          new ZooKeeperRecoveryModeFactory(conf, SerializationExtension(context.system))
        (zkFactory.createPersistenceEngine(), zkFactory.createLeaderElectionAgent(this))
      // Local-file-system-backed persistence engine
      case "FILESYSTEM" =>
        val fsFactory =
          new FileSystemRecoveryModeFactory(conf, SerializationExtension(context.system))
        (fsFactory.createPersistenceEngine(), fsFactory.createLeaderElectionAgent(this))
      // User-defined (custom) persistence engine
      case "CUSTOM" =>
        val clazz = Class.forName(conf.get("spark.deploy.recoveryMode.factory"))
        val factory = clazz.getConstructor(conf.getClass, Serialization.getClass)
          .newInstance(conf, SerializationExtension(context.system))
          .asInstanceOf[StandaloneRecoveryModeFactory]
        (factory.createPersistenceEngine(), factory.createLeaderElectionAgent(this))
      case _ =>
        (new BlackHolePersistenceEngine(), new MonarchyLeaderAgent(this))
    }
    persistenceEngine = persistenceEngine_
    leaderElectionAgent = leaderElectionAgent_
  }
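
To make the role of the persistence engine more concrete, here is a minimal sketch of what such an engine has to provide: store named entries, delete them, and read them back grouped by kind. Everything in it is an assumption made for illustration (the trait name `SimplePersistenceEngine`, the `app_` / `driver_` / `worker_` prefixes, and the in-memory store); it is not Spark's actual class.

```scala
import scala.collection.mutable

// Hypothetical, simplified model; NOT Spark's real PersistenceEngine API.
object PersistenceSketch {
  trait SimplePersistenceEngine {
    def persist(name: String, value: String): Unit
    def unpersist(name: String): Unit
    def read(prefix: String): Seq[String]

    // Returns (apps, drivers, workers), told apart by an assumed name prefix.
    def readPersistedData(): (Seq[String], Seq[String], Seq[String]) =
      (read("app_"), read("driver_"), read("worker_"))
  }

  // In-memory stand-in for a store backed by files or ZooKeeper.
  final class InMemoryEngine extends SimplePersistenceEngine {
    private val store = mutable.LinkedHashMap.empty[String, String]

    def persist(name: String, value: String): Unit = store.update(name, value)

    def unpersist(name: String): Unit = { store.remove(name); () }

    def read(prefix: String): Seq[String] =
      store.iterator.collect { case (k, v) if k.startsWith(prefix) => v }.toList
  }
}
```

A file-system or ZooKeeper engine would differ only in where `persist` and `read` put and find the entries; the newly elected Master just calls `readPersistedData()`.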

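Once elected leader, the Master uses the persistence engine to read the persisted storedApps, storedDrivers and storedWorkers.
It then checks whether any of storedApps, storedDrivers or storedWorkers is non-empty.
If so, the persisted Application, Driver and Worker information is re-registered into the Master's in-memory cache structures.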

case ElectedLeader => {
      // Read the persisted app, driver and worker information from the persistence engine
      val (storedApps, storedDrivers, storedWorkers) = persistenceEngine.readPersistedData()
      state = if (storedApps.isEmpty && storedDrivers.isEmpty && storedWorkers.isEmpty) {
        // If apps, drivers and workers are all empty, set the RecoveryState to ALIVE
        RecoveryState.ALIVE
      } else {
        // If any of them is non-empty, set it to RECOVERING
        RecoveryState.RECOVERING
      }
      logInfo("I have been elected leader! New state: " + state)
      // If the state is RECOVERING, start the recovery
      if (state == RecoveryState.RECOVERING) {
        // Re-register storedApps, storedDrivers and storedWorkers into the Master's in-memory caches
        beginRecovery(storedApps, storedDrivers, storedWorkers)
        recoveryCompletionTask = context.system.scheduler.scheduleOnce(WORKER_TIMEOUT millis, self,
          CompleteRecovery)
      }
    }
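
The branch taken in the ElectedLeader handler boils down to a single pure decision, sketched below. The names `ElectionSketch` and `stateAfterElection` are made up for this example, and the persisted entries are modelled as plain strings.

```scala
// Minimal sketch of the state chosen right after leader election.
// Names are hypothetical; persisted entries are modelled as strings.
object ElectionSketch {
  sealed trait RecoveryState
  case object Alive extends RecoveryState
  case object Recovering extends RecoveryState

  def stateAfterElection(
      apps: Seq[String],
      drivers: Seq[String],
      workers: Seq[String]): RecoveryState =
    // Nothing persisted means there is nothing to recover: go straight to Alive.
    if (apps.isEmpty && drivers.isEmpty && workers.isEmpty) Alive else Recovering
}
```

Note that in the handler above, entering RECOVERING also schedules a CompleteRecovery message after WORKER_TIMEOUT milliseconds, so recovery is eventually finished even if some workers or apps never reply.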

Let's take a closer look at the beginRecovery() method called in the code above:

  // Begin recovery
  def beginRecovery(storedApps: Seq[ApplicationInfo], storedDrivers: Seq[DriverInfo],
      storedWorkers: Seq[WorkerInfo]) {
    for (app <- storedApps) {
      logInfo("Trying to recover app: " + app.id)
      try {
        // Re-register the application
        registerApplication(app)
        // Set the application's state to UNKNOWN
        app.state = ApplicationState.UNKNOWN
        // Send a MasterChanged message to the application's driver
        app.driver ! MasterChanged(masterUrl, masterWebUiUrl)
      } catch {
        case e: Exception => logInfo("App " + app.id + " had exception on reconnect")
      }
    }
    // Put the storedDrivers back into the in-memory cache
    for (driver <- storedDrivers) {
      // Here we just read in the list of drivers. Any drivers associated with now-lost workers
      // will be re-launched when we detect that the worker is missing.
      drivers += driver
    }
    // Put the storedWorkers back into the in-memory cache
    for (worker <- storedWorkers) {
      logInfo("Trying to recover worker: " + worker.id)
      try {
        // Re-register the worker
        registerWorker(worker)
        // Set the worker's state to UNKNOWN
        worker.state = WorkerState.UNKNOWN
        // Send a MasterChanged message to the worker
        worker.actor ! MasterChanged(masterUrl, masterWebUiUrl)
      } catch {
        case e: Exception => logInfo("Worker " + worker.id + " had exception on reconnect")
      }
    }
  }
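
The idea behind beginRecovery() can be sketched in isolation: mark every recovered app and worker as UNKNOWN, notify it of the new Master, and let whoever answers flip back to a live state. This is a simplified model, not the Spark code: `Node`, `acknowledge` and the notification strings are assumptions, and the reply messages themselves are not covered in this article.

```scala
import scala.collection.mutable

// Simplified model of the "mark UNKNOWN, notify, wait for replies" pattern.
object BeginRecoverySketch {
  sealed trait State
  case object Unknown extends State
  case object Alive extends State

  final class Node(val id: String) { var state: State = Alive }

  // Marks every node UNKNOWN and returns the simulated notifications sent.
  def beginRecovery(apps: Seq[Node], workers: Seq[Node], newMasterUrl: String): Seq[String] = {
    val sent = mutable.ArrayBuffer.empty[String]
    for (node <- apps ++ workers) {
      node.state = Unknown
      sent += s"${node.id} <- MasterChanged($newMasterUrl)"
    }
    sent.toList
  }

  // Hypothetical stand-in for the reply a surviving app or worker sends back.
  def acknowledge(node: Node): Unit = { node.state = Alive }
}
```

Any node that is still `Unknown` when recovery completes never replied, which is exactly what the cleanup step later relies on.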

Next, look at the registerApplication() and registerWorker() methods in detail:

  // Register an Application
  def registerApplication(app: ApplicationInfo): Unit = {
    val appAddress = app.driver.path.address
    if (addressToApp.contains(appAddress)) {
      logInfo("Attempted to re-register application at same address: " + appAddress)
      return
    }
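    // Register the app's source with the Spark metrics system
    applicationMetricsSystem.registerSource(app.appSource)
    // Add the app to the in-memory caches
    apps += app
    idToApp(app.id) = app
    actorToApp(app.driver) = app
    addressToApp(appAddress) = app
    // Add the app to the queue of apps waiting to be scheduled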
    waitingApps += app
  }

  // Register a Worker
  def registerWorker(worker: WorkerInfo): Boolean = {
    // There may be one or more refs to dead workers on this same node (w/ different ID's),
    // remove them.
    workers.filter { w =>
      (w.host == worker.host && w.port == worker.port) && (w.state == WorkerState.DEAD)
    }.foreach { w =>
      workers -= w
    }

    val workerAddress = worker.actor.path.address
    if (addressToWorker.contains(workerAddress)) {
      val oldWorker = addressToWorker(workerAddress)
      if (oldWorker.state == WorkerState.UNKNOWN) {
        // A worker registering from UNKNOWN implies that the worker was restarted during recovery.
        // The old worker must thus be dead, so we will remove it and accept the new worker.
        removeWorker(oldWorker)
      } else {
        logInfo("Attempted to re-register worker at same address: " + workerAddress)
        return false
      }
    }
    // Add the WorkerInfo to the workers collection
    workers += worker
    // Map the worker's id to its WorkerInfo in idToWorker
    idToWorker(worker.id) = worker
    // Record the worker endpoint's address
    addressToWorker(workerAddress) = worker
    true
  }
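
The address check in registerWorker() is the subtle part, so here is a small stand-alone sketch of just that rule: a second registration from the same address is rejected unless the existing entry is still UNKNOWN, in which case the old entry is replaced. `Registry`, `Worker` and the string addresses are hypothetical simplifications of the Master's real data structures.

```scala
import scala.collection.mutable

// Simplified model of the duplicate-address rule in registerWorker().
object RegisterWorkerSketch {
  sealed trait WState
  case object Alive extends WState
  case object Dead extends WState
  case object Unknown extends WState

  final class Worker(val id: String, val address: String, var state: WState = Alive)

  final class Registry {
    val workers = mutable.HashSet.empty[Worker]
    val addressToWorker = mutable.HashMap.empty[String, Worker]

    def register(worker: Worker): Boolean = {
      // Drop stale DEAD entries that share the new worker's address.
      workers.filter(w => w.address == worker.address && w.state == Dead)
        .foreach(w => workers -= w)

      val accept = addressToWorker.get(worker.address) match {
        case Some(old) if old.state == Unknown =>
          // The old worker never replied after failover: replace it.
          workers -= old
          addressToWorker -= old.address
          true
        case Some(_) => false // a known worker already uses this address
        case None => true
      }
      if (accept) {
        workers += worker
        addressToWorker(worker.address) = worker
      }
      accept
    }
  }
}
```

`Worker` is deliberately a plain class rather than a case class, so that changing `state` does not change its hash while it sits in the `HashSet`.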

Finally, look at the completeRecovery() method:

  // Complete recovery
  def completeRecovery() {
    // Ensure "only-once" recovery semantics using a short synchronization period.
    synchronized {
      // Cleanup mechanism: 1. remove from the in-memory cache structures;
      // 2. remove from the memory of related components; 3. remove from persistent storage
      if (state != RecoveryState.RECOVERING) { return }
      // Mark the state as completing recovery
      state = RecoveryState.COMPLETING_RECOVERY
    }

    // Kill off any workers and apps that didn't respond to us.
    // Workers and apps whose state is still UNKNOWN never replied, so they are
    // removed via removeWorker and finishApplication respectively.
    workers.filter(_.state == WorkerState.UNKNOWN).foreach(removeWorker)
    apps.filter(_.state == ApplicationState.UNKNOWN).foreach(finishApplication)

    // Reschedule drivers which were not claimed by any workers
    drivers.filter(_.worker.isEmpty).foreach { d =>
      logWarning(s"Driver ${d.id} was not found after master recovery")
      if (d.desc.supervise) {
        logWarning(s"Re-launching ${d.id}")
        // Relaunch the driver
        relaunchDriver(d)
      } else {
        // Remove the driver
        removeDriver(d.id, DriverState.ERROR, None)
        logWarning(s"Did not re-launch ${d.id} because it was not supervised")
      }
    }
    // Switch the state to ALIVE: recovery is complete
    state = RecoveryState.ALIVE
    // Call schedule() again to resume resource scheduling
    schedule()
    logInfo("Recovery complete - resuming operations!")
  }
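
The cleanup decisions in completeRecovery() can also be expressed as a pure function, which makes them easy to check. The sketch below is an illustrative model only: `Entity`, `Driver` and `Outcome` are invented types, and relaunching or removing a driver is reduced to listing its id.

```scala
// Simplified model of the decisions made in completeRecovery().
object CompleteRecoverySketch {
  sealed trait State
  case object Unknown extends State
  case object Alive extends State

  final case class Entity(id: String, state: State)
  final case class Driver(id: String, worker: Option[String], supervise: Boolean)
  final case class Outcome(
      workers: Seq[Entity],    // workers kept after cleanup
      apps: Seq[Entity],       // apps kept after cleanup
      relaunched: Seq[String], // ids of drivers to relaunch
      removed: Seq[String])    // ids of drivers to drop

  def completeRecovery(workers: Seq[Entity], apps: Seq[Entity], drivers: Seq[Driver]): Outcome = {
    // Drivers that no surviving worker claimed.
    val unclaimed = drivers.filter(_.worker.isEmpty)
    // Supervised drivers are relaunched; the others are removed.
    val (relaunch, remove) = unclaimed.partition(_.supervise)
    Outcome(
      workers.filter(_.state != Unknown),
      apps.filter(_.state != Unknown),
      relaunch.map(_.id),
      remove.map(_.id))
  }
}
```

In the real method these decisions are followed by setting the state to ALIVE and calling schedule(), as shown above.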