Spark Learning - 2.4.0 Source Code Analysis - 3 - Spark Core: Master Registration Mechanism


1. How the Master Relates to Driver, Worker, and Application Registration

Using a company as an analogy, you can think of it like this (analogy borrowed from another source):

  • The Master is the company's general manager
  • The Driver is the customer
  • A Worker is the technical lead of a project
  • An Executor is an engineer who does the actual work

In a real company these roles talk to each other, but the general manager rarely talks directly to the engineers; the customer, the technical leads, and the engineers, however, do communicate with one another.

With this analogy, you can see that in the Spark world the Master, Driver, and Worker communicate with one another, and the Executor, Driver, and Worker also communicate with one another, but the Master never talks to an Executor directly.

How the Driver, Worker, and Application register with the Master

(Figure: registration flow of Driver, Worker, and Application with the Master)

Note:

  1. The entities that register with the Master are the Driver, the Application, and the Worker. It is worth stressing that an Executor does not register with the Master; it registers with the SchedulerBackend inside the Driver (see the sketch below).
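For reference, the Executor side of this is handled by CoarseGrainedExecutorBackend, which registers with the driver's CoarseGrainedSchedulerBackend endpoint rather than with the Master. The following is an abbreviated sketch of that exchange (simplified from CoarseGrainedExecutorBackend.onStart; fields such as driverUrl and executorId belong to that class and are assumed here):

  // Abbreviated sketch: an Executor registers with the Driver's SchedulerBackend,
  // never with the Master (simplified from CoarseGrainedExecutorBackend.onStart).
  override def onStart() {
    logInfo("Connecting to driver: " + driverUrl)
    rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
      driver = Some(ref)
      // RegisterExecutor is answered by CoarseGrainedSchedulerBackend's DriverEndpoint
      ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
    }(ThreadUtils.sameThread).onComplete {
      case Success(_) => // registered; the driver follows up with a RegisteredExecutor message
      case Failure(e) => exitExecutor(1, s"Cannot register with driver: $driverUrl", e)
    }(ThreadUtils.sameThread)
  }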

2. Worker Registration Flow

  A Worker registers itself with the Master after it starts up. This design has a big advantage: in production, if you want to add new Workers to an already running Spark cluster, the new Workers can be put to work to increase processing capacity without restarting the cluster.
Worker side:

  1. When the Worker starts, onStart() is called, which in turn calls registerWithMaster() to register with the Master
  2. registerWithMaster() first calls tryRegisterAllMasters()
  3. tryRegisterAllMasters() sends a RegisterWorker case class to each Master (see the sketch after this list)
  4. receive() handles the message the Master sends back after processing the registration
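For reference, RegisterWorker is a case class defined in org.apache.spark.deploy.DeployMessages. A sketch of its shape, with the fields as they are matched by the Master's receive() further below (the real definition may carry extra members):

  // Sketch of the RegisterWorker message (field list taken from the Master's pattern match below).
  case class RegisterWorker(
      id: String,
      host: String,
      port: Int,
      worker: RpcEndpointRef,      // the Worker's own endpoint ref, used by the Master to reply
      cores: Int,
      memory: Int,
      workerWebUiUrl: String,
      masterAddress: RpcAddress)   // echoed back so the Worker knows which Master answered
    extends DeployMessage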

Master side:

  1. receive() handles the RegisterWorker registration message sent by the Worker
  2. Check whether the current Master is a standby Master
  3. Check whether this Worker is already registered: idToWorker.contains(id)
  4. If the Master decides to accept the registering Worker, it first creates a WorkerInfo object to hold the Worker's information
  5. Then it registers the Worker (registerWorker):
      - Workers in the DEAD state at the same host and port are filtered out first; for a Worker in the UNKNOWN state, removeWorker cleans up the old Worker's information (including the Executors and Drivers under that Worker) before it is replaced by the new Worker's information
      - The new Worker is then added to the in-memory caches (workers, idToWorker, addressToWorker)
  6. The Worker information is persisted via the persistenceEngine (see the sketch after this list)
  7. send() notifies the Worker that registration succeeded
  8. schedule() is called to schedule resources
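For reference, the persistence step in item 6 writes the WorkerInfo to the Master's recovery store (ZooKeeper or the local filesystem, depending on spark.deploy.recoveryMode), so that a standby Master can rebuild cluster state after a failover. An abbreviated sketch of the relevant hook from PersistenceEngine:

  // Abbreviated sketch from PersistenceEngine: store the WorkerInfo under a
  // "worker_" key so it can be read back during Master recovery.
  final def addWorker(worker: WorkerInfo): Unit = {
    persist("worker_" + worker.id, worker)
  }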

Worker side:

  1. Wait for the Master's response: case msg: RegisterWorkerResponse => handleRegisterResponse(msg)
  2. In handleRegisterResponse, if the message is RegisteredWorker, then:
    • set registered = true
    • change masterRef to the current Master (changeMaster)
    • periodically send a Heartbeat to the Master via masterRef.send(); the Master checks Worker liveness every 60s, while the Worker sends a heartbeat every 15s (see the sketch after this list, and the article "Spark Rpc之Master实现")
    • send the WorkerLatestState to the Master via masterRef, mainly so the Master can decide whether the Executors and Drivers associated with this Worker should keep running; if not, the Master tells the Worker to KillExecutor / KillDriver
  3. Kill the corresponding Executors/Drivers in response to those Master messages.
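For reference, the 15-second heartbeat interval in item 2 is derived from the Master's worker timeout. A sketch of the relevant lines (abbreviated from Master.scala and Worker.scala; the values shown are the defaults):

  // Master side: a Worker is considered lost if no heartbeat arrives within
  // spark.worker.timeout (default 60s).
  private val WORKER_TIMEOUT_MS = conf.getLong("spark.worker.timeout", 60) * 1000

  // Worker side: heartbeat at a quarter of that timeout, i.e. every 15s by default.
  private val HEARTBEAT_MILLIS = conf.getLong("spark.worker.timeout", 60) * 1000 / 4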
(Figure: Worker registration flow)

The main code is as follows:

  • Worker side:
  override def onStart() {
    assert(!registered)
    logInfo("Starting Spark worker %s:%d with %d cores, %s RAM".format(
      host, port, cores, Utils.megabytesToString(memory)))
    logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}")
    logInfo("Spark home: " + sparkHome)
    createWorkDir()
    startExternalShuffleService()
    webUi = new WorkerWebUI(this, workDir, webUiPort)
    webUi.bind()

    workerWebUiUrl = s"http://$publicAddress:${webUi.boundPort}"
    registerWithMaster()

    metricsSystem.registerSource(workerSource)
    metricsSystem.start()
    // Attach the worker metrics servlet handler to the web ui after the metrics system is started.
    metricsSystem.getServletHandlers.foreach(webUi.attachHandler)
  }
-----------------------------------------------------------------------------
  private def registerWithMaster() {
    // onDisconnected may be triggered multiple times, so don't attempt registration
    // if there are outstanding registration attempts scheduled.
    registrationRetryTimer match {
      case None =>
        registered = false
        registerMasterFutures = tryRegisterAllMasters()
        connectionAttemptCount = 0
        registrationRetryTimer = Some(forwordMessageScheduler.scheduleAtFixedRate(
          new Runnable {
            override def run(): Unit = Utils.tryLogNonFatalError {
              Option(self).foreach(_.send(ReregisterWithMaster))
            }
          },
          INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
          INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
          TimeUnit.SECONDS))
      case Some(_) =>
        logInfo("Not spawning another attempt to register with the master, since there is an" +
          " attempt scheduled already.")
    }
  }
------------------------------------------------------------------------------
  private def tryRegisterAllMasters(): Array[JFuture[_]] = {
    masterRpcAddresses.map { masterAddress =>
      registerMasterThreadPool.submit(new Runnable {
        override def run(): Unit = {
          try {
            logInfo("Connecting to master " + masterAddress + "...")
            val masterEndpoint = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME)
            sendRegisterMessageToMaster(masterEndpoint)
          } catch {
            case ie: InterruptedException => // Cancelled
            case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
          }
        }
      })
    }
  }
----------------------------------------------------------------------------
  override def receive: PartialFunction[Any, Unit] = synchronized {
    case msg: RegisterWorkerResponse =>
      handleRegisterResponse(msg)
  	
  	...
  }

  private def handleRegisterResponse(msg: RegisterWorkerResponse): Unit = synchronized {
    msg match {
      case RegisteredWorker(masterRef, masterWebUiUrl, masterAddress) =>
        registered = true
        changeMaster(masterRef, masterWebUiUrl, masterAddress)

		/**
		* Send Heartbeat messages to the Master periodically
		*/
        forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
          override def run(): Unit = Utils.tryLogNonFatalError {
            self.send(SendHeartbeat)
          }
        }, 0, HEARTBEAT_MILLIS, TimeUnit.MILLISECONDS)
        // Periodically clean up old application directories under the work dir
        if (CLEANUP_ENABLED) {
          logInfo(
            s"Worker cleanup enabled; old application directories will be deleted in: $workDir")
          forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
            override def run(): Unit = Utils.tryLogNonFatalError {
              self.send(WorkDirCleanup)
            }
          }, CLEANUP_INTERVAL_MILLIS, CLEANUP_INTERVAL_MILLIS, TimeUnit.MILLISECONDS)
        }

        val execs = executors.values.map { e =>
          new ExecutorDescription(e.appId, e.execId, e.cores, e.state)
        }
        masterRef.send(WorkerLatestState(workerId, execs.toList, drivers.keys.toSeq))

      case RegisterWorkerFailed(message) =>
        if (!registered) {
          logError("Worker registration failed: " + message)
          System.exit(1)
        }

      case MasterInStandby =>
        // Ignore. Master not yet ready.
    }
  }
 --------------------------------------------------------------

  • Master side:
 override def receive: PartialFunction[Any, Unit] = {
 	...
     case RegisterWorker(
	      id, workerHost, workerPort, workerRef, cores, memory, workerWebUiUrl, masterAddress) =>
	      logInfo("Registering worker %s:%d with %d cores, %s RAM".format(
	        workerHost, workerPort, cores, Utils.megabytesToString(memory)))
	      if (state == RecoveryState.STANDBY) {
	        workerRef.send(MasterInStandby)
	      } else if (idToWorker.contains(id)) {
	        workerRef.send(RegisterWorkerFailed("Duplicate worker ID"))
	      } else {
	        val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
	          workerRef, workerWebUiUrl)
	        if (registerWorker(worker)) {
	          persistenceEngine.addWorker(worker)
	          workerRef.send(RegisteredWorker(self, masterWebUiUrl, masterAddress))
	          schedule()
	        } else {
	          val workerAddress = worker.endpoint.address
	          logWarning("Worker registration failed. Attempted to re-register worker at same " +
	            "address: " + workerAddress)
	          workerRef.send(RegisterWorkerFailed("Attempted to re-register worker at same address: "
	            + workerAddress))
	        }
	      }

	   case Heartbeat(workerId, worker) =>
		      idToWorker.get(workerId) match {
		        case Some(workerInfo) =>
		          workerInfo.lastHeartbeat = System.currentTimeMillis()
		        case None =>
		          if (workers.map(_.id).contains(workerId)) {
		            logWarning(s"Got heartbeat from unregistered worker $workerId." +
		              " Asking it to re-register.")
		            worker.send(ReconnectWorker(masterUrl))
		          } else {
		            logWarning(s"Got heartbeat from unregistered worker $workerId." +
		              " This worker was never registered, so ignoring the heartbeat.")
		          }
		      }

	   case WorkerLatestState(workerId, executors, driverIds) =>
		      idToWorker.get(workerId) match {
		        case Some(worker) =>
		          for (exec <- executors) {
		            val executorMatches = worker.executors.exists {
		              case (_, e) => e.application.id == exec.appId && e.id == exec.execId
		            }
		            if (!executorMatches) {
		              // master doesn't recognize this executor. So just tell worker to kill it.
		              worker.endpoint.send(KillExecutor(masterUrl, exec.appId, exec.execId))
		            }
		          }
		
		          for (driverId <- driverIds) {
		            val driverMatches = worker.drivers.exists { case (id, _) => id == driverId }
		            if (!driverMatches) {
		              // master doesn't recognize this driver. So just tell worker to kill it.
		              worker.endpoint.send(KillDriver(driverId))
		            }
		          }
		        case None =>
		          logWarning("Worker state from unknown worker: " + workerId)
		      }
      ....
}
------------------------------------------------------------

private[spark] class WorkerInfo(
    val id: String,
    val host: String,
    val port: Int,
    val cores: Int,
    val memory: Int,
    val endpoint: RpcEndpointRef,
    val webUiAddress: String)
  extends Serializable {...}
  
--------------------------------------------------------------
  val workers = new HashSet[WorkerInfo]

  private val idToWorker = new HashMap[String, WorkerInfo]
  private val addressToWorker = new HashMap[RpcAddress, WorkerInfo]

  private def registerWorker(worker: WorkerInfo): Boolean = {
    // There may be one or more refs to dead workers on this same node (w/ different ID's),
    // remove them.
    workers.filter { w =>
      (w.host == worker.host && w.port == worker.port) && (w.state == WorkerState.DEAD)
    }.foreach { w =>
      workers -= w
    }

    val workerAddress = worker.endpoint.address
    if (addressToWorker.contains(workerAddress)) {
      val oldWorker = addressToWorker(workerAddress)
      if (oldWorker.state == WorkerState.UNKNOWN) {
        // A worker registering from UNKNOWN implies that the worker was restarted during recovery.
        // The old worker must thus be dead, so we will remove it and accept the new worker.
        removeWorker(oldWorker, "Worker replaced by a new worker with same address")
      } else {
        logInfo("Attempted to re-register worker at same address: " + workerAddress)
        return false
      }
    }

    workers += worker
    idToWorker(worker.id) = worker
    addressToWorker(workerAddress) = worker
    true
  }
-------------------------------------------------------------------------------------------

3. Application Registration Flow

  Application registration is driven by the Driver (an Application runs on top of a Driver process).
In other words, in the Driver process a SparkContext is constructed first; during that construction, the TaskScheduler connects to the Master through a background client and registers the Application with it.

That background client is created in the SchedulerBackend's start() method:

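In standalone mode the client is a StandaloneAppClient, created and started in StandaloneSchedulerBackend.start(). An abbreviated sketch of that method (most of the ApplicationDescription construction is omitted; names follow the 2.4.0 source but details are simplified):

  // Abbreviated sketch from StandaloneSchedulerBackend.start(): build the application
  // description and start the client that registers the Application with the Master.
  override def start() {
    super.start()
    ...
    val appDesc = ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
      webUrl, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
    client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
    client.start()
    ...
  }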

(Figure: Application registration flow)

The main code is as follows:

Application side (ClientEndpoint):

StandaloneAppClient.scala

    override def onStart(): Unit = {
      try {
        registerWithMaster(1)
      } catch {
        case e: Exception =>
          logWarning("Failed to connect to master", e)
          markDisconnected()
          stop()
      }
    }

    /**
     *  Register with all masters asynchronously and returns an array `Future`s for cancellation.
     */
    private def tryRegisterAllMasters(): Array[JFuture[_]] = {
      for (masterAddress <- masterRpcAddresses) yield {
        registerMasterThreadPool.submit(new Runnable {
          override def run(): Unit = try {
            if (registered.get) {
              return
            }
            logInfo("Connecting to master " + masterAddress.toSparkURL + "...")
            val masterRef = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME)
            masterRef.send(RegisterApplication(appDescription, self))
          } catch {
            case ie: InterruptedException => // Cancelled
            case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
          }
        })
      }
    }

    /**
     * Register with all masters asynchronously. It will call `registerWithMaster` every
     * REGISTRATION_TIMEOUT_SECONDS seconds until exceeding REGISTRATION_RETRIES times.
     * Once we connect to a master successfully, all scheduling work and Futures will be cancelled.
     *
     * nthRetry means this is the nth attempt to register with master.
     */
    private def registerWithMaster(nthRetry: Int) {
      registerMasterFutures.set(tryRegisterAllMasters())
      registrationRetryTimer.set(registrationRetryThread.schedule(new Runnable {
        override def run(): Unit = {
          if (registered.get) {
            registerMasterFutures.get.foreach(_.cancel(true))
            registerMasterThreadPool.shutdownNow()
          } else if (nthRetry >= REGISTRATION_RETRIES) {
            markDead("All masters are unresponsive! Giving up.")
          } else {
            registerMasterFutures.get.foreach(_.cancel(true))
            registerWithMaster(nthRetry + 1)
          }
        }
      }, REGISTRATION_TIMEOUT_SECONDS, TimeUnit.SECONDS))
    }
-------------------------------------------------------------------------------
    override def receive: PartialFunction[Any, Unit] = {
      case RegisteredApplication(appId_, masterRef) =>
        // FIXME How to handle the following cases?
        // 1. A master receives multiple registrations and sends back multiple
        // RegisteredApplications due to an unstable network.
        // 2. Receive multiple RegisteredApplication from different masters because the master is
        // changing.
        appId.set(appId_)
        registered.set(true)
        master = Some(masterRef)
        listener.connected(appId.get)

Master side:

  override def receive: PartialFunction[Any, Unit] = {
	...
    case RegisterApplication(description, driver) =>
      // TODO Prevent repeated registrations from some driver
      if (state == RecoveryState.STANDBY) {
        // ignore, don't send response
      } else {
        logInfo("Registering app " + description.name)
        val app = createApplication(description, driver)
        registerApplication(app)
        logInfo("Registered app " + description.name + " with ID " + app.id)
        persistenceEngine.addApplication(app)
        driver.send(RegisteredApplication(app.id, self))
        schedule()
      }
	...	
}
------------------------------------------------------------

private[spark] class ApplicationInfo(
    val startTime: Long,
    val id: String,
    val desc: ApplicationDescription,
    val submitDate: Date,
    val driver: RpcEndpointRef,
    defaultCores: Int)
  extends Serializable {...}
  
-----------------------------------------------------

  val idToApp = new HashMap[String, ApplicationInfo]
  private val waitingApps = new ArrayBuffer[ApplicationInfo]
  val apps = new HashSet[ApplicationInfo]
  
  private val endpointToApp = new HashMap[RpcEndpointRef, ApplicationInfo]
  private val addressToApp = new HashMap[RpcAddress, ApplicationInfo]

  private def registerApplication(app: ApplicationInfo): Unit = {
    val appAddress = app.driver.address
    if (addressToApp.contains(appAddress)) {
      logInfo("Attempted to re-register application at same address: " + appAddress)
      return
    }

    applicationMetricsSystem.registerSource(app.appSource)
    apps += app
    idToApp(app.id) = app
    endpointToApp(app.driver) = app
    addressToApp(appAddress) = app
    waitingApps += app
  }


4. Driver Registration Flow

 Because this part is more involved and covers two submission gateways, it is described in detail in a separate article. Please see "Spark Learning - 2.4.0 Source Code Analysis - 3 - Spark Core: Spark Submit Job Submission", in particular the submission gateways "RestSubmissionClient" && "Client".


Acknowledgements
