The Spark Cluster Startup Process

1. Introduction

Source code version: spark-2.4.0
As one of the mainstream frameworks for big data processing, Spark has attracted many people to study it, and I am no exception. I have long been curious about what actually happens when Spark runs the programs we write, so starting with this post I will walk through Spark's execution flow in detail, both to consolidate my own understanding and, hopefully, to encourage more people to write about Spark.
This is the first post in the series; cluster startup is described here using Standalone mode as the example. Feedback is welcome.
As we all know, after setting up a Spark environment we must start the whole cluster before we can submit a program, run it, and get the results we want. To do so, we execute start-all.sh under $SPARK_HOME/sbin/. That command actually consists of three parts: spark-config.sh, which checks the configuration; start-master.sh, which starts the Master; and start-slaves.sh, which starts the Workers.

2. Starting the Master

Starting the Master simply runs the start-master.sh script. Let's look at its contents.

# Starts the master on the machine this script is executed on.
# First check whether SPARK_HOME is set
if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi

#NOTE: This exact class name is matched downstream by SparkSubmit.
#Any changes need to be reflected there.
CLASS="org.apache.spark.deploy.master.Master"

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
  echo "Usage: ./sbin/start-master.sh [options]"
  pattern="Usage:"
  pattern+="\|Using Spark's default log4j profile:"
  pattern+="\|Registered signal handlers for"

  "${SPARK_HOME}"/bin/spark-class $CLASS --help 2>&1 | grep -v "$pattern" 1>&2
  exit 1
fi

ORIGINAL_ARGS="$@"
# Check the configuration file
. "${SPARK_HOME}/sbin/spark-config.sh"
# Load the Spark environment variables
. "${SPARK_HOME}/bin/load-spark-env.sh"
# The Master's default port is 7077
if [ "$SPARK_MASTER_PORT" = "" ]; then
  SPARK_MASTER_PORT=7077
fi

if [ "$SPARK_MASTER_HOST" = "" ]; then
  case `uname` in
      (SunOS)
          SPARK_MASTER_HOST="`/usr/sbin/check-hostname | awk '{print $NF}'`"
          ;;
      (*)
          SPARK_MASTER_HOST="`hostname -f`"
          ;;
  esac
fi

if [ "$SPARK_MASTER_WEBUI_PORT" = "" ]; then
  SPARK_MASTER_WEBUI_PORT=8080
fi
# Launch $CLASS directly, i.e. org.apache.spark.deploy.master.Master
"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
  --host $SPARK_MASTER_HOST --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT \
  $ORIGINAL_ARGS

So start-master.sh really does two things: it checks the configuration files and environment variables, and it launches org.apache.spark.deploy.master.Master. Let's step into that class, starting with the main function:

def main(argStrings: Array[String]) {
  Thread.setDefaultUncaughtExceptionHandler(new SparkUncaughtExceptionHandler(
    exitOnUncaughtException = false))
  Utils.initDaemon(log)
// Create the SparkConf
  val conf = new SparkConf
// Parse the Master's command-line arguments
  val args = new MasterArguments(argStrings, conf)
// Create the RpcEnv and start the underlying RPC service, used to communicate with other endpoints
  val (rpcEnv, _, _) = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, conf)
// Process messages as they arrive; otherwise block and wait
  rpcEnv.awaitTermination()
}

As you can see, the main job of the main function is to create the RpcEnv (a NettyRpcEnv under the hood) and register the Master with it, forming the masterEndpoint.
Next, step into the startRpcEnvAndEndpoint method:

def startRpcEnvAndEndpoint(
    host: String,
    port: Int,
    webUiPort: Int,
    conf: SparkConf): (RpcEnv, Int, Option[Int]) = {
// Create the SecurityManager
  val securityMgr = new SecurityManager(conf)
// Create the RpcEnv (in fact a NettyRpcEnv)
  val rpcEnv = RpcEnv.create(SYSTEM_NAME, host, port, conf, securityMgr)
// Register with the NettyRpcEnv to form the masterEndpoint
  val masterEndpoint = rpcEnv.setupEndpoint(ENDPOINT_NAME,
    new Master(rpcEnv, rpcEnv.address, webUiPort, securityMgr, conf))
  val portsResponse = masterEndpoint.askSync[BoundPortsResponse](BoundPortsRequest)
  (rpcEnv, portsResponse.webUIPort, portsResponse.restPort)
}

This method mainly does the following:
1. It creates the RpcEnv for communication; the actual RPC implementation is NettyRpcEnv.
2. The Master registers with the NettyRpcEnv by calling setupEndpoint, which builds the masterEndpoint from the given arguments. This touches Spark's RPC mechanism, which will be covered in a later post rather than here. As an instance of RpcEndpoint, the Master follows the usual lifecycle: onStart -> receive (or receiveAndReply) -> onStop.
With that, the Master is started.
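The lifecycle just described (register the endpoint, then onStart -> receive -> onStop) can be sketched with a toy model. Everything below (ToyEndpoint, ToyRpcEnv, ToyMaster) is an invented illustration of the pattern, not Spark's actual RpcEndpoint/RpcEnv API:

```scala
// A minimal trait mirroring the lifecycle hooks: onStart, receive, onStop.
trait ToyEndpoint {
  def onStart(): Unit = {}
  def receive(msg: Any): Unit
  def onStop(): Unit = {}
}

// A toy "RpcEnv": registering an endpoint triggers onStart, delivered
// messages go to receive, and shutdown triggers onStop.
class ToyRpcEnv {
  private var endpoint: ToyEndpoint = null
  def setupEndpoint(e: ToyEndpoint): ToyEndpoint = {
    endpoint = e
    endpoint.onStart()
    endpoint
  }
  def deliver(msg: Any): Unit = endpoint.receive(msg)
  def shutdown(): Unit = endpoint.onStop()
}

// A toy Master that just records its lifecycle events in order.
class ToyMaster extends ToyEndpoint {
  var events: List[String] = Nil
  override def onStart(): Unit = events = events :+ "onStart"
  override def receive(msg: Any): Unit = events = events :+ s"receive:$msg"
  override def onStop(): Unit = events = events :+ "onStop"
}

object LifecycleDemo {
  def main(args: Array[String]): Unit = {
    val env = new ToyRpcEnv
    val master = new ToyMaster
    env.setupEndpoint(master)
    env.deliver("RegisterWorker")
    env.shutdown()
    println(master.events.mkString(" -> "))  // onStart -> receive:RegisterWorker -> onStop
  }
}
```

The key point the sketch shows is that the endpoint never drives itself: the environment it registers with invokes each hook at the right moment, which is exactly how the Master comes alive once setupEndpoint is called.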

3. Starting the Worker

The Worker is started through a chain of scripts, start-slaves.sh -> slaves.sh -> start-slave.sh, which finally invokes org.apache.spark.deploy.worker.Worker.
Let's step into the org.apache.spark.deploy.worker.Worker class, starting with the main function:

def main(argStrings: Array[String]) {
  Thread.setDefaultUncaughtExceptionHandler(new SparkUncaughtExceptionHandler(
    exitOnUncaughtException = false))
  Utils.initDaemon(log)
// Create the SparkConf
  val conf = new SparkConf
// Parse the Worker's command-line arguments
  val args = new WorkerArguments(argStrings, conf)
// Create the RpcEnv and set up the workerEndpoint instance, used to communicate with other endpoints
  val rpcEnv = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, args.cores,
    args.memory, args.masters, args.workDir, conf = conf)
  // With external shuffle service enabled, if we request to launch multiple workers on one host,
  // we can only successfully launch the first worker and the rest fails, because with the port
  // bound, we may launch no more than one external shuffle service on each host.
  // When this happens, we should give explicit reason of failure instead of fail silently. For
  // more detail see SPARK-20989.
  val externalShuffleServiceEnabled = conf.get(config.SHUFFLE_SERVICE_ENABLED)
  val sparkWorkerInstances = scala.sys.env.getOrElse("SPARK_WORKER_INSTANCES", "1").toInt
  require(externalShuffleServiceEnabled == false || sparkWorkerInstances <= 1,
    "Starting multiple workers on one host is failed because we may launch no more than one " +
      "external shuffle service on each host, please set spark.shuffle.service.enabled to " +
      "false or set SPARK_WORKER_INSTANCES to 1 to resolve the conflict.")
// Block and wait
  rpcEnv.awaitTermination()
}

The startup flow mirrors the Master's: create an RpcEnv, register with it to form an Endpoint, and get back an EndpointRef instance used to communicate with other endpoints. Next, step into the startRpcEnvAndEndpoint method:

def startRpcEnvAndEndpoint(
    host: String,
    port: Int,
    webUiPort: Int,
    cores: Int,
    memory: Int,
    masterUrls: Array[String],
    workDir: String,
    workerNumber: Option[Int] = None,
    conf: SparkConf = new SparkConf): RpcEnv = {

  // The LocalSparkCluster runs multiple local sparkWorkerX RPC Environments
  val systemName = SYSTEM_NAME + workerNumber.map(_.toString).getOrElse("")
// Create the SecurityManager
  val securityMgr = new SecurityManager(conf)
// Create the RpcEnv from the given arguments (again, in fact a NettyRpcEnv)
  val rpcEnv = RpcEnv.create(systemName, host, port, conf, securityMgr)
// Parse the master URLs and register the Worker endpoint
  val masterAddresses = masterUrls.map(RpcAddress.fromSparkURL(_))
  rpcEnv.setupEndpoint(ENDPOINT_NAME, new Worker(rpcEnv, webUiPort, cores, memory,
    masterAddresses, ENDPOINT_NAME, workDir, conf, securityMgr))
  rpcEnv
}

4. The Worker Registers with the Master

The Worker is also an instance of RpcEndpoint, with the same lifecycle as the Master: onStart -> receive (or receiveAndReply) -> onStop. After starting, the Worker still has to communicate with the Master; that is, it must register with the Master.
Step into the onStart method:

override def onStart() {
// This worker must not have registered with the Master yet
  assert(!registered)
  logInfo("Starting Spark worker %s:%d with %d cores, %s RAM".format(
    host, port, cores, Utils.megabytesToString(memory)))
  logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}")
  logInfo("Spark home: " + sparkHome)
// Create the Worker's working directory
  createWorkDir()
// Start the external shuffle service
  startExternalShuffleService()
// Create the Worker's web UI
  webUi = new WorkerWebUI(this, workDir, webUiPort)
// Bind the web UI port
  webUi.bind()

  workerWebUiUrl = s"http://$publicAddress:${webUi.boundPort}"
// Register with the Master
  registerWithMaster()
// Register the worker source with the metrics system
  metricsSystem.registerSource(workerSource)
// Start the metrics system
  metricsSystem.start()
  // Attach the worker metrics servlet handler to the web ui after the metrics system is started.
  metricsSystem.getServletHandlers.foreach(webUi.attachHandler)
}

The code above sets up the Worker's runtime pieces; the most important call is registerWithMaster, since the Worker must register with the Master before the two can communicate normally. Let's look at the registerWithMaster method:

private def registerWithMaster() {
  // onDisconnected may be triggered multiple times, so don't attempt registration
  // if there are outstanding registration attempts scheduled.
  registrationRetryTimer match {
    case None =>
// The worker has not registered yet
      registered = false
      // Try to register with all the masters
      registerMasterFutures = tryRegisterAllMasters()
// Number of connection attempts so far
      connectionAttemptCount = 0
// Periodically resend the registration; after too many retries the worker gives up
      registrationRetryTimer = Some(forwordMessageScheduler.scheduleAtFixedRate(
        new Runnable {
          override def run(): Unit = Utils.tryLogNonFatalError {
            Option(self).foreach(_.send(ReregisterWithMaster))
          }
        },
        INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
        INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
        TimeUnit.SECONDS))
    case Some(_) =>
      logInfo("Not spawning another attempt to register with the master, since there is an" +
        " attempt scheduled already.")
  }
}

Step into the tryRegisterAllMasters method:

private def tryRegisterAllMasters(): Array[JFuture[_]] = {
// Iterate over all the master RPC addresses
  masterRpcAddresses.map { masterAddress =>
    registerMasterThreadPool.submit(new Runnable {
      override def run(): Unit = {
        try {
          logInfo("Connecting to master " + masterAddress + "...")
// Build an RpcEndpointRef from the master's address, used to send messages to that endpoint
          val masterEndpoint = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME)
// Send the registration message to the Master through the EndpointRef
          sendRegisterMessageToMaster(masterEndpoint)
        } catch {
          case ie: InterruptedException => // Cancelled
          case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
        }
      }
    })
  }
}

This iterates over all the master addresses, creates the corresponding EndpointRef for each, and then sends the registration message to each master.
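The "one registration task per master, submitted to a thread pool" pattern can be sketched on its own. The names below (RegisterAllDemo, registerWith) and the address strings are invented stand-ins; in Spark each task builds the EndpointRef and sends RegisterWorker instead of touching a local set:

```scala
import java.util.concurrent.{Executors, Future => JFuture}

object RegisterAllDemo {
  // Fixed pool, so a slow or dead master does not block registration with the others.
  private val registerMasterThreadPool = Executors.newFixedThreadPool(4)
  @volatile private var registered = Set.empty[String]

  // Stand-in for "setupEndpointRef + send(RegisterWorker)" in the real code.
  private def registerWith(masterAddress: String): Unit = synchronized {
    registered += masterAddress
  }

  def registeredMasters: List[String] = registered.toList.sorted

  // One Runnable per master address, mirroring tryRegisterAllMasters.
  def tryRegisterAllMasters(masters: Seq[String]): Seq[JFuture[_]] =
    masters.map { addr =>
      registerMasterThreadPool.submit(new Runnable {
        override def run(): Unit = registerWith(addr)
      })
    }

  def main(args: Array[String]): Unit = {
    val futures = tryRegisterAllMasters(Seq("master1:7077", "master2:7077"))
    futures.foreach(_.get())            // wait for every registration attempt
    registerMasterThreadPool.shutdown()
    println(registeredMasters.mkString(","))  // master1:7077,master2:7077
  }
}
```

Returning the futures, as the real method does, lets the caller cancel outstanding attempts once one master acknowledges the registration.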
Next, step into the sendRegisterMessageToMaster method:

private def sendRegisterMessageToMaster(masterEndpoint: RpcEndpointRef): Unit = {
// Send the RegisterWorker message to the Master
  masterEndpoint.send(RegisterWorker(
    workerId,
    host,
    port,
    self,
    cores,
    memory,
    workerWebUiUrl,
    masterEndpoint.address))
}

The Worker sends the registration message, built from its own parameters and the Master's address, to the Master. Since the message is sent with send, the Master handles it through pattern matching in its receive method. Let's see what the Master does when it gets the message: go to the case RegisterWorker branch inside the Master's receive method.
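The send/receive dispatch just described boils down to pattern matching on message case classes. A hypothetical miniature of the pattern (the message and handler names below are invented for illustration, not Spark's actual classes):

```scala
// Messages are modeled as a sealed family of case classes/objects,
// so the match in receive can be checked for exhaustiveness.
sealed trait ToyMessage
case class RegisterWorker(id: String) extends ToyMessage
case object MasterInStandby extends ToyMessage

object ReceiveDemo {
  // receive dispatches purely by pattern matching on the message type.
  def receive(msg: ToyMessage): String = msg match {
    case RegisterWorker(id) => s"registering worker $id"
    case MasterInStandby    => "master is standby"
  }

  def main(args: Array[String]): Unit = {
    println(receive(RegisterWorker("worker-1")))  // registering worker worker-1
    println(receive(MasterInStandby))             // master is standby
  }
}
```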

case RegisterWorker(
  id, workerHost, workerPort, workerRef, cores, memory, workerWebUiUrl, masterAddress) =>
  logInfo("Registering worker %s:%d with %d cores, %s RAM".format(
    workerHost, workerPort, cores, Utils.megabytesToString(memory)))
// First check the master's state; if it is STANDBY, tell the worker that the master is in STANDBY
  if (state == RecoveryState.STANDBY) {
    workerRef.send(MasterInStandby)
// If this worker ID already exists, reply to the worker that registration failed
  } else if (idToWorker.contains(id)) {
    workerRef.send(RegisterWorkerFailed("Duplicate worker ID"))
  } else {
// Otherwise register the worker normally: wrap its information in a WorkerInfo
    val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
      workerRef, workerWebUiUrl)
// Now try to register the worker
    if (registerWorker(worker)) {
// Persist the worker with the persistence engine
      persistenceEngine.addWorker(worker)
// Tell the worker that registration succeeded
      workerRef.send(RegisteredWorker(self, masterWebUiUrl, masterAddress))
// Finally run resource scheduling
      schedule()
    } else {
      val workerAddress = worker.endpoint.address
      logWarning("Worker registration failed. Attempted to re-register worker at same " +
        "address: " + workerAddress)
      workerRef.send(RegisterWorkerFailed("Attempted to re-register worker at same address: "
        + workerAddress))
    }
  }

Step into the registerWorker method:

private def registerWorker(worker: WorkerInfo): Boolean = {
  // There may be one or more refs to dead workers on this same node (w/ different ID's),
  // remove them.
// Filter out workers on the same host and port that are in the DEAD state
  workers.filter { w =>
    (w.host == worker.host && w.port == worker.port) && (w.state == WorkerState.DEAD)
  }.foreach { w =>
    workers -= w
  }

  val workerAddress = worker.endpoint.address
// If a worker is already registered at this address, remove the old one
  if (addressToWorker.contains(workerAddress)) {
    val oldWorker = addressToWorker(workerAddress)
    if (oldWorker.state == WorkerState.UNKNOWN) {
      // A worker registering from UNKNOWN implies that the worker was restarted during recovery.
      // The old worker must thus be dead, so we will remove it and accept the new worker.
      removeWorker(oldWorker, "Worker replaced by a new worker with same address")
    } else {
      logInfo("Attempted to re-register worker at same address: " + workerAddress)
      return false
    }
  }
// Update the bookkeeping: add the worker to workers
  workers += worker
// Index the worker by its ID
  idToWorker(worker.id) = worker
// Index the worker by its address
  addressToWorker(workerAddress) = worker
  true
}

Once registration is done, the Master sends a message back to the Worker saying whether registration succeeded, via workerRef.send(RegisteredWorker(...)). How does the Worker handle that message? Look at the case RegisterWorkerResponse branch in the Worker's receive method:

case msg: RegisterWorkerResponse =>
// Handle the response with the method below
  handleRegisterResponse(msg)

Step into the handleRegisterResponse method:

private def handleRegisterResponse(msg: RegisterWorkerResponse): Unit = synchronized {
  msg match {
// The master registered this worker successfully
    case RegisteredWorker(masterRef, masterWebUiUrl, masterAddress) =>
      if (preferConfiguredMasterAddress) {
        logInfo("Successfully registered with master " + masterAddress.toSparkURL)
      } else {
        logInfo("Successfully registered with master " + masterRef.address.toSparkURL)
      }
// Mark this worker as successfully registered
      registered = true
// Update the worker's record of its master (changeMaster)
      changeMaster(masterRef, masterWebUiUrl, masterAddress)
      forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
        override def run(): Unit = Utils.tryLogNonFatalError {
// Send a heartbeat to the master
          self.send(SendHeartbeat)
        }
      }, 0, HEARTBEAT_MILLIS, TimeUnit.MILLISECONDS)
      if (CLEANUP_ENABLED) {
        logInfo(
          s"Worker cleanup enabled; old application directories will be deleted in: $workDir")
        forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
          override def run(): Unit = Utils.tryLogNonFatalError {
// The worker sends itself a WorkDirCleanup message to clean up old application directories
            self.send(WorkDirCleanup)
          }
        }, CLEANUP_INTERVAL_MILLIS, CLEANUP_INTERVAL_MILLIS, TimeUnit.MILLISECONDS)
      }

      val execs = executors.values.map { e =>
        new ExecutorDescription(e.appId, e.execId, e.cores, e.state)
      }
// Send the worker's latest state to the master
      masterRef.send(WorkerLatestState(workerId, execs.toList, drivers.keys.toSeq))
    // (the RegisterWorkerFailed and MasterInStandby cases are omitted here)
  }
}

From then on, the Worker sends a heartbeat to the Master at a fixed interval, and the Master runs a dedicated thread that periodically checks the time of the last heartbeat received from each worker; once that time exceeds the timeout, the worker is removed.
This completes the startup of the Master and the Workers and the communication between them. The heartbeat mechanism detects whether a worker is still alive; a worker found to be DEAD is removed from the mappings.
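The heartbeat bookkeeping can be sketched as follows. The names, the timeout value, and the direct method calls are all invented for illustration; in Spark the worker's heartbeat arrives as an RPC message and the timeout check runs on a scheduled thread inside the Master:

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.collection.JavaConverters._

object HeartbeatDemo {
  // workerId -> timestamp of the last heartbeat received from it
  private val lastHeartbeat = new ConcurrentHashMap[String, Long]()
  private val timeoutMs = 200L

  // Worker side: report "I am alive" with the current timestamp.
  def recordHeartbeat(workerId: String): Unit =
    lastHeartbeat.put(workerId, System.currentTimeMillis())

  // Master side: drop every worker whose last heartbeat is older than the timeout.
  def removeTimedOutWorkers(): Unit = {
    val now = System.currentTimeMillis()
    lastHeartbeat.entrySet().asScala.toList.foreach { e =>
      if (now - e.getValue > timeoutMs) lastHeartbeat.remove(e.getKey)
    }
  }

  def aliveWorkers: List[String] = lastHeartbeat.keySet().asScala.toList.sorted

  def main(args: Array[String]): Unit = {
    recordHeartbeat("worker-1")
    Thread.sleep(300)            // worker-1 stops heartbeating and exceeds the timeout
    recordHeartbeat("worker-2")
    removeTimedOutWorkers()
    println(aliveWorkers.mkString(","))  // worker-2
  }
}
```

The design point is that liveness is inferred, never asserted: a worker is alive only as long as it keeps refreshing its timestamp, so a crashed worker disappears from the map automatically after one timeout interval.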
