Spark Source Code Study (2): Master and Worker Startup and the Actor Communication Flow

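In "Spark Source Code Study (1)" we saw, from Spark's startup scripts, that starting the Master actually launches org.apache.spark.deploy.master.Master (and, correspondingly, a Worker is launched through org.apache.spark.deploy.worker.Worker). Starting from these two classes, we read through the Spark source to understand the startup flow.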

1. First, look at org.apache.spark.deploy.master.Master:

(1) Start from Master's main method:

val conf = new SparkConf
val args = new MasterArguments(argStrings, conf)
val (actorSystem, _, _, _) = startSystemAndActor(args.host, args.port, args.webUiPort, conf)

(2) Key code of the startSystemAndActor method:
val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port, conf = conf,
  securityManager = securityMgr)
val actor = actorSystem.actorOf(
  Props(classOf[Master], host, boundPort, webUiPort, securityMgr, conf), actorName)

(3) Key code of createActorSystem:
val startService: Int => (ActorSystem, Int) = { actualPort =>
  doCreateActorSystem(name, host, actualPort, conf, securityManager)
}

When this function is executed, it returns an ActorSystem together with the port it was actually bound to.


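(4) In (2), the classOf[Master] argument passed to actorSystem.actorOf is the equivalent of Master.class in Java. Creating the actor invokes Master's constructor and then its lifecycle methods. In the preStart() method: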

context.system.scheduler.schedule(0 millis, WORKER_TIMEOUT millis,
 self, CheckForWorkerTimeOut)
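This statement starts a timer that sends a CheckForWorkerTimeOut (check for Worker timeout) message to the Master itself every WORKER_TIMEOUT milliseconds, starting immediately. In the source, CheckForWorkerTimeOut is defined in MasterMessages as a case object: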
case object CheckForWorkerTimeOut
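
The "timer sends a message to self" pattern can be sketched without Akka. The following is only an illustration under stated assumptions: ToyActor, ManualTimer and CheckTimeout are hypothetical names, not Spark or Akka classes, and the timer is advanced by hand instead of running on a real scheduler thread.

```scala
import scala.collection.mutable

// Hypothetical message type, analogous to a case object in MasterMessages.
sealed trait Message
case object CheckTimeout extends Message

// A toy single-threaded "actor": messages are queued and handled one at a time.
final class ToyActor {
  private val mailbox = mutable.Queue.empty[Message]
  var checksRun = 0

  // Similar in spirit to `self ! msg`: only enqueues, never handles inline.
  def send(msg: Message): Unit = mailbox.enqueue(msg)

  // Pattern-match on the message, as an actor's receive method would.
  private def receive(msg: Message): Unit = msg match {
    case CheckTimeout => checksRun += 1
  }

  // Drains the mailbox and returns how many messages were handled.
  def processAll(): Int = {
    var handled = 0
    while (mailbox.nonEmpty) {
      receive(mailbox.dequeue())
      handled += 1
    }
    handled
  }
}

// Stand-in for a scheduler with an initial delay of 0 and a fixed interval.
final class ManualTimer(intervalMs: Long, target: ToyActor) {
  private var nextFireAt = 0L

  // Sends one message for every firing time that is <= nowMs.
  def advanceTo(nowMs: Long): Unit =
    while (nextFireAt <= nowMs) {
      target.send(CheckTimeout)
      nextFireAt += intervalMs
    }
}
```

Because the timer only enqueues a message, the periodic check is handled in the same place and in the same sequential order as every other message the actor receives, which is presumably why the Master schedules a message to itself rather than calling the check directly from the timer.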

(5) In Master's receiveWithLogging method:
case CheckForWorkerTimeOut => {
  timeOutDeadWorkers()
}

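This is how the Master handles the timeout-check message. Looking at the code of timeOutDeadWorkers, it removes the Workers that have timed out from memory.

(6) Next, let's see how a Worker registers. In org.apache.spark.deploy.worker.Worker we start directly from the preStart() method: registerWithMaster() sends the registration message to the Master. Key code: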
registrationRetryTimer = Some {
  context.system.scheduler.schedule(INITIAL_REGISTRATION_RETRY_INTERVAL,
    INITIAL_REGISTRATION_RETRY_INTERVAL, self, ReregisterWithMaster)
}

The timer sends a ReregisterWithMaster message to the Worker itself, which is handled as follows:
case ReregisterWithMaster =>
  reregisterWithMaster()
reregisterWithMaster() then sends a RegisterWorker message to the Master:
master ! RegisterWorker(
  workerId, host, port, cores, memory, webUi.boundPort, publicAddress)

(7) The Master receives the RegisterWorker message and handles it:
case RegisterWorker(id, workerHost, workerPort, cores, memory, workerUiPort, publicAddress) =>
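......If the node has not been registered before, its information is recorded in memory and a registration-success message is returned; otherwise a registration-failure message is returned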
persistenceEngine.addWorker(worker)
sender ! RegisteredWorker(masterUrl, masterWebUiUrl)
schedule()
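
The registration check described in this step can be sketched as a small in-memory registry. This is a simplified illustration, not the Master's actual code: WorkerRecord, Registered, RegistrationFailed and WorkerRegistry are hypothetical stand-ins, and persistence and scheduling are left out.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the information a Worker sends when registering.
final case class WorkerRecord(id: String, host: String, port: Int, cores: Int, memoryMb: Int)

// Hypothetical reply messages for the two outcomes of a registration attempt.
sealed trait RegistrationReply
final case class Registered(masterUrl: String) extends RegistrationReply
final case class RegistrationFailed(reason: String) extends RegistrationReply

// Keeps the known workers in memory, keyed by worker ID.
final class WorkerRegistry(masterUrl: String) {
  private val idToWorker = mutable.HashMap.empty[String, WorkerRecord]

  // Records a new worker, or rejects an ID that is already registered.
  def register(worker: WorkerRecord): RegistrationReply =
    if (idToWorker.contains(worker.id)) {
      RegistrationFailed(s"Duplicate worker ID: ${worker.id}")
    } else {
      idToWorker(worker.id) = worker
      Registered(masterUrl)
    }

  def size: Int = idToWorker.size
}
```

Registering the same WorkerRecord twice yields Registered the first time and RegistrationFailed the second, and the registry still holds a single entry.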

(8) The Worker receives the RegisteredWorker registration-success message:
case RegisteredWorker(
// Update the Master's address information and periodically send a heartbeat message to itself:
changeMaster(masterUrl, masterWebUiUrl)
context.system.scheduler.schedule(0 millis, HEARTBEAT_MILLIS millis, 
self, SendHeartbeat)
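----When the Worker receives its own SendHeartbeat message, it checks whether it is currently connected to the Master and, if so, sends a Heartbeat message to the Master: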
case SendHeartbeat =>
if (connected) { master ! Heartbeat(workerId) }

(9) The Master receives the Worker's heartbeat message:
case Heartbeat(workerId) => {
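......If the Worker exists in memory, its last-heartbeat time is updated. Otherwise, if the Worker is still present in the workers set, the Master sends it a reconnect message; if it was never registered, the heartbeat is ignored: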
idToWorker.get(workerId) match {
  case Some(workerInfo) =>
    workerInfo.lastHeartbeat = System.currentTimeMillis()
  case None =>
    if (workers.map(_.id).contains(workerId)) {
      logWarning(s"Got heartbeat from unregistered worker $workerId." +
        " Asking it to re-register.")
      sender ! ReconnectWorker(masterUrl)
    } else {
      logWarning(s"Got heartbeat from unregistered worker $workerId." +
        " This worker was never registered, so ignoring the heartbeat.")
    }
}
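
Steps (5) and (9) work together: heartbeats refresh a timestamp, and the periodic timeout check removes Workers whose timestamp is too old. A minimal sketch of that interplay follows, assuming hypothetical names (TrackedWorker, HeartbeatTracker, knownIds) and passing the current time in explicitly so the behavior is deterministic; the real Master tracks more state than this.

```scala
import scala.collection.mutable

// Hypothetical, simplified per-worker state kept by the master.
final class TrackedWorker(val id: String, var lastHeartbeat: Long)

// Outcomes of processing a heartbeat, mirroring the three branches above.
sealed trait HeartbeatResult
case object Updated extends HeartbeatResult
case object AskedToReconnect extends HeartbeatResult
case object Ignored extends HeartbeatResult

final class HeartbeatTracker(timeoutMs: Long) {
  // Workers currently considered alive, keyed by ID.
  private val idToWorker = mutable.HashMap.empty[String, TrackedWorker]
  // IDs of every worker that has ever registered, including timed-out ones.
  private val knownIds = mutable.HashSet.empty[String]

  def register(id: String, now: Long): Unit = {
    idToWorker(id) = new TrackedWorker(id, now)
    knownIds += id
  }

  def heartbeat(id: String, now: Long): HeartbeatResult =
    idToWorker.get(id) match {
      case Some(worker) =>
        worker.lastHeartbeat = now
        Updated
      case None if knownIds.contains(id) => AskedToReconnect
      case None => Ignored
    }

  // Removes workers whose last heartbeat is older than timeoutMs
  // and returns their IDs in sorted order.
  def timeOutDeadWorkers(now: Long): Seq[String] = {
    val dead = idToWorker.values
      .filter(w => now - w.lastHeartbeat > timeoutMs)
      .map(_.id)
      .toSeq
      .sorted
    idToWorker --= dead
    dead
  }
}
```

In this sketch a Worker that has been timed out stays in knownIds, so a late heartbeat from it results in AskedToReconnect, while a heartbeat from an ID that never registered is Ignored.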

(10) When the Worker receives a ReconnectWorker message, it registers with the Master again:
case ReconnectWorker(masterUrl) =>
  logInfo(s"Master with url $masterUrl requested this worker to reconnect.")
  registerWithMaster()

That is the main flow of Master and Worker startup and Actor communication in Spark.
