[Spark Source Code Analysis 1] Startup Flow of the Master and Worker in Spark Standalone Mode

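1. Overview

This is the first article in my Spark source code analysis series. I plan to start from the most basic cluster setup and work through Spark thoroughly. I hope this gives me a deeper understanding of Spark, and that it helps readers as well. To get to the point, here is the architecture diagram of Spark standalone mode: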

[Figure: Spark standalone mode architecture diagram]

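This article first covers how the Master and the Worker start, and how they communicate: the Worker registers with the Master, and the Worker sends heartbeats to the Master.

2. Master and its startup flow

Master extends the ThreadSafeRpcEndpoint class. When the Master starts, its own onStart function is executed.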

2.1. Run the start-master.sh script -> spark-daemon.sh start "org.apache.spark.deploy.master.Master" -> spark-class -> org.apache.spark.launcher.Main -> org.apache.spark.deploy.master.Master

2.2. The main method of Master is called, which executes

val (rpcEnv, _, _) = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, conf)

This method creates the RpcEnv object and the Master object:

def startRpcEnvAndEndpoint(
    host: String,
    port: Int,
    webUiPort: Int,
    conf: SparkConf): (RpcEnv, Int, Option[Int]) = {
  val securityMgr = new SecurityManager(conf)
  val rpcEnv = RpcEnv.create(SYSTEM_NAME, host, port, conf, securityMgr)
  val masterEndpoint = rpcEnv.setupEndpoint(ENDPOINT_NAME,
    new Master(rpcEnv, rpcEnv.address, webUiPort, securityMgr, conf))
  val portsResponse = masterEndpoint.askSync[BoundPortsResponse](BoundPortsRequest)
  (rpcEnv, portsResponse.webUIPort, portsResponse.restPort)
}
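To make the lifecycle easier to follow, here is a minimal sketch of the same pattern without any Spark dependency: create an environment, register an endpoint (which triggers onStart), then ask the endpoint synchronously for its bound ports. ToyRpcEnv, ToyEndpoint, ToyEndpointRef and ToyMaster are hypothetical stand-ins that I made up for illustration, not Spark's real classes, and the toy calls onStart synchronously, which is a simplification.

```scala
import scala.collection.mutable

object RpcLifecycleSketch {
  // Hypothetical stand-in for an RPC endpoint: a start hook plus a reply handler.
  trait ToyEndpoint {
    def onStart(): Unit = ()
    def receiveAndReply: PartialFunction[Any, Any]
  }

  // Hypothetical stand-in for an endpoint reference with a blocking ask.
  final class ToyEndpointRef(endpoint: ToyEndpoint) {
    def askSync[T](message: Any): T =
      endpoint.receiveAndReply(message).asInstanceOf[T]
  }

  // Hypothetical stand-in for the RPC environment.
  final class ToyRpcEnv(val name: String) {
    private val endpoints = mutable.Map.empty[String, ToyEndpoint]

    // Registering an endpoint calls its onStart (synchronously in this toy).
    def setupEndpoint(endpointName: String, endpoint: ToyEndpoint): ToyEndpointRef = {
      endpoints(endpointName) = endpoint
      endpoint.onStart()
      new ToyEndpointRef(endpoint)
    }
  }

  case object BoundPortsRequest
  final case class BoundPortsResponse(webUIPort: Int, restPort: Option[Int])

  final class ToyMaster(webUiPort: Int) extends ToyEndpoint {
    @volatile var started = false

    override def onStart(): Unit = { started = true }

    override def receiveAndReply: PartialFunction[Any, Any] = {
      case BoundPortsRequest => BoundPortsResponse(webUiPort, None)
    }
  }

  // Mirrors the shape of the method above: build env, register master, ask for ports.
  def startRpcEnvAndEndpoint(webUiPort: Int): (ToyRpcEnv, Int, Option[Int], ToyMaster) = {
    val rpcEnv = new ToyRpcEnv("toyMasterEnv")
    val master = new ToyMaster(webUiPort)
    val masterRef = rpcEnv.setupEndpoint("ToyMaster", master)
    val ports = masterRef.askSync[BoundPortsResponse](BoundPortsRequest)
    (rpcEnv, ports.webUIPort, ports.restPort, master)
  }
}
```

Calling RpcLifecycleSketch.startRpcEnvAndEndpoint(8080) returns the environment together with the web UI port reported by the toy master, and by then the master's onStart has already run.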

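2.3. The Master object is created and its onStart method runs.

(1) The onStart function does the following:

Starts the web UI.

Periodically sends the CheckForWorkerTimeOut message to itself so that timed-out workers are removed.

Decides, based on configuration, whether to start the REST interface.

Registers the Master's metrics source with the master MetricsSystem.

Performs recovery according to the recovery mode specified in the configuration.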
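The timeout check in the list above boils down to comparing each worker's last heartbeat time with "now minus timeout". The following is a rough sketch of that comparison only, under my own assumptions: WorkerRecord and partitionByTimeout are hypothetical names, and the real Master keeps more state per worker than this.

```scala
object WorkerTimeoutSketch {
  // Hypothetical, simplified view of what the master remembers about a worker.
  final case class WorkerRecord(id: String, lastHeartbeatMs: Long)

  // Splits workers into (alive, timedOut) relative to nowMs.
  // A worker is timed out when its last heartbeat is older than nowMs - timeoutMs.
  def partitionByTimeout(
      workers: Seq[WorkerRecord],
      nowMs: Long,
      timeoutMs: Long): (Seq[WorkerRecord], Seq[WorkerRecord]) = {
    val threshold = nowMs - timeoutMs
    workers.partition(w => w.lastHeartbeatMs >= threshold)
  }
}
```

With a timeout of 60000 ms and a current time of 70000 ms, a worker last seen at 1000 ms lands in the timed-out group, while one last seen at 50000 ms stays alive.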

(2) Messages handled by the receive function:

ElectedLeader

CompleteRecovery

RevokedLeadership: leadership is revoked

RegisterWorker: registers a worker; after successful registration, the RegisteredWorker message is sent back to the worker

RegisterApplication: registers an application

ExecutorStateChanged

DriverStateChanged

Heartbeat

MasterChangeAcknowledged

WorkerSchedulerStateResponse

WorkerLatestState

CheckForWorkerTimeOut

RequestSubmitDriver: request to submit a driver

RequestKillDriver

RequestDriverStatus

RequestMasterState

BoundPortsRequest
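The message list above is handled by pattern matching on the incoming message. Below is a small sketch of that dispatch style for three of the messages. It is not the real Master: the field lists are shortened, the Heartbeat timestamp field and the NoReply object are my own additions, and everything lives in a hypothetical MasterReceiveSketch object.

```scala
import scala.collection.mutable

object MasterReceiveSketch {
  sealed trait Message
  final case class RegisterWorker(id: String, host: String, port: Int) extends Message
  final case class Heartbeat(workerId: String, atMs: Long) extends Message
  case object RequestMasterState extends Message

  sealed trait Reply
  final case class RegisteredWorker(masterUrl: String) extends Reply
  final case class RegisterWorkerFailed(reason: String) extends Reply
  final case class MasterState(workerIds: Set[String]) extends Reply
  case object NoReply extends Reply

  final class ToyMaster(masterUrl: String) {
    // Worker id -> time of the last heartbeat seen (0 right after registration).
    private val lastHeartbeat = mutable.Map.empty[String, Long]

    def receive(msg: Message): Reply = msg match {
      case RegisterWorker(id, _, _) if lastHeartbeat.contains(id) =>
        RegisterWorkerFailed("Duplicate worker ID")
      case RegisterWorker(id, _, _) =>
        lastHeartbeat(id) = 0L
        RegisteredWorker(masterUrl)
      case Heartbeat(id, atMs) =>
        // Heartbeats from unknown workers are ignored in this toy.
        if (lastHeartbeat.contains(id)) lastHeartbeat(id) = atMs
        NoReply
      case RequestMasterState =>
        MasterState(lastHeartbeat.keySet.toSet)
    }

    def lastSeen(id: String): Option[Long] = lastHeartbeat.get(id)
  }
}
```

Because Message is a sealed trait, the compiler can warn when a case is missing from receive, which is a convenient property when the message list grows as long as the one above.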

3. Worker and its startup flow

Worker likewise extends the ThreadSafeRpcEndpoint class.

3.1. Run the start-slave.sh script -> spark-daemon.sh start "org.apache.spark.deploy.worker.Worker" -> spark-class -> org.apache.spark.launcher.Main -> org.apache.spark.deploy.worker.Worker

3.2. The main method of Worker runs

It starts the RpcEnv object and the Worker object.

val rpcEnv = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, args.cores,
  args.memory, args.masters, args.workDir, conf = conf)

def startRpcEnvAndEndpoint(
    host: String,
    port: Int,
    webUiPort: Int,
    cores: Int,
    memory: Int,
    masterUrls: Array[String],
    workDir: String,
    workerNumber: Option[Int] = None,
    conf: SparkConf = new SparkConf): RpcEnv = {

  // The LocalSparkCluster runs multiple local sparkWorkerX RPC Environments
  val systemName = SYSTEM_NAME + workerNumber.map(_.toString).getOrElse("")
  val securityMgr = new SecurityManager(conf)
  val rpcEnv = RpcEnv.create(systemName, host, port, conf, securityMgr)
  val masterAddresses = masterUrls.map(RpcAddress.fromSparkURL(_))
  rpcEnv.setupEndpoint(ENDPOINT_NAME, new Worker(rpcEnv, webUiPort, cores, memory,
    masterAddresses, ENDPOINT_NAME, workDir, conf, securityMgr))
  rpcEnv
}
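The masterUrls.map(RpcAddress.fromSparkURL(_)) step turns strings such as spark://localhost:7077 into host and port pairs. As a rough illustration of what such parsing involves, here is a sketch built on java.net.URI. Address and this fromSparkURL are my own hypothetical versions, returning an Either instead of throwing, and they are not Spark's implementation.

```scala
import java.net.{URI, URISyntaxException}

object SparkUrlSketch {
  // Hypothetical, simplified counterpart of an RPC address.
  final case class Address(host: String, port: Int) {
    def toSparkURL: String = s"spark://$host:$port"
  }

  // Parses "spark://host:port"; returns Left with a message for anything else.
  def fromSparkURL(url: String): Either[String, Address] = {
    try {
      val uri = new URI(url)
      val valid = uri.getScheme == "spark" && uri.getHost != null && uri.getPort >= 0
      if (valid) Right(Address(uri.getHost, uri.getPort))
      else Left(s"Invalid master URL: $url")
    } catch {
      case _: URISyntaxException => Left(s"Invalid master URL: $url")
    }
  }
}
```

A URL without the spark:// scheme, such as localhost:7077, is rejected here, which matches the habit of always passing the full spark://host:port form to the worker start script.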

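3.3. The Worker object is created and its onStart method runs.

(1) The onStart function does the following:

Creates the work directory under SPARK_HOME.

Starts the external shuffle service.

Starts the worker web UI.

Registers with the master; masterRef is obtained via val masterEndpoint = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME).

Starts the MetricsSystem.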
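The first step in the list above, creating the work directory, can be sketched in a few lines. This is a simplified illustration under my own assumptions: createWorkDir here is a hypothetical helper that takes an optional explicit path, falls back to a work directory under the given Spark home, and throws on failure, whereas the real Worker may react differently when the directory cannot be created.

```scala
import java.io.{File, IOException}

object WorkDirSketch {
  // Uses workDirPath if given, otherwise <sparkHome>/work, and makes sure it exists.
  def createWorkDir(sparkHome: File, workDirPath: Option[String]): File = {
    val workDir = workDirPath.map(p => new File(p)).getOrElse(new File(sparkHome, "work"))
    workDir.mkdirs() // returns false if the directory already exists, so check below
    if (!workDir.exists() || !workDir.isDirectory) {
      throw new IOException(s"Failed to create work directory $workDir")
    }
    workDir
  }
}
```

Passing None for workDirPath gives the default layout described above, with the directory placed directly under the Spark home.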

3.4. After receiving the RegisteredWorker message, which the Master sends once registration completes, the Worker calls the handleRegisterResponse method

private def handleRegisterResponse(msg: RegisterWorkerResponse): Unit = synchronized {
  msg match {
    case RegisteredWorker(masterRef, masterWebUiUrl, masterAddress) =>
      if (preferConfiguredMasterAddress) {
        logInfo("Successfully registered with master " + masterAddress.toSparkURL)
      } else {
        logInfo("Successfully registered with master " + masterRef.address.toSparkURL)
      }
      registered = true
      // Set masterRef
      changeMaster(masterRef, masterWebUiUrl, masterAddress)
      // Send heartbeats periodically
      forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
        override def run(): Unit = Utils.tryLogNonFatalError {
          self.send(SendHeartbeat)
        }
      }, 0, HEARTBEAT_MILLIS, TimeUnit.MILLISECONDS)
      // Clean up the work directory if cleanup is enabled
      if (CLEANUP_ENABLED) {
        logInfo(
          s"Worker cleanup enabled; old application directories will be deleted in: $workDir")
        forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
          override def run(): Unit = Utils.tryLogNonFatalError {
            self.send(WorkDirCleanup)
          }
        }, CLEANUP_INTERVAL_MILLIS, CLEANUP_INTERVAL_MILLIS, TimeUnit.MILLISECONDS)
      }

      val execs = executors.values.map { e =>
        new ExecutorDescription(e.appId, e.execId, e.cores, e.state)
      }
      masterRef.send(WorkerLatestState(workerId, execs.toList, drivers.keys.toSeq))

    case RegisterWorkerFailed(message) =>
      if (!registered) {
        logError("Worker registration failed: " + message)
        System.exit(1)
      }

    case MasterInStandby =>
      // Ignore. Master not yet ready.
  }
}
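The heartbeat part of the method above is a plain fixed-rate schedule on a ScheduledExecutorService. Here is a standalone sketch of that mechanism. HeartbeatSketch and runHeartbeats are hypothetical names of mine, and the "send" is just a counter increment instead of a real message to the master.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object HeartbeatSketch {
  // Schedules a fake heartbeat every intervalMs and waits (up to 5 seconds)
  // until at least `beats` heartbeats were sent. Returns how many were sent.
  def runHeartbeats(intervalMs: Long, beats: Int): Int = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val sent = new AtomicInteger(0)
    val done = new CountDownLatch(beats)

    val sendHeartbeat = new Runnable {
      override def run(): Unit = {
        sent.incrementAndGet() // stands in for self.send(SendHeartbeat)
        done.countDown()
      }
    }

    try {
      // Initial delay 0, then one run every intervalMs, as in the method above.
      scheduler.scheduleAtFixedRate(sendHeartbeat, 0, intervalMs, TimeUnit.MILLISECONDS)
      done.await(5, TimeUnit.SECONDS)
    } finally {
      scheduler.shutdownNow()
    }
    sent.get()
  }
}
```

Note that scheduleAtFixedRate stops rescheduling a task once it throws, which is presumably why the original code wraps the body in Utils.tryLogNonFatalError.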

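4. Summary

It is best to walk through the source code yourself alongside the explanation above; things become much clearer that way. Next, I will analyze how the driver starts.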
