Spark Core Source Code Study Notes
This series records a review of the Spark source code. The goal is to untangle the mechanism and flow by which Spark distributes and runs programs, tracing the key parts of the source so that we understand not just what happens but why. Side branches of the code are only described in words, without drilling down, to avoid obscuring the main thread.
Following on from the previous post, we enter org.apache.spark.deploy.master.Master
and org.apache.spark.deploy.worker.Worker
respectively to examine their startup flows.
The main entry object: org.apache.spark.deploy.master.Master
private[deploy] object Master extends Logging {
  val SYSTEM_NAME = "sparkMaster"
  val ENDPOINT_NAME = "Master"

  def main(argStrings: Array[String]) {
    Thread.setDefaultUncaughtExceptionHandler(new SparkUncaughtExceptionHandler(
      exitOnUncaughtException = false))
    Utils.initDaemon(log)
    // Load any spark.* system properties
    // Instantiate SparkConf, which loads all spark.* configuration entries
    val conf = new SparkConf
    // MasterArguments wraps argStrings and conf and performs some
    // initialization, mainly loading configuration
    val args = new MasterArguments(argStrings, conf)
    val (rpcEnv, _, _) = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, conf)
    // Under the hood this is the dispatcher's thread pool waiting for all
    // threads to exit; the dispatcher is covered in the RPC post
    rpcEnv.awaitTermination()
  }

  /**
   * Start the Master and return a three tuple of:
   *   (1) The Master RpcEnv
   *   (2) The web UI bound port
   *   (3) The REST server bound port, if any
   */
  def startRpcEnvAndEndpoint(
      host: String,
      port: Int,
      webUiPort: Int,
      conf: SparkConf): (RpcEnv, Int, Option[Int]) = {
    // SecurityManager handles security; we can ignore it for now
    val securityMgr = new SecurityManager(conf)
    // Create the RpcEnv
    val rpcEnv = RpcEnv.create(SYSTEM_NAME, host, port, conf, securityMgr)
    val masterEndpoint = rpcEnv.setupEndpoint(ENDPOINT_NAME,
      new Master(rpcEnv, rpcEnv.address, webUiPort, securityMgr, conf))
    val portsResponse = masterEndpoint.askSync[BoundPortsResponse](BoundPortsRequest)
    (rpcEnv, portsResponse.webUIPort, portsResponse.restPort)
  }
}
startRpcEnvAndEndpoint
is where RpcEnv first appears. It is the key communication abstraction in Spark, so it is covered in a companion post; jump to the RpcEnv appendix and then continue from here.
RpcEnv.create, rpcEnv.setupEndpoint
and masterEndpoint.askSync
were all covered in the RPC post, so here we focus on the part of the source that runs when new Master is instantiated.
private[deploy] class Master(
    override val rpcEnv: RpcEnv,
    address: RpcAddress,
    webUiPort: Int,
    val securityMgr: SecurityManager,
    val conf: SparkConf)
  extends ThreadSafeRpcEndpoint with Logging with LeaderElectable {

  // Load the Hadoop-related configuration
  private val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)
  // HashSet recording registered workers; how a worker registers with the
  // master is covered later in the source
  val workers = new HashSet[WorkerInfo]
  // Records ApplicationInfo entries
  val apps = new HashSet[ApplicationInfo]
  // Records DriverInfo entries
  private val drivers = new HashSet[DriverInfo]
  // Current state of this Master: STANDBY, ALIVE, RECOVERING, COMPLETING_RECOVERY
  private var state = RecoveryState.STANDBY
  // The overridden onStart method
  override def onStart(): Unit = {...}
}
Instantiating Master initializes many containers that hold the relationships between workers, drivers, apps and so on. The interesting part is onStart, so let's take a look inside:
override def onStart(): Unit = {
  // Some code that initializes the web UI and binds its port is omitted here
  // Used to remove workers whose heartbeats have timed out repeatedly;
  // scheduleAtFixedRate re-runs the task at the given interval
  checkForWorkerTimeOutTask = forwardMessageThread.scheduleAtFixedRate(new Runnable {
    override def run(): Unit = Utils.tryLogNonFatalError {
      self.send(CheckForWorkerTimeOut)
    }
  }, 0, workerTimeoutMs, TimeUnit.MILLISECONDS)
  if (restServerEnabled) {
    val port = conf.get(MASTER_REST_SERVER_PORT)
    // The restServer later accepts application submissions
    restServer = Some(new StandaloneRestServer(address.host, port, conf, self, masterUrl))
  }
  restServerBoundPort = restServer.map(_.start())
  // Instantiate a serializer
  val serializer = new JavaSerializer(conf)
  // recoveryMode: the recovery mode, also known as the persistence mechanism
  val (persistenceEngine_, leaderElectionAgent_) = recoveryMode match {
    case "ZOOKEEPER" =>
      ...
    case "FILESYSTEM" =>
      ...
    case "CUSTOM" =>
      ...
    case _ =>
      ...
  }
  persistenceEngine = persistenceEngine_
  leaderElectionAgent = leaderElectionAgent_
}
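The recoveryMode match above simply selects a (persistence engine, leader election agent) pair. A standalone sketch of that dispatch, with strings standing in for the real engine objects (the names follow Spark's classes for the default branches, but treat them purely as illustrative labels here):

```scala
// Sketch of the recoveryMode dispatch in Master.onStart: each mode yields a
// (persistence engine, leader election agent) pair. Strings are stand-ins.
def enginesFor(recoveryMode: String): (String, String) = recoveryMode match {
  case "ZOOKEEPER"  => ("ZooKeeperPersistenceEngine", "ZooKeeperLeaderElectionAgent")
  case "FILESYSTEM" => ("FileSystemPersistenceEngine", "MonarchyLeaderAgent")
  case "CUSTOM"     => ("factory-provided engine", "factory-provided agent")
  case _            => ("BlackHolePersistenceEngine", "MonarchyLeaderAgent")
}

// Default: no recovery, a single master is trivially elected leader
println(enginesFor("NONE"))
```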
A brief summary of the Master startup:
- Initialize a NettyRpcEnv, which internally instantiates important pieces such as the Dispatcher
- Instantiate the Master object, initializing the many containers that record the relationships between components registered later
- Register the Master object with the NettyRpcEnv
- Send the Master endpoint a BoundPortsRequest message (via askSync) to retrieve the bound ports
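The ask-style handshake used in Master.main (askSync[BoundPortsResponse]) can be sketched with plain Scala concurrency primitives. BoundPortsRequest and BoundPortsResponse are the names from the source; ToyEndpoint and its synchronous reply are a hypothetical mock, not Spark's actual RpcEnv:

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// Stand-ins for the messages seen in Master.main / startRpcEnvAndEndpoint
case object BoundPortsRequest
case class BoundPortsResponse(rpcPort: Int, webUIPort: Int, restPort: Option[Int])

class ToyEndpoint {
  // Mimics masterEndpoint.askSync[T]: send a request, block for the reply
  def askSync[T](msg: Any): T = {
    val p = Promise[Any]()
    msg match {
      case BoundPortsRequest => p.success(BoundPortsResponse(7077, 8080, Some(6066)))
      case other             => p.failure(new IllegalArgumentException(s"unknown: $other"))
    }
    Await.result(p.future, 1.second).asInstanceOf[T]
  }
}

val resp = new ToyEndpoint().askSync[BoundPortsResponse](BoundPortsRequest)
println(s"web UI port: ${resp.webUIPort}, REST port: ${resp.restPort}")
```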
Now let's turn to the Worker startup and see how it differs from the Master:
private[deploy] object Worker extends Logging {
  val SYSTEM_NAME = "sparkWorker"
  val ENDPOINT_NAME = "Worker"
  private val SSL_NODE_LOCAL_CONFIG_PATTERN = """\-Dspark\.ssl\.useNodeLocalConf\=(.+)""".r

  def main(argStrings: Array[String]) {
    // Instantiate SparkConf. Note that we are now on a different host, so
    // naturally no SparkConf exists in this JVM yet
    val conf = new SparkConf
    val args = new WorkerArguments(argStrings, conf)
    // Takes a few more parameters than the Master version
    val rpcEnv = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, args.cores,
      args.memory, args.masters, args.workDir, conf = conf)
    // These handle the case of multiple workers on one host; not a concern for now
    val externalShuffleServiceEnabled = conf.get(config.SHUFFLE_SERVICE_ENABLED)
    val sparkWorkerInstances = scala.sys.env.getOrElse("SPARK_WORKER_INSTANCES", "1").toInt
    require(...)
    rpcEnv.awaitTermination()
  }
}
As you can see, this is similar to Master's
main method, except that startRpcEnvAndEndpoint
takes a few more parameters than the Master version. Continuing down:
def startRpcEnvAndEndpoint(...): RpcEnv = {
  // The LocalSparkCluster runs multiple local sparkWorkerX RPC Environments
  // Multiple workers need distinct system names
  val systemName = SYSTEM_NAME + workerNumber.map(_.toString).getOrElse("")
  val securityMgr = new SecurityManager(conf)
  // As with the Master, this ultimately instantiates a NettyRpcEnv
  val rpcEnv = RpcEnv.create(systemName, host, port, conf, securityMgr)
  // Convert the given URLs into the masters' corresponding RpcAddresses
  val masterAddresses = masterUrls.map(RpcAddress.fromSparkURL(_))
  // As with the Master, instantiate a Worker and register it with the rpcEnv,
  // which will then invoke its onStart method
  rpcEnv.setupEndpoint(ENDPOINT_NAME, new Worker(rpcEnv, webUiPort, cores, memory,
    masterAddresses, ENDPOINT_NAME, workDir, conf, securityMgr))
  rpcEnv
}
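The systemName expression above means a lone worker's RPC environment is named sparkWorker, while numbered instances become sparkWorker1, sparkWorker2, and so on. A standalone check of that logic (with SYSTEM_NAME inlined):

```scala
// Standalone reproduction of the systemName logic in startRpcEnvAndEndpoint
val SYSTEM_NAME = "sparkWorker"

def systemName(workerNumber: Option[Int]): String =
  SYSTEM_NAME + workerNumber.map(_.toString).getOrElse("")

println(systemName(None))    // sparkWorker  (single worker on the host)
println(systemName(Some(2))) // sparkWorker2 (second instance on the host)
```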
The setupEndpoint step mirrors the Master's: a Worker
is instantiated and registered with the rpcEnv, which will then invoke its onStart method. Into the Worker's
construction:
private[deploy] class Worker(...) extends ThreadSafeRpcEndpoint with Logging {
  // Holds a reference to the master
  private var master: Option[RpcEndpointRef] = None
  // Registration flag
  private var registered = false
  // Below, a number of containers are initialized
  var workDir: File = null
  val finishedExecutors = new LinkedHashMap[String, ExecutorRunner]
  val drivers = new HashMap[String, DriverRunner]
  val executors = new HashMap[String, ExecutorRunner]
  val finishedDrivers = new LinkedHashMap[String, DriverRunner]
  val appDirectories = new HashMap[String, Seq[String]]
  val finishedApps = new HashSet[String]
  // A thread pool sized to the number of master URLs passed in, used to
  // register with all masters simultaneously
  private val registerMasterThreadPool = ThreadUtils.newDaemonCachedThreadPool(
    "worker-register-master-threadpool",
    masterRpcAddresses.length // Make sure we can register with all masters at the same time
  )

  override def onStart() {
    ...
  }
}
After registering with the rpcEnv, the Worker's own onStart method is invoked:
override def onStart() {
  registerWithMaster()
}

private def registerWithMaster() {
  registered = false
  // Call tryRegisterAllMasters to perform the registration
  registerMasterFutures = tryRegisterAllMasters()
  registrationRetryTimer = Some(forwordMessageScheduler.scheduleAtFixedRate(
    new Runnable {
      override def run(): Unit = Utils.tryLogNonFatalError {
        Option(self).foreach(_.send(ReregisterWithMaster))
      }
    }, INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS, INITIAL_REGISTRATION_RETRY_INTERVAL_SECONDS,
    TimeUnit.SECONDS))
}
Into the tryRegisterAllMasters method:
private def tryRegisterAllMasters(): Array[JFuture[_]] = {
  // Iterate over masterRpcAddresses
  masterRpcAddresses.map { masterAddress =>
    // Each attempt runs on the thread pool created above
    registerMasterThreadPool.submit(new Runnable {
      override def run(): Unit = {
        try {
          // Look up the master's endpoint reference by its address and name
          val masterEndpoint = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME)
          // Send the registration message to the Master; this actually calls
          // the masterEndpoint's send method
          sendRegisterMessageToMaster(masterEndpoint)
        } catch {...}
      }
    })
  }
}

private def sendRegisterMessageToMaster(masterEndpoint: RpcEndpointRef): Unit = {
  masterEndpoint.send(RegisterWorker(workerId, host, port, self, cores, memory,
    workerWebUiUrl, masterEndpoint.address))
}
Let's trace the setupEndpointRef call chain:
def setupEndpointRef(address: RpcAddress, endpointName: String): RpcEndpointRef = {
  // Delegates to the method below
  setupEndpointRefByURI(RpcEndpointAddress(address, endpointName).toString)
}

def setupEndpointRefByURI(uri: String): RpcEndpointRef = {
  // Which in turn blocks on asyncSetupEndpointRefByURI
  defaultLookupTimeout.awaitResult(asyncSetupEndpointRefByURI(uri))
}

def asyncSetupEndpointRefByURI(uri: String): Future[RpcEndpointRef] = {
  val addr = RpcEndpointAddress(uri)
  // Instantiate a NettyRpcEndpointRef for the given addr
  val endpointRef = new NettyRpcEndpointRef(conf, addr, this)
  val verifier = new NettyRpcEndpointRef(
    conf, RpcEndpointAddress(addr.rpcAddress, RpcEndpointVerifier.NAME), this)
  /** A message used to ask the remote [[RpcEndpointVerifier]] if an `RpcEndpoint` exists. */
  /* case class CheckExistence(name: String) */
  verifier.ask[Boolean](RpcEndpointVerifier.CheckExistence(endpointRef.name)).flatMap { find =>
    if (find) {
      Future.successful(endpointRef)
    } else {
      Future.failed(new RpcEndpointNotFoundException(uri))
    }
  }(ThreadUtils.sameThread)
}
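The verifier handshake is just a Future pipeline: ask for a Boolean, then flatMap success into the endpoint ref and absence into a failed Future. A self-contained sketch of that shape with a stubbed ask (CheckExistence and RpcEndpointNotFoundException are the real names; the stub and the String "ref" are mocks for illustration only):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

case class CheckExistence(name: String)
class RpcEndpointNotFoundException(uri: String) extends Exception(uri)

// Stubbed verifier: pretends only an endpoint named "Master" exists
def ask(msg: CheckExistence): Future[Boolean] =
  Future.successful(msg.name == "Master")

// Same Future pipeline shape as asyncSetupEndpointRefByURI:
// resolve to the endpoint ref on success, fail the Future otherwise
def lookup(name: String): Future[String] =
  ask(CheckExistence(name)).flatMap { found =>
    if (found) Future.successful(s"ref-to-$name")
    else Future.failed(new RpcEndpointNotFoundException(name))
  }

println(Await.result(lookup("Master"), 1.second))
println(Try(Await.result(lookup("Ghost"), 1.second)).isFailure)
```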
Back to the masterEndpoint.send(RegisterWorker(...))
call. As an aside, RpcEndpointRef's send actually wraps the message into a RequestMessage object and then calls the RpcEnv's send method:
private[netty] def send(message: RequestMessage): Unit = {
  val remoteAddr = message.receiver.address
  if (remoteAddr == address) { // Is this a local call?
    // Message to a local RPC endpoint.
    try {
      // postOneWayMessage wraps the RequestMessage once more and, via
      // postMessage, puts it into the EndpointData's inbox to await the
      // dispatcher's thread pool
      dispatcher.postOneWayMessage(message)
    } catch {...}
  } else {
    // Message to a remote RPC endpoint.
    postToOutbox(message.receiver, OneWayOutboxMessage(message.serialize(this)))
  }
}
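The local-versus-remote branch can be reduced to a toy model. RpcAddress here is a stand-in case class, and the returned strings represent inbox versus outbox delivery; none of this is Spark's actual implementation:

```scala
// Toy model of NettyRpcEnv.send's local-vs-remote branch
case class RpcAddress(host: String, port: Int)

val localAddress = RpcAddress("worker-1", 7078)

def send(receiver: RpcAddress, message: Any): String =
  if (receiver == localAddress) s"inbox: $message"  // dispatcher.postOneWayMessage(...)
  else s"outbox to ${receiver.host}: $message"      // postToOutbox(...)

println(send(localAddress, "SendHeartbeat"))
println(send(RpcAddress("master-1", 7077), "RegisterWorker"))
```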
So after this detour, it is ultimately the remote Master's
side that invokes send locally, and the message is handled by its own receive:
override def receive: PartialFunction[Any, Unit] = {
  // Note the important parameters carried over, e.g. workerRef, cores, memory
  case RegisterWorker(id, workerHost, workerPort, workerRef, cores, memory,
      workerWebUiUrl, masterAddress) =>
    // Some state-checking code is omitted here
    val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
      workerRef, workerWebUiUrl)
    // registerWorker simply adds the worker to the relationship-tracking
    // containers created when the Master was initialized, after some state checks
    if (registerWorker(worker)) {
      // Persist the worker's information
      persistenceEngine.addWorker(worker)
      // Send the worker a RegisteredWorker message carrying the master's own reference
      workerRef.send(RegisteredWorker(self, masterWebUiUrl, masterAddress))
      // Important method, called whenever resources change, to allocate the
      // currently available resources; detailed later
      schedule()
    } else {...}
}
Now let's see what the Worker's
receive does when RegisteredWorker arrives:
override def receive: PartialFunction[Any, Unit] = synchronized {
  case msg: RegisterWorkerResponse =>
    handleRegisterResponse(msg)
}

private def handleRegisterResponse(msg: RegisterWorkerResponse): Unit = synchronized {
  msg match {
    case RegisteredWorker(masterRef, masterWebUiUrl, masterAddress) =>
      // Update the flag
      registered = true
      // Update the master info; changeMaster also calls
      // cancelLastRegistrationRetry to cancel any registration retries in flight
      changeMaster(masterRef, masterWebUiUrl, masterAddress)
      // Send heartbeats on a schedule. This looks like the worker calling its
      // own send, but after the receive logic it is really masterRef.send(message)
      forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
        override def run(): Unit = Utils.tryLogNonFatalError {
          self.send(SendHeartbeat)
        }
      }, 0, HEARTBEAT_MILLIS, TimeUnit.MILLISECONDS)
      // Build descriptions of the current executors
      val execs = executors.values.map { e =>
        new ExecutorDescription(e.appId, e.execId, e.cores, e.state)
      }
      // Send masterRef a WorkerLatestState message
      masterRef.send(WorkerLatestState(workerId, execs.toList, drivers.keys.toSeq))
  }
}
Finally, let's look at what the Master
does when receive matches WorkerLatestState: it checks whether the executors and drivers the Master
knows about match the Worker's
report, and tells the Worker
to kill anything that doesn't match.
case WorkerLatestState(workerId, executors, driverIds) =>
  idToWorker.get(workerId) match {
    case Some(worker) =>
      for (exec <- executors) {
        val executorMatches = worker.executors.exists {
          case (_, e) => e.application.id == exec.appId && e.id == exec.execId
        }
        if (!executorMatches) {
          // master doesn't recognize this executor. So just tell worker to kill it.
          worker.endpoint.send(KillExecutor(masterUrl, exec.appId, exec.execId))
        }
      }
      for (driverId <- driverIds) {
        val driverMatches = worker.drivers.exists { case (id, _) => id == driverId }
        if (!driverMatches) {
          // master doesn't recognize this driver. So just tell worker to kill it.
          worker.endpoint.send(KillDriver(driverId))
        }
      }
    case None =>
      logWarning("Worker state from unknown worker: " + workerId)
  }
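The reconciliation above boils down to a set-membership check per reported executor. A toy model with plain collections (ExecutorDescription here mirrors only the first two fields of the real class; the data is made up):

```scala
// Toy model of the WorkerLatestState reconciliation: any (appId, execId)
// pair the master does not recognize would get a KillExecutor message
case class ExecutorDescription(appId: String, execId: Int)

val knownOnMaster = Set(("app-1", 0), ("app-1", 1))
val reportedByWorker = Seq(ExecutorDescription("app-1", 0), ExecutorDescription("app-2", 7))

// filterNot plays the role of the !executorMatches branch
val toKill = reportedByWorker.filterNot(e => knownOnMaster((e.appId, e.execId)))
println(toKill)
```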
At this point the Master and Worker have started, the Worker has registered with the Master, and the two sides have exchanged and recorded each other's information, so each can communicate with the other through its recorded RpcEndpointRef. The remaining schedule()
method will be covered in detail when it comes up again.