Spark Source Code Reading 02: Spark Core Principles, Message Communication (1)

2301_79055814

Last modified 2024-04-30 11:18:36


First published 2024-04-30 11:18:34

Article link: https://blog.csdn.net/2301_79055814/article/details/138339751


}

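(2) When ApplicationClientEndpoint receives the RegisteredApplication message from the Master, it sets the registration flag registered to true. The thread that registers with the Master observes this state change, and registration of the Application is complete.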

override def receive: PartialFunction[Any, Unit] = {
  // Once the registration thread observes this state change, Application registration is complete
  case RegisteredApplication(appId_, masterRef) =>
    // FIXME How to handle the following cases?
    // 1. A master receives multiple registrations and sends back multiple
    //    RegisteredApplications due to an unstable network.
    // 2. Receive multiple RegisteredApplication from different masters because the master is
    //    changing.
    appId.set(appId_)
    registered.set(true)
    master = Some(masterRef)
    listener.connected(appId.get)
  …
}
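The handshake above comes down to a shared flag that the receive handler sets and the retrying registration task reads. Here is a minimal sketch of that idea. `RegistrationSketch`, `ClientState`, and `shouldRetry` are hypothetical names for illustration, the message carries a plain URL string instead of an RPC reference, and none of this is Spark's actual class.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}

// Hypothetical stand-in for the client endpoint; not Spark's actual class.
object RegistrationSketch {
  // Message modelled on the name used in the text, with simplified fields.
  sealed trait Message
  final case class RegisteredApplication(appId: String, masterUrl: String) extends Message

  final class ClientState {
    val appId = new AtomicReference[String]()
    val registered = new AtomicBoolean(false)
    @volatile var master: Option[String] = None

    // Mirrors the receive handler: record the id, flip the flag, remember the master.
    def receive(msg: Message): Unit = msg match {
      case RegisteredApplication(id, url) =>
        appId.set(id)
        registered.set(true)
        master = Some(url)
    }

    // What a retrying registration task would check before sending another request.
    def shouldRetry(attempt: Int, maxAttempts: Int): Boolean =
      !registered.get && attempt < maxAttempts
  }
}
```

Because the flag is an AtomicBoolean, the thread that handles the message and the thread that retries registration can run concurrently without extra locking.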

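(3) When the Master allocates resources to run the application in its startExecutorsOnWorkers method, it calls allocateWorkerResourceToExecutors, which has the Worker launch the Executor. The Worker handles the resulting LaunchExecutor message as follows.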

override def receive: PartialFunction[Any, Unit] = synchronized {
  …
  case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
    …
      // Create the Executor's working directory
      val executorDir = new File(workDir, appId + "/" + execId)
      if (!executorDir.mkdirs()) {
        throw new IOException("Failed to create directory " + executorDir)
      }

      // Create local dirs for the Executor. They are passed to the Executor through the
      // SPARK_EXECUTOR_DIRS environment variable and deleted by the Worker once the
      // application finishes.
      val appLocalDirs = appDirectories.getOrElse(appId, {
        val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
        val dirs = localRootDirs.flatMap { dir =>
          try {
            val appDir = Utils.createDirectory(dir, namePrefix = "executor")
            Utils.chmod700(appDir)
            Some(appDir.getAbsolutePath())
          } catch {
            case e: IOException =>
              logWarning(s"${e.getMessage}. Ignoring this directory.")
              None
          }
        }.toSeq
        if (dirs.isEmpty) {
          throw new IOException("No subfolder can be created in " +
            s"${localRootDirs.mkString(",")}.")
        }
        dirs
      })
      appDirectories(appId) = appLocalDirs

      // ExecutorRunner launches CoarseGrainedExecutorBackend using the command carried in the
      // application description; that command is built in the start method of
      // SparkDeploySchedulerBackend (renamed StandaloneSchedulerBackend in Spark 2.x)
      val manager = new ExecutorRunner(
        appId,
        execId,
        appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
        cores_,
        memory_,
        self,
        workerId,
        host,
        webUi.boundPort,
        publicAddress,
        sparkHome,
        executorDir,
        workerUri,
        conf,
        appLocalDirs, ExecutorState.RUNNING)
      executors(appId + "/" + execId) = manager
      manager.start()
      coresUsed += cores_
      memoryUsed += memory_
      // Notify the Master that the Executor's state has changed to ExecutorState.RUNNING
      sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
    } catch {
      case e: Exception =>
        logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
        if (executors.contains(appId + "/" + execId)) {
          executors(appId + "/" + execId).kill()
          executors -= appId + "/" + execId
        }
        sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
          Some(e.toString), None))
    }
  …
}
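The bookkeeping in this handler follows a simple pattern: register the executor under the key appId/execId, start it, add its cores and memory to the Worker's usage counters, and drop the entry again if anything throws. The sketch below isolates that pattern. `WorkerSketch`, `WorkerState`, and `launch` are made-up names, states are plain strings, and the `start` callback merely stands in for `manager.start()`.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the Worker's executor bookkeeping.
object WorkerSketch {
  final class WorkerState {
    val executors = mutable.HashMap.empty[String, String] // fullId -> state
    var coresUsed = 0
    var memoryUsed = 0

    // Returns the state that would be reported to the Master.
    def launch(appId: String, execId: Int, cores: Int, memory: Int)(start: () => Unit): String = {
      val fullId = appId + "/" + execId
      try {
        executors(fullId) = "RUNNING"
        start() // stands in for manager.start(); may throw
        coresUsed += cores
        memoryUsed += memory
        "RUNNING"
      } catch {
        case _: Exception =>
          executors -= fullId // roll back the bookkeeping entry
          "FAILED"
      }
    }
  }
}
```

The counters are updated only after the start call succeeds, so a failed launch leaves coresUsed and memoryUsed untouched, as in the excerpt.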

The Executor process itself is created by the fetchAndRunExecutor method of ExecutorRunner, which manager.start() runs on a separate thread.

private def fetchAndRunExecutor() {
  try {
    // Launch the process
    val subsOpts = appDesc.command.javaOpts.map {
      Utils.substituteAppNExecIds(_, appId, execId.toString)
    }
    val subsCommand = appDesc.command.copy(javaOpts = subsOpts)
    // Build a process builder from the application's command and the environment configuration
    val builder = CommandUtils.buildProcessBuilder(subsCommand, new SecurityManager(conf),
      memory, sparkHome.getAbsolutePath, substituteVariables)
    val command = builder.command()
    val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
    logInfo(s"Launch command: $formattedCommand")

    // Set the working directory on the builder
    builder.directory(executorDir)
    builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
    // In case we are running this from within the Spark Shell, avoid creating a "scala"
    // parent process for the executor command
    builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")

    // Add webUI log urls, so the monitoring page can link to the executor's logs
    val baseUrl =
      if (conf.getBoolean("spark.ui.reverseProxy", false)) {
        s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
      } else {
        s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
      }
    builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
    builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")

    // Start the builder, which creates the CoarseGrainedExecutorBackend process
    process = builder.start()
    val header = "Spark Executor Command: %s\n%s\n\n".format(
      formattedCommand, "=" * 40)

    // Redirect its stdout and stderr to files; they record the output of the
    // CoarseGrainedExecutorBackend process
    val stdout = new File(executorDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)

    val stderr = new File(executorDir, "stderr")
    Files.write(header, stderr, StandardCharsets.UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)

    // Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown)
    // or with nonzero exit code.
    // When CoarseGrainedExecutorBackend finishes, its exit status is sent to the Worker.
    val exitCode = process.waitFor()
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
  } catch {
    case interrupted: InterruptedException =>
      logInfo("Runner thread for executor " + fullId + " interrupted")
      state = ExecutorState.KILLED
      killProcess(None)
    case e: Exception =>
      logError("Error running executor", e)
      state = ExecutorState.FAILED
      killProcess(Some(e.toString))
  }
}
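Stripped of the Spark-specific helpers, fetchAndRunExecutor is a standard use of java.lang.ProcessBuilder: set a working directory, add environment variables, start the child, and block on its exit code. A rough sketch of that core follows. `LaunchSketch` and `runAndWait` are hypothetical names, and the JDK's own file redirection is swapped in for Spark's FileAppender, so this is a simplification rather than the real code path.

```scala
import java.io.File

// Hypothetical helper showing the ProcessBuilder steps in isolation.
object LaunchSketch {
  // Runs `command` in `workDir`, exposes `localDirs` through SPARK_EXECUTOR_DIRS,
  // sends stdout/stderr to files in `workDir`, and returns the child's exit code.
  def runAndWait(command: Seq[String], workDir: File, localDirs: Seq[String]): Int = {
    val builder = new ProcessBuilder(command: _*)
    builder.directory(workDir)
    builder.environment().put("SPARK_EXECUTOR_DIRS", localDirs.mkString(File.pathSeparator))
    // Simplification: Spark streams the output through FileAppender instead.
    builder.redirectOutput(new File(workDir, "stdout"))
    builder.redirectError(new File(workDir, "stderr"))
    val process = builder.start()
    process.waitFor() // blocks until the child process exits
  }
}
```

On a Unix-like system, something like `LaunchSketch.runAndWait(Seq("sh", "-c", "exit 3"), dir, Seq("/tmp/a"))` would return the shell's exit code, the same value that the real method wraps in an ExecutorStateChanged message for the Worker.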

(4) The Master receives the ExecutorStateChanged message sent by the Worker.

override def receive: PartialFunction[Any, Unit] = {
  …
  case ExecutorStateChanged(appId, execId, state, message, exitStatus) =>
    val execOption = idToApp.get(appId).flatMap(app => app.executors.get(execId))
    execOption match {
      case Some(exec) =>
        val appInfo = idToApp(appId)
        val oldState = exec.state
        exec.state = state

        if (state == ExecutorState.RUNNING) {
          assert(oldState == ExecutorState.LAUNCHING,
            s"executor $execId state transfer from $oldState to RUNNING is illegal")
          appInfo.resetRetryCount()
        }

        // Send an ExecutorUpdated message to the Driver
        exec.application.driver.send(ExecutorUpdated(execId, state, message, exitStatus, false))

        if (ExecutorState.isFinished(state)) {
          // Remove this executor from the worker and app
          logInfo(s"Removing executor ${exec.fullId} because it is $state")
          // If an application has already finished, preserve its
          // state to display its information properly on the UI
          if (!appInfo.isFinished) {
            appInfo.removeExecutor(exec)
          }
          exec.worker.removeExecutor(exec)

          val normalExit = exitStatus == Some(0)
          // Only retry certain number of times so we don't go into an infinite loop.
          // Important note: this code path is not exercised by tests, so be very careful when
          // changing this if condition.
          if (!normalExit
              && appInfo.incrementRetryCount() >= MAX_EXECUTOR_RETRIES
              && MAX_EXECUTOR_RETRIES >= 0) { // < 0 disables this application-killing path
            val execs = appInfo.executors.values
            if (!execs.exists(_.state == ExecutorState.RUNNING)) {
              logError(s"Application ${appInfo.desc.name} with ID ${appInfo.id} failed " +
                s"${appInfo.retryCount} times; removing it")
              removeApplication(appInfo, ApplicationState.FAILED)
            }
          }
        }
        schedule()
      case None =>
        logWarning(s"Got status update for unknown executor $appId/$execId")
    }
  …
}
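The retry condition is the subtle part of this handler: the counter is incremented only for abnormal exits, because && short-circuits, and the application is removed only when the limit is reached and no executor is still running. A small sketch of that decision follows, with the names `RetrySketch`, `AppInfo`, and `shouldRemoveApp` invented for illustration and the retry limit passed in rather than read from configuration.

```scala
// Hypothetical, simplified model of the Master's retry decision.
object RetrySketch {
  final class AppInfo(val maxRetries: Int) {
    private var retryCount = 0
    def incrementRetryCount(): Int = { retryCount += 1; retryCount }
    def resetRetryCount(): Unit = retryCount = 0
  }

  // Returns true when the application should be removed as FAILED.
  def shouldRemoveApp(app: AppInfo, exitStatus: Option[Int], anyExecutorRunning: Boolean): Boolean = {
    val normalExit = exitStatus.contains(0)
    // The counter is bumped only for abnormal exits because && short-circuits;
    // a negative limit disables the removal path entirely.
    val limitReached =
      !normalExit && app.incrementRetryCount() >= app.maxRetries && app.maxRetries >= 0
    limitReached && !anyExecutorRunning
  }
}
```

With a limit of 2, a normal exit leaves the counter alone, the first abnormal exit brings it to 1, and the second one triggers removal provided no other executor is running.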

(5) The Executor is registered at the DriverEndpoint. (In the onStart method of the CoarseGrainedExecutorBackend launched in step (3), a RegisterExecutor message is sent to the DriverEndpoint.)

override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
    if (executorDataMap.contains(executorId)) {
      executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
      context.reply(true)
    }
    …
      // Record the executor's ID and the number of cores it uses
      addressToExecutorId(executorAddress) = executorId
      totalCoreCount.addAndGet(cores)
      totalRegisteredExecutors.addAndGet(1)
      val data = new ExecutorData(executorRef, executorAddress, hostname,
        cores, cores, logUrls)
      // This must be synchronized because variables mutated
      // in this block are read when requesting executors.
      // Store the mapping from executor ID to its details.
      CoarseGrainedSchedulerBackend.this.synchronized {
        executorDataMap.put(executorId, data)
        if (currentExecutorIdCounter < executorId.toInt) {
          currentExecutorIdCounter = executorId.toInt
        }
        if (numPendingExecutors > 0) {
          numPendingExecutors -= 1
          logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
        }
      }
      // Tell the executor that registration is complete
      executorRef.send(RegisteredExecutor)
      // Note: some tests expect the reply to come after we put the executor in the map
      context.reply(true)
      listenerBus.post(
        SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
      // Allocate resources for tasks and send LaunchTask messages to run them
      makeOffers()
    }
  …
}
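The registration logic can be reduced to a map lookup that rejects duplicates plus a few counters updated under a lock. The following sketch shows that shape only. `RegistrySketch` and `DriverState` are invented names, the replies are returned as values instead of being sent over RPC, and the address bookkeeping and listener events are left out.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of executor registration on the driver side.
object RegistrySketch {
  sealed trait Reply
  case object RegisteredExecutor extends Reply
  final case class RegisterExecutorFailed(reason: String) extends Reply

  final case class ExecutorData(hostname: String, totalCores: Int)

  final class DriverState {
    private val executorDataMap = mutable.HashMap.empty[String, ExecutorData]
    private var totalCoreCount = 0

    // Registers an executor once; a second registration with the same id is rejected.
    def register(executorId: String, hostname: String, cores: Int): Reply = synchronized {
      if (executorDataMap.contains(executorId)) {
        RegisterExecutorFailed("Duplicate executor ID: " + executorId)
      } else {
        executorDataMap.put(executorId, ExecutorData(hostname, cores))
        totalCoreCount += cores
        RegisteredExecutor
      }
    }

    def cores: Int = synchronized(totalCoreCount)
  }
}
```

A rejected duplicate does not change the core count, so a retried registration from the same executor cannot inflate the resources the driver believes it has.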

(6) When CoarseGrainedExecutorBackend receives the RegisteredExecutor message confirming that registration succeeded, it instantiates an Executor object inside the CoarseGrainedExecutorBackend process.

override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor =>
    logInfo("Successfully registered with driver")
    try {
      // Start the Executor using the environment parameters; in Spark, the Executor
      // is what actually runs tasks
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
    } catch {
      case NonFatal(e) =>
        exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
    }
  …
}

Once instantiated, the Executor periodically sends heartbeats to the Driver and waits for the Driver to dispatch tasks.

private val heartbeater = ThreadUtils.newDaemonSingleThreadScheduledExecutor("driver-heartbeater")

private def startDriverHeartbeater(): Unit = {
  // Heartbeat interval
  val intervalMs = HEARTBEAT_INTERVAL_MS

  // Wait a random interval so the heartbeats don't end up in sync
  val initialDelay = intervalMs + (math.random * intervalMs).asInstanceOf[Int]

  val heartbeatTask = new Runnable() {
    override def run(): Unit = Utils.logUncaughtExceptions(reportHeartBeat())
  }
  // Send heartbeats to the Driver at a fixed rate
  heartbeater.scheduleAtFixedRate(heartbeatTask, initialDelay, intervalMs, TimeUnit.MILLISECONDS)
}
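The heartbeat setup combines two ideas: a jittered initial delay, so that executors started at the same moment do not all report at once, and a fixed-rate schedule on a single-threaded scheduler. Below is a small sketch of both using only the JDK. `HeartbeatSketch`, `initialDelay`, and `runBeats` are hypothetical names, and the latch exists only so the example can wait for a few beats before shutting down.

```scala
import java.util.concurrent.{CountDownLatch, Executors, ThreadLocalRandom, TimeUnit}

// Hypothetical sketch of jittered, fixed-rate heartbeats.
object HeartbeatSketch {
  // Initial delay in [intervalMs, 2 * intervalMs): one full interval plus random jitter.
  // `random` is expected to lie in [0.0, 1.0).
  def initialDelay(intervalMs: Long, random: Double): Long =
    intervalMs + (random * intervalMs).toLong

  // Schedules `beat` at a fixed rate and waits until it ran `count` times or the
  // timeout expired. Returns true if all beats happened in time.
  def runBeats(intervalMs: Long, count: Int, timeoutMs: Long)(beat: () => Unit): Boolean = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val latch = new CountDownLatch(count)
    val task = new Runnable {
      override def run(): Unit = { beat(); latch.countDown() }
    }
    try {
      val delay = initialDelay(intervalMs, ThreadLocalRandom.current().nextDouble())
      scheduler.scheduleAtFixedRate(task, delay, intervalMs, TimeUnit.MILLISECONDS)
      latch.await(timeoutMs, TimeUnit.MILLISECONDS)
    } finally {
      scheduler.shutdownNow()
    }
  }
}
```

One detail worth knowing: scheduleAtFixedRate suppresses all later runs if a run throws, which is presumably why the excerpt wraps reportHeartBeat() in Utils.logUncaughtExceptions.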

(7) After the Executor inside CoarseGrainedExecutorBackend has started, the backend receives the LaunchTask message sent from the DriverEndpoint. The task itself is run by the Executor's launchTask method.

override def receive: PartialFunction[Any, Unit] = {
  …
  case LaunchTask(data) =>
    if (executor == null) {
      // The Executor did not start successfully: log the error and shut the Executor down
      exitExecutor(1, "Received LaunchTask command but executor was null")
    } else {
      val taskDesc = TaskDescription.decode(data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      // Start a TaskRunner to execute the task
      executor.launchTask(this, taskDesc)
    }
  …
}

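The executor's launchTask method creates a TaskRunner (a Runnable, not a separate process) and submits it to threadPool, where the Executor schedules it together with its other tasks.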

def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
  val tr = new TaskRunner(context, taskDescription)
  runningTasks.put(taskDescription.taskId, tr)
  threadPool.execute(tr)
}
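The pattern in launchTask (wrap the work in a Runnable, track it in a concurrent map by task id, hand it to a thread pool) can be tried out without Spark. The sketch below is one way to do so. `TaskPoolSketch` and `MiniExecutor` are invented names, and removing the entry in a finally block is an assumption about how a finished task gets cleaned up, not something shown in the excerpt.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}

// Hypothetical miniature of the Executor's task launching.
object TaskPoolSketch {
  final class MiniExecutor {
    private val threadPool = Executors.newCachedThreadPool()
    val runningTasks = new ConcurrentHashMap[Long, Runnable]()

    // Wraps the work in a Runnable, tracks it by id, and hands it to the pool.
    def launchTask(taskId: Long)(body: () => Unit): Unit = {
      val tr: Runnable = new Runnable {
        override def run(): Unit =
          try body() finally runningTasks.remove(taskId) // assumed cleanup step
      }
      runningTasks.put(taskId, tr)
      threadPool.execute(tr)
    }

    // Stops accepting work and waits briefly for running tasks to finish.
    def shutdown(): Unit = {
      threadPool.shutdown()
      threadPool.awaitTermination(5, TimeUnit.SECONDS)
      ()
    }
  }
}
```

Registering the task in the map before calling execute keeps the cleanup from racing with the registration, since the Runnable cannot finish before it has been submitted.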

(8) When the TaskRunner finishes executing a task, it sends a StatusUpdate message to the DriverEndpoint to report the state change.

override def receive: PartialFunction[Any, Unit] = {
  case StatusUpdate(executorId, taskId, state, data) =>
    // Call TaskSchedulerImpl's statusUpdate method, which continues processing
    // according to the task's result
