}
}
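(2) When ApplicationClientEndpoint receives the RegisteredApplication message sent by the Master, it sets the registration flag registered to true. The master-registration thread then sees this state change, and registration of the Application is complete.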
override def receive: PartialFunction[Any, Unit] = {
  // Once the master-registration thread sees this state change, Application registration completes
  case RegisteredApplication(appId_, masterRef) =>
    // FIXME How to handle the following cases?
    // 1. A master receives multiple registrations and sends back multiple
    // RegisteredApplications due to an unstable network.
    // 2. Receive multiple RegisteredApplication from different masters because the master is
    // changing.
    appId.set(appId_)
    registered.set(true)
    master = Some(masterRef)
    listener.connected(appId.get)
  …
}
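To make the handshake concrete, here is a minimal Scala sketch of a flag shared between the endpoint and a registration thread. RegistrationState and its methods are hypothetical names, not Spark's API. The waiting side is a simplified polling loop; Spark's client drives this with scheduled retry tasks that check registered, not with a sleep loop.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}

// Hypothetical stand-in for the state the client endpoint shares with its
// registration thread; the names are illustrative, not Spark's API.
final class RegistrationState {
  val appId = new AtomicReference[String]()
  val registered = new AtomicBoolean(false)

  // Mirrors the RegisteredApplication handler: record the ID first, then flip the flag,
  // so a reader that sees registered == true also sees the ID.
  def onRegisteredApplication(id: String): Unit = {
    appId.set(id)
    registered.set(true)
  }

  // Simplified registration thread: poll the flag up to maxAttempts times.
  def awaitRegistered(maxAttempts: Int, pollMs: Long): Boolean = {
    var attempt = 0
    while (!registered.get && attempt < maxAttempts) {
      Thread.sleep(pollMs)
      attempt += 1
    }
    registered.get
  }
}
```

The atomics matter because the endpoint and the registration thread are different threads. A plain var could leave the waiting thread looping on a stale value.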
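(3) When the Master allocates resources to run an application (startExecutorsOnWorkers), it calls allocateWorkerResourceToExecutors. That method sends a LaunchExecutor message asking the Worker to start an Executor, and the Worker handles the message as follows.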
override def receive: PartialFunction[Any, Unit] = synchronized {
  …
  case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
    …
        // Create the executor's working directory
        val executorDir = new File(workDir, appId + "/" + execId)
        if (!executorDir.mkdirs()) {
          throw new IOException("Failed to create directory " + executorDir)
        }
        // Create the executor's local directories on the Worker. They are passed to the
        // executor through the SPARK_EXECUTOR_DIRS environment variable and deleted by
        // the Worker once the application finishes
        val appLocalDirs = appDirectories.getOrElse(appId, {
          val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
          val dirs = localRootDirs.flatMap { dir =>
            try {
              val appDir = Utils.createDirectory(dir, namePrefix = "executor")
              Utils.chmod700(appDir)
              Some(appDir.getAbsolutePath())
            } catch {
              case e: IOException =>
                logWarning(s"${e.getMessage}. Ignoring this directory.")
                None
            }
          }.toSeq
          if (dirs.isEmpty) {
            throw new IOException("No subfolder can be created in " +
              s"${localRootDirs.mkString(",")}.")
          }
          dirs
        })
        appDirectories(appId) = appLocalDirs
        // ExecutorRunner launches CoarseGrainedExecutorBackend using the command in the
        // application description. That command is built in SparkDeploySchedulerBackend.start
        // (the class was renamed StandaloneSchedulerBackend in Spark 2.0)
        val manager = new ExecutorRunner(
          appId,
          execId,
          appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
          cores_,
          memory_,
          self,
          workerId,
          host,
          webUi.boundPort,
          publicAddress,
          sparkHome,
          executorDir,
          workerUri,
          conf,
          appLocalDirs, ExecutorState.RUNNING)
        executors(appId + "/" + execId) = manager
        manager.start()
        coresUsed += cores_
        memoryUsed += memory_
        // Tell the Master that the executor state has changed to ExecutorState.RUNNING
        sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
      } catch {
        case e: Exception =>
          logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
          if (executors.contains(appId + "/" + execId)) {
            executors(appId + "/" + execId).kill()
            executors -= appId + "/" + execId
          }
          sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
            Some(e.toString), None))
      }
    }
  …
}
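The bookkeeping in this handler can be sketched on its own:

- register the runner under an "appId/execId" key;
- charge cores and memory only after the runner has started;
- on any exception, roll back and report FAILED.

WorkerBook, ExecState and the start callback below are hypothetical simplifications. No directories or processes are created.

```scala
import scala.collection.mutable

sealed trait ExecState
case object Running extends ExecState
case object Failed extends ExecState

// Hypothetical, simplified model of the Worker's LaunchExecutor bookkeeping.
final class WorkerBook {
  val executors = mutable.HashMap.empty[String, String] // "appId/execId" -> description
  var coresUsed = 0
  var memoryUsed = 0
  val reports = mutable.ArrayBuffer.empty[(String, ExecState)] // stands in for sendToMaster

  def launch(appId: String, execId: Int, cores: Int, memoryMb: Int)(start: () => Unit): Unit = {
    val fullId = s"$appId/$execId"
    try {
      executors(fullId) = s"$cores cores, $memoryMb MB"
      start() // stands in for manager.start()
      // Resources are only charged once the runner has started
      coresUsed += cores
      memoryUsed += memoryMb
      reports += (fullId -> Running)
    } catch {
      case _: Exception =>
        // Roll back the registration and report FAILED, like the catch block above
        executors -= fullId
        reports += (fullId -> Failed)
    }
  }
}
```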
ExecutorRunner.start() runs fetchAndRunExecutor on a dedicated thread, and this method is what actually launches the executor process.
private def fetchAndRunExecutor(): Unit = {
  try {
    // Launch the process
    val subsOpts = appDesc.command.javaOpts.map {
      Utils.substituteAppNExecIds(_, appId, execId.toString)
    }
    val subsCommand = appDesc.command.copy(javaOpts = subsOpts)
    // Build a ProcessBuilder from the application's command and the environment configuration
    val builder = CommandUtils.buildProcessBuilder(subsCommand, new SecurityManager(conf),
      memory, sparkHome.getAbsolutePath, substituteVariables)
    val command = builder.command()
    val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
    logInfo(s"Launch command: $formattedCommand")
    // Set the working directory and pass the executor's local directories to the process
    builder.directory(executorDir)
    builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
    // In case we are running this from within the Spark Shell, avoid creating a "scala"
    // parent process for the executor command
    builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")
    // Add webUI log urls
    val baseUrl =
      if (conf.getBoolean("spark.ui.reverseProxy", false)) {
        s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
      } else {
        s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
      }
    builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
    builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")
    // Start the process, which runs CoarseGrainedExecutorBackend
    process = builder.start()
    val header = "Spark Executor Command: %s\n%s\n\n".format(
      formattedCommand, "=" * 40)
    // Redirect its stdout and stderr to files
    val stdout = new File(executorDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
    val stderr = new File(executorDir, "stderr")
    Files.write(header, stderr, StandardCharsets.UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)
    // Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown)
    // or with nonzero exit code.
    // When CoarseGrainedExecutorBackend exits, report its exit status to the Worker
    val exitCode = process.waitFor()
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
  } catch {
    case interrupted: InterruptedException =>
      logInfo("Runner thread for executor " + fullId + " interrupted")
      state = ExecutorState.KILLED
      killProcess(None)
    case e: Exception =>
      logError("Error running executor", e)
      state = ExecutorState.FAILED
      killProcess(Some(e.toString))
  }
}
}
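The launch pattern in fetchAndRunExecutor maps closely onto plain java.lang.ProcessBuilder:

- set the working directory and extra environment variables;
- write a header to stderr;
- capture the child's stdout and stderr to files;
- block until the process exits.

runChild is a hypothetical helper, and its usage example assumes a POSIX sh. Spark itself builds the command with CommandUtils.buildProcessBuilder and copies the streams with FileAppender, not with the ProcessBuilder redirects used here.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

// Simplified sketch of the ExecutorRunner launch pattern: working directory,
// extra environment variables, a header written to stderr, the child's
// stdout/stderr captured to files, then a blocking wait for the exit code.
def runChild(command: Seq[String], workDir: File, env: Map[String, String]): Int = {
  val builder = new ProcessBuilder(command: _*)
  builder.directory(workDir)
  env.foreach { case (k, v) => builder.environment.put(k, v) }
  val formatted = command.mkString("\"", "\" \"", "\"")
  val rule = "=" * 40
  val stderrFile = new File(workDir, "stderr")
  // Header first; the child's stderr is then appended after it
  Files.write(stderrFile.toPath,
    s"Spark Executor Command: $formatted\n$rule\n\n".getBytes(StandardCharsets.UTF_8))
  builder.redirectOutput(new File(workDir, "stdout"))
  builder.redirectError(ProcessBuilder.Redirect.appendTo(stderrFile))
  builder.start().waitFor()
}
```

Because waitFor blocks, Spark runs this on ExecutorRunner's own thread. That keeps the Worker's message loop free while the executor is alive.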
(4) The Master receives the ExecutorStateChanged message sent by the Worker.
override def receive: PartialFunction[Any, Unit] = {
  …
  case ExecutorStateChanged(appId, execId, state, message, exitStatus) =>
    val execOption = idToApp.get(appId).flatMap(app => app.executors.get(execId))
    execOption match {
      case Some(exec) =>
        val appInfo = idToApp(appId)
        val oldState = exec.state
        exec.state = state
        if (state == ExecutorState.RUNNING) {
          assert(oldState == ExecutorState.LAUNCHING,
            s"executor $execId state transfer from $oldState to RUNNING is illegal")
          appInfo.resetRetryCount()
        }
        // Forward the state change to the Driver as an ExecutorUpdated message
        exec.application.driver.send(ExecutorUpdated(execId, state, message, exitStatus, false))
        if (ExecutorState.isFinished(state)) {
          // Remove this executor from the worker and app
          logInfo(s"Removing executor ${exec.fullId} because it is $state")
          // If an application has already finished, preserve its
          // state to display its information properly on the UI
          if (!appInfo.isFinished) {
            appInfo.removeExecutor(exec)
          }
          exec.worker.removeExecutor(exec)
          val normalExit = exitStatus == Some(0)
          // Only retry certain number of times so we don't go into an infinite loop.
          // Important note: this code path is not exercised by tests, so be very careful when
          // changing this if condition.
          if (!normalExit
              && appInfo.incrementRetryCount() >= MAX_EXECUTOR_RETRIES
              && MAX_EXECUTOR_RETRIES >= 0) { // < 0 disables this application-killing path
            val execs = appInfo.executors.values
            if (!execs.exists(_.state == ExecutorState.RUNNING)) {
              logError(s"Application ${appInfo.desc.name} with ID ${appInfo.id} failed " +
                s"${appInfo.retryCount} times; removing it")
              removeApplication(appInfo, ApplicationState.FAILED)
            }
          }
        }
        schedule()
      case None =>
        logWarning(s"Got status update for unknown executor $appId/$execId")
    }
  …
}
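The retry rule above reduces to a small state machine:

- only abnormal exits increment the application's retry count;
- a negative limit disables the check;
- the application is removed only when the limit is reached and no executor is still RUNNING.

AppRetryState is a hypothetical class sketching that rule. Its maxExecutorRetries parameter plays the role of MAX_EXECUTOR_RETRIES, which Spark reads from spark.deploy.maxExecutorRetries.

```scala
// Hypothetical model of the Master's per-application retry bookkeeping.
final class AppRetryState(maxExecutorRetries: Int) {
  private var retryCount = 0
  def retries: Int = retryCount

  private def incrementRetryCount(): Int = { retryCount += 1; retryCount }

  // The Master calls appInfo.resetRetryCount() when an executor reaches RUNNING.
  def onExecutorRunning(): Unit = retryCount = 0

  // Returns true when the application should be removed as FAILED.
  def onExecutorFinished(exitStatus: Option[Int], anyExecutorRunning: Boolean): Boolean = {
    val normalExit = exitStatus.contains(0)
    // Same short-circuit order as the source: a normal exit never touches the counter
    val limitReached =
      !normalExit && incrementRetryCount() >= maxExecutorRetries && maxExecutorRetries >= 0
    limitReached && !anyExecutorRunning
  }
}
```

The short-circuit order is significant: because the counter sits after !normalExit, executors that exit cleanly never count toward the limit.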
(5) The executor is registered at the DriverEndpoint. (When the CoarseGrainedExecutorBackend launched in step (3) starts, its onStart method sends a RegisterExecutor message to the DriverEndpoint.)
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
    if (executorDataMap.contains(executorId)) {
      executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
      context.reply(true)
    }
    …
      // Record the executor ID (keyed by its address) and add its cores to the totals
      addressToExecutorId(executorAddress) = executorId
      totalCoreCount.addAndGet(cores)
      totalRegisteredExecutors.addAndGet(1)
      val data = new ExecutorData(executorRef, executorAddress, hostname,
        cores, cores, logUrls)
      // This must be synchronized because variables mutated
      // in this block are read when requesting executors.
      // Map the executor ID to its ExecutorData
      CoarseGrainedSchedulerBackend.this.synchronized {
        executorDataMap.put(executorId, data)
        if (currentExecutorIdCounter < executorId.toInt) {
          currentExecutorIdCounter = executorId.toInt
        }
        if (numPendingExecutors > 0) {
          numPendingExecutors -= 1
          logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
        }
      }
      // Tell the executor that registration succeeded
      executorRef.send(RegisteredExecutor)
      // Note: some tests expect the reply to come after we put the executor in the map
      context.reply(true)
      listenerBus.post(
        SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
      // Offer resources to the scheduler and send LaunchTask messages for the tasks it assigns
      makeOffers()
    }
  …
}
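Stripped of RPC, the registration step is a duplicate check followed by counter updates. ExecutorRegistry, ExecInfo and the Either-based reply are hypothetical simplifications: Left stands in for RegisterExecutorFailed and Right for RegisteredExecutor. The whole method is synchronized for simplicity, whereas the source above only synchronizes the map and counter updates.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

final case class ExecInfo(host: String, totalCores: Int, freeCores: Int)

// Hypothetical simplified registry modeled on the RegisterExecutor handler.
final class ExecutorRegistry {
  private val executorDataMap = mutable.HashMap.empty[String, ExecInfo]
  val totalCoreCount = new AtomicInteger(0)
  val totalRegisteredExecutors = new AtomicInteger(0)
  private var numPendingExecutors = 0
  private var currentExecutorIdCounter = 0

  def requestExecutors(n: Int): Unit = synchronized { numPendingExecutors += n }
  def pending: Int = synchronized { numPendingExecutors }
  def maxExecutorId: Int = synchronized { currentExecutorIdCounter }

  // Left(error) plays the role of RegisterExecutorFailed; Right(()) of RegisteredExecutor.
  def register(executorId: String, host: String, cores: Int): Either[String, Unit] = synchronized {
    if (executorDataMap.contains(executorId)) Left("Duplicate executor ID: " + executorId)
    else {
      totalCoreCount.addAndGet(cores)
      totalRegisteredExecutors.incrementAndGet()
      executorDataMap(executorId) = ExecInfo(host, cores, cores) // all cores start free
      currentExecutorIdCounter = math.max(currentExecutorIdCounter, executorId.toInt)
      if (numPendingExecutors > 0) numPendingExecutors -= 1
      Right(())
    }
  }
}
```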
(6) When CoarseGrainedExecutorBackend receives the RegisteredExecutor message confirming successful registration, it instantiates an Executor object inside the CoarseGrainedExecutorBackend process.
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor =>
    logInfo("Successfully registered with driver")
    try {
      // Create the Executor from the environment parameters; it is the component
      // in Spark that actually runs tasks
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
    } catch {
      case NonFatal(e) =>
        exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
    }
  …
}
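The NonFatal guard means only recoverable errors turn into an orderly exitExecutor(1, ...) call. A hypothetical BackendSketch shows this behavior; it injects the exit routine as a callback so the example does not terminate the JVM.

```scala
import scala.util.control.NonFatal

// Hypothetical sketch of the RegisteredExecutor handler's error handling.
final class BackendSketch[E](create: () => E, exit: (Int, String) => Unit) {
  @volatile var executor: Option[E] = None

  def onRegisteredExecutor(): Unit =
    try {
      executor = Some(create())
    } catch {
      // NonFatal does not match VirtualMachineError, ThreadDeath, InterruptedException,
      // LinkageError or ControlThrowable; those propagate instead of triggering exit.
      case NonFatal(e) => exit(1, "Unable to create executor due to " + e.getMessage)
    }
}
```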
Once instantiated, the Executor periodically sends heartbeats to the Driver while it waits for the Driver to dispatch tasks.
private val heartbeater = ThreadUtils.newDaemonSingleThreadScheduledExecutor("driver-heartbeater")
private def startDriverHeartbeater(): Unit = {
  // Heartbeat interval
  val intervalMs = HEARTBEAT_INTERVAL_MS
  // Wait a random interval so the heartbeats don't end up in sync
  val initialDelay = intervalMs + (math.random * intervalMs).asInstanceOf[Int]
  val heartbeatTask = new Runnable() {
    override def run(): Unit = Utils.logUncaughtExceptions(reportHeartBeat())
  }
  // Periodically send heartbeats to the Driver
  heartbeater.scheduleAtFixedRate(heartbeatTask, initialDelay, intervalMs, TimeUnit.MILLISECONDS)
}
}
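The heartbeat schedule can be reproduced with a standard ScheduledExecutorService. In the sketch below, daemonScheduler is a hypothetical stand-in for ThreadUtils.newDaemonSingleThreadScheduledExecutor, report stands in for reportHeartBeat, and the jitter formula mirrors initialDelay above.

```scala
import java.util.concurrent.{CountDownLatch, Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import scala.util.Random
import scala.util.control.NonFatal

// First beat lands in [intervalMs, 2 * intervalMs), so executors started
// together do not all report at the same instant.
def initialDelayMs(intervalMs: Long, rnd: Random): Long =
  intervalMs + (rnd.nextDouble() * intervalMs).toLong

// Hypothetical stand-in for ThreadUtils.newDaemonSingleThreadScheduledExecutor.
def daemonScheduler(name: String): ScheduledExecutorService =
  Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, name)
      t.setDaemon(true) // does not keep the JVM alive on shutdown
      t
    }
  })

def startHeartbeater(scheduler: ScheduledExecutorService, intervalMs: Long, report: () => Unit): Unit = {
  // An exception escaping a fixed-rate task suppresses all later runs,
  // so non-fatal errors are swallowed here and the next beat tries again.
  val task: Runnable = () => try report() catch { case NonFatal(_) => () }
  scheduler.scheduleAtFixedRate(task, initialDelayMs(intervalMs, new Random), intervalMs, TimeUnit.MILLISECONDS)
}
```

Swallowing errors inside the task is a choice made for this sketch. Per the ScheduledExecutorService contract, one uncaught exception would otherwise silently stop all further heartbeats.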
(7) After its Executor has started, CoarseGrainedExecutorBackend receives LaunchTask messages from the DriverEndpoint. Each task is then executed through the Executor's launchTask method.
override def receive: PartialFunction[Any, Unit] = {
  …
  case LaunchTask(data) =>
    if (executor == null) {
      // The Executor was never created: log the error and shut down the backend
      exitExecutor(1, "Received LaunchTask command but executor was null")
    } else {
      val taskDesc = TaskDescription.decode(data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      // Hand the task to the Executor, which runs it in a TaskRunner
      executor.launchTask(this, taskDesc)
    }
  …
}
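Executor.launchTask creates a TaskRunner (a Runnable, not a separate process), records it in runningTasks, and submits it to the Executor's threadPool, which runs it on a pool thread.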
def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
  val tr = new TaskRunner(context, taskDescription)
  runningTasks.put(taskDescription.taskId, tr)
  threadPool.execute(tr)
}
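The TaskRunner/threadPool split can be sketched with java.util.concurrent alone. MiniExecutor, Backend and the string task states are hypothetical simplifications. What the sketch shows is the order of operations:

1. register the runner in runningTasks;
2. hand it to the pool;
3. let the runner report its final state through the backend callback and drop out of runningTasks.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch, ExecutorService, Executors, TimeUnit}

// Hypothetical callback playing the role of ExecutorBackend.statusUpdate.
trait Backend { def statusUpdate(taskId: Long, state: String): Unit }

final class MiniExecutor(threadPool: ExecutorService) {
  val runningTasks = new ConcurrentHashMap[Long, TaskRunner]()

  final class TaskRunner(backend: Backend, taskId: Long, body: () => Unit) extends Runnable {
    override def run(): Unit = {
      // Run the task body and turn its outcome into a final state
      val state = try { body(); "FINISHED" } catch { case _: Exception => "FAILED" }
      runningTasks.remove(taskId)
      backend.statusUpdate(taskId, state)
    }
  }

  def launchTask(backend: Backend, taskId: Long, body: () => Unit): Unit = {
    val tr = new TaskRunner(backend, taskId, body)
    runningTasks.put(taskId, tr) // registered before it can possibly finish
    threadPool.execute(tr)
  }
}
```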
(8) When a TaskRunner finishes its task, a StatusUpdate message is sent to the DriverEndpoint.
override def receive: PartialFunction[Any, Unit] = {
  case StatusUpdate(executorId, taskId, state, data) =>
    // Call TaskSchedulerImpl.statusUpdate, which continues processing according to the task's result
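The StatusUpdate handler above is cut off after its first comment. The sketch below assumes, as in Spark's CoarseGrainedSchedulerBackend, that a finished task's CPUs are returned to its executor, which is then immediately re-offered. DriverSketch, the simplified TaskState and cpusPerTask are hypothetical names.

```scala
import scala.collection.mutable

// Simplified task states; Spark's TaskState enumeration also has LAUNCHING.
sealed trait TaskState
object TaskState {
  case object Running extends TaskState
  case object Finished extends TaskState
  case object Failed extends TaskState
  case object Killed extends TaskState
  case object Lost extends TaskState
  private val terminal: Set[TaskState] = Set(Finished, Failed, Killed, Lost)
  def isFinished(s: TaskState): Boolean = terminal.contains(s)
}

// Hypothetical, simplified driver-side handler for StatusUpdate.
final class DriverSketch(cpusPerTask: Int, onStatus: (Long, TaskState) => Unit) {
  val freeCores = mutable.HashMap.empty[String, Int] // executorId -> free cores
  val offersMade = mutable.ArrayBuffer.empty[String] // executors re-offered, in order

  def handleStatusUpdate(executorId: String, taskId: Long, state: TaskState): Unit = {
    onStatus(taskId, state) // stands in for TaskSchedulerImpl.statusUpdate
    if (TaskState.isFinished(state)) {
      freeCores.get(executorId) match {
        case Some(free) =>
          freeCores(executorId) = free + cpusPerTask // return the task's CPUs
          offersMade += executorId                   // stands in for makeOffers(executorId)
        case None =>
          () // unknown executor: nothing to free in this sketch
      }
    }
  }
}
```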