Spark Source Code Reading 02 - Spark Core Principles: Message Communication


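(3) Once the Worker has received the registration reply, it periodically sends Heartbeat messages to the Master so that the Master always knows the Worker's current state.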

```scala
private def handleRegisterResponse(msg: RegisterWorkerResponse): Unit = synchronized {
  msg match {
    case RegisteredWorker(masterRef, masterWebUiUrl, masterAddress) =>
      if (preferConfiguredMasterAddress) {
        logInfo("Successfully registered with master " + masterAddress.toSparkURL)
      } else {
        logInfo("Successfully registered with master " + masterRef.address.toSparkURL)
      }
      // If cleanup of old application directories is enabled, schedule it periodically
      if (CLEANUP_ENABLED) {
        logInfo(
          s"Worker cleanup enabled; old application directories will be deleted in: $workDir")
        forwordMessageScheduler.scheduleAtFixedRate(new Runnable {
          override def run(): Unit = Utils.tryLogNonFatalError {
            self.send(WorkDirCleanup)
          }
        }, CLEANUP_INTERVAL_MILLIS, CLEANUP_INTERVAL_MILLIS, TimeUnit.MILLISECONDS)
      }
      // Report the latest state of this Worker's executors (and drivers) to the Master
      val execs = executors.values.map { e =>
        new ExecutorDescription(e.appId, e.execId, e.cores, e.state)
      }
      masterRef.send(WorkerLatestState(workerId, execs.toList, drivers.keys.toSeq))

    case RegisterWorkerFailed(message) =>
      if (!registered) {
        logError("Worker registration failed: " + message)
        System.exit(1)
      }

    case MasterInStandby =>
      // Ignore. Master not yet ready.
  }
}
```

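To trigger each heartbeat, the Worker periodically sends itself a SendHeartbeat message. While it is connected, it forwards a Heartbeat to the Master. The message is declared in DeployMessages:

```scala
private[deploy] object DeployMessages {
  // ... (other message definitions omitted)
  case object SendHeartbeat
}
```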
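The heartbeat pattern uses a scheduled task that periodically drops a "tick" message into the Worker's own inbox. The Worker then forwards a heartbeat only while it is connected. The following is a minimal sketch of that pattern using a plain `java.util.concurrent` scheduler. `HeartbeatSketch`, `startHeartbeats`, `onTick` and the string-based "Heartbeat" message are hypothetical stand-ins, not Spark's actual classes.

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, ScheduledExecutorService, TimeUnit}

object HeartbeatSketch {
  // Hypothetical message, playing the role of DeployMessages.SendHeartbeat
  case object SendHeartbeat

  /**
   * Puts SendHeartbeat into `inbox` immediately and then every `intervalMs` milliseconds.
   * The caller owns the returned scheduler and must shut it down.
   */
  def startHeartbeats(inbox: LinkedBlockingQueue[Any], intervalMs: Long): ScheduledExecutorService = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = inbox.put(SendHeartbeat)
    }, 0L, intervalMs, TimeUnit.MILLISECONDS)
    scheduler
  }

  /** Handling one tick: only forward a heartbeat to the Master while connected. */
  def onTick(connected: Boolean, workerId: String, sendToMaster: String => Unit): Unit =
    if (connected) sendToMaster(s"Heartbeat($workerId)")
}
```

Because the tick arrives as an ordinary message, heartbeat handling is serialized with the Worker's other messages. It never runs concurrently with them.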

Spark Runtime Message Communication


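The figure below shows the message interactions that take place while a Spark application runs:

[Figure: interaction flow of Spark runtime message communication]

The detailed steps and the corresponding source code are as follows: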

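(1) To run an application, a SparkContext must be started. During SparkContext startup, a SchedulerBackend object is instantiated first. In the figure above this is a SparkDeploySchedulerBackend, because the cluster runs in standalone mode. When this object starts, it sets up two endpoints:

- the DriverEndpoint, which it inherits from its parent class;
- the ClientEndpoint of AppClient.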
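An endpoint in this design is essentially a named object whose `receive` is a `PartialFunction[Any, Unit]`. The RPC environment looks endpoints up by name and dispatches messages to them. The following toy, in-memory sketch shows only that idea. `ToyRpcEnv`, `ToyEndpoint`, `ToyMaster`, `ToyClient`, `RegisterApp` and `RegisteredApp` are all hypothetical. Unlike Spark's RpcEnv, delivery here is synchronous and local, with no inbox threads or network.

```scala
import scala.collection.mutable

// Hypothetical endpoint: just a message handler
trait ToyEndpoint { def receive: PartialFunction[Any, Unit] }

final class ToyRpcEnv {
  private val endpoints = mutable.HashMap[String, ToyEndpoint]()

  def setupEndpoint(name: String, endpoint: ToyEndpoint): Unit = endpoints(name) = endpoint

  /** Delivers `msg` to the named endpoint; returns false if no endpoint handles it. */
  def send(name: String, msg: Any): Boolean =
    endpoints.get(name) match {
      case Some(ep) if ep.receive.isDefinedAt(msg) => ep.receive(msg); true
      case _ => false
    }
}

final case class RegisterApp(appName: String, replyTo: String)
final case class RegisteredApp(appId: String)

// The "Master" assigns an ID and replies to the sender by endpoint name
final class ToyMaster(env: ToyRpcEnv) extends ToyEndpoint {
  private var nextId = 0
  val receive: PartialFunction[Any, Unit] = {
    case RegisterApp(_, replyTo) =>
      val id = s"app-$nextId"
      nextId += 1
      env.send(replyTo, RegisteredApp(id))
      ()
  }
}

// The "client" records the ID it was given
final class ToyClient extends ToyEndpoint {
  @volatile var appId: Option[String] = None
  val receive: PartialFunction[Any, Unit] = { case RegisteredApp(id) => appId = Some(id) }
}
```

In this sketch, a message that no `case` matches is simply reported as unhandled (`send` returns `false`).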

ClientEndpoint's tryRegisterAllMasters method submits registration tasks to the registration thread pool registerMasterThreadPool. Each task sends a RegisterApplication message to one Master.

```scala
private def tryRegisterAllMasters(): Array[JFuture[_]] = {
  // With HA there can be several Masters, so send the message to every one of them
  for (masterAddress <- masterRpcAddresses) yield {
    // Submit a registration task to the pool; once a task sees registered == true,
    // it exits without registering again
    registerMasterThreadPool.submit(new Runnable {
      override def run(): Unit = try {
        if (registered.get) {
          return
        }
        logInfo("Connecting to master " + masterAddress.toSparkURL + "...")
        // Get a reference to the Master endpoint and send the application registration message
        val masterRef = rpcEnv.setupEndpointRef(masterAddress, Master.ENDPOINT_NAME)
        masterRef.send(RegisterApplication(appDescription, self))
      } catch {
        case ie: InterruptedException => // Cancelled
        case NonFatal(e) => logWarning(s"Failed to connect to master $masterAddress", e)
      }
    })
  }
}
```
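The "one task per Master, all sharing one `registered` flag" pattern can be reproduced with a plain `ExecutorService` and an `AtomicBoolean`. The following is a minimal sketch. `RegisterAllMastersSketch`, `tryRegisterAllMasters` and the `sendRegister` callback are hypothetical stand-ins for Spark's ClientEndpoint and its `setupEndpointRef(...).send(...)` call.

```scala
import java.util.concurrent.{ExecutorService, Future => JFuture}
import java.util.concurrent.atomic.AtomicBoolean
import scala.util.control.NonFatal

object RegisterAllMastersSketch {
  /** Submits one registration task per master; a task does nothing once `registered` is set. */
  def tryRegisterAllMasters(
      masters: Seq[String],
      pool: ExecutorService,
      registered: AtomicBoolean,
      sendRegister: String => Unit): Seq[JFuture[_]] =
    for (master <- masters) yield {
      pool.submit(new Runnable {
        override def run(): Unit =
          try {
            // Another task (or the reply handler) may already have completed registration
            if (!registered.get) sendRegister(master)
          } catch {
            case _: InterruptedException => // cancelled
            case NonFatal(e) => println(s"Failed to connect to master $master: $e")
          }
      })
    }
}
```

The returned futures let the caller cancel outstanding attempts, for example once registration succeeds or times out.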

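When the Master receives the application registration message, its registerApplication method records the application and adds it to the list of applications waiting to run. After registration the Master sends RegisteredApplication back to ClientEndpoint. It then calls startExecutorsOnWorkers through schedule() to run the application, which notifies the Workers to launch Executors.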

```scala
private def startExecutorsOnWorkers(): Unit = {
  // Right now this is a very simple FIFO scheduler. We keep trying to fit in the first app
  // in the queue, then the second app, etc.
  // (FIFO: the application that registered first is scheduled first)
  for (app <- waitingApps) {
    val coresPerExecutor = app.desc.coresPerExecutor.getOrElse(1)
    // If the cores left is less than the coresPerExecutor, the cores left will not be allocated
    if (app.coresLeft >= coresPerExecutor) {
      // Filter out workers that don't have enough resources to launch an executor
      val usableWorkers = workers.toArray.filter(_.state == WorkerState.ALIVE)
        .filter(worker => worker.memoryFree >= app.desc.memoryPerExecutorMB &&
          worker.coresFree >= coresPerExecutor)
        .sortBy(_.coresFree).reverse
      // Decide which Workers to run on and how many cores to assign on each. There are two
      // strategies: spread the app over as many Workers as possible (spreadOutApps = true),
      // or pack it onto as few Workers as possible
      val assignedCores = scheduleExecutorsOnWorkers(app, usableWorkers, spreadOutApps)
      // Now that we've decided how many cores to allocate on each worker, let's allocate them
      // (this notifies the chosen Workers to launch Executors)
      for (pos <- 0 until usableWorkers.length if assignedCores(pos) > 0) {
        allocateWorkerResourceToExecutors(
          app, assignedCores(pos), app.desc.coresPerExecutor, usableWorkers(pos))
      }
    }
  }
}
```
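The two placement strategies are easier to compare on concrete numbers. The following simplified sketch hands out cores one executor's worth (`coresPerExecutor`) at a time. With `spreadOut = true` it moves to the next Worker after each executor; otherwise it keeps filling the current Worker first. `CoreAssignmentSketch.assignCores` is hypothetical. Unlike the real scheduleExecutorsOnWorkers, it ignores memory limits and the maximum number of executors.

```scala
object CoreAssignmentSketch {
  /** Returns the number of cores to assign on each worker (same order as `freeCores`). */
  def assignCores(
      coresToAssign: Int,
      coresPerExecutor: Int,
      freeCores: Array[Int],
      spreadOut: Boolean): Array[Int] = {
    require(coresPerExecutor > 0, "coresPerExecutor must be positive")
    val assigned = Array.fill(freeCores.length)(0)
    var remaining = coresToAssign
    // A worker can take one more executor if the app still needs a full executor's worth
    // of cores and the worker still has that many free
    def canLaunch(pos: Int): Boolean =
      remaining >= coresPerExecutor && freeCores(pos) - assigned(pos) >= coresPerExecutor
    var candidates = freeCores.indices.filter(canLaunch)
    while (candidates.nonEmpty) {
      for (pos <- candidates) {
        var keepScheduling = true
        while (keepScheduling && canLaunch(pos)) {
          assigned(pos) += coresPerExecutor
          remaining -= coresPerExecutor
          // Spread out: after one executor, move on to the next worker
          if (spreadOut) keepScheduling = false
        }
      }
      candidates = candidates.filter(canLaunch)
    }
    assigned
  }
}
```

Take 6 cores with one core per executor and three Workers with 4 free cores each. Spreading out yields 2/2/2, while packing yields 4/2/0. Cores that do not add up to a whole executor remain unassigned, matching the comment in the source above.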

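(2) When the AppClient's ClientEndpoint receives the RegisteredApplication message from the Master, it sets the registration flag registered to true. The registration threads see this change and stop, and the Application registration is complete.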

```scala
override def receive: PartialFunction[Any, Unit] = {
  // Registration acknowledged by the Master: once the flag is set,
  // the registration threads stop and the Application is registered
  case RegisteredApplication(appId_, masterRef) =>
    // FIXME How to handle the following cases?
    // 1. A master receives multiple registrations and sends back multiple
    // RegisteredApplications due to an unstable network.
    // 2. Receive multiple RegisteredApplication from different masters because the master is
    // changing.
    appId.set(appId_)
    registered.set(true)
    master = Some(masterRef)
    listener.connected(appId.get)
}
```

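(3) When the Master's startExecutorsOnWorkers method allocates resources to run the application, it calls allocateWorkerResourceToExecutors, which sends a LaunchExecutor message to each chosen Worker. The Worker handles that message and launches the Executor: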

```scala
override def receive: PartialFunction[Any, Unit] = synchronized {
  case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
    try {
      // Create the executor's working directory
      val executorDir = new File(workDir, appId + "/" + execId)
      if (!executorDir.mkdirs()) {
        throw new IOException("Failed to create directory " + executorDir)
      }
      // Create the application's local directories on this Worker; they are passed to the
      // Executor through SPARK_EXECUTOR_DIRS and deleted by the Worker when the app finishes
      val appLocalDirs = appDirectories.getOrElse(appId, {
        val localRootDirs = Utils.getOrCreateLocalRootDirs(conf)
        val dirs = localRootDirs.flatMap { dir =>
          try {
            val appDir = Utils.createDirectory(dir, namePrefix = "executor")
            Utils.chmod700(appDir)
            Some(appDir.getAbsolutePath())
          } catch {
            case e: IOException =>
              logWarning(s"${e.getMessage}. Ignoring this directory.")
              None
          }
        }.toSeq
        if (dirs.isEmpty) {
          throw new IOException("No subfolder can be created in " +
            s"${localRootDirs.mkString(",")}.")
        }
        dirs
      })
      appDirectories(appId) = appLocalDirs
      // ExecutorRunner launches the CoarseGrainedExecutorBackend process using the command in
      // the application description; that command is built in SparkDeploySchedulerBackend.start
      val manager = new ExecutorRunner(
        appId,
        execId,
        appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
        cores_,
        memory_,
        self,
        workerId,
        host,
        webUi.boundPort,
        publicAddress,
        sparkHome,
        executorDir,
        workerUri,
        conf,
        appLocalDirs, ExecutorState.RUNNING)
      executors(appId + "/" + execId) = manager
      manager.start()
      coresUsed += cores_
      memoryUsed += memory_
      // Tell the Master that the executor state has changed to ExecutorState.RUNNING
      sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
    } catch {
      case e: Exception =>
        logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
        if (executors.contains(appId + "/" + execId)) {
          executors(appId + "/" + execId).kill()
          executors -= appId + "/" + execId
        }
        sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
          Some(e.toString), None))
    }
}
```
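The Worker-side bookkeeping follows a simple pattern:

- create `<workDir>/<appId>/<execId>`;
- register the runner under the key `appId/execId`, start it, and add its cores and memory to the Worker's totals;
- on any failure, undo the registration and report FAILED.

The following is a minimal sketch of just that pattern. `WorkerBookkeeping`, `launchExecutor` and the `startRunner` callback are hypothetical. States are plain strings, and nothing is actually executed.

```scala
import java.io.{File, IOException}
import scala.collection.mutable

// Hypothetical, heavily simplified stand-in for the Worker's LaunchExecutor bookkeeping
final class WorkerBookkeeping(workDir: File) {
  val executors = mutable.Map[String, String]() // "appId/execId" -> state
  var coresUsed = 0
  var memoryUsed = 0

  /** Returns the state to report to the Master: "RUNNING" or "FAILED". */
  def launchExecutor(appId: String, execId: Int, cores: Int, memoryMb: Int,
                     startRunner: () => Unit): String = {
    val fullId = appId + "/" + execId
    try {
      // Same layout as the Worker: <workDir>/<appId>/<execId>
      val executorDir = new File(workDir, fullId)
      if (!executorDir.mkdirs()) {
        throw new IOException("Failed to create directory " + executorDir)
      }
      executors(fullId) = "RUNNING"
      startRunner() // stands in for ExecutorRunner.start()
      coresUsed += cores
      memoryUsed += memoryMb
      "RUNNING"
    } catch {
      case _: Exception =>
        // Undo the registration so a failed executor is not tracked
        executors -= fullId
        "FAILED"
    }
  }
}
```

Resources are only counted after `startRunner()` returns, so a failed launch leaves `coresUsed` and `memoryUsed` unchanged.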

The executor process itself is started by ExecutorRunner: manager.start() runs the fetchAndRunExecutor method on a separate thread.

```scala
private def fetchAndRunExecutor(): Unit = {
  try {
    // Launch the process
    val subsOpts = appDesc.command.javaOpts.map {
      Utils.substituteAppNExecIds(_, appId, execId.toString)
    }
    val subsCommand = appDesc.command.copy(javaOpts = subsOpts)
    // Build a ProcessBuilder from the application's command and the environment configuration
    val builder = CommandUtils.buildProcessBuilder(subsCommand, new SecurityManager(conf),
      memory, sparkHome.getAbsolutePath, substituteVariables)
    val command = builder.command()
    val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
    logInfo(s"Launch command: $formattedCommand")
    // Set the working directory and the executor's local directories on the builder
    builder.directory(executorDir)
    builder.environment.put("SPARK_EXECUTOR_DIRS", appLocalDirs.mkString(File.pathSeparator))
    // In case we are running this from within the Spark Shell, avoid creating a "scala"
    // parent process for the executor command
    builder.environment.put("SPARK_LAUNCH_WITH_SCALA", "0")
    // Add webUI log urls (the log links shown on the monitoring page)
    val baseUrl =
      if (conf.getBoolean("spark.ui.reverseProxy", false)) {
        s"/proxy/$workerId/logPage/?appId=$appId&executorId=$execId&logType="
      } else {
        s"http://$publicAddress:$webUiPort/logPage/?appId=$appId&executorId=$execId&logType="
      }
    builder.environment.put("SPARK_LOG_URL_STDERR", s"${baseUrl}stderr")
    builder.environment.put("SPARK_LOG_URL_STDOUT", s"${baseUrl}stdout")
    // Start the process, which runs CoarseGrainedExecutorBackend
    process = builder.start()
    val header = "Spark Executor Command: %s\n%s\n\n".format(
      formattedCommand, "=" * 40)
    // Redirect its stdout and stderr to files in the executor directory
    val stdout = new File(executorDir, "stdout")
    stdoutAppender = FileAppender(process.getInputStream, stdout, conf)
    val stderr = new File(executorDir, "stderr")
    Files.write(header, stderr, StandardCharsets.UTF_8)
    stderrAppender = FileAppender(process.getErrorStream, stderr, conf)
    // Wait for it to exit; executor may exit with code 0 (when driver instructs it to shutdown)
    // or with nonzero exit code. Then report the exit status to the Worker
    val exitCode = process.waitFor()
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
  } catch {
    case interrupted: InterruptedException =>
      logInfo("Runner thread for executor " + fullId + " interrupted")
      state = ExecutorState.KILLED
      killProcess(None)
    case e: Exception =>
      logError("Error running executor", e)
      state = ExecutorState.FAILED
      killProcess(Some(e.toString))
  }
}
```
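Stripped of Spark's helpers, fetchAndRunExecutor is a standard `java.lang.ProcessBuilder` workflow:

- set the working directory and environment variables;
- write a header to the stderr file;
- start the process and send its output to files;
- block on `waitFor()` to obtain the exit code.

The following is a minimal sketch under those assumptions. `ExecutorProcessSketch.runAndWait` is hypothetical. It swaps Spark's FileAppender threads for `ProcessBuilder.Redirect.appendTo`, and it runs whatever command it is given rather than CoarseGrainedExecutorBackend.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files

object ExecutorProcessSketch {
  /**
   * Starts `command` in `dir` with extra environment variables, appends its output to files
   * named "stdout" and "stderr" in `dir`, and blocks until it exits. Returns the exit code.
   */
  def runAndWait(command: Seq[String], dir: File, extraEnv: Map[String, String]): Int = {
    val builder = new ProcessBuilder(command: _*)
    builder.directory(dir)
    extraEnv.foreach { case (k, v) => builder.environment().put(k, v) }
    // Write a header to stderr first, then let the process append to the same file
    val stderr = new File(dir, "stderr")
    val header = "Command: %s\n%s\n\n".format(command.mkString(" "), "=" * 40)
    Files.write(stderr.toPath, header.getBytes(StandardCharsets.UTF_8))
    builder.redirectOutput(ProcessBuilder.Redirect.appendTo(new File(dir, "stdout")))
    builder.redirectError(ProcessBuilder.Redirect.appendTo(stderr))
    val process = builder.start()
    // A nonzero code is a normal outcome here; the caller decides what it means
    process.waitFor()
  }
}
```

As in the Worker, the caller would turn the exit code into a state-change message. Exit code 0 means a clean shutdown, while anything else counts against the application's retry budget.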

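(4) The Master receives the ExecutorStateChanged message sent by the Worker and takes the following steps:

- It updates the executor's state and forwards the change to the Driver as an ExecutorUpdated message.
- If the executor has finished, it removes the executor from the Worker and the application. After too many abnormal exits with no executor still running, it removes the whole application as FAILED.
- Finally it calls schedule() again.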

```scala
override def receive: PartialFunction[Any, Unit] = {
  case ExecutorStateChanged(appId, execId, state, message, exitStatus) =>
    val execOption = idToApp.get(appId).flatMap(app => app.executors.get(execId))
    execOption match {
      case Some(exec) =>
        val appInfo = idToApp(appId)
        val oldState = exec.state
        exec.state = state
        if (state == ExecutorState.RUNNING) {
          assert(oldState == ExecutorState.LAUNCHING,
            s"executor $execId state transfer from $oldState to RUNNING is illegal")
          appInfo.resetRetryCount()
        }
        // Forward the state change to the Driver as an ExecutorUpdated message
        exec.application.driver.send(ExecutorUpdated(execId, state, message, exitStatus, false))
        if (ExecutorState.isFinished(state)) {
          // Remove this executor from the worker and app
          logInfo(s"Removing executor ${exec.fullId} because it is $state")
          // If an application has already finished, preserve its
          // state to display its information properly on the UI
          if (!appInfo.isFinished) {
            appInfo.removeExecutor(exec)
          }
          exec.worker.removeExecutor(exec)
          val normalExit = exitStatus == Some(0)
          // Only retry certain number of times so we don't go into an infinite loop.
          // Important note: this code path is not exercised by tests, so be very careful when
          // changing this if condition.
          if (!normalExit
              && appInfo.incrementRetryCount() >= MAX_EXECUTOR_RETRIES
              && MAX_EXECUTOR_RETRIES >= 0) { // < 0 disables this application-killing path
            val execs = appInfo.executors.values
            if (!execs.exists(_.state == ExecutorState.RUNNING)) {
              logError(s"Application ${appInfo.desc.name} with ID ${appInfo.id} failed " +
                s"${appInfo.retryCount} times; removing it")
              removeApplication(appInfo, ApplicationState.FAILED)
            }
          }
        }
        schedule()
      case None =>
        logWarning(s"Got status update for unknown executor $appId/$execId")
    }
}
```
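The retry condition above depends on `&&` short-circuiting: the retry counter is only incremented for abnormal exits, and a negative limit disables removal altogether. The following minimal sketch isolates just that decision so the cases can be checked directly. `AppRetries`, `maxExecutorRetries` and `onExecutorFinished` are hypothetical names, not Spark's ApplicationInfo API.

```scala
// Hypothetical, isolated version of the Master's "remove the app after N failures" decision
final class AppRetries(maxExecutorRetries: Int) {
  private var retryCount = 0

  def currentRetryCount: Int = retryCount
  def resetRetryCount(): Unit = retryCount = 0 // called when an executor reaches RUNNING
  private def incrementRetryCount(): Int = { retryCount += 1; retryCount }

  /** Returns true if the application should now be removed as FAILED. */
  def onExecutorFinished(exitStatus: Option[Int], anyExecutorRunning: Boolean): Boolean = {
    val normalExit = exitStatus.contains(0)
    // Short-circuit: the counter only grows when the exit was abnormal
    val limitReached =
      !normalExit && incrementRetryCount() >= maxExecutorRetries && maxExecutorRetries >= 0
    // The app survives as long as at least one of its executors is still running
    limitReached && !anyExecutorRunning
  }
}
```

With a limit of 2, the first abnormal exit is tolerated and the second one triggers removal, unless another executor is still running.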

(5) The Executor registers with the DriverEndpoint. The CoarseGrainedExecutorBackend process started in step (3) sends a RegisterExecutor message to the DriverEndpoint from its onStart method.

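```scala
override def receiveAndReply(context: RpcCallContext): PartialFunction[Any, Unit] = {
  case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls) =>
    if (executorDataMap.contains(executorId)) {
      executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
      context.reply(true)
    } else {
      // Use the executor's own RPC address, or the client connection's address if it has none
      val executorAddress = if (executorRef.address != null) {
        executorRef.address
      } else {
        context.senderAddress
      }
      // Record the executor's ID and the number of cores it uses
      addressToExecutorId(executorAddress) = executorId
      // ...
    }
}
```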
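The DriverEndpoint's first job here is to reject duplicate executor IDs. Otherwise it records which address belongs to which executor and adds up the cores it has been given. The following minimal sketch shows that registry logic with an `Either` standing in for the failure/success replies. `ExecutorRegistry`, `ExecutorData` and `register` are hypothetical, and the addresses are plain strings.

```scala
import scala.collection.mutable

// Hypothetical, simplified registry mirroring executorDataMap / addressToExecutorId
final case class ExecutorData(address: String, cores: Int)

final class ExecutorRegistry {
  private val executorDataMap = mutable.HashMap[String, ExecutorData]()
  private val addressToExecutorId = mutable.HashMap[String, String]()
  private var totalCores = 0

  def totalCoreCount: Int = totalCores
  def executorAt(address: String): Option[String] = addressToExecutorId.get(address)

  /** Left(reason) for a duplicate ID; otherwise records the executor and returns Right(()). */
  def register(executorId: String, address: String, cores: Int): Either[String, Unit] =
    if (executorDataMap.contains(executorId)) {
      Left("Duplicate executor ID: " + executorId)
    } else {
      addressToExecutorId(address) = executorId
      executorDataMap(executorId) = ExecutorData(address, cores)
      totalCores += cores
      Right(())
    }
}
```

Keying the second map by address lets later messages from that address be mapped back to the executor ID.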
