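driver: sends tasks and monitors the state of the executors.
executor: receives tasks and runs them to completion, and reports its own state back to the driver.
Older versions of Spark offered two ways for the driver and executors to communicate, Netty and Akka; newer versions use Netty only.
Let's start with Akka-based communication: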
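A single diagram is enough to explain the relationship: each executor periodically sends a heartbeat to the driver that carries its own status. If the driver hears nothing from a node, and the number of missed heartbeats reaches a threshold, the driver marks that executor as dead and removes it. When a new node appears, it is added to the set of known executors.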
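The driver-side bookkeeping described above can be sketched roughly as follows. This is only a toy model, not Spark's implementation: the class HeartbeatTracker, its methods, and the maxMisses threshold are hypothetical names introduced for illustration.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of driver-side liveness tracking.
// None of these names are Spark APIs.
final class HeartbeatTracker(maxMisses: Int) {
  // executorId -> number of check rounds since the last report from that executor
  private val roundsSinceReport = mutable.Map.empty[String, Int]

  /** Record a report from an executor; an unknown executor is added as a new node. */
  def heartbeat(executorId: String): Unit =
    roundsSinceReport(executorId) = 0

  /**
   * Run one check round: every known executor gets one more missed round,
   * and those that reach maxMisses are removed and returned as dead.
   */
  def checkRound(): Set[String] = {
    roundsSinceReport.keys.toList.foreach(id => roundsSinceReport(id) += 1)
    val dead = roundsSinceReport.collect { case (id, n) if n >= maxMisses => id }.toSet
    dead.foreach(id => roundsSinceReport.remove(id))
    dead
  }

  /** Executors currently considered alive. */
  def alive: Set[String] = roundsSinceReport.keySet.toSet
}
```

With maxMisses = 3, an executor that keeps reporting between rounds stays in the alive set, while one that stays silent for three consecutive rounds is returned by checkRound() and dropped.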
RegisteredExecutor: registering the executor
/**
 * The Driver and the Executor still communicate over Netty. After the Driver sends a LaunchTask
 * message, a corresponding receive method on the Executor side picks it up. The code below shows
 * the other events the Executor handles besides LaunchTask. All of these messages extend the
 * CoarseGrainedClusterMessage trait.
 */
override def receive: PartialFunction[Any, Unit] = {
  // Executor registration message
  case RegisteredExecutor =>
    logInfo("Successfully registered with driver")
    try {
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
    } catch {
      case NonFatal(e) =>
        exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
    }
Other cases the executor handles
  // Kill-task event
  case KillTask(taskId, _, interruptThread, reason) =>
    if (executor == null) {
      exitExecutor(1, "Received KillTask command but executor was null")
    } else {
      executor.killTask(taskId, interruptThread, reason)
    }
  // Stop-executor event
  case StopExecutor =>
    stopping.set(true)
    logInfo("Driver commanded a shutdown")
    // Cannot shutdown here because an ack may need to be sent back to the caller. So send
    // a message to self to actually do the shutdown.
    self.send(Shutdown)
  // Shutdown event
  case Shutdown =>
    stopping.set(true)
    new Thread("CoarseGrainedExecutorBackend-stop-executor") {
      override def run(): Unit = {
        // executor.stop() will call `SparkEnv.stop()` which waits until RpcEnv stops totally.
        // However, if `executor.stop()` runs in some thread of RpcEnv, RpcEnv won't be able to
        // stop until `executor.stop()` returns, which becomes a dead-lock (See SPARK-14180).
        // Therefore, we put this line in a new thread.
        executor.stop()
      }
    }.start()
}
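The dispatch pattern in the excerpts above (a family of messages matched in a receive partial function, with StopExecutor sending Shutdown to itself) can be sketched in isolation. This is only an illustrative model under stated assumptions: BackendMessage, ToyBackend, and the simplified one-field KillTask are hypothetical stand-ins, not the Spark definitions, and the inbox is a plain queue drained on the calling thread rather than an RPC environment.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for the real messages; not the Spark definitions.
sealed trait BackendMessage
case object RegisteredExecutor extends BackendMessage
final case class KillTask(taskId: Long) extends BackendMessage
case object StopExecutor extends BackendMessage
case object Shutdown extends BackendMessage

// A toy single-threaded endpoint: messages are queued and handled one at a time.
final class ToyBackend {
  private val inbox = mutable.Queue.empty[BackendMessage]
  val log = mutable.ArrayBuffer.empty[String]
  var stopped = false

  /** Enqueue a message; used by callers and by the endpoint itself. */
  def send(msg: BackendMessage): Unit = inbox.enqueue(msg)

  private val receive: PartialFunction[BackendMessage, Unit] = {
    case RegisteredExecutor => log += "registered"
    case KillTask(id)       => log += s"kill $id"
    case StopExecutor =>
      // Do not stop inside this handler; queue a follow-up message instead.
      log += "stop requested"
      send(Shutdown)
    case Shutdown =>
      stopped = true
      log += "shutdown"
  }

  /** Handle queued messages in order until the inbox is empty. */
  def drain(): Unit = while (inbox.nonEmpty) receive(inbox.dequeue())
}
```

Because StopExecutor only enqueues Shutdown, its handler returns before the actual stop happens, which mirrors the reason given in the source comment: the handler may still need to acknowledge the caller.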
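DisassociatedEvent: how each side reacts when either the driver or the executor exits
/**
 * In version 1.5, when an Application exits it only notifies its Executors and forgets the Master.
 * However, Akka's communication mechanism guarantees that when either party exits abnormally,
 * the other party receives a DisassociatedEvent:
 *   case DisassociatedEvent
 *
 * In the Netty-based RPC code below, the corresponding callback is onDisconnected.
 */
override def onDisconnected(remoteAddress: RpcAddress): Unit = {
  if (stopping.get()) {
    // Already shutting down, so the disconnect is expected
    logInfo(s"Driver from $remoteAddress disconnected during shutdown")
  } else if (driver.exists(_.address == remoteAddress)) {
    // The driver this executor belongs to is gone: exit without notifying it
    exitExecutor(1, s"Driver $remoteAddress disassociated! Shutting down.", null,
      notifyDriver = false)
  } else {
    logWarning(s"An unknown ($remoteAddress) driver disconnected.")
  }
}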