Spark-executor
@(spark)[executor]
ExecutorExitCode
```scala
/**
 * These are exit codes that executors should use to provide the master with information about
 * executor failures assuming that cluster management framework can capture the exit codes (but
 * perhaps not log files). The exit code constants here are chosen to be unlikely to conflict
 * with "natural" exit statuses that may be caused by the JVM or user code. In particular,
 * exit codes 128+ arise on some Unix-likes as a result of signals, and it appears that the
 * OpenJDK JVM may use exit code 1 in some of its own "last chance" code.
 */
private[spark]
object ExecutorExitCode {
```
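The convention described in that comment can be sketched as follows. This is a minimal illustration, not the actual Spark constants: the specific values and the `explain` helper are hypothetical; only the idea of picking codes that avoid 0/1 (JVM) and 128+ (signal statuses) comes from the source.

```scala
// A hedged sketch of the exit-code convention: constants chosen to avoid
// "natural" statuses (0/1 from the JVM, 128+ from Unix signals), so a cluster
// manager that captures the exit status can map it back to a failure reason.
// The values and helper below are hypothetical, for illustration only.
object ExitCodesSketch {
  val UncaughtException = 50 // hypothetical value
  val OutOfMemory       = 52 // hypothetical value

  // What a cluster manager could log when it sees a captured exit status.
  def explain(status: Int): String = status match {
    case UncaughtException => "uncaught exception"
    case OutOfMemory       => "executor ran out of memory"
    case s if s >= 128     => s"killed by signal ${s - 128}" // Unix convention
    case other             => s"unknown exit status $other"
  }
}
```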
ExecutorSource
Mainly just a collection of metrics.
CoarseGrainedExecutorBackend
class CoarseGrainedExecutorBackend is actually an Actor, and it has a main function that:
1. starts an actorSystem called fetcher to fetch the SparkConf from the driver
2. shuts down fetcher
3. calls createExecutorEnv to create the SparkEnv
4. starts the CoarseGrainedExecutorBackend actor, which
   - registers itself with the driver
   - waits for incoming messages
```scala
override def receiveWithLogging = {
  case RegisteredExecutor =>
    logInfo("Successfully registered with driver")
    val (hostname, _) = Utils.parseHostPort(hostPort)
    executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)

  case RegisterExecutorFailed(message) =>
    logError("Slave registration failed: " + message)
    System.exit(1)

  case LaunchTask(data) =>
    if (executor == null) {
      logError("Received LaunchTask command but executor was null")
      System.exit(1)
    } else {
      val ser = env.closureSerializer.newInstance()
      val taskDesc = ser.deserialize[TaskDescription](data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
        taskDesc.name, taskDesc.serializedTask)
    }

  case KillTask(taskId, _, interruptThread) =>
    if (executor == null) {
      logError("Received KillTask command but executor was null")
      System.exit(1)
    } else {
      executor.killTask(taskId, interruptThread)
    }

  case x: DisassociatedEvent =>
    if (x.remoteAddress == driver.anchorPath.address) {
      logError(s"Driver $x disassociated! Shutting down.")
      System.exit(1)
    } else {
      logWarning(s"Received irrelevant DisassociatedEvent $x")
    }

  case StopExecutor =>
    logInfo("Driver commanded a shutdown")
    executor.stop()
    context.stop(self)
    context.system.shutdown()
}
```
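Stripped of the Akka machinery, the handling above reduces to a small state machine: nothing may run until registration succeeds, and fatal cases end the process. The sketch below mirrors that shape with plain case classes; the class, return strings, and `running` set are hypothetical stand-ins (the real code exits the JVM instead of returning a string).

```scala
// A runnable sketch of the backend's message protocol. Message names mirror
// the ones in the excerpt above; everything else is illustrative only.
sealed trait BackendMsg
case object RegisteredExecutor extends BackendMsg
case class RegisterExecutorFailed(message: String) extends BackendMsg
case class LaunchTask(taskId: Long) extends BackendMsg
case class KillTask(taskId: Long) extends BackendMsg
case object StopExecutor extends BackendMsg

class BackendSketch {
  private var executorReady = false            // becomes true only after registration
  val running = scala.collection.mutable.Set.empty[Long]

  // Returns a description of the action taken; the real actor logs and may exit.
  def receive(msg: BackendMsg): String = msg match {
    case RegisteredExecutor =>
      executorReady = true; "registered"
    case RegisterExecutorFailed(m) =>
      s"fatal: $m"                             // real code: System.exit(1)
    case LaunchTask(_) if !executorReady =>
      "fatal: executor was null"               // real code: System.exit(1)
    case LaunchTask(id) =>
      running += id; s"launched $id"
    case KillTask(id) =>
      running -= id; s"killed $id"
    case StopExecutor =>
      "shutdown"
  }
}
```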
The message worth paying attention to is LaunchTask, which calls executor.launchTask.
Executor
```scala
/**
 * Spark executor used with Mesos, YARN, and the standalone scheduler.
 * In coarse-grained mode, an existing actor system is provided.
 */
private[spark] class Executor(
    executorId: String,
    executorHostname: String,
    env: SparkEnv,
    userClassPath: Seq[URL] = Nil,
    isLocal: Boolean = false)
  extends Logging
```
Its most important function is:
```scala
def launchTask(
    context: ExecutorBackend,
    taskId: Long,
    attemptNumber: Int,
    taskName: String,
    serializedTask: ByteBuffer) {
  val tr = new TaskRunner(context, taskId = taskId, attemptNumber = attemptNumber, taskName,
    serializedTask)
  runningTasks.put(taskId, tr)
  threadPool.execute(tr)
}
```
Clearly this is thread-pool-based asynchronous execution. The logic of TaskRunner's run function is:
1. call Task.deserializeWithDependencies to obtain the files and jars the task depends on
2. decide, based on the cache state, whether to fetch each file or use the cached copy; whether a file is out of date is determined jointly by its filename and timestamp
3. deserialize the payload to obtain the actual Task
4. call task.run to actually execute the task
5. depending on the result size, either send the result back directly or write it into the blockManager
6. return the result to the driver via the ExecutorBackend's statusUpdate
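The thread-pool pattern above (track the task, run it asynchronously, pick a result path by size, report status) can be sketched without any Spark dependency. Everything here is hypothetical: `ExecutorSketch`, the size threshold, and the `report` callback are illustrative stand-ins for TaskRunner, the Akka frame-size limit, and statusUpdate.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors}

// A minimal sketch of the launchTask pattern: a thread pool runs the task
// body asynchronously, a concurrent map tracks running tasks, and the result
// path ("direct" vs. "blockManager") depends on a size threshold, like step 5.
object ExecutorSketch {
  val maxDirectResultBytes = 1024 * 1024 // hypothetical threshold
  val runningTasks = new ConcurrentHashMap[Long, Runnable]()
  private val threadPool = Executors.newCachedThreadPool()

  def launchTask(taskId: Long, body: () => Array[Byte])(report: (Long, String) => Unit): Unit = {
    val runner: Runnable = () => {
      val result = body()                 // step 4: actually execute the task
      val how =                           // step 5: direct result vs. block store
        if (result.length <= maxDirectResultBytes) "direct" else "blockManager"
      runningTasks.remove(taskId)
      report(taskId, how)                 // step 6: status back to the driver
    }
    runningTasks.put(taskId, runner)      // track, then hand off to the pool
    threadPool.execute(runner)
  }
}
```

The map-then-execute ordering matters: the task must be registered in `runningTasks` before the pool can start it, so a concurrent kill request can always find it.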
The flow above shows that the following kinds of data move over the network during the whole process:
1. the URLs of the taskFiles and taskJars
2. the taskFiles and taskJars themselves (these transfers can be skipped when cached copies are up to date)
3. the serialized bytes of the Task
4. the result (either a block address or a relatively small direct result)
In other words, a single task does not generate much network traffic.
The actual execution of a task, and how tasks are scheduled, are covered in the scheduler.