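Role of the executor
A Spark executor runs tasks on a backing thread pool.
Source code walkthrough
The CoarseGrainedExecutorBackend class
The executor that a worker launches for an application is in fact a CoarseGrainedExecutorBackend process.
Registering the executor with the driver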
/**
 * leen
 * In effect, this sends a RegisterExecutor message to the driver
 */
override def onStart() {
  logInfo("Connecting to driver: " + driverUrl)
  rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    driver = Some(ref)
    ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls))
  }(ThreadUtils.sameThread).onComplete {
    // This is a very fast action so we can use "ThreadUtils.sameThread"
    case Success(msg) =>
      // Always receive `true`. Just ignore it
    case Failure(e) =>
      exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
  }(ThreadUtils.sameThread)
}
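The registration flow above (resolve the driver endpoint, ask it to register, react to success or failure) can be sketched with plain Scala futures. This is only an illustration under stated assumptions: DriverRef, setupEndpointRef and register are hypothetical stand-ins, not Spark's RPC API, and the failure branch returns a message where the real backend calls exitExecutor.

```scala
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

object RegistrationSketch {
  // Hypothetical, simplified message; Spark's RegisterExecutor carries more fields.
  final case class RegisterExecutor(executorId: String, hostname: String, cores: Int)

  // Hypothetical stand-in for an RPC endpoint reference to the driver.
  trait DriverRef {
    def ask(msg: RegisterExecutor): Future[Boolean]
  }

  // Simulates resolving the driver endpoint from its URL (assumption: only
  // URLs starting with "spark://" resolve successfully in this sketch).
  def setupEndpointRef(driverUrl: String): Future[DriverRef] =
    if (driverUrl.startsWith("spark://")) {
      Future.successful(new DriverRef {
        def ask(msg: RegisterExecutor): Future[Boolean] = Future.successful(true)
      })
    } else {
      Future.failed(new IllegalArgumentException(s"Bad driver URL: $driverUrl"))
    }

  // Same shape as onStart(): resolve the driver, then ask it to register us.
  // A failure at either step ends up in the Failure branch.
  def register(driverUrl: String)(implicit ec: ExecutionContext): Future[String] =
    setupEndpointRef(driverUrl)
      .flatMap(ref => ref.ask(RegisterExecutor("exec-1", "localhost", 4)))
      .transform {
        case Success(_) => Success("registered")
        case Failure(e) =>
          Success(s"Cannot register with driver: $driverUrl (${e.getMessage})")
      }
}
```

Chaining with flatMap means the ask is only sent once the endpoint has been resolved, which is why a single failure handler covers both steps.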
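Handling the driver's reply to the registration request
- Once the driver has successfully registered the executor, the backend creates an Executor instance (the executor handle).
- Tasks are then started through that handle's launchTask() method.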
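/**
 * leen
 * After the driver has registered the executor, it replies with a RegisteredExecutor message.
 * At that point CoarseGrainedExecutorBackend creates an Executor.
 * Most of the functionality is implemented by the Executor.
 */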
override def receive: PartialFunction[Any, Unit] = {
  case RegisteredExecutor =>
    logInfo("Successfully registered with driver")
    try {
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
    } catch {
      case NonFatal(e) =>
        exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
    }
  /**
   * Launch a task
   */
  case LaunchTask(data) =>
    if (executor == null) {
      exitExecutor(1, "Received LaunchTask command but executor was null")
    } else {
      // Deserialize the task description
      val taskDesc = TaskDescription.decode(data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      // Start the task through the executor handle's launchTask() method
      executor.launchTask(this, taskDesc)
    }
}
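The message handling above can be reduced to a small, self-contained sketch: a partial function that creates the executor on RegisteredExecutor and refuses LaunchTask until that has happened. The names Backend and FakeExecutor, and the simplified LaunchTask(taskId) message, are hypothetical. The sketch records the error where the real backend exits the process.

```scala
import scala.collection.mutable.ArrayBuffer

object BackendSketch {
  // Hypothetical, simplified messages; Spark's LaunchTask carries serialized data.
  sealed trait Message
  case object RegisteredExecutor extends Message
  final case class LaunchTask(taskId: Long) extends Message

  // Hypothetical stand-in for Executor that only records launched task IDs.
  final class FakeExecutor {
    val launched: ArrayBuffer[Long] = ArrayBuffer.empty[Long]
    def launchTask(taskId: Long): Unit = launched += taskId
  }

  final class Backend {
    // Stays null until the driver confirms registration, as in the excerpt.
    var executor: FakeExecutor = null
    val errors: ArrayBuffer[String] = ArrayBuffer.empty[String]

    val receive: PartialFunction[Any, Unit] = {
      case RegisteredExecutor =>
        executor = new FakeExecutor
      case LaunchTask(taskId) =>
        if (executor == null) {
          // The real backend calls exitExecutor here; the sketch records the error.
          errors += "Received LaunchTask command but executor was null"
        } else {
          executor.launchTask(taskId)
        }
    }
  }
}
```

A LaunchTask that arrives before RegisteredExecutor is rejected, which is the ordering guarantee the null check in the excerpt enforces.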
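The Executor class
It runs tasks on a backing thread pool.
How launchTask is implemented
- A TaskRunner is created for every task. TaskRunner is a Runnable that a pool thread executes, not a thread itself.
- The TaskRunner is submitted to the thread pool threadPool for execution.
- The pool thread then invokes TaskRunner's run() method, which does the preparation needed to run the task and afterwards calls task.run() to run it.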
/**
 * leen
 * launchTask
 */
def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
  // A TaskRunner is created for every task.
  // TaskRunner implements Java's Runnable interface.
  val tr = new TaskRunner(context, taskDescription)
  // Record the TaskRunner in runningTasks, the in-memory map of currently running tasks.
  runningTasks.put(taskDescription.taskId, tr)
  // The task is now wrapped in a Runnable (the TaskRunner).
  // Submit it to the thread pool threadPool for execution;
  // the pool takes care of assigning it to a worker thread.
  threadPool.execute(tr)
}
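The same pattern (wrap the task in a Runnable, track it in a map, hand it to a pool) can be tried out in isolation. The following is a minimal sketch, assuming a cached thread pool from java.util.concurrent. MiniExecutor and shutdownAndWait are hypothetical names, and the task body is an arbitrary block of code, not a Spark TaskDescription.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}

object LaunchSketch {
  final class MiniExecutor {
    // Assumption for this sketch: a cached pool that creates threads on demand.
    private val threadPool = Executors.newCachedThreadPool()

    // Tracks tasks that have been launched and have not finished yet.
    val runningTasks = new ConcurrentHashMap[Long, Runnable]()

    def launchTask(taskId: Long)(body: => Unit): Unit = {
      // One Runnable per task, playing the role of TaskRunner.
      val tr = new Runnable {
        override def run(): Unit =
          try body
          finally runningTasks.remove(taskId) // always untrack, even on failure
      }
      // Register before submitting so the task is tracked for its whole lifetime.
      runningTasks.put(taskId, tr)
      threadPool.execute(tr)
    }

    // Stops accepting new tasks and waits up to 10 seconds for running ones;
    // returns true if all of them finished in time.
    def shutdownAndWait(): Boolean = {
      threadPool.shutdown()
      threadPool.awaitTermination(10, TimeUnit.SECONDS)
    }
  }
}
```

Putting the entry into runningTasks before calling execute means the Runnable's own cleanup cannot run before the entry exists, so the map never keeps a task that has already finished.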