Background
A source-code walkthrough of how a Task sent to a worker is executed to completion.
Process
The CoarseGrainedSchedulerBackend on the Driver sends a task-launch request to the CoarseGrainedExecutorBackend on a worker node, so on the worker side a Task's life begins in CoarseGrainedExecutorBackend's receive() method.
(1) receive() instantiates the Executor when the driver confirms registration; on a LaunchTask message it decodes the serialized TaskDescription and passes it to Executor.launchTask:
override def receive: PartialFunction[Any, Unit] = {
  // registration reply from the driver
  case RegisteredExecutor =>
    logInfo("Successfully registered with driver")
    try {
      executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false)
    } catch {
      case NonFatal(e) =>
        exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
    }
  case RegisterExecutorFailed(message) =>
    exitExecutor(1, "Slave registration failed: " + message)
  // task-launch request
  case LaunchTask(data) =>
    if (executor == null) {
      exitExecutor(1, "Received LaunchTask command but executor was null")
    } else {
      // decode the TaskDescription
      val taskDesc = TaskDescription.decode(data.value)
      logInfo("Got assigned task " + taskDesc.taskId)
      // hand the task to the executor
      executor.launchTask(this, taskDesc)
    }
  // … other messages elided
}
(2) launchTask wraps the task in a TaskRunner, records it in runningTasks, and submits it to the thread pool. Executor.launchTask:
def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
  val tr = new TaskRunner(context, taskDescription)
  runningTasks.put(taskDescription.taskId, tr)
  threadPool.execute(tr)
}
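The pattern in launchTask — register the runnable in a tracking map, then hand it to a pool — can be sketched outside Spark. This is a minimal illustration; the names MiniRunner and LaunchSketch are made up for the example:

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}

// Hypothetical stand-in for Spark's TaskRunner: a Runnable keyed by task id.
class MiniRunner(val taskId: Long, body: () => Unit) extends Runnable {
  override def run(): Unit = body()
}

object LaunchSketch {
  // Mirrors Executor.runningTasks: lets the executor look a task up later
  // (e.g. to kill it or report its status).
  val runningTasks = new ConcurrentHashMap[Long, MiniRunner]()
  private val threadPool = Executors.newFixedThreadPool(2)

  // Same shape as Executor.launchTask: register first, then submit to the pool.
  def launchTask(taskId: Long)(body: => Unit): Unit = {
    val tr = new MiniRunner(taskId, () => body)
    runningTasks.put(taskId, tr)
    threadPool.execute(tr)
  }

  def awaitAll(): Unit = {
    threadPool.shutdown()
    threadPool.awaitTermination(5, TimeUnit.SECONDS)
  }
}
```

Registering the task before executing it matters: the map is the executor's only handle on a task once the pool owns the thread.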
(3) TaskRunner's run method records the start time, obtains the blockManager, class loader and memoryManager, reports the task's state back to CoarseGrainedExecutorBackend, deserializes the Task object, and then calls the task instance's run method. Note:
a. TaskRunner is a Runnable executed on a thread, while the Task object is just an ordinary object.
b. Task is an abstract class with two concrete subclasses: ShuffleMapTask and ResultTask.
c. A task's execution is tied to one partition, so the task object deserialized in this step already carries the corresponding partitionId.
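Points b. and c. can be sketched as a tiny class hierarchy. This is a simplified illustration, not Spark's real signatures; all Sketch* names are invented for the example:

```scala
// Simplified sketch of the Task hierarchy: the abstract run does the shared
// bookkeeping, then delegates to the subclass's runTask. Each instance is
// bound to one partition via partitionId.
abstract class SketchTask[T](val partitionId: Int) {
  final def run(): T = {
    // Spark builds a TaskContext and registers metrics here before delegating.
    runTask()
  }
  protected def runTask(): T
}

// ShuffleMapTask-like: computes its partition for downstream stages
// (the sum stands in for writer.write(...)).
class SketchShuffleMapTask(partitionId: Int, partitionData: Seq[Int])
    extends SketchTask[Int](partitionId) {
  protected def runTask(): Int = partitionData.sum
}

// ResultTask-like: applies the job's result function to its partition.
class SketchResultTask(partitionId: Int, partitionData: Seq[Int])
    extends SketchTask[Int](partitionId) {
  protected def runTask(): Int = partitionData.max
}
```

For example, `new SketchShuffleMapTask(0, Seq(1, 2, 3)).run()` yields 6.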
(4) ShuffleMapTask's run method calls its own runTask to do the actual work; the next step, as the comments in the code below note, is to call the RDD's iterator to compute the corresponding partition. ShuffleMapTask.runTask:
override def runTask(context: TaskContext): MapStatus = {
  // Deserialize the RDD using the broadcast variable.
  val threadMXBean = ManagementFactory.getThreadMXBean
  val deserializeStartTime = System.currentTimeMillis()
  val deserializeStartCpuTime = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
    threadMXBean.getCurrentThreadCpuTime
  } else 0L
  val ser = SparkEnv.get.closureSerializer.newInstance()
  // deserialize the RDD and its shuffle dependency
  val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](
    ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)
  _executorDeserializeTime = System.currentTimeMillis() - deserializeStartTime
  _executorDeserializeCpuTime = if (threadMXBean.isCurrentThreadCpuTimeSupported) {
    threadMXBean.getCurrentThreadCpuTime - deserializeStartCpuTime
  } else 0L
  var writer: ShuffleWriter[Any, Any] = null
  try {
    // obtain a writer for the shuffle output
    val manager = SparkEnv.get.shuffleManager
    writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
    // call the RDD's iterator to compute the corresponding partition
    writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])
    writer.stop(success = true).get
  } catch {
    case e: Exception =>
      try {
        if (writer != null) {
          writer.stop(success = false)
        }
      } catch {
        case e: Exception =>
          log.debug("Could not stop writer", e)
      }
      throw e
  }
}
(5) This brings us into the RDD itself. rdd.iterator:
final def iterator(split: Partition, context: TaskContext): Iterator[T] = {
  if (storageLevel != StorageLevel.NONE) {
    // the partition may already be cached; compute it if not
    getOrCompute(split, context)
  } else {
    // not cached, but it may have been checkpointed; checkpointing
    // persists the RDD to disk
    computeOrReadCheckpoint(split, context)
  }
}
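The cache-first branch of iterator() follows a get-or-compute pattern: look the partition up in a store and, on a miss, compute it once and cache the result. A minimal sketch of that pattern (CachingStore and its fields are invented names, not Spark's BlockManager API):

```scala
import scala.collection.mutable

// Sketch of the get-or-compute pattern behind iterator(): a keyed cache
// that falls back to the supplied compute function on a miss.
class CachingStore[K, V](computePartition: K => V) {
  private val cache = mutable.Map[K, V]()
  var computeCount = 0 // instrumentation: shows the cache short-circuits recomputation

  def getOrCompute(key: K): V =
    cache.getOrElseUpdate(key, { computeCount += 1; computePartition(key) })
}
```

Calling getOrCompute twice with the same key computes only once; this is exactly why caching an RDD avoids recomputing its partitions across actions.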
(6) computeOrReadCheckpoint either reads a materialized checkpoint through the parent's iterator, or calls the concrete RDD subclass's compute method; here we follow the MapPartitionsRDD implementation. computeOrReadCheckpoint:
private[spark] def computeOrReadCheckpoint(split: Partition, context: TaskContext): Iterator[T] =
{
  if (isCheckpointedAndMaterialized) {
    firstParent[T].iterator(split, context)
  } else {
    // invoke the algorithm that computes this partition
    compute(split, context)
  }
}
(7) MapPartitionsRDD.compute invokes the function we wrote in advance; the f below is our user-defined code:
f: (TaskContext, Int, Iterator[T]) => Iterator[U], // (TaskContext, partition index, iterator)

override def compute(split: Partition, context: TaskContext): Iterator[U] =
  f(context, split.index, firstParent[T].iterator(split, context))
Once the computation finishes, the result is written out by the write call in ShuffleMapTask.runTask.
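The way compute wraps the parent's iterator with f can be shown with plain iterators. A sketch under simplified assumptions (TaskContext elided; ComputeSketch, mapF are invented names):

```scala
// Sketch of MapPartitionsRDD.compute: the user code is captured as f and
// applied lazily to the parent partition's iterator.
object ComputeSketch {
  // rdd.map(g) is recorded roughly as this f:
  def mapF[T, U](g: T => U): (Int, Iterator[T]) => Iterator[U] =
    (partitionIndex, iter) => iter.map(g)

  // compute just forwards the parent iterator through f — fully lazy:
  // nothing runs until the resulting iterator is consumed.
  def compute[T, U](f: (Int, Iterator[T]) => Iterator[U],
                    partitionIndex: Int,
                    parent: Iterator[T]): Iterator[U] =
    f(partitionIndex, parent)
}
```

For example, `ComputeSketch.compute(ComputeSketch.mapF((x: Int) => x * 2), 0, Iterator(1, 2, 3)).toList` gives List(2, 4, 6); chained transformations simply stack such wrappers on one iterator.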
Conclusion
Running a Task on an Executor is, concretely, the process of using the RDD's compute to calculate one partition; the number of partitions therefore determines the degree of parallelism.
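The one-task-per-partition relationship can be illustrated without Spark. A sketch under stated assumptions (runJob and its even slicing scheme are illustrative, not Spark's actual scheduling code):

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// One "task" per partition: the partition count bounds how many slices
// can be computed in parallel.
object ParallelismSketch {
  def runJob(data: Seq[Int], numPartitions: Int): Seq[Int] = {
    val pool = Executors.newFixedThreadPool(numPartitions)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    // slice the data into numPartitions partitions
    val sliceSize = math.ceil(data.size.toDouble / numPartitions).toInt
    val partitions = data.grouped(sliceSize).toSeq
    // launch one task per partition, each computing a per-partition sum
    val futures = partitions.map(p => Future(p.sum))
    val result = Await.result(Future.sequence(futures), 10.seconds)
    pool.shutdown()
    result
  }
}
```

With 2 partitions over 1..10, two tasks run concurrently and yield the per-partition sums Seq(15, 40); with 10 partitions the same job could use up to 10 threads.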