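How the JobManager handles SubmitJob
The main steps of submitJob are summarized up front and then analyzed one by one.
- 1. Build the ExecutionGraph from the JobGraph;
- 2. Restore state from a checkpoint (CheckpointedState) or a savepoint;
- 3. Hand the executions to the Scheduler for scheduling;
- 3.1 Collect all vertices of the ExecutionGraph and allocate a slot for each of them;
- 3.2 Notify the TaskManagers to deploy each vertex into its allocated slot.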
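To make the summary concrete, here is a toy model of steps 1, 3.1 and 3.2 (state restore is left out). Every type and method in it (`Vertex`, `Slot`, `buildExecutionGraph`, `allocate`, `deploy`) is a hypothetical stand-in chosen for illustration, not a Flink class, and the real scheduler is far more involved.

```scala
// Toy model of the submission pipeline. All names are hypothetical
// stand-ins for illustration; none of them are Flink classes.
object SubmitJobSketch {
  final case class Vertex(name: String, parallelism: Int)
  final case class JobGraph(jobId: String, vertices: List[Vertex])
  final case class ExecutionVertex(name: String, subtask: Int)
  final case class ExecutionGraph(jobId: String, executionVertices: List[ExecutionVertex])
  final case class Slot(taskManager: String, index: Int)

  // Step 1: expand each job vertex into one execution vertex per parallel subtask.
  def buildExecutionGraph(jobGraph: JobGraph): ExecutionGraph =
    ExecutionGraph(
      jobGraph.jobId,
      for {
        v <- jobGraph.vertices
        i <- 0 until v.parallelism
      } yield ExecutionVertex(v.name, i))

  // Step 3.1: pair every execution vertex with a free slot, or fail if there are too few.
  def allocate(
      graph: ExecutionGraph,
      freeSlots: List[Slot]): Either[String, List[(ExecutionVertex, Slot)]] =
    if (freeSlots.size < graph.executionVertices.size)
      Left(s"need ${graph.executionVertices.size} slots, only ${freeSlots.size} free")
    else
      Right(graph.executionVertices.zip(freeSlots))

  // Step 3.2: "deploy" by describing what would be sent to each TaskManager.
  def deploy(assignments: List[(ExecutionVertex, Slot)]): List[String] =
    assignments.map { case (ev, slot) =>
      s"${ev.name}[${ev.subtask}] -> ${slot.taskManager}#${slot.index}"
    }
}
```

In this sketch a job with a `source` of parallelism 2 and a `sink` of parallelism 1 expands to three execution vertices, so `allocate` needs at least three free slots before `deploy` has anything to send.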
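The steps are analyzed one by one below:
1. Build the ExecutionGraph from the JobGraph
- On receiving a SubmitJob message, the JobManager creates a JobInfo object that holds the job's metadata and then calls the submitJob method.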
case SubmitJob(jobGraph, listeningBehaviour) =>
  val client = sender()
  val jobInfo = new JobInfo(client, listeningBehaviour, System.currentTimeMillis(),
    jobGraph.getSessionTimeout)
  submitJob(jobGraph, jobInfo)
- Inside submitJob, the first check is whether jobGraph is null; if it is, a JobResultFailure message is sent back to the client;
if (jobGraph == null) {
  jobInfo.notifyClients(
    decorateMessage(JobResultFailure(
      new SerializedThrowable(
        new JobSubmissionException(null, "JobGraph must not be null.")))))
}
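- Next, the job's library files and classpaths are registered with the library cache manager. This must be the very first action: if any later step throws an exception, it guarantees that the uploaded libraries and JARs can be removed from the library cache manager again.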
libraryCacheManager.registerJob(
  jobGraph.getJobID, jobGraph.getUserJarBlobKeys, jobGraph.getClasspaths)
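The "register first, clean up on any later failure" ordering can be sketched in isolation. The `LibraryCache` class and the `submit` helper below are hypothetical stand-ins written for this example; they only mimic the ordering described above and are not how the JobManager is actually structured.

```scala
import scala.util.{Failure, Success, Try}

object LibraryCleanupSketch {
  // Hypothetical stand-in for the library cache manager:
  // it only remembers which job IDs are currently registered.
  final class LibraryCache {
    private var registered = Set.empty[String]
    def registerJob(jobId: String): Unit = registered += jobId
    def unregisterJob(jobId: String): Unit = registered -= jobId
    def isRegistered(jobId: String): Boolean = registered.contains(jobId)
  }

  // Register first, then run the remaining setup. Because registration has
  // already happened, any non-fatal failure in `setup` has a well-defined
  // cleanup: unregister the job before handing the error back.
  def submit(cache: LibraryCache, jobId: String)(setup: => Unit): Try[Unit] = {
    cache.registerJob(jobId)
    Try(setup) match {
      case ok @ Success(_) => ok
      case failed @ Failure(_) =>
        cache.unregisterJob(jobId)
        failed
    }
  }
}
```

If registration happened later, a failure in an earlier step would leave the cleanup path unsure whether there is anything to remove; doing it first keeps the error handling uniform.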
- The next step obtains the user-code class loader (userCodeLoader) and the restart strategy (restartStrategy) to apply when the job fails;
val userCodeLoader = libraryCacheManager.getClassLoader(jobGraph.getJobID)
...
val restartStrategy =
  Option(jobGraph.getSerializedExecutionConfig()
    .deserializeValue(userCodeLoader)
    .getRestartStrategy())
    .map(RestartStrategyFactory.createRestartStrategy)
    .filter(p => p != null) match {
    case Some(strategy) => strategy
    case None => restartStrategyFactory.createRestartStrategy()
  }
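The Option chain above reads as "use the job's own restart strategy if it yields one, otherwise fall back to the cluster-wide factory". A minimal sketch of that fallback follows, using simplified hypothetical types (`RestartConfig`, `RestartStrategy`, `fromJobConfig`, `choose`) in place of Flink's classes; it assumes, as the null filter in the snippet suggests, that the per-job factory may return null to mean "no preference".

```scala
object RestartStrategySketch {
  // Hypothetical, simplified configuration and strategy types.
  sealed trait RestartConfig
  case object NoRestartConfig extends RestartConfig
  final case class FixedDelayConfig(attempts: Int) extends RestartConfig
  case object FallBackConfig extends RestartConfig // "let the cluster decide"

  sealed trait RestartStrategy
  case object NoRestart extends RestartStrategy
  final case class FixedDelay(attempts: Int) extends RestartStrategy

  // Assumed behaviour: returns null when the job expresses no preference.
  def fromJobConfig(config: RestartConfig): RestartStrategy = config match {
    case NoRestartConfig     => NoRestart
    case FixedDelayConfig(n) => FixedDelay(n)
    case FallBackConfig      => null
  }

  // Option(...) guards against a null config, filter drops a null strategy,
  // and getOrElse supplies the cluster default only when it is needed.
  def choose(jobConfig: RestartConfig, clusterDefault: => RestartStrategy): RestartStrategy =
    Option(jobConfig).map(fromJobConfig).filter(_ != null).getOrElse(clusterDefault)
}
```

Here `getOrElse` replaces the explicit `match` on `Some`/`None`; the two forms are equivalent, and the explicit match in the original simply spells out both branches.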
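- Next, the ExecutionGraph instance is obtained. The JobManager first looks the job ID up in its cache (currentJobs); if an entry exists, that graph is reused, otherwise a new one is created and then added to the cache;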
val registerNewGraph = currentJobs.get(jobGraph.getJobID) match {
  case Some((graph, currentJobInfo)) =>
    executionGraph = graph
    currentJobInfo.setLastActive()
    false
  case None =>
    true
}
val allocationTimeout: Long = flinkConfiguration.getLong(
  JobManagerOptions.SLOT_REQUEST_TIMEOUT)
val resultPartitionLocationTrackerProxy: ResultPartitionLocationTrackerProxy =
  new ResultPartitionLocationTrackerProxy(flinkConfiguration)
executionGraph = ExecutionGraphBuilder.buildGraph(
  executionGraph,
  jobGraph,
  flinkConfiguration,
  futureExecutor,
  ioExecutor,
  scheduler,
  userCodeLoader,
  checkpointRecoveryFactory,
  Time.of(timeout.length, timeout.unit),
  restartStrategy,