In the previous section we saw how the JobGraph is generated. In a cluster environment the JobGraph is produced on the client side, so the generated JobGraph and its related dependencies must be uploaded to the cluster before Flink can actually run our code.
Uploading the JobGraph
As before, let's start from the source code and briefly review how the JobGraph is generated.
At this point our JobGraph has been fully generated. Besides producing the JobGraph, this method also builds a cluster descriptor, which mainly records cluster configuration such as the JobManager memory, the TaskManager memory, and the number of slots per TaskManager. It also creates a cluster client that submits the JobGraph asynchronously, keeps communicating with the JobManager, and receives the result of the job execution.
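The settings the cluster descriptor records correspond to standard entries in `flink-conf.yaml`. An illustrative fragment (the values here are examples, not defaults from the source above):

```yaml
# Total process memory for the JobManager and each TaskManager
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 1728m
# How many slots each TaskManager offers
taskmanager.numberOfTaskSlots: 2
parallelism.default: 1
```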
This code mainly does three things:
1. Create a temporary file named flink-jobgraph*.bin and persist the JobGraph to disk.
2. Upload the JobGraph file, the user jars, and the related dependencies to HDFS.
3. Once the upload succeeds, delete the local JobGraph file. From then on the Flink cluster can easily fetch the JobGraph and everything else the job needs.
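The three steps above can be sketched with plain JDK classes, without any Flink dependency. A plain string stands in for the serialized JobGraph, and `user-code.jar` is a hypothetical jar name; only the `flink-jobgraph`/`.bin` temp-file naming is taken from the real client:

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SubmitFlowSketch {

    /** Runs the three-step flow; returns true if the local temp file was cleaned up. */
    public static boolean run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        String jobGraphStandIn = "serialized-job-graph"; // stand-in for the real JobGraph

        // Step 1: persist the "JobGraph" to a temp file named flink-jobgraph*.bin.
        CompletableFuture<Path> fileFuture = CompletableFuture.supplyAsync(() -> {
            try {
                Path f = Files.createTempFile("flink-jobgraph", ".bin");
                try (ObjectOutputStream out =
                        new ObjectOutputStream(Files.newOutputStream(f))) {
                    out.writeObject(jobGraphStandIn);
                }
                return f;
            } catch (IOException e) {
                throw new CompletionException(e);
            }
        }, pool);

        // Step 2: collect the files that would accompany the submit request.
        CompletableFuture<List<String>> uploads = fileFuture.thenApply(f -> {
            List<String> names = new ArrayList<>();
            names.add(f.getFileName().toString()); // the JobGraph file itself
            names.add("user-code.jar");            // hypothetical user jar
            return names;
        });

        // Step 3: after the (pretend) submission succeeds, delete the local file.
        Path f = fileFuture.join();
        System.out.println(uploads.join().size() + " files to upload");
        try {
            Files.delete(f);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        pool.shutdown();
        return Files.notExists(f);
    }

    public static void main(String[] args) {
        System.out.println("temp file deleted: " + run()); // prints: temp file deleted: true
    }
}
```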
@Override
public CompletableFuture<JobID> submitJob(@Nonnull JobGraph jobGraph) {
//Create a temp file named flink-jobgraph*.bin and persist the JobGraph to disk
CompletableFuture<java.nio.file.Path> jobGraphFileFuture =
CompletableFuture.supplyAsync(
() -> {
try {
final java.nio.file.Path jobGraphFile =
Files.createTempFile("flink-jobgraph", ".bin");
try (ObjectOutputStream objectOut =
new ObjectOutputStream(
Files.newOutputStream(jobGraphFile))) {
objectOut.writeObject(jobGraph);
}
return jobGraphFile;
} catch (IOException e) {
throw new CompletionException(
new FlinkException("Failed to serialize JobGraph.", e));
}
},
executorService);
//Upload the jars, the JobGraph file, and the related dependencies to HDFS
CompletableFuture<Tuple2<JobSubmitRequestBody, Collection<FileUpload>>> requestFuture =
jobGraphFileFuture.thenApply(
jobGraphFile -> {
List<String> jarFileNames = new ArrayList<>(8);
List<JobSubmitRequestBody.DistributedCacheFile> artifactFileNames =
new ArrayList<>(8);
Collection<FileUpload> filesToUpload = new ArrayList<>(8);
filesToUpload.add(
new FileUpload(
jobGraphFile, RestConstants.CONTENT_TYPE_BINARY));
for (Path jar : jobGraph.getUserJars()) {
jarFileNames.add(jar.getName());
filesToUpload.add(
new FileUpload(
Paths.get(jar.toUri()),
RestConstants.CONTENT_TYPE_JAR));
}
for (Map.Entry<String, DistributedCache.DistributedCacheEntry>
artifacts : jobGraph.getUserArtifacts().entrySet()) {
final Path artifactFilePath =
new Path(artifacts.getValue().filePath);
try {
// Only local artifacts need to be uploaded.
if (!artifactFilePath.getFileSystem().isDistributedFS()) {
artifactFileNames.add(
new JobSubmitRequestBody.DistributedCacheFile(
artifacts.getKey(),
artifactFilePath.getName()));
filesToUpload.add(
new FileUpload(
Paths.get(artifacts.getValue().filePath),
RestConstants.CONTENT_TYPE_BINARY));
}
} catch (IOException e) {
throw new CompletionException(
new FlinkException(
"Failed to get the FileSystem of artifact "
+ artifactFilePath
+ ".",
e));
}
}
final JobSubmitRequestBody requestBody =
new JobSubmitRequestBody(
jobGraphFile.getFileName().toString(),
jarFileNames,
artifactFileNames);
return Tuple2.of(
requestBody, Collections.unmodifiableCollection(filesToUpload));
});
final CompletableFuture<JobSubmitResponseBody> submissionFuture =
requestFuture.thenCompose(
requestAndFileUploads ->
sendRetriableRequest(
JobSubmitHeaders.getInstance(),
EmptyMessageParameters.getInstance(),
requestAndFileUploads.f0,
requestAndFileUploads.f1,
isConnectionProblemOrServiceUnavailable()));
//Delete the local JobGraph file once the upload succeeds
submissionFuture
.thenCombine(jobGraphFileFuture, (ignored, jobGraphFile) -> jobGraphFile)
.thenAccept(
jobGraphFile -> {
try {
Files.delete(jobGraphFile);
} catch (IOException e) {
LOG.warn("Could not delete temporary file {}.", jobGraphFile, e);
}
});
return submissionFuture
.thenApply(ignore -> jobGraph.getJobID())
.exceptionally(
(Throwable throwable) -> {
throw new CompletionException(
new JobSubmissionException(
jobGraph.getJobID(),
"Failed to submit JobGraph.",
ExceptionUtils.stripCompletionException(throwable)));
});
}
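The tail of `submitJob` maps a successful submission to the JobID and wraps any failure in a `JobSubmissionException` whose cause has been unwrapped by `ExceptionUtils.stripCompletionException`. That `exceptionally` pattern can be isolated in a small sketch; the names below are hypothetical stand-ins, not Flink's classes:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class ErrorWrapSketch {

    /** Maps success to the job id; wraps any failure in a domain-level exception. */
    static CompletableFuture<String> toJobId(
            CompletableFuture<Void> submission, String jobId) {
        return submission
                .thenApply(ignored -> jobId)
                .exceptionally(t -> {
                    // Strip the CompletionException wrapper, mirroring what
                    // ExceptionUtils.stripCompletionException does in the real client.
                    Throwable cause =
                            (t instanceof CompletionException && t.getCause() != null)
                                    ? t.getCause() : t;
                    throw new CompletionException(new IllegalStateException(
                            "Failed to submit job " + jobId, cause));
                });
    }

    public static void main(String[] args) {
        // Success path: the future simply yields the job id.
        CompletableFuture<Void> ok = CompletableFuture.completedFuture(null);
        System.out.println(toJobId(ok, "a1b2").join()); // prints: a1b2

        // Failure path: the original cause is preserved inside the wrapper.
        CompletableFuture<Void> failed = new CompletableFuture<>();
        failed.completeExceptionally(new RuntimeException("upload refused"));
        try {
            toJobId(failed, "a1b2").join();
        } catch (CompletionException e) {
            System.out.println(e.getCause().getMessage()); // prints: Failed to submit job a1b2
        }
    }
}
```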