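Preface
An earlier article analyzed how a job is submitted in the dedicated (per-job) mode of flink-yarn-session. Building on that, this article analyzes job submission in the shared mode of flink-yarn-session.
Job submission flow in flink-yarn-session shared mode
As the earlier article explained, the point where the dedicated mode and the shared mode of flink-yarn-session diverge is the runProgram method of org.apache.flink.client.cli.CliFrontend, shown below: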
private <T> void runProgram(
        CustomCommandLine<T> customCommandLine,
        CommandLine commandLine,
        RunOptions runOptions,
        PackagedProgram program) throws ProgramInvocationException, FlinkException {
    final ClusterDescriptor<T> clusterDescriptor = customCommandLine.createClusterDescriptor(commandLine);
    try {
        final T clusterId = customCommandLine.getClusterId(commandLine);
        final ClusterClient<T> client;
        // directly deploy the job if the cluster is started in job mode and detached
        if (clusterId == null && runOptions.getDetachedMode()) {
            int parallelism = runOptions.getParallelism() == -1 ? defaultParallelism : runOptions.getParallelism();
            final JobGraph jobGraph = PackagedProgramUtils.createJobGraph(program, configuration, parallelism);
            final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
            client = clusterDescriptor.deployJobCluster(
                clusterSpecification,
                jobGraph,
                runOptions.getDetachedMode());
            logAndSysout("Job has been submitted with JobID " + jobGraph.getJobID());
            try {
                client.shutdown();
            } catch (Exception e) {
                LOG.info("Could not properly shut down the client.", e);
            }
        } else {
            final Thread shutdownHook;
            if (clusterId != null) {
                client = clusterDescriptor.retrieve(clusterId);
                shutdownHook = null;
            } else {
                // also in job mode we have to deploy a session cluster because the job
                // might consist of multiple parts (e.g. when using collect)
                final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
                client = clusterDescriptor.deploySessionCluster(clusterSpecification);
                // if not running in detached mode, add a shutdown hook to shut down cluster if client exits
                // there's a race-condition here if cli is killed before shutdown hook is installed
                if (!runOptions.getDetachedMode() && runOptions.isShutdownOnAttachedExit()) {
                    shutdownHook = ShutdownHookUtil.addShutdownHook(client::shutDownCluster, client.getClass().getSimpleName(), LOG);
                } else {
                    shutdownHook = null;
                }
            }
            try {
                client.setPrintStatusDuringExecution(runOptions.getStdoutLogging());
                client.setDetached(runOptions.getDetachedMode());
                LOG.debug("{}", runOptions.getSavepointRestoreSettings());
                int userParallelism = runOptions.getParallelism();
                LOG.debug("User parallelism is set to {}", userParallelism);
                if (ExecutionConfig.PARALLELISM_DEFAULT == userParallelism) {
                    userParallelism = defaultParallelism;
                }
                executeProgram(program, client, userParallelism);
            } finally {
                if (clusterId == null && !client.isDetached()) {
                    // terminate the cluster only if we have started it before and if it's not detached
                    try {
                        client.shutDownCluster();
                    } catch (final Exception e) {
                        LOG.info("Could not properly terminate the Flink cluster.", e);
                    }
                    if (shutdownHook != null) {
                        // we do not need the hook anymore as we have just tried to shutdown the cluster.
                        ShutdownHookUtil.removeShutdownHook(shutdownHook, client.getClass().getSimpleName(), LOG);
                    }
                }
                try {
                    client.shutdown();
                } catch (Exception e) {
                    LOG.info("Could not properly shut down the client.", e);
                }
            }
        }
    } finally {
        try {
            clusterDescriptor.close();
        } catch (Exception e) {
            LOG.info("Could not properly close the cluster descriptor.", e);
        }
    }
}
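The branching in runProgram is easier to see once the cluster and client plumbing is removed. The following is a minimal sketch in plain Java, not Flink code: the class name, the Route enum, and the application id string are made up for illustration, and clusterId is modeled as a nullable String.

```java
// Simplified model of the three paths through runProgram. Not Flink code.
final class SubmissionRouteSketch {

    enum Route { DEPLOY_JOB_CLUSTER, RETRIEVE_EXISTING_CLUSTER, DEPLOY_SESSION_CLUSTER }

    // clusterId models the YARN application id from the command line (null if none was passed).
    static Route choose(String clusterId, boolean detached) {
        if (clusterId == null && detached) {
            // no cluster given and detached: deploy a cluster for this one job
            return Route.DEPLOY_JOB_CLUSTER;
        }
        if (clusterId != null) {
            // shared mode: attach to the cluster that is already running
            return Route.RETRIEVE_EXISTING_CLUSTER;
        }
        // no cluster given and attached: deploy a session cluster first
        return Route.DEPLOY_SESSION_CLUSTER;
    }

    public static void main(String[] args) {
        // "application_0000000000000_0001" is a hypothetical application id.
        System.out.println(choose("application_0000000000000_0001", false));
        System.out.println(choose(null, true));
        System.out.println(choose(null, false));
    }
}
```

Only the middle route reuses an existing cluster, which is the case this article follows.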
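Let us continue with job submission in the shared mode of flink-yarn-session. Before diving in, two questions are worth asking: what exactly does flink-yarn-session share, and how does the client interact with the shared object? The analysis below is organized around these two questions.
When a user submits a job and passes a clusterId (which corresponds to the YARN applicationId), the ClusterClient is obtained from that id. If no clusterId is given, a cluster is deployed first and the ClusterClient is created for it. Let us first see how the source obtains a ClusterClient from a clusterId. ClusterClient is an abstract class, so which concrete class do we actually end up with? Look at clusterDescriptor.retrieve(clusterId). As the earlier article showed, clusterDescriptor here is actually a YarnClusterDescriptor instance, so we go to its parent class AbstractYarnClusterDescriptor and read the retrieve method: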
@Override
public ClusterClient<ApplicationId> retrieve(ApplicationId applicationId) throws ClusterRetrieveException {
    try {
        // check if required Hadoop environment variables are set. If not, warn user
        if (System.getenv("HADOOP_CONF_DIR") == null &&
            System.getenv("YARN_CONF_DIR") == null) {
            LOG.warn("Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set." +
                "The Flink YARN Client needs one of these to be set to properly load the Hadoop " +
                "configuration for accessing YARN.");
        }
        final ApplicationReport appReport = yarnClient.getApplicationReport(applicationId);
        if (appReport.getFinalApplicationStatus() != FinalApplicationStatus.UNDEFINED) {
            // Flink cluster is not running anymore
            LOG.error("The application {} doesn't run anymore. It has previously completed with final status: {}",
                applicationId, appReport.getFinalApplicationStatus());
            throw new RuntimeException("The Yarn application " + applicationId + " doesn't run anymore.");
        }
        final String host = appReport.getHost();
        final int rpcPort = appReport.getRpcPort();
        LOG.info("Found application JobManager host name '{}' and port '{}' from supplied application id '{}'",
            host, rpcPort, applicationId);
        flinkConfiguration.setString(JobManagerOptions.ADDRESS, host);
        flinkConfiguration.setInteger(JobManagerOptions.PORT, rpcPort);
        flinkConfiguration.setString(RestOptions.ADDRESS, host);
        flinkConfiguration.setInteger(RestOptions.PORT, rpcPort);
        return createYarnClusterClient(
            this,
            -1, // we don't know the number of task managers of a started Flink cluster
            -1, // we don't know how many slots each task manager has for a started Flink cluster
            appReport,
            flinkConfiguration,
            false);
    } catch (Exception e) {
        throw new ClusterRetrieveException("Couldn't retrieve Yarn cluster", e);
    }
}
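From the code above we can read off the flow. First, yarnClient uses the clusterId to fetch an ApplicationReport (a class from the YARN client API that holds the application status information obtained through yarnClient). Here the ApplicationReport effectively carries the information of the Flink JobManager. How, then, does the client talk to the JobManager? Through the host and rpcPort values of the ApplicationReport, which are the address and port for communicating with the JobManager. One thing looks odd: why is the same data written into flinkConfiguration twice, under two different option names? Let us look at the source: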
/**
 * Configuration options for the JobManager.
 */
@PublicEvolving
public class JobManagerOptions {
    /**
     * The config parameter defining the network address to connect to
     * for communication with the job manager.
     *
     * <p>This value is only interpreted in setups where a single JobManager with static
     * name or address exists (simple standalone setups, or container setups with dynamic
     * service name resolution). It is not used in many high-availability setups, when a
     * leader-election service (like ZooKeeper) is used to elect and discover the JobManager
     * leader from potentially multiple standby JobManagers.
     */
    public static final ConfigOption<String> ADDRESS =
        key("jobmanager.rpc.address")
        .noDefaultValue()
        .withDescription("The config parameter defining the network address to connect to" +
            " for communication with the job manager." +
            " This value is only interpreted in setups where a single JobManager with static" +
            " name or address exists (simple standalone setups, or container setups with dynamic" +
            " service name resolution). It is not used in many high-availability setups, when a" +
            " leader-election service (like ZooKeeper) is used to elect and discover the JobManager" +
            " leader from potentially multiple standby JobManagers.");
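JobManagerOptions.ADDRESS maps to the key jobmanager.rpc.address, the address used to communicate with the JobManager. According to its Javadoc, this value is not used in many high-availability setups, where the JobManager leader is discovered through a leader-election service instead. Now the RestOptions code: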
/**
 * The address that should be used by clients to connect to the server.
 */
public static final ConfigOption<String> ADDRESS =
    key("rest.address")
        .noDefaultValue()
        .withFallbackKeys(JobManagerOptions.ADDRESS.key())
        .withDescription("The address that should be used by clients to connect to the server.");
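RestOptions.ADDRESS maps to the key rest.address. It is the address clients use to connect to the server, that is, the address of the REST endpoint. The comparison explains the two writes: the keys differ because they belong to different contexts, one for RPC communication with the JobManager and one for the REST endpoint. Note also that rest.address declares jobmanager.rpc.address as its fallback key.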
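To make the idea of a fallback key concrete, here is a minimal sketch of a lookup that tries the primary key first and then each fallback key in order. It is a simplified model over a plain Map, not Flink's Configuration class; the class name, the lookup helper, and the host name are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Simplified model of "primary key with fallback keys". Not Flink's implementation.
final class FallbackKeySketch {

    // Returns the value stored under key, or under the first fallback key that is present.
    static Optional<String> lookup(Map<String, String> conf, String key, String... fallbackKeys) {
        String value = conf.get(key);
        if (value != null) {
            return Optional.of(value);
        }
        for (String fallback : fallbackKeys) {
            value = conf.get(fallback);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // "jm-host.example" is a hypothetical host name.
        conf.put("jobmanager.rpc.address", "jm-host.example");
        // rest.address is absent, so the fallback key supplies the value.
        System.out.println(lookup(conf, "rest.address", "jobmanager.rpc.address"));
    }
}
```

Under this model, a configuration that only sets jobmanager.rpc.address still yields a REST address. The retrieve method shown earlier sets both keys explicitly, so it does not rely on the fallback.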
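At this point we know what is shared in shared mode: the JobManager and the TaskManagers it manages, in other words one Flink cluster. Going back to the retrieve method of AbstractYarnClusterDescriptor, it ends by calling createYarnClusterClient to return the ClusterClient. Since createYarnClusterClient is abstract, we need to look at the concrete implementation: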
@Override
protected ClusterClient<ApplicationId> createYarnClusterClient(
        AbstractYarnClusterDescriptor descriptor,
        int numberTaskManagers,
        int slotsPerTaskManager,
        ApplicationReport report,
        Configuration flinkConfiguration,
        boolean perJobCluster) throws Exception {
    return new RestClusterClient<>(
        flinkConfiguration,
        report.getApplicationId());
}
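So in runProgram, in shared mode, the ClusterClient obtained from a clusterId is actually a RestClusterClient. What happens when no clusterId is passed? How does Flink obtain the ClusterClient then, and what is its concrete type? Without a clusterId, Flink first deploys a cluster on YARN by calling: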
client = clusterDescriptor.deploySessionCluster(clusterSpecification);
We already know that clusterDescriptor is a YarnClusterDescriptor instance, so let us see how the deployment is done:
@Override
public ClusterClient<ApplicationId> deploySessionCluster(ClusterSpecification clusterSpecification) throws ClusterDeploymentException {
    try {
        return deployInternal(
            clusterSpecification,
            "Flink session cluster",
            getYarnSessionClusterEntrypoint(),
            null,
            false);
    } catch (Exception e) {
        throw new ClusterDeploymentException("Couldn't deploy Yarn session cluster", e);
    }
}
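deploySessionCluster calls deployInternal directly. Readers of the earlier article will recognize deployInternal: it is also what job submission in dedicated mode ends up calling, only with different arguments. deployInternal in turn ends by calling the createYarnClusterClient method discussed above, so it also returns a RestClusterClient. The name suggests that RestClusterClient talks to the server (the JobManager) over HTTP in a RESTful style. Is that really the case? The class-level comment on RestClusterClient answers it: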
/**
 * A {@link ClusterClient} implementation that communicates via HTTP REST requests.
 */
public class RestClusterClient<T> extends ClusterClient<T> implements NewClusterClient {
    // rest of the class omitted
}
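It is. In other words, when the AM/JobManager starts, it actually starts a web server as well, the one behind the web monitoring page, and RestClusterClient sends its HTTP requests to it.
Now that we have the ClusterClient, let us look at how executeProgram runs: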
protected void executeProgram(PackagedProgram program, ClusterClient<?> client, int parallelism) throws ProgramMissingJobException, ProgramInvocationException {
    logAndSysout("Starting execution of program");
    final JobSubmissionResult result = client.run(program, parallelism);
    if (null == result) {
        throw new ProgramMissingJobException("No JobSubmissionResult returned, please make sure you called " +
            "ExecutionEnvironment.execute()");
    }
    if (result.isJobExecutionResult()) {
        logAndSysout("Program execution finished");
        JobExecutionResult execResult = result.getJobExecutionResult();
        System.out.println("Job with JobID " + execResult.getJobID() + " has finished.");
        System.out.println("Job Runtime: " + execResult.getNetRuntime() + " ms");
        Map<String, Object> accumulatorsResult = execResult.getAllAccumulatorResults();
        if (accumulatorsResult.size() > 0) {
            System.out.println("Accumulator Results: ");
            System.out.println(AccumulatorHelper.getResultsFormatted(accumulatorsResult));
        }
    } else {
        logAndSysout("Job has been submitted with JobID " + result.getJobID());
    }
}
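executeProgram calls the run method of the client (here a RestClusterClient, which inherits run from ClusterClient) and gets back a JobSubmissionResult. Let us follow the code into that run method: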
/**
 * General purpose method to run a user jar from the CliFrontend in either blocking or detached mode, depending
 * on whether {@code setDetached(true)} or {@code setDetached(false)}.
 * @param prog the packaged program
 * @param parallelism the parallelism to execute the contained Flink job
 * @return The result of the execution
 * @throws ProgramMissingJobException
 * @throws ProgramInvocationException
 */
public JobSubmissionResult run(PackagedProgram prog, int parallelism)
        throws ProgramInvocationException, ProgramMissingJobException {
    final ClassLoader contextClassLoader = Thread.currentThread().getContextClassLoader();
    try {
        Thread.currentThread().setContextClassLoader(prog.getUserCodeClassLoader());
        if (prog.isUsingProgramEntryPoint()) {
            final JobWithJars jobWithJars = prog.getPlanWithJars();
            return run(jobWithJars, parallelism, prog.getSavepointSettings());
        }
        else if (prog.isUsingInteractiveMode()) {
            log.info("Starting program in interactive mode (detached: {})", isDetached());
            final List<URL> libraries = prog.getAllLibraries();
            ContextEnvironmentFactory factory = new ContextEnvironmentFactory(this, libraries,
                prog.getClasspaths(), prog.getUserCodeClassLoader(), parallelism, isDetached(),
                prog.getSavepointSettings());
            ContextEnvironment.setAsContext(factory);
            try {
                // invoke main method
                prog.invokeInteractiveModeForExecution();
                if (lastJobExecutionResult == null && factory.getLastEnvCreated() == null) {
                    throw new ProgramMissingJobException("The program didn't contain a Flink job.");
                }
                if (isDetached()) {
                    // in detached mode, we execute the whole user code to extract the Flink job, afterwards we run it here
                    return ((DetachedEnvironment) factory.getLastEnvCreated()).finalizeExecute();
                }
                else {
                    // in blocking mode, we execute all Flink jobs contained in the user code and then return here
                    return this.lastJobExecutionResult;
                }
            }
            finally {
                ContextEnvironment.unsetContext();
            }
        }
        else {
            throw new ProgramInvocationException("PackagedProgram does not have a valid invocation mode.");
        }
    }
    finally {
        Thread.currentThread().setContextClassLoader(contextClassLoader);
    }
}
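The run method above brackets everything it does with a swap of the thread's context class loader: it installs the user-code class loader on entry and restores the previous one in a finally block. The pattern in isolation looks roughly like the sketch below; the class name and the callWith helper are hypothetical, and an empty URLClassLoader stands in for the user-code class loader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

// Sketch of the "swap context class loader, restore in finally" pattern. Not Flink code.
final class ContextClassLoaderSketch {

    // Runs action with userLoader as the thread's context class loader,
    // then restores whatever loader was installed before, even if action throws.
    static <T> T callWith(ClassLoader userLoader, Supplier<T> action) {
        final Thread thread = Thread.currentThread();
        final ClassLoader original = thread.getContextClassLoader();
        try {
            thread.setContextClassLoader(userLoader);
            return action.get();
        } finally {
            thread.setContextClassLoader(original);
        }
    }

    public static void main(String[] args) throws Exception {
        // An empty URLClassLoader plays the role of the user-code class loader here.
        try (URLClassLoader userLoader = new URLClassLoader(new URL[0])) {
            boolean seenInside = callWith(userLoader,
                () -> Thread.currentThread().getContextClassLoader() == userLoader);
            System.out.println("user loader active inside the call: " + seenInside);
        }
    }
}
```

Restoring in finally matters because the client thread keeps running after the user code returns or fails, and code that later consults the context class loader should not see the user jar's loader.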
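The earlier article mentioned the PackagedProgram class; a brief recap is useful here. PackagedProgram does three things:
- Extracts the jars the job depends on from the job jar (it walks the jar's directory structure; if a lib directory exists, it scans it for jar files, extracts any it finds, and writes them to a local temporary directory for later use);
- Determines the job's entry class from the jar (it parses the jar manifest; a program-class entry takes precedence, otherwise it looks for Main-Class);
- Obtains the job's execution plan from the jar;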
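The second item, picking the entry class from the manifest, can be sketched with the JDK's own java.util.jar API. This is a minimal illustration of the precedence rule described above, not the PackagedProgram source; the class name, the helper name, and the com.example class names are assumptions.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Sketch of the manifest lookup order: program-class first, then Main-Class.
final class EntryClassSketch {

    // Returns the entry class name declared in the manifest, or null if neither attribute is set.
    static String resolveEntryClassName(Manifest manifest) {
        Attributes attributes = manifest.getMainAttributes();
        String className = attributes.getValue("program-class");
        if (className == null) {
            className = attributes.getValue("Main-Class");
        }
        return className;
    }

    public static void main(String[] args) {
        Manifest manifest = new Manifest();
        // Both class names below are hypothetical.
        manifest.getMainAttributes().putValue("Main-Class", "com.example.Launcher");
        manifest.getMainAttributes().putValue("program-class", "com.example.WordCount");
        System.out.println(resolveEntryClassName(manifest));
    }
}
```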
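The run method (inherited by RestClusterClient from ClusterClient) branches on the kind of entry class in prog. If the entry class implements org.apache.flink.api.common.Program, one branch is taken; if an entry class exists but does not implement org.apache.flink.api.common.Program, the other branch is taken. We start with the first branch.
When the entry class implements org.apache.flink.api.common.Program, Flink calls the overloaded run method: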
/**
 * Runs a program on the Flink cluster to which this client is connected. The call blocks until the
 * execution is complete, and returns afterwards.
 *
 * @param jobWithJars The program to be executed.
 * @param parallelism The default parallelism to use when running the program. The default parallelism is used
 *                    when the program does not set a parallelism by itself.
 *
 * @throws CompilerException Thrown, if the compiler encounters an illegal situation.
 * @throws ProgramInvocationException Thrown, if the program could not be instantiated from its jar file,
 *                                    or if the submission failed. That might be either due to an I/O problem,
 *                                    i.e. the job-manager is unreachable, or due to the fact that the
 *                                    parallel execution failed.
 */
public JobSubmissionResult run(JobWithJars jobWithJars, int parallelism, SavepointRestoreSettings savepointSettings)
        throws CompilerException, ProgramInvocationException {
    ClassLoader classLoader = jobWithJars.getUserCodeClassLoader();
    if (classLoader == null) {
        throw new IllegalArgumentException("The given JobWithJars does not provide a usercode class loader.");
    }
    OptimizedPlan optPlan = getOptimizedPlan(compiler, jobWithJars, parallelism);
    return run(optPlan, jobWithJars.getJarFiles(), jobWithJars.getClasspaths(), classLoader, savepointSettings);
}
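According to its Javadoc, this call blocks until execution completes. It obtains the optimized execution plan and then calls another run overload: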
public JobSubmissionResult run(FlinkPlan compiledPlan,
        List<URL> libraries, List<URL> classpaths, ClassLoader classLoader, SavepointRestoreSettings savepointSettings)
        throws ProgramInvocationException {
    JobGraph job = getJobGraph(flinkConfig, compiledPlan, libraries, classpaths, savepointSettings);
    return submitJob(job, classLoader);
}
This method builds the JobGraph and then calls submitJob. We keep following:
@Override
public JobSubmissionResult submitJob(JobGraph jobGraph, ClassLoader classLoader) throws ProgramInvocationException {
    log.info("Submitting job {} (detached: {}).", jobGraph.getJobID(), isDetached());
    final CompletableFuture<JobSubmissionResult> jobSubmissionFuture = submitJob(jobGraph);
    if (isDetached()) {
        try {
            return jobSubmissionFuture.get();
        } catch (Exception e) {
            throw new ProgramInvocationException("Could not submit job",
                jobGraph.getJobID(), ExceptionUtils.stripExecutionException(e));
        }
    } else {
        final CompletableFuture<JobResult> jobResultFuture = jobSubmissionFuture.thenCompose(
            ignored -> requestJobResult(jobGraph.getJobID()));
        final JobResult jobResult;
        try {
            jobResult = jobResultFuture.get();
        } catch (Exception e) {
            throw new ProgramInvocationException("Could not retrieve the execution result.",
                jobGraph.getJobID(), ExceptionUtils.stripExecutionException(e));
        }
        try {
            this.lastJobExecutionResult = jobResult.toJobExecutionResult(classLoader);
            return lastJobExecutionResult;
        } catch (JobExecutionException e) {
            throw new ProgramInvocationException("Job failed.", jobGraph.getJobID(), e);
        } catch (IOException | ClassNotFoundException e) {
            throw new ProgramInvocationException("Job failed.", jobGraph.getJobID(), e);
        }
    }
}
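The difference between the detached and the attached path in submitJob comes down to how far the future chain is awaited. The sketch below models that with plain CompletableFuture: the submit and requestResult stand-ins, the String results, and the job id are all made up for illustration, and join() is used instead of get() only to keep the sketch free of checked exceptions.

```java
import java.util.concurrent.CompletableFuture;

// Sketch of "detached waits for the submission, attached also waits for the result". Not Flink code.
final class DetachedVersusAttachedSketch {

    // Stand-in for the asynchronous submission; completes with the job id.
    static CompletableFuture<String> submit(String jobId) {
        return CompletableFuture.supplyAsync(() -> jobId);
    }

    // Stand-in for asking the cluster for the final job result.
    static CompletableFuture<String> requestResult(String jobId) {
        return CompletableFuture.supplyAsync(() -> "result of " + jobId);
    }

    static String run(String jobId, boolean detached) {
        CompletableFuture<String> submission = submit(jobId);
        if (detached) {
            // Detached: return as soon as the submission is acknowledged.
            return "submitted " + submission.join();
        }
        // Attached: chain the result request onto the submission and wait for it.
        return submission.thenCompose(DetachedVersusAttachedSketch::requestResult).join();
    }

    public static void main(String[] args) {
        // "job-42" is a hypothetical job id.
        System.out.println(run("job-42", true));
        System.out.println(run("job-42", false));
    }
}
```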
This method in turn calls the overloaded submitJob. We keep tracing:
/**
 * Submits the given {@link JobGraph} to the dispatcher.
 *
 * @param jobGraph to submit
 * @return Future which is completed with the submission response
 */
@Override
public CompletableFuture<JobSubmissionResult> submitJob(@Nonnull JobGraph jobGraph) {
    // we have to enable queued scheduling because slot will be allocated lazily
    jobGraph.setAllowQueuedScheduling(true);
    CompletableFuture<java.nio.file.Path> jobGraphFileFuture = CompletableFuture.supplyAsync(() -> {
        try {
            final java.nio.file.Path jobGraphFile = Files.createTempFile("flink-jobgraph", ".bin");
            try (ObjectOutputStream objectOut = new ObjectOutputStream(Files.newOutputStream(jobGraphFile))) {
                objectOut.writeObject(jobGraph);
            }
            return jobGraphFile;
        } catch (IOException e) {
            throw new CompletionException(new FlinkException("Failed to serialize JobGraph.", e));
        }
    }, executorService);
    CompletableFuture<Tuple2<JobSubmitRequestBody, Collection<FileUpload>>> requestFuture = jobGraphFileFuture.thenApply(jobGraphFile -> {
        List<String> jarFileNames = new ArrayList<>(8);
        List<JobSubmitRequestBody.DistributedCacheFile> artifactFileNames = new ArrayList<>(8);
        Collection<FileUpload> filesToUpload = new ArrayList<>(8);
        filesToUpload.add(new FileUpload(jobGraphFile, RestConstants.CONTENT_TYPE_BINARY));
        for (Path jar : jobGraph.getUserJars()) {
            jarFileNames.add(jar.getName());
            filesToUpload.add(new FileUpload(Paths.get(jar.toUri()), RestConstants.CONTENT_TYPE_JAR));
        }
        for (Map.Entry<String, DistributedCache.DistributedCacheEntry> artifacts : jobGraph.getUserArtifacts().entrySet()) {
            artifactFileNames.add(new JobSubmitRequestBody.DistributedCacheFile(artifacts.getKey(), new Path(artifacts.getValue().filePath).getName()));
            filesToUpload.add(new FileUpload(Paths.get(artifacts.getValue().filePath), RestConstants.CONTENT_TYPE_BINARY));
        }
        final JobSubmitRequestBody requestBody = new JobSubmitRequestBody(
            jobGraphFile.getFileName().toString(),
            jarFileNames,
            artifactFileNames);
        return Tuple2.of(requestBody, Collections.unmodifiableCollection(filesToUpload));
    });
    final CompletableFuture<JobSubmitResponseBody> submissionFuture = requestFuture.thenCompose(
        requestAndFileUploads -> sendRetriableRequest(
            JobSubmitHeaders.getInstance(),
            EmptyMessageParameters.getInstance(),
            requestAndFileUploads.f0,
            requestAndFileUploads.f1,
            isConnectionProblemOrServiceUnavailable())
    );
    submissionFuture
        .thenCombine(jobGraphFileFuture, (ignored, jobGraphFile) -> jobGraphFile)
        .thenAccept(jobGraphFile -> {
            try {
                Files.delete(jobGraphFile);
            } catch (IOException e) {
                log.warn("Could not delete temporary file {}.", jobGraphFile, e);
            }
        });
    return submissionFuture
        .thenApply(
            (JobSubmitResponseBody jobSubmitResponseBody) -> new JobSubmissionResult(jobGraph.getJobID()))
        .exceptionally(
            (Throwable throwable) -> {
                throw new CompletionException(new JobSubmissionException(jobGraph.getJobID(), "Failed to submit JobGraph.", ExceptionUtils.stripCompletionException(throwable)));
            });
}
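This is the method that finally submits the job. It mainly does the following:
- Serializes the JobGraph object into a local temporary file;
- Collects the files the JobManager needs (the serialized JobGraph, the user jars, and the distributed-cache artifacts) for upload;
- Sends the submit request together with those files to the JobManager, deletes the temporary file afterwards, and finally returns a JobSubmissionResult;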
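The serialize, send, clean-up shape of these steps can be tried out without a cluster. In the sketch below the HTTP upload is replaced by a stand-in that simply reads the file back, so everything except the java.io, java.nio.file, and CompletableFuture calls is hypothetical (class name, Outcome holder, fakeUpload, the payload string). Unlike the Flink method, which deletes the file on a side chain, the sketch folds the clean-up into the returned future so that the deletion is observable when the future completes.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;

// Sketch of "serialize to temp file, send, delete temp file". Not Flink code.
final class TempFileSubmissionSketch {

    // Pairs the stand-in response with the temp file that carried the payload.
    static final class Outcome {
        final Object response;
        final Path tempFile;

        Outcome(Object response, Path tempFile) {
            this.response = response;
            this.tempFile = tempFile;
        }
    }

    // Step 1: serialize the payload into a local temporary file.
    static Path writeToTempFile(Serializable payload) {
        try {
            Path file = Files.createTempFile("sketch-jobgraph", ".bin");
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(payload);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Steps 2 and 3, faked: instead of an HTTP upload, deserialize the file as a receiver would.
    static Object fakeUpload(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static CompletableFuture<Outcome> submit(Serializable payload) {
        CompletableFuture<Path> fileFuture =
            CompletableFuture.supplyAsync(() -> writeToTempFile(payload));
        CompletableFuture<Object> responseFuture =
            fileFuture.thenApply(TempFileSubmissionSketch::fakeUpload);
        // Once the "request" has completed, delete the temp file and hand back the response.
        return responseFuture.thenCombine(fileFuture, (response, file) -> {
            try {
                Files.deleteIfExists(file);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return new Outcome(response, file);
        });
    }

    public static void main(String[] args) {
        Outcome outcome = submit("job-graph-placeholder").join();
        System.out.println("response: " + outcome.response);
        System.out.println("temp file still exists: " + Files.exists(outcome.tempFile));
    }
}
```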
That ends the first branch. Now the second branch, which ends up calling prog.invokeInteractiveModeForExecution(). Here is the implementation of that method:
/**
 * This method assumes that the context environment is prepared, or the execution
 * will be a local execution by default.
 */
public void invokeInteractiveModeForExecution() throws ProgramInvocationException{
    if (isUsingInteractiveMode()) {
        callMainMethod(mainClass, args);
    } else {
        throw new ProgramInvocationException("Cannot invoke a plan-based program directly.");
    }
}
This method in turn calls callMainMethod. We keep tracing:
private static void callMainMethod(Class<?> entryClass, String[] args) throws ProgramInvocationException {
    Method mainMethod;
    if (!Modifier.isPublic(entryClass.getModifiers())) {
        throw new ProgramInvocationException("The class " + entryClass.getName() + " must be public.");
    }
    try {
        mainMethod = entryClass.getMethod("main", String[].class);
    } catch (NoSuchMethodException e) {
        throw new ProgramInvocationException("The class " + entryClass.getName() + " has no main(String[]) method.");
    }
    catch (Throwable t) {
        throw new ProgramInvocationException("Could not look up the main(String[]) method from the class " +
            entryClass.getName() + ": " + t.getMessage(), t);
    }
    if (!Modifier.isStatic(mainMethod.getModifiers())) {
        throw new ProgramInvocationException("The class " + entryClass.getName() + " declares a non-static main method.");
    }
    if (!Modifier.isPublic(mainMethod.getModifiers())) {
        throw new ProgramInvocationException("The class " + entryClass.getName() + " declares a non-public main method.");
    }
    try {
        mainMethod.invoke(null, (Object) args);
    }
    catch (IllegalArgumentException e) {
        throw new ProgramInvocationException("Could not invoke the main method, arguments are not matching.", e);
    }
    catch (IllegalAccessException e) {
        throw new ProgramInvocationException("Access to the main method was denied: " + e.getMessage(), e);
    }
    catch (InvocationTargetException e) {
        Throwable exceptionInMethod = e.getTargetException();
        if (exceptionInMethod instanceof Error) {
            throw (Error) exceptionInMethod;
        } else if (exceptionInMethod instanceof ProgramParametrizationException) {
            throw (ProgramParametrizationException) exceptionInMethod;
        } else if (exceptionInMethod instanceof ProgramInvocationException) {
            throw (ProgramInvocationException) exceptionInMethod;
        } else {
            throw new ProgramInvocationException("The main method caused an error: " + exceptionInMethod.getMessage(), exceptionInMethod);
        }
    }
    catch (Throwable t) {
        throw new ProgramInvocationException("An error occurred while invoking the program's main method: " + t.getMessage(), t);
    }
}
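This method finally invokes the main method of the job's entry class through reflection, which means the user's main method runs inside the client process. Because run installed a ContextEnvironmentFactory bound to this client before making the call, the Flink jobs defined in that main method are still handed to the client for submission to the cluster.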
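The reflective call at the heart of callMainMethod can be reproduced in a few lines. The sketch below keeps only the public and static checks and the invoke call; the class names and the nested Demo program are hypothetical, and all failures are folded into unchecked exceptions for brevity, which the Flink method does not do.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Sketch of invoking a program's static main(String[]) through reflection. Not Flink code.
final class MainInvokerSketch {

    // Hypothetical user program: records the arguments it was started with.
    public static final class Demo {
        public static String[] received;

        public static void main(String[] args) {
            received = args;
        }
    }

    static void callMain(Class<?> entryClass, String[] args) {
        if (!Modifier.isPublic(entryClass.getModifiers())) {
            throw new IllegalArgumentException("The class " + entryClass.getName() + " must be public.");
        }
        try {
            Method mainMethod = entryClass.getMethod("main", String[].class);
            if (!Modifier.isStatic(mainMethod.getModifiers())) {
                throw new IllegalArgumentException("main must be static in " + entryClass.getName());
            }
            // The cast to Object passes the array as one argument instead of spreading it as varargs.
            mainMethod.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not invoke main of " + entryClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        callMain(Demo.class, new String[] {"--input", "words.txt"});
        System.out.println("main received " + Demo.received.length + " arguments");
    }
}
```

The call happens in the calling thread of the calling JVM, which is why whatever the user's main method does before reaching the execution environment runs on the client side.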
This concludes the Flink source analysis of job submission in flink-yarn-session shared mode.