Flink Source Code Analysis: System Startup and Job Submission


Preface

Flink on YARN supports two main deployment modes:

1. session mode
2. per-job mode (a single job submitted as its own YARN application)

This article focuses on the second mode and walks through how a job is submitted to YARN in that case.

For background, see the official documentation:

https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/yarn_setup.html

The launch command is:
./bin/flink run -m yarn-cluster -yn 2 ./examples/batch/WordCount.jar
The code discussed here is the latest flink-1.8-SNAPSHOT cloned from Git.

Entry Point

Looking at the contents of bin/flink, we can see that the class ultimately executed is the one below; we start from its main method.
org.apache.flink.client.cli.CliFrontend

parseParameters then dispatches to a different method for each action. Since run follows ./bin/flink in our command, the action is run, so we enter the run method.
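
To make the dispatch concrete, below is a minimal, self-contained sketch of the kind of action dispatch parseParameters performs. The action names mirror the real CLI ("run", "list", "cancel"), but the class and its method bodies are illustrative placeholders, not the actual CliFrontend code.

	import java.util.Arrays;

	// Minimal sketch of the action dispatch in CliFrontend#parseParameters.
	// The action names mirror the real CLI; the bodies are placeholders.
	public class CliDispatchSketch {

		public int parseParameters(String[] args) {
			if (args.length < 1) {
				System.err.println("Please specify an action.");
				return 1;
			}
			final String action = args[0];
			final String[] params = Arrays.copyOfRange(args, 1, args.length);
			switch (action) {
				case "run":
					return run(params);
				case "list":
					return list(params);
				case "cancel":
					return cancel(params);
				default:
					System.err.println("Unknown action: " + action);
					return 1;
			}
		}

		private int run(String[] params)    { System.out.println("run with " + params.length + " args"); return 0; }
		private int list(String[] params)   { System.out.println("list");   return 0; }
		private int cancel(String[] params) { System.out.println("cancel"); return 0; }

		public static void main(String[] args) {
			System.exit(new CliDispatchSketch().parseParameters(args));
		}
	}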

Entering the run method:
final CustomCommandLine<?> customCommandLine = getActiveCustomCommandLine(commandLine);
Based on the command-line arguments, this returns FlinkYarnSessionCli, a subclass of CustomCommandLine.

We then step into runProgram:


	private <T> void runProgram(
			CustomCommandLine<T> customCommandLine,
			CommandLine commandLine,
			RunOptions runOptions,
			PackagedProgram program) throws ProgramInvocationException, FlinkException {

		// customCommandLine is the FlinkYarnSessionCli instance obtained above;
		// createClusterDescriptor returns an instance of YarnClusterDescriptor,
		// the ClusterDescriptor implementation for YARN.
		final ClusterDescriptor<T> clusterDescriptor = customCommandLine.createClusterDescriptor(commandLine);

		try {
			final T clusterId = customCommandLine.getClusterId(commandLine);

			final ClusterClient<T> client;

			...............................

			// deploy the cluster
			final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
			client = clusterDescriptor.deploySessionCluster(clusterSpecification);

Inside this method the AbstractYarnClusterDescriptor is obtained via
org.apache.flink.yarn.cli.FlinkYarnSessionCli#createDescriptor

which constructs the AbstractYarnClusterDescriptor, sets its parameters, and returns the instance:


	private AbstractYarnClusterDescriptor createDescriptor(
			Configuration configuration,
			YarnConfiguration yarnConfiguration,
			String configurationDirectory,
			CommandLine cmd) {

		AbstractYarnClusterDescriptor yarnClusterDescriptor = getClusterDescriptor(
			configuration,
			yarnConfiguration,
			configurationDirectory);

		..........................

		return yarnClusterDescriptor;
	}

It constructs a YarnClient, which is used to communicate with the YARN cluster and manage YARN applications:

	private AbstractYarnClusterDescriptor getClusterDescriptor(
			Configuration configuration,
			YarnConfiguration yarnConfiguration,
			String configurationDirectory) {
		final YarnClient yarnClient = YarnClient.createYarnClient();
		yarnClient.init(yarnConfiguration);
		yarnClient.start();

		return new YarnClusterDescriptor(
			configuration,
			yarnConfiguration,
			configurationDirectory,
			yarnClient,
			false);
	}
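
As an aside, once a YarnClient has been initialized and started like this, it can be used to query the cluster; the resource checks performed later during deployment rely on this kind of call. The snippet below is only an illustrative sketch using the standard Hadoop YarnClient API, not Flink code.

	import org.apache.hadoop.yarn.api.records.NodeReport;
	import org.apache.hadoop.yarn.api.records.NodeState;
	import org.apache.hadoop.yarn.client.api.YarnClient;
	import org.apache.hadoop.yarn.conf.YarnConfiguration;

	// Illustrative only: list the running NodeManagers and their resources
	// through a started YarnClient. Flink's deployment code performs its own,
	// more detailed validation.
	public class YarnClientQuerySketch {

		public static void main(String[] args) throws Exception {
			final YarnClient yarnClient = YarnClient.createYarnClient();
			yarnClient.init(new YarnConfiguration());
			yarnClient.start();

			for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
				System.out.println(node.getNodeId()
					+ " capability=" + node.getCapability()
					+ " used=" + node.getUsed());
			}

			yarnClient.stop();
		}
	}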


Starting the Flink Cluster

final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
client = clusterDescriptor.deploySessionCluster(clusterSpecification);

Tracing the code further, we eventually reach org.apache.flink.yarn.AbstractYarnClusterDescriptor#deployInternal.

In this method, resources are requested from YARN and the ApplicationMaster/JobManager is started.

The method does quite a lot: it runs a series of checks and preparations, such as validating the configuration and the YARN cluster's available resources, and then calls startAppMaster to launch the ApplicationMaster/JobManager.

startAppMaster is roughly 400 lines long; it performs extensive checks and initialization, such as uploading the required jar files, and finally submits the application through the YARN client:

	LOG.info("Submitting application master " + appId);
	yarnClient.submitApplication(appContext);
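
For readers less familiar with YARN, the following sketch shows the generic submission pattern that startAppMaster follows, using only standard Hadoop YarnClient APIs. The application name, resource sizes, launch command and (empty) local resources are placeholders; the real startAppMaster registers the Flink jars and configuration as local resources and builds the actual JobManager launch command.

	import java.util.Collections;

	import org.apache.hadoop.yarn.api.records.ApplicationId;
	import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
	import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
	import org.apache.hadoop.yarn.api.records.Resource;
	import org.apache.hadoop.yarn.client.api.YarnClient;
	import org.apache.hadoop.yarn.client.api.YarnClientApplication;
	import org.apache.hadoop.yarn.conf.YarnConfiguration;

	// Generic YARN submission sketch (placeholder values): create an application,
	// describe the ApplicationMaster container, then submit it via the YarnClient.
	public class YarnSubmitSketch {

		public static void main(String[] args) throws Exception {
			final YarnClient yarnClient = YarnClient.createYarnClient();
			yarnClient.init(new YarnConfiguration());
			yarnClient.start();

			// Ask the ResourceManager for a new application id.
			final YarnClientApplication app = yarnClient.createApplication();
			final ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
			appContext.setApplicationName("yarn-submit-sketch");

			// Container that will run the ApplicationMaster (the JobManager in Flink's case).
			final ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
				Collections.emptyMap(),   // local resources: Flink ships its jars/config here
				Collections.emptyMap(),   // environment variables
				Collections.singletonList("echo placeholder-am-command"), // AM launch command
				null, null, null);
			appContext.setAMContainerSpec(amContainer);

			// Memory and vcores for the AM container (placeholder values).
			appContext.setResource(Resource.newInstance(1024, 1));
			appContext.setQueue("default");

			final ApplicationId appId = appContext.getApplicationId();
			System.out.println("Submitting application master " + appId);
			yarnClient.submitApplication(appContext);

			yarnClient.stop();
		}
	}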

Executing the User Job

Returning to CliFrontend's runProgram method: once the steps above have started the cluster, the user's job actually begins to execute:

executeProgram(program, client, userParallelism);

This then calls the run method:
org.apache.flink.client.program.ClusterClient#run(org.apache.flink.client.program.PackagedProgram, int)

Finally, the user program's main method is invoked via reflection:

// invoke main method
prog.invokeInteractiveModeForExecution();
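
invokeInteractiveModeForExecution essentially loads the user's entry class with the program's class loader and calls its main method via reflection. Below is a minimal sketch of that pattern; the jar path and the entry class name are placeholders chosen for illustration, not values hard-coded in Flink.

	import java.lang.reflect.Method;
	import java.net.URL;
	import java.net.URLClassLoader;

	// Minimal sketch of invoking a packaged program's entry point via reflection.
	// The jar path and entry class are illustrative placeholders.
	public class ReflectiveMainSketch {

		public static void main(String[] args) throws Exception {
			final URL jarUrl = new URL("file:///path/to/WordCount.jar"); // placeholder path
			try (URLClassLoader userLoader =
					new URLClassLoader(new URL[]{jarUrl}, ReflectiveMainSketch.class.getClassLoader())) {
				final Class<?> entryClass = Class.forName(
					"org.apache.flink.examples.java.wordcount.WordCount", // example entry class
					false, userLoader);
				final Method mainMethod = entryClass.getMethod("main", String[].class);
				// The user program runs here; inside it, env.execute() triggers the submission.
				mainMethod.invoke(null, (Object) new String[0]);
			}
		}
	}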

The Cluster Receives the User Job

The user's job is only actually submitted when execute is finally called:

	// set up the execution environment
	final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
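
As a reminder of what this looks like from the user's side, here is a minimal streaming job against the 1.8-era DataStream API. Everything before env.execute() only registers transformations; the execute() call is what triggers graph construction and submission. The job body itself is just an illustrative placeholder.

	import org.apache.flink.api.common.functions.MapFunction;
	import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

	// Illustrative job: nothing is submitted until env.execute() is called.
	public class ExecuteTriggerSketch {

		public static void main(String[] args) throws Exception {
			final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

			env.fromElements("to", "be", "or", "not", "to", "be")
				.map(new MapFunction<String, String>() {
					@Override
					public String map(String value) {
						return value.toUpperCase();
					}
				})
				.print();

			// Only this call builds the StreamGraph and submits the job.
			env.execute("execute-trigger-sketch");
		}
	}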

First a StreamExecutionEnvironment (or rather, one of its subclasses) is constructed; StreamExecutionEnvironment has the following subclasses.

(Figure: class hierarchy of StreamExecutionEnvironment and its subclasses)

Taking RemoteStreamEnvironment as an example, we enter its execute method:


	@Override
	public JobExecutionResult execute(String jobName) throws ProgramInvocationException {
		StreamGraph streamGraph = getStreamGraph();
		streamGraph.setJobName(jobName);
		transformations.clear();
		return executeRemotely(streamGraph, jarFiles);
	}
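
For context, a RemoteStreamEnvironment usually comes from createRemoteEnvironment in user code, as in the sketch below; the host, port and jar path are placeholders. When execute() is called, the environment builds the StreamGraph and hands it to executeRemotely, which is the path we follow next.

	import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

	// Sketch of creating a RemoteStreamEnvironment in user code; host, port
	// and jar path are placeholders.
	public class RemoteEnvSketch {

		public static void main(String[] args) throws Exception {
			final StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
				"jobmanager-host", 8081, "/path/to/user-job.jar"); // placeholders

			env.fromElements(1, 2, 3).print();

			// execute() builds the StreamGraph and goes through executeRemotely.
			env.execute("remote-env-sketch");
		}
	}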

First the StreamGraph is built from the job information, and it is then submitted through executeRemotely.



Continuing to trace the code, we enter ClusterClient's run method, where the JobGraph is generated:



	public JobSubmissionResult run(FlinkPlan compiledPlan,
			List<URL> libraries, List<URL> classpaths, ClassLoader classLoader, SavepointRestoreSettings savepointSettings)
			throws ProgramInvocationException {
		JobGraph job = getJobGraph(flinkConfig, compiledPlan, libraries, classpaths, savepointSettings);
		return submitJob(job, classLoader);
	}


We then enter the ClusterClient subclass method
org.apache.flink.yarn.YarnClusterClient#submitJob, which checks which of the two execution modes is in use (detached or attached) and calls the corresponding method:

	@Override
	public JobSubmissionResult submitJob(JobGraph jobGraph, ClassLoader classLoader) throws ProgramInvocationException {
		if (isDetached()) {
			if (newlyCreatedCluster) {
				stopAfterJob(jobGraph.getJobID());
			}
			return super.runDetached(jobGraph, classLoader);
		} else {
			return super.run(jobGraph, classLoader);
		}
	}


Finally, the JobGraph is sent from the Akka client to the JobManager's server side.
