In the previous section we saw that after the JobMaster starts, it converts the JobGraph into an ExecutionGraph, passes the checkpoint-related configuration along to the ExecutionGraph, and creates the CheckpointCoordinator. Let's pick up from there and continue the analysis.
1.start
After the JobMaster starts and leader election completes, its start method is called. start in turn calls startJobExecution, which begins execution of the job.
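For reference, here is a paraphrased sketch of that entry point as it looks in the Flink 1.12-era sources (the exact code may differ slightly in your version):

    // JobMaster (paraphrased sketch): start the RPC endpoint, then kick off
    // job execution in the main thread under the given fencing token
    public CompletableFuture<Acknowledge> start(final JobMasterId newJobMasterId) throws Exception {
        // make sure we receive RPC and async calls
        start();

        return callAsyncWithoutFencing(() -> startJobExecution(newJobMasterId), RpcUtils.INF_TIMEOUT);
    }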
2.startJobExecution
Inside startJobExecution, two of its own methods are then called:
startJobMasterServices: this is where the JobMaster's services are actually started
resetAndStartScheduler: resets and starts the scheduler
    private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {
        validateRunsInMainThread();

        checkNotNull(newJobMasterId, "The new JobMasterId must not be null.");

        if (Objects.equals(getFencingToken(), newJobMasterId)) {
            log.info("Already started the job execution with JobMasterId {}.", newJobMasterId);
            return Acknowledge.get();
        }

        setNewFencingToken(newJobMasterId);

        /* TODO: this is where the JobMaster services are actually started */
        startJobMasterServices();

        log.info("Starting execution of job {} ({}) under job master id {}.", jobGraph.getName(), jobGraph.getJobID(), newJobMasterId);

        /* TODO: reset and start the scheduler */
        resetAndStartScheduler();

        return Acknowledge.get();
    }
3.startJobMasterServices
This method does the following:
1. Starts the heartbeat services towards the TaskManagers and the ResourceManager.
2. Starts the SlotPool. The SlotPool is the component on the JobMaster side that manages slot resources; it holds the slots owned by this job. When slots run short, the SlotPool requests more from the ResourceManager (this is Flink's own ResourceManager, not YARN's). When the ResourceManager receives a slot request from the SlotPool, it asks the TaskManagers for free slots.
3. Establishes the connection to the ResourceManager described in point 2.
    private void startJobMasterServices() throws Exception {
        /* TODO: start the heartbeat services: taskmanager, resourcemanager */
        startHeartbeatServices();

        // start the slot pool make sure the slot pool now accepts messages for this leader
        /* TODO: start the slot pool */
        slotPool.start(getFencingToken(), getAddress(), getMainThreadExecutor());

        // TODO: Remove once the ZooKeeperLeaderRetrieval returns the stored address upon start
        // try to reconnect to previously known leader
        reconnectToResourceManager(new FlinkException("Starting JobMaster component."));

        // job is ready to go, try to establish connection with resource manager
        //   - activate leader retrieval for the resource manager
        //   - on notification of the leader, the connection will be established and
        //     the slot pool will start requesting slots
        /* TODO: connect to the ResourceManager; the slot pool starts requesting slots */
        resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener());
    }
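The last line registers a ResourceManagerLeaderListener: once a ResourceManager leader is known, the JobMaster connects to it and the SlotPool can start requesting slots. A minimal sketch of that inner listener, paraphrased from the JobMaster class (details vary by version):

    // inner class of JobMaster (paraphrased sketch)
    private class ResourceManagerLeaderListener implements LeaderRetrievalListener {

        @Override
        public void notifyLeaderAddress(final String leaderAddress, final UUID leaderSessionID) {
            // hop back onto the main thread and (re)connect to the new ResourceManager leader
            runAsync(() -> notifyOfNewResourceManagerLeader(
                leaderAddress, ResourceManagerId.fromUuidOrNull(leaderSessionID)));
        }

        @Override
        public void handleError(final Exception exception) {
            handleJobMasterError(new Exception("Fatal error in the ResourceManager leader service", exception));
        }
    }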
4.resetAndStartScheduler
This method makes sure the job has a scheduler assigned (creating and assigning a new one if the job is being rescheduled) and then calls startScheduling to kick off scheduling:
    private void resetAndStartScheduler() throws Exception {
        validateRunsInMainThread();

        final CompletableFuture<Void> schedulerAssignedFuture;

        if (schedulerNG.requestJobStatus() == JobStatus.CREATED) {
            schedulerAssignedFuture = CompletableFuture.completedFuture(null);
            schedulerNG.setMainThreadExecutor(getMainThreadExecutor());
        } else {
            suspendAndClearSchedulerFields(new FlinkException("ExecutionGraph is being reset in order to be rescheduled."));
            final JobManagerJobMetricGroup newJobManagerJobMetricGroup = jobMetricGroupFactory.create(jobGraph);
            // create a new scheduler and assign it to this job
            final SchedulerNG newScheduler = createScheduler(executionDeploymentTracker, newJobManagerJobMetricGroup);

            schedulerAssignedFuture = schedulerNG.getTerminationFuture().handle(
                (ignored, throwable) -> {
                    newScheduler.setMainThreadExecutor(getMainThreadExecutor());
                    assignScheduler(newScheduler, newJobManagerJobMetricGroup);
                    return null;
                }
            );
        }

        // call startScheduling to schedule the job; the result is obtained asynchronously
        FutureUtils.assertNoException(schedulerAssignedFuture.thenRun(this::startScheduling));
    }
In the JobMaster's startScheduling, a job status listener is registered for the job, and then the scheduler's startScheduling is called.
The scheduler's startScheduling (in SchedulerBase) asserts that the current thread is the main thread, registers the job metrics, starts the operator coordinators, and then calls startSchedulingInternal to begin scheduling.
In startSchedulingInternal, prepareExecutionGraphForNgScheduling sets some properties on the ExecutionGraph, for example transitioning the job status from CREATED to RUNNING, and then schedulingStrategy.startScheduling is called. SchedulingStrategy has several implementations; here we follow PipelinedRegionSchedulingStrategy, which is the default scheduling strategy for streaming jobs in recent versions.
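For reference, DefaultScheduler#startSchedulingInternal is only a few lines; a paraphrased sketch from the Flink 1.12-era sources:

    // DefaultScheduler (paraphrased sketch)
    @Override
    protected void startSchedulingInternal() {
        log.info("Starting scheduling with scheduling strategy [{}]", schedulingStrategy.getClass().getName());
        // e.g. transition the job status from CREATED to RUNNING
        prepareExecutionGraphForNgScheduling();
        // delegate to the configured strategy, e.g. PipelinedRegionSchedulingStrategy
        schedulingStrategy.startScheduling();
    }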
5.startScheduling
The schedulingTopology here is the topology built from the ExecutionVertices. startScheduling first fetches all pipelined regions, then picks out the source regions (those with no upstream inputs) and passes them to maybeScheduleRegions for scheduling.
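A paraphrased sketch of PipelinedRegionSchedulingStrategy#startScheduling from the Flink 1.12-era sources (check your version for the exact code):

    // PipelinedRegionSchedulingStrategy (paraphrased sketch):
    // a source region is one that consumes no upstream result partitions
    @Override
    public void startScheduling() {
        final Set<SchedulingPipelinedRegion> sourceRegions =
            IterableUtils.toStream(schedulingTopology.getAllPipelinedRegions())
                .filter(region -> !region.getConsumedResults().iterator().hasNext())
                .collect(Collectors.toSet());
        maybeScheduleRegions(sourceRegions);
    }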
6.maybeScheduleRegions
The source regions are sorted in topological order, and then each region in the sorted set is passed to maybeScheduleRegion for scheduling:
    private void maybeScheduleRegions(final Set<SchedulingPipelinedRegion> regions) {
        final List<SchedulingPipelinedRegion> regionsSorted =
            SchedulingStrategyUtils.sortPipelinedRegionsInTopologicalOrder(schedulingTopology, regions);
        for (SchedulingPipelinedRegion region : regionsSorted) {
            maybeScheduleRegion(region);
        }
    }
7.maybeScheduleRegion
This is where slots are obtained in preparation for deploying the corresponding ExecutionVertices:
    private void maybeScheduleRegion(final SchedulingPipelinedRegion region) {
        if (!areRegionInputsAllConsumable(region)) {
            return;
        }

        checkState(areRegionVerticesAllInCreatedState(region), "BUG: trying to schedule a region which is not in CREATED state");

        final List<ExecutionVertexDeploymentOption> vertexDeploymentOptions =
            SchedulingStrategyUtils.createExecutionVertexDeploymentOptions(
                regionVerticesSorted.get(region),
                id -> deploymentOption);
        schedulerOperations.allocateSlotsAndDeploy(vertexDeploymentOptions);
    }
What follows is the code that actually allocates the resources. It is fairly involved, so I'll just list the call chain briefly here; interested readers can trace through it on their own:
    SchedulerOperations#allocateSlotsAndDeploy
      -> DefaultScheduler#allocateSlotsAndDeploy
      -> DefaultScheduler#waitForAllSlotsAndDeploy
      -> DefaultScheduler#deployAll
      -> DefaultScheduler#deployOrHandleError
      -> DefaultScheduler#deployTaskSafe
      -> DefaultExecutionVertexOperations#deploy
      -> ExecutionVertex#deploy
      -> Execution#deploy
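To give a feel for how the middle links hang together, here is a paraphrased sketch of DefaultScheduler#waitForAllSlotsAndDeploy and deployAll from the Flink 1.12-era sources (details vary by version). The final link, Execution#deploy, is worth reading in full and follows below:

    // DefaultScheduler (paraphrased sketch)
    private void waitForAllSlotsAndDeploy(final List<DeploymentHandle> deploymentHandles) {
        // once every vertex in this batch has been assigned a slot, deploy them all
        FutureUtils.assertNoException(
            assignAllResources(deploymentHandles).handle(deployAll(deploymentHandles)));
    }

    private BiFunction<Void, Throwable, Void> deployAll(final List<DeploymentHandle> deploymentHandles) {
        return (ignored, throwable) -> {
            propagateIfNonNull(throwable);
            for (final DeploymentHandle deploymentHandle : deploymentHandles) {
                final SlotExecutionVertexAssignment slotExecutionVertexAssignment =
                    deploymentHandle.getSlotExecutionVertexAssignment();
                final CompletableFuture<LogicalSlot> slotAssigned =
                    slotExecutionVertexAssignment.getLogicalSlotFuture();
                checkState(slotAssigned.isDone());

                // deploy each vertex on its assigned slot, or handle the failure
                FutureUtils.assertNoException(
                    slotAssigned.handle(deployOrHandleError(deploymentHandle)));
            }
            return null;
        };
    }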
    public void deploy() throws JobException {
        assertRunningInJobMasterMainThread();

        // the slot assigned earlier via tryAssignResource
        final LogicalSlot slot = assignedResource;

        checkNotNull(slot, "In order to deploy the execution we first have to assign a resource via tryAssignResource.");

        // Check if the TaskManager died in the meantime
        // This only speeds up the response to TaskManagers failing concurrently to deployments.
        // The more general check is the rpcTimeout of the deployment call
        if (!slot.isAlive()) {
            throw new JobException("Target slot (TaskManager) for deployment is no longer alive.");
        }

        // make sure exactly one deployment call happens from the correct state
        // note: the transition from CREATED to DEPLOYING is for testing purposes only
        ExecutionState previous = this.state;
        // transition the execution's state to DEPLOYING
        if (previous == SCHEDULED || previous == CREATED) {
            if (!transitionState(previous, DEPLOYING)) {
                // race condition, someone else beat us to the deploying call.
                // this should actually not happen and indicates a race somewhere else
                throw new IllegalStateException("Cannot deploy task: Concurrent deployment call race.");
            }
        } else {
            // vertex may have been cancelled, or it was already scheduled
            throw new IllegalStateException("The vertex must be in CREATED or SCHEDULED state to be deployed. Found state " + previous);
        }

        if (this != slot.getPayload()) {
            throw new IllegalStateException(
                String.format("The execution %s has not been assigned to the assigned slot.", this));
        }

        try {
            // race double check, did we fail/cancel and do we need to release the slot?
            if (this.state != DEPLOYING) {
                slot.releaseSlot(new FlinkException("Actual state of execution " + this + " (" + state + ") does not match expected state DEPLOYING."));
                return;
            }

            LOG.info("Deploying {} (attempt #{}) with attempt id {} to {} with allocation id {}", vertex.getTaskNameWithSubtaskIndex(),
                attemptNumber, vertex.getCurrentExecutionAttempt().getAttemptId(), getAssignedResourceLocation(), slot.getAllocationId());

            if (taskRestore != null) {
                checkState(taskRestore.getTaskStateSnapshot().getSubtaskStateMappings().stream().allMatch(entry ->
                        entry.getValue().getInputRescalingDescriptor().equals(InflightDataRescalingDescriptor.NO_RESCALE) &&
                        entry.getValue().getOutputRescalingDescriptor().equals(InflightDataRescalingDescriptor.NO_RESCALE)),
                    "Rescaling from unaligned checkpoint is not yet supported.");
            }

            // convert each IntermediateResultPartition into a ResultPartition
            // convert each ExecutionEdge into an InputChannelDeploymentDescriptor (turned into an InputGate at execution time)
            final TaskDeploymentDescriptor deployment = TaskDeploymentDescriptorFactory
                .fromExecutionVertex(vertex, attemptNumber)
                .createDeploymentDescriptor(
                    slot.getAllocationId(),
                    slot.getPhysicalSlotNumber(),
                    taskRestore,
                    producedPartitions.values());

            // null taskRestore to let it be GC'ed
            taskRestore = null;

            final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();
            final ComponentMainThreadExecutor jobMasterMainThreadExecutor =
                vertex.getExecutionGraph().getJobMasterMainThreadExecutor();

            getVertex().notifyPendingDeployment(this);
            // We run the submission in the future executor so that the serialization of large TDDs does not block
            // the main thread and sync back to the main thread once submission is completed.
            // submit the corresponding task to the TaskManager
            CompletableFuture.supplyAsync(() -> taskManagerGateway.submitTask(deployment, rpcTimeout), executor)
                .thenCompose(Function.identity())
                .whenCompleteAsync(
                    (ack, failure) -> {
                        if (failure == null) {
                            vertex.notifyCompletedDeployment(this);
                        } else {
                            if (failure instanceof TimeoutException) {
                                String taskname = vertex.getTaskNameWithSubtaskIndex() + " (" + attemptId + ')';

                                markFailed(new Exception(
                                    "Cannot deploy task " + taskname + " - TaskManager (" + getAssignedResourceLocation()
                                        + ") not responding after a rpcTimeout of " + rpcTimeout, failure));
                            } else {
                                markFailed(failure);
                            }
                        }
                    },
                    jobMasterMainThreadExecutor);
        } catch (Throwable t) {
            markFailed(t);

            if (isLegacyScheduling()) {
                ExceptionUtils.rethrow(t);
            }
        }
    }
At this point our job has been deployed. The Execution here is what we usually call the physical execution graph; the next step is submitting the corresponding tasks.
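The handoff to the TaskManager happens through the taskManagerGateway.submitTask call seen in the code above. As a rough sketch (paraphrased; the exact interface depends on the Flink version), the RPC looks like this:

    // TaskManagerGateway (paraphrased sketch): the RPC through which the JobMaster
    // ships a TaskDeploymentDescriptor over to a TaskManager for execution
    public interface TaskManagerGateway {

        // submit a task to the task manager; the returned future acknowledges receipt
        CompletableFuture<Acknowledge> submitTask(TaskDeploymentDescriptor tdd, Time timeout);

        // ... plus cancelTask, freeSlot, triggerCheckpoint, etc.
    }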