Flink Job Submission Flow


The JobGraph is the unified data structure through which the Flink client and the cluster runtime interact.
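
For context on where that JobGraph comes from, the sketch below shows the typical client-side path from a DataStream program to a JobGraph. This is a minimal, hedged sketch assuming Flink 1.10-era APIs (the same era as the runtime code quoted below); the client class that finally ships the JobGraph (for example RestClusterClient) appears only in a comment.

import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.graph.StreamGraph;

// Minimal sketch: how a client program ends up with a JobGraph (assumes Flink 1.10-era APIs).
public class JobGraphSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.fromElements(1, 2, 3).map(x -> x * 2).print();

        // The DataStream program is first translated into a StreamGraph and then into a JobGraph.
        StreamGraph streamGraph = env.getStreamGraph();
        JobGraph jobGraph = streamGraph.getJobGraph();
        System.out.println(jobGraph.getJobID() + " / " + jobGraph.getName());

        // A ClusterClient implementation (e.g. RestClusterClient) then ships this JobGraph to the
        // Dispatcher, which is where the submitJob() method below picks up the story.
    }
}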

The Dispatcher receives the JobGraph

The client submits the JobGraph to the cluster, where the Dispatcher component is responsible for receiving it (the DispatcherLeaderProcess does this via the DispatcherGateway).

/**
 * The JobGraph is submitted to the cluster and received by the Dispatcher component.
 */
@Override
public CompletableFuture<Acknowledge> submitJob(JobGraph jobGraph, Time timeout) {
    log.info("Received JobGraph submission {} ({}).", jobGraph.getJobID(), jobGraph.getName());

    try {
        if (isDuplicateJob(jobGraph.getJobID())) {
            // Check whether this JobGraph has already been submitted or executed; if so, complete the future exceptionally.
            return FutureUtils.completedExceptionally(
                new DuplicateJobSubmissionException(jobGraph.getJobID()));
        } else if (isPartialResourceConfigured(jobGraph)) {
            return FutureUtils.completedExceptionally(
                new JobSubmissionException(jobGraph.getJobID(), "Currently jobs is not supported if parts of the vertices have " +
                                           "resources configured. The limitation will be removed in future versions."));
        } else {
            /**
             * Core: execute the JobGraph asynchronously.
             */
            return internalSubmitJob(jobGraph);
        }
    } catch (FlinkException e) {
        return FutureUtils.completedExceptionally(e);
    }
}

After the Dispatcher receives the JobGraph, it submits it asynchronously. Whatever the outcome, a result is returned to the client.

/**
 * Execute the JobGraph asynchronously and return an ACK message.
 */
private CompletableFuture<Acknowledge> internalSubmitJob(JobGraph jobGraph) {
    log.info("Submitting job {} ({}).", jobGraph.getJobID(), jobGraph.getName());

    /**
     * Core: submit the JobGraph asynchronously, which covers both persisting and running it.
     * The persistAndRunJob() method contains the main logic for submitting and running the JobGraph.
     */
    final CompletableFuture<Acknowledge> persistAndRunFuture = waitForTerminatingJobManager(jobGraph.getJobID(), jobGraph, this::persistAndRunJob)
        .thenApply(ignored -> Acknowledge.get());

    // Asynchronous follow-up: check whether an exception occurred and handle it.
    // Success or failure, the result is returned to the client.
    return persistAndRunFuture.handleAsync((acknowledge, throwable) -> {
        // The job submission failed: unwrap the error and rethrow it to the client.
        if (throwable != null) {
            cleanUpJobData(jobGraph.getJobID(), true);

            final Throwable strippedThrowable = ExceptionUtils.stripCompletionException(throwable);
            log.error("Failed to submit job {}.", jobGraph.getJobID(), strippedThrowable);
            throw new CompletionException(
                new JobSubmissionException(jobGraph.getJobID(), "Failed to submit job.", strippedThrowable));
        } else {
            // An ACK is returned: the JobGraph was submitted successfully and is being executed.
            return acknowledge;
        }
    }, getRpcService().getExecutor());
}
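
The submission path relies heavily on CompletableFuture chaining (thenApply, handleAsync, whenComplete, thenCompose). As a reference for the pattern used in internalSubmitJob() above, here is a self-contained, JDK-only sketch (no Flink classes) showing how handleAsync folds both the success and the failure outcome into a single response for the caller:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class HandleAsyncPattern {
    public static void main(String[] args) {
        CompletableFuture<String> work = CompletableFuture.supplyAsync(() -> {
            // Simulate the "persist and run" step; throw here to see the failure branch.
            return "ACK";
        });

        CompletableFuture<String> response = work.handleAsync((ack, throwable) -> {
            if (throwable != null) {
                // Failure: wrap and rethrow so the caller's future completes exceptionally.
                throw new CompletionException(new RuntimeException("Failed to submit job.", throwable));
            }
            // Success: pass the acknowledgement through to the caller.
            return ack;
        });

        System.out.println(response.join()); // prints "ACK"
    }
}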

The JobGraph is first persisted to the JobGraphStore; if a failure occurs, the JobGraph can be recovered from the JobGraphStore and executed again.

/**
 * Persist and run the JobGraph.
 * The JobGraph is first saved to the JobGraphStore, then executed asynchronously, and finally the result is returned.
 */
private CompletableFuture<Void> persistAndRunJob(JobGraph jobGraph) throws Exception {
    /**
     * Save the JobGraph with the JobGraphStore.
     * JobGraphStore is the interface dedicated to storing and removing JobGraphs; after a failure the JobGraph
     * can be recovered from it and re-executed.
     * With a ZooKeeper-based high-availability setup, the JobGraph is stored in the ZooKeeperJobGraphStore.
     */
    jobGraphWriter.putJobGraph(jobGraph);

    /**
     * Core: run the JobGraph asynchronously (this includes asynchronously starting the JobManagerRunner).
     */
    final CompletableFuture<Void> runJobFuture = runJob(jobGraph);

    // Follow-up logic after the JobGraph has been run asynchronously.
    return runJobFuture.whenComplete(BiConsumerWithException.unchecked((Object ignored, Throwable throwable) -> {
        if (throwable != null) {
            // If running the JobGraph failed, remove it from the JobGraphStore.
            jobGraphWriter.removeJobGraph(jobGraph.getJobID());
        }
    }));
}

After the JobGraph has been saved to the JobGraphStore, it is executed asynchronously. The core of this step is to asynchronously create and start the JobManagerRunner, which starts the JobMaster's RPC service and initializes the JobMaster.

/**
 * Run the JobGraph asynchronously, including asynchronously creating and starting the JobManagerRunner.
 */
private CompletableFuture<Void> runJob(JobGraph jobGraph) {
    // Check that the "JobID -> CompletableFuture<JobManagerRunner>" map does not already contain this job
    // (at this point it normally should not).
    Preconditions.checkState(!jobManagerRunnerFutures.containsKey(jobGraph.getJobID()));

    /**
     * Core: create the JobManagerRunner asynchronously (via a factory).
     */
    final CompletableFuture<JobManagerRunner> jobManagerRunnerFuture = createJobManagerRunner(jobGraph);

    // Store the "JobID -> CompletableFuture<JobManagerRunner>" mapping so the job cannot be executed twice.
    jobManagerRunnerFutures.put(jobGraph.getJobID(), jobManagerRunnerFuture);

    /**
     * Core: start the JobManagerRunner asynchronously (it starts the JobMaster's RPC service and initializes
     * the JobMaster's internal services) and return the result to the client.
     */
    return jobManagerRunnerFuture
        // Sequential composition: once the JobManagerRunner has been created asynchronously, start it.
        .thenApply(FunctionUtils.uncheckedFunction(this::startJobManagerRunner))
        // Sequential composition: map the result to Void.
        .thenApply(FunctionUtils.nullFn())
        // After the stages above have completed, run the following logic asynchronously whether or not an exception occurred.
        .whenCompleteAsync(
        (ignored, throwable) -> {
            if (throwable != null) {
                // If any of the stages above failed, remove this JobID from the map of JobManagerRunners.
                jobManagerRunnerFutures.remove(jobGraph.getJobID());
            }
        },
        getMainThreadExecutor());
}

Creating the JobManagerRunner asynchronously

After receiving the JobGraph, the Dispatcher asynchronously creates a JobManagerRunner (which will start the JobMaster).

/**
 * Create the JobManagerRunner asynchronously via a factory
 * (the runner can later act as a LeaderContender and compete for leadership).
 */
private CompletableFuture<JobManagerRunner> createJobManagerRunner(JobGraph jobGraph) {
    // Get the RpcService of the Dispatcher (itself an RpcEndpoint): used to start RpcServers and obtain gateways of other RPC endpoints.
    final RpcService rpcService = getRpcService();

    // Create a CompletableFuture that produces the JobManagerRunner.
    return CompletableFuture.supplyAsync(
        // The supplier below runs asynchronously on the RpcService executor.
        CheckedSupplier.unchecked(() ->
            /**
             * Create the JobManagerRunner. This also builds the JobMasterServiceFactory (used to create
             * the JobMasterService), which is kept as a field of the JobManagerRunner.
             */
            jobManagerRunnerFactory.createJobManagerRunner(
                jobGraph,
                configuration,
                rpcService,
                highAvailabilityServices,
                heartbeatServices,
                jobManagerSharedServices,
                new DefaultJobManagerJobMetricGroupFactory(jobManagerMetricGroup),
                fatalErrorHandler)),
        rpcService.getExecutor());
}

The JobManagerRunner is created via a factory. While it is being created, a JobMasterServiceFactory (used to build the JobMasterService) is constructed as well and stored in a field of the JobManagerRunner.

/**
 * Create the JobManagerRunner.
 */
@Override
public JobManagerRunner createJobManagerRunner(
    JobGraph jobGraph,
    Configuration configuration,
    RpcService rpcService,
    HighAvailabilityServices highAvailabilityServices,
    HeartbeatServices heartbeatServices,
    JobManagerSharedServices jobManagerServices,
    JobManagerJobMetricGroupFactory jobManagerJobMetricGroupFactory,
    FatalErrorHandler fatalErrorHandler) throws Exception {

    final JobMasterConfiguration jobMasterConfiguration = JobMasterConfiguration.fromConfiguration(configuration);

    final SlotPoolFactory slotPoolFactory = DefaultSlotPoolFactory.fromConfiguration(configuration);
    final SchedulerFactory schedulerFactory = DefaultSchedulerFactory.fromConfiguration(configuration);
    final SchedulerNGFactory schedulerNGFactory = SchedulerNGFactoryFactory.createSchedulerNGFactory(configuration, jobManagerServices.getRestartStrategyFactory());
    // Load the configured ShuffleServiceFactory via SPI.
    final ShuffleMaster<?> shuffleMaster = ShuffleServiceLoader.loadShuffleServiceFactory(configuration).createShuffleMaster(configuration);

    /**
     * The factory dedicated to creating the JobMasterService (i.e. the JobMaster).
     */
    final JobMasterServiceFactory jobMasterFactory = new DefaultJobMasterServiceFactory(
        jobMasterConfiguration,
        slotPoolFactory,
        schedulerFactory,
        rpcService,
        highAvailabilityServices,
        jobManagerServices,
        heartbeatServices,
        jobManagerJobMetricGroupFactory,
        fatalErrorHandler,
        schedulerNGFactory,
        shuffleMaster);

    // Create the JobManagerRunnerImpl and return it.
    return new JobManagerRunnerImpl(
        jobGraph,
        // The JobMasterServiceFactory (kept as a field of the JobManagerRunner) is used to create the JobMasterService.
        jobMasterFactory,
        highAvailabilityServices,
        jobManagerServices.getLibraryCacheManager(),
        jobManagerServices.getScheduledExecutorService(),
        fatalErrorHandler);
}

Starting the JobManagerRunner

Creating the JobManagerRunner asynchronously yields a CompletableFuture, so the next asynchronous operation is chained onto it: starting the JobManagerRunner.

/**
 * Start the JobManagerRunner (it starts the JobMaster's RPC service and initializes the JobMaster's internal services).
 */
private JobManagerRunner startJobManagerRunner(JobManagerRunner jobManagerRunner) throws Exception {

    // some code omitted ...

    /**
     * Core: actually start the JobManagerRunner.
     */
    jobManagerRunner.start();

    return jobManagerRunner;
}

Because JobManagerRunner implements the LeaderContender interface, it can compete for leadership as a LeaderContender; this is what makes the JobManagerRunner highly available.

/**
 * The Dispatcher is responsible for starting the JobManagerRunner.
 * Because the JobManagerRunner implements LeaderContender for high availability (it competes for leadership),
 * starting it is delegated to the LeaderElectionService. Once started, the JobManagerRunner will be granted
 * leadership by the LeaderElectionService.
 */
@Override
public void start() throws Exception {
    try {
        // The LeaderElectionService starts the JobManagerRunner as a contender.
        leaderElectionService.start(this);
    } catch (Exception e) {
        log.error("Could not start the JobManager because the leader election service did not start.", e);
        throw new Exception("Could not start the leader election service.", e);
    }
}

Once the LeaderElectionService has elected a leader for the JobManagerRunner, it grants leadership to that LeaderContender, making it the leader. The leading JobManagerRunner then starts the JobMaster service:

/**
 * The LeaderElectionService has elected this JobManagerRunner (as a LeaderContender);
 * the leadership is now granted to it, making it the leader.
 */
@Override
public void grantLeadership(final UUID leaderSessionID) {
    // Take the lock.
    synchronized (lock) {
        if (shutdown) {
            log.info("JobManagerRunner already shutdown.");
            return;
        }

        // Chain this step onto the previous leadership operation.
        leadershipOperation = leadershipOperation.thenCompose(
            (ignored) -> {
                synchronized (lock) {
                    /**
                     * Verify the job's scheduling status and start the JobMaster service.
                     * Only a job that has not been executed before gets a new JobMaster service to schedule and
                     * execute the task instances of the JobGraph.
                     */
                    return verifyJobSchedulingStatusAndStartJobManager(leaderSessionID);
                }
            });

        handleException(leadershipOperation, "Could not start the job manager.");
    }
}
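
The leadershipOperation field used above is a common trick for serializing asynchronous operations: every new operation is chained onto the previous one with thenCompose, so the operations run strictly one after another even though each of them returns its own future. Below is a minimal JDK-only sketch of that trick; the class and method names here are made up for illustration.

import java.util.concurrent.CompletableFuture;

public class SerializedAsyncOps {
    // Each new operation is chained onto the tail of the previous one, like leadershipOperation in JobManagerRunnerImpl.
    private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

    public synchronized CompletableFuture<Void> submit(String name) {
        tail = tail.thenCompose(ignored ->
            CompletableFuture.runAsync(() -> System.out.println("running " + name)));
        return tail;
    }

    public static void main(String[] args) {
        SerializedAsyncOps ops = new SerializedAsyncOps();
        ops.submit("grant leadership");
        ops.submit("start JobMaster").join(); // operations execute in submission order
    }
}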

For a new JobGraph that has never been executed, the leading JobManagerRunner starts a new JobMaster service to schedule and execute it.

The JobManagerRunner starts the JobMaster

The JobMaster was already created when the JobManagerRunner was initialized:

public JobManagerRunnerImpl(
    final JobGraph jobGraph,
    // The JobMasterServiceFactory is used to create the JobMasterService instance.
    final JobMasterServiceFactory jobMasterFactory,
    final HighAvailabilityServices haServices,
    final LibraryCacheManager libraryCacheManager,
    final Executor executor,
    final FatalErrorHandler fatalErrorHandler) throws Exception {

    // some code omitted ...

    // Starting the JobMaster requires a JobMasterService, which is created here via the JobMasterServiceFactory.
    this.jobMasterService = jobMasterFactory.createJobMasterService(jobGraph, this, userCodeLoader);
}

The JobMasterServiceFactory creates the JobMaster as follows:

@Override
public JobMaster createJobMasterService(
    JobGraph jobGraph,
    OnCompletionActions jobCompletionActions,
    ClassLoader userCodeClassloader) throws Exception {

    return new JobMaster(
        rpcService,
        jobMasterConfiguration,
        ResourceID.generate(),
        jobGraph,
        haServices,
        slotPoolFactory,
        schedulerFactory,
        jobManagerSharedServices,
        heartbeatServices,
        jobManagerJobMetricGroupFactory,
        jobCompletionActions,
        fatalErrorHandler,
        userCodeClassloader,
        schedulerNGFactory,
        shuffleMaster,
        lookup -> new JobMasterPartitionTrackerImpl(
            jobGraph.getJobID(),
            shuffleMaster,
            lookup
        ));
}

Next, the JobManagerRunner starts the JobMaster:

/**
 * Start the JobMaster service: obtain the JobSchedulingStatus and check whether this job has already been started.
 */
private CompletableFuture<Void> verifyJobSchedulingStatusAndStartJobManager(UUID leaderSessionId) {
    // Get the scheduling status of the current job.
    final CompletableFuture<JobSchedulingStatus> jobSchedulingStatusFuture = getJobSchedulingStatus();

    // After the scheduling status has been determined asynchronously, continue with the following stage.
    return jobSchedulingStatusFuture.thenCompose(
        jobSchedulingStatus -> {
            if (jobSchedulingStatus == JobSchedulingStatus.DONE) {
                // JobSchedulingStatus.DONE means the job has already been executed by another JobMaster.
                return jobAlreadyDone();
            } else {
                /**
                 * Core: the job has not been executed yet; create and start a new JobMaster service so that it can
                 * schedule and execute the task instances of the JobGraph.
                 */
                return startJobMaster(leaderSessionId);
            }
        });
}

Before starting the JobMaster, the job's scheduling status is checked. If the status is JobSchedulingStatus.DONE, the job has already been scheduled and executed by another JobMaster. Only if the job has not yet been executed is a new JobMaster started to schedule and execute it.

/**
 * Start the JobMaster, whose internal scheduler will schedule and execute the JobGraph.
 */
private CompletionStage<Void> startJobMaster(UUID leaderSessionId) {
    log.info("JobManager runner for job {} ({}) was granted leadership with session id {} at {}.",
             jobGraph.getName(), jobGraph.getJobID(), leaderSessionId, jobMasterService.getAddress());

    try {
        // Register the JobID in the RunningJobsRegistry (the registry of running jobs, used to check whether a job
        // still needs to be run), marking the JobGraph as about to be executed.
        runningJobsRegistry.setJobRunning(jobGraph.getJobID());
    } catch (IOException e) {
        return FutureUtils.completedExceptionally(
            new FlinkException(
                String.format("Failed to set the job %s to running in the running jobs registry.", jobGraph.getJobID()),
                e));
    }

    final CompletableFuture<Acknowledge> startFuture;
    try {
        /**
         * Start the JobMaster asynchronously. JobMaster is the implementation of JobMasterService;
         * it was already created via the JobMasterServiceFactory in the constructor of JobManagerRunnerImpl.
         */
        startFuture = jobMasterService.start(new JobMasterId(leaderSessionId));
    } catch (Exception e) {
        return FutureUtils.completedExceptionally(new FlinkException("Failed to start the JobMaster.", e));
    }

    final CompletableFuture<JobMasterGateway> currentLeaderGatewayFuture = leaderGatewayFuture;
    /**
     * Once the JobMaster has started successfully, the follow-up stage runs: the JobManagerRunner confirms
     * that it has accepted the leadership.
     */
    return startFuture.thenAcceptAsync(
        (Acknowledge ack) -> confirmLeaderSessionIdIfStillLeader(
            leaderSessionId,
            jobMasterService.getAddress(),
            currentLeaderGatewayFuture),
        executor);
}

When the JobMaster has started, it returns an ACK to the JobManagerRunner, which then performs the follow-up step of confirming that it has accepted the leadership, i.e. that the JobManagerRunner that started this JobMaster is the leader and holds the leadership.

/**
 * Confirm that this JobManagerRunner still holds the leadership.
 */
private void confirmLeaderSessionIdIfStillLeader(
    UUID leaderSessionId,
    String leaderAddress,
    CompletableFuture<JobMasterGateway> currentLeaderGatewayFuture) {

    // The LeaderElectionService decides whether this JobManagerRunner still holds the leadership for the session.
    if (leaderElectionService.hasLeadership(leaderSessionId)) {
        currentLeaderGatewayFuture.complete(jobMasterService.getGateway());
        // Confirm to the LeaderElectionService that the leadership has been accepted.
        leaderElectionService.confirmLeadership(leaderSessionId, leaderAddress);
    } else {
        log.debug("Ignoring confirmation of leader session id because {} is no longer the leader.", getDescription());
    }
}

Next, let's look at the startup of the JobMaster itself.

/**
 * Start the JobMaster service asynchronously.
 */
public CompletableFuture<Acknowledge> start(final JobMasterId newJobMasterId) throws Exception {
    /**
     * Start the JobMaster. Since JobMaster extends FencedRpcEndpoint it is an RPC endpoint; once started
     * it can communicate with other RPC endpoints.
     */
    start();

    /**
     * After the JobMaster (as an RPC endpoint) has started, it schedules and executes the job in its main thread
     * (without checking the fencing token). If the job starts successfully, an ACK is returned to the Dispatcher.
     */
    return callAsyncWithoutFencing(() -> startJobExecution(newJobMasterId), RpcUtils.INF_TIMEOUT);
}

Starting the JobMaster consists of two parts: starting the JobMaster's internal services and components, and scheduling and executing the JobGraph.

Starting the JobMaster's RPC service

Since JobMaster extends FencedRpcEndpoint, it is an RPC endpoint; and since it implements the JobMasterGateway interface, it can communicate with other RPC endpoints once it has started. Starting the JobMaster therefore means starting its RPC service.

public final void start() {
    // Start the RpcServer of the RpcEndpoint; under the hood this calls AkkaInvocationHandler#start().
    rpcServer.start();
}

Once the JobMaster's RPC service is up, the next step is to start the JobMaster's internal services and components (the heartbeat services, the SlotPool and the scheduler); the scheduler is then responsible for scheduling and executing the JobGraph.

/**
 * Initialize the JobMaster's internal services (heartbeat services, SlotPool and scheduler), establish the RPC
 * connection between the JobMaster and the ResourceManager, let the SlotPool request the slots the JobMaster needs
 * from the ResourceManager, and finally let the scheduler start scheduling and executing the JobGraph.
 */
private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {

    // Check that we are running in the main thread of this RPC endpoint (scheduling must start in the JobMaster's
    // main thread); otherwise an exception is thrown.
    validateRunsInMainThread();

    // The JobMasterId must not be null.
    checkNotNull(newJobMasterId, "The new JobMasterId must not be null.");

    // Compare the fencing token with the JobMasterId.
    if (Objects.equals(getFencingToken(), newJobMasterId)) {
        log.info("Already started the job execution with JobMasterId {}.", newJobMasterId);

        // If they are equal, the job belonging to this JobMasterId has already been started; simply return an ACK.
        return Acknowledge.get();
    }

    // If the fencing token differs from the JobMasterId, the job has not been started yet. Set the JobMasterId as
    // the new fencing token (used for token validation between components during RPC communication).
    setNewFencingToken(newJobMasterId);

    /**
     * Core:
     *   1. Start the JobMaster's internal services and components (HeartbeatServices, SlotPool and the scheduler).
     *   2. Establish the RPC connection between the JobMaster and the ResourceManager so that the SlotPool can
     *      request the slots the JobMaster needs.
     */
    startJobMasterServices();

    log.info("Starting execution of job {} ({}) under job master id {}.", jobGraph.getName(), jobGraph.getJobID(), newJobMasterId);

    /**
     * The scheduler now starts scheduling and executing the job.
     */
    resetAndStartScheduler();

    // The job has been started; return an ACK.
    return Acknowledge.get();
}

Starting the JobMaster's internal services and components

The JobMaster's internal services and components include the heartbeat services, the SlotPool and the scheduler. The JobMaster then establishes an RPC connection to the ResourceManager, the SlotPool requests the slots the JobMaster needs from the ResourceManager, and finally the scheduler executes the JobGraph.

/**
 * Start the JobMaster's internal services and components (HeartbeatServices, SlotPool and the scheduler) and connect
 * to the ResourceManager over RPC. Once connected, the SlotPool can request the slots the JobMaster needs from the
 * ResourceManager.
 */
private void startJobMasterServices() throws Exception {
    // Start the heartbeat services: the JobMaster sends heartbeats to the TaskManagers and the ResourceManager.
    startHeartbeatServices();

    // Start the SlotPool: it requests slots from the ResourceManager and manages the slots allocated to this JobManager.
    slotPool.start(getFencingToken(), getAddress(), getMainThreadExecutor());
    // Start the JobMaster's scheduler: it is responsible for scheduling and executing the tasks.
    scheduler.start(getMainThreadExecutor());

    /**
     * Establish the RPC connection between the JobMaster and the ResourceManager. Once connected, the JobMaster lets
     * the SlotPool send SlotRequests to the ResourceManager to request slots.
     */
    reconnectToResourceManager(new FlinkException("Starting JobMaster component."));

    /**
     * Use the LeaderRetrievalService, by registering a LeaderRetrievalListener, to watch for changes of the
     * ResourceManager leader.
     */
    resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener());
}

Starting to schedule and execute the JobGraph

When the JobMaster was initialized, a SchedulerNG instance was already created via a factory, and during that construction the JobGraph was converted into an ExecutionGraph. The RPC connection between the JobMaster and the ResourceManager has also been established, and the slot resources the job needs are in place. What remains is to have the SchedulerNG (on the JobMaster's main thread) schedule and execute the ExecutionGraph.

/**
 * The SchedulerNG instance was already created via a factory when the JobMaster was initialized.
 * While constructing the SchedulerNG, the JobGraph was converted into an ExecutionGraph; now that ExecutionGraph
 * is scheduled and executed.
 */
private void resetAndStartScheduler() throws Exception {

    // some code omitted ...

    /**
     * The scheduler officially starts scheduling and executing the ExecutionGraph (the JobGraph was converted into
     * an ExecutionGraph while the SchedulerNG was being built).
     */
    schedulerAssignedFuture.thenRun(this::startScheduling);
}

The JobMaster creates a JobStatusListener and registers it with the SchedulerNG so that job status changes can be observed and reported back to the JobMaster promptly.

/**
 * Officially start scheduling and executing the vertices of the ExecutionGraph. The SchedulerNG notifies the
 * JobMaster of job status changes through the JobStatusListener.
 */
private void startScheduling() {
    checkState(jobStatusListener == null);
    // The listener that observes job status changes: JobStatusListener.
    jobStatusListener = new JobManagerJobStatusListener();
    // The JobMaster registers the JobStatusListener with the SchedulerNG so that it is notified as soon as the job status changes.
    schedulerNG.registerJobStatusListener(jobStatusListener);

    /**
     * Core: the SchedulerNG officially starts scheduling and executing the ExecutionGraph.
     */
    schedulerNG.startScheduling();
}

Next comes the actual scheduling and execution of the ExecutionGraph.

/**
 * Start scheduling and executing the ExecutionGraph.
 */
@Override
public final void startScheduling() {
    mainThreadExecutor.assertRunningInMainThread();
    registerJobMetrics();
    // Core: start scheduling and executing the ExecutionGraph.
    startSchedulingInternal();
}

The DefaultScheduler first does some preparation, such as setting the ExecutionGraph's state to RUNNING and preparing it for NG scheduling. It then uses the scheduling strategy suited to streaming jobs, EagerSchedulingStrategy, to schedule and execute the ExecutionGraph.

/**
 * Officially start scheduling and executing the ExecutionGraph based on the scheduling strategy.
 * The SchedulerNG instance is created via a factory when the JobMaster is initialized, and its constructor in turn
 * creates the SchedulingStrategy via a factory.
 */
@Override
protected void startSchedulingInternal() {
    log.info("Starting scheduling with scheduling strategy [{}]", schedulingStrategy.getClass().getName());
    // Preparation: set the ExecutionGraph's state to RUNNING and prepare it for NG scheduling.
    prepareExecutionGraphForNgScheduling();
    // Use the scheduling strategy (EagerSchedulingStrategy for streaming jobs) to start scheduling the ExecutionGraph.
    schedulingStrategy.startScheduling();
}

The scheduling strategy allocates slots and deploys the Executions that correspond to the ExecutionVertex nodes of the ExecutionGraph.

/**
 * Start scheduling and executing the ExecutionGraph with the scheduling strategy suited to streaming jobs.
 */
@Override
public void startScheduling() {
    // Allocate slot resources and deploy the Executions of the ExecutionGraph.
    allocateSlotsAndDeploy(SchedulingStrategyUtils.getAllVertexIdsFromTopology(schedulingTopology));
}