005. Flink Source Code Analysis: Starting the JobManager

JobManager overview

The JobManager is the master node of a Flink cluster. It contains several major components:
1. ResourceManager
The cluster resource manager of Flink. There is exactly one, and it is responsible for all slot management and slot requests.
2. Dispatcher
Receives the JobGraph submitted by the user and then launches a JobMaster, similar to the AppMaster role in a YARN cluster or the Driver role in a Spark job.
3. WebMonitorEndpoint
Maintains a large number of Handlers. When a client submits a job to the Flink cluster via flink run, the submission is ultimately received by the WebMonitorEndpoint, which decides which Handler should process it.
4. JobMaster/JobManager
Responsible for the execution of one concrete Job. Within a cluster, several JobMasters may be running at the same time, each similar to the AppMaster role in a YARN cluster or the Driver role in a Spark job.

On the two meanings of "JobManager"

1. If we view Flink as a master/slave architecture, then "JobManager" refers to the master node, which contains the three roles described above.
2. When a job is submitted to YARN, it can run by starting a small dedicated cluster whose master node is also called the JobManager. Submission to YARN supports two modes, per-job and session: one container runs the JobManager and other containers run the StreamTasks.

In short, the master node of a Flink cluster runs the ResourceManager and the Dispatcher. When a client submits a job to the cluster (the client first builds the job into a JobGraph object), the Dispatcher spins up a JobManager/JobMaster that drives the execution of the Tasks inside that Job, and the JobMaster requests the resources needed to run those Tasks from the ResourceManager.

Following the previous article, the entry class for starting the JobManager is StandaloneSessionClusterEntrypoint:

// entry point
StandaloneSessionClusterEntrypoint.main()
ClusterEntrypoint.runClusterEntrypoint(entrypoint);
clusterEntrypoint.startCluster();
runCluster(configuration, pluginManager);
// step 1: initialize the various services (7 services)
initializeServices(configuration, pluginManager);
// step 2: create the DispatcherResourceManagerComponentFactory,
// which initializes the factory instances for the main components.
// It holds three important members:
// - a factory for creating the ResourceManager
// - a factory for creating the Dispatcher
// - a factory for creating the WebMonitorEndpoint
createDispatcherResourceManagerComponentFactory(configuration);
// step 3: create the components the cluster needs to run:
// - the ResourceManager
// - the Dispatcher
// - the WebMonitorEndpoint
clusterComponent =
dispatcherResourceManagerComponentFactory.create(...)

Step 1, initializeServices(), initializes a number of service components:

// initialize and start the AkkaRpcService, which internally wraps an ActorSystem
commonRpcService = AkkaRpcServiceUtils.createRemoteRpcService(...)
// initialize a thread pool for IO work
ioExecutor = Executors.newFixedThreadPool(...)
// initialize the HA services; with ZooKeeper this is ZooKeeperHaServices
haServices = createHaServices(configuration, ioExecutor);
// initialize the BlobServer (server side)
blobServer = new BlobServer(configuration, haServices.createBlobStore());
blobServer.start();
// initialize the heartbeat services: heartbeatServices = HeartbeatServices
heartbeatServices = createHeartbeatServices(configuration);
// initialize a store for ExecutionGraphs; the implementation is
// FileArchivedExecutionGraphStore
archivedExecutionGraphStore = createSerializableExecutionGraphStore(...)

Step 2, createDispatcherResourceManagerComponentFactory(configuration), initializes the factory instances for several components:

1. DispatcherRunnerFactory, default implementation: DefaultDispatcherRunnerFactory
2. ResourceManagerFactory, default implementation: StandaloneResourceManagerFactory
3. RestEndpointFactory, default implementation: SessionRestEndpointFactory

The DispatcherRunnerFactory internally also instantiates a SessionDispatcherLeaderProcessFactoryFactory component.
The code that creates the three factories:

final DispatcherResourceManagerComponentFactory dispatcherResourceManagerComponentFactory = createDispatcherResourceManagerComponentFactory(configuration);

Stepping into createDispatcherResourceManagerComponentFactory:

protected DefaultDispatcherResourceManagerComponentFactory createDispatcherResourceManagerComponentFactory(Configuration configuration) {
   return DefaultDispatcherResourceManagerComponentFactory.createSessionComponentFactory(StandaloneResourceManagerFactory.getInstance());
}
// which calls the method below
public static DefaultDispatcherResourceManagerComponentFactory createSessionComponentFactory(
      ResourceManagerFactory<?> resourceManagerFactory) {
   return new DefaultDispatcherResourceManagerComponentFactory(
      DefaultDispatcherRunnerFactory.createSessionRunner(SessionDispatcherFactory.INSTANCE), // the Dispatcher runner factory, built around SessionDispatcherFactory
      resourceManagerFactory, // the ResourceManager factory
      SessionRestEndpointFactory.INSTANCE); // the REST endpoint factory
}

With that, the three factories are created.
Step 3, dispatcherResourceManagerComponentFactory.create(…), creates the three key components:

clusterComponent = dispatcherResourceManagerComponentFactory.create(
   configuration,
   ioExecutor,
   commonRpcService,
   haServices,
   blobServer,
   heartbeatServices,
   metricRegistry,
   archivedExecutionGraphStore,
   new RpcMetricQueryServiceRetriever(metricRegistry.getMetricQueryServiceRpcService()),
   this);

Stepping into the create method:

Creating the WebMonitorEndpoint

The first component to be created and started is the webMonitorEndpoint:

/**
 * If a user submits a job via flink run, it is ultimately handled by the JobSubmitHandler
 * inside the WebMonitorEndpoint and then handed over to the Dispatcher. This creates the
 * webMonitorEndpoint; in YARN mode a MiniDispatcherRestEndpoint is created instead.
 */
webMonitorEndpoint = restEndpointFactory.createRestEndpoint(
   configuration,
   dispatcherGatewayRetriever,
   resourceManagerGatewayRetriever,
   blobServer,
   executor,
   metricFetcher,
   highAvailabilityServices.getClusterRestEndpointLeaderElectionService(),
   fatalErrorHandler);

log.debug("Starting Dispatcher REST endpoint.");
webMonitorEndpoint.start();

The code that actually creates the WebMonitorEndpoint:

@Override
public WebMonitorEndpoint<DispatcherGateway> createRestEndpoint(
      Configuration configuration,
      LeaderGatewayRetriever<DispatcherGateway> dispatcherGatewayRetriever,
      LeaderGatewayRetriever<ResourceManagerGateway> resourceManagerGatewayRetriever,
      TransientBlobService transientBlobService,
      ScheduledExecutorService executor,
      MetricFetcher metricFetcher,
      LeaderElectionService leaderElectionService,
      FatalErrorHandler fatalErrorHandler) throws Exception {
   final RestHandlerConfiguration restHandlerConfiguration = RestHandlerConfiguration.fromConfiguration(configuration);

   return new DispatcherRestEndpoint(
      RestServerEndpointConfiguration.fromConfiguration(configuration),
      dispatcherGatewayRetriever,
      configuration,
      restHandlerConfiguration,
      resourceManagerGatewayRetriever,
      transientBlobService,
      executor,
      metricFetcher,
      leaderElectionService,
      RestEndpointFactory.createExecutionGraphCache(restHandlerConfiguration),
      fatalErrorHandler);
}

Now look at webMonitorEndpoint.start(). The start method lives in RestServerEndpoint: WebMonitorEndpoint extends RestServerEndpoint, so the start() that actually runs is the parent's:

/**
 * Starts this REST server endpoint.
 *
 * @throws Exception if we cannot start the RestServerEndpoint
 */
public final void start() throws Exception {
   synchronized (lock) {
      Preconditions.checkState(state == State.CREATED, "The RestServerEndpoint cannot be restarted.");

      log.info("Starting rest endpoint.");
      // initialize a router
      final Router router = new Router();
      final CompletableFuture<String> restAddressFuture = new CompletableFuture<>();
      // initialize the various handlers
      handlers = initializeHandlers(restAddressFuture);

      /* sort the handlers such that they are ordered the following:
       * /jobs
       * /jobs/overview
       * /jobs/:jobid
       * /jobs/:jobid/config
       * /:*
       */
      Collections.sort(
         handlers,
         RestHandlerUrlComparator.INSTANCE);
      // check that endpoints and handlers are unique
      checkAllEndpointsAndHandlersAreUnique(handlers);
      handlers.forEach(handler -> registerHandler(router, handler, log)); // register each handler with the router
      // standard Netty server setup
      ChannelInitializer<SocketChannel> initializer = new ChannelInitializer<SocketChannel>() {

         @Override
         protected void initChannel(SocketChannel ch) {
            RouterHandler handler = new RouterHandler(router, responseHeaders);

            // SSL should be the first handler in the pipeline
            if (isHttpsEnabled()) {
               ch.pipeline().addLast("ssl",
                  new RedirectingSslHandler(restAddress, restAddressFuture, sslHandlerFactory));
            }

            ch.pipeline()
               .addLast(new HttpServerCodec())
               .addLast(new FileUploadHandler(uploadDir))
               .addLast(new FlinkHttpObjectAggregator(maxContentLength, responseHeaders))
               .addLast(new ChunkedWriteHandler())
               .addLast(handler.getName(), handler)
               .addLast(new PipelineErrorHandler(log, responseHeaders));
         }
      };

      NioEventLoopGroup bossGroup = new NioEventLoopGroup(1, new ExecutorThreadFactory("flink-rest-server-netty-boss"));
      NioEventLoopGroup workerGroup = new NioEventLoopGroup(0, new ExecutorThreadFactory("flink-rest-server-netty-worker"));

      bootstrap = new ServerBootstrap();
      bootstrap
         .group(bossGroup, workerGroup)
         .channel(NioServerSocketChannel.class)
         .childHandler(initializer);

      Iterator<Integer> portsIterator;
      try {
         portsIterator = NetUtils.getPortRangeFromString(restBindPortRange);
      } catch (IllegalConfigurationException e) {
         throw e;
      } catch (Exception e) {
         throw new IllegalArgumentException("Invalid port range definition: " + restBindPortRange);
      }

      int chosenPort = 0;
      while (portsIterator.hasNext()) {
         try {
            chosenPort = portsIterator.next();
            final ChannelFuture channel;
            if (restBindAddress == null) {
               channel = bootstrap.bind(chosenPort);
            } else {
               channel = bootstrap.bind(restBindAddress, chosenPort);
            }
            serverChannel = channel.syncUninterruptibly().channel();
            break;
         } catch (final Exception e) {
            // continue if the exception is due to the port being in use, fail early otherwise
            if (!(e instanceof org.jboss.netty.channel.ChannelException || e instanceof java.net.BindException)) {
               throw e;
            }
         }
      }

      if (serverChannel == null) {
         throw new BindException("Could not start rest endpoint on any port in port range " + restBindPortRange);
      }

      log.debug("Binding rest endpoint to {}:{}.", restBindAddress, chosenPort);

      final InetSocketAddress bindAddress = (InetSocketAddress) serverChannel.localAddress();
      final String advertisedAddress;
      if (bindAddress.getAddress().isAnyLocalAddress()) {
         advertisedAddress = this.restAddress;
      } else {
         advertisedAddress = bindAddress.getAddress().getHostAddress();
      }
      final int port = bindAddress.getPort();

      log.info("Rest endpoint listening at {}:{}", advertisedAddress, port);

      restBaseUrl = new URL(determineProtocol(), advertisedAddress, port, "").toString();

      restAddressFuture.complete(restBaseUrl);
      // startup complete
      state = State.RUNNING;
      // call into the subclass; implementations differ
      startInternal();
   }

Finally, start() calls the subclass's startInternal(). The method is declared abstract in the parent class and implemented by each subclass; the parent's start() invokes it:

@Override
public void startInternal() throws Exception {
    // leader election: enter this endpoint (the contender) into the election
   leaderElectionService.start(this);
   // start a periodic task that cleans up the ExecutionGraphCache
   startExecutionGraphCacheCleanupTask();

   if (hasWebUI) {
      log.info("Web frontend listening at {}.", getRestBaseUrl());
   }
}
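
The pattern at work here is the classic template method: a final lifecycle method in the parent calling an abstract hook. A minimal standalone sketch (names are made up for illustration):

public abstract class TemplateStartDemo {
   // the parent owns the lifecycle and cannot be overridden
   public final void start() {
      System.out.println("common startup steps");
      startInternal(); // abstract hook, implemented by each subclass
   }

   protected abstract void startInternal();
}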

The purpose of the election here is to write the server's address into ZooKeeper, because the server's port is chosen at random. The election serves two purposes: first, HA for the component itself; second, service discovery, since clients fetch the server's address from ZooKeeper. Step into leaderElectionService.start(this):

@Override
public final void start(LeaderContender contender) throws Exception {
   checkNotNull(contender, "Contender must not be null.");
   Preconditions.checkState(leaderContender == null, "Contender was already set.");

   synchronized (lock) {
      leaderContender = contender;
      leaderElectionDriver = leaderElectionDriverFactory.createLeaderElectionDriver(
         this, new LeaderElectionFatalErrorHandler(), leaderContender.getDescription());
      LOG.info("Starting DefaultLeaderElectionService with {}.", leaderElectionDriver);

      running = true;
   }
}

Stepping into createLeaderElectionDriver:

@Override
public ZooKeeperLeaderElectionDriver createLeaderElectionDriver(
      LeaderElectionEventHandler leaderEventHandler,
      FatalErrorHandler fatalErrorHandler,
      String leaderContenderDescription) throws Exception {
   return new ZooKeeperLeaderElectionDriver(
      client, latchPath, leaderPath, leaderEventHandler, fatalErrorHandler, leaderContenderDescription);
}

This returns a ZooKeeperLeaderElectionDriver:

public ZooKeeperLeaderElectionDriver(
      CuratorFramework client,
      String latchPath,
      String leaderPath,
      LeaderElectionEventHandler leaderElectionEventHandler,
      FatalErrorHandler fatalErrorHandler,
      String leaderContenderDescription) throws Exception {
   this.client = checkNotNull(client);
   this.leaderPath = checkNotNull(leaderPath);
   this.leaderElectionEventHandler = checkNotNull(leaderElectionEventHandler);
   this.fatalErrorHandler = checkNotNull(fatalErrorHandler);
   this.leaderContenderDescription = checkNotNull(leaderContenderDescription);

   leaderLatch = new LeaderLatch(client, checkNotNull(latchPath));
   cache = new NodeCache(client, leaderPath);

   client.getUnhandledErrorListenable().addListener(this);

   running = true;

   leaderLatch.addListener(this);
   leaderLatch.start();

   cache.getListenable().addListener(this);
   cache.start();

   client.getConnectionStateListenable().addListener(listener);
}

Here leaderLatch.start() performs the election, with the result delivered asynchronously to registered listeners:

public void start() throws Exception {
    Preconditions.checkState(this.state.compareAndSet(LeaderLatch.State.LATENT, LeaderLatch.State.STARTED), "Cannot be started more than once");
    this.startTask.set(AfterConnectionEstablished.execute(this.client, new Runnable() {
        public void run() {
            try {
                LeaderLatch.this.internalStart();
            } finally {
                LeaderLatch.this.startTask.set((Object)null);
            }

        }
    }));
}

leaderLatch.addListener(this) takes a listener, and every component that participates in the election must implement this listener interface, LeaderLatchListener. The this passed here is such an implementation:

public interface LeaderLatchListener {
    void isLeader();

    void notLeader();
}
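
To see the whole pattern in isolation, here is a minimal, self-contained Curator sketch of the same latch-plus-listener election that ZooKeeperLeaderElectionDriver builds on; the connect string and latch path below are made up for illustration:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.framework.recipes.leader.LeaderLatchListener;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderLatchDemo {
   public static void main(String[] args) throws Exception {
      // hypothetical ZooKeeper address and latch path
      CuratorFramework client = CuratorFrameworkFactory.newClient(
         "localhost:2181", new ExponentialBackoffRetry(1000, 3));
      client.start();

      LeaderLatch latch = new LeaderLatch(client, "/demo/leaderlatch");
      latch.addListener(new LeaderLatchListener() {
         @Override
         public void isLeader() {
            // Flink's driver reacts here: onGrantLeadership(), then write the address to ZK
            System.out.println("granted leadership");
         }

         @Override
         public void notLeader() {
            System.out.println("leadership revoked");
         }
      });
      latch.start(); // asynchronous: the callbacks above fire later

      Thread.sleep(10_000); // keep the process alive long enough to observe the callbacks
      latch.close();
      client.close();
   }
}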

After the election completes, the Curator framework calls back these two methods on the LeaderLatchListener implementation (this). If the election was won, isLeader() is invoked:

@Override
public void isLeader() {
   leaderElectionEventHandler.onGrantLeadership();
}

leaderElectionEventHandler.onGrantLeadership():

@Override
@GuardedBy("lock")
public void onGrantLeadership() {
   synchronized (lock) {
      if (running) {
         issuedLeaderSessionID = UUID.randomUUID();
         clearConfirmedLeaderInformation();

         if (LOG.isDebugEnabled()) {
            LOG.debug(
               "Grant leadership to contender {} with session ID {}.",
               leaderContender.getDescription(),
               issuedLeaderSessionID);
         }
            // grant the leader role to the current contender
         leaderContender.grantLeadership(issuedLeaderSessionID);
      } else {
         if (LOG.isDebugEnabled()) {
            LOG.debug("Ignoring the grant leadership notification since the {} has " +
               "already been closed.", leaderElectionDriver);
         }
      }
   }

}

leaderContender.grantLeadership(issuedLeaderSessionID);

@Override
public void grantLeadership(final UUID leaderSessionID) {
   log.info("{} was granted leadership with leaderSessionID={}", getRestBaseUrl(), leaderSessionID);
   leaderElectionService.confirmLeadership(leaderSessionID, getRestBaseUrl());
}

leaderElectionService.confirmLeadership

@Override
public void confirmLeadership(UUID leaderSessionID, String leaderAddress) {
   if (LOG.isDebugEnabled()) {
      LOG.debug(
         "Confirm leader session ID {} for leader {}.",
         leaderSessionID,
         leaderAddress);
   }

   checkNotNull(leaderSessionID);

   synchronized (lock) {
      if (hasLeadership(leaderSessionID)) { // does this component currently hold leadership?
         if (running) {
            confirmLeaderInformation(leaderSessionID, leaderAddress);
         } else {
            if (LOG.isDebugEnabled()) {
               LOG.debug("Ignoring the leader session Id {} confirmation, since the " +
                  "LeaderElectionService has already been stopped.", leaderSessionID);
            }
         }
      } else {
         // Received an old confirmation call
         if (!leaderSessionID.equals(this.issuedLeaderSessionID)) {
            if (LOG.isDebugEnabled()) {
               LOG.debug("Receive an old confirmation call of leader session ID {}, " +
                  "current issued session ID is {}", leaderSessionID, issuedLeaderSessionID);
            }
         } else {
            LOG.warn("The leader session ID {} was confirmed even though the " +
               "corresponding JobManager was not elected as the leader.", leaderSessionID);
         }
      }
   }
}

confirmLeaderInformation(leaderSessionID, leaderAddress);

@GuardedBy("lock")
private void confirmLeaderInformation(UUID leaderSessionID, String leaderAddress) {
   confirmedLeaderSessionID = leaderSessionID;
   confirmedLeaderAddress = leaderAddress;
   leaderElectionDriver.writeLeaderInformation(
      LeaderInformation.known(confirmedLeaderSessionID, confirmedLeaderAddress));
}

That is, once leadership is confirmed, the component writes its own information into ZooKeeper. The rough logic of the ZK write:

try {
   final ByteArrayOutputStream baos = new ByteArrayOutputStream();
   final ObjectOutputStream oos = new ObjectOutputStream(baos);

   oos.writeUTF(leaderInformation.getLeaderAddress());
   oos.writeObject(leaderInformation.getLeaderSessionID());

   oos.close();

   boolean dataWritten = false;

   while (!dataWritten && leaderLatch.hasLeadership()) {
      Stat stat = client.checkExists().forPath(leaderPath);

      if (stat != null) {
         long owner = stat.getEphemeralOwner();
         long sessionID = client.getZookeeperClient().getZooKeeper().getSessionId();

         if (owner == sessionID) {
            try {
               client.setData().forPath(leaderPath, baos.toByteArray());

               dataWritten = true;
            } catch (KeeperException.NoNodeException noNode) {
               // node was deleted in the meantime
            }
         } else {
            try {
               client.delete().forPath(leaderPath);
            } catch (KeeperException.NoNodeException noNode) {
               // node was deleted in the meantime --> try again
            }
         }
      } else {
         try {
            client.create().creatingParentsIfNeeded().withMode(CreateMode.EPHEMERAL).forPath(
                  leaderPath,
                  baos.toByteArray());

            dataWritten = true;
         } catch (KeeperException.NodeExistsException nodeExists) {
            // node has been created in the meantime --> try again
         }
      }
   }

   if (LOG.isDebugEnabled()) {
      LOG.debug("Successfully wrote leader information: {}.", leaderInformation);
   }
} 

At this point the component has written its information into ZooKeeper. A successful write means it holds the leader role, and clients can now discover the server's address through ZooKeeper. The elections of the other components below follow essentially the same flow.
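
For illustration only, this is how a client could read that leader record back with Curator, mirroring the writeUTF/writeObject format shown above (leaderPath is a hypothetical parameter, not a path taken from Flink):

import java.io.ByteArrayInputStream;
import java.io.ObjectInputStream;
import java.util.UUID;

import org.apache.curator.framework.CuratorFramework;

public class LeaderInfoReader {
   // reads back the record written above: a UTF address followed by a UUID
   static void printLeader(CuratorFramework client, String leaderPath) throws Exception {
      byte[] data = client.getData().forPath(leaderPath);
      try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
         String leaderAddress = ois.readUTF();           // matches oos.writeUTF(...)
         UUID leaderSessionId = (UUID) ois.readObject(); // matches oos.writeObject(...)
         System.out.println("leader at " + leaderAddress + ", session " + leaderSessionId);
      }
   }
}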

Creating the ResourceManager

Next, the second component is created and started: the resourceManager.

resourceManager = resourceManagerFactory.createResourceManager(
   configuration,
   ResourceID.generate(),
   rpcService,
   highAvailabilityServices,
   heartbeatServices,
   fatalErrorHandler,
   new ClusterInformation(hostname, blobServer.getPort()),
   webMonitorEndpoint.getRestBaseUrl(),
   metricRegistry,
   hostname,
   ioExecutor);
   // ... (the dispatcher component is created further below)
   log.debug("Starting ResourceManager.");
   resourceManager.start(); // start it

After start() runs, the ResourceManager sends a message to itself, which triggers its own onStart() method.
The code that creates the resourceManager:

public ResourceManager<T> createResourceManager(
      Configuration configuration,
      ResourceID resourceId,
      RpcService rpcService,
      HighAvailabilityServices highAvailabilityServices,
      HeartbeatServices heartbeatServices,
      FatalErrorHandler fatalErrorHandler,
      ClusterInformation clusterInformation,
      @Nullable String webInterfaceUrl,
      MetricRegistry metricRegistry,
      String hostname,
      Executor ioExecutor) throws Exception {

   final ResourceManagerMetricGroup resourceManagerMetricGroup = ResourceManagerMetricGroup.create(metricRegistry, hostname);
   final SlotManagerMetricGroup slotManagerMetricGroup = SlotManagerMetricGroup.create(metricRegistry, hostname);
   // 1. create the ResourceManager runtime services
   final ResourceManagerRuntimeServices resourceManagerRuntimeServices = createResourceManagerRuntimeServices(
      configuration, rpcService, highAvailabilityServices, slotManagerMetricGroup);
   // 2. create the ResourceManager itself
   return createResourceManager(
      configuration,
      resourceId,
      rpcService,
      highAvailabilityServices,
      heartbeatServices,
      fatalErrorHandler,
      clusterInformation,
      webInterfaceUrl,
      resourceManagerMetricGroup,
      resourceManagerRuntimeServices,
      ioExecutor);
}

There are two important calls here:

(1) createResourceManagerRuntimeServices
(2) createResourceManager

(1) createResourceManagerRuntimeServices:

private ResourceManagerRuntimeServices createResourceManagerRuntimeServices(
      Configuration configuration,
      RpcService rpcService,
      HighAvailabilityServices highAvailabilityServices,
      SlotManagerMetricGroup slotManagerMetricGroup) throws ConfigurationException {

   return ResourceManagerRuntimeServices.fromConfiguration(
      createResourceManagerRuntimeServicesConfiguration(configuration),
      highAvailabilityServices,
      rpcService.getScheduledExecutor(),
      slotManagerMetricGroup);
}

fromConfiguration:

public static ResourceManagerRuntimeServices fromConfiguration(
      ResourceManagerRuntimeServicesConfiguration configuration,
      HighAvailabilityServices highAvailabilityServices,
      ScheduledExecutor scheduledExecutor,
      SlotManagerMetricGroup slotManagerMetricGroup) {
   // 1. create a SlotManager
   final SlotManager slotManager = createSlotManager(configuration, scheduledExecutor, slotManagerMetricGroup);
   // 2. create a JobLeaderIdService
   final JobLeaderIdService jobLeaderIdService = new JobLeaderIdService(
      highAvailabilityServices,
      scheduledExecutor,
      configuration.getJobTimeout());
   // return a ResourceManagerRuntimeServices wrapping the SlotManager and the JobLeaderIdService
   return new ResourceManagerRuntimeServices(slotManager, jobLeaderIdService);
}

So the ResourceManager runtime services come down to two services: the SlotManager and the JobLeaderIdService.
(2) createResourceManager; the concrete implementation lives in StandaloneResourceManagerFactory:

protected ResourceManager<ResourceID> createResourceManager(
   Configuration configuration,
   ResourceID resourceId,
   RpcService rpcService,
   HighAvailabilityServices highAvailabilityServices,
   HeartbeatServices heartbeatServices,
   FatalErrorHandler fatalErrorHandler,
   ClusterInformation clusterInformation,
   @Nullable String webInterfaceUrl,
   ResourceManagerMetricGroup resourceManagerMetricGroup,
   ResourceManagerRuntimeServices resourceManagerRuntimeServices,
   Executor ioExecutor) {

   final Time standaloneClusterStartupPeriodTime = ConfigurationUtils.getStandaloneClusterStartupPeriodTime(configuration);

   return new StandaloneResourceManager(
      rpcService,
      resourceId,
      highAvailabilityServices,
      heartbeatServices,
      resourceManagerRuntimeServices.getSlotManager(),
      ResourceManagerPartitionTrackerImpl::new,
      resourceManagerRuntimeServices.getJobLeaderIdService(),
      clusterInformation,
      fatalErrorHandler,
      resourceManagerMetricGroup,
      standaloneClusterStartupPeriodTime,
      AkkaUtils.getTimeoutAsTime(configuration),
      ioExecutor);
}

After the ResourceManager is created, it is started by calling resourceManager.start():

// ------------------------------------------------------------------------
//  Start & shutdown & lifecycle callbacks
// ------------------------------------------------------------------------

/**
 * Triggers start of the rpc endpoint. This tells the underlying rpc server that the rpc endpoint is ready
 * to process remote procedure calls.
 */
public final void start() {
   rpcServer.start();
}

RpcServer's start method is framework plumbing; the call lands in AkkaInvocationHandler's start method:

class AkkaInvocationHandler implements InvocationHandler, AkkaBasedEndpoint, RpcServer
@Override
public void start() {
   rpcEndpoint.tell(ControlMessages.START, ActorRef.noSender());
}

This rpcEndpoint is an ActorRef; rpcEndpoint.tell(ControlMessages.START, ActorRef.noSender()) sends a START message to itself. On the receiving side, the actor turns that control message into a call to the endpoint's onStart(), so execution is handed to StandaloneResourceManager's onStart method.
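
As a stripped-down sketch of that tell-to-onStart hand-off (this is not Flink's actual AkkaRpcActor; the enum and class names below are stand-ins):

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

public class StartMessageDemo {
   enum ControlMessages { START } // stand-in for Flink's ControlMessages enum

   static class EndpointActor extends AbstractActor {
      @Override
      public Receive createReceive() {
         return receiveBuilder()
            .matchEquals(ControlMessages.START, msg -> onStart())
            .build();
      }

      private void onStart() {
         // Flink's AkkaRpcActor would invoke the RpcEndpoint's onStart() here
         System.out.println("endpoint started");
      }
   }

   public static void main(String[] args) throws InterruptedException {
      ActorSystem system = ActorSystem.create("demo");
      ActorRef endpoint = system.actorOf(Props.create(EndpointActor.class));
      // the same call shape as AkkaInvocationHandler.start()
      endpoint.tell(ControlMessages.START, ActorRef.noSender());
      Thread.sleep(500); // give the actor time to process the message
      system.terminate();
   }
}

In Flink, the onStart that actually runs for the StandaloneResourceManager is defined in its parent class ResourceManager: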

@Override
public final void onStart() throws Exception {
   try {
      startResourceManagerServices();
   } catch (Throwable t) {
      final ResourceManagerException exception = new ResourceManagerException(String.format("Could not start the ResourceManager %s", getAddress()), t);
      onFatalError(exception);
      throw exception;
   }
}
private void startResourceManagerServices() throws Exception {
   try {
      // leader election service: the winner writes its own information into ZK
      leaderElectionService = highAvailabilityServices.getResourceManagerLeaderElectionService();

      initialize();

      leaderElectionService.start(this);
      jobLeaderIdService.start(new JobLeaderIdActionsImpl());

      registerTaskExecutorMetrics();
   } catch (Exception e) {
      handleStartResourceManagerServicesException(e);
   }
}

The election flow is the same as for the REST service above. Eventually it reaches:

public void onGrantLeadership() {
   synchronized (lock) {
      if (running) {
         issuedLeaderSessionID = UUID.randomUUID();
         clearConfirmedLeaderInformation();

         if (LOG.isDebugEnabled()) {
            LOG.debug(
               "Grant leadership to contender {} with session ID {}.",
               leaderContender.getDescription(),
               issuedLeaderSessionID);
         }
         // execution reaches here
         leaderContender.grantLeadership(issuedLeaderSessionID);
      } else {
         if (LOG.isDebugEnabled()) {
            LOG.debug("Ignoring the grant leadership notification since the {} has " +
               "already been closed.", leaderElectionDriver);
         }
      }
   }

}

and finally ResourceManager's grantLeadership method:

@Override
public void grantLeadership(final UUID newLeaderSessionID) {
   final CompletableFuture<Boolean> acceptLeadershipFuture = clearStateFuture
      .thenComposeAsync((ignored) -> tryAcceptLeadership(newLeaderSessionID), getUnfencedMainThreadExecutor());

   final CompletableFuture<Void> confirmationFuture = acceptLeadershipFuture.thenAcceptAsync(
      (acceptLeadership) -> {
         if (acceptLeadership) {
            // confirming the leader session ID might be blocking,
            leaderElectionService.confirmLeadership(newLeaderSessionID, getAddress());
         }
      },
      ioExecutor);

   confirmationFuture.whenComplete(
      (Void ignored, Throwable throwable) -> {
         if (throwable != null) {
            onFatalError(ExceptionUtils.stripCompletionException(throwable));
         }
      });
}

Two parts here deserve attention. First, look at:
tryAcceptLeadership(newLeaderSessionID):

private CompletableFuture<Boolean> tryAcceptLeadership(final UUID newLeaderSessionID) {
   if (leaderElectionService.hasLeadership(newLeaderSessionID)) {
      final ResourceManagerId newResourceManagerId = ResourceManagerId.fromUuid(newLeaderSessionID);

      log.info("ResourceManager {} was granted leadership with fencing token {}", getAddress(), newResourceManagerId);

      // clear the state if we've been the leader before
      if (getFencingToken() != null) {
         clearStateInternal();
      }

      setFencingToken(newResourceManagerId);
      // only the ResourceManager that became leader runs the services below
      startServicesOnLeadership();

      return prepareLeadershipAsync().thenApply(ignored -> true);
   } else {
      return CompletableFuture.completedFuture(false);
   }
}

startServicesOnLeadership(): only the ResourceManager that won the election executes this method. It starts two heartbeat services and two timer services.

private void startServicesOnLeadership() {
   // start the heartbeat services
   startHeartbeatServices();
   // start the SlotManager, which starts two timer services:
   // 1. liveness check of the TaskExecutors (considered dead after 50 s without a heartbeat)
   // 2. check of pending slot requests (requests that time out are abandoned)
   slotManager.start(getFencingToken(), getMainThreadExecutor(), new ResourceActionsImpl());

   onLeadership();
}

1. Starting the heartbeat services:

private void startHeartbeatServices() {
   // heartbeat with the TaskManagers
   taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
      resourceId,
      new TaskManagerHeartbeatListener(),
      getMainThreadExecutor(),
      log);
   // heartbeat with the JobMasters
   jobManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
      resourceId,
      new JobManagerHeartbeatListener(),
      getMainThreadExecutor(),
      log);
}

createHeartbeatManagerSender is defined in HeartbeatServices; there is a similar createHeartbeatManager, the difference being that the Sender variant is the side that actively sends heartbeat requests.

public <I, O> HeartbeatManager<I, O> createHeartbeatManagerSender(
   ResourceID resourceId,
   HeartbeatListener<I, O> heartbeatListener,
   ScheduledExecutor mainThreadExecutor,
   Logger log) {

   return new HeartbeatManagerSenderImpl<>(
      heartbeatInterval,
      heartbeatTimeout,
      resourceId,
      heartbeatListener,
      mainThreadExecutor,
      log);
}
HeartbeatManagerSenderImpl(
      long heartbeatPeriod,
      long heartbeatTimeout,
      ResourceID ownResourceID,
      HeartbeatListener<I, O> heartbeatListener,
      ScheduledExecutor mainThreadExecutor,
      Logger log) {
   this(
      heartbeatPeriod,
      heartbeatTimeout,
      ownResourceID,
      heartbeatListener,
      mainThreadExecutor,
      log,
      new HeartbeatMonitorImpl.Factory<>());
}
HeartbeatManagerSenderImpl(
      long heartbeatPeriod,
      long heartbeatTimeout,
      ResourceID ownResourceID,
      HeartbeatListener<I, O> heartbeatListener,
      ScheduledExecutor mainThreadExecutor,
      Logger log,
      HeartbeatMonitor.Factory<O> heartbeatMonitorFactory) {
   super(
      heartbeatTimeout,
      ownResourceID,
      heartbeatListener,
      mainThreadExecutor,
      log,
      heartbeatMonitorFactory);

   this.heartbeatPeriod = heartbeatPeriod;
   mainThreadExecutor.schedule(this, 0L, TimeUnit.MILLISECONDS);
}

mainThreadExecutor.schedule(this, 0L, TimeUnit.MILLISECONDS) schedules this object itself as a task, immediately:

@Override
public void run() {
   if (!stopped) {
      log.debug("Trigger heartbeat request.");
      for (HeartbeatMonitor<O> heartbeatMonitor : getHeartbeatTargets().values()) {
         requestHeartbeat(heartbeatMonitor);
      }
           // scheduled once at startup, then re-armed after heartbeatPeriod (every 10 s by default)
      getMainThreadExecutor().schedule(this, heartbeatPeriod, TimeUnit.MILLISECONDS);
   }
}

Each HeartbeatMonitor wraps one heartbeat target, here a TaskExecutor.
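
A minimal standalone sketch of this self-rescheduling pattern, using a plain ScheduledExecutorService in place of Flink's main-thread executor (all names below are hypothetical):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatLoopDemo implements Runnable {
   private final ScheduledExecutorService executor;
   private final long heartbeatPeriodMillis;
   private volatile boolean stopped;

   HeartbeatLoopDemo(ScheduledExecutorService executor, long heartbeatPeriodMillis) {
      this.executor = executor;
      this.heartbeatPeriodMillis = heartbeatPeriodMillis;
      // the first run fires immediately, as in the HeartbeatManagerSenderImpl constructor
      executor.schedule(this, 0L, TimeUnit.MILLISECONDS);
   }

   @Override
   public void run() {
      if (!stopped) {
         System.out.println("trigger heartbeat request to all monitored targets");
         // re-arm for the next round; 10 s by default in Flink
         executor.schedule(this, heartbeatPeriodMillis, TimeUnit.MILLISECONDS);
      }
   }

   void stop() {
      stopped = true;
   }

   public static void main(String[] args) throws InterruptedException {
      ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
      HeartbeatLoopDemo demo = new HeartbeatLoopDemo(executor, 1000L);
      Thread.sleep(3500L); // observe a few rounds
      demo.stop();
      executor.shutdown();
   }
}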

Next, look at slotManager.start(getFencingToken(), getMainThreadExecutor(), new ResourceActionsImpl()):

@Override
public void start(ResourceManagerId newResourceManagerId, Executor newMainThreadExecutor, ResourceActions newResourceActions) {
   LOG.info("Starting the SlotManager.");

   this.resourceManagerId = Preconditions.checkNotNull(newResourceManagerId);
   mainThreadExecutor = Preconditions.checkNotNull(newMainThreadExecutor);
   resourceActions = Preconditions.checkNotNull(newResourceActions);

   started = true;
         // the first timer task
   taskManagerTimeoutsAndRedundancyCheck = scheduledExecutor.scheduleWithFixedDelay(
      () -> mainThreadExecutor.execute(
         () -> checkTaskManagerTimeoutsAndRedundancy()),
      0L,
      taskManagerTimeout.toMilliseconds(),
      TimeUnit.MILLISECONDS);
   // the second timer task
   slotRequestTimeoutCheck = scheduledExecutor.scheduleWithFixedDelay(
      () -> mainThreadExecutor.execute(
         () -> checkSlotRequestTimeouts()),
      0L,
      slotRequestTimeout.toMilliseconds(),
      TimeUnit.MILLISECONDS);

   registerSlotManagerMetrics();
}

The first timer task, checkTaskManagerTimeoutsAndRedundancy, checks which TaskManagers have timed out: heartbeats arrive every 10 s, the check runs every 30 s, and a TaskManager is considered timed out after 50 s.

void checkTaskManagerTimeoutsAndRedundancy() {
   if (!taskManagerRegistrations.isEmpty()) {
      long currentTime = System.currentTimeMillis();

      ArrayList<TaskManagerRegistration> timedOutTaskManagers = new ArrayList<>(taskManagerRegistrations.size());

      // first retrieve the timed out TaskManagers
      for (TaskManagerRegistration taskManagerRegistration : taskManagerRegistrations.values()) {
         if (currentTime - taskManagerRegistration.getIdleSince() >= taskManagerTimeout.toMilliseconds()) {
            // we collect the instance ids first in order to avoid concurrent modifications by the
            // ResourceActions.releaseResource call
            timedOutTaskManagers.add(taskManagerRegistration);
         }
      }

      int slotsDiff = redundantTaskManagerNum * numSlotsPerWorker - freeSlots.size();
      if (freeSlots.size() == slots.size()) {
         // No need to keep redundant taskManagers if no job is running.
         releaseTaskExecutors(timedOutTaskManagers, timedOutTaskManagers.size());
      } else if (slotsDiff > 0) {
         // Keep enough redundant taskManagers from time to time.
         int requiredTaskManagers = MathUtils.divideRoundUp(slotsDiff, numSlotsPerWorker);
         allocateRedundantTaskManagers(requiredTaskManagers);
      } else {
         // second we trigger the release resource callback which can decide upon the resource release
         int maxReleaseNum = (-slotsDiff) / numSlotsPerWorker;
         releaseTaskExecutors(timedOutTaskManagers, Math.min(maxReleaseNum, timedOutTaskManagers.size()));
      }
   }
}

The second timer task, checkSlotRequestTimeouts, checks which slot requests have timed out; the default timeout is 5 minutes.

private void checkSlotRequestTimeouts() {
   if (!pendingSlotRequests.isEmpty()) {
      long currentTime = System.currentTimeMillis();

      Iterator<Map.Entry<AllocationID, PendingSlotRequest>> slotRequestIterator = pendingSlotRequests.entrySet().iterator();

      while (slotRequestIterator.hasNext()) {
         PendingSlotRequest slotRequest = slotRequestIterator.next().getValue();

         if (currentTime - slotRequest.getCreationTimestamp() >= slotRequestTimeout.toMilliseconds()) {
            slotRequestIterator.remove();

            if (slotRequest.isAssigned()) {
               cancelPendingSlotRequest(slotRequest);
            }

            resourceActions.notifyAllocationFailure(
               slotRequest.getJobId(),
               slotRequest.getAllocationId(),
               new TimeoutException("The allocation could not be fulfilled in time."));
         }
      }
   }
}
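
The timeouts mentioned above correspond to configuration options. As a hedged sketch (key names and defaults as of the Flink version this walkthrough is based on; verify against your release):

import org.apache.flink.configuration.Configuration;

public class TimeoutConfigDemo {
   public static void main(String[] args) {
      Configuration conf = new Configuration();
      conf.setString("heartbeat.interval", "10000");                  // heartbeat every 10 s
      conf.setString("heartbeat.timeout", "50000");                   // peer considered dead after 50 s
      conf.setString("resourcemanager.taskmanager-timeout", "30000"); // idle TaskManager check
      conf.setString("slot.request.timeout", "300000");               // pending slot request timeout, 5 min
      System.out.println(conf);
   }
}
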
Summary of ResourceManager startup

In short: upon winning the election, the ResourceManager starts two heartbeat services and two timer tasks.

Creating and starting the Dispatcher

// inside createDispatcherRunner the dispatcher is created and its start method is called
log.debug("Starting Dispatcher.");
dispatcherRunner = dispatcherRunnerFactory.createDispatcherRunner(
   highAvailabilityServices.getDispatcherLeaderElectionService(),
   fatalErrorHandler,
   new HaServicesJobGraphStoreFactory(highAvailabilityServices),
   ioExecutor,
   rpcService,
   partialDispatcherServices);

1. Creating the runner: the dispatcherRunnerFactory.createDispatcherRunner method:

   /**
    * 1. creates the dispatcher
    * 2. starts the dispatcher
    */
    @Override
   public DispatcherRunner createDispatcherRunner(
         LeaderElectionService leaderElectionService,
         FatalErrorHandler fatalErrorHandler,
         JobGraphStoreFactory jobGraphStoreFactory,
         Executor ioExecutor,
         RpcService rpcService,
         PartialDispatcherServices partialDispatcherServices) throws Exception {
      final DispatcherLeaderProcessFactory dispatcherLeaderProcessFactory = dispatcherLeaderProcessFactoryFactory.createFactory(
         jobGraphStoreFactory,
         ioExecutor,
         rpcService,
         partialDispatcherServices,
         fatalErrorHandler);

      return DefaultDispatcherRunner.create(
         leaderElectionService,
         fatalErrorHandler,
         dispatcherLeaderProcessFactory);
   }

It first creates a factory:

public DispatcherLeaderProcessFactory createFactory(
      JobGraphStoreFactory jobGraphStoreFactory,
      Executor ioExecutor,
      RpcService rpcService,
      PartialDispatcherServices partialDispatcherServices,
      FatalErrorHandler fatalErrorHandler) {
   final AbstractDispatcherLeaderProcess.DispatcherGatewayServiceFactory dispatcherGatewayServiceFactory = new DefaultDispatcherGatewayServiceFactory(
      dispatcherFactory,
      rpcService,
      partialDispatcherServices);

   return new SessionDispatcherLeaderProcessFactory(
      dispatcherGatewayServiceFactory,
      jobGraphStoreFactory,
      ioExecutor,
      fatalErrorHandler);
}

SessionDispatcherLeaderProcessFactory creates SessionDispatcherLeaderProcess instances; this factory is handed out as the DispatcherLeaderProcessFactory.
Then the create method is called:

public static DispatcherRunner create(
      LeaderElectionService leaderElectionService,
      FatalErrorHandler fatalErrorHandler,
      DispatcherLeaderProcessFactory dispatcherLeaderProcessFactory) throws Exception {
   // create the DefaultDispatcherRunner
   final DefaultDispatcherRunner dispatcherRunner = new DefaultDispatcherRunner(
      leaderElectionService,
      fatalErrorHandler,
      dispatcherLeaderProcessFactory);
   // tie the DefaultDispatcherRunner's lifecycle to the leader election
   return DispatcherRunnerLeaderElectionLifecycleManager.createFor(dispatcherRunner, leaderElectionService);
}
public static <T extends DispatcherRunner & LeaderContender> DispatcherRunner createFor(T dispatcherRunner, LeaderElectionService leaderElectionService) throws Exception {
   return new DispatcherRunnerLeaderElectionLifecycleManager<>(dispatcherRunner, leaderElectionService);
}

The election service is passed to this constructor as the leaderElectionService parameter:

private DispatcherRunnerLeaderElectionLifecycleManager(T dispatcherRunner, LeaderElectionService leaderElectionService) throws Exception {
   this.dispatcherRunner = dispatcherRunner;
   this.leaderElectionService = leaderElectionService;

   leaderElectionService.start(dispatcherRunner);
}

Inside, it starts the election: leaderElectionService.start(dispatcherRunner);

public final void start(LeaderContender contender) throws Exception {
   checkNotNull(contender, "Contender must not be null.");
   Preconditions.checkState(leaderContender == null, "Contender was already set.");

   synchronized (lock) {
      leaderContender = contender;
      leaderElectionDriver = leaderElectionDriverFactory.createLeaderElectionDriver(
         this, new LeaderElectionFatalErrorHandler(), leaderContender.getDescription());
      LOG.info("Starting DefaultLeaderElectionService with {}.", leaderElectionDriver);

      running = true;
   }
}

The leaderContender here is the dispatcherRunner. This leads back into the same ZooKeeperLeaderElectionDriver constructor shown earlier, where leaderLatch.start() again kicks off the election.

On winning the election, Curator calls back ZooKeeperLeaderElectionDriver's isLeader method:

@Override
public void isLeader() {
   leaderElectionEventHandler.onGrantLeadership();
}

and then DefaultDispatcherRunner's grantLeadership method:

@Override
public void grantLeadership(UUID leaderSessionID) {
   runActionIfRunning(() -> startNewDispatcherLeaderProcess(leaderSessionID));
}

which calls startNewDispatcherLeaderProcess:

private void startNewDispatcherLeaderProcess(UUID leaderSessionID) {
   // stop the old instance
   stopDispatcherLeaderProcess();
   // create a new instance
   dispatcherLeaderProcess = createNewDispatcherLeaderProcess(leaderSessionID);
   // start the new instance once the previous one has terminated
   final DispatcherLeaderProcess newDispatcherLeaderProcess = dispatcherLeaderProcess;
   FutureUtils.assertNoException(
      previousDispatcherLeaderProcessTerminationFuture.thenRun(newDispatcherLeaderProcess::start));
}

The start method, newDispatcherLeaderProcess::start, resolves to AbstractDispatcherLeaderProcess.start():

@Override
public final void start() {
   runIfStateIs(
      State.CREATED,
      this::startInternal);
}
private void startInternal() {
   log.info("Start {}.", getClass().getSimpleName());
   state = State.RUNNING;
   onStart();
}

which in turn calls SessionDispatcherLeaderProcess's onStart method:

@Override
protected void onStart() {
   startServices();

   onGoingRecoveryOperation = recoverJobsAsync() // fetch all persisted JobGraph objects
      .thenAccept(this::createDispatcherIfRunning) // create the dispatcher that runs the recovered jobs
      .handle(this::onErrorIfRunning);
}

1. startServices: starts the jobGraphStore, which stores the JobGraphs:

private void startServices() {
   try {
      jobGraphStore.start(this);
   } catch (Exception e) {
      throw new FlinkRuntimeException(
         String.format(
            "Could not start %s when trying to start the %s.",
            jobGraphStore.getClass().getSimpleName(),
            getClass().getSimpleName()),
         e);
   }
}

2. recoverJobsAsync: asynchronously recovers the pending jobs. It fetches all JobGraphs that need to be recovered; actually running them happens in the subsequent .thenAccept(this::createDispatcherIfRunning):

private CompletableFuture<Collection<JobGraph>> recoverJobsAsync() {
   return CompletableFuture.supplyAsync(
      this::recoverJobsIfRunning,
      ioExecutor);
}

It calls recoverJobsIfRunning:

private Collection<JobGraph> recoverJobsIfRunning() {
   return supplyUnsynchronizedIfRunning(this::recoverJobs).orElse(Collections.emptyList());
}

The job-recovery method, recoverJobs:

private Collection<JobGraph> recoverJobs() {
   log.info("Recover all persisted job graphs.");
   final Collection<JobID> jobIds = getJobIds();
   final Collection<JobGraph> recoveredJobGraphs = new ArrayList<>();

   for (JobID jobId : jobIds) {
      recoveredJobGraphs.add(recoverJob(jobId));
   }

   log.info("Successfully recovered {} persisted job graphs.", recoveredJobGraphs.size());

   return recoveredJobGraphs;
}

It fetches all job ids and then recovers each job by its id:

private Collection<JobID> getJobIds() {
   try {
      return jobGraphStore.getJobIds();
   } catch (Exception e) {
      throw new FlinkRuntimeException(
         "Could not retrieve job ids of persisted jobs.",
         e);
   }
}

recoverJob does the actual recovery:

private JobGraph recoverJob(JobID jobId) {
   log.info("Trying to recover job with job id {}.", jobId);
   try {
      return jobGraphStore.recoverJobGraph(jobId);
   } catch (Exception e) {
      throw new FlinkRuntimeException(
         String.format("Could not recover job with job id %s.", jobId),
         e);
   }
}

The implementation is DefaultJobGraphStore.recoverJobGraph, which retrieves the persisted JobGraph for one job id:

public JobGraph recoverJobGraph(JobID jobId) throws Exception {
   checkNotNull(jobId, "Job ID");

   LOG.debug("Recovering job graph {} from {}.", jobId, jobGraphStateHandleStore);

   final String name = jobGraphStoreUtil.jobIDToName(jobId);

   synchronized (lock) {
      verifyIsRunning();

      boolean success = false;

      RetrievableStateHandle<JobGraph> jobGraphRetrievableStateHandle;

      try {
         try {
            jobGraphRetrievableStateHandle = jobGraphStateHandleStore.getAndLock(name);
         } catch (StateHandleStore.NotExistException ignored) {
            success = true;
            return null;
         } catch (Exception e) {
            throw new FlinkException("Could not retrieve the submitted job graph state handle " +
               "for " + name + " from the submitted job graph store.", e);
         }

         JobGraph jobGraph;
         try {
            jobGraph = jobGraphRetrievableStateHandle.retrieveState();
         } catch (ClassNotFoundException cnfe) {
            throw new FlinkException("Could not retrieve submitted JobGraph from state handle under " + name +
               ". This indicates that you are trying to recover from state written by an " +
               "older Flink version which is not compatible. Try cleaning the state handle store.", cnfe);
         } catch (IOException ioe) {
            throw new FlinkException("Could not retrieve submitted JobGraph from state handle under " + name +
               ". This indicates that the retrieved state handle is broken. Try cleaning the state handle " +
               "store.", ioe);
         }

         addedJobGraphs.add(jobGraph.getJobID());

         LOG.info("Recovered {}.", jobGraph);

         success = true;
         return jobGraph;
      } finally {
         if (!success) {
            jobGraphStateHandleStore.release(name);
         }
      }
   }
}

With the JobGraphs to recover in hand, .thenAccept(this::createDispatcherIfRunning) runs next:

private void createDispatcherIfRunning(Collection<JobGraph> jobGraphs) {
   runIfStateIs(State.RUNNING, () -> createDispatcher(jobGraphs));
}

A Dispatcher is then created to schedule the recovered JobGraphs:

private void createDispatcher(Collection<JobGraph> jobGraphs) {

   final DispatcherGatewayService dispatcherService = dispatcherGatewayServiceFactory.create(
      DispatcherId.fromUuid(getLeaderSessionId()),
      jobGraphs,
      jobGraphStore);

   completeDispatcherSetup(dispatcherService);
}

The recovered jobs are actually set running inside dispatcherGatewayServiceFactory.create:

public AbstractDispatcherLeaderProcess.DispatcherGatewayService create(
      DispatcherId fencingToken,
      Collection<JobGraph> recoveredJobs,
      JobGraphWriter jobGraphWriter) {

   final Dispatcher dispatcher;
   try {
       // returns a StandaloneDispatcher
      dispatcher = dispatcherFactory.createDispatcher(
         rpcService,
         fencingToken,
         recoveredJobs,
         (dispatcherGateway, scheduledExecutor, errorHandler) -> new NoOpDispatcherBootstrap(),
         PartialDispatcherServicesWithJobGraphStore.from(partialDispatcherServices, jobGraphWriter));
   } catch (Exception e) {
      throw new FlinkRuntimeException("Could not create the Dispatcher rpc endpoint.", e);
   }

   dispatcher.start();

   return DefaultDispatcherGatewayService.from(dispatcher);
}

To summarize the flow:
onStart()
-> recoverJobsAsync() (recover all the JobGraphs)
-> createDispatcherIfRunning()
-> runIfStateIs(State.RUNNING, () -> createDispatcher(jobGraphs))
-> createDispatcher(jobGraphs) (create the Dispatcher for the recovered JobGraphs)

In other words, a Dispatcher is created to schedule and run the recovered JobGraphs.
createDispatcher returns a StandaloneDispatcher:

@Override
public StandaloneDispatcher createDispatcher(
      RpcService rpcService,
      DispatcherId fencingToken,
      Collection<JobGraph> recoveredJobs,
      DispatcherBootstrapFactory dispatcherBootstrapFactory,
      PartialDispatcherServicesWithJobGraphStore partialDispatcherServicesWithJobGraphStore) throws Exception {
   // create the default dispatcher
   return new StandaloneDispatcher(
      rpcService,
      fencingToken,
      recoveredJobs,
      dispatcherBootstrapFactory,
      DispatcherServices.from(partialDispatcherServicesWithJobGraphStore, DefaultJobManagerRunnerFactory.INSTANCE));
}

The dispatcher's start again goes through rpcServer.start() and is translated into an onStart call, i.e. StandaloneDispatcher's onStart:

public final void start() {
   rpcServer.start();
}

What runs is the onStart of the parent class Dispatcher:

@Override
public void onStart() throws Exception {
   try {
      startDispatcherServices();
   } catch (Throwable t) {
      final DispatcherException exception = new DispatcherException(String.format("Could not start the Dispatcher %s", getAddress()), t);
      onFatalError(exception);
      throw exception;
   }

   startRecoveredJobs();
   this.dispatcherBootstrap = this.dispatcherBootstrapFactory.create(
         getSelfGateway(DispatcherGateway.class),
         this.getRpcService().getScheduledExecutor() ,
         this::onFatalError);
}

startRecoveredJobs() schedules the recovered jobs for execution:

private void startRecoveredJobs() {
   for (JobGraph recoveredJob : recoveredJobs) {
      runRecoveredJob(recoveredJob);
   }
   recoveredJobs.clear();
}
private void runRecoveredJob(final JobGraph recoveredJob) {
   checkNotNull(recoveredJob);
   try {
      runJob(recoveredJob, ExecutionType.RECOVERY);
   } catch (Throwable throwable) {
      onFatalError(new DispatcherException(String.format("Could not start recovered job %s.", recoveredJob.getJobID()), throwable));
   }
}

At this point the jobs are running, with execution type ExecutionType.RECOVERY.
After the jobs are recovered, completeDispatcherSetup is called to wire up some callbacks:

final void completeDispatcherSetup(DispatcherGatewayService dispatcherService) {
   runIfStateIs(
      State.RUNNING,
      () -> completeDispatcherSetupInternal(dispatcherService));
}
private void completeDispatcherSetupInternal(DispatcherGatewayService createdDispatcherService) {
   Preconditions.checkState(dispatcherService == null, "The DispatcherGatewayService can only be set once.");
   dispatcherService = createdDispatcherService;
   dispatcherGatewayFuture.complete(createdDispatcherService.getGateway());
   FutureUtils.forward(createdDispatcherService.getShutDownFuture(), shutDownFuture);
   handleUnexpectedDispatcherServiceTermination(createdDispatcherService);
}

Summary

With that, the three most important components of the JobManager are up. The three components are:

WebMonitorEndpoint

  1. Initializes a large set of Handlers and a Router, sorts and de-duplicates the handlers, then registers each Handler with the Router
  2. Starts a Netty server
  3. Starts its internal services and runs for election; the WebMonitorEndpoint itself is a LeaderContender, and if it wins, its isLeader() method is called back
  4. Winning the election simply means writing the WebMonitorEndpoint's address and the leader session ID into a znode
  5. Starts a periodic cleanup task for the ExecutionGraph cache

ResourceManager

1. ResourceManager is a subclass of RpcEndpoint, so once the ResourceManager object is constructed, start() is called on the RpcEndpoint, which jumps to its onStart() method.
2. ResourceManager is also a LeaderContender; it takes part in the election through the LeaderElectionService, and if it wins, its isLeader() method is called back.
3. It starts the services the ResourceManager needs:
Two heartbeat services:
(1) heartbeat between the ResourceManager and the TaskExecutors
(2) heartbeat between the ResourceManager and the JobMasters
Two timer services:
(1) checkTaskManagerTimeoutsAndRedundancy(), which checks for TaskExecutor timeouts
(2) checkSlotRequestTimeouts(), which checks for SlotRequest timeouts

Dispatcher startup and initialization

1. Starts the JobGraphStore service
2. Recovers jobs from the JobGraphStore and starts a Dispatcher to run them

With that, the JobManager startup is complete!
