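Flink Source Code: Submission Flow, Part 2: Cluster Startup

Thanks again to 尚硅谷!

The previous post covered submitting the application; this one picks up from there.

The call chains are long, so for each method only the key code is shown and everything else is replaced with ...

I. Creating and starting the components inside the JobManager: Dispatcher, ResourceManager, JobMaster

The deployJobCluster method from the previous post calls getYarnJobClusterEntrypoint(), which returns the entry point of the YARN ApplicationMaster (AM).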

return deployInternal(
   clusterSpecification,
   "Flink per-job cluster",
   getYarnJobClusterEntrypoint(),
   jobGraph,
   detached);
protected String getYarnJobClusterEntrypoint() {
   return YarnJobClusterEntrypoint.class.getName();
}

Go into the YarnJobClusterEntrypoint class.

Its main method is the executable entry point of the YARN ApplicationMaster process for a single Flink job.


public static void main(String[] args) {
   ... // configuration-related setup
   YarnJobClusterEntrypoint yarnJobClusterEntrypoint = new YarnJobClusterEntrypoint(configuration);
   ClusterEntrypoint.runClusterEntrypoint(yarnJobClusterEntrypoint);
}

Follow runClusterEntrypoint into the runCluster method of the ClusterEntrypoint class.

private void runCluster(...) throws Exception {

      /*TODO Initialize services: RPC-related*/
      initializeServices(configuration, pluginManager);
	  ...
      /*TODO Create and start the components inside the JobManager: Dispatcher, ResourceManager, JobMaster*/
      clusterComponent = dispatcherResourceManagerComponentFactory.create(...);

       ...
}

Follow create into the DefaultDispatcherResourceManagerComponentFactory class.

This method is the main subject of the analysis.

public DispatcherResourceManagerComponent create(...) throws Exception {

   	  ...
	
      /*TODO Create the ResourceManager: the YARN-mode ResourceManager*/
      resourceManager = resourceManagerFactory.createResourceManager(...);
    
      /*TODO Create and start the Dispatcher => the Dispatcher creates and starts the JobMaster*/
      dispatcherRunner = dispatcherRunnerFactory.createDispatcherRunner(...);

      /*TODO Start the ResourceManager*/
      resourceManager.start();

      ...
}

1. Creating the ResourceManager

Follow createResourceManager down to the createResourceManager method of the ActiveResourceManagerFactory class, which returns an ActiveResourceManager object. ActiveResourceManager is a subclass of ResourceManager.

public ResourceManager<WorkerType> createResourceManager(...) throws Exception {

   return new ActiveResourceManager<>(...);
}

2. Creating and starting the Dispatcher => the Dispatcher creates and starts the JobMaster

Back in the DefaultDispatcherResourceManagerComponentFactory class, follow createDispatcherRunner through to the DefaultDispatcherGatewayServiceFactory class.

public AbstractDispatcherLeaderProcess.DispatcherGatewayService create(...) {

   /*TODO Create the Dispatcher*/
   dispatcher = dispatcherFactory.createDispatcher(...);

   /*TODO Start the Dispatcher; next look at onStart()*/
   dispatcher.start();
   ...
}

Go to the Dispatcher's onStart method (in the Dispatcher class).

public void onStart() throws Exception {
   
   /*TODO Start the dispatcher services*/
   startDispatcherServices();

   /*TODO Start the JobMaster*/
   startRecoveredJobs();
}
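
A pattern that recurs throughout this walkthrough is that calling start() on a component eventually lands in its onStart() method. The sketch below illustrates only that idea in plain Java. EndpointSketch, ToyEndpoint and ToyDispatcher are hypothetical names invented for this example, and a single-threaded executor stands in for the RPC machinery; Flink's real RpcEndpoint is considerably more involved.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class EndpointSketch {

    /** Hypothetical stand-in for an RPC endpoint: start() schedules onStart() on the endpoint's own thread. */
    abstract static class ToyEndpoint {
        private final ExecutorService mainThread = Executors.newSingleThreadExecutor();

        final void start() {
            mainThread.execute(() -> {
                try {
                    onStart();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
        }

        /** Lifecycle hook that subclasses override; this is where the walkthrough keeps jumping to. */
        protected abstract void onStart() throws Exception;

        /** Lets already scheduled work finish, then shuts the thread down. */
        final void stop() {
            mainThread.shutdown();
            try {
                mainThread.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Toy dispatcher that records the two steps named in onStart() above. */
    static class ToyDispatcher extends ToyEndpoint {
        final List<String> log = new CopyOnWriteArrayList<>();

        @Override
        protected void onStart() {
            log.add("startDispatcherServices");
            log.add("startRecoveredJobs");
        }
    }

    public static void main(String[] args) {
        ToyDispatcher dispatcher = new ToyDispatcher();
        dispatcher.start();
        dispatcher.stop();
        System.out.println(dispatcher.log);
    }
}
```

The point of the indirection is that the caller of start() never runs the startup logic itself; the endpoint runs it on its own thread.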

Follow startRecoveredJobs down to the createJobManagerRunner method.

CompletableFuture<JobManagerRunner> createJobManagerRunner(...) {
  
    /*TODO Create the JobMaster*/
    JobManagerRunner runner = jobManagerRunnerFactory.createJobManagerRunner(...);
    
    /*TODO Start the JobMaster*/
    runner.start();
           
}
1) Creating the JobMaster

Follow createJobManagerRunner down to the createJobMasterService method of the DefaultJobMasterServiceFactory class, which returns a JobMaster object.

public JobMaster createJobMasterService(...) throws Exception {

   return new JobMaster(...);
}
2) Starting the JobMaster

Follow start down to the startJobExecution method of the JobMaster class.

private Acknowledge startJobExecution(JobMasterId newJobMasterId) throws Exception {
	...

   /*TODO Actually start the JobMaster services*/
   startJobMasterServices();

   /*TODO Reset and start the scheduler*/
   resetAndStartScheduler();
}
private void startJobMasterServices() throws Exception {
   /*TODO Start the heartbeat services: TaskManager, ResourceManager*/
   startHeartbeatServices();

   /*TODO Start the SlotPool*/
   slotPool.start(getFencingToken(), getAddress(), getMainThreadExecutor());
    
   ...

   // job is ready to go, try to establish connection with resource manager
   //   - activate leader retrieval for the resource manager
   //   - on notification of the leader, the connection will be established and
   //     the slot pool will start requesting slots
   /*TODO Connect to the ResourceManager; the SlotPool starts requesting resources*/
   resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener());
}
a. Starting the heartbeat services
private void startHeartbeatServices() {
   taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
      resourceId,
      new TaskManagerHeartbeatListener(),
      getMainThreadExecutor(),
      log);

   resourceManagerHeartbeatManager = heartbeatServices.createHeartbeatManager(
      resourceId,
      new ResourceManagerHeartbeatListener(),
      getMainThreadExecutor(),
      log);
}
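
To make the idea of a heartbeat manager more concrete, here is a minimal sketch of timeout tracking. It is not Flink's heartbeat implementation: the class HeartbeatSketch, its method names and the explicit now timestamps are all assumptions made for illustration, chosen so that the example runs deterministically without threads or timers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HeartbeatSketch {
    private final long timeoutMillis;
    // target id -> timestamp of the last heartbeat seen from that target
    private final Map<String, Long> lastHeartbeat = new LinkedHashMap<>();

    HeartbeatSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Begin tracking a target, treating "now" as its first heartbeat. */
    void monitorTarget(String targetId, long now) {
        lastHeartbeat.put(targetId, now);
    }

    /** Record a heartbeat; heartbeats from unknown targets are ignored. */
    void receiveHeartbeat(String targetId, long now) {
        lastHeartbeat.computeIfPresent(targetId, (id, previous) -> now);
    }

    /** Targets whose last heartbeat is older than the timeout. */
    List<String> timedOutTargets(long now) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastHeartbeat.entrySet()) {
            if (now - entry.getValue() > timeoutMillis) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Demo: only the target that stayed silent is reported. */
    static List<String> demo() {
        HeartbeatSketch monitor = new HeartbeatSketch(50_000L);
        monitor.monitorTarget("taskmanager-1", 0L);
        monitor.monitorTarget("resourcemanager", 0L);
        monitor.receiveHeartbeat("taskmanager-1", 40_000L);
        return monitor.timedOutTargets(60_000L);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```
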
b. Starting the SlotPool
c. Connecting to the ResourceManager; the SlotPool starts requesting resources

Follow resourceManagerLeaderRetriever.start(new ResourceManagerLeaderListener()); down to the start method of the RegisteredRpcConnection class.

public void start() {
   ...
   /*TODO Create the registration object*/
   final RetryingRegistration<F, G, S> newRegistration = createNewRegistration();
   
   /*TODO Start registering; once registration succeeds, onRegistrationSuccess() is called*/
   newRegistration.startRegistration();

}
(1) Creating the registration object

Follow createNewRegistration to the generateRegistration method in the JobMaster class.

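protected RetryingRegistration<ResourceManagerId, ResourceManagerGateway, JobMasterRegistrationSuccess> generateRegistration() {

   /*TODO Returns an anonymous RetryingRegistration subclass; its invokeRegistration method makes the actual call*/
   return new RetryingRegistration<ResourceManagerId, ResourceManagerGateway, JobMasterRegistrationSuccess>(...) {

      @Override
      protected CompletableFuture<RegistrationResponse> invokeRegistration(...) {
         ...
         return gateway.registerJobManager(
            jobMasterId,
            jobManagerResourceID,
            jobManagerRpcAddress,
            jobID,
            timeout);
      }
   };
}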
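(2) Starting registration; once it succeeds, onRegistrationSuccess() is called

① Registration: follow startRegistration into the RetryingRegistration class, then follow register, which calls invokeRegistration. Here that is the invokeRegistration method of an anonymous inner class in the JobMaster class.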

protected CompletableFuture<RegistrationResponse> invokeRegistration(
      ResourceManagerGateway gateway, ResourceManagerId fencingToken, long timeoutMillis) {
   Time timeout = Time.milliseconds(timeoutMillis);

   return gateway.registerJobManager(
      jobMasterId,
      jobManagerResourceID,
      jobManagerRpcAddress,
      jobID,
      timeout);
}
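
The register-then-retry idea can be sketched with nothing more than CompletableFuture. The code below is an illustration under stated assumptions, not Flink's RetryingRegistration: the class RetryingRegistrationSketch, the maxAttempts parameter and the flaky gateway in demo() are all made up for this example, and details such as timeouts and delays between attempts are left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class RetryingRegistrationSketch {

    /** Calls invokeRegistration until one attempt succeeds or maxAttempts is used up. */
    static <T> CompletableFuture<T> register(
            Supplier<CompletableFuture<T>> invokeRegistration, int maxAttempts) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(invokeRegistration, maxAttempts, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> invokeRegistration,
            int remainingAttempts,
            CompletableFuture<T> result) {
        invokeRegistration.get().whenComplete((response, failure) -> {
            if (failure == null) {
                // Plays the role of the onRegistrationSuccess callback.
                result.complete(response);
            } else if (remainingAttempts > 1) {
                // Registration failed: try again.
                attempt(invokeRegistration, remainingAttempts - 1, result);
            } else {
                result.completeExceptionally(failure);
            }
        });
    }

    /** Demo with a hypothetical gateway that rejects the first two calls. */
    static String demo() {
        AtomicInteger calls = new AtomicInteger();
        Supplier<CompletableFuture<String>> flakyGateway = () -> {
            CompletableFuture<String> response = new CompletableFuture<>();
            if (calls.incrementAndGet() < 3) {
                response.completeExceptionally(new IllegalStateException("not ready yet"));
            } else {
                response.complete("registered after " + calls.get() + " attempts");
            }
            return response;
        };
        return register(flakyGateway, 5).join();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "registered after 3 attempts"
    }
}
```

Separating "how to make one attempt" (the supplier) from "when to try again" (the retry loop) is what lets the JobMaster and the TaskExecutor share the same registration flow later in this post.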

② After registration succeeds, control reaches the onRegistrationSuccess method in the JobMaster class, which then calls establishResourceManagerConnection.

private void establishResourceManagerConnection(final JobMasterRegistrationSuccess success) {
      ...
      /*TODO The SlotPool connects to the ResourceManager and requests resources*/
      slotPool.connectToResourceManager(resourceManagerGateway);
	  ...
}

Follow connectToResourceManager into the SlotPoolImpl class, which calls requestSlotFromResourceManager; keep following the calls until you reach the requestSlot method of the ResourceManager class.

public CompletableFuture<Acknowledge> requestSlot(...) {
    ...
    
    /*TODO The slotManager inside Flink's ResourceManager goes on to request resources from YARN's ResourceManager*/
    slotManager.registerSlotRequest(slotRequest);
    
    ...
}

Then follow registerSlotRequest into the internalRequestSlot method of the SlotManagerImpl class.

private void internalRequestSlot(...) throws ResourceManagerException {
    ...
       
    () -> fulfillPendingSlotRequestWithPendingTaskManagerSlot(pendingSlotRequest);
}

Then follow into fulfillPendingSlotRequestWithPendingTaskManagerSlot:

private void fulfillPendingSlotRequestWithPendingTaskManagerSlot(...) throws ResourceManagerException {
   ...
      pendingTaskManagerSlotOptional = allocateResource(resourceProfile);
   ...
}
private Optional<PendingTaskManagerSlot> allocateResource(ResourceProfile requestedSlotResourceProfile) {
   ...

   if (!resourceActions.allocateResource(defaultWorkerResourceSpec)) {
      // resource cannot be allocated
      return Optional.empty();
   }
	...
}

Then follow allocateResource to the allocateResource method of ResourceActionsImpl, an inner class of the ResourceManager class.

public boolean allocateResource(WorkerResourceSpec workerResourceSpec) {
   validateRunsInMainThread();
   return startNewWorker(workerResourceSpec);
}

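Then follow startNewWorker down to the requestNewWorker method of the ActiveResourceManager class.

3. Starting the ResourceManager

Back in the DefaultDispatcherResourceManagerComponentFactory class,

look at the call resourceManager.start();

Go to the onStart method of the ResourceManager class.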

public final void onStart() throws Exception {
    
      startResourceManagerServices();
}

Follow startResourceManagerServices:

private void startResourceManagerServices() throws Exception {
      ...
      /*TODO Create the YARN RM and NM clients, then initialize and start them*/
      initialize();

      /*TODO Start the ResourceManager through the leader election service*/
      leaderElectionService.start(this);
      ...
}
1) Creating, initializing and starting the YARN RM and NM clients

Follow initialize down to the initializeInternal method of the YarnResourceManagerDriver class.

protected void initializeInternal() throws Exception {

   /*TODO Create the client for YARN's ResourceManager, then initialize and start it*/
   resourceManagerClient = yarnResourceManagerClientFactory.createResourceManagerClient(
      yarnHeartbeatIntervalMillis,
      yarnContainerEventHandler);
   resourceManagerClient.init(yarnConfig);
   resourceManagerClient.start();

       ...
           
   /*TODO Create the client for YARN's NodeManager, then initialize and start it*/
   nodeManagerClient = yarnNodeManagerClientFactory.createNodeManagerClient(yarnContainerEventHandler);
   nodeManagerClient.init(yarnConfig);
   nodeManagerClient.start();
}
2) Starting the ResourceManager

Back in the startResourceManagerServices method of the ResourceManager class,

follow start to the startServicesOnLeadership method in the same class.

private void startServicesOnLeadership() {
   /*TODO Start the heartbeat services: TaskManager, JobMaster*/
   startHeartbeatServices();

   /*TODO Start the slotManager*/
   slotManager.start(getFencingToken(), getMainThreadExecutor(), new ResourceActionsImpl());

   ...
}
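
The jump from leaderElectionService.start(this) to startServicesOnLeadership can look abrupt, so here is a rough sketch of the callback shape behind it. Everything in it is a simplification invented for illustration: ToyLeaderContender, ToyElectionService and ToyResourceManager are hypothetical, and the election has a single candidate that wins immediately, whereas a real election service may grant leadership later or never.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class LeaderElectionSketch {

    /** Hypothetical callback interface: the election service calls this on the winner. */
    interface ToyLeaderContender {
        void grantLeadership(UUID leaderSessionId);
    }

    /** Single-candidate election: whoever registers wins right away. */
    static class ToyElectionService {
        void start(ToyLeaderContender contender) {
            contender.grantLeadership(UUID.randomUUID());
        }
    }

    static class ToyResourceManager implements ToyLeaderContender {
        final List<String> startedServices = new ArrayList<>();
        UUID fencingToken;

        void startResourceManagerServices() {
            startedServices.add("yarn clients");
            // Passing "this" hands the election service the callback to invoke.
            new ToyElectionService().start(this);
        }

        @Override
        public void grantLeadership(UUID leaderSessionId) {
            // Only the leader starts the services that talk to workers and jobs.
            fencingToken = leaderSessionId;
            startedServices.add("heartbeat services");
            startedServices.add("slot manager");
        }
    }

    static List<String> demo() {
        ToyResourceManager resourceManager = new ToyResourceManager();
        resourceManager.startResourceManagerServices();
        return resourceManager.startedServices;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```
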
a. Starting the heartbeat services
private void startHeartbeatServices() {  // start the heartbeat services
   taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
      resourceId,
      new TaskManagerHeartbeatListener(),
      getMainThreadExecutor(),
      log);

   jobManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
      resourceId,
      new JobManagerHeartbeatListener(),
      getMainThreadExecutor(),
      log);
}
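b. Starting the slotManager

II. Starting the TaskManager

Since this walkthrough is based on YARN, look for the YarnTaskExecutorRunner class.

This class is the executable entry point for running a TaskExecutor in a YARN container.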

public static void main(String[] args) {
   EnvironmentInformation.logEnvironmentInfo(LOG, "YARN TaskExecutor runner", args);
   SignalHandler.register(LOG);
   JvmShutdownSafeguard.installAsShutdownHook(LOG);

   runTaskManagerSecurely(args);
}

Follow runTaskManagerSecurely down to the start method of the TaskExecutorToServiceAdapter class.

public void start() {
   /*TODO Start the TaskExecutor through the RPC service; look for its onStart() method*/
   taskExecutor.start();
}

Go to the TaskExecutor's onStart method.

public void onStart() throws Exception {
   ...
   /*TODO Start the TaskExecutor services*/
   startTaskExecutorServices();
   ...
}

Follow startTaskExecutorServices down to the start method of the RegisteredRpcConnection class.

This is the same flow as for the JobMaster above; both go through the abstract class RetryingRegistration.

public void start() {
	...
   /*TODO Create the registration object*/
   final RetryingRegistration<F, G, S> newRegistration = createNewRegistration();
   
   /*TODO Start registering; once registration succeeds, onRegistrationSuccess() is called*/
   newRegistration.startRegistration();

}

① Registration: follow startRegistration into the RetryingRegistration class, then follow register, which calls invokeRegistration. Here it is a method of ResourceManagerRegistration, an inner class of TaskExecutorToResourceManagerConnection.

protected CompletableFuture<RegistrationResponse> invokeRegistration(
      ResourceManagerGateway resourceManager, ResourceManagerId fencingToken, long timeoutMillis) throws Exception {

   Time timeout = Time.milliseconds(timeoutMillis);
   return resourceManager.registerTaskExecutor(
      taskExecutorRegistration,
      timeout);
}

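② After registration succeeds, find the onRegistrationSuccess method in the TaskExecutorToResourceManagerConnection class and keep following the calls, through ResourceManagerRegistrationListener (an inner class of TaskExecutor), to the establishResourceManagerConnection method.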

private void establishResourceManagerConnection(...) {

   final CompletableFuture<Acknowledge> slotReportResponseFuture = resourceManagerGateway.sendSlotReport(...);

   ...
}

Follow sendSlotReport into the ResourceManager class.

public CompletableFuture<Acknowledge> sendSlotReport(...) {

    ...
        
    slotManager.registerTaskManager(workerTypeWorkerRegistration, slotReport);

    ...
}

Follow registerTaskManager into the SlotManagerImpl class.

public boolean registerTaskManager(...) {
      ...
       
      // next register the new slots
      for (SlotStatus slotStatus : initialSlotReport) {
         registerSlot(
            slotStatus.getSlotID(),
            slotStatus.getAllocationID(),
            slotStatus.getJobID(),
            slotStatus.getResourceProfile(),
            taskExecutorConnection);
      }
    
    ...
}

Follow registerSlot:

private void registerSlot(...) {

   ...

   /*TODO Create and register the new slot*/
   final TaskManagerSlot slot = createAndRegisterTaskManagerSlot(slotId, resourceProfile, taskManagerConnection);

  
      /*TODO Allocate the slot*/
      if (assignedPendingSlotRequest == null) {
         /*TODO All pending requests are already satisfied, so this slot stays idle for now*/
         handleFreeSlot(slot);
      } else {
         /*TODO This slot is to be assigned to a pending request*/
         assignedPendingSlotRequest.unassignPendingTaskManagerSlot();
         allocateSlot(slot, assignedPendingSlotRequest);
      }
   
}
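
The two halves of the slot story (a request arriving before any slot exists, and a slot registering later) can be tied together with a toy model. The sketch below is a deliberately simplified assumption, not SlotManagerImpl: SlotManagerSketch is a hypothetical class, requests are matched first-come-first-served with no resource profiles, and a counter stands in for the call to resourceActions.allocateResource.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SlotManagerSketch {
    private final Deque<String> pendingRequests = new ArrayDeque<>();       // job ids waiting for a slot
    private final List<String> freeSlots = new ArrayList<>();               // registered but unassigned slot ids
    private final Map<String, String> allocations = new LinkedHashMap<>();  // slot id -> job id
    private int workersRequested = 0;

    /** A job asks for a slot: use a free one if available, otherwise queue up and ask for a new worker. */
    void registerSlotRequest(String jobId) {
        if (!freeSlots.isEmpty()) {
            allocations.put(freeSlots.remove(0), jobId);
        } else {
            pendingRequests.add(jobId);
            workersRequested++; // stands in for asking the cluster for a new container
        }
    }

    /** A task manager reports a slot: hand it to a waiting request, or keep it free. */
    void registerSlot(String slotId) {
        String jobId = pendingRequests.poll();
        if (jobId == null) {
            freeSlots.add(slotId);          // mirrors the handleFreeSlot branch
        } else {
            allocations.put(slotId, jobId); // mirrors the allocateSlot branch
        }
    }

    int workersRequested() { return workersRequested; }
    List<String> freeSlots() { return freeSlots; }
    Map<String, String> allocations() { return allocations; }

    public static void main(String[] args) {
        SlotManagerSketch slotManager = new SlotManagerSketch();
        slotManager.registerSlotRequest("job-1");  // no slots yet, so a worker is requested
        slotManager.registerSlot("tm-1/slot-0");   // fulfils the pending request
        slotManager.registerSlot("tm-1/slot-1");   // nothing pending, stays free
        System.out.println(slotManager.allocations() + " free=" + slotManager.freeSlots());
    }
}
```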

Follow allocateSlot:

private void allocateSlot(TaskManagerSlot taskManagerSlot, PendingSlotRequest pendingSlotRequest) {
   ...
       
   TaskExecutorGateway gateway = taskExecutorConnection.getTaskExecutorGateway();

   // RPC call to the task manager
   /*TODO After allocation, tell the TM to offer the slot to the JM*/
   CompletableFuture<Acknowledge> requestFuture = gateway.requestSlot(
      slotId,
      pendingSlotRequest.getJobId(),
      allocationId,
      pendingSlotRequest.getResourceProfile(),
      pendingSlotRequest.getTargetAddress(),
      resourceManagerId,
      taskManagerRequestTimeout);

   ...
}

Follow requestSlot into the TaskExecutor class.

public CompletableFuture<Acknowledge> requestSlot(...) {
   ...
      /*TODO Allocate one of its own slots, as instructed by the RM*/
      allocateSlot(
         slotId,
         jobId,
         allocationId,
         resourceProfile);
   ...
    
      /*TODO Offer the slot to the JobManager*/
      offerSlotsToJobManager(jobId);
    
   ...
}

a. Allocating its own slot: follow allocateSlot into the TaskSlotTableImpl class.

b. Offering the slot to the JobManager: follow offerSlotsToJobManager down to the internalOfferSlotsToJobManager method in the same class.

private void internalOfferSlotsToJobManager(JobTable.Connection jobManagerConnection) {
   	  ...

      final JobMasterGateway jobMasterGateway = jobManagerConnection.getJobManagerGateway();
       
      ...

      CompletableFuture<Collection<SlotOffer>> acceptedSlotsFuture = jobMasterGateway.offerSlots(
         getResourceID(),
         reservedSlots,
         taskManagerConfiguration.getTimeout());

      ...
}

Follow offerSlots into the JobMaster class.

public CompletableFuture<Collection<SlotOffer>> offerSlots(...) {

   ...

   return CompletableFuture.completedFuture(
      slotPool.offerSlots(
         taskManagerLocation,
         rpcTaskManagerGateway,
         slots));
}

Then follow offerSlots into the SlotPoolImpl class.

public Collection<SlotOffer> offerSlots(...) {

   ArrayList<SlotOffer> result = new ArrayList<>(offers.size());

   for (SlotOffer offer : offers) {
      if (offerSlot(
         taskManagerLocation,
         taskManagerGateway,
         offer)) {

         result.add(offer);
      }
   }

   return result;
}

Follow offerSlot:

/**
 * @param taskManagerLocation location from where the offer comes from
 * @param taskManagerGateway TaskManager gateway
 * @param slotOffer the offered slot
 * @return True if we accept the offering
 */
boolean offerSlot(
      final TaskManagerLocation taskManagerLocation,
      final TaskManagerGateway taskManagerGateway,
      final SlotOffer slotOffer) {

    ...
}
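
As a closing illustration, the accept-or-reject loop of offerSlots can be modelled in a few lines. This is a sketch under assumptions, not SlotPoolImpl: SlotPoolSketch is hypothetical, slots and task managers are reduced to plain strings, and the acceptance rule used here (accept an allocation id unless a different task manager already offered it) is a simplification chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SlotPoolSketch {
    // allocation id -> id of the task manager that owns the accepted slot
    private final Map<String, String> slotOwners = new HashMap<>();

    /** Returns true if the offer is accepted (new slot, or a repeat offer from the same task manager). */
    boolean offerSlot(String taskManagerId, String allocationId) {
        String previousOwner = slotOwners.putIfAbsent(allocationId, taskManagerId);
        return previousOwner == null || previousOwner.equals(taskManagerId);
    }

    /** Same shape as the loop above: collect only the offers that were accepted. */
    List<String> offerSlots(String taskManagerId, List<String> offers) {
        List<String> accepted = new ArrayList<>(offers.size());
        for (String allocationId : offers) {
            if (offerSlot(taskManagerId, allocationId)) {
                accepted.add(allocationId);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        SlotPoolSketch slotPool = new SlotPoolSketch();
        System.out.println(slotPool.offerSlots("tm-1", Arrays.asList("alloc-1", "alloc-2")));
        // alloc-2 already belongs to tm-1, so only alloc-3 is accepted from tm-2.
        System.out.println(slotPool.offerSlots("tm-2", Arrays.asList("alloc-2", "alloc-3")));
    }
}
```

Returning the accepted subset rather than a single boolean lets the caller learn which of its offers were not taken.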
