Hadoop Source Code Analysis (4): ApplicationMasterLauncher Source Code Analysis 2021SC@SDUSC



1. Introduction to ApplicationMasterLauncher

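The previous chapter covered ApplicationMaster. ApplicationMasterLauncher is one part of ApplicationMaster management; it is mainly responsible for communicating with the NodeManager in order to launch the ApplicationMaster. When a user submits an application to the YARN ResourceManager, the ResourceManager first asks the resource scheduler for the resources needed to launch the ApplicationMaster. Once those resources are granted, ApplicationMasterLauncher communicates with the corresponding NodeManager, which starts the application's ApplicationMaster.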

2. ApplicationMasterLauncher Workflow

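[Figure: AML workflow]
ApplicationMasterLauncher mainly handles two kinds of events: AMLauncherEventType.LAUNCH and AMLauncherEventType.CLEANUP. It follows the producer-consumer pattern: for the AMLauncherEventType.LAUNCH event generated by AttemptStoredTransition, it creates an AMLauncher of the corresponding type and puts it into the masterEvents queue. ApplicationMasterLauncher maintains a thread pool so that events are handled as quickly as possible. On a LAUNCH event it communicates with the corresponding NodeManager and asks it to start the ApplicationMaster; on a CLEANUP event it communicates with the corresponding NodeManager and asks it to kill the ApplicationMaster.
When ApplicationMasterLauncher starts, it starts the launcherHandlingThread thread. That thread takes tasks from masterEvents and hands them to the thread pool ThreadPoolExecutor launcherPool, which runs the launch or cleanup method of each AMLauncher asynchronously.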
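The queue, dispatcher thread, and worker pool described above can be sketched in plain Java. This is a minimal stand-alone model, not Hadoop code: the class DispatcherSketch, its methods, and the pool size of 2 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-alone model of the pattern; not Hadoop code.
class DispatcherSketch {
  // Plays the role of masterEvents: producers add tasks, the dispatcher takes them.
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  // Plays the role of launcherPool (size 2 is an arbitrary choice here).
  private final ExecutorService pool = Executors.newFixedThreadPool(2);
  // Plays the role of launcherHandlingThread.
  private final Thread dispatcher = new Thread(() -> {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        pool.execute(queue.take()); // blocks until a task is available
      } catch (InterruptedException e) {
        return; // stop dispatching when interrupted
      }
    }
  }, "dispatcher");

  void start() { dispatcher.start(); }

  void submit(Runnable task) { queue.add(task); }

  void stop() {
    dispatcher.interrupt();
    pool.shutdown();
  }

  // Submits two tasks, waits for both, and returns the recorded names sorted.
  static List<String> runDemo() {
    DispatcherSketch d = new DispatcherSketch();
    List<String> done = new CopyOnWriteArrayList<>();
    CountDownLatch latch = new CountDownLatch(2);
    d.start();
    d.submit(() -> { done.add("LAUNCH"); latch.countDown(); });
    d.submit(() -> { done.add("CLEANUP"); latch.countDown(); });
    try {
      latch.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } finally {
      d.stop();
    }
    List<String> sorted = new ArrayList<>(done);
    Collections.sort(sorted);
    return sorted;
  }

  public static void main(String[] args) {
    System.out.println(runDemo());
  }
}
```

The producer only pays the cost of a queue insertion, so the caller is never blocked by a slow remote call; the slow work runs on the pool threads.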

3. ApplicationMasterLauncher Source Code Analysis

3.1 Main Fields

launcherPool: the worker thread pool
launcherHandlingThread: the dispatcher thread
masterEvents: a blocking queue
context: the RM context information

  // org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher.java
  // Thread pool of ApplicationMasterLauncher
  private ThreadPoolExecutor launcherPool;
  // Dispatcher thread, the core thread of this component: it watches the
  // masterEvents queue and hands each task to launcherPool for execution.
  private LauncherThread launcherHandlingThread;
  // Blocking queue
  private final BlockingQueue<Runnable> masterEvents
    = new LinkedBlockingQueue<Runnable>();
  // RM context information: RMContextImpl
  protected final RMContext context;

3.2 Initialization Method

The serviceInit method mainly initializes the worker thread pool launcherPool.

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // Read the thread count from the configuration (default: 50)
    int threadCount = conf.getInt(
        YarnConfiguration.RM_AMLAUNCHER_THREAD_COUNT,
        YarnConfiguration.DEFAULT_RM_AMLAUNCHER_THREAD_COUNT);
    ThreadFactory tf = new ThreadFactoryBuilder()
        .setNameFormat("ApplicationMasterLauncher #%d")
        .build();
    // Build launcherPool, the worker thread pool
    launcherPool = new ThreadPoolExecutor(threadCount, threadCount, 1,
        TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
    // Set the thread factory of launcherPool
    launcherPool.setThreadFactory(tf);

    Configuration newConf = new YarnConfiguration(conf);
    // Set the number of connection retries (default: 10)
    newConf.setInt(CommonConfigurationKeysPublic.
            IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SOCKET_TIMEOUTS_KEY,
        conf.getInt(YarnConfiguration.RM_NODEMANAGER_CONNECT_RETRIES,
            YarnConfiguration.DEFAULT_RM_NODEMANAGER_CONNECT_RETRIES));
    setConfig(newConf);
    super.serviceInit(newConf);
  }
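To make the pool construction above concrete, here is a small sketch of a fixed-size ThreadPoolExecutor with named worker threads. The original code uses Guava's ThreadFactoryBuilder; this sketch swaps in a hand-written ThreadFactory so that it needs only the JDK. The class NamedPoolSketch and its methods are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; only mirrors the shape of the pool built in serviceInit.
class NamedPoolSketch {
  // Builds a fixed-size pool whose threads are named "<prefix> #0", "<prefix> #1", ...
  static ThreadPoolExecutor newPool(int threadCount, String prefix) {
    AtomicInteger counter = new AtomicInteger();
    ThreadFactory tf =
        r -> new Thread(r, prefix + " #" + counter.getAndIncrement());
    // Core size equals maximum size, so the pool never grows past threadCount;
    // the unbounded queue holds tasks while all workers are busy.
    return new ThreadPoolExecutor(threadCount, threadCount, 1, TimeUnit.HOURS,
        new LinkedBlockingQueue<Runnable>(), tf);
  }

  // Runs one task and returns the name of the worker thread that executed it.
  static String firstWorkerName() {
    ThreadPoolExecutor pool = newPool(2, "ApplicationMasterLauncher");
    Callable<String> task = () -> Thread.currentThread().getName();
    try {
      return pool.submit(task).get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(firstWorkerName());
  }
}
```

Named threads make it easier to recognize the launcher's workers in a thread dump.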

3.3 Constructor

This constructor is invoked when the ResourceManager runs its serviceInit method. It mainly creates the dispatcher thread launcherHandlingThread.

  public ApplicationMasterLauncher(RMContext context) {
    super(ApplicationMasterLauncher.class.getName());
    this.context = context;
    // Create the dispatcher thread: launcherHandlingThread
    this.launcherHandlingThread = new LauncherThread();
  }

3.4 Start Method

The serviceStart method mainly starts the dispatcher thread launcherHandlingThread.

  @Override
  protected void serviceStart() throws Exception {
    // Start the dispatcher thread
    launcherHandlingThread.start();
    super.serviceStart();
  }

3.5 The run Method

LauncherThread is the dispatcher thread. Its run method takes tasks from the masterEvents queue and hands them to the worker thread pool launcherPool for execution.

  private class LauncherThread extends Thread {
    
    public LauncherThread() {
      super("ApplicationMaster Launcher");
    }

    @Override
    public void run() {
      while (!this.isInterrupted()) {
        Runnable toLaunch;
        try {
          // Take the next task from the masterEvents queue (blocks when empty)
          toLaunch = masterEvents.take();
          // Hand the task to the thread pool that processes masterEvents tasks
          launcherPool.execute(toLaunch);
        } catch (InterruptedException e) {
          LOG.warn(this.getClass().getName() + " interrupted. Returning.");
          return;
        }
      }
    }
  } 
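The loop above can end in two ways: the interrupt flag is seen by the while condition, or take() throws InterruptedException while the thread is blocked on an empty queue. The sketch below models only that behaviour with JDK classes; InterruptSketch and stopsOnInterrupt are hypothetical names, and how the real service interrupts the thread on shutdown is not shown in the excerpt.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of a take() loop that is stopped by interruption.
class InterruptSketch {
  // Starts a consumer on an empty queue, interrupts it, and reports
  // whether the consumer thread has terminated.
  static boolean stopsOnInterrupt() {
    BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    Thread consumer = new Thread(() -> {
      // Exit path 1: the flag is already set when the condition is checked.
      while (!Thread.currentThread().isInterrupted()) {
        try {
          queue.take().run();
        } catch (InterruptedException e) {
          // Exit path 2: interrupted while blocked in take().
          return;
        }
      }
    }, "consumer");
    consumer.start();
    consumer.interrupt();
    try {
      consumer.join(5000); // wait up to 5 seconds for the thread to finish
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !consumer.isAlive();
  }

  public static void main(String[] args) {
    System.out.println(stopsOnInterrupt());
  }
}
```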

3.6 The handle Method

The handle method processes AMLauncherEvent events and dispatches on the event type. There are two kinds of tasks: LAUNCH and CLEANUP.

  @Override
  public synchronized void  handle(AMLauncherEvent appEvent) {
    AMLauncherEventType event = appEvent.getType();
    RMAppAttempt application = appEvent.getAppAttempt();
    switch (event) {
    // LAUNCH event: start the ApplicationMaster
    case LAUNCH:
      launch(application);
      break;
    // CLEANUP event: kill the ApplicationMaster
    case CLEANUP:
      cleanup(application);
      break;
    default:
      break;
    }
  }

3.7 The launch Method

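The launch method builds a LAUNCH-type task and adds it to masterEvents. The dispatcher thread later takes it from the queue and hands it to the worker pool for execution.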

  // Build an AMLauncher, which the worker thread pool will execute
  protected Runnable createRunnableLauncher(RMAppAttempt application, 
      AMLauncherEventType event) {
    Runnable launcher =
        new AMLauncher(context, application, event, getConfig());
    return launcher;
  }
  
  private void launch(RMAppAttempt application) {
    // Build a LAUNCH-type task
    Runnable launcher = createRunnableLauncher(application, 
        AMLauncherEventType.LAUNCH);
    // Add the task to the queue
    masterEvents.add(launcher);
  }
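Taken together, handle and launch turn an event into a queued task and return immediately. A minimal sketch of that shape follows. HandleSketch, Task, and pending are hypothetical names, attempt IDs are plain strings, and it assumes CLEANUP is enqueued the same way as LAUNCH, which the excerpt above does not show.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of "handle an event by enqueuing a task".
class HandleSketch {
  enum EventType { LAUNCH, CLEANUP }

  // Stand-in for AMLauncher: a Runnable that remembers its event type.
  static final class Task implements Runnable {
    final String attemptId;
    final EventType type;

    Task(String attemptId, EventType type) {
      this.attemptId = attemptId;
      this.type = type;
    }

    @Override
    public void run() {
      System.out.println(type + " " + attemptId);
    }
  }

  private final BlockingQueue<Runnable> masterEvents = new LinkedBlockingQueue<>();

  // Only enqueues work; the remote call happens later on a pool thread.
  synchronized void handle(String attemptId, EventType type) {
    switch (type) {
      case LAUNCH:
      case CLEANUP: // assumption: enqueued in the same way as LAUNCH
        masterEvents.add(new Task(attemptId, type));
        break;
      default:
        break;
    }
  }

  // Number of tasks waiting for the dispatcher.
  int pending() {
    return masterEvents.size();
  }

  public static void main(String[] args) {
    HandleSketch h = new HandleSketch();
    h.handle("attempt_1", EventType.LAUNCH);
    System.out.println(h.pending());
  }
}
```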
  

3.8 AMLauncher

Once an AMLauncher has been built, it is executed by the worker thread pool. It is mainly responsible for handling the LAUNCH and CLEANUP events.

3.8.1 AMLauncher Communication Protocol

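ContainerManagementProtocol is the communication protocol between the AM and the NM. Through this RPC protocol the AM asks the NM to start or stop Containers and queries the status of each Container. AMLauncher uses the same protocol to start and stop the Container that hosts the AM.
The protocol mainly provides three RPC functions:
(1) startContainers: asks the NM to start Containers. It takes a StartContainersRequest, which wraps a list of StartContainerRequest objects; each of these carries the local resources, environment variables, commands, Token, and other information needed to start one Container. The function returns a StartContainersResponse object, which reports the requests that succeeded and those that failed.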

  @Public
  @Stable
  StartContainersResponse startContainers(StartContainersRequest request)
      throws YarnException, IOException;

(2) stopContainers: asks the NM to kill Containers. It takes a StopContainersRequest, which lists the ContainerIds to kill, and returns a StopContainersResponse object.

  @Public
  @Stable
  StopContainersResponse stopContainers(StopContainersRequest request)
      throws YarnException, IOException;

(3) getContainerStatuses: obtains the running status of Containers. It takes a GetContainerStatusesRequest, which wraps the IDs of the target Containers, and returns a GetContainerStatusesResponse object that wraps their current running status.

  @Public
  @Stable
  GetContainerStatusesResponse getContainerStatuses(
      GetContainerStatusesRequest request) throws YarnException,
      IOException;

3.8.2 AMLauncher Constructor

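When the createRunnableLauncher method of ApplicationMasterLauncher is called, it creates an AMLauncher object, which is then added to the masterEvents queue. The dispatcher thread launcherHandlingThread later takes it from the queue and hands it to the worker pool launcherPool to run.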

  @SuppressWarnings("rawtypes")
  private final EventHandler handler;

  public AMLauncher(RMContext rmContext, RMAppAttempt application,
      AMLauncherEventType eventType, Configuration conf) {
    this.application = application;
    this.conf = conf;
    this.eventType = eventType;
    this.rmContext = rmContext;
    this.handler = rmContext.getDispatcher().getEventHandler();
    this.masterContainer = application.getMasterContainer();
    this.timelineServiceV2Enabled = YarnConfiguration.
        timelineServiceV2Enabled(conf);
  }

3.8.3 The run Method of AMLauncher

When launcherPool, the thread pool in ApplicationMasterLauncher, runs the task, run handles the event according to eventType, that is, either LAUNCH or CLEANUP.

  @SuppressWarnings("unchecked")
  public void run() {
    switch (eventType) {
    // Handle the LAUNCH event: start the AM
    case LAUNCH:
      try {
        LOG.info("Launching master" + application.getAppAttemptId());
        // Try to start the Container
        launch();
        // Notify the attempt that the AM was launched (RMAppAttemptEvent LAUNCHED)
        handler.handle(new RMAppAttemptEvent(application.getAppAttemptId(),
            RMAppAttemptEventType.LAUNCHED, System.currentTimeMillis()));
      } catch(Exception ie) {
        onAMLaunchFailed(masterContainer.getId(), ie);
      }
      break;
    // Handle the CLEANUP event: perform the cleanup
    case CLEANUP:
      try {
        LOG.info("Cleaning master " + application.getAppAttemptId());
        // Perform the cleanup
        cleanup();
      } catch(IOException ie) {
        LOG.info("Error cleaning master ", ie);
      } catch (YarnException e) {
        StringBuilder sb = new StringBuilder("Container ");
        sb.append(masterContainer.getId().toString())
            .append(" is not handled by this NodeManager");
        if (!e.getMessage().contains(sb.toString())) {
          // Ignoring if container is already killed by Node Manager.
          LOG.info("Error cleaning master ", e);
        }
      }
      break;
    default:
      LOG.warn("Received unknown event-type " + eventType + ". Ignoring.");
      break;
    }
  }
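The CLEANUP branch above logs a YarnException only when its message does not say that the container is unknown to the NodeManager, since that case just means the container is already gone. The check can be isolated as a small predicate. CleanupErrorFilterSketch and shouldLog are hypothetical names, and treating a null message as loggable is an assumption of this sketch rather than the behaviour of the original code.

```java
// Hypothetical helper isolating the message check used in the CLEANUP branch.
class CleanupErrorFilterSketch {
  // Returns false when the error only says that the NodeManager no longer
  // tracks the container; returns true for every other error.
  static boolean shouldLog(String containerId, String errorMessage) {
    String benign =
        "Container " + containerId + " is not handled by this NodeManager";
    // Assumption: a missing message is treated as a real error worth logging.
    return errorMessage == null || !errorMessage.contains(benign);
  }

  public static void main(String[] args) {
    System.out.println(shouldLog("c1", "connection refused"));
  }
}
```

Matching on the message text is fragile, because it depends on the exact wording produced by the NodeManager.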

3.8.4 The launch Method of AMLauncher

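AMLauncher uses the launch method to connect to the NM and sends a startContainers request over ContainerManagementProtocol to start the Container. This is how ApplicationMasterLauncher, on receiving a LAUNCH event, communicates with the corresponding NodeManager and asks it to start the ApplicationMaster.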

  private void launch() throws IOException, YarnException {
    // Establish the connection to the NM
    connect();
    // Get the containerId
    ContainerId masterContainerID = masterContainer.getId();
    // Get the application submission context
    ApplicationSubmissionContext applicationContext =
        application.getSubmissionContext();
    LOG.info("Setting up container " + masterContainer
        + " for AM " + application.getAppAttemptId());
    ContainerLaunchContext launchContext =
        createAMContainerLaunchContext(applicationContext, masterContainerID);

    StartContainerRequest scRequest =
        StartContainerRequest.newInstance(launchContext,
          masterContainer.getContainerToken());
    List<StartContainerRequest> list = new ArrayList<StartContainerRequest>();
    list.add(scRequest);
    StartContainersRequest allRequests =
        StartContainersRequest.newInstance(list);

    // Send the request and get the response
    StartContainersResponse response =
        containerMgrProxy.startContainers(allRequests);
    if (response.getFailedRequests() != null
        && response.getFailedRequests().containsKey(masterContainerID)) {
      Throwable t =
          response.getFailedRequests().get(masterContainerID).deSerialize();
      parseAndThrowException(t);
    } else {
      // The request succeeded
      LOG.info("Done launching container " + masterContainer + " for AM "
          + application.getAppAttemptId());
    }
  }
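Because startContainers is a batch call, a response that comes back without an exception does not by itself mean the Container started: the code above still looks up the Container's ID in the map of failed requests. The sketch below models only that check with plain JDK types. BatchResponseSketch and succeeded are hypothetical names, and container IDs and error causes are simplified to strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the per-container check on a batch response.
class BatchResponseSketch {
  // failedRequests maps a container ID to the reason it failed; it may be
  // null or empty when nothing failed.
  static boolean succeeded(Map<String, String> failedRequests, String containerId) {
    return failedRequests == null || !failedRequests.containsKey(containerId);
  }

  public static void main(String[] args) {
    Map<String, String> failed = new HashMap<>();
    failed.put("container_2", "invalid token");
    System.out.println(succeeded(failed, "container_1"));
    System.out.println(succeeded(failed, "container_2"));
  }
}
```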

3.8.5 The cleanup Method of AMLauncher

When ApplicationMasterLauncher receives a CLEANUP event, it communicates with the corresponding NodeManager and asks it to kill the ApplicationMaster.

  private void cleanup() throws IOException, YarnException {
    // Establish the connection to the NM
    connect();
    // Get the ID of the container
    ContainerId containerId = masterContainer.getId();
    List<ContainerId> containerIds = new ArrayList<ContainerId>();
    containerIds.add(containerId);
    // Build the stop request
    StopContainersRequest stopRequest =
        StopContainersRequest.newInstance(containerIds);
    // Send the stop request
    StopContainersResponse response =
        containerMgrProxy.stopContainers(stopRequest);
    // Process the response
    if (response.getFailedRequests() != null
        && response.getFailedRequests().containsKey(containerId)) {
      Throwable t = response.getFailedRequests().get(containerId).deSerialize();
      parseAndThrowException(t);
    }
  }

Later chapters will cover ApplicationMasterService and AMLivelinessMonitor in detail, to round out this analysis of ApplicationMaster.
