Hadoop 3.2.1 [YARN] Source Code Analysis: ApplicationMasterLauncher

1. Preface

A user submits an application to the YARN ResourceManager. After receiving the submission, the ResourceManager first asks the resource scheduler for the resources needed to start the ApplicationMaster; once those resources are allocated, ApplicationMasterLauncher communicates with the corresponding NodeManager to start the application's ApplicationMaster.

2. Fields

The class has four main fields, three of which do the real work:
masterEvents: a blocking queue that holds pending launcher tasks.
launcherPool: the worker thread pool that actually runs the tasks.
launcherHandlingThread: the dispatcher thread; it runs alone, watches the masterEvents queue, and hands each task it takes to launcherPool for execution.


  // ApplicationMasterLauncher worker thread pool
  private ThreadPoolExecutor launcherPool;

  // [main] dispatcher thread; runs alone, watches the masterEvents queue
  // and hands each task to launcherPool for execution
  private LauncherThread launcherHandlingThread;

  // blocking queue of pending launcher tasks
  private final BlockingQueue<Runnable> masterEvents = new LinkedBlockingQueue<Runnable>();

  // the RM context: RMContextImpl
  protected final RMContext context;
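
Taken together, these fields implement a single-consumer dispatch pattern: producers enqueue Runnables into a blocking queue, one dispatcher thread drains it, and a fixed pool runs the work. A minimal standalone Java sketch of the same pattern (illustrative only, not Hadoop code):

  import java.util.concurrent.*;

  public class DispatchSketch {
    private final BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final Thread dispatcher = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          pool.execute(events.take());   // blocks until an event arrives
        }
      } catch (InterruptedException e) {
        // exit quietly, mirroring LauncherThread#run below
      }
    });

    public void start() { dispatcher.setDaemon(true); dispatcher.start(); }
    public void submit(Runnable task) { events.add(task); }

    public static void main(String[] args) throws Exception {
      DispatchSketch s = new DispatchSketch();
      s.start();
      s.submit(() -> System.out.println("launched"));
      Thread.sleep(200);   // give the dispatcher time to hand off
      s.pool.shutdown();
    }
  }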
  

3. Constructor

Invoked by the ResourceManager while it runs serviceInit. Its main job is to create the dispatcher thread launcherHandlingThread.

  public ApplicationMasterLauncher(RMContext context) {
    super(ApplicationMasterLauncher.class.getName());
    this.context = context;
    // create the dispatcher thread
    this.launcherHandlingThread = new LauncherThread();
  }

4. serviceInit

Mainly initializes the worker thread pool launcherPool.


  @Override
  protected void serviceInit(Configuration conf) throws Exception {

    // default thread count: 50
    int threadCount = conf.getInt(
        YarnConfiguration.RM_AMLAUNCHER_THREAD_COUNT,
        YarnConfiguration.DEFAULT_RM_AMLAUNCHER_THREAD_COUNT);
    ThreadFactory tf = new ThreadFactoryBuilder()
        .setNameFormat("ApplicationMasterLauncher #%d")
        .build();

    // build the thread pool
    launcherPool = new ThreadPoolExecutor(threadCount, threadCount, 1,
        TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());

    // set the thread factory
    launcherPool.setThreadFactory(tf);

    Configuration newConf = new YarnConfiguration(conf);

    // max NM connect retries: 10
    newConf.setInt(CommonConfigurationKeysPublic.
            IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SOCKET_TIMEOUTS_KEY,
        conf.getInt(YarnConfiguration.RM_NODEMANAGER_CONNECT_RETRIES,
            YarnConfiguration.DEFAULT_RM_NODEMANAGER_CONNECT_RETRIES));
    setConfig(newConf);
    super.serviceInit(newConf);
  }
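
The pool size comes from yarn.resourcemanager.amlauncher.thread-count (YarnConfiguration.RM_AMLAUNCHER_THREAD_COUNT, default 50). A small illustrative snippet for raising it programmatically, assuming you are assembling the RM's Configuration yourself:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  Configuration conf = new YarnConfiguration();
  // lift the AM launcher pool above the default of 50 threads
  conf.setInt(YarnConfiguration.RM_AMLAUNCHER_THREAD_COUNT, 100);

In a real deployment the same property would normally be set in yarn-site.xml instead.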

5. serviceStart

Nothing special here; it simply starts the dispatcher thread launcherHandlingThread.

  @Override
  protected void serviceStart() throws Exception {
    launcherHandlingThread.start();
    super.serviceStart();
  }

6. LauncherThread

LauncherThread is the dispatcher thread: it takes tasks from the masterEvents queue and hands them to the worker pool launcherPool for execution.

  @Override
  public void run() {
    while (!this.isInterrupted()) {
      Runnable toLaunch;
      try {
        // take a task from the queue (blocks while the queue is empty)
        toLaunch = masterEvents.take();
        // the pool runs the events taken from masterEvents
        launcherPool.execute(toLaunch);
      } catch (InterruptedException e) {
        LOG.warn(this.getClass().getName() + " interrupted. Returning.");
        return;
      }
    }
  }

7. handle

Processes AMLauncherEvent events, dispatching on the event type.
There are two event types: LAUNCH and CLEANUP.

  @Override
  public synchronized void handle(AMLauncherEvent appEvent) {
    AMLauncherEventType event = appEvent.getType();
    RMAppAttempt application = appEvent.getAppAttempt();
    switch (event) {
    case LAUNCH:
      // handle a launch event
      launch(application);
      break;
    case CLEANUP:
      // handle a cleanup event
      cleanup(application);
      break;
    default:
      break;
    }
  }
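
For context, the producer side of these events lives in RMAppAttemptImpl: once an attempt's AM container has been allocated and stored, the RM dispatcher delivers a LAUNCH event to this handler. Roughly (a paraphrased sketch, not a verbatim copy of RMAppAttemptImpl):

  // sketch: how the RM side enqueues a launch for this handler
  eventHandler.handle(
      new AMLauncherEvent(AMLauncherEventType.LAUNCH, appAttempt));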

8. launch

Builds a LAUNCH task and adds it to masterEvents; the dispatcher thread will later take it and hand it to the worker pool for execution.

  private void launch(RMAppAttempt application) {
    // build a LAUNCH task
    Runnable launcher = createRunnableLauncher(application, AMLauncherEventType.LAUNCH);

    // add it to the task queue
    masterEvents.add(launcher);
  }



  // build an AMLauncher for the given event type
  protected Runnable createRunnableLauncher(RMAppAttempt application, 
      AMLauncherEventType event) {
    Runnable launcher = new AMLauncher(context, application, event, getConfig());
    return launcher;
  }

Note that the Runnable built here is an AMLauncher; it is this object that the worker pool eventually executes.

9. AMLauncher

AMLauncher handles the two event types, LAUNCH and CLEANUP.

9.1. Communication protocol: ContainerManagementProtocol

ContainerManagementProtocol is the protocol between the AM and the NM: through this RPC the AM asks the NM to start or stop containers, query each container's status, and so on.

The protocol's methods and what they do:
startContainers: start containers
stopContainers: stop containers
getContainerStatuses: get container statuses
increaseContainersResource: [deprecated] increase a container's resources
updateContainer: update a container
signalToContainer: send a signal to a container
localize: localize resources needed by a container; currently this API only works for running containers
reInitializeContainer: re-initialize a container with a new launch context
restartContainer: restart a container
rollbackLastReInitialization: attempt to roll back the last re-initialization
commitLastReInitialization: attempt to commit the last re-initialization; once committed it cannot be rolled back

The heart of ContainerManagementProtocol is the following three RPCs (described here with the older singular names; as the table above shows, the current API batches them as startContainers, stopContainers, and getContainerStatuses):
❑ startContainer: the ApplicationMaster asks the NodeManager to start a container. The StartContainerRequest parameter wraps everything the container needs to start: local resources, environment variables, launch commands, tokens, and so on. On success the call returns a StartContainerResponse.
❑ stopContainer: the ApplicationMaster asks the NodeManager to stop (kill) a container. The StopContainerRequest parameter names the ContainerId to kill. On success the call returns a StopContainerResponse.
❑ getContainerStatus: the ApplicationMaster queries a container's runtime status. The GetContainerStatusRequest parameter wraps the target container's ID; the returned GetContainerStatusResponse wraps the container's current status.
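
As a small illustration of the batched API, here is a hedged sketch of querying container status over the protocol. It assumes proxy is an already-authenticated ContainerManagementProtocol proxy (see the connect() discussion in 9.4) and containerId is known; both are placeholders:

  import java.util.Collections;
  import org.apache.hadoop.yarn.api.protocolrecords.GetContainerStatusesRequest;
  import org.apache.hadoop.yarn.api.protocolrecords.GetContainerStatusesResponse;
  import org.apache.hadoop.yarn.api.records.ContainerStatus;

  GetContainerStatusesRequest req = GetContainerStatusesRequest
      .newInstance(Collections.singletonList(containerId));
  GetContainerStatusesResponse resp = proxy.getContainerStatuses(req);
  for (ContainerStatus status : resp.getContainerStatuses()) {
    System.out.println(status.getContainerId() + " -> " + status.getState());
  }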

9.2. Constructor

Called from ApplicationMasterLauncher#createRunnableLauncher to build the AMLauncher object, which is then added to the masterEvents queue; the dispatcher thread launcherHandlingThread later takes it and hands it to the worker pool launcherPool.


  public AMLauncher(RMContext rmContext, RMAppAttempt application,
      AMLauncherEventType eventType, Configuration conf) {
    this.application = application;
    this.conf = conf;
    this.eventType = eventType;
    this.rmContext = rmContext;
    this.handler = rmContext.getDispatcher().getEventHandler();
    this.masterContainer = application.getMasterContainer();
    this.timelineServiceV2Enabled = YarnConfiguration.
        timelineServiceV2Enabled(conf);
  }

9.3. run

The core method, executed on the thread pool ApplicationMasterLauncher#launcherPool.
It dispatches on the event type eventType.


  @SuppressWarnings("unchecked")
  public void run() {

    switch (eventType) {
    // launch the AM
    case LAUNCH:
      try {
        LOG.info("Launching master" + application.getAppAttemptId());

        // try to start the AM container
        launch();

        // fire the RMAppAttempt LAUNCHED event
        handler.handle(new RMAppAttemptEvent(application.getAppAttemptId(),
            RMAppAttemptEventType.LAUNCHED, System.currentTimeMillis()));
      } catch(Exception ie) {
        onAMLaunchFailed(masterContainer.getId(), ie);
      }
      break;
    // clean up the AM container
    case CLEANUP:
      try {
        LOG.info("Cleaning master " + application.getAppAttemptId());
        
        // perform the cleanup
        cleanup();
      } catch(IOException ie) {
        LOG.info("Error cleaning master ", ie);
      } catch (YarnException e) {
        StringBuilder sb = new StringBuilder("Container ");
        sb.append(masterContainer.getId().toString());
        sb.append(" is not handled by this NodeManager");
        if (!e.getMessage().contains(sb.toString())) {
          // Ignoring if container is already killed by Node Manager.
          LOG.info("Error cleaning master ", e);
        }
      }
      break;
    default:
      LOG.warn("Received unknown event-type " + eventType + ". Ignoring.");
      break;
    }
  }

9.4. launch

Establishes a connection to the NodeManager, then asks it to start the AM container via ContainerManagementProtocol#startContainers.

  private void launch() throws IOException, YarnException {
    // connect to the NodeManager hosting the AM container
    connect();

    // get the ContainerId, e.g. container_1611506953824_0001_01_000001
    ContainerId masterContainerID = masterContainer.getId();

    // get the ApplicationSubmissionContext; sample dump for a Spark job:
    // application_id { id: 1 cluster_timestamp: 1611506953824 }
    // application_name: "org.apache.spark.examples.SparkPi"
    // queue: "default"
    // priority { priority: 0 }
    // am_container_spec { localResources { key: "__spark_conf__" value { resource { scheme: "hdfs" host: "localhost" port: 8020 file: "/user/henghe/.sparkStaging/application_1611506953824_0001/__spark_conf__.zip" } size: 250024 timestamp: 1611512474054 type: ARCHIVE visibility: PRIVATE } }
    // localResources { key: "__app__.jar" value { resource { scheme: "hdfs" host: "localhost" port: 8020 file: "/user/henghe/.sparkStaging/application_1611506953824_0001/spark-examples_2.11-2.4.5.jar" } size: 1475072 timestamp: 1611512473631 type: FILE visibility: PRIVATE } } tokens: "HDTS\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000" environment { key: "SPARK_YARN_STAGING_DIR" value: "hdfs://localhost:8020/user/henghe/.sparkStaging/application_1611506953824_0001" }
    // environment { key: "SPARK_USER" value: "henghe" }
    // environment { key: "CLASSPATH" value: "{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__" }
    // environment { key: "PYTHONHASHSEED" value: "0" } command: "{{JAVA_HOME}}/bin/java" command: "-server" command: "-Xmx1024m" command: "-Djava.io.tmpdir={{PWD}}/tmp" command: "-Dspark.yarn.app.container.log.dir=<LOG_DIR>" command: "org.apache.spark.deploy.yarn.ApplicationMaster" command: "--class" command: "\'org.apache.spark.examples.SparkPi\'" command: "--jar" command: "file:/opt/tools/spark-2.4.5/examples/jars/spark-examples_2.11-2.4.5.jar" command: "--arg" command: "\'10\'" command: "--properties-file" command: "{{PWD}}/__spark_conf__/__spark_conf__.properties" command: "1>" command: "<LOG_DIR>/stdout" command: "2>" command: "<LOG_DIR>/stderr" application_ACLs { accessType: APPACCESS_VIEW_APP acl: "sysadmin,henghe " } application_ACLs { accessType: APPACCESS_MODIFY_APP acl: "sysadmin,henghe " } } resource { memory: 2048 virtual_cores: 1 resource_value_map { key: "memory-mb" value: 2048 units: "Mi" type: COUNTABLE } resource_value_map { key: "vcores" value: 1 units: "" type: COUNTABLE } } applicationType: "SPARK"
    ApplicationSubmissionContext applicationContext = application.getSubmissionContext();
    //  Setting up container Container:  [ContainerId: container_1611513130854_0001_01_000001,  AllocationRequestId: -1,  Version: 0,
    //  NodeId: boyi-pro.lan:56960,
    //  NodeHttpAddress: boyi-pro.lan:8042,
    //  Resource: <memory:2048, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: boyi-pro.lan:56960 }, ExecutionType: GUARANTEED, ] for AM appattempt_1611513130854_0001_000001
    LOG.info("Setting up container " + masterContainer  + " for AM " + application.getAppAttemptId());
    ContainerLaunchContext launchContext = createAMContainerLaunchContext(applicationContext, masterContainerID);




    // container_launch_context { localResources { key: "__spark_conf__" value { resource { scheme: "hdfs" host: "localhost" port: 8020 file: "/user/henghe/.sparkStaging/application_1611513130854_0001/__spark_conf__.zip" } size: 250024 timestamp: 1611513158992 type: ARCHIVE visibility: PRIVATE } } localResources { key: "__app__.jar" value { resource { scheme: "hdfs" host: "localhost" port: 8020 file: "/user/henghe/.sparkStaging/application_1611513130854_0001/spark-examples_2.11-2.4.5.jar" } size: 1475072 timestamp: 1611513158274 type: FILE visibility: PRIVATE } } tokens: "HDTS\000\001\000\032\n\r\n\t\b\001\020\346\336\253\255\363.\020\001\020\312\207\343\314\372\377\377\377\377\001\024\256X\203p\026\324\327\356^;\212z\314\333\276\361\263\227\246\236\020YARN_AM_RM_TOKEN\000\000"
    // environment { key: "SPARK_YARN_STAGING_DIR" value: "hdfs://localhost:8020/user/henghe/.sparkStaging/application_1611513130854_0001" }
    // environment { key: "APPLICATION_WEB_PROXY_BASE" value: "/proxy/application_1611513130854_0001" }
    // environment { key: "SPARK_USER" value: "henghe" }
    // environment { key: "CLASSPATH" value: "{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__" }
    // environment { key: "PYTHONHASHSEED" value: "0" }
    // environment { key: "APP_SUBMIT_TIME_ENV" value: "1611513161745" } command: "{{JAVA_HOME}}/bin/java" command: "-server" command: "-Xmx1024m" command: "-Djava.io.tmpdir={{PWD}}/tmp" command: "-Dspark.yarn.app.container.log.dir=<LOG_DIR>" command: "org.apache.spark.deploy.yarn.ApplicationMaster" command: "--class" command: "\'org.apache.spark.examples.SparkPi\'" command: "--jar" command: "file:/opt/tools/spark-2.4.5/examples/jars/spark-examples_2.11-2.4.5.jar" command: "--arg" command: "\'10\'" command: "--properties-file" command: "{{PWD}}/__spark_conf__/__spark_conf__.properties" command: "1>" command: "<LOG_DIR>/stdout" command: "2>" command: "<LOG_DIR>/stderr" application_ACLs { accessType: APPACCESS_MODIFY_APP acl: "sysadmin,henghe " } application_ACLs { accessType: APPACCESS_VIEW_APP acl: "sysadmin,henghe " } } container_token { identifier: "\n\021\022\r\n\t\b\001\020\346\336\253\255\363.\020\001\030\001\022\022boyi-pro.lan:56960\032\006henghe\"+\b\200\020\020\001\032\024\n\tmemory-mb\020\200\020\032\002Mi \000\032\016\n\006vcores\020\001\032\000 \000(\361\251\322\255\363.0\265\254\237\350\0068\346\336\253\255\363.B\002\b\000H\245\332\255\255\363.Z\000`\001h\001p\000x\377\377\377\377\377\377\377\377\377\001" password: "$g\341\316Mw\377\fC\270\v\026<\242a\325\027\354\331\354" kind: "ContainerToken" service: "boyi-pro.lan:56960" }
    StartContainerRequest scRequest =
        StartContainerRequest.newInstance(launchContext,
          masterContainer.getContainerToken());

    List<StartContainerRequest> list = new ArrayList<StartContainerRequest>();

    list.add(scRequest);

    StartContainersRequest allRequests =  StartContainersRequest.newInstance(list);


    // send the start request and read the response
    StartContainersResponse response = containerMgrProxy.startContainers(allRequests);

    if (response.getFailedRequests() != null
        && response.getFailedRequests().containsKey(masterContainerID)) {
      Throwable t =
          response.getFailedRequests().get(masterContainerID).deSerialize();
      parseAndThrowException(t);
    } else {

      // succeeded_requests { app_attempt_id { application_id { id: 1 cluster_timestamp: 1611514283537 } attemptId: 1 } id: 1 }
      LOG.info("Done launching container " + masterContainer + " for AM "  + application.getAppAttemptId());
    }
  }
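
connect() is not shown in this post; in essence it builds an authenticated ContainerManagementProtocol proxy for the node hosting the AM container. A condensed paraphrase of AMLauncher#connect / getContainerMgrProxy (the NMToken wiring is omitted, so treat this as an outline rather than the exact source):

  // outline of how containerMgrProxy is obtained (NMToken setup omitted)
  private ContainerManagementProtocol connectSketch() {
    NodeId node = masterContainer.getNodeId();
    InetSocketAddress addr =
        NetUtils.createSocketAddrForHost(node.getHost(), node.getPort());
    YarnRPC rpc = YarnRPC.create(conf);
    // a remote user named after the attempt; the real code attaches an NMToken
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser(
        masterContainer.getId().getApplicationAttemptId().toString());
    return NMProxy.createNMProxy(
        conf, ContainerManagementProtocol.class, ugi, rpc, addr);
  }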

9.5. cleanup

Stops the AM container: connects to the NodeManager and sends a StopContainersRequest via ContainerManagementProtocol#stopContainers.

  private void cleanup() throws IOException, YarnException {
    
    // connect to the NodeManager
    connect();
    
    // get the AM container id
    ContainerId containerId = masterContainer.getId();
    List<ContainerId> containerIds = new ArrayList<ContainerId>();
    containerIds.add(containerId);
    
    // build the stop request
    StopContainersRequest stopRequest =  StopContainersRequest.newInstance(containerIds);
    
    // send the request
    StopContainersResponse response =  containerMgrProxy.stopContainers(stopRequest);
    
    // check the response: deserialize and rethrow any failure
    if (response.getFailedRequests() != null
        && response.getFailedRequests().containsKey(containerId)) {
      Throwable t = response.getFailedRequests().get(containerId).deSerialize();
      parseAndThrowException(t);
    }
  }