YARN 2.7 Source Code Analysis: Communication Between ApplicationMaster and ResourceManager's ApplicationMasterService

Overview

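Communication between the ApplicationMaster and the ResourceManager involves three main steps:
1. The ApplicationMaster registers with the ResourceManager over RPC. When the ApplicationMaster starts, it first registers with the ResourceManager. The registration message is wrapped in the Protocol Buffers message RegisterApplicationMasterRequest, which mainly contains the following fields: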

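  • host: the host of the node on which the ApplicationMaster is running
  • rpc_port: the RPC port that this ApplicationMaster instance exposes
  • tracking_url: the web URL the ApplicationMaster exposes for tracking; users can check the application's execution status through it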

After a successful registration, the ResourceManager returns a RegisterApplicationMasterResponse object, which mainly contains the following fields:

  • maximumCapability: the maximum amount of resources that can be requested for a single Container
  • application_ACLs: the application's access control lists

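2. After registering successfully, the ApplicationMaster requests resources from the ResourceManager over RPC (resources are granted in units of Containers). The RPC request mainly contains the following fields: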

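  • ask: the list of resources requested by the ApplicationMaster. Each entry is a ResourceRequest, which has the following main fields:
      • priority: the priority of the request, a non-negative integer; the smaller the value, the higher the priority of the request and of the resources allocated for it
      • resource_name: the node on which the resources are wanted; * means resources on any machine are acceptable
      • capability: the amount of resources needed; two resource types are supported, CPU and memory
      • num_containers: the number of Containers needed that satisfy the requirements above
  • release: the list of resources (Containers) released by the ApplicationMaster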
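To make the shape of the ask list concrete, here is a minimal sketch in plain Java. ToyResourceRequest and AskSketch are hypothetical stand-ins invented for illustration, not YARN classes; the sketch only shows the fields listed above and the "smaller value, higher priority" ordering rule.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model of an "ask" list; these are not YARN classes.
final class AskSketch {

  static final class ToyResourceRequest {
    final int priority;        // smaller value means higher priority
    final String resourceName; // a host name, or "*" for any node
    final int memoryMb;        // capability: memory
    final int vcores;          // capability: CPU
    final int numContainers;   // how many containers of this shape are wanted

    ToyResourceRequest(int priority, String resourceName, int memoryMb,
        int vcores, int numContainers) {
      this.priority = priority;
      this.resourceName = resourceName;
      this.memoryMb = memoryMb;
      this.vcores = vcores;
      this.numContainers = numContainers;
    }
  }

  // Returns a copy of the ask list with the most urgent requests first.
  static List<ToyResourceRequest> byUrgency(List<ToyResourceRequest> ask) {
    List<ToyResourceRequest> sorted = new ArrayList<>(ask);
    sorted.sort(Comparator.comparingInt((ToyResourceRequest r) -> r.priority));
    return sorted;
  }

  // Sums the number of containers requested across the whole ask list.
  static int totalContainers(List<ToyResourceRequest> ask) {
    int total = 0;
    for (ToyResourceRequest r : ask) {
      total += r.numContainers;
    }
    return total;
  }

  public static void main(String[] args) {
    List<ToyResourceRequest> ask = new ArrayList<>();
    ask.add(new ToyResourceRequest(20, "*", 1024, 1, 4));
    ask.add(new ToyResourceRequest(5, "node-1", 2048, 2, 1));
    for (ToyResourceRequest r : byUrgency(ask)) {
      System.out.println(r.priority + " " + r.resourceName + " x" + r.numContainers);
    }
  }
}
```

In this toy model the request with priority 5 is listed before the one with priority 20, mirroring the rule that a smaller value wins.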

On receiving the request, the ResourceManager returns an AllocateResponse object, which mainly contains the following fields:

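  • a_m_command: the command the ApplicationMaster must execute. There are two main values: AM_RESYNC asks it to restart, and AM_SHUTDOWN asks it to shut down. When the ResourceManager restarts or the application's information becomes inconsistent, the ApplicationMaster may be asked to restart; when it is on the blacklist, it is asked to shut down.
  • allocated_containers: the list of Containers allocated to the application (a Container corresponds roughly to a task in MapReduce and to an executor in Spark)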
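The way an ApplicationMaster might react to these two fields can be sketched as follows. This is an illustrative model only: ToyAmCommand, ToyAllocateResponse, and Outcome are hypothetical names, and the policy of stopping at the first command is an assumption of the sketch, not something taken from the YARN source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of an AM consuming allocate responses; not YARN classes.
final class AllocateLoopSketch {
  enum ToyAmCommand { AM_RESYNC, AM_SHUTDOWN }
  enum Outcome { KEEP_RUNNING, RESTART, SHUT_DOWN }

  static final class ToyAllocateResponse {
    final ToyAmCommand command;             // null when the RM sent no command
    final List<String> allocatedContainers; // container ids granted in this response

    ToyAllocateResponse(ToyAmCommand command, List<String> allocatedContainers) {
      this.command = command;
      this.allocatedContainers = allocatedContainers;
    }
  }

  // Walks the responses in order. Containers are collected into `granted`
  // until a command arrives; the first command decides the outcome.
  static Outcome process(List<ToyAllocateResponse> responses, List<String> granted) {
    for (ToyAllocateResponse response : responses) {
      if (response.command == ToyAmCommand.AM_SHUTDOWN) {
        return Outcome.SHUT_DOWN;
      }
      if (response.command == ToyAmCommand.AM_RESYNC) {
        return Outcome.RESTART;
      }
      granted.addAll(response.allocatedContainers);
    }
    return Outcome.KEEP_RUNNING;
  }

  public static void main(String[] args) {
    List<String> granted = new ArrayList<>();
    Outcome outcome = process(Arrays.asList(
        new ToyAllocateResponse(null, Arrays.asList("c1", "c2")),
        new ToyAllocateResponse(ToyAmCommand.AM_SHUTDOWN,
            Collections.<String>emptyList())),
        granted);
    System.out.println(outcome + " " + granted);
  }
}
```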

3. When the application has finished, the ApplicationMaster tells the ResourceManager over RPC that execution is complete, and then exits.

Source analysis of the yarn-api layer

1. ApplicationMasterProtocol

This interface describes the three steps of communication between the ApplicationMaster and the ResourceManager:

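  • registerApplicationMaster(): the ApplicationMaster registers with the ResourceManager over RPC.
  • allocate(): after registering successfully, the ApplicationMaster requests resources from the ResourceManager over RPC.
  • finishApplicationMaster(): the ApplicationMaster tells the ResourceManager over RPC that the application has finished, and then exits.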
/**
 * <p>The protocol between a live instance of <code>ApplicationMaster</code> 
 * and the <code>ResourceManager</code>.</p>
 * 
 * <p>This is used by the <code>ApplicationMaster</code> to register/unregister
 * and to request and obtain resources in the cluster from the
 * <code>ResourceManager</code>.</p>
 */
@Public
@Stable
public interface ApplicationMasterProtocol {

  
  public RegisterApplicationMasterResponse registerApplicationMaster(
      RegisterApplicationMasterRequest request) 
  throws YarnException, IOException;
  
  public FinishApplicationMasterResponse finishApplicationMaster(
      FinishApplicationMasterRequest request) 
  throws YarnException, IOException;

  
  public AllocateResponse allocate(AllocateRequest request) 
  throws YarnException, IOException;
}
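The ordering that the three methods imply (register first, then allocate any number of times, then finish) can be illustrated with a small in-memory model. ToyAmProtocol is a hypothetical class written for this sketch; it uses plain strings instead of the real request and response records and does not reflect how the ResourceManager is implemented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the register -> allocate -> finish ordering.
final class ToyAmProtocol {
  private final Set<String> registered = new HashSet<>();
  private final Set<String> finished = new HashSet<>();
  private int nextContainerId = 1;

  // Step 1: an attempt may register exactly once.
  void registerApplicationMaster(String attemptId) {
    if (!registered.add(attemptId)) {
      throw new IllegalStateException("already registered: " + attemptId);
    }
  }

  // Step 2: only a registered, unfinished attempt may ask for containers.
  List<String> allocate(String attemptId, int numContainers) {
    requireActive(attemptId);
    List<String> containers = new ArrayList<>();
    for (int i = 0; i < numContainers; i++) {
      containers.add("container_" + nextContainerId++);
    }
    return containers;
  }

  // Step 3: after finishing, the attempt can no longer allocate.
  void finishApplicationMaster(String attemptId) {
    requireActive(attemptId);
    finished.add(attemptId);
  }

  private void requireActive(String attemptId) {
    if (!registered.contains(attemptId) || finished.contains(attemptId)) {
      throw new IllegalStateException(
          "not registered or already finished: " + attemptId);
    }
  }

  public static void main(String[] args) {
    ToyAmProtocol rm = new ToyAmProtocol();
    rm.registerApplicationMaster("attempt_1");
    System.out.println(rm.allocate("attempt_1", 2));
    rm.finishApplicationMaster("attempt_1");
  }
}
```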

Protobuf-based client implementation in the yarn-common layer

1. ApplicationMasterProtocolPBClientImpl

public class ApplicationMasterProtocolPBClientImpl implements ApplicationMasterProtocol, Closeable {

  private ApplicationMasterProtocolPB proxy;

  public ApplicationMasterProtocolPBClientImpl(long clientVersion, InetSocketAddress addr,
      Configuration conf) throws IOException {
    RPC.setProtocolEngine(conf, ApplicationMasterProtocolPB.class, ProtobufRpcEngine.class);
    proxy =
        (ApplicationMasterProtocolPB) RPC.getProxy(ApplicationMasterProtocolPB.class, clientVersion,
          addr, conf);
  }

  @Override
  public void close() {
    if (this.proxy != null) {
      RPC.stopProxy(this.proxy);
    }
  }

  @Override
  public AllocateResponse allocate(AllocateRequest request)
      throws YarnException, IOException {
    AllocateRequestProto requestProto =
        ((AllocateRequestPBImpl) request).getProto();
    try {
      return new AllocateResponsePBImpl(proxy.allocate(null, requestProto));
    } catch (ServiceException e) {
      RPCUtil.unwrapAndThrowException(e);
      return null;
    }
  }

  @Override
  public FinishApplicationMasterResponse finishApplicationMaster(
      FinishApplicationMasterRequest request) throws YarnException,
      IOException {
    FinishApplicationMasterRequestProto requestProto =
        ((FinishApplicationMasterRequestPBImpl) request).getProto();
    try {
      return new FinishApplicationMasterResponsePBImpl(
        proxy.finishApplicationMaster(null, requestProto));
    } catch (ServiceException e) {
      RPCUtil.unwrapAndThrowException(e);
      return null;
    }
  }

  @Override
  public RegisterApplicationMasterResponse registerApplicationMaster(
      RegisterApplicationMasterRequest request) throws YarnException,
      IOException {
    RegisterApplicationMasterRequestProto requestProto =
        ((RegisterApplicationMasterRequestPBImpl) request).getProto();
    try {
      return new RegisterApplicationMasterResponsePBImpl(
        proxy.registerApplicationMaster(null, requestProto));
    } catch (ServiceException e) {
      RPCUtil.unwrapAndThrowException(e);
      return null;
    }
  }
}
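Each client method above catches ServiceException and hands it to RPCUtil.unwrapAndThrowException, so the caller sees a YarnException or IOException rather than the RPC-layer wrapper. The sketch below illustrates that idea with hypothetical stand-in types (ToyServiceException, ToyYarnException); it is a simplification and does not reproduce the real method, which also handles exceptions received from the remote side.

```java
import java.io.IOException;

// Hypothetical, simplified illustration of unwrapping an RPC-layer exception.
final class UnwrapSketch {

  // Stand-in for the RPC layer's wrapper exception.
  static final class ToyServiceException extends Exception {
    ToyServiceException(Throwable cause) {
      super(cause);
    }
  }

  // Stand-in for the application-level exception declared by the protocol.
  static final class ToyYarnException extends Exception {
    ToyYarnException(String message) {
      super(message);
    }
  }

  // Always throws. A cause of a declared type is rethrown as-is; anything
  // else is reported as an IOException that keeps the wrapper as its cause.
  static Void unwrapAndThrow(ToyServiceException se)
      throws IOException, ToyYarnException {
    Throwable cause = se.getCause();
    if (cause instanceof ToyYarnException) {
      throw (ToyYarnException) cause;
    }
    if (cause instanceof IOException) {
      throw (IOException) cause;
    }
    throw new IOException(se);
  }

  public static void main(String[] args) {
    try {
      unwrapAndThrow(new ToyServiceException(new ToyYarnException("not registered")));
    } catch (ToyYarnException | IOException e) {
      System.out.println(e.getClass().getSimpleName() + ": " + e.getMessage());
    }
  }
}
```

Because a helper like this never returns normally, a `return null` placed after the call exists only to satisfy the compiler.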

Protobuf-based server implementation in the yarn-common layer

1. ApplicationMasterProtocolPBServiceImpl

It is purely a proxy for ApplicationMasterService: every call is delegated to it.

public class ApplicationMasterProtocolPBServiceImpl implements ApplicationMasterProtocolPB {

  // real is initialized to the ApplicationMasterService instance
  private ApplicationMasterProtocol real;
  
  public ApplicationMasterProtocolPBServiceImpl(ApplicationMasterProtocol impl) {
    this.real = impl;
  }
  
  @Override
  public AllocateResponseProto allocate(RpcController arg0,
      AllocateRequestProto proto) throws ServiceException {
    AllocateRequestPBImpl request = new AllocateRequestPBImpl(proto);
    try {
      // Calls ApplicationMasterService#allocate()
      AllocateResponse response = real.allocate(request);
      return ((AllocateResponsePBImpl)response).getProto();
    } catch (YarnException e) {
      throw new ServiceException(e);
    } catch (IOException e) {
      throw new ServiceException(e);
    }
  }

  @Override
  public FinishApplicationMasterResponseProto finishApplicationMaster(
      RpcController arg0, FinishApplicationMasterRequestProto proto)
      throws ServiceException {
    FinishApplicationMasterRequestPBImpl request = new FinishApplicationMasterRequestPBImpl(proto);
    try {
      // Calls ApplicationMasterService#finishApplicationMaster()
      FinishApplicationMasterResponse response = real.finishApplicationMaster(request);
      return ((FinishApplicationMasterResponsePBImpl)response).getProto();
    } catch (YarnException e) {
      throw new ServiceException(e);
    } catch (IOException e) {
      throw new ServiceException(e);
    }
  }

  @Override
  public RegisterApplicationMasterResponseProto registerApplicationMaster(
      RpcController arg0, RegisterApplicationMasterRequestProto proto)
      throws ServiceException {
    RegisterApplicationMasterRequestPBImpl request = new RegisterApplicationMasterRequestPBImpl(proto);
    try {
      // Calls ApplicationMasterService#registerApplicationMaster()
      RegisterApplicationMasterResponse response = real.registerApplicationMaster(request);
      return ((RegisterApplicationMasterResponsePBImpl)response).getProto();
    } catch (YarnException e) {
      throw new ServiceException(e);
    } catch (IOException e) {
      throw new ServiceException(e);
    }
  }
}
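The server-side class follows one pattern in all three methods: decode the wire message, delegate to the real implementation, encode the result, and wrap any failure in a ServiceException. A minimal sketch of that pattern follows. All names here are hypothetical, and a plain String stands in for the protobuf message purely for illustration.

```java
import java.io.IOException;

// Hypothetical sketch of the decode -> delegate -> encode translator pattern.
final class TranslatorSketch {

  // Stand-in for the RPC layer's wrapper exception.
  static final class ToyServiceException extends Exception {
    ToyServiceException(Throwable cause) {
      super(cause);
    }
  }

  // Business-level interface (plays the role of ApplicationMasterProtocol).
  interface ToyProtocol {
    int allocate(int requested) throws IOException;
  }

  // Wire-level interface (plays the role of ApplicationMasterProtocolPB).
  interface ToyProtocolPB {
    String allocate(String requestProto) throws ToyServiceException;
  }

  // Server-side translator: holds the real implementation and forwards to it.
  static final class ServerTranslator implements ToyProtocolPB {
    private final ToyProtocol real;

    ServerTranslator(ToyProtocol real) {
      this.real = real;
    }

    @Override
    public String allocate(String requestProto) throws ToyServiceException {
      int requested = Integer.parseInt(requestProto); // decode the wire form
      try {
        int granted = real.allocate(requested);       // delegate
        return Integer.toString(granted);             // encode the result
      } catch (IOException e) {
        throw new ToyServiceException(e);             // wrap for the RPC layer
      }
    }
  }

  public static void main(String[] args) throws ToyServiceException {
    ToyProtocol real = requested -> Math.min(requested, 3);
    System.out.println(new ServerTranslator(real).allocate("5"));
  }
}
```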

How ApplicationMasterService handles AM registration

After receiving a registerApplicationMaster request, ApplicationMasterService sends an RMAppAttemptEventType.REGISTERED event to RMAppAttemptImpl. On receiving it, RMAppAttemptImpl first saves the ApplicationMaster's basic information (its host, RPC port, and so on) and then sends an RMAppEventType.ATTEMPT_REGISTERED event to RMAppImpl. At this point, RMAppAttemptImpl moves from LAUNCHED to RUNNING, and RMAppImpl moves from ACCEPTED to RUNNING.

It follows that the RUNNING state in the RMApp state machine means that the application's ApplicationMaster is running successfully on some node.
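The two transitions described above can be sketched as a pair of toy state holders. ToyAttempt and ToyApp are hypothetical classes; the real RMAppAttemptImpl and RMAppImpl are driven by an event dispatcher and have many more states, so this only illustrates the LAUNCHED to RUNNING and ACCEPTED to RUNNING steps.

```java
// Hypothetical sketch of the registration-driven state transitions.
final class RegistrationStateSketch {
  enum AttemptState { LAUNCHED, RUNNING }
  enum AppState { ACCEPTED, RUNNING }

  // Plays the role of RMAppImpl.
  static final class ToyApp {
    AppState state = AppState.ACCEPTED;

    // Reaction to the "attempt registered" event.
    void onAttemptRegistered() {
      if (state == AppState.ACCEPTED) {
        state = AppState.RUNNING;
      }
    }
  }

  // Plays the role of RMAppAttemptImpl.
  static final class ToyAttempt {
    AttemptState state = AttemptState.LAUNCHED;
    String host;
    int rpcPort;
    String trackingUrl;
    private final ToyApp app;

    ToyAttempt(ToyApp app) {
      this.app = app;
    }

    // Reaction to the "registered" event: save the AM's details, move to
    // RUNNING, then notify the owning application.
    void onRegistered(String host, int rpcPort, String trackingUrl) {
      if (state != AttemptState.LAUNCHED) {
        throw new IllegalStateException("unexpected registration in state " + state);
      }
      this.host = host;
      this.rpcPort = rpcPort;
      this.trackingUrl = trackingUrl;
      state = AttemptState.RUNNING;
      app.onAttemptRegistered();
    }
  }

  public static void main(String[] args) {
    ToyApp app = new ToyApp();
    ToyAttempt attempt = new ToyAttempt(app);
    attempt.onRegistered("node-1", 30001, "http://node-1:8088/proxy");
    System.out.println(attempt.state + " " + app.state);
  }
}
```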

 @Override
  public RegisterApplicationMasterResponse registerApplicationMaster(
      RegisterApplicationMasterRequest request) throws YarnException,
      IOException {

    AMRMTokenIdentifier amrmTokenIdentifier =
        YarnServerSecurityUtils.authorizeRequest();
    ApplicationAttemptId applicationAttemptId =
        amrmTokenIdentifier.getApplicationAttemptId();

    ApplicationId appID = applicationAttemptId.getApplicationId();
    AllocateResponseLock lock = responseMap.get(applicationAttemptId);
    if (lock == null) {
      RMAuditLogger.logFailure(this.rmContext.getRMApps().get(appID).getUser(),
          AuditConstants.REGISTER_AM, "Application doesn't exist in cache "
              + applicationAttemptId, "ApplicationMasterService",
          "Error in registering application master", appID,
          applicationAttemptId);
      throwApplicationDoesNotExistInCacheException(applicationAttemptId);
    }

    // Allow only one thread in AM to do registerApp at a time.
    synchronized (lock) {
      AllocateResponse lastResponse = lock.getAllocateResponse();
      if (hasApplicationMasterRegistered(applicationAttemptId)) {
        // allow UAM re-register if work preservation is enabled
        ApplicationSubmissionContext appContext =
            rmContext.getRMApps().get(appID).getApplicationSubmissionContext();
        if (!(appContext.getUnmanagedAM()
            && appContext.getKeepContainersAcrossApplicationAttempts())) {
          String message =
              AMRMClientUtils.APP_ALREADY_REGISTERED_MESSAGE + appID;
          LOG.warn(message);
          RMAuditLogger.logFailure(
              this.rmContext.getRMApps().get(appID).getUser(),
              AuditConstants.REGISTER_AM, "", "ApplicationMasterService",
              message, appID, applicationAttemptId);
          throw new InvalidApplicationMasterRequestException(message);
        }
      }

      this.amLivelinessMonitor.receivedPing(applicationAttemptId);

      // Setting the response id to 0 to identify if the
      // application master is register for the respective attemptid
      lastResponse.setResponseId(0);
      lock.setAllocateResponse(lastResponse);

      RegisterApplicationMasterResponse response =
          recordFactory.newRecordInstance(
              RegisterApplicationMasterResponse.class);
      // The AMSProcessingChain (chain-of-responsibility pattern) handles the AM registration
      this.amsProcessingChain.registerApplicationMaster(
          amrmTokenIdentifier.getApplicationAttemptId(), request, response);
      return response;
    }
  }
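The guard logic in this method (look up a per-attempt lock, reject unknown attempts, reject a second registration, then mark the attempt as registered by setting the response id to 0) can be sketched in isolation. RegisterGuardSketch is a hypothetical class, and the use of -1 as the "not yet registered" value is an assumption of this sketch; the listing above only shows the id being set to 0.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the per-attempt registration guard.
final class RegisterGuardSketch {
  // Assumption of this sketch: a negative id means "not registered yet".
  static final int PRE_REGISTER_RESPONSE_ID = -1;

  // One lock object per attempt; it also remembers the last response id.
  static final class ResponseLock {
    int lastResponseId = PRE_REGISTER_RESPONSE_ID;
  }

  private final ConcurrentMap<String, ResponseLock> responseMap =
      new ConcurrentHashMap<>();

  // Called when the attempt is created, before its AM is started.
  void trackAttempt(String attemptId) {
    responseMap.putIfAbsent(attemptId, new ResponseLock());
  }

  void register(String attemptId) {
    ResponseLock lock = responseMap.get(attemptId);
    if (lock == null) {
      throw new IllegalStateException(
          "Application doesn't exist in cache " + attemptId);
    }
    // Only one thread per attempt may run the registration at a time.
    synchronized (lock) {
      if (lock.lastResponseId >= 0) {
        throw new IllegalStateException("already registered: " + attemptId);
      }
      lock.lastResponseId = 0; // marks the attempt as registered
    }
  }

  public static void main(String[] args) {
    RegisterGuardSketch service = new RegisterGuardSketch();
    service.trackAttempt("attempt_1");
    service.register("attempt_1");
    System.out.println("registered attempt_1");
  }
}
```

The sketch leaves out the exception to the duplicate check that the listing makes for unmanaged AMs with work preservation enabled.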

Initialization of AMSProcessingChain

private final AMSProcessingChain amsProcessingChain;

 public ApplicationMasterService(String name, RMContext rmContext,
      YarnScheduler scheduler) {
    ......
    this.amsProcessingChain = new AMSProcessingChain(new DefaultAMSProcessor());
  }
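The chain-of-responsibility idea behind this constructor can be sketched generically: the chain starts with a single default processor, and further processors may be placed in front of it, each forwarding to the next. All names below (ChainSketch, AuditProcessor, addOnTop) are hypothetical and chosen for illustration; this is not the AMSProcessingChain API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a processing chain with a default tail processor.
final class ChainSketch {

  interface Processor {
    void registerApplicationMaster(String attemptId, List<String> trace);
  }

  // Tail of the chain: does the actual work (role of DefaultAMSProcessor).
  static final class DefaultProcessor implements Processor {
    @Override
    public void registerApplicationMaster(String attemptId, List<String> trace) {
      trace.add("default:" + attemptId);
    }
  }

  // An extra link: runs its own step, then forwards to the next processor.
  static final class AuditProcessor implements Processor {
    private final Processor next;

    AuditProcessor(Processor next) {
      this.next = next;
    }

    @Override
    public void registerApplicationMaster(String attemptId, List<String> trace) {
      trace.add("audit:" + attemptId);
      next.registerApplicationMaster(attemptId, trace);
    }
  }

  // The chain only remembers its head; each link knows its successor.
  static final class Chain implements Processor {
    private Processor head;

    Chain(Processor root) {
      this.head = root;
    }

    // Puts a new processor in front of the current head.
    void addOnTop(Function<Processor, Processor> wrap) {
      head = wrap.apply(head);
    }

    @Override
    public void registerApplicationMaster(String attemptId, List<String> trace) {
      head.registerApplicationMaster(attemptId, trace);
    }
  }

  public static void main(String[] args) {
    Chain chain = new Chain(new DefaultProcessor());
    chain.addOnTop(AuditProcessor::new);
    List<String> trace = new ArrayList<>();
    chain.registerApplicationMaster("attempt_1", trace);
    System.out.println(trace);
  }
}
```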

How DefaultAMSProcessor handles AM registration

public void registerApplicationMaster(
      ApplicationAttemptId applicationAttemptId,
      RegisterApplicationMasterRequest request,
      RegisterApplicationMasterResponse response)
      throws IOException, YarnException {

    RMApp app = getRmContext().getRMApps().get(
        applicationAttemptId.getApplicationId());
    LOG.info("AM registration " + applicationAttemptId);
    // Send an RMAppAttemptEventType.REGISTERED event to RMAppAttemptImpl
    getRmContext().getDispatcher().getEventHandler()
        .handle(
            new RMAppAttemptRegistrationEvent(applicationAttemptId, request
                .getHost(), request.getRpcPort(), request.getTrackingUrl()));
    RMAuditLogger.logSuccess(app.getUser(),
        RMAuditLogger.AuditConstants.REGISTER_AM,
        "ApplicationMasterService", app.getApplicationId(),
        applicationAttemptId);
    response.setMaximumResourceCapability(getScheduler()
        .getMaximumResourceCapability(app.getQueue()));
    response.setApplicationACLs(app.getRMAppAttempt(applicationAttemptId)
        .getSubmissionContext().getAMContainerSpec().getApplicationACLs());
    response.setQueue(app.getQueue());
    if (UserGroupInformation.isSecurityEnabled()) {
      LOG.info("Setting client token master key");
      response.setClientToAMTokenMasterKey(java.nio.ByteBuffer.wrap(
          getRmContext().getClientToAMTokenSecretManager()
          .getMasterKey(applicationAttemptId).getEncoded()));
    }

    // For work-preserving AM restart, retrieve previous attempts' containers
    // and corresponding NM tokens.
    if (app.getApplicationSubmissionContext()
        .getKeepContainersAcrossApplicationAttempts()) {
      List<Container> transferredContainers = getScheduler()
          .getTransferredContainers(applicationAttemptId);
      if (!transferredContainers.isEmpty()) {
        response.setContainersFromPreviousAttempts(transferredContainers);
        // Clear the node set remembered by the secret manager. Necessary
        // for UAM restart because we use the same attemptId.
        rmContext.getNMTokenSecretManager()
            .clearNodeSetForAttempt(applicationAttemptId);

        List<NMToken> nmTokens = new ArrayList<NMToken>();
        for (Container container : transferredContainers) {
          try {
            NMToken token = getRmContext().getNMTokenSecretManager()
                .createAndGetNMToken(app.getUser(), applicationAttemptId,
                    container);
            if (null != token) {
              nmTokens.add(token);
            }
          } catch (IllegalArgumentException e) {
            // if it's a DNS issue, throw UnknowHostException directly and
            // that
            // will be automatically retried by RMProxy in RPC layer.
            if (e.getCause() instanceof UnknownHostException) {
              throw (UnknownHostException) e.getCause();
            }
          }
        }
        response.setNMTokensFromPreviousAttempts(nmTokens);
        LOG.info("Application " + app.getApplicationId() + " retrieved "
            + transferredContainers.size() + " containers from previous"
            + " attempts and " + nmTokens.size() + " NM tokens.");
      }
    }

    response.setSchedulerResourceTypes(getScheduler()
        .getSchedulingResourceTypes());
    response.setResourceTypes(ResourceUtils.getResourcesTypeInfo());
    if (getRmContext().getYarnConfiguration().getBoolean(
        YarnConfiguration.RM_RESOURCE_PROFILES_ENABLED,
        YarnConfiguration.DEFAULT_RM_RESOURCE_PROFILES_ENABLED)) {
      response.setResourceProfiles(
          resourceProfilesManager.getResourceProfiles());
    }
  }

The RMAppAttemptRegistrationEvent constructor is defined as follows:

public RMAppAttemptRegistrationEvent(ApplicationAttemptId appAttemptId,
      String host, int rpcPort, String trackingUrl) {
    // The event type is RMAppAttemptEventType.REGISTERED
    super(appAttemptId, RMAppAttemptEventType.REGISTERED);
    this.host = host;
    this.rpcport = rpcPort;
    this.trackingurl = trackingUrl;
  }

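The run log of a Spark on YARN job supports the same conclusion: the RUNNING state in the RMApp state machine means that the application's ApplicationMaster is running successfully on some node.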

Reference: The communication process among ApplicationMaster, ResourceManager, and NodeManager
