Hadoop Source Analysis: RMAppAttempt FSM

Overview

RMAppAttempt state machine


Figure 1-1

APP_ACCEPTED Handle

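An RMAppAttempt is created and started by RMApp. After it submits its request to the scheduler, it enters the SUBMITTED state. The scheduler validates the request, creates an internal App object, and submits it to a queue to wait for scheduling. It then sends an APP_ACCEPTED event to the dispatcher, and that event is eventually handled by the RMAppAttempt (CapacityScheduler is used as the example here):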
    FiCaSchedulerApp SchedulerApp = 
        new FiCaSchedulerApp(applicationAttemptId, user, queue, 
            queue.getActiveUsersManager(), rmContext);

    // Submit to the queue
    try {
      queue.submitApplication(SchedulerApp, user, queueName);
    } catch (AccessControlException ace) {
      LOG.info("Failed to submit application " + applicationAttemptId + 
          " to queue " + queueName + " from user " + user, ace);
      this.rmContext.getDispatcher().getEventHandler().handle(
          new RMAppAttemptRejectedEvent(applicationAttemptId, 
              ace.toString()));
      return;
    }

    applications.put(applicationAttemptId, SchedulerApp);

    LOG.info("Application Submission: " + applicationAttemptId + 
        ", user: " + user +
        " queue: " + queue +
        ", currently active: " + applications.size());

    rmContext.getDispatcher().getEventHandler().handle(
        new RMAppAttemptEvent(applicationAttemptId,
            RMAppAttemptEventType.APP_ACCEPTED)); 
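On receiving this event, the state machine invokes ScheduleTransition, which registers the attempt in the queue of attempts waiting to run. If the master is a managed one, the state machine then enters the SCHEDULED state.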
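The transitions walked through in this article can be pictured as a small table-driven state machine. The following is a minimal sketch, not YARN's actual StateMachineFactory wiring: the class `AttemptFsmSketch` and its enums are illustrative, the ALLOCATED, RUNNING and FINISHED state names are assumptions made for the sketch, and all failure paths are left out.

```java
import java.util.EnumMap;
import java.util.Map;

// Simplified happy-path model of the attempt state machine (illustrative names only).
class AttemptFsmSketch {
  enum State { SUBMITTED, SCHEDULED, ALLOCATED_SAVING, ALLOCATED, LAUNCHED, RUNNING, FINISHED }

  enum Event { APP_ACCEPTED, CONTAINER_ALLOCATED, ATTEMPT_SAVED, LAUNCHED, REGISTERED,
               CONTAINER_ACQUIRED, UNREGISTERED }

  // Transition table: current state -> (event -> next state).
  private static final Map<State, Map<Event, State>> TABLE =
      new EnumMap<State, Map<Event, State>>(State.class);

  private static void arc(State from, Event on, State to) {
    TABLE.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class)).put(on, to);
  }

  static {
    arc(State.SUBMITTED, Event.APP_ACCEPTED, State.SCHEDULED);
    arc(State.SCHEDULED, Event.CONTAINER_ALLOCATED, State.ALLOCATED_SAVING);
    arc(State.ALLOCATED_SAVING, Event.ATTEMPT_SAVED, State.ALLOCATED);
    arc(State.ALLOCATED, Event.LAUNCHED, State.LAUNCHED);
    arc(State.LAUNCHED, Event.REGISTERED, State.RUNNING);
    arc(State.RUNNING, Event.CONTAINER_ACQUIRED, State.RUNNING); // state is unchanged
    arc(State.RUNNING, Event.UNREGISTERED, State.FINISHED);
  }

  private State current = State.SUBMITTED;

  // Applies one event; an event with no arc from the current state is rejected.
  State handle(Event event) {
    Map<Event, State> arcs = TABLE.get(current);
    State next = (arcs == null) ? null : arcs.get(event);
    if (next == null) {
      throw new IllegalStateException("Invalid event " + event + " in state " + current);
    }
    current = next;
    return current;
  }

  State current() {
    return current;
  }

  public static void main(String[] args) {
    AttemptFsmSketch fsm = new AttemptFsmSketch();
    // The events are declared in happy-path order, so this walks the whole table.
    for (Event e : Event.values()) {
      System.out.println(e + " -> " + fsm.handle(e));
    }
  }
}
```

Feeding the events in declaration order walks the attempt from SUBMITTED to FINISHED; an out-of-order event, such as REGISTERED while still SUBMITTED, throws instead of silently changing state.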

CONTAINER_ALLOCATED Handle

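Once the state machine is in this state, the system waits for the next heartbeat from an NM node. When the heartbeat arrives, the scheduler checks the node's currently available capacity and, if there is enough, allocates a container object for the App on that node.
In LeafQueue: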
  // Create the container if necessary
    Container container = 
        getContainer(rmContainer, application, node, capability, priority);
  
    // something went wrong getting/creating the container 
    if (container == null) {
      LOG.warn("Couldn't get container for allocation!");
      return Resources.none();
    }

    // Can we allocate a container on this node?
    int availableContainers = 
        resourceCalculator.computeAvailableContainers(available, capability);
    if (availableContainers > 0) {
      // Allocate...

      // Did we previously reserve containers at this 'priority'?
      if (rmContainer != null){
        unreserve(application, priority, node, rmContainer);
      }

      // Create container tokens in secure-mode
      if (UserGroupInformation.isSecurityEnabled()) {
        ContainerToken containerToken = 
            createContainerToken(application, container);
        if (containerToken == null) {
          // Something went wrong...
          return Resources.none();
        }
        container.setContainerToken(containerToken);
      }
      
      // Inform the application
      RMContainer allocatedContainer = 
          application.allocate(type, node, priority, request, container);

      // Does the application need this resource?
      if (allocatedContainer == null) {
        return Resources.none();
      }

      // Inform the node
      node.allocateContainer(application.getApplicationId(), 
          allocatedContainer);
The first container is used to run the ApplicationMaster.
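The capacity check in the excerpt above boils down to asking how many containers of the requested size fit into what the node still has free. Below is a minimal sketch of that check, assuming a memory-only resource model; `NodeAllocationSketch` and its methods are hypothetical stand-ins and do not reproduce the real `resourceCalculator` implementation.

```java
// Memory-only simplification of the per-heartbeat capacity check (illustrative names).
class NodeAllocationSketch {

  // Number of containers of size requiredMb that fit into availableMb.
  static int computeAvailableContainers(int availableMb, int requiredMb) {
    if (requiredMb <= 0) {
      throw new IllegalArgumentException("requiredMb must be positive");
    }
    return availableMb / requiredMb; // integer division rounds down
  }

  // Memory handed out on this heartbeat: requiredMb if one container fits,
  // otherwise 0, which plays the role of Resources.none() in the excerpt.
  static int assignOnHeartbeat(int availableMb, int requiredMb) {
    return computeAvailableContainers(availableMb, requiredMb) > 0 ? requiredMb : 0;
  }

  public static void main(String[] args) {
    System.out.println("node with 4096 MB free, 1024 MB request: "
        + assignOnHeartbeat(4096, 1024) + " MB assigned");
    System.out.println("node with 512 MB free, 1024 MB request: "
        + assignOnHeartbeat(512, 1024) + " MB assigned");
  }
}
```

A node that cannot fit even one container yields nothing on this heartbeat, and the request simply waits for a later heartbeat from this or another node.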

After the container has been allocated successfully, the AppAttempt asks the Scheduler for the allocated container and sets it as the master container:
 // Acquire the AM container from the scheduler.
      Allocation amContainerAllocation = appAttempt.scheduler.allocate(
          appAttempt.applicationAttemptId, EMPTY_CONTAINER_REQUEST_LIST,
          EMPTY_CONTAINER_RELEASE_LIST);

      // Set the masterContainer
      appAttempt.setMasterContainer(amContainerAllocation.getContainers().get(
                                                                           0));

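It then notifies the state store to save the current App state, and the AppAttempt enters the ALLOCATED_SAVING state. When the save completes, the AppAttempt receives an ATTEMPT_SAVED notification.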
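The save step is a round trip through the event queue: the attempt parks itself in a saving state, and only the store's completion event moves it forward. Here is a minimal sketch of that pattern, assuming a single-threaded queue in place of the real dispatcher; `SaveThenLaunchSketch`, its string-based states, and the ALLOCATED state name are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch of "save first, continue on the completion event" (illustrative names).
class SaveThenLaunchSketch {
  final Queue<String> eventQueue = new ArrayDeque<String>(); // stand-in for the dispatcher
  final List<String> log = new ArrayList<String>();
  String state = "SCHEDULED";

  // CONTAINER_ALLOCATED: remember the master container, then ask the store to save.
  void onContainerAllocated(String masterContainer) {
    log.add("master container = " + masterContainer);
    state = "ALLOCATED_SAVING";
    storeAttempt();
  }

  // Stand-in for the state store: it persists the attempt and reports back via an event.
  private void storeAttempt() {
    log.add("state store saved attempt");
    eventQueue.add("ATTEMPT_SAVED");
  }

  // Delivers queued events; ATTEMPT_SAVED is only meaningful while saving.
  void drain() {
    String event;
    while ((event = eventQueue.poll()) != null) {
      if ("ATTEMPT_SAVED".equals(event) && "ALLOCATED_SAVING".equals(state)) {
        state = "ALLOCATED";
        log.add("launch AM container");
      }
    }
  }

  public static void main(String[] args) {
    SaveThenLaunchSketch attempt = new SaveThenLaunchSketch();
    attempt.onContainerAllocated("container_01");
    attempt.drain();
    for (String line : attempt.log) {
      System.out.println(line);
    }
    System.out.println("final state: " + attempt.state);
  }
}
```

Keeping the launch behind the completion event means the attempt is never launched before its state has been persisted, which is the point of having a separate saving state.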

ATTEMPT_SAVED Handle

On receiving this event, the state machine loads and launches the container so that the master can start running:

  private void launchAttempt(){
    // Send event to launch the AM Container
    eventHandler.handle(new AMLauncherEvent(AMLauncherEventType.LAUNCH, this));
  }

 private void launch() throws IOException {
    connect();
    ContainerId masterContainerID = masterContainer.getId();
    ApplicationSubmissionContext applicationContext =
      application.getSubmissionContext();
    LOG.info("Setting up container " + masterContainer
        + " for AM " + application.getAppAttemptId());  
    ContainerLaunchContext launchContext =
        createAMContainerLaunchContext(applicationContext, masterContainerID);
    StartContainerRequest request = 
        recordFactory.newRecordInstance(StartContainerRequest.class);
    request.setContainerLaunchContext(launchContext);
    request.setContainer(masterContainer);
    containerMgrProxy.startContainer(request);
    LOG.info("Done launching container " + masterContainer
        + " for AM " + application.getAppAttemptId());
  }
After a successful launch, the attempt receives a LAUNCHED event notification.
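The launcher's job, seen from the attempt's side, is to turn the outcome of `launch()` into an event. The sketch below illustrates that mapping under stated assumptions: `AmLauncherSketch` and `LaunchAction` are hypothetical names, and the LAUNCH_FAILED outcome is not covered by this article and is included only to show where a failure would surface.

```java
import java.io.IOException;

// Maps the outcome of a launch call to the event reported back (illustrative names).
class AmLauncherSketch {

  // Stand-in for the launch() method shown above, which may throw IOException.
  interface LaunchAction {
    void launch() throws IOException;
  }

  // Runs the launch and returns the name of the event to report to the attempt.
  static String runLaunch(LaunchAction action) {
    try {
      action.launch();
      return "LAUNCHED";
    } catch (IOException e) {
      return "LAUNCH_FAILED"; // assumption: failures are reported as a separate event
    }
  }

  public static void main(String[] args) {
    System.out.println(runLaunch(() -> { /* pretend the container started */ }));
    System.out.println(runLaunch(() -> {
      throw new IOException("node manager unreachable");
    }));
  }
}
```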

LAUNCHED Handle

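On receiving the LAUNCHED notification, the AppAttempt registers with the monitoring thread and then waits for word that the master has started. Once the master is running, it must register itself with the ResourceManager, which then forwards the registration event to the AppAttempt for handling.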
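The article does not say what the monitoring thread does with a registered attempt. A common design, sketched here as an assumption, is a liveness monitor that expires any attempt it has not heard from within a timeout. `LivenessMonitorSketch` is a hypothetical class, and the clock is passed in explicitly so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Tracks the last time each registered attempt was heard from (illustrative names).
class LivenessMonitorSketch {
  private final long expiryMs;
  private final Map<String, Long> lastSeen = new LinkedHashMap<String, Long>();

  LivenessMonitorSketch(long expiryMs) {
    this.expiryMs = expiryMs;
  }

  // Called when the attempt is launched and starts being monitored.
  void register(String attemptId, long nowMs) {
    lastSeen.put(attemptId, nowMs);
  }

  // Called on every sign of life; pings from unknown attempts are ignored.
  void ping(String attemptId, long nowMs) {
    if (lastSeen.containsKey(attemptId)) {
      lastSeen.put(attemptId, nowMs);
    }
  }

  // Returns and forgets every attempt that has been silent for longer than expiryMs.
  List<String> expire(long nowMs) {
    List<String> expired = new ArrayList<String>();
    for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
      if (nowMs - e.getValue() > expiryMs) {
        expired.add(e.getKey());
      }
    }
    lastSeen.keySet().removeAll(expired);
    return expired;
  }

  public static void main(String[] args) {
    LivenessMonitorSketch monitor = new LivenessMonitorSketch(1000);
    monitor.register("attempt_1", 0);
    monitor.ping("attempt_1", 500);
    System.out.println("expired at t=1200: " + monitor.expire(1200));
    System.out.println("expired at t=1600: " + monitor.expire(1600));
  }
}
```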

REGISTERED Handle

When the AppAttempt receives the register message, it saves information about the running master (host, port, tracking URL) and then notifies the App:
 // Let the app know
      appAttempt.eventHandler.handle(new RMAppEvent(appAttempt
          .getAppAttemptId().getApplicationId(),
          RMAppEventType.ATTEMPT_REGISTERED));

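After the ApplicationMaster has registered, the AM keeps sending heartbeat messages by calling ApplicationMasterService.allocate(). On receiving a heartbeat from the ApplicationMaster, the Scheduler first sends an Acquired event to the RMContainer to update the state of the containers already allocated to the AM. Once its state has been updated, the RMContainer sends a ContainerAcquired event to notify the RMAppAttempt.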



CONTAINER_ACQUIRED Handle

When the RMAppAttempt receives this event, it adds the node that the container belongs to into its ranNodes set:
appAttempt.ranNodes.add(acquiredEvent.getContainer().getNodeId());
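Because the nodes are kept in a set, a node that hosts several of the attempt's containers is recorded only once. A minimal sketch of that bookkeeping follows, with node IDs simplified to strings; `RanNodesSketch` is a hypothetical class, not the real RMAppAttempt implementation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Records every node on which the attempt has acquired a container (illustrative names).
class RanNodesSketch {
  private final Set<String> ranNodes = new HashSet<String>();

  // Called once per CONTAINER_ACQUIRED event with the container's node ID.
  void onContainerAcquired(String nodeId) {
    ranNodes.add(nodeId); // a set ignores duplicates
  }

  // Read-only view for callers that need to know where the attempt ran.
  Set<String> getRanNodes() {
    return Collections.unmodifiableSet(ranNodes);
  }

  public static void main(String[] args) {
    RanNodesSketch attempt = new RanNodesSketch();
    attempt.onContainerAcquired("node1:8041");
    attempt.onContainerAcquired("node2:8041");
    attempt.onContainerAcquired("node1:8041"); // second container on node1
    System.out.println("distinct nodes: " + attempt.getRanNodes().size());
  }
}
```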

UNREGISTERED Handle

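When the job has finished, the AM unregisters itself from the ApplicationMasterService. The AppAttempt receives an UNREGISTERED event notification, performs a series of cleanup tasks, and finally exits.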

Reposted from: https://www.cnblogs.com/tnangle/archive/2013/04/28/3376704.html
