YARN distributedshell AM source code walkthrough, part 1

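Preface

  •  1. The distributedshell AM flow is fairly involved. This post only outlines it; the layout is rough and will be improved over time.
  •  2. My knowledge is limited; if you spot an error, please point it out in the comments.
  •  3. The source is long; "...." marks omitted code.
  •  4. Hadoop version: 3.1.1
  •  5. This post is mostly source code. "YARN distributedshell AM source code walkthrough, part 2" presents the same flow as sequence diagrams; the two posts cover the same content.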

Main text

    1. Entry point: main

         1.1 Partial call hierarchy

             This listing alone is not very clear; see the sequence diagram in 1.2.

ApplicationMaster.main
 1.ApplicationMaster.init
 2.ApplicationMaster.run
  3.amRMClient.registerApplicationMaster
  4.amRMClient.addContainerRequest(containerAsk); 
   5.client.addContainerRequest(req);
     6.addResourceRequest(req.getPriority(), node,req.getExecutionTypeRequest(), resource, req,true,req.getNodeLabelExpression());
       7.addResourceRequestToAsk(resourceRequestInfo.remoteRequest);

 1.2 Sequence diagram (clearer)

  [sequence diagram]

 The flow above is straightforward, so its source code is not shown.



2. Source code


 2.1. AMRMClientImpl.addResourceRequestToAsk 

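    Adds a resource request to ask. The design looks odd, but the comment in the code explains why it is needed.

    ask is a Set; it is consumed asynchronously by the heartbeat thread (see 2.2 and 2.3).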
   

 private void  addResourceRequestToAsk(ResourceRequest remoteRequest) {
    // This code looks weird but is needed because of the following scenario.
    // A ResourceRequest is removed from the remoteRequestTable. A 0 container 
    // request is added to 'ask' to notify the RM about not needing it any more.
    // Before the call to allocate, the user now requests more containers. If 
    // the locations of the 0 size request and the new request are the same
    // (with the difference being only container count), then the set comparator
    // will consider both to be the same and not add the new request to ask. So 
    // we need to check for the "same" request being present and remove it and 
    // then add it back. The comparator is container count agnostic.
    // This should happen only rarely but we do need to guard against it.
    if(ask.contains(remoteRequest)) {
      ask.remove(remoteRequest);
    }
    ask.add(remoteRequest);
  }
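
The scenario in that comment is easy to reproduce outside of YARN. The following is a minimal sketch, assuming ask behaves like a sorted set whose comparator ignores the container count. Req, AskSetDemo and the "rack1" location are hypothetical stand-ins for illustration, not Hadoop classes.

```java
import java.util.Comparator;
import java.util.TreeSet;

class AskSetDemo {
    // Hypothetical stand-in for ResourceRequest: a location plus a container count.
    static final class Req {
        final String location;
        final int numContainers;

        Req(String location, int numContainers) {
            this.location = location;
            this.numContainers = numContainers;
        }
    }

    // The comparator looks only at the location, so it is container count agnostic.
    static TreeSet<Req> newAsk() {
        return new TreeSet<>(Comparator.comparing((Req r) -> r.location));
    }

    // Plain add: the set already holds an "equal" element, so the new one is dropped.
    static int plainAdd() {
        TreeSet<Req> ask = newAsk();
        ask.add(new Req("rack1", 0)); // the "0 containers" cancellation request
        ask.add(new Req("rack1", 3)); // ignored: compares equal to the one above
        return ask.first().numContainers;
    }

    // Remove-then-add, mirroring addResourceRequestToAsk.
    static int removeThenAdd() {
        TreeSet<Req> ask = newAsk();
        ask.add(new Req("rack1", 0));
        Req updated = new Req("rack1", 3);
        if (ask.contains(updated)) {
            ask.remove(updated);
        }
        ask.add(updated);
        return ask.first().numContainers;
    }

    public static void main(String[] args) {
        System.out.println(plainAdd());      // prints 0
        System.out.println(removeThenAdd()); // prints 3
    }
}
```

With a plain add the stale zero-container request survives, which is exactly the case the remove-then-add guard protects against.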


 
 2.2.AMRMClientAsyncImpl.HeartbeatThread  

Registering the AM starts the HeartbeatThread.

The thread loops over the following steps:
   call response = client.allocate(progress);
   put the allocate result on responseQueue
  responseQueue is consumed by another thread

private class HeartbeatThread extends Thread {
   ....

    public void run() {
      while (true) { // loop until keepRunning is false
        Object response = null;
        // synchronization ensures we don't send heartbeats after unregistering
        synchronized (unregisterHeartbeatLock) {
          if (!keepRunning) {
            return;
          }

          try {
            response = client.allocate(progress); // allocate containers (this call doubles as the heartbeat)
          } catch (ApplicationAttemptNotFoundException e) {
            handler.onShutdownRequest();
            LOG.info("Shutdown requested. Stopping callback.");
            return;
          } catch (Throwable ex) {
            LOG.error("Exception on heartbeat", ex);
            response = ex;
          }
          if (response != null) {
            while (true) {
              try {
                responseQueue.put(response); // hand the result to responseQueue for asynchronous processing
                break;
              } catch (InterruptedException ex) {
                LOG.debug("Interrupted while waiting to put on response queue", ex);
              }
            }
       
       ....
    }
  }
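
The hand-off between the heartbeat thread and the thread that consumes responseQueue is a producer/consumer pair around a blocking queue. A minimal sketch of that pattern follows; HeartbeatQueueDemo and the "response-N" strings are made up for illustration, and the real code carries AllocateResponse objects (or a Throwable) instead of strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class HeartbeatQueueDemo {
    // Runs a producer that "heartbeats" `beats` times and a consumer that handles each response.
    static List<String> run(int beats) {
        BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();
        List<String> handled = new ArrayList<>();

        Thread heartbeat = new Thread(() -> {
            for (int i = 0; i < beats; i++) {
                String response = "response-" + i; // stands in for client.allocate(progress)
                try {
                    responseQueue.put(response);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });

        Thread callback = new Thread(() -> {
            try {
                for (int i = 0; i < beats; i++) {
                    handled.add(responseQueue.take()); // blocks until a response arrives
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        heartbeat.start();
        callback.start();
        try {
            heartbeat.join();
            callback.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(run(3)); // prints [response-0, response-1, response-2]
    }
}
```

The queue decouples the two threads: a slow callback never delays the next heartbeat, it only lets responses pile up in the queue.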

 

 2.3.client.allocate method call

    cloneAsks copies the pending asks into askList, which is used to build allocateRequest
    rmClient.allocate(allocateRequest) then asks the RM to allocate

 public AllocateResponse allocate(float progressIndicator) 
      throws YarnException, IOException {
    ....
    try {
      synchronized (this) {
        askList = cloneAsks();
        ....

        allocateRequest = AllocateRequest.newBuilder()
            .responseId(lastResponseId).progress(progressIndicator)
            .askList(askList).resourceBlacklistRequest(blacklistRequest)
            .releaseList(releaseList).updateRequests(updateList).build();
        populateSchedulingRequests(allocateRequest);

        ....
      }

      try {
        allocateResponse = rmClient.allocate(allocateRequest); // request resource allocation from the RM
        removeFromOutstandingSchedulingRequests(
            allocateResponse.getAllocatedContainers());
        removeFromOutstandingSchedulingRequests(
            allocateResponse.getContainersFromPreviousAttempts());
      } catch (ApplicationMasterNotRegisteredException e) {
       ....
      }

      ....
    return allocateResponse;
  }
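
The point of cloneAsks is that the request sent to the RM is built from a copy taken under the lock, so requests added afterwards do not leak into a request that is already in flight. A minimal sketch of that idea, with a hypothetical CloneAsksDemo class and plain strings in place of ResourceRequest objects:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class CloneAsksDemo {
    private final TreeSet<String> ask = new TreeSet<>();

    synchronized void addRequest(String request) {
        ask.add(request);
    }

    // Plays the role of cloneAsks(): copy the pending asks while holding the lock.
    List<String> snapshotAsks() {
        synchronized (this) {
            return new ArrayList<>(ask);
        }
    }

    static List<String> demo() {
        CloneAsksDemo client = new CloneAsksDemo();
        client.addRequest("node-a");
        List<String> askList = client.snapshotAsks();
        client.addRequest("node-b"); // arrives while the "RPC" would be in flight
        return askList;              // the snapshot is unaffected
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [node-a]
    }
}
```

The later request is not lost; it simply stays in ask and is picked up by the next snapshot.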


 2.4.AMRMClientAsyncImpl.CallbackHandlerThread processes the entries in responseQueue


    Loop:
    object = responseQueue.take();
    Branch on the contents of the response:
    2.4.1 updatedNodes: callback RMCallbackHandler.onNodesUpdated (left unimplemented)
    2.4.2 completed: callback RMCallbackHandler.onContainersCompleted
    2.4.3 allocated: callback RMCallbackHandler.onContainersAllocated(allocated)
    2.4.4 changed: callback RMCallbackHandler.onContainersUpdated(changed)

    ....

private class CallbackHandlerThread extends Thread {
    ....
    public void run() {
      while (true) {
        if (!keepRunning) {
          return;
        }
        try {
          Object object;
          try {
            object = responseQueue.take();
          } catch (InterruptedException ex) {
            LOG.debug("Interrupted while waiting for queue", ex);
            Thread.currentThread().interrupt();
            continue;
          }
          ....

          AllocateResponse response = (AllocateResponse) object;
          String collectorAddress = null;
          if (response.getCollectorInfo() != null) {
            collectorAddress = response.getCollectorInfo().getCollectorAddr();
          }

         ....

          List<NodeReport> updatedNodes = response.getUpdatedNodes();
          if (!updatedNodes.isEmpty()) { // 2.4.1 updatedNodes
         
            handler.onNodesUpdated(updatedNodes);
          }

          List<ContainerStatus> completed =
              response.getCompletedContainersStatuses();
          if (!completed.isEmpty()) {
            LOG.info("====> completed containers: " + completed);
            handler.onContainersCompleted(completed);
          }

          ....

          List<Container> allocated = response.getAllocatedContainers();
          if (!allocated.isEmpty()) {
            LOG.info("====> allocated containers: " + allocated);
            handler.onContainersAllocated(allocated);
          }

          if (!response.getContainersFromPreviousAttempts().isEmpty()) {
            if (handler instanceof AMRMClientAsync.AbstractCallbackHandler) {
              ((AMRMClientAsync.AbstractCallbackHandler) handler)
                  .onContainersReceivedFromPreviousAttempts(
                      response.getContainersFromPreviousAttempts());
            }
          }
          ....
          progress = handler.getProgress();
        } catch (Throwable ex) {
          handler.onError(ex);
          // re-throw exception to end the thread
          throw new YarnRuntimeException(ex);
        }
      }
    }
  }
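
Stripped of error handling, the loop body above is a dispatcher: each non-empty list in the response triggers the matching callback, and empty lists trigger nothing. A minimal sketch under that reading; Response and Handler here are simplified, hypothetical types that use strings instead of NodeReport, ContainerStatus and Container.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ResponseDispatchDemo {
    // Hypothetical, simplified response carrying three of the lists from 2.4.
    static final class Response {
        final List<String> updatedNodes;
        final List<String> completed;
        final List<String> allocated;

        Response(List<String> updatedNodes, List<String> completed, List<String> allocated) {
            this.updatedNodes = updatedNodes;
            this.completed = completed;
            this.allocated = allocated;
        }
    }

    interface Handler {
        void onNodesUpdated(List<String> nodes);
        void onContainersCompleted(List<String> statuses);
        void onContainersAllocated(List<String> containers);
    }

    // Only non-empty lists reach the handler.
    static void dispatch(Response response, Handler handler) {
        if (!response.updatedNodes.isEmpty()) {
            handler.onNodesUpdated(response.updatedNodes);
        }
        if (!response.completed.isEmpty()) {
            handler.onContainersCompleted(response.completed);
        }
        if (!response.allocated.isEmpty()) {
            handler.onContainersAllocated(response.allocated);
        }
    }

    static List<String> demo() {
        final List<String> calls = new ArrayList<>();
        Handler handler = new Handler() {
            @Override public void onNodesUpdated(List<String> nodes) {
                calls.add("nodesUpdated:" + nodes.size());
            }
            @Override public void onContainersCompleted(List<String> statuses) {
                calls.add("completed:" + statuses.size());
            }
            @Override public void onContainersAllocated(List<String> containers) {
                calls.add("allocated:" + containers.size());
            }
        };
        Response response = new Response(
                Collections.<String>emptyList(), Arrays.asList("c1"), Arrays.asList("c2", "c3"));
        dispatch(response, handler);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [completed:1, allocated:2]
    }
}
```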


 2.5.RMCallbackHandler callback handling

  Continuing from 2.4.3, take the allocated callback ApplicationMaster.RMCallbackHandler.onContainersAllocated as the example.
  Each container launch is wrapped in its own launchThread.

public void onContainersAllocated(List<Container> allocatedContainers) {
      LOG.info("Got response from RM for container ask, allocatedCnt="
          + allocatedContainers.size());
      for (Container allocatedContainer : allocatedContainers) {
        if (numAllocatedContainers.get() == numTotalContainers) {
          ....
        } else {
          numAllocatedContainers.addAndGet(1);
          String yarnShellId = Integer.toString(yarnShellIdCounter);
          yarnShellIdCounter++;
         ....

          Thread launchThread =
              createLaunchContainerThread(allocatedContainer, yarnShellId); // handled on a separate thread

          ....
          launchThreads.add(launchThread);
          launchedContainers.add(allocatedContainer.getId());
          launchThread.start(); 

          // Remove the corresponding request
          Collection<AMRMClient.ContainerRequest> requests =
              amRMClient.getMatchingRequests(
                  allocatedContainer.getAllocationRequestId());
          if (requests.iterator().hasNext()) {
            AMRMClient.ContainerRequest request = requests.iterator().next();
            amRMClient.removeContainerRequest(request);
          }
        }
      }
    }
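
The shape of this callback is: count allocations against a target, and start one launch thread per accepted container. Below is a minimal sketch of that shape only. AllocationCapDemo is hypothetical, and because the excerpt elides what happens to containers beyond the target, the sketch merely counts them as surplus.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class AllocationCapDemo {
    // Returns {containers launched, surplus containers} for `offered` allocations.
    static int[] demo(int numTotalContainers, int offered) {
        AtomicInteger numAllocatedContainers = new AtomicInteger();
        AtomicInteger launched = new AtomicInteger();
        List<Thread> launchThreads = new ArrayList<>();
        int surplus = 0;

        for (int i = 0; i < offered; i++) {
            if (numAllocatedContainers.get() == numTotalContainers) {
                surplus++; // assumption: the elided branch deals with the extra container
            } else {
                numAllocatedContainers.addAndGet(1);
                // One thread per container, like createLaunchContainerThread(...).
                Thread launchThread = new Thread(launched::incrementAndGet);
                launchThreads.add(launchThread);
                launchThread.start();
            }
        }

        for (Thread t : launchThreads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] {launched.get(), surplus};
    }

    public static void main(String[] args) {
        int[] result = demo(2, 3);
        System.out.println(result[0] + " launched, " + result[1] + " surplus"); // prints 2 launched, 1 surplus
    }
}
```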


 2.6.LaunchContainerRunnable.run: launch the container on a separate thread


    nmClientAsync.startContainerAsync(container, ctx);

public void run() {
     ....

      // the omitted code prepares resources, environment variables, commands, tokens, etc.
      myShellEnv.put(YARN_SHELL_ID, shellId);
      ContainerRetryContext containerRetryContext =
          ContainerRetryContext.newInstance(
              containerRetryPolicy, containerRetryErrorCodes,
              containerMaxRetries, containrRetryInterval,
              containerFailuresValidityInterval);
      ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
        localResources, myShellEnv, commands, null, allTokens.duplicate(),
          null, containerRetryContext);
      containerListener.addContainer(container.getId(), container);
      nmClientAsync.startContainerAsync(container, ctx); // start the container
    }


 2.7.NMClientAsyncImpl.startContainerAsync


     events.put(new StartContainerEvent(container, containerLaunchContext)); events is a blocking queue

public void startContainerAsync(
      Container container, ContainerLaunchContext containerLaunchContext) {
    if (containers.putIfAbsent(container.getId(),
        new StatefulContainer(this, container.getId())) != null) {
      callbackHandler.onStartContainerError(container.getId(),
          RPCUtil.getRemoteException("Container " + container.getId() +
              " is already started or scheduled to start"));
    }
    try {
      events.put(new StartContainerEvent(container, containerLaunchContext));
    } catch (InterruptedException e) {
      LOG.warn("Exception when scheduling the event of starting Container " +
          container.getId());
      callbackHandler.onStartContainerError(container.getId(), e);
    }
  }
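
Two things happen in this method: putIfAbsent detects a container that was already submitted, and the start request is turned into an event on a queue. The sketch below imitates both with a hypothetical StartOnceDemo class and string events. Like the excerpt, it reports the duplicate through an error callback (here a list) and still enqueues the event afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;

class StartOnceDemo {
    final ConcurrentMap<String, Boolean> containers = new ConcurrentHashMap<>();
    final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    final List<String> errors = new ArrayList<>(); // stands in for onStartContainerError

    void startContainerAsync(String containerId) {
        // putIfAbsent returns the previous value, so non-null means "seen before".
        if (containers.putIfAbsent(containerId, Boolean.TRUE) != null) {
            errors.add("Container " + containerId + " is already started or scheduled to start");
        }
        try {
            events.put("START_CONTAINER " + containerId);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            errors.add("Interrupted while scheduling start of " + containerId);
        }
    }

    // Returns {number of errors, number of queued events}.
    static int[] demo() {
        StartOnceDemo client = new StartOnceDemo();
        client.startContainerAsync("container_1");
        client.startContainerAsync("container_1"); // duplicate
        return new int[] {client.errors.size(), client.events.size()};
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println(result[0] + " error(s), " + result[1] + " event(s)"); // prints 1 error(s), 2 event(s)
    }
}
```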


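 2.8.NMClientAsyncImpl anonymous thread class

    The thread is held in the field eventDispatcherThread and keeps consuming events in a loop

     threadPool.execute(getContainerEventProcessor(event)); each event is wrapped in a Runnable and handed to the thread pool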

 eventDispatcherThread = new Thread() {
      @Override
      public void run() {
        ContainerEvent event = null;
        Set<String> allNodes = new HashSet<String>();

        while (!stopped.get() && !Thread.currentThread().isInterrupted()) {
          try {
            event = events.take();
          } catch (InterruptedException e) {
            if (!stopped.get()) {
              LOG.error("Returning, thread interrupted", e);
            }
            return;
          }

          allNodes.add(event.getNodeId().toString());

          int threadPoolSize = threadPool.getCorePoolSize();

          if (threadPoolSize != maxThreadPoolSize) {

            // nodes where containers will run at *this* point of time. This is
            // *not* the cluster size and doesn't need to be.
            int nodeNum = allNodes.size();
            int idealThreadPoolSize = Math.min(maxThreadPoolSize, nodeNum);

            if (threadPoolSize < idealThreadPoolSize) {
              ....
              threadPool.setCorePoolSize(newThreadPoolSize);
            }
          }

          // the events from the queue are handled in parallel with a thread
          // pool
          threadPool.execute(getContainerEventProcessor(event)); // the dispatcher thread delegates to the thread pool

          // TODO: Group launching of multiple containers to a single
          // NodeManager into a single connection
        }
      }
    };
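
The sizing logic in this loop lets the pool grow with the number of distinct nodes seen so far, capped at maxThreadPoolSize. A minimal sketch of that logic follows. PoolSizingDemo is hypothetical, and since the computation of newThreadPoolSize is elided in the excerpt, the sketch assumes the pool simply grows to the ideal size.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolSizingDemo {
    // Feeds one event per node id through the sizing logic and returns the final core pool size.
    static int demo(int initialSize, int maxThreadPoolSize, String... nodeIds) {
        // The maximum pool size is left unbounded so that setCorePoolSize can raise the core size.
        ThreadPoolExecutor threadPool = new ThreadPoolExecutor(
                initialSize, Integer.MAX_VALUE, 1, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        Set<String> allNodes = new HashSet<>();
        try {
            for (String nodeId : nodeIds) {
                allNodes.add(nodeId);
                int threadPoolSize = threadPool.getCorePoolSize();
                if (threadPoolSize != maxThreadPoolSize) {
                    int idealThreadPoolSize = Math.min(maxThreadPoolSize, allNodes.size());
                    if (threadPoolSize < idealThreadPoolSize) {
                        // Assumption: grow directly to the ideal size.
                        threadPool.setCorePoolSize(idealThreadPoolSize);
                    }
                }
                threadPool.execute(() -> { }); // stands in for the container event processor
            }
            return threadPool.getCorePoolSize();
        } finally {
            threadPool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Four distinct nodes, but the pool is capped at 3 threads.
        System.out.println(demo(1, 3, "n1", "n2", "n3", "n4", "n1")); // prints 3
    }
}
```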


 2.9.NMClientAsyncImpl.ContainerEventProcessor.run 


    Branch on the event type:
    if (event.getType() == QUERY_CONTAINER)
     callbackHandler.onContainerStatusReceived(containerId, containerStatus); // calls back NMCallbackHandler.onContainerStatusReceived
    else
     container.handle(event); // enters the state machine
                  event.getType() == START_CONTAINER --> 2.10, StartContainerTransition
                  event.getType() == UPDATE_CONTAINER_RESOURCE --> UpdateContainerResourceTransition

 protected class ContainerEventProcessor implements Runnable {
    protected ContainerEvent event;

   ....

    @Override
    public void run() {
      ContainerId containerId = event.getContainerId();
      LOG.info("Processing Event " + event + " for Container " + containerId);
      if (event.getType() == ContainerEventType.QUERY_CONTAINER) { // branch on the event type
        try {
          ContainerStatus containerStatus = client.getContainerStatus(
              containerId, event.getNodeId());
          try {
            callbackHandler.onContainerStatusReceived(
                containerId, containerStatus);
          } catch (Throwable thr) {
            ....
          }
        } catch (YarnException e) {
        ....
        }
      } else {
        StatefulContainer container = containers.get(containerId);
        if (container == null) {
          LOG.info("Container " + containerId + " is already stopped or failed");
        } else {
          container.handle(event);  // hand the event to the container's state machine
          if (isCompletelyDone(container)) {
            containers.remove(containerId);
          }
        }
      }
    }

   
 2.10.StatefulContainer.handle

    this.stateMachine.doTransition(event.getType(), event); // here event.getType() is START_CONTAINER

    // The state machine itself is not covered here; just remember that event.getType() == START_CONTAINER leads to step 2.11
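
For readers who have not met this style before, the idea behind doTransition can be sketched as a table keyed by (current state, event type) whose entries run some code and return the next state. The sketch below is a toy version for illustration only: the class, the state names and the transition table are assumptions, not the YARN state machine.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class ContainerStateMachineDemo {
    enum State { PREP, RUNNING, DONE }            // hypothetical states
    enum EventType { START_CONTAINER, STOP_CONTAINER }

    // A transition does its work and returns the state to move to.
    interface Transition {
        State apply(String containerId);
    }

    private final Map<State, Map<EventType, Transition>> table = new EnumMap<>(State.class);
    private State current = State.PREP;
    final List<String> log = new ArrayList<>();

    ContainerStateMachineDemo() {
        register(State.PREP, EventType.START_CONTAINER, id -> {
            log.add("start " + id);               // where a start transition would do its work
            return State.RUNNING;
        });
        register(State.RUNNING, EventType.STOP_CONTAINER, id -> {
            log.add("stop " + id);
            return State.DONE;
        });
    }

    private void register(State from, EventType on, Transition transition) {
        table.computeIfAbsent(from, s -> new EnumMap<>(EventType.class)).put(on, transition);
    }

    // Look up the transition for (current state, event type) and apply it.
    void handle(EventType type, String containerId) {
        Map<EventType, Transition> row = table.get(current);
        Transition transition = (row == null) ? null : row.get(type);
        if (transition == null) {
            throw new IllegalStateException("No transition from " + current + " on " + type);
        }
        current = transition.apply(containerId);
    }

    State getState() {
        return current;
    }

    public static void main(String[] args) {
        ContainerStateMachineDemo container = new ContainerStateMachineDemo();
        container.handle(EventType.START_CONTAINER, "container_1");
        System.out.println(container.getState()); // prints RUNNING
    }
}
```

The benefit of the table is that an event arriving in the wrong state is rejected in one place instead of being checked in every handler.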

 2.11.StartContainerTransition.transition

         First, start the container:
           Map<String, ByteBuffer> allServiceResponse =
              container.nmClientAsync.getClient().startContainer(
                  scEvent.getContainer(), scEvent.getContainerLaunchContext());
           Then invoke the callback:
            container.nmClientAsync.getCallbackHandler().onContainerStarted(
                containerId, allServiceResponse);
 2.11.1 First: NMClientImpl.startContainer

 public Map<String, ByteBuffer> startContainer(
      Container container, ContainerLaunchContext containerLaunchContext)
          throws YarnException, IOException {
     ....
    StartedContainer startingContainer =
        new StartedContainer(container.getId(), container.getNodeId());
    synchronized (startingContainer) {
      addStartingContainer(startingContainer);
      
      Map<String, ByteBuffer> allServiceResponse;
      ContainerManagementProtocolProxyData proxy = null;
      try {
        proxy =
            cmProxy.getProxy(container.getNodeId().toString(),
                container.getId());
        StartContainerRequest scRequest =
            StartContainerRequest.newInstance(containerLaunchContext,
              container.getContainerToken());
        List<StartContainerRequest> list = new ArrayList<StartContainerRequest>();
        list.add(scRequest);
        StartContainersRequest allRequests =
            StartContainersRequest.newInstance(list);
        StartContainersResponse response =
            proxy
                .getContainerManagementProtocol().startContainers(allRequests); // RPC call to the NodeManager
        if (response.getFailedRequests() != null
            && response.getFailedRequests().containsKey(container.getId())) {
          Throwable t =
              response.getFailedRequests().get(container.getId()).deSerialize();
          parseAndThrowException(t);
        }
        allServiceResponse = response.getAllServicesMetaData();
        startingContainer.state = ContainerState.RUNNING;
      } catch (YarnException | IOException e) {
       ....
      }
      return allServiceResponse;
    }
  }


 2.11.2 Then the callback: ApplicationMaster.NMCallbackHandler.onContainerStarted, which calls
        applicationMaster.nmClientAsync.getContainerStatusAsync(containerId, container.getNodeId());

 public void getContainerStatusAsync(ContainerId containerId, NodeId nodeId) {
    try {
      events.put(new ContainerEvent(containerId, nodeId, null,
          ContainerEventType.QUERY_CONTAINER));
    } catch (InterruptedException e) {
      LOG.warn("Exception when scheduling the event of querying the status" +
          " of Container " + containerId);
      callbackHandler.onGetContainerStatusError(containerId, e);
    }
  }

The QUERY_CONTAINER event then goes through the 2.8 flow again.
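
Put together, steps 2.7 to 2.11 form a small feedback loop: handling the start event causes a status query event to be queued, and the same dispatcher picks it up. A minimal single-threaded sketch of that loop, using a hypothetical RequeueDemo class and string events in place of ContainerEvent:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RequeueDemo {
    // Returns the order in which events are processed.
    static List<String> demo() {
        Deque<String> events = new ArrayDeque<>();
        List<String> trace = new ArrayList<>();

        events.add("START_CONTAINER"); // what startContainerAsync enqueues
        while (!events.isEmpty()) {
            String event = events.poll();
            trace.add(event);
            if (event.equals("START_CONTAINER")) {
                // After the start succeeds, the started callback asks for the
                // container status, which enqueues a follow-up event.
                events.add("QUERY_CONTAINER");
            }
            // QUERY_CONTAINER would end in onContainerStatusReceived and enqueue nothing.
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [START_CONTAINER, QUERY_CONTAINER]
    }
}
```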


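 2.12. Other

      The remaining sub-flows, for example the other callbacks in 2.4, are similar to 2.5 and are not described again.

       That covers the basic AM flow. A more complete sequence diagram will be added when time permits.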

      

  
