Source Code Walkthrough - Yarn ResourceManager 05 - MR Job Submission: Client-Side Analysis


0x00 Series Index

  1. Source Code Walkthrough - Yarn ResourceManager 01 - Basic Concepts
  2. Source Code Walkthrough - Yarn ResourceManager 02 - RM Startup: Scripts
  3. Source Code Walkthrough - Yarn ResourceManager 03 - RM Startup: RM in Detail
  4. Source Code Walkthrough - Yarn ResourceManager 04 - RM Scheduling: FairScheduler
  5. Source Code Walkthrough - Yarn ResourceManager 05 - MR Job Submission: Client-Side Analysis
  6. Source Code Walkthrough - Yarn ResourceManager 06 - MR Job Submission: Server-Side Analysis
  7. Source Code Walkthrough - Yarn ResourceManager 07 - ShutdownHookManager
  8. Source Code Walkthrough - Yarn ResourceManager 08 - Summary

0x05 RM Scheduling - MR Job Submission - Client-Side Analysis

5.1 mapreduce.Job

org.apache.hadoop.mapreduce.Job

We all know that an MR driver usually ends with job.waitForCompletion(true), which submits the job and then waits for it to finish. That call is where our analysis begins.
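
For context, a minimal driver looks roughly like the sketch below. It is a hedged illustration, not code from this walkthrough: the class name, job name and argument handling are placeholders, and the identity Mapper/Reducer base classes stand in for real map/reduce logic.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical minimal driver: everything before waitForCompletion only
// configures the Job object locally; nothing reaches the cluster until
// waitForCompletion(true) calls submit() internally.
public class MinimalDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "minimal-job");     // illustrative job name
    job.setJarByClass(MinimalDriver.class);
    job.setMapperClass(Mapper.class);                   // identity mapper (placeholder)
    job.setReducerClass(Reducer.class);                 // identity reducer (placeholder)
    job.setOutputKeyClass(LongWritable.class);          // matches TextInputFormat's key type
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input dir from the command line
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output dir from the command line
    // submit the job and block, printing progress, until it finishes
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With that context, here is waitForCompletion itself: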

public boolean waitForCompletion(boolean verbose
                                   ) throws IOException, InterruptedException,
                                            ClassNotFoundException {
    if (state == Job.JobState.DEFINE) {
      // submit the job
      submit();
    }
    if (verbose) {
      // monitor the job and keep printing progress until it completes (success or failure)
      this.monitorAndPrintJob();
    } else {
      // get the completion poll interval from the client.
      int completionPollIntervalMillis = getCompletionPollInterval(this.cluster.getConf());

      while (!isComplete()) {
        try {
          Thread.sleep(completionPollIntervalMillis);
        } catch (InterruptedException ie) {
        }
      }
    }
    return isSuccessful();
  }

Now let's look at the submit method:

// Submit the job to the cluster and return immediately
public void submit() 
         throws IOException, InterruptedException, ClassNotFoundException {
    ensureState(JobState.DEFINE);
    setUseNewAPI();
    connect();
    // create the JobSubmitter
    final JobSubmitter submitter = 
        getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
    status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
      public JobStatus run() throws IOException, InterruptedException, 
      ClassNotFoundException {
        // submit the job
        return submitter.submitJobInternal(Job.this, cluster);
      }
    });
    state = JobState.RUNNING;
    LOG.info("The url to track the job: " + getTrackingURL());
   }

As you can see, a JobSubmitter is used here; let's follow it next.

5.2 JobSubmitter

org.apache.hadoop.mapreduce.JobSubmitter

Next, look at submitter.submitJobInternal. The method is long, so only the key lines are shown here:

  // obtain a new jobId
  JobID jobId = submitClient.getNewJobID();
  job.setJobID(jobId);

  // upload the job jar, files, dependent libjars, archives, etc. to HDFS
  copyAndConfigureFiles(job, submitJobDir);  
  
  // Compute input splits from the job's input files, write the split plan into the staging dir,
  // and derive the number of map tasks from the split count.
  // Splitting uses the job's concrete InputFormat implementation (e.g. TextInputFormat) to read and divide the input;
  // specifically, its getSplits method is called during splitting.
  // Each resulting split records the file size, host locations, whether the data is in memory, and so on.
  // The split information is written to HDFS so the map tasks can use it later.
  int maps = writeSplits(job, submitJobDir);
  
  // the number of map tasks is then set from the split count
  conf.setInt(MRJobConfig.NUM_MAPS, maps);
   
  // write all of the job's configuration into the staging file (job.xml)
  writeConf(conf, submitJobFile); 
           
  // actually submit the job and get its submission status back
  status = this.submitClient.submitJob(jobId, submitJobDir.toString(), job.getCredentials());
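
A quick aside on the writeSplits call above (a hedged simplification, not the full source): for the common FileInputFormat-based inputs, the split size is essentially the HDFS block size clamped between the configured minimum and maximum split sizes, so the number of map tasks comes out to roughly total input size divided by split size.

public class SplitSizeSketch {
  // Simplified sketch mirroring FileInputFormat.computeSplitSize.
  // minSize comes from mapreduce.input.fileinputformat.split.minsize,
  // maxSize from mapreduce.input.fileinputformat.split.maxsize;
  // with the defaults, splitSize == blockSize, i.e. roughly one map task per HDFS block.
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }
}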

As the code shows, a jobId is obtained first, and then the job is handed to submitClient, which is an implementation of the ClientProtocol interface. So the main work boils down to two steps: obtain a JobId, then submit the Job. We will walk through these two flows separately. Since we are submitting the job to run on YARN, the implementation actually used is YARNRunner (selected by configuration, as sketched below).
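
Which implementation the client gets is decided by configuration: Cluster asks each ClientProtocolProvider for a client, and the YARN provider only answers when mapreduce.framework.name is set to yarn (local selects the LocalJobRunner instead). A hedged sketch of that selection follows; the RM host below is a placeholder.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class YarnSubmissionConfigSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // "yarn" makes Cluster pick the YARN ClientProtocolProvider, whose client is YARNRunner
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("yarn.resourcemanager.address", "rm-host:8032"); // hypothetical RM address
    Job job = Job.getInstance(conf, "config-sketch");
    // From here on, job.submit() / waitForCompletion() goes through YARNRunner.
  }
}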

5.3 Obtaining the JobID

5.3.1 YARNRunner

org.apache.hadoop.mapred.YARNRunner

First, look at the submitClient.getNewJobID() call used above:

// this delegates to resMgrDelegate.getNewJobID to obtain the jobId
@Override
  public JobID getNewJobID() throws IOException, InterruptedException {
    return resMgrDelegate.getNewJobID();
  }

We can see that resMgrDelegate is used, so let's see what this ResourceMgrDelegate actually is:

5.3.2 ResourceMgrDelegate

Part of the ResourceMgrDelegate code:

public class ResourceMgrDelegate extends YarnClient{
  private YarnConfiguration conf;
  private ApplicationSubmissionContext application;
  private ApplicationId applicationId;
  @Private
  @VisibleForTesting
  protected YarnClient client;
  private Text rmDTService;

  /**
   * Delegate responsible for communicating with the Resource Manager's
   * {@link ApplicationClientProtocol}.
   * @param conf the configuration object.
   */
  public ResourceMgrDelegate(YarnConfiguration conf) {
    super(ResourceMgrDelegate.class.getName());
    this.conf = conf;
    this.client = YarnClient.createYarnClient();
    init(conf);
    start();
  }
  
   @Override
  protected void serviceInit(Configuration conf) throws Exception {
    client.init(conf);
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    client.start();
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    client.stop();
    super.serviceStop();
  }
  
  // obtain a new JobID
  public JobID getNewJobID() throws IOException, InterruptedException {
    try {
      // the client here is a YarnClientImpl instance
      // obtain an ApplicationSubmissionContext
      this.application = client.createApplication().getApplicationSubmissionContext();
      // obtain the applicationId
      this.applicationId = this.application.getApplicationId();
      return TypeConverter.fromYarn(applicationId);
    } catch (YarnException e) {
      throw new IOException(e);
    }
  }
}

Do serviceInit and serviceStart look familiar? They should: the parent class YarnClient extends AbstractService.
YarnClientImpl and ApplicationSubmissionContext both appear here; we will cover them shortly, but first the sketch below shows how this AbstractService lifecycle is typically driven by client code.
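
The following is a hedged sketch, not code from this walkthrough; ResourceMgrDelegate does essentially the same thing through its own serviceInit/serviceStart shown above.

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClientLifecycleSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    // createYarnClient() returns a YarnClientImpl under the hood
    YarnClient client = YarnClient.createYarnClient();
    client.init(conf);    // AbstractService: NOTINITED -> INITED (serviceInit)
    client.start();       // INITED -> STARTED (serviceStart)
    try {
      // the same call ResourceMgrDelegate.getNewJobID() makes internally
      YarnClientApplication app = client.createApplication();
      ApplicationId appId =
          app.getApplicationSubmissionContext().getApplicationId();
      System.out.println("Got new application id: " + appId);
    } finally {
      client.stop();      // STARTED -> STOPPED (serviceStop)
    }
  }
}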

5.3.3 YarnClientImpl

org.apache.hadoop.yarn.client.api.impl.YarnClientImpl

First, let's look at the client.createApplication method used in ResourceMgrDelegate above:

@Override
  public YarnClientApplication createApplication()
      throws YarnException, IOException {
    // create an ApplicationSubmissionContextPBImpl instance;
    // ApplicationSubmissionContext holds all the information the RM needs to launch the application's AM
    ApplicationSubmissionContext context = Records.newRecord
        (ApplicationSubmissionContext.class);
    // ask the RM for a new application and get the response back
    GetNewApplicationResponse newApp = getNewApplication();
    ApplicationId appId = newApp.getApplicationId();
    // store the appId in the ApplicationSubmissionContext
    context.setApplicationId(appId);
    // with the appId in hand, wrap the response and the context into a YarnClientApplication
    return new YarnClientApplication(newApp, context);
  }

Next, the getNewApplication method:

private GetNewApplicationResponse getNewApplication()
      throws YarnException, IOException {
    GetNewApplicationRequest request =
        Records.newRecord(GetNewApplicationRequest.class);
    // rmClient is the key piece here
    return rmClient.getNewApplication(request);
  }

The rmClient above is a class that implements the ApplicationClientProtocol interface; the sketch below shows roughly where that proxy comes from. After that, let's look at rmClient.getNewApplication.
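
In YarnClientImpl, rmClient is created during serviceStart as an RPC proxy for ApplicationClientProtocol, bound to yarn.resourcemanager.address; the object it ultimately delegates to is the ApplicationClientProtocolPBClientImpl covered next. A hedged sketch of that wiring (the helper name is mine; exact details vary by Hadoop version):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
import org.apache.hadoop.yarn.client.ClientRMProxy;

public class RmClientWiringSketch {
  // Sketch: obtain an ApplicationClientProtocol proxy the way YarnClientImpl does.
  static ApplicationClientProtocol createRmClient(Configuration conf) throws IOException {
    return ClientRMProxy.createRMProxy(conf, ApplicationClientProtocol.class);
  }
}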

5.3.4 ApplicationClientProtocolPBClientImpl

org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl

/**
 * This method is inherited from ApplicationClientProtocol; its javadoc says:
 * the client uses it to obtain a new ApplicationId, which is then used to submit a new application.
 * On return, the RM sends back a GetNewApplicationResponse containing a new,
 * monotonically increasing ApplicationId plus cluster details such as the maximum resource capability.
 *
 * In other words, this only obtains an appId; the application is not actually run yet.
 */
@Override
  public GetNewApplicationResponse getNewApplication(
      GetNewApplicationRequest request) throws YarnException,
      IOException {
    GetNewApplicationRequestProto requestProto =
        ((GetNewApplicationRequestPBImpl) request).getProto();
    try {
      return new GetNewApplicationResponsePBImpl(proxy.getNewApplication(null,
        requestProto));
    } catch (ServiceException e) {
      RPCUtil.unwrapAndThrowException(e);
      return null;
    }
  }
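
One last detail for this part: the value handed back to the MR client is produced by TypeConverter.fromYarn (seen in ResourceMgrDelegate.getNewJobID above), which simply re-labels the YARN ApplicationId as an MR JobID. A simplified, hedged view of that conversion (the class and method names here are mine):

import org.apache.hadoop.yarn.api.records.ApplicationId;

public class JobIdMappingSketch {
  // The RM's cluster timestamp and the application's sequence number become the JobID,
  // e.g. application_1650000000000_0001 shows up as job_1650000000000_0001.
  static org.apache.hadoop.mapreduce.JobID toJobId(ApplicationId appId) {
    return new org.apache.hadoop.mapred.JobID(
        Long.toString(appId.getClusterTimestamp()), appId.getId());
  }
}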

With that, the client-side flow for obtaining the JobID is complete. Next, we move on to the submitJob flow.

5.4 Submitting the Job

5.4.1 YARNRunner

Now back to YARNRunner's submitJob method, which JobSubmitter calls:

@Override
  public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
  throws IOException, InterruptedException {
    addHistoryToken(ts);
    // assemble the required information into an appContext, preparing to launch the MR ApplicationMaster
    ApplicationSubmissionContext appContext =
      createApplicationSubmissionContext(conf, jobSubmitDir, ts);

    // submit to the ResourceManager
    try {
      // this is the applicationId obtained earlier via ResourceMgrDelegate.getNewJobID
      ApplicationId applicationId =
          resMgrDelegate.submitApplication(appContext);

      // fetch the application report for the AM
      ApplicationReport appMaster = resMgrDelegate
          .getApplicationReport(applicationId);
      String diagnostics =
          (appMaster == null ?
              "application report is null" : appMaster.getDiagnostics());
      if (appMaster == null
          || appMaster.getYarnApplicationState() == YarnApplicationState.FAILED
          || appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
        throw new IOException("Failed to run job : " +
            diagnostics);
      }
      return clientCache.getClient(jobId).getJobStatus(jobId);
    } catch (YarnException e) {
      throw new IOException(e);
    }
  }

That covers YARNRunner's submitJob; like getNewJobID, it is still code in the mapreduce package. From here on we move into the yarn package. Before that, a short aside on what createApplicationSubmissionContext actually assembles.
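
createApplicationSubmissionContext is long, but the information it packs is the same kind of thing any YARN application provides: an application name, a queue, the resources the AM container needs, and a ContainerLaunchContext describing how to launch the AM (for MR, the MRAppMaster command plus the files staged earlier). The following is a hedged sketch of that shape; the names, numbers and the helper itself are illustrative, not the real method:

import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.Records;

public class AmContextSketch {
  // Rough shape of the appContext built for the MR ApplicationMaster. The real
  // method also wires in the staged job resources, environment and security tokens.
  static ApplicationSubmissionContext buildAmContext(ApplicationId applicationId) {
    ApplicationSubmissionContext ctx =
        Records.newRecord(ApplicationSubmissionContext.class);
    ctx.setApplicationId(applicationId);       // the id obtained back in 5.3
    ctx.setApplicationName("my-mr-job");       // illustrative
    ctx.setQueue("default");                   // target scheduler queue
    ctx.setApplicationType("MAPREDUCE");

    // resources the RM should reserve for the AM container (illustrative numbers)
    ctx.setResource(Resource.newInstance(1536, 1));

    // how to launch the AM; for MR the real command starts
    // org.apache.hadoop.mapreduce.v2.app.MRAppMaster
    ContainerLaunchContext amContainer =
        Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(Collections.singletonList(
        "$JAVA_HOME/bin/java ... MRAppMaster 1>stdout 2>stderr"));   // sketch only
    ctx.setAMContainerSpec(amContainer);
    return ctx;
  }
}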

5.4.2 YarnClientImpl

Go straight to the submitApplication method:

  @Override
  public ApplicationId submitApplication(ApplicationSubmissionContext appContext)
          throws YarnException, IOException {
    ApplicationId applicationId = appContext.getApplicationId();
    if (applicationId == null) {
      throw new ApplicationIdNotProvidedException(
          "ApplicationId is not provided in ApplicationSubmissionContext");
    }
    SubmitApplicationRequest request =
        Records.newRecord(SubmitApplicationRequest.class);
    request.setApplicationSubmissionContext(appContext);

    // Automatically add the timeline DT into the CLC
    // Only when the security and the timeline service are both enabled
    if (isSecurityEnabled() && timelineServiceEnabled) {
      addTimelineDelegationToken(appContext.getAMContainerSpec());
    }

    // submit the application request
    rmClient.submitApplication(request);

    int pollCount = 0;
    long startTime = System.currentTimeMillis();
    EnumSet<YarnApplicationState> waitingStates = 
                                 EnumSet.of(YarnApplicationState.NEW,
                                 YarnApplicationState.NEW_SAVING,
                                 YarnApplicationState.SUBMITTED);
    EnumSet<YarnApplicationState> failToSubmitStates = 
                                  EnumSet.of(YarnApplicationState.FAILED,
                                  YarnApplicationState.KILLED);		
    while (true) {
      try {
        ApplicationReport appReport = getApplicationReport(applicationId);
        YarnApplicationState state = appReport.getYarnApplicationState();
        if (!waitingStates.contains(state)) {
          if(failToSubmitStates.contains(state)) {
            throw new YarnException("Failed to submit " + applicationId + 
                " to YARN : " + appReport.getDiagnostics());
          }
          LOG.info("Submitted application " + applicationId);
          break;
        }

        long elapsedMillis = System.currentTimeMillis() - startTime;
        if (enforceAsyncAPITimeout() &&
            elapsedMillis >= asyncApiPollTimeoutMillis) {
          throw new YarnException("Timed out while waiting for application " +
              applicationId + " to be submitted successfully");
        }

        // Notify the client through the log every 10 poll, in case the client
        // is blocked here too long.
        if (++pollCount % 10 == 0) {
          LOG.info("Application submission is not finished, " +
              "submitted application " + applicationId +
              " is still in " + state);
        }
        try {
          Thread.sleep(submitPollIntervalMillis);
        } catch (InterruptedException ie) {
          LOG.error("Interrupted while waiting for application "
              + applicationId
              + " to be successfully submitted.");
        }
      } catch (ApplicationNotFoundException ex) {
        // FailOver or RM restart happens before RMStateStore saves
        // ApplicationState
        LOG.info("Re-submit application " + applicationId + "with the " +
            "same ApplicationSubmissionContext");
        rmClient.submitApplication(request);
      }
    }

    return applicationId;
  }

5.4.3 ApplicationClientProtocolPBClientImpl

 @Override
  public SubmitApplicationResponse submitApplication(
      SubmitApplicationRequest request) throws YarnException,
      IOException {
    SubmitApplicationRequestProto requestProto =
        ((SubmitApplicationRequestPBImpl) request).getProto();
    try {
      return new SubmitApplicationResponsePBImpl(proxy.submitApplication(null,
        requestProto));
    } catch (ServiceException e) {
      RPCUtil.unwrapAndThrowException(e);
      return null;
    }
  }

5.5 Summary

This chapter walked through the submission process of an MR job to analyze the client-side source code.

In the next chapter we move into the server-side code and continue analyzing the job submission process: Source Code Walkthrough - Yarn ResourceManager 06 - MR Job Submission: Server-Side Analysis.
