Overview of the main components:
YARN is a framework for resource management and job scheduling. It consists of three main modules: ResourceManager, NodeManager, and ApplicationMaster.
ResourceManager: the resource manager, i.e. the coordinator, scheduler, and manager of the entire cluster's resources.
NodeManager: the NM is the per-node resource and task manager. It periodically reports the node's resource usage and the running state of each Container to the RM, and it receives and handles Container start/stop requests from the AM.
ApplicationMaster: responsible for monitoring the application, tracking its execution state, restarting failed tasks, and so on.
Container: the Container is YARN's resource abstraction. It encapsulates multiple dimensions of a node's resources, such as memory, CPU, disk, and network. When the AM requests resources from the RM, the resources the RM returns are expressed as Containers. YARN allocates a Container for each task, and that task may only use the resources described in its Container.
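To make the Container abstraction concrete, here is a minimal, hedged sketch of how an ApplicationMaster asks the RM for a container through the yarn-client library's AMRMClient. The class name ContainerRequestSketch and the specific sizes are our own illustration, and the code would only actually run inside a registered AM:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerRequestSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();
    // An AM must register with the RM before it can request containers.
    rmClient.registerApplicationMaster("", 0, "");
    // The request names the multi-dimensional resources the task needs:
    Resource capability = Resource.newInstance(1024, 1); // 1024 MB memory, 1 vcore
    rmClient.addContainerRequest(
        new ContainerRequest(capability, null, null, Priority.newInstance(0)));
    // Granted containers come back on later allocate() heartbeats, e.g.:
    // rmClient.allocate(0.0f).getAllocatedContainers();
  }
}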
How an MR job submitted to YARN runs:
1: When we submit a job on the client side, waitForCompletion(true) calls the submit() method; depending on the boolean argument, it then either enables verbose mode, periodically reporting the job's progress, or just periodically asks whether the job has finished.
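For reference, a minimal driver sketch that triggers this path (the class name SubmitDemo is hypothetical; input and output paths come from the command line):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "submit-demo");
    job.setJarByClass(SubmitDemo.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // waitForCompletion(true) calls submit() and then polls the job,
    // printing progress (verbose mode); with false it waits silently.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}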
2: Stepping into submit(), we can see that it mainly does the following:
1: Make sure the job has not already been submitted.
2: Decide from the configuration whether to use the new API.
3: Connect to the cluster: the connect() method creates a Cluster object, cluster. The Cluster constructor calls initialize(), which creates whichever ClientProtocol the configuration asks for; in the end it is simply either a LocalJobRunner or a YARNRunner (see the sketch after this list).
4: Create a JobSubmitter object, submitter, passing in the cluster parameter, so submitter also gains the ability to communicate with the RM remotely.
5: Finally, submit() returns submitter.submitJobInternal(...); the subsequent upload of the job's resources to HDFS is all handled inside that method.
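A paraphrased sketch of that lookup inside Cluster#initialize, assuming the standard ServiceLoader mechanism (this is not the verbatim Hadoop source; the real code also consults mapreduce.framework.name, where "local" selects LocalJobRunner and "yarn" selects YARNRunner):

import java.io.IOException;
import java.util.ServiceLoader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.protocol.ClientProtocol;
import org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider;

public class ProtocolLookupSketch {
  static ClientProtocol pickProtocol(Configuration conf) throws IOException {
    // Try each discovered provider; keep the first one that accepts the conf.
    for (ClientProtocolProvider provider :
        ServiceLoader.load(ClientProtocolProvider.class)) {
      ClientProtocol clientProtocol = provider.create(conf);
      if (clientProtocol != null) {
        return clientProtocol; // LocalJobRunner or YARNRunner underneath
      }
    }
    throw new IOException("Cannot initialize Cluster: no provider matched");
  }
}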
3: Let's step into submitter.submitJobInternal():
JobStatus submitJobInternal(Job job, Cluster cluster)
    throws ClassNotFoundException, InterruptedException, IOException {
  //validate the jobs output specs
  checkSpecs(job); // sanity-check the output spec and related configuration
  Configuration conf = job.getConfiguration();
  addMRFrameworkToDistributedCache(conf);
  Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf); // get the staging directory path
  //configure the command line options correctly on the submitting dfs
  InetAddress ip = InetAddress.getLocalHost(); // get this node's IP address
  if (ip != null) {
    submitHostAddress = ip.getHostAddress(); // this node's IP address as a string
    submitHostName = ip.getHostName();       // this node's hostname
    conf.set(MRJobConfig.JOB_SUBMITHOST, submitHostName);
    conf.set(MRJobConfig.JOB_SUBMITHOSTADDR, submitHostAddress);
  }
  JobID jobId = submitClient.getNewJobID(); // generate a new job ID
  job.setJobID(jobId); // store the job ID in the Job object
  Path submitJobDir = new Path(jobStagingArea, jobId.toString());
  JobStatus status = null;
  try {
    conf.set(MRJobConfig.USER_NAME,
        UserGroupInformation.getCurrentUser().getShortUserName());
    conf.set("hadoop.http.filter.initializers",
        "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
    conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
    LOG.debug("Configuring job " + jobId + " with " + submitJobDir
        + " as the submit dir");
    // get delegation token for the dir
    TokenCache.obtainTokensForNamenodes(job.getCredentials(),
        new Path[] { submitJobDir }, conf); // obtain the credentials needed to talk to the NameNode
    populateTokenCache(conf, job.getCredentials());

    // generate a secret to authenticate shuffle transfers
    if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
      KeyGenerator keyGen; // generates the secret that authenticates map-to-reduce data transfers
      try {
        keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
        keyGen.init(SHUFFLE_KEY_LENGTH);
      } catch (NoSuchAlgorithmException e) {
        throw new IOException("Error generating shuffle secret key", e);
      }
      SecretKey shuffleKey = keyGen.generateKey();
      TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
          job.getCredentials());
    }
    if (CryptoUtils.isEncryptedSpillEnabled(conf)) {
      conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);
      LOG.warn("Max job attempts set to 1 since encrypted intermediate" +
          "data spill is enabled");
    }

    copyAndConfigureFiles(job, submitJobDir); // copy the executables and other resources to HDFS

    Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);

    // Create the splits for the job
    LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
    int maps = writeSplits(job, submitJobDir); // compute the input splits; the split count determines the number of mappers
    conf.setInt(MRJobConfig.NUM_MAPS, maps);
    LOG.info("number of splits:" + maps);

    // write "queue admins of the queue to which job is being submitted"
    // to job file.
    String queue = conf.get(MRJobConfig.QUEUE_NAME,
        JobConf.DEFAULT_QUEUE_NAME); // the scheduling queue, "default" by default
    AccessControlList acl = submitClient.getQueueAdmins(queue);
    conf.set(toFullPropertyName(queue,
        QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

    // removing jobtoken referrals before copying the jobconf to HDFS
    // as the tasks don't need this setting, actually they may break
    // because of it if present as the referral will point to a
    // different job.
    TokenCache.cleanUpTokenReferral(conf);

    if (conf.getBoolean(
        MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
        MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
      // Add HDFS tracking ids
      ArrayList<String> trackingIds = new ArrayList<String>();
      for (Token<? extends TokenIdentifier> t :
          job.getCredentials().getAllTokens()) {
        trackingIds.add(t.decodeIdentifier().getTrackingId());
      }
      conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
          trackingIds.toArray(new String[trackingIds.size()]));
    }

    // Set reservation info if it exists
    ReservationId reservationId = job.getReservationId();
    if (reservationId != null) {
      conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
    }

    // Write job file to submit dir
    writeConf(conf, submitJobFile); // serialize the conf into an xml file in the submit dir

    //
    // Now, actually submit the job (using the submit name)
    //
    printTokens(jobId, job.getCredentials());
    status = submitClient.submitJob(
        jobId, submitJobDir.toString(), job.getCredentials()); // obtain the job's submission status
    if (status != null) {
      return status; // submission succeeded: return the status
    } else {
      throw new IOException("Could not launch job"); // submission failed: throw
    }
  } finally {
    if (status == null) { // if submission failed, delete the directory created earlier
      LOG.info("Cleaning up the staging area " + submitJobDir);
      if (jtFs != null && submitJobDir != null)
        jtFs.delete(submitJobDir, true);
    }
  }
}
The two most important calls inside it are these:
copyAndConfigureFiles(job, submitJobDir); // copies the executables and other resources to HDFS
writeSplits(job, submitJobDir); // computes the input splits; the split count determines the number of mappers
Let's first look at what the first method actually does and which files it copies to HDFS.
Stepping further into the uploadFiles() method it leads to:
public void uploadFiles(Job job, Path submitJobDir) throws IOException {
  Configuration conf = job.getConfiguration();
  short replication = // replication factor for the submitted resources, 10 by default
      (short) conf.getInt(Job.SUBMIT_REPLICATION,
          Job.DEFAULT_SUBMIT_REPLICATION);

  if (!(conf.getBoolean(Job.USED_GENERIC_PARSER, false))) {
    LOG.warn("Hadoop command-line option parsing not performed. "
        + "Implement the Tool interface and execute your application "
        + "with ToolRunner to remedy this.");
  }

  // get all the command line arguments passed in by the user conf
  // These four kinds of resources are what this method uploads, each
  // written at the replication factor above; much more code follows below...
  String files = conf.get("tmpfiles");
  String libjars = conf.get("tmpjars");
  String archives = conf.get("tmparchives");
  String jobJar = job.getJar();
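The LOG.warn above fires when the generic options were never parsed. A minimal Tool/ToolRunner driver avoids the warning and is also what makes -files, -libjars and -archives end up in tmpfiles, tmpjars and tmparchives (the class name MyTool is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyTool extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf(); // generic options are already applied here
    // ... build and submit the Job with this conf ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // e.g. hadoop jar app.jar MyTool -files lookup.txt -libjars dep.jar in out
    System.exit(ToolRunner.run(new Configuration(), new MyTool(), args));
  }
}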
Next, let's step into the writeSplits method to see how the splits are actually computed.
Following it into writeNewSplits, we find that the getSplits() call on the InputFormat object, input, resolves to the override in FileInputFormat, which finally calls computeSplitSize to return the split size:
protected long computeSplitSize(long blockSize, long minSize,
                                long maxSize) {
  // maxSize: the configured maximum split size; blockSize = 128 MB by default; minSize = 1
  return Math.max(minSize, Math.min(maxSize, blockSize));
}
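A quick worked check of that formula, in plain Java with no Hadoop dependency (values in bytes):

public class SplitSizeDemo {
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024; // 128 MB HDFS block
    // Defaults (minSize = 1, maxSize unbounded): split size equals the block size.
    System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE));                // 134217728
    // Lowering maxSize below the block size shrinks the splits (more mappers).
    System.out.println(computeSplitSize(blockSize, 1L, 64L * 1024 * 1024));             // 67108864
    // Raising minSize above the block size grows the splits (fewer mappers).
    System.out.println(computeSplitSize(blockSize, 256L * 1024 * 1024, Long.MAX_VALUE)); // 268435456
  }
}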
In the end, the files, job jar, libjars, and archives (each written at replication 10) plus the split information are all uploaded to HDFS, into a directory named after the job ID created under the staging directory.
4: Submit the application to the ResourceManager (RM) via submitApplication().
5: After the RM receives the submitApplication() message, it hands the request to the YARN scheduler. The scheduler allocates a container, and the RM then starts the ApplicationMaster process in that container, under the NodeManager's management.
6: The MRAppMaster initializes the job (initializeJob).
7: The MRAppMaster retrieves the input splits from HDFS (retrieve input splits).
8: The MRAppMaster requests resources from the ResourceManager (the allocate-resources step).
9: The MRAppMaster starts a container: under the NodeManager's management, a task JVM is launched in the container, and a YarnChild runs inside that JVM.
10: The YarnChild fetches the job resources from HDFS.
11: Run the map tasks. By default, reduce tasks are launched once 5% of the map tasks have completed; this threshold is tunable and is often set to 80%, as in the sketch below.
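A hedged sketch of that tuning knob: the property is mapreduce.job.reduce.slowstart.completedmaps (default 0.05), and setting it from the driver looks like this:

import org.apache.hadoop.conf.Configuration;

public class SlowStartConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Start reducers only after 80% of the maps have finished, so they do not
    // occupy containers while most maps are still running.
    conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.80f);
    System.out.println(conf.get("mapreduce.job.reduce.slowstart.completedmaps"));
  }
}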