A Detailed Walk-Through of Job Submission to YARN (repost)

Overview of the main components:

     YARN is a resource-management and job-scheduling framework. It consists of three main modules: ResourceManager, NodeManager, and ApplicationMaster.

     ResourceManager (RM): the resource manager; the coordinator, scheduler, and administrator of all resources in the cluster.

     NodeManager (NM): the per-node resource and task manager. It periodically reports the node's resource usage and the state of each Container to the RM, and it receives and handles Container start/stop requests from the AM.

      ApplicationMaster (AM): responsible for monitoring the application, tracking its execution state, restarting failed tasks, and so on.

      Container: YARN's resource abstraction. It encapsulates a multi-dimensional slice of a node's resources (memory, CPU, disk, network, etc.). When the AM requests resources from the RM, the RM answers with Containers. YARN allocates a Container to each task, and the task may only use the resources described by that Container.
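To make the AM-to-RM container protocol above concrete, here is a minimal, hedged sketch of how an ApplicationMaster might request a Container through the YARN client API. The memory/vCore numbers and the priority are illustrative assumptions, not values from this article, and a real AM would wrap this in its allocate() heartbeat loop.

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class ContainerRequestSketch {

  /**
   * Ask the RM for one Container with the given memory (MB) and vCores.
   * The AMRMClient is assumed to have been started and registered by the
   * surrounding ApplicationMaster code.
   */
  static void requestOneContainer(AMRMClient<ContainerRequest> amRmClient,
                                  int memoryMb, int vCores) {
    // A Container is described by a multi-dimensional Resource (here: memory + CPU).
    Resource capability = Resource.newInstance(memoryMb, vCores);

    // No node or rack preference (null, null); priority 0 is an arbitrary choice.
    ContainerRequest request =
        new ContainerRequest(capability, null, null, Priority.newInstance(0));

    // The request goes to the RM on the next allocate() heartbeat; the RM's answer
    // comes back as a list of allocated Containers matching this capability.
    amRmClient.addContainerRequest(request);
  }
}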

How an MR job is submitted to YARN and run:

     1: When we submit a job on the client side, waitForCompletion(true) calls submit(). The boolean argument decides whether to run in verbose mode and periodically print the job's progress, or merely to poll periodically until the job completes.

     2: Inside submit() we can see that it mainly does the following (a minimal driver sketch follows this list):

              1: Make sure the job has not already been submitted.

              2: Decide, based on the configuration, whether to use the new API.

              3: Connect to the cluster: connect() creates a Cluster object, and the Cluster constructor calls initialize(), which creates the ClientProtocol the configuration asks for, which in practice is either a LocalJobRunner or a YARNRunner.

              4: Create a JobSubmitter object, passing in the cluster, so the submitter also gains the ability to talk to the RM remotely, and call its submitJobInternal() method.

             5: In the end submit() returns submitter.submitJobInternal(...); uploading the job resources to HDFS and everything that follows happen inside that method.
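Before stepping into submitJobInternal, a minimal driver sketch of the client-side calls described in steps 1 and 2 may help. The class name and the absence of an explicit mapper/reducer are illustrative choices, not code from the original post.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitSketch {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "submit-sketch");
    job.setJarByClass(SubmitSketch.class);
    // setMapperClass / setReducerClass / output key-value classes would go here.

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // waitForCompletion(true) calls submit(); because the argument is true it
    // then polls the job and prints progress (the "verbose" mode from step 1).
    // With false it would only poll quietly until the job finishes.
    boolean ok = job.waitForCompletion(true);
    System.exit(ok ? 0 : 1);
  }
}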

    3: Now we step into submitter.submitJobInternal():

JobStatus submitJobInternal(Job job, Cluster cluster) 
  throws ClassNotFoundException, InterruptedException, IOException {
 
    //validate the jobs output specs 
    checkSpecs(job);                // sanity-check the output specification and related configuration
 
    Configuration conf = job.getConfiguration();
    addMRFrameworkToDistributedCache(conf);
 
    Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf); // get the staging directory path
    //configure the command line options correctly on the submitting dfs
    InetAddress ip = InetAddress.getLocalHost(); // the submitting node's IP address
    if (ip != null) {
      submitHostAddress = ip.getHostAddress();  // the node's IP address as a string
      submitHostName = ip.getHostName();     // the node's host name
      conf.set(MRJobConfig.JOB_SUBMITHOST,submitHostName);
      conf.set(MRJobConfig.JOB_SUBMITHOSTADDR,submitHostAddress);
    }
    JobID jobId = submitClient.getNewJobID();  // obtain a new job ID
    job.setJobID(jobId);    // record the job ID on the Job object
    Path submitJobDir = new Path(jobStagingArea, jobId.toString());
    JobStatus status = null;
    try {
      conf.set(MRJobConfig.USER_NAME,
          UserGroupInformation.getCurrentUser().getShortUserName());
      conf.set("hadoop.http.filter.initializers", 
          "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
      conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
      LOG.debug("Configuring job " + jobId + " with " + submitJobDir 
          + " as the submit dir");
      // get delegation token for the dir
      TokenCache.obtainTokensForNamenodes(job.getCredentials(),
          new Path[] { submitJobDir }, conf);   // obtain the delegation tokens needed to talk to the NameNode(s)
      
      populateTokenCache(conf, job.getCredentials());
 
      // generate a secret to authenticate shuffle transfers
      if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
        KeyGenerator keyGen;        // generate the secret key that authenticates shuffle traffic between map and reduce
        try {
          keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
          keyGen.init(SHUFFLE_KEY_LENGTH);
        } catch (NoSuchAlgorithmException e) {
          throw new IOException("Error generating shuffle secret key", e);
        }
        SecretKey shuffleKey = keyGen.generateKey();
        TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
            job.getCredentials());
      }
      if (CryptoUtils.isEncryptedSpillEnabled(conf)) {
        conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);
        LOG.warn("Max job attempts set to 1 since encrypted intermediate" +
                "data spill is enabled");
      }
 
      copyAndConfigureFiles(job, submitJobDir);    // copy the job jar, libjars, files and archives to HDFS
      Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);
      
      // Create the splits for the job
      LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
      int maps = writeSplits(job, submitJobDir);    // compute the input splits; the split count determines the number of mappers
      conf.setInt(MRJobConfig.NUM_MAPS, maps);
      LOG.info("number of splits:" + maps);
 
      // write "queue admins of the queue to which job is being submitted"
      // to job file.
      String queue = conf.get(MRJobConfig.QUEUE_NAME,
          JobConf.DEFAULT_QUEUE_NAME);    // scheduling queue, "default" if not set
      AccessControlList acl = submitClient.getQueueAdmins(queue);
      conf.set(toFullPropertyName(queue,
          QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());
 
      // removing jobtoken referrals before copying the jobconf to HDFS
      // as the tasks don't need this setting, actually they may break
      // because of it if present as the referral will point to a
      // different job.
      TokenCache.cleanUpTokenReferral(conf);
 
      if (conf.getBoolean(
          MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
          MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
        // Add HDFS tracking ids
        ArrayList<String> trackingIds = new ArrayList<String>();
        for (Token<? extends TokenIdentifier> t :
            job.getCredentials().getAllTokens()) {
          trackingIds.add(t.decodeIdentifier().getTrackingId());
        }
        conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
            trackingIds.toArray(new String[trackingIds.size()]));
      }
 
      // Set reservation info if it exists
      ReservationId reservationId = job.getReservationId();
      if (reservationId != null) {
        conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
      }
 
      // Write job file to submit dir
      writeConf(conf, submitJobFile);   // write conf out as job.xml in the submit dir
      
      //
      // Now, actually submit the job (using the submit name)
      //
      printTokens(jobId, job.getCredentials());
      status = submitClient.submitJob(
          jobId, submitJobDir.toString(), job.getCredentials());  // actually submit and get the job status back
      if (status != null) {
        return status;               // submission succeeded: return the status
      } else {
        throw new IOException("Could not launch job");  // submission failed: throw
      }
    } finally {
      if (status == null) {           // if submission failed, delete the staging directory created earlier
        LOG.info("Cleaning up the staging area " + submitJobDir);
        if (jtFs != null && submitJobDir != null)
          jtFs.delete(submitJobDir, true);  
 
      }
    }
  }

The two most important calls inside this method are the following:

copyAndConfigureFiles(job, submitJobDir);    // copies the job jar and other executables to HDFS
writeSplits(job, submitJobDir);              // computes the input splits; the split count determines the number of mappers

Let's look at the first method to see what it actually does and which files it copies to HDFS.
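In the Hadoop 2.x sources this walk-through appears to be based on, copyAndConfigureFiles is a thin wrapper that delegates the upload to a JobResourceUploader. The sketch below is reconstructed from memory and is an approximation, not a verbatim quote of the Hadoop source:

private void copyAndConfigureFiles(Job job, Path jobSubmitDir) throws IOException {
  // jtFs is the FileSystem of the staging area (HDFS for a YARN submission).
  JobResourceUploader rUploader = new JobResourceUploader(jtFs);
  rUploader.uploadFiles(job, jobSubmitDir);

  // Resolve the working directory so that later code sees a consistent value.
  job.getWorkingDirectory();
}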

Stepping further into uploadFiles():

public void uploadFiles(Job job, Path submitJobDir) throws IOException {
  Configuration conf = job.getConfiguration();
  short replication =                               // replication factor for the submitted files (default 10)
      (short) conf.getInt(Job.SUBMIT_REPLICATION,
          Job.DEFAULT_SUBMIT_REPLICATION);
 
  if (!(conf.getBoolean(Job.USED_GENERIC_PARSER, false))) {
    LOG.warn("Hadoop command-line option parsing not performed. "
        + "Implement the Tool interface and execute your application "
        + "with ToolRunner to remedy this.");
  }
 
  // get all the command line arguments passed in by the user conf
  // These four entries are what this method uploads (each with the replication
  // factor above); the rest of the method, which performs the actual copies, is
  // omitted in the original post.
  String files = conf.get("tmpfiles");
  String libjars = conf.get("tmpjars");
  String archives = conf.get("tmparchives");
  String jobJar = job.getJar();
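The tmpfiles / tmpjars / tmparchives properties read above are normally populated by GenericOptionsParser from the -files, -libjars and -archives command-line options, which is also why the method warns when the Tool interface is not used. Below is a minimal, hedged sketch of a driver that opts into that parsing; the class name and example option values are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ToolDriverSketch extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() already contains tmpfiles/tmpjars/tmparchives if the user
    // passed -files/-libjars/-archives on the command line.
    Job job = Job.getInstance(getConf(), "tool-driver-sketch");
    job.setJarByClass(ToolDriverSketch.class);
    // ... set mapper/reducer, input and output paths here ...
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner runs GenericOptionsParser before calling run(), so a command like
    //   hadoop jar app.jar ToolDriverSketch -files lookup.txt -libjars extra.jar in out
    // ends up setting the tmpfiles/tmpjars properties that uploadFiles() reads.
    int exitCode = ToolRunner.run(new Configuration(), new ToolDriverSketch(), args);
    System.exit(exitCode);
  }
}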

Next we step into writeSplits to see how the input splits are actually computed.

Going one level deeper into writeNewSplits, we find that the getSplits() call on the InputFormat instance resolves to the method overridden in FileInputFormat, which in turn calls computeSplitSize to work out the size of each split.

 protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
   // maxSize comes from configuration; with the defaults blockSize = 128 MB and
   // minSize = 1, the split size normally equals the block size.
   return Math.max(minSize, Math.min(maxSize, blockSize));
 }
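A quick worked example of this formula; the block size, minSize and maxSize values below are assumptions for illustration, not configuration read from a real cluster.

// Standalone illustration of the split-size formula.
public class SplitSizeExample {
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024;   // a 128 MB HDFS block
    long minSize = 1L;                     // default mapreduce.input.fileinputformat.split.minsize

    // Default maxSize (Long.MAX_VALUE): the split size equals the block size.
    System.out.println(computeSplitSize(blockSize, minSize, Long.MAX_VALUE)); // 134217728

    // Lowering maxSize to 64 MB caps the split size, producing more (smaller) splits.
    System.out.println(computeSplitSize(blockSize, minSize, 64L * 1024 * 1024)); // 67108864

    // Raising minSize above the block size forces larger splits instead.
    System.out.println(computeSplitSize(blockSize, 256L * 1024 * 1024, Long.MAX_VALUE)); // 268435456
  }
}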

In the end the files, job jar, libjars and archives (each stored with the submit replication factor of 10 mentioned above) together with the split information are all uploaded to HDFS, into a directory named after the job ID under the staging directory; that directory typically ends up holding job.jar, job.xml, job.split and job.splitmetainfo.

    4: The client submits the application to the ResourceManager (RM) by calling submitApplication().

    5: On receiving the submitApplication() call, the RM hands the request to the YARN scheduler. The scheduler allocates a container, and the RM then launches the ApplicationMaster process in that container under the NodeManager's supervision.

  6: The MRAppMaster initializes the job (initializeJob).

  7: The MRAppMaster retrieves the input split information from HDFS.

  8: The MRAppMaster calls the ResourceManager's resource-allocation API (allocate) to obtain resources for the tasks.

 9: The MRAppMaster starts a container; under the NodeManager's supervision a task JVM is launched inside it, and a YarnChild runs in that JVM.

 10: The YarnChild fetches the job resources from HDFS.

 11: The map tasks run. By default, reduce tasks start once 5% of the maps have completed; this threshold is configurable and is commonly raised to 80% (see the snippet below).
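The threshold mentioned in step 11 is controlled by the mapreduce.job.reduce.slowstart.completedmaps property (default 0.05). A short sketch of raising it to 80% in the driver; the class and job names are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SlowstartConfigSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Start reducers only after 80% of the map tasks have finished,
    // instead of the 5% default, so reducers spend less time idling on shuffle.
    conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.80f);

    Job job = Job.getInstance(conf, "slowstart-sketch");
    // ... set jar, mapper, reducer, input and output paths as usual ...
  }
}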
