Hadoop Source Code Walkthrough: Job Submission
-
The entry point for job submission:

```java
boolean flag = job.waitForCompletion(true);
```
-
Stepping into waitForCompletion(true):

```java
if (state == JobState.DEFINE) { submit(); }
```

This checks whether the current Job's state is DEFINE; if so, it calls submit().
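The DEFINE-to-RUNNING guard can be sketched as a tiny state machine (names simplified from the Hadoop source; this is an illustration, not the real class):

```java
// Minimal sketch of Job's state guard: a job may be submitted exactly once,
// and only while it is still being defined.
public class JobStateDemo {
    enum JobState { DEFINE, RUNNING }

    static JobState state = JobState.DEFINE;

    static void submit() {
        // ensureState(JobState.DEFINE): submitting twice is an error
        if (state != JobState.DEFINE) {
            throw new IllegalStateException("Job in state " + state + " instead of DEFINE");
        }
        state = JobState.RUNNING;
    }

    static void waitForCompletion() {
        if (state == JobState.DEFINE) {
            submit();
        }
        System.out.println("state after submit: " + state);
    }

    public static void main(String[] args) {
        waitForCompletion();
    }
}
```

Calling waitForCompletion() a second time would not resubmit, because the state is already RUNNING.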
-
Next we step into submit(). This method is important, so let's walk through it in detail. Here is its body:

```java
public void submit() throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);
  setUseNewAPI();
  connect();
  final JobSubmitter submitter =
      getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run() throws IOException, InterruptedException, ClassNotFoundException {
      return submitter.submitJobInternal(Job.this, cluster);
    }
  });
  state = JobState.RUNNING;
  LOG.info("The url to track the job: " + getTrackingURL());
}
```
-
ensureState(JobState.DEFINE); double-checks that the Job is still in the DEFINE state.
-
setUseNewAPI(); switches the job to the new Hadoop API. Along the way it verifies that no conflicting old-API settings are present:

```java
if (conf.getUseNewMapper()) {
  String mode = "new map API";
  ensureNotSet("mapred.input.format.class", mode);
  ensureNotSet(oldMapperClass, mode);
  if (numReduces != 0) {
    ensureNotSet("mapred.partitioner.class", mode);
  } else {
    ensureNotSet("mapred.output.format.class", mode);
  }
}
```
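The intent of ensureNotSet (reject a job whose configuration still carries an old-API key) can be sketched with a plain map standing in for Hadoop's Configuration; the class and message here are simplified stand-ins, not the real implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class EnsureNotSetDemo {
    // Simplified stand-in for Hadoop's Configuration (illustration only)
    static Map<String, String> conf = new HashMap<>();

    // Mirrors the intent of ensureNotSet: throw if a conflicting key is configured
    static void ensureNotSet(String key, String mode) {
        if (conf.get(key) != null) {
            throw new IllegalStateException(key + " is incompatible with " + mode + " mode.");
        }
    }

    public static void main(String[] args) {
        ensureNotSet("mapred.input.format.class", "new map API"); // passes: key unset
        conf.put("mapred.input.format.class", "SomeOldInputFormat");
        try {
            ensureNotSet("mapred.input.format.class", "new map API");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```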
- Now step into connect(). This is a key method: its purpose is to create the Cluster object.

```java
private synchronized void connect() throws IOException, InterruptedException, ClassNotFoundException {
  if (cluster == null) {
    cluster = ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
      public Cluster run() throws IOException, InterruptedException, ClassNotFoundException {
        return new Cluster(getConfiguration());
      }
    });
  }
}
```

return new Cluster(getConfiguration()); is where the Cluster object is created; you can debug into this call to watch how the Cluster is built.
-
The Cluster constructor calls initialize(jobTrackAddr, conf) to set up several fields. Below are the constructor and the initialization method; I will highlight the important parts.
-
ClientProtocolProvider provider: the provider of the client protocol. In short, it supplies the environment your Job runs in. A Hadoop program runs in one of two environments: locally, or on a cluster (that is, on YARN). The ClientProtocolProvider determines which environment you are currently running in.
-
ClientProtocol clientProtocol is the actual runtime environment, local or YARN. It is supplied by the ClientProtocolProvider via clientProtocol = provider.create(conf). ClientProtocol has two implementations, LocalJobRunner and YARNRunner, corresponding to the local and YARN environments respectively.
```java
public Cluster(InetSocketAddress jobTrackAddr, Configuration conf) throws IOException {
  this.conf = conf;
  this.ugi = UserGroupInformation.getCurrentUser();
  initialize(jobTrackAddr, conf);
}

private void initialize(InetSocketAddress jobTrackAddr, Configuration conf) throws IOException {
  synchronized (frameworkLoader) {
    for (ClientProtocolProvider provider : frameworkLoader) {
      LOG.debug("Trying ClientProtocolProvider : " + provider.getClass().getName());
      ClientProtocol clientProtocol = null;
      try {
        if (jobTrackAddr == null) {
          clientProtocol = provider.create(conf);
        } else {
          clientProtocol = provider.create(jobTrackAddr, conf);
        }
        if (clientProtocol != null) {
          clientProtocolProvider = provider;
          client = clientProtocol;
          LOG.debug("Picked " + provider.getClass().getName()
              + " as the ClientProtocolProvider");
          break;
        } else {
          LOG.debug("Cannot pick " + provider.getClass().getName()
              + " as the ClientProtocolProvider - returned null protocol");
        }
      } catch (Exception e) {
        LOG.info("Failed to use " + provider.getClass().getName() + " due to error: ", e);
      }
    }
  }
}
```
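The selection loop above tries each provider in turn and keeps the first one whose create() returns a non-null protocol. That pick-first-non-null pattern can be sketched in a self-contained way (the provider names and return strings below are simplified stand-ins for the real LocalClientProtocolProvider / YarnClientProtocolProvider classes):

```java
import java.util.List;

// Sketch of Cluster.initialize()'s provider loop: iterate over the registered
// providers and keep the first one that returns a non-null client protocol.
public class ProviderPickDemo {
    interface Provider {
        String create(String framework); // returns null when the provider does not apply
        String name();
    }

    static Provider local = new Provider() {
        public String create(String fw) { return "local".equals(fw) ? "LocalJobRunner" : null; }
        public String name() { return "LocalClientProtocolProvider"; }
    };
    static Provider yarn = new Provider() {
        public String create(String fw) { return "yarn".equals(fw) ? "YARNRunner" : null; }
        public String name() { return "YarnClientProtocolProvider"; }
    };

    static String pick(String framework) {
        for (Provider p : List.of(local, yarn)) {
            String protocol = p.create(framework);
            if (protocol != null) {
                // "Picked ... as the ClientProtocolProvider"
                return p.name() + " -> " + protocol;
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(pick("local"));
        System.out.println(pick("yarn"));
    }
}
```

In real Hadoop, which provider applies is driven by the mapreduce.framework.name setting ("local" vs "yarn").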
-
-
The Job is then submitted through JobSubmitter, passing in the Job and the Cluster object:

```java
status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
  public JobStatus run() throws IOException, InterruptedException, ClassNotFoundException {
    return submitter.submitJobInternal(Job.this, cluster);
  }
});
```

This method is also very important and deserves careful study, so we step into submitter.submitJobInternal(Job.this, cluster). Here is its body, which we will read through in detail:
```java
JobStatus submitJobInternal(Job job, Cluster cluster)
    throws ClassNotFoundException, InterruptedException, IOException {
  // validate the jobs output specs
  checkSpecs(job);

  Configuration conf = job.getConfiguration();
  addMRFrameworkToDistributedCache(conf);

  Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
  // configure the command line options correctly on the submitting dfs
  InetAddress ip = InetAddress.getLocalHost();
  if (ip != null) {
    submitHostAddress = ip.getHostAddress();
    submitHostName = ip.getHostName();
    conf.set(MRJobConfig.JOB_SUBMITHOST, submitHostName);
    conf.set(MRJobConfig.JOB_SUBMITHOSTADDR, submitHostAddress);
  }
  JobID jobId = submitClient.getNewJobID();
  job.setJobID(jobId);
  Path submitJobDir = new Path(jobStagingArea, jobId.toString());
  JobStatus status = null;
  try {
    conf.set(MRJobConfig.USER_NAME,
        UserGroupInformation.getCurrentUser().getShortUserName());
    conf.set("hadoop.http.filter.initializers",
        "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
    conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
    LOG.debug("Configuring job " + jobId + " with " + submitJobDir
        + " as the submit dir");
    // get delegation token for the dir
    TokenCache.obtainTokensForNamenodes(job.getCredentials(),
        new Path[] { submitJobDir }, conf);

    populateTokenCache(conf, job.getCredentials());

    // generate a secret to authenticate shuffle transfers
    if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
      KeyGenerator keyGen;
      try {
        keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
        keyGen.init(SHUFFLE_KEY_LENGTH);
      } catch (NoSuchAlgorithmException e) {
        throw new IOException("Error generating shuffle secret key", e);
      }
      SecretKey shuffleKey = keyGen.generateKey();
      TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
          job.getCredentials());
    }
    if (CryptoUtils.isEncryptedSpillEnabled(conf)) {
      conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);
      LOG.warn("Max job attempts set to 1 since encrypted intermediate"
          + "data spill is enabled");
    }

    copyAndConfigureFiles(job, submitJobDir);

    Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);

    // Create the splits for the job
    LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
    int maps = writeSplits(job, submitJobDir);
    conf.setInt(MRJobConfig.NUM_MAPS, maps);
    LOG.info("number of splits:" + maps);

    // write "queue admins of the queue to which job is being submitted"
    // to job file.
    String queue = conf.get(MRJobConfig.QUEUE_NAME, JobConf.DEFAULT_QUEUE_NAME);
    AccessControlList acl = submitClient.getQueueAdmins(queue);
    conf.set(toFullPropertyName(queue,
        QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

    // removing jobtoken referrals before copying the jobconf to HDFS
    // as the tasks don't need this setting, actually they may break
    // because of it if present as the referral will point to a
    // different job.
    TokenCache.cleanUpTokenReferral(conf);

    if (conf.getBoolean(MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
        MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
      // Add HDFS tracking ids
      ArrayList<String> trackingIds = new ArrayList<String>();
      for (Token<? extends TokenIdentifier> t :
          job.getCredentials().getAllTokens()) {
        trackingIds.add(t.decodeIdentifier().getTrackingId());
      }
      conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
          trackingIds.toArray(new String[trackingIds.size()]));
    }

    // Set reservation info if it exists
    ReservationId reservationId = job.getReservationId();
    if (reservationId != null) {
      conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
    }

    // Write job file to submit dir
    writeConf(conf, submitJobFile);

    //
    // Now, actually submit the job (using the submit name)
    //
    printTokens(jobId, job.getCredentials());
    status = submitClient.submitJob(
        jobId, submitJobDir.toString(), job.getCredentials());
    if (status != null) {
      return status;
    } else {
      throw new IOException("Could not launch job");
    }
  } finally {
    if (status == null) {
      LOG.info("Cleaning up the staging area " + submitJobDir);
      if (jtFs != null && submitJobDir != null)
        jtFs.delete(submitJobDir, true);
    }
  }
}
```
-
checkSpecs(job); verifies that the output path has been set and that it does not already exist; if either check fails, an exception is thrown immediately. The checks are:
-
```java
// Ensure that the output directory is set and not already there
Path outDir = getOutputPath(job);
if (outDir == null) {
  throw new InvalidJobConfException("Output directory not set.");
}
// get delegation token for outDir's file system
TokenCache.obtainTokensForNamenodes(job.getCredentials(),
    new Path[] { outDir }, job.getConfiguration());
if (outDir.getFileSystem(job.getConfiguration()).exists(outDir)) {
  throw new FileAlreadyExistsException("Output directory " + outDir
      + " already exists");
}
```
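The two failure modes (unset path, pre-existing path) can be reproduced in a self-contained sketch, with java.nio.file standing in for Hadoop's FileSystem so the example runs without a cluster; the class and messages are illustrative, not Hadoop's:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of checkSpecs' output validation: reject a null output dir
// and reject an output dir that already exists.
public class CheckSpecsDemo {
    static void checkOutputSpecs(Path outDir) {
        if (outDir == null) {
            throw new IllegalStateException("Output directory not set.");
        }
        if (Files.exists(outDir)) {
            throw new IllegalStateException("Output directory " + outDir + " already exists");
        }
    }

    public static void main(String[] args) throws Exception {
        // An existing directory triggers the second check
        Path existing = Files.createTempDirectory("job-output");
        try {
            checkOutputSpecs(existing);
        } catch (IllegalStateException e) {
            System.out.println("check failed: " + e.getMessage());
        }
    }
}
```

This is why a MapReduce driver typically deletes (or uniquifies) its output directory before submitting.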
- The following line produces the Job's staging (working) directory; you can step into the method to see exactly how the directory is created:

```java
Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
```
-
Here is the full implementation:

```java
public static Path getStagingDir(Cluster cluster, Configuration conf)
    throws IOException, InterruptedException {
  Path stagingArea = cluster.getStagingAreaDir();
  FileSystem fs = stagingArea.getFileSystem(conf);
  String realUser;
  String currentUser;
  UserGroupInformation ugi = UserGroupInformation.getLoginUser();
  realUser = ugi.getShortUserName();
  currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
  if (fs.exists(stagingArea)) {
    FileStatus fsStatus = fs.getFileStatus(stagingArea);
    String owner = fsStatus.getOwner();
    if (!(owner.equals(currentUser) || owner.equals(realUser))) {
      throw new IOException("The ownership on the staging directory "
          + stagingArea + " is not as expected. "
          + "It is owned by " + owner + ". The directory must "
          + "be owned by the submitter " + currentUser + " or "
          + "by " + realUser);
    }
    if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
      LOG.info("Permissions on staging directory " + stagingArea + " are "
          + "incorrect: " + fsStatus.getPermission()
          + ". Fixing permissions to correct value " + JOB_DIR_PERMISSION);
      fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
    }
  } else {
    fs.mkdirs(stagingArea, new FsPermission(JOB_DIR_PERMISSION));
  }
  return stagingArea;
}
```
-
Path stagingArea = cluster.getStagingAreaDir(); builds the path of the Job's staging directory.
-
FileSystem fs = stagingArea.getFileSystem(conf); obtains the HDFS file system handle, which is used for all subsequent HDFS operations.
-
Since the Job has no staging directory at first, execution takes the else branch, i.e.
fs.mkdirs(stagingArea, new FsPermission(JOB_DIR_PERMISSION));
which creates the staging directory. Because I am running locally, I was able to find this staging directory on my Windows machine.
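The first-use branch (directory missing, so create it) can be sketched with java.nio.file standing in for Hadoop's FileSystem; permissions are omitted for brevity, and the class name is illustrative:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of getStagingDir's else-branch: on first use the staging
// directory does not exist yet, so it is created before being returned.
public class StagingDirDemo {
    static Path getStagingDir(Path stagingArea) throws Exception {
        if (!Files.exists(stagingArea)) {
            // stands in for fs.mkdirs(stagingArea, new FsPermission(JOB_DIR_PERMISSION))
            Files.createDirectories(stagingArea);
        }
        return stagingArea;
    }

    public static void main(String[] args) throws Exception {
        Path base = Files.createTempDirectory("demo");
        Path staging = getStagingDir(base.resolve(".staging"));
        System.out.println("staging exists: " + Files.exists(staging));
    }
}
```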
-
JobID jobId = submitClient.getNewJobID(); generates the jobId for the current Job.
-
Path submitJobDir = new Path(jobStagingArea, jobId.toString()); joins jobStagingArea with the jobId, i.e. combines the staging directory with the current Job's id. This per-job directory holds everything the Job needs: configuration files, split information, the jar, and so on (at this point the directory has not yet been created).
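The resulting layout is simply staging-dir + "/" + jobId; the concrete values below are made-up examples of what the joined path looks like, shown with plain string joining in place of Hadoop's Path(parent, child) constructor:

```java
// Sketch of the submit-dir path: staging area joined with the job id.
public class SubmitDirDemo {
    public static void main(String[] args) {
        String jobStagingArea = "/tmp/hadoop-yarn/staging/user/.staging"; // example value
        String jobId = "job_1574318978532_0001";                          // example id
        // equivalent in spirit to: new Path(jobStagingArea, jobId)
        String submitJobDir = jobStagingArea + "/" + jobId;
        System.out.println(submitJobDir);
    }
}
```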
-
copyAndConfigureFiles(job, submitJobDir); is what actually creates the per-job directory whose path was assembled above, and copies the job's files into it.
-
int maps = writeSplits(job, submitJobDir); computes the input splits, writes the split files into the submit directory, and returns the number of splits. This part is fairly involved; I will cover it in detail in a follow-up article.
-
writeConf(conf, submitJobFile); writes the job's configuration into the submit directory.
- status = submitClient.submitJob(jobId, submitJobDir.toString(), job.getCredentials());
is the actual submission of the Job.
- jtFs.delete(submitJobDir, true); cleans up the current Job's staging directory once the Job is done (in this method, the finally block removes it whenever no JobStatus was produced).