Hadoop Source Code Walkthrough (Job Submission)

  1. Job submission entry point

    boolean flag = job.waitForCompletion(true);
    
  2. Inside the waitForCompletion(true) method

    if (state == JobState.DEFINE) {
          submit();
    }
    

    Check whether the current Job state is DEFINE; if it is, submit() is called.

    1. Inside submit(). This method is important, so let's walk through it in detail:

      The body of submit():

      public void submit() 
               throws IOException, InterruptedException, ClassNotFoundException {
          ensureState(JobState.DEFINE);
          setUseNewAPI();
          connect();
          final JobSubmitter submitter = 
              getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
          status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
            public JobStatus run() throws IOException, InterruptedException, 
            ClassNotFoundException {
              return submitter.submitJobInternal(Job.this, cluster);
            }
          });
          state = JobState.RUNNING;
          LOG.info("The url to track the job: " + getTrackingURL());
         }
      
      1. ensureState(JobState.DEFINE); double-checks that the current Job state is DEFINE.

      2. setUseNewAPI(); switches the Job to the new Hadoop API:

        if (conf.getUseNewMapper()) {
          String mode = "new map API";
          ensureNotSet("mapred.input.format.class", mode);
          ensureNotSet(oldMapperClass, mode);
          if (numReduces != 0) {
            ensureNotSet("mapred.partitioner.class", mode);
          } else {
            ensureNotSet("mapred.output.format.class", mode);
          }
        }
        
      3. Inside connect(). This is the key step; its purpose is to create the Cluster object:
        private synchronized void connect()
                throws IOException, InterruptedException, ClassNotFoundException {
          if (cluster == null) {
            cluster = 
              ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
                         public Cluster run()
                                throws IOException, InterruptedException, 
                                       ClassNotFoundException {
                           return new Cluster(getConfiguration());
                         }
                       });
          }
        }
      

      return new Cluster(getConfiguration()); is the line that actually creates the Cluster object; you can debug into this method and watch how the Cluster is built.

      1. The Cluster constructor calls initialize(jobTrackAddr, conf), which initializes a number of fields.

        Below are the Cluster constructor and the initialization method it calls; I will highlight a few key points.

        1. ClientProtocolProvider provider: the provider of the client protocol. Put simply, it supplies the environment your Job runs in. A Hadoop program runs in one of two environments: locally, or on a cluster (that is, on YARN). The ClientProtocolProvider tells you which environment you are currently running in.

        2. ClientProtocol clientProtocol is the actual environment Hadoop runs in, local or YARN, and it is supplied by the ClientProtocolProvider, i.e. clientProtocol = provider.create(conf). As the figure below shows, the two ClientProtocol implementations, LocalJobRunner and YARNRunner, correspond to the local and YARN environments respectively.

          [Figure: the two ClientProtocol implementations, LocalJobRunner and YARNRunner]

        public Cluster(InetSocketAddress jobTrackAddr, Configuration conf) 
              throws IOException {
            this.conf = conf;
            this.ugi = UserGroupInformation.getCurrentUser();
            initialize(jobTrackAddr, conf);
          }
        
        private void initialize(InetSocketAddress jobTrackAddr, Configuration conf)
              throws IOException {
        
            synchronized (frameworkLoader) {
              for (ClientProtocolProvider provider : frameworkLoader) {
                LOG.debug("Trying ClientProtocolProvider : "
                    + provider.getClass().getName());
                ClientProtocol clientProtocol = null; 
                try {
                  if (jobTrackAddr == null) {
                    clientProtocol = provider.create(conf);
                  } else {
                    clientProtocol = provider.create(jobTrackAddr, conf);
                  }
        
                  if (clientProtocol != null) {
                    clientProtocolProvider = provider;
                    client = clientProtocol;
                    LOG.debug("Picked " + provider.getClass().getName()
                        + " as the ClientProtocolProvider");
                    break;
                  }
                  else {
                    LOG.debug("Cannot pick " + provider.getClass().getName()
                        + " as the ClientProtocolProvider - returned null protocol");
                  }
                } 
                catch (Exception e) {
                  LOG.info("Failed to use " + provider.getClass().getName()
                      + " due to error: ", e);
                }
              }
            }
          }
        
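The selection loop above can be mimicked without Hadoop. The sketch below uses hypothetical names (in Hadoop the providers come from a ServiceLoader, and the real implementations are LocalClientProtocolProvider and YarnClientProtocolProvider); it shows the same pattern: each provider inspects the configuration, declines by returning null, and the first non-null protocol wins.

```java
import java.util.Arrays;
import java.util.List;

public class ProviderPick {
    interface ClientProtocol {}                          // stands in for LocalJobRunner / YARNRunner
    interface Provider { ClientProtocol create(String framework); }

    static final ClientProtocol LOCAL = new ClientProtocol() {
        @Override public String toString() { return "LocalJobRunner"; } };
    static final ClientProtocol YARN = new ClientProtocol() {
        @Override public String toString() { return "YARNRunner"; } };

    // In Hadoop these come from a ServiceLoader<ClientProtocolProvider>;
    // here we simply list them.
    static final List<Provider> PROVIDERS = Arrays.asList(
        fw -> "local".equals(fw) ? LOCAL : null,         // mimics LocalClientProtocolProvider
        fw -> "yarn".equals(fw) ? YARN : null);          // mimics YarnClientProtocolProvider

    static ClientProtocol pick(String framework) {
        for (Provider p : PROVIDERS) {
            ClientProtocol c = p.create(framework);      // provider declines by returning null
            if (c != null) return c;                     // first non-null protocol wins
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(pick("yarn"));                // YARNRunner
        System.out.println(pick("local"));               // LocalJobRunner
    }
}
```

Returning null rather than throwing is what lets initialize() try each provider in turn and log "Cannot pick ..." for the ones that do not apply.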
      4. The Job is then submitted through JobSubmitter, passing in the Job and the Cluster object. This method is also very important and worth careful study, so we step into submitter.submitJobInternal(Job.this, cluster).

    status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
          public JobStatus run() throws IOException, InterruptedException, 
          ClassNotFoundException {
            return submitter.submitJobInternal(Job.this, cluster);
          }
        });
    

    Below is the body of submitter.submitJobInternal(Job.this, cluster); let's walk through it in detail:

    JobStatus submitJobInternal(Job job, Cluster cluster) 
      throws ClassNotFoundException, InterruptedException, IOException {
    
        //validate the jobs output specs 
        checkSpecs(job);
    
        Configuration conf = job.getConfiguration();
        addMRFrameworkToDistributedCache(conf);
    
        Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
        //configure the command line options correctly on the submitting dfs
        InetAddress ip = InetAddress.getLocalHost();
        if (ip != null) {
          submitHostAddress = ip.getHostAddress();
          submitHostName = ip.getHostName();
          conf.set(MRJobConfig.JOB_SUBMITHOST,submitHostName);
          conf.set(MRJobConfig.JOB_SUBMITHOSTADDR,submitHostAddress);
        }
        JobID jobId = submitClient.getNewJobID();
        job.setJobID(jobId);
        Path submitJobDir = new Path(jobStagingArea, jobId.toString());
        JobStatus status = null;
        try {
          conf.set(MRJobConfig.USER_NAME,
              UserGroupInformation.getCurrentUser().getShortUserName());
          conf.set("hadoop.http.filter.initializers", 
              "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
          conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
          LOG.debug("Configuring job " + jobId + " with " + submitJobDir 
              + " as the submit dir");
          // get delegation token for the dir
          TokenCache.obtainTokensForNamenodes(job.getCredentials(),
              new Path[] { submitJobDir }, conf);
          
          populateTokenCache(conf, job.getCredentials());
    
          // generate a secret to authenticate shuffle transfers
          if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
            KeyGenerator keyGen;
            try {
              keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
              keyGen.init(SHUFFLE_KEY_LENGTH);
            } catch (NoSuchAlgorithmException e) {
              throw new IOException("Error generating shuffle secret key", e);
            }
            SecretKey shuffleKey = keyGen.generateKey();
            TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
                job.getCredentials());
          }
          if (CryptoUtils.isEncryptedSpillEnabled(conf)) {
            conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);
            LOG.warn("Max job attempts set to 1 since encrypted intermediate" +
                    "data spill is enabled");
          }
    
          copyAndConfigureFiles(job, submitJobDir);
    
          Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);
          
          // Create the splits for the job
          LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
          int maps = writeSplits(job, submitJobDir);
          conf.setInt(MRJobConfig.NUM_MAPS, maps);
          LOG.info("number of splits:" + maps);
    
          // write "queue admins of the queue to which job is being submitted"
          // to job file.
          String queue = conf.get(MRJobConfig.QUEUE_NAME,
              JobConf.DEFAULT_QUEUE_NAME);
          AccessControlList acl = submitClient.getQueueAdmins(queue);
          conf.set(toFullPropertyName(queue,
              QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());
    
          // removing jobtoken referrals before copying the jobconf to HDFS
          // as the tasks don't need this setting, actually they may break
          // because of it if present as the referral will point to a
          // different job.
          TokenCache.cleanUpTokenReferral(conf);
    
          if (conf.getBoolean(
              MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
              MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
            // Add HDFS tracking ids
            ArrayList<String> trackingIds = new ArrayList<String>();
            for (Token<? extends TokenIdentifier> t :
                job.getCredentials().getAllTokens()) {
              trackingIds.add(t.decodeIdentifier().getTrackingId());
            }
            conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
                trackingIds.toArray(new String[trackingIds.size()]));
          }
    
          // Set reservation info if it exists
          ReservationId reservationId = job.getReservationId();
          if (reservationId != null) {
            conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
          }
    
          // Write job file to submit dir
          writeConf(conf, submitJobFile);
          
          //
          // Now, actually submit the job (using the submit name)
          //
          printTokens(jobId, job.getCredentials());
          status = submitClient.submitJob(
              jobId, submitJobDir.toString(), job.getCredentials());
          if (status != null) {
            return status;
          } else {
            throw new IOException("Could not launch job");
          }
        } finally {
          if (status == null) {
            LOG.info("Cleaning up the staging area " + submitJobDir);
            if (jtFs != null && submitJobDir != null)
              jtFs.delete(submitJobDir, true);
    
          }
        }
      }
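One step in the listing worth calling out: the shuffle-secret block is plain javax.crypto and can be run standalone. A minimal sketch, assuming Hadoop's constants SHUFFLE_KEYGEN_ALGORITHM = "HmacSHA1" and SHUFFLE_KEY_LENGTH = 64:

```java
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.NoSuchAlgorithmException;

public class ShuffleKey {
    // Generates the kind of secret Hadoop stores in the job credentials
    // to authenticate shuffle transfers between map and reduce tasks.
    public static byte[] newShuffleSecret() {
        try {
            KeyGenerator keyGen = KeyGenerator.getInstance("HmacSHA1");
            keyGen.init(64);                       // key length in bits
            SecretKey shuffleKey = keyGen.generateKey();
            return shuffleKey.getEncoded();        // raw bytes, as passed to setShuffleSecretKey
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("Error generating shuffle secret key", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newShuffleSecret().length);  // 8 bytes = 64 bits
    }
}
```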
    
    1. checkSpecs(job); mainly verifies that the output path is set and that it does not already exist; either problem causes an exception to be thrown immediately.

      checkSpecs(job);
      
        // Ensure that the output directory is set and not already there
        Path outDir = getOutputPath(job);
        if (outDir == null) {
          throw new InvalidJobConfException("Output directory not set.");
        }

        // get delegation token for outDir's file system
        TokenCache.obtainTokensForNamenodes(job.getCredentials(),
            new Path[] { outDir }, job.getConfiguration());

        if (outDir.getFileSystem(job.getConfiguration()).exists(outDir)) {
          throw new FileAlreadyExistsException("Output directory " + outDir + 
                                               " already exists");
        }
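The same two checks can be sketched with java.nio.file. This is a hypothetical helper standing in for FileOutputFormat.checkOutputSpecs, not Hadoop's code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OutputSpec {
    // Mirrors the two failure modes above: output directory unset,
    // or output directory already present.
    static void checkOutputDir(Path outDir) throws IOException {
        if (outDir == null) {
            throw new IOException("Output directory not set.");
        }
        if (Files.exists(outDir)) {
            throw new IOException("Output directory " + outDir + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("job-out");
        checkOutputDir(tmp.resolve("fresh"));   // passes: does not exist yet
        try {
            checkOutputDir(tmp);                // throws: already exists
        } catch (IOException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```

Failing fast here, before any staging files are written, is why a leftover output directory from a previous run aborts the job immediately.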
        
      2. The following line obtains the Job staging directory; you can step into this method to see how the directory is produced:

      Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
      
      • The full implementation:

        public static Path getStagingDir(Cluster cluster, Configuration conf) 
          throws IOException,InterruptedException {
            Path stagingArea = cluster.getStagingAreaDir();
            FileSystem fs = stagingArea.getFileSystem(conf);
            String realUser;
            String currentUser;
            UserGroupInformation ugi = UserGroupInformation.getLoginUser();
            realUser = ugi.getShortUserName();
            currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
            if (fs.exists(stagingArea)) {
              FileStatus fsStatus = fs.getFileStatus(stagingArea);
              String owner = fsStatus.getOwner();
              if (!(owner.equals(currentUser) || owner.equals(realUser))) {
                 throw new IOException("The ownership on the staging directory " +
                              stagingArea + " is not as expected. " +
                              "It is owned by " + owner + ". The directory must " +
                              "be owned by the submitter " + currentUser + " or " +
                              "by " + realUser);
              }
              if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
                LOG.info("Permissions on staging directory " + stagingArea + " are " +
                  "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
                  "to correct value " + JOB_DIR_PERMISSION);
                fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
              }
            } else {
              fs.mkdirs(stagingArea, 
                  new FsPermission(JOB_DIR_PERMISSION));
            }
            return stagingArea;
          }
        
      • Path stagingArea = cluster.getStagingAreaDir(); builds the path of the Job staging directory.

      • FileSystem fs = stagingArea.getFileSystem(conf); obtains the file system for that path (HDFS on a cluster), making the later file-system operations convenient.

      • Since a Job has no staging directory when it first starts, execution takes the else branch,

        i.e. fs.mkdirs(stagingArea, new FsPermission(JOB_DIR_PERMISSION));

        which creates the staging directory. Because I ran this locally, I found the staging directory on my Windows machine.

        [Figure: the staging directory created on the local file system]
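The get-or-create shape of getStagingDir() can be sketched with java.nio.file. All names here are hypothetical, and I'm assuming Hadoop's JOB_DIR_PERMISSION is rwx------ (700); the sketch reuses the directory if it exists, repairing its permissions, and otherwise creates it with the expected mode:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class StagingDir {
    static final Set<PosixFilePermission> JOB_DIR_PERMISSION =
        PosixFilePermissions.fromString("rwx------");

    static Path getOrCreate(Path stagingArea) throws IOException {
        if (Files.exists(stagingArea)) {
            // Existing directory: fix the mode if it drifted,
            // like the fs.setPermission branch above.
            if (!Files.getPosixFilePermissions(stagingArea).equals(JOB_DIR_PERMISSION)) {
                Files.setPosixFilePermissions(stagingArea, JOB_DIR_PERMISSION);
            }
        } else {
            // First run: create it with the expected mode,
            // like fs.mkdirs(stagingArea, new FsPermission(JOB_DIR_PERMISSION)).
            Files.createDirectories(stagingArea,
                PosixFilePermissions.asFileAttribute(JOB_DIR_PERMISSION));
        }
        return stagingArea;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("demo");
        Path staging = getOrCreate(base.resolve(".staging"));
        System.out.println(Files.getPosixFilePermissions(staging));
    }
}
```

The ownership check in the real getStagingDir (reject a staging dir owned by someone else) is omitted here, since owner identity is what prevents one user from hijacking another user's staging area.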

      3. JobID jobId = submitClient.getNewJobID(); generates the JobID for the current Job.

      4. Path submitJobDir = new Path(jobStagingArea, jobId.toString()); joins the staging directory with the current Job's ID. This per-job directory is where the configuration files, split information, jar files, and so on for this Job will be stored (though the directory has not been created yet).

      5. copyAndConfigureFiles(job, submitJobDir); is what actually creates the per-job submit directory assembled above.

      The resulting directory:

      [Figure: the per-job submit directory]

      6. int maps = writeSplits(job, submitJobDir); computes the input splits, writes the split files into the submit directory, and returns the number of splits. This part is fairly involved, and I will cover it in detail in a later post.

      7. writeConf(conf, submitJobFile); writes the Job's configuration into the submit directory.

      The written result:

      [Figure: the job configuration file written into the submit directory]

      8. status = submitClient.submitJob(jobId, submitJobDir.toString(), job.getCredentials());

      This is the actual submission of the Job.

      9. jtFs.delete(submitJobDir, true); deletes the Job's submit directory once the Job has finished.
