/**
* Submit the job to the cluster and wait for it to finish.
* @param verbose print the progress to the user
* @return true if the job succeeded
* @throws IOException thrown if the communication with the
* <code>JobTracker</code> is lost
*/
public boolean waitForCompletion(boolean verbose
) throws IOException, InterruptedException,
ClassNotFoundException {
if (state == JobState.DEFINE) {
    submit(); // submit the job
}
if (verbose) {
monitorAndPrintJob();
} else {
// get the completion poll interval from the client.
int completionPollIntervalMillis =
Job.getCompletionPollInterval(cluster.getConf());
while (!isComplete()) {
try {
Thread.sleep(completionPollIntervalMillis);
} catch (InterruptedException ie) {
}
}
}
return isSuccessful();
}
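For context, a driver reaches this method as its last call. The sketch below is a minimal, hypothetical driver (input/output paths come from args, and the identity Mapper/Reducer are placeholders used only to keep it self-contained); it just shows where waitForCompletion(true) sits in client code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DriverSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "walkthrough demo"); // state starts as JobState.DEFINE
    job.setJarByClass(DriverSketch.class);
    job.setMapperClass(Mapper.class);    // identity mapper, stand-in for a real one
    job.setReducerClass(Reducer.class);  // identity reducer, stand-in for a real one
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // verbose = true -> monitorAndPrintJob(); the call blocks until the job finishes
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}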
Stepping into submit():
/**
* Submit the job to the cluster and return immediately.
* @throws IOException
*/
public void submit()
throws IOException, InterruptedException, ClassNotFoundException {
ensureState(JobState.DEFINE);
  setUseNewAPI(); // reconcile old-API settings with the new API for compatibility
  connect();      // connect the client to the cluster; the connection can be local or to YARN
final JobSubmitter submitter =
getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
public JobStatus run() throws IOException, InterruptedException,
ClassNotFoundException {
return submitter.submitJobInternal(Job.this, cluster);
}
});
state = JobState.RUNNING;
LOG.info("The url to track the job: " + getTrackingURL());
}
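The doAs wrapper ties the submission to the submitting user's credentials. A minimal sketch of the same UserGroupInformation.doAs pattern in isolation (it returns a plain String instead of a JobStatus, purely for illustration):

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class DoAsSketch {
  public static void main(String[] args) throws Exception {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    // The action runs with ugi's identity, just as submitJobInternal does above.
    String who = ugi.doAs(new PrivilegedExceptionAction<String>() {
      public String run() throws Exception {
        return UserGroupInformation.getCurrentUser().getShortUserName();
      }
    });
    System.out.println("submitted as: " + who);
  }
}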
When connecting, the cluster field is checked; if it is null, a new Cluster is created:
private synchronized void connect()
throws IOException, InterruptedException, ClassNotFoundException {
if (cluster == null) {
cluster =
ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
public Cluster run()
throws IOException, InterruptedException,
ClassNotFoundException {
return new Cluster(getConfiguration());
}
});
}
}
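Which runner the new Cluster ends up talking to is decided from the configuration. A minimal sketch, assuming a default Configuration and using the mapreduce.framework.name property to pick the local runner (set it to "yarn" to target a YARN cluster instead):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Cluster;

public class ConnectSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("mapreduce.framework.name", "local"); // "local" or "yarn"
    // Mirrors what connect() does when cluster == null.
    Cluster cluster = new Cluster(conf);
    System.out.println("file system: " + cluster.getFileSystem().getUri());
    cluster.close();
  }
}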
Computing the split information:
private int writeSplits(org.apache.hadoop.mapreduce.JobContext job,
Path jobSubmitDir) throws IOException,
InterruptedException, ClassNotFoundException {
JobConf jConf = (JobConf)job.getConfiguration();
int maps;
if (jConf.getUseNewMapper()) {
maps = writeNewSplits(job, jobSubmitDir);
} else {
maps = writeOldSplits(jConf, jobSubmitDir);
}
return maps;
}
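The branch taken here depends on the flag that setUseNewAPI() maintains on the JobConf. A standalone sketch of the accessors involved, just to show the flag's default value and how it flips:

import org.apache.hadoop.mapred.JobConf;

public class NewApiFlagSketch {
  public static void main(String[] args) {
    JobConf jConf = new JobConf();
    // false by default, so writeSplits would take the writeOldSplits branch
    System.out.println("use new mapper API: " + jConf.getUseNewMapper());
    jConf.setUseNewMapper(true); // what happens when the job is configured via the new API
    System.out.println("use new mapper API: " + jConf.getUseNewMapper());
  }
}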
Computing the split size; the result is bounded by goalSize, minSize, and blockSize:
protected long computeSplitSize(long goalSize, long minSize,
long blockSize) {
return Math.max(minSize, Math.min(goalSize, blockSize));
}
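A quick worked example of this clamping with made-up numbers (a 128 MB block and a 1-byte minimum split size):

public class SplitSizeDemo {
  static long computeSplitSize(long goalSize, long minSize, long blockSize) {
    return Math.max(minSize, Math.min(goalSize, blockSize));
  }

  public static void main(String[] args) {
    long blockSize = 128L * 1024 * 1024; // 128 MB HDFS block
    long minSize = 1L;
    // goalSize larger than the block -> the block size wins (134217728)
    System.out.println(computeSplitSize(300L * 1024 * 1024, minSize, blockSize));
    // goalSize smaller than the block -> the goal size wins (10485760)
    System.out.println(computeSplitSize(10L * 1024 * 1024, minSize, blockSize));
    // a large minSize can force splits bigger than a block (268435456)
    System.out.println(computeSplitSize(10L * 1024 * 1024, 256L * 1024 * 1024, blockSize));
  }
}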
Adjusting the split size and generating the splits:
long bytesRemaining = length;
while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
length-bytesRemaining, splitSize, clusterMap);
splits.add(makeSplit(path, length-bytesRemaining, splitSize,
splitHosts[0], splitHosts[1]));
bytesRemaining -= splitSize;
}
SPLIT_SLOP = 1.1: while the remaining bytes are more than 1.1 times the split size, another full split is cut; once the remainder is at most 1.1 times the split size, it is kept as one final split instead of being cut again.
private static final double SPLIT_SLOP = 1.1;
Splits are computed one input file at a time.
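To make the 1.1 factor concrete, the standalone sketch below replays the loop for a hypothetical 260 MB file with a 128 MB split size: one full 128 MB split is cut, and the remaining 132 MB is kept as a single final split because 132/128 ≈ 1.03 ≤ 1.1.

public class SplitSlopDemo {
  private static final double SPLIT_SLOP = 1.1;

  public static void main(String[] args) {
    long splitSize = 128L * 1024 * 1024; // 128 MB
    long length = 260L * 1024 * 1024;    // 260 MB file
    long bytesRemaining = length;
    // Same loop shape as above: cut full splits while the remainder is
    // more than 1.1x the split size.
    while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
      System.out.println("split at offset " + (length - bytesRemaining)
          + ", length " + splitSize);
      bytesRemaining -= splitSize;
    }
    if (bytesRemaining != 0) {
      // The remainder (132 MB here) becomes the last split and is not cut again.
      System.out.println("final split at offset " + (length - bytesRemaining)
          + ", length " + bytesRemaining);
    }
  }
}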
Rough overall flow:
Client code:
    waitForCompletion()
Source code:
    submit();
    1. Establish the connection:
        connect();
        1) Create the proxy used to submit the job:
            new Cluster(getConfiguration());
            (1) Decide whether the runner is local or a remote YARN cluster:
                initialize(jobTrackAddr, conf)
    2. Submit the job:
        submitter.submitJobInternal(Job.this, cluster)
        1) Create the staging (Stag) path used to submit data to the cluster:
            Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
        2) Get a jobId and create the job path:
            JobID jobId = submitClient.getNewJobID();
        3) Copy the jar package to the cluster:
            copyAndConfigureFiles(job, submitJobDir);
            rUploader.uploadFiles(job, jobSubmitDir);
        4) Compute the splits and generate the split-plan files:
            writeSplits(job, submitJobDir);
            maps = writeNewSplits(job, jobSubmitDir);
            input.getSplits(job);
        5) Write the XML configuration file to the staging path:
            writeConf(conf, submitJobFile);
            conf.writeXml(out);
        6) Submit the job and return the submission status:
            status = submitClient.submitJob(jobId, submitJobDir.toString(), job.getCredentials());