Big Data: Hadoop (11) _ MapReduce_04, Core Framework Principles _ Source Code (1) _ Job Submission Flow

Author: 大数据之负

Last modified: 2022-10-23 23:00:29

First published: 2022-10-22 10:28:07

Article link: https://blog.csdn.net/m0_52968216/article/details/127385489

Contents

1. Job submission flow source code
2. Summary

1. Job submission flow source code

Class: WordCountDriver

1. boolean result = job.waitForCompletion(true);

Class: Job

2.

public boolean waitForCompletion(boolean verbose
                                  ) throws IOException, InterruptedException,
                                           ClassNotFoundException {
   if (state == JobState.DEFINE) {
     submit();
   }
   if (verbose) {
     monitorAndPrintJob();
   } else {
     // get the completion poll interval from the client.
     int completionPollIntervalMillis = 
       Job.getCompletionPollInterval(cluster.getConf());
     while (!isComplete()) {
       try {
         Thread.sleep(completionPollIntervalMillis);
       } catch (InterruptedException ie) {
       }
     }
   }
   return isSuccessful();
}

Class: Job

1.1 Start submission

3. The job state defaults to DEFINE, so execution enters submit() and submission begins

4.

public void submit() 
       throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);
  setUseNewAPI();
  connect();
  final JobSubmitter submitter = 
      getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run() throws IOException, InterruptedException, 
    ClassNotFoundException {
      return submitter.submitJobInternal(Job.this, cluster);
    }
  });
  state = JobState.RUNNING;
  LOG.info("The url to track the job: " + getTrackingURL());
}

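1.1.1 Re-check that the state is DEFINE; if it is not, an IllegalStateException is thrown. This step is minor, since the state is normally DEFINE at this point

5. ensureState(JobState.DEFINE);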

private void ensureState(JobState state) throws IllegalStateException {
  if (state != this.state) {
    throw new IllegalStateException("Job in state "+ this.state + 
                                    " instead of " + state);
  }

  if (state == JobState.RUNNING && cluster == null) {
    throw new IllegalStateException
      ("Job in state " + this.state
       + ", but it isn't attached to any job tracker!");
  }
}
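
To make the state check concrete, here is a minimal sketch of the DEFINE to RUNNING transition. MiniJob is a hypothetical stand-in written for this article, not the Hadoop Job class, and it only models the first check in ensureState (the cluster check is left out).

```java
// Hypothetical stand-in for org.apache.hadoop.mapreduce.Job; not the Hadoop class.
final class MiniJob {
    enum JobState { DEFINE, RUNNING }

    private JobState state = JobState.DEFINE;

    // Mirrors the first check in ensureState: the current state must match.
    private void ensureState(JobState expected) {
        if (expected != state) {
            throw new IllegalStateException(
                "Job in state " + state + " instead of " + expected);
        }
    }

    // Simplified submit(): only the state transition is modelled.
    void submit() {
        ensureState(JobState.DEFINE);
        state = JobState.RUNNING;
    }

    JobState getState() {
        return state;
    }

    public static void main(String[] args) {
        MiniJob job = new MiniJob();
        job.submit();                        // DEFINE -> RUNNING
        System.out.println(job.getState());
        try {
            job.submit();                    // a second submit is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is also why waitForCompletion only calls submit() when the state is still DEFINE: a job that has already been submitted would fail the check.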

1.1.2 Handle compatibility between the old and new APIs (differences between 1.x and 2.x/3.x); not worth reading in detail

6. setUseNewAPI();

1.1.3 Establish the connection (this ultimately decides whether the job runs on the cluster or locally)

7. connect();
8.

private synchronized void connect()
        throws IOException, InterruptedException, ClassNotFoundException {
  if (cluster == null) {
    cluster = 
      ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
                 public Cluster run()
                        throws IOException, InterruptedException, 
                               ClassNotFoundException {
                   return new Cluster(getConfiguration());
                 }
               });
  }
}

9. Create the Cluster object

return new Cluster(getConfiguration());

10. Constructor

public Cluster(Configuration conf) throws IOException {
	this(null, conf);
}

11.

public Cluster(InetSocketAddress jobTrackAddr, Configuration conf)
    throws IOException {
  this.conf = conf;
  this.ugi = UserGroupInformation.getCurrentUser();
  initialize(jobTrackAddr, conf);
}

12. Initialization

initialize(jobTrackAddr, conf);

13. The initialize method

private void initialize(InetSocketAddress jobTrackAddr, Configuration conf)
  throws IOException {

	initProviderList();
	final IOException initEx = new IOException(
	    "Cannot initialize Cluster. Please check your configuration for "
	        + MRConfig.FRAMEWORK_NAME
	        + " and the correspond server addresses.");
	if (jobTrackAddr != null) {
	  LOG.info(
	      "Initializing cluster for Job Tracker=" + jobTrackAddr.toString());
	}
	for (ClientProtocolProvider provider : providerList) {
	  LOG.debug("Trying ClientProtocolProvider : "
	      + provider.getClass().getName());
	  ClientProtocol clientProtocol = null;
	  try {
	    if (jobTrackAddr == null) {
	      clientProtocol = provider.create(conf);
	    } else {
	      clientProtocol = provider.create(jobTrackAddr, conf);
	    }
	
	    if (clientProtocol != null) {
	      clientProtocolProvider = provider;
	      client = clientProtocol;
	      LOG.debug("Picked " + provider.getClass().getName()
	          + " as the ClientProtocolProvider");
	      break;
	    } else {
	      LOG.debug("Cannot pick " + provider.getClass().getName()
	          + " as the ClientProtocolProvider - returned null protocol");
	    }
	  } catch (Exception e) {
	    final String errMsg = "Failed to use " + provider.getClass().getName()
	        + " due to error: ";
	    initEx.addSuppressed(new IOException(errMsg, e));
	    LOG.info(errMsg, e);
	  }
	}
	
	if (null == clientProtocolProvider || null == client) {
	  throw initEx;
	}
}

14. The loop determines whether the client is the YARN client or the local client

for (ClientProtocolProvider provider : providerList) {
  LOG.debug("Trying ClientProtocolProvider : "
      + provider.getClass().getName());
  ClientProtocol clientProtocol = null;
  try {
    if (jobTrackAddr == null) {
      clientProtocol = provider.create(conf);
    } else {
      clientProtocol = provider.create(jobTrackAddr, conf);
    }

    if (clientProtocol != null) {
      clientProtocolProvider = provider;
      client = clientProtocol;
      LOG.debug("Picked " + provider.getClass().getName()
          + " as the ClientProtocolProvider");
      break;
    } else {
      LOG.debug("Cannot pick " + provider.getClass().getName()
          + " as the ClientProtocolProvider - returned null protocol");
    }
  } catch (Exception e) {
    final String errMsg = "Failed to use " + provider.getClass().getName()
        + " due to error: ";
    initEx.addSuppressed(new IOException(errMsg, e));
    LOG.info(errMsg, e);
  }
}
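
The selection logic in this loop can be sketched without Hadoop on the classpath. Everything below is a simplified illustration: the Provider interface and the client names returned as strings are hypothetical, and the sketch assumes that the choice is driven by mapreduce.framework.name with "local" as the default, which is how I understand the local and YARN providers to behave.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class ProviderSelection {
    // Hypothetical, simplified version of ClientProtocolProvider:
    // returns a client name, or null when the provider does not apply.
    interface Provider {
        String create(Map<String, String> conf);
    }

    static final String FRAMEWORK_NAME = "mapreduce.framework.name";

    // Applies only when the framework is "local" (assumed default).
    static final Provider LOCAL = conf ->
        "local".equals(conf.getOrDefault(FRAMEWORK_NAME, "local")) ? "LocalJobRunner" : null;

    // Applies only when the framework is "yarn".
    static final Provider YARN = conf ->
        "yarn".equals(conf.get(FRAMEWORK_NAME)) ? "YARNRunner" : null;

    // Same shape as the loop in Cluster.initialize: the first non-null client wins.
    static String pick(List<Provider> providers, Map<String, String> conf) {
        for (Provider provider : providers) {
            String client = provider.create(conf);
            if (client != null) {
                return client;
            }
        }
        throw new IllegalStateException("Cannot initialize Cluster");
    }

    public static void main(String[] args) {
        List<Provider> providers = Arrays.asList(LOCAL, YARN);
        System.out.println(pick(providers, Collections.<String, String>emptyMap()));
        System.out.println(pick(providers,
            Collections.singletonMap(FRAMEWORK_NAME, "yarn")));
    }
}
```

A provider that returns null is simply skipped, so running the Driver from an IDE with no extra configuration ends up on the local runner.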

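1.1.4 Core code: submit the job information to the cluster or to the local runner

15. return submitter.submitJobInternal(Job.this, cluster);
What gets submitted is mainly job.xml and job.split; in YARN mode the jar must be submitted as well

Class: JobSubmitter

16.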

JobStatus submitJobInternal(Job job, Cluster cluster) 
throws ClassNotFoundException, InterruptedException, IOException {

  //validate the jobs output specs 
  checkSpecs(job);

  Configuration conf = job.getConfiguration();
  addMRFrameworkToDistributedCache(conf);

  Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
  //configure the command line options correctly on the submitting dfs
  InetAddress ip = InetAddress.getLocalHost();
  if (ip != null) {
    submitHostAddress = ip.getHostAddress();
    submitHostName = ip.getHostName();
    conf.set(MRJobConfig.JOB_SUBMITHOST,submitHostName);
    conf.set(MRJobConfig.JOB_SUBMITHOSTADDR,submitHostAddress);
  }
  JobID jobId = submitClient.getNewJobID();
  job.setJobID(jobId);
  Path submitJobDir = new Path(jobStagingArea, jobId.toString());
  JobStatus status = null;
  try {
    conf.set(MRJobConfig.USER_NAME,
        UserGroupInformation.getCurrentUser().getShortUserName());
    conf.set("hadoop.http.filter.initializers", 
        "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
    conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
    LOG.debug("Configuring job " + jobId + " with " + submitJobDir 
        + " as the submit dir");
    // get delegation token for the dir
    TokenCache.obtainTokensForNamenodes(job.getCredentials(),
        new Path[] { submitJobDir }, conf);
    
    populateTokenCache(conf, job.getCredentials());

    // generate a secret to authenticate shuffle transfers
    if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
      KeyGenerator keyGen;
      try {
        keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
        keyGen.init(SHUFFLE_KEY_LENGTH);
      } catch (NoSuchAlgorithmException e) {
        throw new IOException("Error generating shuffle secret key", e);
      }
      SecretKey shuffleKey = keyGen.generateKey();
      TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
          job.getCredentials());
    }
    if (CryptoUtils.isEncryptedSpillEnabled(conf)) {
      conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);
      LOG.warn("Max job attempts set to 1 since encrypted intermediate" +
              "data spill is enabled");
    }

    copyAndConfigureFiles(job, submitJobDir);

    Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);
    
    // Create the splits for the job
    LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
    int maps = writeSplits(job, submitJobDir);
    conf.setInt(MRJobConfig.NUM_MAPS, maps);
    LOG.info("number of splits:" + maps);

    int maxMaps = conf.getInt(MRJobConfig.JOB_MAX_MAP,
        MRJobConfig.DEFAULT_JOB_MAX_MAP);
    if (maxMaps >= 0 && maxMaps < maps) {
      throw new IllegalArgumentException("The number of map tasks " + maps +
          " exceeded limit " + maxMaps);
    }

    // write "queue admins of the queue to which job is being submitted"
    // to job file.
    String queue = conf.get(MRJobConfig.QUEUE_NAME,
        JobConf.DEFAULT_QUEUE_NAME);
    AccessControlList acl = submitClient.getQueueAdmins(queue);
    conf.set(toFullPropertyName(queue,
        QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

    // removing jobtoken referrals before copying the jobconf to HDFS
    // as the tasks don't need this setting, actually they may break
    // because of it if present as the referral will point to a
    // different job.
    TokenCache.cleanUpTokenReferral(conf);

    if (conf.getBoolean(
        MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
        MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
      // Add HDFS tracking ids
      ArrayList<String> trackingIds = new ArrayList<String>();
      for (Token<? extends TokenIdentifier> t :
          job.getCredentials().getAllTokens()) {
        trackingIds.add(t.decodeIdentifier().getTrackingId());
      }
      conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
          trackingIds.toArray(new String[trackingIds.size()]));
    }

    // Set reservation info if it exists
    ReservationId reservationId = job.getReservationId();
    if (reservationId != null) {
      conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
    }

    // Write job file to submit dir
    writeConf(conf, submitJobFile);
    
    //
    // Now, actually submit the job (using the submit name)
    //
    printTokens(jobId, job.getCredentials());
    status = submitClient.submitJob(
        jobId, submitJobDir.toString(), job.getCredentials());
    if (status != null) {
      return status;
    } else {
      throw new IOException("Could not launch job");
    }
  } finally {
    if (status == null) {
      LOG.info("Cleaning up the staging area " + submitJobDir);
      if (jtFs != null && submitJobDir != null)
        jtFs.delete(submitJobDir, true);

    }
  }
}

17. checkSpecs(job);
Validate the output path (it must be set, and it must not already exist)

18.

private void checkSpecs(Job job) throws ClassNotFoundException, 
  InterruptedException, IOException {
	JobConf jConf = (JobConf)job.getConfiguration();
	// Check the output specification
	if (jConf.getNumReduceTasks() == 0 ? 
	    jConf.getUseNewMapper() : jConf.getUseNewReducer()) {
	  org.apache.hadoop.mapreduce.OutputFormat<?, ?> output =
	    ReflectionUtils.newInstance(job.getOutputFormatClass(),
	      job.getConfiguration());
	  output.checkOutputSpecs(job);
	} else {
	  jConf.getOutputFormat().checkOutputSpecs(jtFs, jConf);
	}
}

19. output.checkOutputSpecs(job);
Check whether the output path already exists

20.

public void checkOutputSpecs(JobContext job
                           ) throws FileAlreadyExistsException, IOException{
	// Ensure that the output directory is set and not already there
	Path outDir = getOutputPath(job);
	if (outDir == null) {
	  throw new InvalidJobConfException("Output directory not set.");
	}
	
	// get delegation token for outDir's file system
	TokenCache.obtainTokensForNamenodes(job.getCredentials(),
	    new Path[] { outDir }, job.getConfiguration());
	
	if (outDir.getFileSystem(job.getConfiguration()).exists(outDir)) {
	  throw new FileAlreadyExistsException("Output directory " + outDir + 
	                                       " already exists");
	}
}

21.
If the output path is null, an exception is thrown saying the output directory is not set
Output path in the Driver: file:/D:/hadoop/output20

if (outDir == null) {
  throw new InvalidJobConfException("Output directory not set.");
}

22.
If the output path already exists, a FileAlreadyExistsException is thrown

if (outDir.getFileSystem(job.getConfiguration()).exists(outDir)) {
  throw new FileAlreadyExistsException("Output directory " + outDir + 
                                       " already exists");
}
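
The two checks can be tried out on the local file system with plain JDK classes. This is only an analogue under stated assumptions: OutputCheck and checkOutputDir are hypothetical names, java.nio is used instead of the Hadoop FileSystem API, and IllegalArgumentException stands in for InvalidJobConfException.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class OutputCheck {
    // Simplified local-filesystem analogue of checkOutputSpecs:
    // the output directory must be set and must not exist yet.
    static void checkOutputDir(Path outDir) throws IOException {
        if (outDir == null) {
            throw new IllegalArgumentException("Output directory not set.");
        }
        if (Files.exists(outDir)) {
            throw new FileAlreadyExistsException(
                "Output directory " + outDir + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("output-check");
        checkOutputDir(base.resolve("fresh"));   // passes: does not exist yet
        try {
            checkOutputDir(base);                // fails: directory already exists
        } catch (FileAlreadyExistsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is the reason a second run of the same Driver fails until the old output directory is deleted or a new output path is chosen.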

23. Create the staging path, which holds the job's temporary files

Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);

jobStagingArea = file:/tmp/hadoop/mapred/staging/1111289279828/.staging

24. Create the JobID; every job has its own unique JobID

JobID jobId = submitClient.getNewJobID();

jobId = job_local1289279828_0001

25. Append the JobID to the staging path created above

Path submitJobDir = new Path(jobStagingArea, jobId.toString());

submitJobDir = file:/tmp/hadoop/mapred/staging/1111289279828/.staging/job_local1289279828_0001
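
As a rough illustration of how the values above fit together, the sketch below rebuilds the submit directory from the staging area and the job ID. StagingPaths and its methods are hypothetical helpers; the ID format (the prefix job_, an identifier, and a four-digit zero-padded sequence number) is inferred from the value printed above rather than taken from the Hadoop source.

```java
import java.util.Locale;

final class StagingPaths {
    // Formats an ID in the same shape as the value shown above,
    // e.g. job_local1289279828_0001 (assumed: 4-digit zero-padded sequence).
    static String jobId(String identifier, int sequence) {
        return String.format(Locale.ROOT, "job_%s_%04d", identifier, sequence);
    }

    // Appends the job ID to the staging area, like new Path(parent, child).
    static String submitJobDir(String stagingArea, String jobId) {
        return stagingArea + "/" + jobId;
    }

    public static void main(String[] args) {
        String staging = "file:/tmp/hadoop/mapred/staging/1111289279828/.staging";
        String id = jobId("local1289279828", 1);
        System.out.println(submitJobDir(staging, id));
    }
}
```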

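26. Copy and configure the related files

copyAndConfigureFiles(job, submitJobDir);

This creates an empty temporary directory named after the job ID.
When the job is submitted to a cluster (YARN mode), the jar containing the current code is uploaded to the cluster.
In local mode the jar is already on the local machine, so it does not need to be submitted.

27.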

private void copyAndConfigureFiles(Job job, Path jobSubmitDir) 
  throws IOException {
	Configuration conf = job.getConfiguration();
	boolean useWildcards = conf.getBoolean(Job.USE_WILDCARD_FOR_LIBJARS,
	    Job.DEFAULT_USE_WILDCARD_FOR_LIBJARS);
	JobResourceUploader rUploader = new JobResourceUploader(jtFs, useWildcards);
	
	rUploader.uploadResources(job, jobSubmitDir);
	
	// Get the working directory. If not set, sets it to filesystem working dir
	// This code has been added so that working directory reset before running
	// the job. This is necessary for backward compatibility as other systems
	// might use the public API JobConf#setWorkingDirectory to reset the working
	// directory.
	job.getWorkingDirectory();
}

28. See what actually gets uploaded

rUploader.uploadResources(job, jobSubmitDir);

29.

public void uploadResources(Job job, Path submitJobDir) throws IOException {
	try {
	  initSharedCache(job.getJobID(), job.getConfiguration());
	  uploadResourcesInternal(job, submitJobDir);
	} finally {
	  stopSharedCache();
	}
}

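30. The only part worth noting is the mkdir of the submit path (that is, creating an empty directory named after the job ID)

uploadResourcesInternal(job, submitJobDir);

As noted in step 26, the jar is uploaded only when submitting to a YARN cluster; in local mode it stays where it is.

31.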

private void uploadResourcesInternal(Job job, Path submitJobDir)
  throws IOException {

  ……

	// Create the submission directory for the MapReduce job.
	submitJobDir = jtFs.makeQualified(submitJobDir);
	submitJobDir = new Path(submitJobDir.toUri().getPath());
	FsPermission mapredSysPerms =
	    new FsPermission(JobSubmissionFiles.JOB_DIR_PERMISSION);
	mkdirs(jtFs, submitJobDir, mapredSysPerms);
	
	uploadFiles(job, files, submitJobDir, mapredSysPerms, replication,
	    fileSCUploadPolicies, statCache);
	uploadLibJars(job, libjars, submitJobDir, mapredSysPerms, replication,
	    fileSCUploadPolicies, statCache);
	uploadArchives(job, archives, submitJobDir, mapredSysPerms, replication,
	    archiveSCUploadPolicies, statCache);
	uploadJobJar(job, jobJar, submitJobDir, replication, statCache);
	addLog4jToDistributedCache(job, submitJobDir);

  ……
 }
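
The behaviour described in steps 26 and 30 can be imitated on the local file system. The sketch follows the description in this article rather than the exact Hadoop logic: SubmitDirSketch and prepare are hypothetical names, the framework string is passed in by hand, and the target file name job.jar is my assumption about how the uploaded jar is named.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class SubmitDirSketch {
    // Creates the per-job submit directory and, only in "yarn" mode,
    // copies the job jar into it. In local mode the jar is left where it is.
    static Path prepare(Path stagingArea, String jobId, Path jobJar, String framework)
            throws IOException {
        Path submitDir = Files.createDirectories(stagingArea.resolve(jobId));
        if ("yarn".equals(framework) && jobJar != null) {
            Files.copy(jobJar, submitDir.resolve("job.jar"));
        }
        return submitDir;
    }

    public static void main(String[] args) throws IOException {
        Path staging = Files.createTempDirectory("staging");
        Path jar = Files.createTempFile("wordcount", ".jar");

        Path localDir = prepare(staging, "job_local_0001", jar, "local");
        System.out.println(Files.exists(localDir.resolve("job.jar")));

        Path yarnDir = prepare(staging, "job_yarn_0001", jar, "yarn");
        System.out.println(Files.exists(yarnDir.resolve("job.jar")));
    }
}
```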

32. This step is the input-split code; the next article covers it separately

int maps = writeSplits(job, submitJobDir);

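maps = 1: the input file is small, so it yields a single split
Four split-related files appear in the submit directory; the main one is job.split

33. Store the number of splits, which equals the number of MapTasks, in the configuration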

conf.setInt(MRJobConfig.NUM_MAPS, maps);

NUM_MAPS = "mapreduce.job.maps" = 1

34. Print the number of splits to the console

LOG.info("number of splits:" + maps);

2022-10-02 00:08:59,394 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - number of splits:1

35. What gets submitted is mainly job.xml and job.split; in YARN mode the jar is submitted as well

writeConf(conf, submitJobFile);

Two more files appear; the main one is job.xml, which holds all the parameters for the job run, including their default values

<property><name>mapreduce.job.maxtaskfailures.per.tracker</name><value>3</value><final>false</final><source>mapred-default.xml</source></property>
<property><name>yarn.client.max-cached-nodemanagers-proxies</name><value>0</value><final>false</final><source>yarn-default.xml</source></property>
<property><name>mapreduce.job.speculative.retry-after-speculate</name><value>15000</value><final>false</final><source>mapred-default.xml</source></property>
……
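
For inspecting a job.xml by hand, the property entries can be read with the JDK's built-in XML parser. JobXmlReader is a hypothetical helper written for this article, and the sample string wraps the entries in a configuration root element, which is my assumption about the full file since only the property lines are shown above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class JobXmlReader {
    // Reads <property><name>..</name><value>..</value></property> entries into a map.
    static Map<String, String> read(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String name = prop.getElementsByTagName("name").item(0).getTextContent();
            String value = prop.getElementsByTagName("value").item(0).getTextContent();
            result.put(name, value);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Assumed root element; the two properties are copied from the listing above.
        String xml = "<configuration>"
            + "<property><name>mapreduce.job.maxtaskfailures.per.tracker</name>"
            + "<value>3</value><final>false</final><source>mapred-default.xml</source></property>"
            + "<property><name>yarn.client.max-cached-nodemanagers-proxies</name>"
            + "<value>0</value><final>false</final><source>yarn-default.xml</source></property>"
            + "</configuration>";
        read(xml).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```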

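1.1.5 Once submission completes, state becomes RUNNING

36. state = JobState.RUNNING;

1.2 Monitor the job and print its information

37. monitorAndPrintJob();
Most of the log output seen during an MR run comes from this step, and the 6 temporary files are deleted

1.3 Write the _SUCCESS file

38. return isSuccessful();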

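2. Summary

Establish the connection
Determine whether the environment is local or YARN
Submit the job
Create the staging path, which holds the job's temporary files
Create the JobID and append it to the staging path
In cluster mode, copy the jar to the cluster
Compute the splits and write the split file job.split
Write the job.xml file
Submit the job; the state changes from DEFINE to RUNNING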
