Hadoop Source Code Walkthrough: Job Submission
-
The entry point for job submission:

```java
boolean flag = job.waitForCompletion(true);
```
-
Stepping into waitForCompletion(true):

```java
if (state == JobState.DEFINE) { submit(); }
```

This checks whether the current Job's state is DEFINE; if so, it calls submit().
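The DEFINE-to-RUNNING guard can be sketched as a tiny state machine (names simplified from the Hadoop source; this is an illustration, not the real class):

```java
// Minimal sketch of Job's state guard: a job may be submitted exactly once,
// and only while it is still being defined.
public class JobStateDemo {
    enum JobState { DEFINE, RUNNING }

    static JobState state = JobState.DEFINE;

    static void submit() {
        // ensureState(JobState.DEFINE): submitting twice is an error
        if (state != JobState.DEFINE) {
            throw new IllegalStateException("Job in state " + state + " instead of DEFINE");
        }
        state = JobState.RUNNING;
    }

    static void waitForCompletion() {
        if (state == JobState.DEFINE) {
            submit();
        }
        System.out.println("state after submit: " + state);
    }

    public static void main(String[] args) {
        waitForCompletion();
    }
}
```

Calling waitForCompletion() a second time would not resubmit, because the state is already RUNNING.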
-
Next we step into submit(). This method is important, so let's walk through it in detail. Here is its body:

```java
public void submit() throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);
  setUseNewAPI();
  connect();
  final JobSubmitter submitter =
      getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run() throws IOException, InterruptedException, ClassNotFoundException {
      return submitter.submitJobInternal(Job.this, cluster);
    }
  });
  state = JobState.RUNNING;
  LOG.info("The url to track the job: " + getTrackingURL());
}
```
-
ensureState(JobState.DEFINE); double-checks that the Job is still in the DEFINE state.
-
setUseNewAPI(); switches the job to the new Hadoop API. Along the way it verifies that no conflicting old-API settings are present:

```java
if (conf.getUseNewMapper()) {
  String mode = "new map API";
  ensureNotSet("mapred.input.format.class", mode);
  ensureNotSet(oldMapperClass, mode);
  if (numReduces != 0) {
    ensureNotSet("mapred.partitioner.class", mode);
  } else {
    ensureNotSet("mapred.output.format.class", mode);
  }
}
```
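The intent of ensureNotSet (reject a job whose configuration still carries an old-API key) can be sketched with a plain map standing in for Hadoop's Configuration; the class and message here are simplified stand-ins, not the real implementation:

```java
import java.util.HashMap;
import java.util.Map;

public class EnsureNotSetDemo {
    // Simplified stand-in for Hadoop's Configuration (illustration only)
    static Map<String, String> conf = new HashMap<>();

    // Mirrors the intent of ensureNotSet: throw if a conflicting key is configured
    static void ensureNotSet(String key, String mode) {
        if (conf.get(key) != null) {
            throw new IllegalStateException(key + " is incompatible with " + mode + " mode.");
        }
    }

    public static void main(String[] args) {
        ensureNotSet("mapred.input.format.class", "new map API"); // passes: key unset
        conf.put("mapred.input.format.class", "SomeOldInputFormat");
        try {
            ensureNotSet("mapred.input.format.class", "new map API");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```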
- Now step into connect(). This is a key method: its purpose is to create the Cluster object.

```java
private synchronized void connect() throws IOException, InterruptedException, ClassNotFoundException {
  if (cluster == null) {
    cluster = ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
      public Cluster run() throws IOException, InterruptedException, ClassNotFoundException {
        return new Cluster(getConfiguration());
      }
    });
  }
}
```

return new Cluster(getConfiguration()); is where the Cluster object is created; you can debug into this call to watch how the Cluster is built.
-
The Cluster constructor calls initialize(jobTrackAddr, conf) to set up several fields. Below are the constructor and the initialization method; I will highlight the important parts.
-
ClientProtocolProvider provider: the provider of the client protocol. In short, it supplies the environment your Job runs in. A Hadoop program runs in one of two environments: locally, or on a cluster (that is, on YARN). The ClientProtocolProvider determines which environment you are currently running in.
-
ClientProtocol clientProtocol is the actual runtime environment, local or YARN. It is supplied by the ClientProtocolProvider via clientProtocol = provider.create(conf). ClientProtocol has two implementations, LocalJobRunner and YARNRunner, corresponding to the local and YARN environments respectively.
```java
public Cluster(InetSocketAddress jobTrackAddr, Configuration conf) throws IOException {
  this.conf = conf;
  this.ugi = UserGroupInformation.getCurrentUser();
  initialize(jobTrackAddr, conf);
}

private void initialize(InetSocketAddress jobTrackAddr, Configuration conf) throws IOException {
  synchronized (frameworkLoader) {
    for (ClientProtocolProvider provider : frameworkLoader) {
      LOG.debug("Trying ClientProtocolProvider : " + provider.getClass().getName());
      ClientProtocol clientProtocol = null;
      try {
        if (jobTrackAddr == null) {
          clientProtocol = provider.create(conf);
        } else {
          clientProtocol = provider.create(jobTrackAddr, conf);
        }
        if (clientProtocol != null) {
          clientProtocolProvider = provider;
          client = clientProtocol;
          LOG.debug("Picked " + provider.getClass().getName()
              + " as the ClientProtocolProvider");
          break;
        } else {
          LOG.debug("Cannot pick " + provider.getClass().getName()
              + " as the ClientProtocolProvider - returned null protocol");
        }
      } catch (Exception e) {
        LOG.info("Failed to use " + provider.getClass().getName() + " due to error: ", e);
      }
    }
  }
}
```
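The selection loop above tries each provider in turn and keeps the first one whose create() returns a non-null protocol. That pick-first-non-null pattern can be sketched in a self-contained way (the provider names and return strings below are simplified stand-ins for the real LocalClientProtocolProvider / YarnClientProtocolProvider classes):

```java
import java.util.List;

// Sketch of Cluster.initialize()'s provider loop: iterate over the registered
// providers and keep the first one that returns a non-null client protocol.
public class ProviderPickDemo {
    interface Provider {
        String create(String framework); // returns null when the provider does not apply
        String name();
    }

    static Provider local = new Provider() {
        public String create(String fw) { return "local".equals(fw) ? "LocalJobRunner" : null; }
        public String name() { return "LocalClientProtocolProvider"; }
    };
    static Provider yarn = new Provider() {
        public String create(String fw) { return "yarn".equals(fw) ? "YARNRunner" : null; }
        public String name() { return "YarnClientProtocolProvider"; }
    };

    static String pick(String framework) {
        for (Provider p : List.of(local, yarn)) {
            String protocol = p.create(framework);
            if (protocol != null) {
                // "Picked ... as the ClientProtocolProvider"
                return p.name() + " -> " + protocol;
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(pick("local"));
        System.out.println(pick("yarn"));
    }
}
```

In real Hadoop, which provider applies is driven by the mapreduce.framework.name setting ("local" vs "yarn").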
-
-
The Job is then submitted through JobSubmitter, passing in the Job and the Cluster object:

```java
status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
  public JobStatus run() throws IOException, InterruptedException, ClassNotFoundException {
    return submitter.submitJobInternal(Job.this, cluster);
  }
});
```

This method is also very important and deserves careful study, so we step into submitter.submitJobInternal(Job.this, cluster). Here is its body, which we will read through in detail:
```java
JobStatus submitJobInternal(Job job, Cluster cluster)
    throws ClassNotFoundException, InterruptedException, IOException {
  // validate the jobs output specs
  checkSpecs(job);

  Configuration conf = job.getConfiguration();
  addMRFrameworkToDistributedCache(conf);

  Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
  // configure the command line options correctly on the submitting dfs
  InetAddress ip = InetAddress.getLocalHost();
  if (ip != null) {
    submitHostAddress = ip.getHostAddress();
    submitHostName = ip.getHostName();
    conf.set(MRJobConfig.JOB_SUBMITHOST, submitHostName);
    conf.set(MRJobConfig.JOB_SUBMITHOSTADDR, submitHostAddress);
  }
  JobID jobId = submitClient.getNewJobID();
  job.setJobID(jobId);
  Path submitJobDir = new Path(jobStagingArea, jobId.toString());
  JobStatus status = null;
  try {
    conf.set(MRJobConfig.USER_NAME,
        UserGroupInformation.getCurrentUser().getShortUserName());
    conf.set("hadoop.http.filter.initializers",
        "org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer");
    conf.set(MRJobConfig.MAPREDUCE_JOB_DIR, submitJobDir.toString());
    LOG.debug("Configuring job " + jobId + " with " + submitJobDir
        + " as the submit dir");
    // get delegation token for the dir
    TokenCache.obtainTokensForNamenodes(job.getCredentials(),
        new Path[] { submitJobDir }, conf);

    populateTokenCache(conf, job.getCredentials());

    // generate a secret to authenticate shuffle transfers
    if (TokenCache.getShuffleSecretKey(job.getCredentials()) == null) {
      KeyGenerator keyGen;
      try {
        keyGen = KeyGenerator.getInstance(SHUFFLE_KEYGEN_ALGORITHM);
        keyGen.init(SHUFFLE_KEY_LENGTH);
      } catch (NoSuchAlgorithmException e) {
        throw new IOException("Error generating shuffle secret key", e);
      }
      SecretKey shuffleKey = keyGen.generateKey();
      TokenCache.setShuffleSecretKey(shuffleKey.getEncoded(),
          job.getCredentials());
    }
    if (CryptoUtils.isEncryptedSpillEnabled(conf)) {
      conf.setInt(MRJobConfig.MR_AM_MAX_ATTEMPTS, 1);
      LOG.warn("Max job attempts set to 1 since encrypted intermediate"
          + "data spill is enabled");
    }

    copyAndConfigureFiles(job, submitJobDir);

    Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);

    // Create the splits for the job
    LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
    int maps = writeSplits(job, submitJobDir);
    conf.setInt(MRJobConfig.NUM_MAPS, maps);
    LOG.info("number of splits:" + maps);

    // write "queue admins of the queue to which job is being submitted"
    // to job file.
    String queue = conf.get(MRJobConfig.QUEUE_NAME, JobConf.DEFAULT_QUEUE_NAME);
    AccessControlList acl = submitClient.getQueueAdmins(queue);
    conf.set(toFullPropertyName(queue,
        QueueACL.ADMINISTER_JOBS.getAclName()), acl.getAclString());

    // removing jobtoken referrals before copying the jobconf to HDFS
    // as the tasks don't need this setting, actually they may break
    // because of it if present as the referral will point to a
    // different job.
    TokenCache.cleanUpTokenReferral(conf);

    if (conf.getBoolean(MRJobConfig.JOB_TOKEN_TRACKING_IDS_ENABLED,
        MRJobConfig.DEFAULT_JOB_TOKEN_TRACKING_IDS_ENABLED)) {
      // Add HDFS tracking ids
      ArrayList<String> trackingIds = new ArrayList<String>();
      for (Token<? extends TokenIdentifier> t :
          job.getCredentials().getAllTokens()) {
        trackingIds.add(t.decodeIdentifier().getTrackingId());
      }
      conf.setStrings(MRJobConfig.JOB_TOKEN_TRACKING_IDS,
          trackingIds.toArray(new String[trackingIds.size()]));
    }

    // Set reservation info if it exists
    ReservationId reservationId = job.getReservationId();
    if (reservationId != null) {
      conf.set(MRJobConfig.RESERVATION_ID, reservationId.toString());
    }

    // Write job file to submit dir
    writeConf(conf, submitJobFile);

    //
    // Now, actually submit the job (using the submit name)
    //
    printTokens(jobId, job.getCredentials());
    status = submitClient.submitJob(
        jobId, submitJobDir.toString(), job.getCredentials());
    if (status != null) {
      return status;
    } else {
      throw new IOException("Could not launch job");
    }
  } finally {
    if (status == null) {
      LOG.info("Cleaning up the staging area " + submitJobDir);
      if (jtFs != null && submitJobDir != null)
        jtFs.delete(submitJobDir, true);
    }
  }
}
```
-
checkSpecs(job); verifies that the output path has been set and that it does not already exist; if either check fails, an exception is thrown immediately. The checks are:
-
```java
// Ensure that the output directory is set and not already there
Path outDir = getOutputPath(job);
if (outDir == null) {
  throw new InvalidJobConfException("Output directory not set.");
}
// get delegation token for outDir's file system
TokenCache.obtainTokensForNamenodes(job.getCredentials(),
    new Path[] { outDir }, job.getConfiguration());
if (outDir.getFileSystem(job.getConfiguration()).exists(outDir)) {
  throw new FileAlreadyExistsException("Output directory " + outDir
      + " already exists");
}
```
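The two failure modes (unset path, pre-existing path) can be reproduced in a self-contained sketch, with java.nio.file standing in for Hadoop's FileSystem so the example runs without a cluster; the class and messages are illustrative, not Hadoop's:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of checkSpecs' output validation: reject a null output dir
// and reject an output dir that already exists.
public class CheckSpecsDemo {
    static void checkOutputSpecs(Path outDir) {
        if (outDir == null) {
            throw new IllegalStateException("Output directory not set.");
        }
        if (Files.exists(outDir)) {
            throw new IllegalStateException("Output directory " + outDir + " already exists");
        }
    }

    public static void main(String[] args) throws Exception {
        // An existing directory triggers the second check
        Path existing = Files.createTempDirectory("job-output");
        try {
            checkOutputSpecs(existing);
        } catch (IllegalStateException e) {
            System.out.println("check failed: " + e.getMessage());
        }
    }
}
```

This is why a MapReduce driver typically deletes (or uniquifies) its output directory before submitting.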
- The following line produces the Job's staging (working) directory; you can step into the method to see exactly how the directory is created:

```java
Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
```
-
Here is the full implementation:

```java
public static Path getStagingDir(Cluster cluster, Configuration conf)
    throws IOException, InterruptedException {
  Path stagingArea = cluster.getStagingAreaDir();
  FileSystem fs = stagingArea.getFileSystem(conf);
  String realUser;
  String currentUser;
  UserGroupInformation ugi = UserGroupInformation.getLoginUser();
  realUser = ugi.getShortUserName();
  currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
  if (fs.exists(stagingArea)) {
    FileStatus fsStatus = fs.getFileStatus(stagingArea);
    String owner = fsStatus.getOwner();
    if (!(owner.equals(currentUser) || owner.equals(realUser))) {
      throw new IOException("The ownership on the staging directory "
          + stagingArea + " is not as expected. "
          + "It is owned by " + owner + ". The directory must "
          + "be owned by the submitter " + currentUser + " or "
          + "by " + realUser);
    }
    if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
      LOG.info("Permissions on staging directory " + stagingArea + " are "
          + "incorrect: " + fsStatus.getPermission()
          + ". Fixing permissions to correct value " + JOB_DIR_PERMISSION);
      fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
    }
  } else {
    fs.mkdirs(stagingArea, new FsPermission(JOB_DIR_PERMISSION));
  }
  return stagingArea;
}
```
-
Path stagingArea = cluster.getStagingAreaDir(); builds the path of the Job's staging directory.
-
FileSystem fs = stagingArea.getFileSystem(conf); obtains the HDFS file system handle, which is used for all subsequent HDFS operations.
-
Since the Job has no staging directory at first, execution takes the else branch, i.e.
fs.mkdirs(stagingArea, new FsPermission(JOB_DIR_PERMISSION));
which creates the staging directory. Because I am running locally, I was able to find this staging directory on my Windows machine.
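The first-use branch (directory missing, so create it) can be sketched with java.nio.file standing in for Hadoop's FileSystem; permissions are omitted for brevity, and the class name is illustrative:

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of getStagingDir's else-branch: on first use the staging
// directory does not exist yet, so it is created before being returned.
public class StagingDirDemo {
    static Path getStagingDir(Path stagingArea) throws Exception {
        if (!Files.exists(stagingArea)) {
            // stands in for fs.mkdirs(stagingArea, new FsPermission(JOB_DIR_PERMISSION))
            Files.createDirectories(stagingArea);
        }
        return stagingArea;
    }

    public static void main(String[] args) throws Exception {
        Path base = Files.createTempDirectory("demo");
        Path staging = getStagingDir(base.resolve(".staging"));
        System.out.println("staging exists: " + Files.exists(staging));
    }
}
```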
-
JobID jobId = submitClient.getNewJobID(); generates the jobId for the current Job.
-
Path submitJobDir = new Path(jobStagingArea, jobId.toString()); joins jobStagingArea with the jobId, i.e. combines the staging directory with the current Job's id. This per-job directory holds everything the Job needs: configuration files, split information, the jar, and so on (at this point the directory has not yet been created).
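The resulting layout is simply staging-dir + "/" + jobId; the concrete values below are made-up examples of what the joined path looks like, shown with plain string joining in place of Hadoop's Path(parent, child) constructor:

```java
// Sketch of the submit-dir path: staging area joined with the job id.
public class SubmitDirDemo {
    public static void main(String[] args) {
        String jobStagingArea = "/tmp/hadoop-yarn/staging/user/.staging"; // example value
        String jobId = "job_1574318978532_0001";                          // example id
        // equivalent in spirit to: new Path(jobStagingArea, jobId)
        String submitJobDir = jobStagingArea + "/" + jobId;
        System.out.println(submitJobDir);
    }
}
```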
-
copyAndConfigureFiles(job, submitJobDir); is what actually creates the per-job directory whose path was assembled above, and copies the job's files into it.
-
int maps = writeSplits(job, submitJobDir); computes the input splits, writes the split files into the submit directory, and returns the number of splits. This part is fairly involved; I will cover it in detail in a follow-up article.
-
writeConf(conf, submitJobFile); writes the job's configuration into the submit directory.
- status = submitClient.submitJob(jobId, submitJobDir.toString(), job.getCredentials());
is the actual submission of the Job.
- jtFs.delete(submitJobDir, true); cleans up the current Job's staging directory once the Job is done (in this method, the finally block removes it whenever no JobStatus was produced).