0x00 Series Index
- Source Walkthrough - Yarn ResourceManager 01 - Basic Concepts
- Source Walkthrough - Yarn ResourceManager 02 - RM Startup - Scripts
- Source Walkthrough - Yarn ResourceManager 03 - RM Startup - The RM in Detail
- Source Walkthrough - Yarn ResourceManager 04 - RM Scheduling - FairScheduler
- Source Walkthrough - Yarn ResourceManager 05 - MR Job Submission - Client-Side Analysis
- Source Walkthrough - Yarn ResourceManager 06 - MR Job Submission - Server-Side Analysis
- Source Walkthrough - Yarn ResourceManager 07 - ShutdownHookManager
- Source Walkthrough - Yarn ResourceManager 08 - Summary
0x05 RM Scheduling - MR Job Submission - Client-Side Analysis
5.1 mapreduce.Job
org.apache.hadoop.mapreduce.Job
As we all know, an MR driver program typically ends with job.waitForCompletion(true), which submits the job and waits for it to finish. Our analysis starts there:
public boolean waitForCompletion(boolean verbose)
    throws IOException, InterruptedException, ClassNotFoundException {
  if (state == JobState.DEFINE) {
    // Submit the job
    submit();
  }
  if (verbose) {
    // Monitor the job and keep printing its progress
    // until it completes (successfully or not)
    monitorAndPrintJob();
  } else {
    // get the completion poll interval from the client.
    int completionPollIntervalMillis =
        getCompletionPollInterval(cluster.getConf());
    while (!isComplete()) {
      try {
        Thread.sleep(completionPollIntervalMillis);
      } catch (InterruptedException ie) {
      }
    }
  }
  return isSuccessful();
}
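In the non-verbose branch, the client is nothing more than a sleep-and-repoll loop around isComplete(). The pattern in isolation looks like the sketch below; JobLike and the fake job are illustrative stand-ins, not Hadoop classes:

```java
// Minimal sketch of Job's sleep-and-repoll wait loop.
// JobLike stands in for the real Job; nothing here is Hadoop API.
public class CompletionPoller {
  public interface JobLike {
    boolean isComplete();
  }

  /** Polls until the job reports completion; returns how many sleeps were needed. */
  public static int waitForCompletion(JobLike job, long pollIntervalMillis) {
    int polls = 0;
    while (!job.isComplete()) {
      polls++;
      try {
        Thread.sleep(pollIntervalMillis); // mirrors Job's sleep-and-retry
      } catch (InterruptedException ie) {
        // ignored, just like in Job.waitForCompletion
      }
    }
    return polls;
  }

  public static void main(String[] args) {
    // A fake job that reports complete on the fourth isComplete() call.
    final int[] remaining = {3};
    JobLike job = () -> --remaining[0] < 0;
    System.out.println("polls: " + waitForCompletion(job, 1L));
  }
}
```

The real loop reads the poll interval from the client configuration (getCompletionPollInterval), so a slow cluster can be polled less aggressively.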
Next, the submit method:
// Submit the job to the cluster and return immediately
public void submit()
    throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);
  setUseNewAPI();
  connect();
  // Create the JobSubmitter
  final JobSubmitter submitter =
      getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run() throws IOException, InterruptedException,
        ClassNotFoundException {
      // Submit the job
      return submitter.submitJobInternal(Job.this, cluster);
    }
  });
  state = JobState.RUNNING;
  LOG.info("The url to track the job: " + getTrackingURL());
}
As you can see, it uses the JobSubmitter class, which we look at next.
5.2 JobSubmitter
org.apache.hadoop.mapreduce.JobSubmitter
Now for submitter.submitJobInternal. The method is long, so only the key lines are shown here:
// Get a job ID
JobID jobId = submitClient.getNewJobID();
job.setJobID(jobId);
// Upload the job jar, files, dependent libjars, archives, etc. to HDFS
copyAndConfigureFiles(job, submitJobDir);
// Compute input splits from the job's input files, write the split plan
// into the staging dir, and derive the map count from the split count.
// Splitting delegates to the concrete InputFormat implementation, e.g.
// TextInputFormat: its getSplits method reads and divides the input files,
// and each split records the file length, host locations, whether the data
// is in memory, and so on.
// The split metadata is written to HDFS for the map tasks to use later.
int maps = writeSplits(job, submitJobDir);
// The number of map tasks is then fixed to the number of splits
conf.setInt(MRJobConfig.NUM_MAPS, maps);
// Write all of the job's configuration into the staging file
writeConf(conf, submitJobFile);
// Actually submit the job and get back its submission status
status = submitClient.submitJob(
    jobId, submitJobDir.toString(), job.getCredentials());
So the flow first obtains a job ID and then hands the job to submitClient, an implementation of the ClientProtocol interface. In other words, there are two main steps: get the JobId, then submit the job. We will walk through them separately. Since we are submitting to YARN, the implementation actually used is YARNRunner.
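Before moving on, it is worth seeing why writeSplits determines the map count. In FileInputFormat the split size is max(minSize, min(maxSize, blockSize)), and a 10% slop keeps a slightly oversized tail from becoming its own tiny split. A self-contained sketch of that arithmetic (the numbers are illustrative; only the formula and the 1.1 slop factor come from FileInputFormat):

```java
// Sketch of FileInputFormat's split-size arithmetic; not Hadoop code.
public class SplitMath {
  private static final double SPLIT_SLOP = 1.1; // same 10% slop FileInputFormat uses

  /** Split size rule: max(minSize, min(maxSize, blockSize)). */
  public static long splitSize(long minSize, long maxSize, long blockSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  /** Number of splits (= map tasks) for one file of the given length. */
  public static int splitCount(long fileLength, long splitSize) {
    int splits = 0;
    long remaining = fileLength;
    // Keep carving full splits while the remainder is more than 110% of a split.
    while (((double) remaining) / splitSize > SPLIT_SLOP) {
      splits++;
      remaining -= splitSize;
    }
    if (remaining > 0) {
      splits++; // the (possibly slightly oversized) tail becomes the last split
    }
    return splits;
  }

  public static void main(String[] args) {
    long block = 128L * 1024 * 1024;                   // 128 MB block size
    long split = splitSize(1L, Long.MAX_VALUE, block); // with defaults, split == block
    // A 300 MB file yields 3 splits: 128 + 128 + 44 MB.
    System.out.println(splitCount(300L * 1024 * 1024, split));
  }
}
```

Note the slop effect: a 135 MB file with a 128 MB split size produces a single map task, not two.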
5.3 Getting the JobID
5.3.1 YARNRunner
org.apache.hadoop.mapred.YARNRunner
First, the submitClient.getNewJobID() call we saw above:
// Delegates to resMgrDelegate.getNewJobID to obtain the job ID
@Override
public JobID getNewJobID() throws IOException, InterruptedException {
  return resMgrDelegate.getNewJobID();
}
This uses resMgrDelegate, so let's see what ResourceMgrDelegate actually is:
5.3.2 ResourceMgrDelegate
Part of ResourceMgrDelegate:
public class ResourceMgrDelegate extends YarnClient {
  private YarnConfiguration conf;
  private ApplicationSubmissionContext application;
  private ApplicationId applicationId;
  @Private
  @VisibleForTesting
  protected YarnClient client;
  private Text rmDTService;

  /**
   * Delegate responsible for communicating with the Resource Manager's
   * {@link ApplicationClientProtocol}.
   * @param conf the configuration object.
   */
  public ResourceMgrDelegate(YarnConfiguration conf) {
    super(ResourceMgrDelegate.class.getName());
    this.conf = conf;
    this.client = YarnClient.createYarnClient();
    init(conf);
    start();
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    client.init(conf);
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    client.start();
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    client.stop();
    super.serviceStop();
  }

  // Obtain a new JobID
  public JobID getNewJobID() throws IOException, InterruptedException {
    try {
      // client here is a YarnClientImpl instance;
      // obtain an ApplicationSubmissionContext from it
      this.application =
          client.createApplication().getApplicationSubmissionContext();
      // and take the applicationId out of that context
      this.applicationId = this.application.getApplicationId();
      return TypeConverter.fromYarn(applicationId);
    } catch (YarnException e) {
      throw new IOException(e);
    }
  }
}
Do serviceInit and serviceStart look familiar? They should: the parent class YarnClient extends AbstractService. Two new types appear here, YarnClientImpl and ApplicationSubmissionContext, which we cover below.
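As an aside, the TypeConverter.fromYarn call in getNewJobID never contacts the RM: it simply re-labels the YARN ApplicationId (cluster timestamp plus sequence number) into the MR JobID namespace. The string manipulation below is an illustrative sketch of that idea, not Hadoop's implementation (which works on the typed fields, not on strings):

```java
// Illustrative sketch: the ApplicationId -> JobID renaming that
// TypeConverter.fromYarn conceptually performs. Not Hadoop code.
public class IdMapping {
  /** Maps "application_<timestamp>_<seq>" to "job_<timestamp>_<seq>". */
  public static String toJobId(String applicationId) {
    if (!applicationId.startsWith("application_")) {
      throw new IllegalArgumentException("not an application id: " + applicationId);
    }
    // Same cluster timestamp and sequence number, different prefix.
    return "job_" + applicationId.substring("application_".length());
  }

  public static void main(String[] args) {
    System.out.println(toJobId("application_1700000000000_0042"));
    // prints "job_1700000000000_0042"
  }
}
```

This is why an MR job ID and its YARN application ID always share the same timestamp and sequence number.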
5.3.3 YarnClientImpl
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl
Let's start with the client.createApplication method used in ResourceMgrDelegate above:
@Override
public YarnClientApplication createApplication()
    throws YarnException, IOException {
  // Creates an ApplicationSubmissionContextPBImpl instance.
  // ApplicationSubmissionContext holds everything the RM needs
  // to launch the application's ApplicationMaster.
  ApplicationSubmissionContext context =
      Records.newRecord(ApplicationSubmissionContext.class);
  // Ask the RM for a new application
  GetNewApplicationResponse newApp = getNewApplication();
  ApplicationId appId = newApp.getApplicationId();
  // Store the appId in the ApplicationSubmissionContext
  context.setApplicationId(appId);
  // With the appId in hand, wrap the response and the context together
  return new YarnClientApplication(newApp, context);
}
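Records.newRecord, used above, is a small factory indirection: the caller asks for a record interface (such as ApplicationSubmissionContext) and receives the protobuf-backed *PBImpl registered for it. A toy version of that lookup, with a made-up Greeting interface standing in for the record types (the real factory resolves implementations via reflection and configuration, not a hand-built map):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Toy stand-in for YARN's Records.newRecord factory lookup.
public class RecordFactory {
  private static final Map<Class<?>, Supplier<?>> IMPLS = new HashMap<>();

  /** Register the concrete implementation for a record interface. */
  public static <T> void register(Class<T> iface, Supplier<T> impl) {
    IMPLS.put(iface, impl);
  }

  /** Look up and instantiate the registered implementation. */
  @SuppressWarnings("unchecked")
  public static <T> T newRecord(Class<T> iface) {
    Supplier<?> s = IMPLS.get(iface);
    if (s == null) {
      throw new IllegalStateException("no implementation registered for " + iface);
    }
    return (T) s.get();
  }

  // Made-up record interface, standing in for ApplicationSubmissionContext etc.
  public interface Greeting { String text(); }

  public static void main(String[] args) {
    register(Greeting.class, () -> () -> "hello");
    System.out.println(newRecord(Greeting.class).text());
  }
}
```

The payoff of the indirection is that caller code only ever names the interface; the wire format (protobuf here) stays swappable.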
Next, the getNewApplication method:
private GetNewApplicationResponse getNewApplication()
    throws YarnException, IOException {
  GetNewApplicationRequest request =
      Records.newRecord(GetNewApplicationRequest.class);
  // rmClient is the key piece here
  return rmClient.getNewApplication(request);
}
rmClient implements the ApplicationClientProtocol interface; let's look at its getNewApplication method.
5.3.4 ApplicationClientProtocolPBClientImpl
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl
/**
 * Inherited from ApplicationClientProtocol, whose javadoc says:
 * clients use this method to obtain a new ApplicationId, which they then use
 * to submit a new application. The RM responds with a
 * GetNewApplicationResponse containing a new, monotonically increasing
 * ApplicationId plus some cluster details such as the maximum resource
 * capability.
 *
 * In other words, this only obtains an appId; it does not actually run the app.
 */
@Override
public GetNewApplicationResponse getNewApplication(
    GetNewApplicationRequest request) throws YarnException, IOException {
  GetNewApplicationRequestProto requestProto =
      ((GetNewApplicationRequestPBImpl) request).getProto();
  try {
    return new GetNewApplicationResponsePBImpl(proxy.getNewApplication(null,
        requestProto));
  } catch (ServiceException e) {
    RPCUtil.unwrapAndThrowException(e);
    return null;
  }
}
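All of the *PBClientImpl methods share the same adapter shape: unwrap the API-level request into its protobuf form, invoke the RPC proxy, and wrap the raw proto response back into an API-level type. The skeleton below imitates that shape with made-up Req/Proto/Resp classes (none of these are the real protobuf types):

```java
// Skeleton of the ApplicationClientProtocolPBClientImpl adapter pattern,
// with made-up stand-in types; not Hadoop code.
public class PbAdapterSketch {
  // API-level and wire-level stand-ins.
  public static class GetIdRequest { }
  public static class GetIdRequestProto { }
  public static class GetIdResponseProto {
    final long id;
    public GetIdResponseProto(long id) { this.id = id; }
  }
  public static class GetIdResponse {
    private final long id;
    public GetIdResponse(long id) { this.id = id; }
    public long id() { return id; }
  }

  /** Stand-in for the RPC proxy that actually crosses the wire. */
  public interface Proxy {
    GetIdResponseProto getId(GetIdRequestProto proto);
  }

  private final Proxy proxy;
  public PbAdapterSketch(Proxy proxy) { this.proxy = proxy; }

  /** API type in, proto over the wire, API type back out. */
  public GetIdResponse getId(GetIdRequest request) {
    GetIdRequestProto proto = new GetIdRequestProto(); // "((...PBImpl) request).getProto()"
    GetIdResponseProto raw = proxy.getId(proto);       // remote call via the proxy
    return new GetIdResponse(raw.id);                  // "new ...ResponsePBImpl(raw)"
  }

  public static void main(String[] args) {
    // Fake proxy that answers locally instead of calling the RM.
    PbAdapterSketch client = new PbAdapterSketch(p -> new GetIdResponseProto(7L));
    System.out.println(client.getId(new GetIdRequest()).id());
  }
}
```

Keeping the proto types out of the public API is what lets YARN evolve its wire format without breaking callers.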
That completes the client-side flow for obtaining a JobId. Next, the submitJob flow.
5.4 Submitting the Job
5.4.1 YARNRunner
Now the YARNRunner submitJob method that JobSubmitter calls:
@Override
public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
    throws IOException, InterruptedException {
  addHistoryToken(ts);
  // Package everything needed into an appContext, in preparation for
  // launching the MR ApplicationMaster
  ApplicationSubmissionContext appContext =
      createApplicationSubmissionContext(conf, jobSubmitDir, ts);

  // Submit to the ResourceManager
  try {
    // This is the applicationId obtained earlier via
    // ResourceMgrDelegate.getNewJobID
    ApplicationId applicationId =
        resMgrDelegate.submitApplication(appContext);
    // Fetch the AM's application report
    ApplicationReport appMaster =
        resMgrDelegate.getApplicationReport(applicationId);
    String diagnostics =
        (appMaster == null ?
            "application report is null" : appMaster.getDiagnostics());
    if (appMaster == null
        || appMaster.getYarnApplicationState() == YarnApplicationState.FAILED
        || appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
      throw new IOException("Failed to run job : " + diagnostics);
    }
    return clientCache.getClient(jobId).getJobStatus(jobId);
  } catch (YarnException e) {
    throw new IOException(e);
  }
}
That covers both YARNRunner methods, getNewJobID and submitJob, which still live in the mapreduce package. From here on we step into the yarn package.
5.4.2 YarnClientImpl
Straight to the submitApplication method:
@Override
public ApplicationId submitApplication(ApplicationSubmissionContext appContext)
    throws YarnException, IOException {
  ApplicationId applicationId = appContext.getApplicationId();
  if (applicationId == null) {
    throw new ApplicationIdNotProvidedException(
        "ApplicationId is not provided in ApplicationSubmissionContext");
  }
  SubmitApplicationRequest request =
      Records.newRecord(SubmitApplicationRequest.class);
  request.setApplicationSubmissionContext(appContext);

  // Automatically add the timeline DT into the CLC
  // Only when the security and the timeline service are both enabled
  if (isSecurityEnabled() && timelineServiceEnabled) {
    addTimelineDelegationToken(appContext.getAMContainerSpec());
  }

  // Send the application submission request
  rmClient.submitApplication(request);

  int pollCount = 0;
  long startTime = System.currentTimeMillis();
  EnumSet<YarnApplicationState> waitingStates =
      EnumSet.of(YarnApplicationState.NEW,
          YarnApplicationState.NEW_SAVING,
          YarnApplicationState.SUBMITTED);
  EnumSet<YarnApplicationState> failToSubmitStates =
      EnumSet.of(YarnApplicationState.FAILED,
          YarnApplicationState.KILLED);
  while (true) {
    try {
      ApplicationReport appReport = getApplicationReport(applicationId);
      YarnApplicationState state = appReport.getYarnApplicationState();
      if (!waitingStates.contains(state)) {
        if (failToSubmitStates.contains(state)) {
          throw new YarnException("Failed to submit " + applicationId +
              " to YARN : " + appReport.getDiagnostics());
        }
        LOG.info("Submitted application " + applicationId);
        break;
      }
      long elapsedMillis = System.currentTimeMillis() - startTime;
      if (enforceAsyncAPITimeout() &&
          elapsedMillis >= asyncApiPollTimeoutMillis) {
        throw new YarnException("Timed out while waiting for application " +
            applicationId + " to be submitted successfully");
      }
      // Notify the client through the log every 10 poll, in case the client
      // is blocked here too long.
      if (++pollCount % 10 == 0) {
        LOG.info("Application submission is not finished, " +
            "submitted application " + applicationId +
            " is still in " + state);
      }
      try {
        Thread.sleep(submitPollIntervalMillis);
      } catch (InterruptedException ie) {
        LOG.error("Interrupted while waiting for application "
            + applicationId
            + " to be successfully submitted.");
      }
    } catch (ApplicationNotFoundException ex) {
      // FailOver or RM restart happens before RMStateStore saves
      // ApplicationState
      LOG.info("Re-submit application " + applicationId + " with the " +
          "same ApplicationSubmissionContext");
      rmClient.submitApplication(request);
    }
  }
  return applicationId;
}
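The interesting logic above is the state classification: any state outside NEW / NEW_SAVING / SUBMITTED means the RM has either accepted the app (stop polling) or rejected it (FAILED / KILLED, throw). A self-contained sketch of just that decision, using a local enum that mirrors YarnApplicationState:

```java
import java.util.EnumSet;

// Sketch of YarnClientImpl's submit-state classification; the State enum
// is a local mirror of YarnApplicationState, not the YARN class.
public class SubmitStateCheck {
  public enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

  private static final EnumSet<State> WAITING =
      EnumSet.of(State.NEW, State.NEW_SAVING, State.SUBMITTED);
  private static final EnumSet<State> FAIL_TO_SUBMIT =
      EnumSet.of(State.FAILED, State.KILLED);

  /** true: keep polling; false: submission accepted; throws: submission failed. */
  public static boolean keepPolling(State state) {
    if (!WAITING.contains(state)) {
      if (FAIL_TO_SUBMIT.contains(state)) {
        throw new IllegalStateException("Failed to submit: " + state);
      }
      return false; // e.g. ACCEPTED: the RM has taken the application
    }
    return true; // still NEW/NEW_SAVING/SUBMITTED: poll again
  }

  public static void main(String[] args) {
    System.out.println(keepPolling(State.SUBMITTED)); // true
    System.out.println(keepPolling(State.ACCEPTED));  // false
  }
}
```

The real loop adds two safeguards the sketch omits: an optional overall timeout (enforceAsyncAPITimeout) and a re-submit on ApplicationNotFoundException, which covers an RM failover before the state store persisted the application.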
5.4.3 ApplicationClientProtocolPBClientImpl
@Override
public SubmitApplicationResponse submitApplication(
    SubmitApplicationRequest request) throws YarnException, IOException {
  SubmitApplicationRequestProto requestProto =
      ((SubmitApplicationRequestPBImpl) request).getProto();
  try {
    return new SubmitApplicationResponsePBImpl(proxy.submitApplication(null,
        requestProto));
  } catch (ServiceException e) {
    RPCUtil.unwrapAndThrowException(e);
    return null;
  }
}
5.5 Summary
This chapter followed an MR job's submission to analyze the client-side source code.
In the next chapter we continue the submission flow on the server side: Source Walkthrough - Yarn ResourceManager 06 - MR Job Submission - Server-Side Analysis.