Continuing from Part 2 of this series.
With ApplicationSubmissionContext covered, let's move on:
@Override
public YarnClientApplication createApplication()
    throws YarnException, IOException {
  ApplicationSubmissionContext context = Records.newRecord
      (ApplicationSubmissionContext.class);
  GetNewApplicationResponse newApp = getNewApplication();
  ApplicationId appId = newApp.getApplicationId();
  context.setApplicationId(appId);
  return new YarnClientApplication(newApp, context);
}
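There is actually an open question here: the newRecord method deserves a closer look. I won't go into it now and will cover it in a separate post.
First, let's look at the getNewApplication method: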
private GetNewApplicationResponse getNewApplication()
    throws YarnException, IOException {
  GetNewApplicationRequest request =
      Records.newRecord(GetNewApplicationRequest.class);
  return rmClient.getNewApplication(request);
}
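Sure enough, as expected, the call goes through the RPC connection that was set up earlier.
One thing to add here: what is the concrete implementation behind rmClient? ApplicationClientProtocol has many implementing classes, so which one actually does the work when we submit a job?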
/**
 * Delegate responsible for communicating with the Resource Manager's
 * {@link ApplicationClientProtocol}.
 *
 * @param conf
 *          the configuration object.
 */
public ResourceMgrDelegate(YarnConfiguration conf) {
  super(ResourceMgrDelegate.class.getName());
  this.conf = conf;
  this.client = YarnClient.createYarnClient();
  init(conf);
  start();
}
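Look closely at this constructor: what gets passed in is a YarnConfiguration, not the plain Configuration we used earlier. Open that class and you will find it is wall-to-wall YARN configuration keys. When the client service starts, the following serviceStart runs: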
@Override
protected void serviceStart() throws Exception {
  try {
    rmClient = ClientRMProxy.createRMProxy(getConfig(),
        ApplicationClientProtocol.class);
    if (historyServiceEnabled) {
      historyClient.start();
    }
    if (timelineServiceEnabled) {
      timelineClient.start();
    }
  } catch (IOException e) {
    throw new YarnRuntimeException(e);
  }
  super.serviceStart();
}
Here too, what is passed in at service start is the YarnConfiguration. Now look at the createRMProxy method:
/**
 * Create a proxy for the specified protocol. For non-HA,
 * this is a direct connection to the ResourceManager address. When HA is
 * enabled, the proxy handles the failover between the ResourceManagers as
 * well.
 */
@Private
protected static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, RMProxy instance) throws IOException {
  YarnConfiguration conf = (configuration instanceof YarnConfiguration)
      ? (YarnConfiguration) configuration
      : new YarnConfiguration(configuration);
  RetryPolicy retryPolicy = createRetryPolicy(conf);
  if (HAUtil.isHAEnabled(conf)) {
    RMFailoverProxyProvider<T> provider =
        instance.createRMFailoverProxyProvider(conf, protocol);
    return (T) RetryProxy.create(protocol, provider, retryPolicy);
  } else {
    InetSocketAddress rmAddress = instance.getRMAddress(conf, protocol);
    LOG.info("Connecting to ResourceManager at " + rmAddress);
    T proxy = RMProxy.<T>getProxy(conf, protocol, rmAddress);
    return (T) RetryProxy.create(protocol, proxy, retryPolicy);
  }
}
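I won't go further into the implementation details here. Driven by the settings carried in YarnConfiguration, the implementation class that ultimately gets loaded is ApplicationClientProtocolPBClientImpl: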
@Override
public GetNewApplicationResponse getNewApplication(
    GetNewApplicationRequest request) throws YarnException,
    IOException {
  GetNewApplicationRequestProto requestProto =
      ((GetNewApplicationRequestPBImpl) request).getProto();
  try {
    return new GetNewApplicationResponsePBImpl(proxy.getNewApplication(null,
        requestProto));
  } catch (ServiceException e) {
    RPCUtil.unwrapAndThrowException(e);
    return null;
  }
}
In this method we can see that the request is serialized, with Protocol Buffers as the serialization mechanism.
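On the earlier question of how the RPC layer lands on that implementation class: to my understanding, Hadoop's protobuf client factory derives the class name from the protocol interface by a naming convention (the interface's package plus `impl.pb.client`, and its simple name plus `PBClientImpl`). The sketch below reproduces that rule in plain Java. The class, constants, and method name are illustrative stand-ins rather than Hadoop code, and the rule itself is an assumption to check against your Hadoop version.

```java
// Illustrative sketch only: mimics the assumed naming convention, not Hadoop's own code.
final class PbClientImplNameSketch {
    // Assumed suffixes for locating the protobuf-based client implementation.
    static final String PACKAGE_SUFFIX = "impl.pb.client";
    static final String CLASS_SUFFIX = "PBClientImpl";

    // Derives the implementation class name from a fully qualified protocol interface name.
    static String implClassName(String protocolClassName) {
        int lastDot = protocolClassName.lastIndexOf('.');
        if (lastDot < 0) {
            throw new IllegalArgumentException("expected a fully qualified name");
        }
        String packageName = protocolClassName.substring(0, lastDot);
        String simpleName = protocolClassName.substring(lastDot + 1);
        return packageName + "." + PACKAGE_SUFFIX + "." + simpleName + CLASS_SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(
            implClassName("org.apache.hadoop.yarn.api.ApplicationClientProtocol"));
    }
}
```

Under this assumption, the client never names the implementation class explicitly; it is resolved by reflection from the protocol interface alone.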
That wraps up the client-side code. Now let's see how the server handles the call:
/**
 * The client interface to the Resource Manager. This module handles all the rpc
 * interfaces to the resource manager from the client.
 */
public class ClientRMService extends AbstractService implements
    ApplicationClientProtocol
This class lives in org.apache.hadoop.yarn.server.resourcemanager and is where the ResourceManager handles these client requests. Let's look at how it handles getNewApplication.
@Override
public GetNewApplicationResponse getNewApplication(
    GetNewApplicationRequest request) throws YarnException {
  GetNewApplicationResponse response = recordFactory
      .newRecordInstance(GetNewApplicationResponse.class);
  response.setApplicationId(getNewApplicationId());
  // Pick up min/max resource from scheduler...
  response.setMaximumResourceCapability(scheduler
      .getMaximumResourceCapability());
  return response;
}
Next, the getNewApplicationId method:
ApplicationId getNewApplicationId() {
  ApplicationId applicationId = org.apache.hadoop.yarn.server.utils.BuilderUtils
      .newApplicationId(recordFactory, ResourceManager.getClusterTimeStamp(),
          applicationCounter.incrementAndGet());
  LOG.info("Allocated new applicationId: " + applicationId.getId());
  return applicationId;
}
With that, the newly created application gets a fresh id.
/**
 * <p><code>ApplicationId</code> represents the <em>globally unique</em>
 * identifier for an application.</p>
 *
 * <p>The globally unique nature of the identifier is achieved by using the
 * <em>cluster timestamp</em> i.e. start-time of the
 * <code>ResourceManager</code> along with a monotonically increasing counter
 * for the application.</p>
 */
@Public
@Stable
public abstract class ApplicationId implements Comparable<ApplicationId>
Here is the logic that builds an ApplicationId, that is, how a globally unique ApplicationId is produced:
@Private
@Unstable
public static ApplicationId newInstance(long clusterTimestamp, int id) {
  ApplicationId appId = Records.newRecord(ApplicationId.class);
  appId.setClusterTimestamp(clusterTimestamp);
  appId.setId(id);
  appId.build();
  return appId;
}
The logic is simple. What matters is that it again relies on Protocol Buffers, namely the build method. This is a hallmark of the 2.x.x releases: serialization uses Google's Protocol Buffers.
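To make the uniqueness argument concrete, here is a minimal sketch in plain Java of the same idea: a fixed cluster timestamp (the ResourceManager start time) paired with an atomically incremented counter. The class name `AppIdSketch` and the textual form `application_<timestamp>_<4-digit counter>` are my own assumptions for illustration; the real ApplicationId is a protobuf-backed record.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: a stand-in for the timestamp + counter scheme.
final class AppIdSketch {
    // Fixed once when the (hypothetical) ResourceManager starts.
    private final long clusterTimestamp;
    // Monotonically increasing; incrementAndGet is atomic, so concurrent
    // callers never receive the same value.
    private final AtomicInteger applicationCounter = new AtomicInteger(0);

    AppIdSketch(long clusterTimestamp) {
        this.clusterTimestamp = clusterTimestamp;
    }

    // Returns the next id; the counter is zero-padded to at least four digits.
    String newApplicationId() {
        int id = applicationCounter.incrementAndGet();
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, id);
    }

    public static void main(String[] args) {
        AppIdSketch rm = new AppIdSketch(System.currentTimeMillis());
        System.out.println(rm.newApplicationId());
        System.out.println(rm.newApplicationId());
    }
}
```

Because the counter restarts when the ResourceManager restarts, it is the timestamp component that keeps ids from different RM lifetimes apart.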
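Once this call returns, the client holds a globally unique ApplicationId. That concludes the exchange between JobClient and the ResourceManager for obtaining the id. So what happens next?
A new question arises: JobClient now has an ApplicationId, but how does the ResourceManager keep the job going from here? How does it go on to request resources and keep things running until the job completes?
Let's look for an entry point by going back to submitJobInternal in JobSubmitter: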
status = submitClient.submitJob(
    jobId, submitJobDir.toString(), job.getCredentials());
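After obtaining the unique ApplicationId, we find a second step: the job's submit directory is handed to the ResourceManager so that it can carry on running the program. Here is that method: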
/**
 * Submit a Job for execution. Returns the latest profile for
 * that job.
 */
public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
    throws IOException, InterruptedException;
The comment is brief: the job ID, the job submit directory, and the credentials are submitted to the RM. First, the client-side handling:
@Override
public JobStatus submitJob(JobID jobId, String jobSubmitDir, Credentials ts)
    throws IOException, InterruptedException {
  addHistoryToken(ts);
  // Construct necessary information to start the MR AM
  ApplicationSubmissionContext appContext = createApplicationSubmissionContext(
      conf, jobSubmitDir, ts);
  // Submit to ResourceManager
  try {
    ApplicationId applicationId = resMgrDelegate
        .submitApplication(appContext);
    ApplicationReport appMaster = resMgrDelegate
        .getApplicationReport(applicationId);
    String diagnostics = (appMaster == null ? "application report is null"
        : appMaster.getDiagnostics());
    if (appMaster == null
        || appMaster.getYarnApplicationState() == YarnApplicationState.FAILED
        || appMaster.getYarnApplicationState() == YarnApplicationState.KILLED) {
      throw new IOException("Failed to run job : " + diagnostics);
    }
    return clientCache.getClient(jobId).getJobStatus(jobId);
  } catch (YarnException e) {
    throw new IOException(e);
  }
}
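This method matters a lot. Start with createApplicationSubmissionContext, which lives in YARNRunner and contains a great deal of code. Read it through yourself; here we pick out the important parts.
A quick aside: the first call described earlier, the one that obtained the ApplicationId, sent only a very simple request, in effect telling the ResourceManager that a new job is about to be submitted. This second call is where the job really starts to run, so what gets uploaded has to be far more detailed.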
capability.setMemory(conf.getInt(MRJobConfig.MR_AM_VMEM_MB,
    MRJobConfig.DEFAULT_MR_AM_VMEM_MB));
capability.setVirtualCores(conf.getInt(MRJobConfig.MR_AM_CPU_VCORES,
    MRJobConfig.DEFAULT_MR_AM_CPU_VCORES));
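These are two settings: the memory and the number of virtual cores requested for the ApplicationMaster. If they are specified on the command line, they are written into conf and read back here.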
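The lookup-with-default pattern used by those two calls can be sketched without Hadoop on the classpath. The example below uses `java.util.Properties` as a stand-in for the job configuration. The key strings and default values (`yarn.app.mapreduce.am.resource.mb` = 1536, `yarn.app.mapreduce.am.resource.cpu-vcores` = 1) are what I believe MRJobConfig defines, but treat them as assumptions and confirm them in your version's mapred-default.xml.

```java
import java.util.Properties;

// Illustrative sketch only: Properties stands in for Hadoop's Configuration.
final class AmResourceSketch {
    // Assumed key names and defaults; verify against your Hadoop release.
    static final String AM_MEMORY_MB_KEY = "yarn.app.mapreduce.am.resource.mb";
    static final int DEFAULT_AM_MEMORY_MB = 1536;
    static final String AM_VCORES_KEY = "yarn.app.mapreduce.am.resource.cpu-vcores";
    static final int DEFAULT_AM_VCORES = 1;

    // Returns the configured integer, or the default when the key is absent.
    static int getInt(Properties conf, String key, int defaultValue) {
        String value = conf.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Simulates a value supplied by the user, e.g. through a -D option.
        conf.setProperty(AM_MEMORY_MB_KEY, "2048");
        System.out.println("AM memory (MB): "
            + getInt(conf, AM_MEMORY_MB_KEY, DEFAULT_AM_MEMORY_MB));
        System.out.println("AM vcores: "
            + getInt(conf, AM_VCORES_KEY, DEFAULT_AM_VCORES));
    }
}
```

In this sketch the memory comes from the user-supplied value while the vcores fall back to the default, which is the same split the real code produces when only one of the two settings is overridden.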
Path jobConfPath = new Path(jobSubmitDir, MRJobConfig.JOB_CONF_FILE);
This is the working directory for the submitted files. Its origin can be traced back to the getStagingDir method in JobSubmissionFiles:
/**
 * Initializes the staging directory and returns the path. It also
 * keeps track of all necessary ownership & permissions
 * @param cluster
 * @param conf
 */
public static Path getStagingDir(Cluster cluster, Configuration conf)
    throws IOException,InterruptedException {
The rest of the logic can be found in JobSubmitter. Roughly, it resolves the path of the directory that holds the files for this submission.
localResources.put(
    MRJobConfig.JOB_CONF_FILE,
    createApplicationResource(defaultFileContext, jobConfPath,
        LocalResourceType.FILE));
Typically the working directory is on HDFS, and its files are registered in localResources, which is just a Map, so there is not much more to say.
// Setup the command to run the AM
List<String> vargs = new ArrayList<String>(8);
vargs.add(MRApps.crossPlatformifyMREnv(jobConf, Environment.JAVA_HOME)
    + "/bin/java");
// TODO: why do we use 'conf' some places and 'jobConf' others?
long logSize = jobConf.getLong(MRJobConfig.MR_AM_LOG_KB,
    MRJobConfig.DEFAULT_MR_AM_LOG_KB) << 10;
String logLevel = jobConf.get(MRJobConfig.MR_AM_LOG_LEVEL,
    MRJobConfig.DEFAULT_MR_AM_LOG_LEVEL);
int numBackups = jobConf.getInt(MRJobConfig.MR_AM_LOG_BACKUPS,
    MRJobConfig.DEFAULT_MR_AM_LOG_BACKUPS);
MRApps.addLog4jSystemProperties(logLevel, logSize, numBackups, vargs,
    conf);
// Check for Java Lib Path usage in MAP and REDUCE configs
warnForJavaLibPath(conf.get(MRJobConfig.MAP_JAVA_OPTS, ""), "map",
    MRJobConfig.MAP_JAVA_OPTS, MRJobConfig.MAP_ENV);
warnForJavaLibPath(
    conf.get(MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS, ""), "map",
    MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS,
    MRJobConfig.MAPRED_ADMIN_USER_ENV);
warnForJavaLibPath(conf.get(MRJobConfig.REDUCE_JAVA_OPTS, ""),
    "reduce", MRJobConfig.REDUCE_JAVA_OPTS, MRJobConfig.REDUCE_ENV);
warnForJavaLibPath(
    conf.get(MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS, ""),
    "reduce", MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS,
    MRJobConfig.MAPRED_ADMIN_USER_ENV);
// Add AM admin command opts before user command opts
// so that it can be overridden by user
String mrAppMasterAdminOptions = conf.get(
    MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS,
    MRJobConfig.DEFAULT_MR_AM_ADMIN_COMMAND_OPTS);
warnForJavaLibPath(mrAppMasterAdminOptions, "app master",
    MRJobConfig.MR_AM_ADMIN_COMMAND_OPTS,
    MRJobConfig.MR_AM_ADMIN_USER_ENV);
vargs.add(mrAppMasterAdminOptions);
// Add AM user command opts
String mrAppMasterUserOptions = conf.get(
    MRJobConfig.MR_AM_COMMAND_OPTS,
    MRJobConfig.DEFAULT_MR_AM_COMMAND_OPTS);
warnForJavaLibPath(mrAppMasterUserOptions, "app master",
    MRJobConfig.MR_AM_COMMAND_OPTS, MRJobConfig.MR_AM_ENV);
vargs.add(mrAppMasterUserOptions);
This block uses conf and the job's own conf to set up the launch parameters for the ApplicationMaster. One line in particular stands out:
vargs.add(MRJobConfig.APPLICATION_MASTER_CLASS);
What gets added here is org.apache.hadoop.mapreduce.v2.app.MRAppMaster.
This ties back to the ApplicationMaster mentioned in Part 1: every submitted job has its own ApplicationMaster. When launching a program, note this statement:
LOG.debug("Command to launch container for ApplicationMaster is : "
    + mergedCommand);
With debug logging enabled, the log shows the command that is executed.
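As a rough illustration of how such a command line comes together, the sketch below collects the pieces in order (java binary, a log-size property, admin options, user options, main class) and joins them with spaces. It is a simplified stand-in, not YARNRunner's code: `buildCommand` and the `-Dexample.log.size` property name are hypothetical. It does keep two details from the original: `<< 10` converts kilobytes to bytes, and admin options precede user options so that a repeated user flag can override the admin one.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a simplified version of the AM command assembly.
final class AmCommandSketch {
    static String buildCommand(String javaHome, long logSizeKb,
            String adminOpts, String userOpts, String mainClass) {
        List<String> vargs = new ArrayList<>(8);
        vargs.add(javaHome + "/bin/java");
        // Shifting left by 10 multiplies by 1024, i.e. KB to bytes.
        long logSizeBytes = logSizeKb << 10;
        // Hypothetical property name, used here only to show where the value goes.
        vargs.add("-Dexample.log.size=" + logSizeBytes);
        // Admin options first, user options second, so the user's setting
        // appears later on the command line and can take precedence.
        vargs.add(adminOpts);
        vargs.add(userOpts);
        vargs.add(mainClass);
        // Merge all arguments into a single command string.
        return String.join(" ", vargs);
    }

    public static void main(String[] args) {
        System.out.println(buildCommand("$JAVA_HOME", 4, "-Xmx512m", "-Xmx1024m",
            "org.apache.hadoop.mapreduce.v2.app.MRAppMaster"));
    }
}
```

The merged string is the kind of value that ends up in the debug log line above and is later handed to the container launch context as the command to run.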
ContainerLaunchContext amContainer = ContainerLaunchContext
    .newInstance(localResources, environment, vargsFinal, null,
        securityTokens, acls);
This deserves a close look. Let's examine this class and this method:
/**
 * <p><code>ContainerLaunchContext</code> represents all of the information
 * needed by the <code>NodeManager</code> to launch a container.</p>
 *
 * <p>It includes details such as:
 *   <ul>
 *     <li>{@link ContainerId} of the container.</li>
 *     <li>{@link Resource} allocated to the container.</li>
 *     <li>User to whom the container is allocated.</li>
 *     <li>Security tokens (if security is enabled).</li>
 *     <li>
 *       {@link LocalResource} necessary for running the container such
 *       as binaries, jar, shared-objects, side-files etc.
 *     </li>
 *     <li>Optional, application-specific binary service data.</li>
 *     <li>Environment variables for the launched process.</li>
 *     <li>Command to launch the container.</li>
 *   </ul>
 * </p>
 *
 * @see ContainerManagementProtocol#startContainers(org.apache.hadoop.yarn.api.protocolrecords.StartContainersRequest)
 */
@Public
@Stable
public abstract class ContainerLaunchContext
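As the class comment says, it carries all the information a NodeManager needs to launch a Container, and here that Container is the one that brings up the ApplicationMaster.
Seeing this code makes the idea of dynamically carved-out Containers easier to grasp: a Container's resources are occupied by launching a service in it. We start an ApplicationMaster, and that is what occupies the resources.
Finally, having built the launch parameters for our own ApplicationMaster and defined a Container, the method returns an ApplicationSubmissionContext.
That ApplicationSubmissionContext is then submitted to the RM over RPC: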
@Override
public SubmitApplicationResponse submitApplication(
    SubmitApplicationRequest request) throws YarnException,
    IOException {
  SubmitApplicationRequestProto requestProto =
      ((SubmitApplicationRequestPBImpl) request).getProto();
  try {
    return new SubmitApplicationResponsePBImpl(proxy.submitApplication(null,
        requestProto));
  } catch (ServiceException e) {
    RPCUtil.unwrapAndThrowException(e);
    return null;
  }
}
The code is clear. Now let's see how the RM side handles this ApplicationSubmissionContext.
@Override
public SubmitApplicationResponse submitApplication(
    SubmitApplicationRequest request) throws YarnException {
  ApplicationSubmissionContext submissionContext = request
      .getApplicationSubmissionContext();
  ApplicationId applicationId = submissionContext.getApplicationId();
  // ApplicationSubmissionContext needs to be validated for safety - only
  // those fields that are independent of the RM's configuration will be
  // checked here, those that are dependent on RM configuration are validated
  // in RMAppManager.
  String user = null;
  try {
    // Safety
    user = UserGroupInformation.getCurrentUser().getShortUserName();
  } catch (IOException ie) {
    LOG.warn("Unable to get the current user.", ie);
    RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
        ie.getMessage(), "ClientRMService",
        "Exception in submitting application", applicationId);
    throw RPCUtil.getRemoteException(ie);
  }
  // Check whether app has already been put into rmContext,
  // If it is, simply return the response
  if (rmContext.getRMApps().get(applicationId) != null) {
    LOG.info("This is an earlier submitted application: " + applicationId);
    return SubmitApplicationResponse.newInstance();
  }
  if (submissionContext.getQueue() == null) {
    submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
  }
  if (submissionContext.getApplicationName() == null) {
    submissionContext.setApplicationName(
        YarnConfiguration.DEFAULT_APPLICATION_NAME);
  }
  if (submissionContext.getApplicationType() == null) {
    submissionContext
        .setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE);
  } else {
    if (submissionContext.getApplicationType().length() > YarnConfiguration.APPLICATION_TYPE_LENGTH) {
      submissionContext.setApplicationType(submissionContext
          .getApplicationType().substring(0,
              YarnConfiguration.APPLICATION_TYPE_LENGTH));
    }
  }
  try {
    // call RMAppManager to submit application directly
    rmAppManager.submitApplication(submissionContext,
        System.currentTimeMillis(), user);
    LOG.info("Application with id " + applicationId.getId() +
        " submitted by user " + user);
    RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
        "ClientRMService", applicationId);
  } catch (YarnException e) {
    LOG.info("Exception in submitting application with id " +
        applicationId.getId(), e);
    RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
        e.getMessage(), "ClientRMService",
        "Exception in submitting application", applicationId);
    throw e;
  }
  SubmitApplicationResponse response = recordFactory
      .newRecordInstance(SubmitApplicationResponse.class);
  return response;
}
After this series of checks, the part to focus on is the submission through rmAppManager.
RMAppManager keeps a record of every Application submitted to the RM.
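The checks that run before the rmAppManager call can be condensed into a small, Hadoop-free sketch: skip an application that is already known, fill in defaults for queue, name, and type, and truncate an over-long type. Everything here is a stand-in. `SubmitCheckSketch` and `Submission` are hypothetical, and the constant values (`default`, `N/A`, `YARN`, 20) are what I believe YarnConfiguration defines, so treat them as assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: mirrors the shape of the RM-side checks.
final class SubmitCheckSketch {
    // Assumed values of the YarnConfiguration defaults; verify for your version.
    static final String DEFAULT_QUEUE = "default";
    static final String DEFAULT_NAME = "N/A";
    static final String DEFAULT_TYPE = "YARN";
    static final int MAX_TYPE_LENGTH = 20;

    // Hypothetical, minimal stand-in for ApplicationSubmissionContext.
    static final class Submission {
        final String appId;
        String queue;
        String name;
        String type;

        Submission(String appId, String queue, String name, String type) {
            this.appId = appId;
            this.queue = queue;
            this.name = name;
            this.type = type;
        }
    }

    // Stand-in for the RM's map of known applications.
    private final Map<String, Submission> knownApps = new ConcurrentHashMap<>();

    // Returns true if the submission was accepted, false if it was a repeat.
    // (The real method returns an empty response for a repeat instead.)
    boolean submit(Submission s) {
        if (knownApps.containsKey(s.appId)) {
            return false;
        }
        if (s.queue == null) {
            s.queue = DEFAULT_QUEUE;
        }
        if (s.name == null) {
            s.name = DEFAULT_NAME;
        }
        if (s.type == null) {
            s.type = DEFAULT_TYPE;
        } else if (s.type.length() > MAX_TYPE_LENGTH) {
            s.type = s.type.substring(0, MAX_TYPE_LENGTH);
        }
        knownApps.put(s.appId, s);
        return true;
    }

    public static void main(String[] args) {
        SubmitCheckSketch rm = new SubmitCheckSketch();
        Submission s = new Submission("application_1400000000000_0001", null, null, null);
        System.out.println("accepted: " + rm.submit(s) + ", queue=" + s.queue
            + ", name=" + s.name + ", type=" + s.type);
        System.out.println("accepted again: " + rm.submit(s));
    }
}
```

The early return for a known id is what makes a retried submission harmless: the client can resend the same context after a failover without creating a second application.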
As for exactly how the RM talks to the NM and gets the ApplicationMaster's Container started on the NM, that is a topic for the next post.