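Overview
YarnClient communicates with the ResourceManager through ApplicationClientProtocol. Over this protocol it submits applications to the RM, queries application status, and controls applications (for example, kills them). Inside the ResourceManager, the component that talks to YarnClient is ClientRMService. For calls such as submitApplication, ClientRMService hands the request off to RMAppManager.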
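Submitting an application with YarnClient
submitApplication() submits an application to YARN. It is a blocking call: it returns the ApplicationId only after the application has been successfully submitted to and accepted by the ResourceManager.
After sending the request, it internally calls ApplicationClientProtocol#getApplicationReport() in a loop and waits until the application has been submitted successfully. If the RM fails over or restarts before it has saved the application's state, getApplicationReport() throws ApplicationNotFoundException, and submitApplication() resubmits the application with the same ApplicationSubmissionContext.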
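The submit, poll, and resubmit behaviour described above can be illustrated with a small self-contained sketch. It does not use any Hadoop classes: FakeResourceManager, the string states, and the use of IllegalStateException in place of ApplicationNotFoundException are all hypothetical stand-ins, and the sleep and timeout handling of the real client are left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the RM; not a Hadoop class. */
class FakeResourceManager {
  private final Map<String, String> apps = new HashMap<>();
  // Simulates one failover that happens before the application state is saved.
  private boolean loseNextSubmission = true;
  int submitCalls = 0;

  void submit(String appId) {
    submitCalls++;
    if (loseNextSubmission) {
      loseNextSubmission = false; // the first submission is forgotten
      return;
    }
    apps.put(appId, "NEW_SAVING"); // accepted, state not yet persisted
  }

  /** Returns the current state, then advances it; throws if the app is unknown. */
  String report(String appId) {
    String state = apps.get(appId);
    if (state == null) {
      throw new IllegalStateException("application not found: " + appId);
    }
    apps.put(appId, "SUBMITTED");
    return state;
  }
}

class SubmitPollSketch {
  /** Submits, then polls until the app has left NEW/NEW_SAVING, resubmitting if needed. */
  static String submitAndWait(FakeResourceManager rm, String appId) {
    rm.submit(appId);
    while (true) {
      try {
        String state = rm.report(appId);
        if (!state.equals("NEW") && !state.equals("NEW_SAVING")) {
          return appId;
        }
      } catch (IllegalStateException notFound) {
        rm.submit(appId); // resend the same request
      }
    }
  }

  public static void main(String[] args) {
    FakeResourceManager rm = new FakeResourceManager();
    System.out.println(submitAndWait(rm, "app_1")
        + " accepted after " + rm.submitCalls + " submit calls");
  }
}
```

In this sketch the first submission is lost, so the caller only gets its id back after the second submit call has been accepted.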
YarnClient interface
/**
* <p>
* Submit a new application to <code>YARN.</code> It is a blocking call - it
* will not return {@link ApplicationId} until the submitted application is
* submitted successfully and accepted by the ResourceManager.
* </p>
*
* <p>
* Users should provide an {@link ApplicationId} as part of the parameter
* {@link ApplicationSubmissionContext} when submitting a new application,
* otherwise it will throw the {@link ApplicationIdNotProvidedException}.
* </p>
*
* <p>This internally calls {@link ApplicationClientProtocol#submitApplication
* (SubmitApplicationRequest)}, and after that, it internally invokes
* {@link ApplicationClientProtocol#getApplicationReport
* (GetApplicationReportRequest)} and waits till it can make sure that the
* application gets properly submitted. If RM fails over or RM restart
* happens before ResourceManager saves the application's state,
* {@link ApplicationClientProtocol
* #getApplicationReport(GetApplicationReportRequest)} will throw
* the {@link ApplicationNotFoundException}. This API automatically resubmits
* the application with the same {@link ApplicationSubmissionContext} when it
* catches the {@link ApplicationNotFoundException}</p>
*
* @param appContext
* {@link ApplicationSubmissionContext} containing all the details
* needed to submit a new application
* @return {@link ApplicationId} of the accepted application
* @throws YarnException
* @throws IOException
* @see #createApplication()
*/
public abstract ApplicationId submitApplication(
ApplicationSubmissionContext appContext) throws YarnException,
IOException;
YarnClientImpl
// The submission request is sent through ApplicationClientProtocol
protected ApplicationClientProtocol rmClient;
@Override
public ApplicationId
submitApplication(ApplicationSubmissionContext appContext)
throws YarnException, IOException {
ApplicationId applicationId = appContext.getApplicationId();
if (applicationId == null) {
throw new ApplicationIdNotProvidedException(
"ApplicationId is not provided in ApplicationSubmissionContext");
}
SubmitApplicationRequest request =
Records.newRecord(SubmitApplicationRequest.class);
request.setApplicationSubmissionContext(appContext);
// Automatically add the timeline DT into the CLC
// Only when the security and the timeline service are both enabled
if (isSecurityEnabled() && timelineServiceEnabled) {
addTimelineDelegationToken(appContext.getAMContainerSpec());
}
//TODO: YARN-1763:Handle RM failovers during the submitApplication call.
rmClient.submitApplication(request);
int pollCount = 0;
long startTime = System.currentTimeMillis();
while (true) {
try {
YarnApplicationState state =
getApplicationReport(applicationId).getYarnApplicationState();
if (!state.equals(YarnApplicationState.NEW) &&
!state.equals(YarnApplicationState.NEW_SAVING)) {
LOG.info("Submitted application " + applicationId);
break;
}
long elapsedMillis = System.currentTimeMillis() - startTime;
if (enforceAsyncAPITimeout() &&
elapsedMillis >= asyncApiPollTimeoutMillis) {
throw new YarnException("Timed out while waiting for application " +
applicationId + " to be submitted successfully");
}
// Notify the client through the log every 10 poll, in case the client
// is blocked here too long.
if (++pollCount % 10 == 0) {
LOG.info("Application submission is not finished, " +
"submitted application " + applicationId +
" is still in " + state);
}
try {
Thread.sleep(submitPollIntervalMillis);
} catch (InterruptedException ie) {
LOG.error("Interrupted while waiting for application "
+ applicationId
+ " to be successfully submitted.");
}
} catch (ApplicationNotFoundException ex) {
// FailOver or RM restart happens before RMStateStore saves
// ApplicationState
LOG.info("Re-submit application " + applicationId + "with the " +
"same ApplicationSubmissionContext");
rmClient.submitApplication(request);
}
}
return applicationId;
}
How ClientRMService handles submitApplication
ClientRMService implements ApplicationClientProtocol and handles all RPC requests coming from YarnClient.
For submitApplication, it calls RMAppManager to submit the application.
@Override
public SubmitApplicationResponse submitApplication(
SubmitApplicationRequest request) throws YarnException, IOException {
ApplicationSubmissionContext submissionContext = request
.getApplicationSubmissionContext();
ApplicationId applicationId = submissionContext.getApplicationId();
...................
// Check whether app has already been put into rmContext,
// If it is, simply return the response
if (rmContext.getRMApps().get(applicationId) != null) {
LOG.info("This is an earlier submitted application: " + applicationId);
return SubmitApplicationResponse.newInstance();
}
....................
try {
// call RMAppManager to submit application directly
rmAppManager.submitApplication(submissionContext,
System.currentTimeMillis(), user);
LOG.info("Application with id " + applicationId.getId() +
" submitted by user " + user);
RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
"ClientRMService", applicationId, callerContext);
} catch (YarnException e) {
LOG.info("Exception in submitting " + applicationId, e);
RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
e.getMessage(), "ClientRMService",
"Exception in submitting application", applicationId, callerContext);
throw e;
}
return recordFactory
.newRecordInstance(SubmitApplicationResponse.class);
}
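How RMAppManager handles submitApplication
RMAppManager creates an RMAppImpl from the ApplicationSubmissionContext, which carries the application's information, and stores it in RMContext.
The START event for the application is then handed to the EventHandler of the Dispatcher held by RMContext. In the code below this is the branch taken when security is disabled; with security enabled, the application is first passed to the DelegationTokenRenewer.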
protected void submitApplication(
ApplicationSubmissionContext submissionContext, long submitTime,
String user) throws YarnException {
ApplicationId applicationId = submissionContext.getApplicationId();
// Passing start time as -1. It will be eventually set in RMAppImpl
// constructor.
// Create the RMAppImpl and store it in RMContext
RMAppImpl application = createAndPopulateNewRMApp(
submissionContext, submitTime, user, false, -1);
try {
if (UserGroupInformation.isSecurityEnabled()) {
this.rmContext.getDelegationTokenRenewer()
.addApplicationAsync(applicationId,
BuilderUtils.parseCredentials(submissionContext),
submissionContext.getCancelTokensWhenComplete(),
application.getUser(),
BuilderUtils.parseTokensConf(submissionContext));
} else {
// Dispatcher is not yet started at this time, so these START events
// enqueued should be guaranteed to be first processed when dispatcher
// gets started.
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId, RMAppEventType.START));
}
} catch (Exception e) {
LOG.warn("Unable to parse credentials for " + applicationId, e);
// Sending APP_REJECTED is fine, since we assume that the
// RMApp is in NEW state and thus we haven't yet informed the
// scheduler about the existence of the application
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId,
RMAppEventType.APP_REJECTED, e.getMessage()));
throw RPCUtil.getRemoteException(e);
}
}
Creating the RMApp (RMAppImpl)
// The RMApp for the submitted applicationId is stored in RMContext
private final RMContext rmContext;
private RMAppImpl createAndPopulateNewRMApp(
ApplicationSubmissionContext submissionContext, long submitTime,
String user, boolean isRecovery, long startTime) throws YarnException {
.......................
// Create RMApp
RMAppImpl application =
new RMAppImpl(applicationId, rmContext, this.conf,
submissionContext.getApplicationName(), user,
submissionContext.getQueue(),
submissionContext, this.scheduler, this.masterService,
submitTime, submissionContext.getApplicationType(),
submissionContext.getApplicationTags(), amReqs, placementContext,
startTime);
// Concurrent app submissions with same applicationId will fail here
// Concurrent app submissions with different applicationIds will not
// influence each other
// Store the RMAppImpl in RMContext
if (rmContext.getRMApps().putIfAbsent(applicationId, application) !=
null) {
String message = "Application with id " + applicationId
+ " is already present! Cannot add a duplicate!";
LOG.warn(message);
throw new YarnException(message);
}
if (YarnConfiguration.timelineServiceV2Enabled(conf)) {
// Start timeline collector for the submitted app
application.startTimelineCollector();
}
// Inform the ACLs Manager
this.applicationACLsManager.addApplication(applicationId,
submissionContext.getAMContainerSpec().getApplicationACLs());
return application;
}
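The duplicate check in createAndPopulateNewRMApp relies on putIfAbsent being atomic: only the first registration of a given id succeeds. A minimal sketch of that idea follows, assuming a plain ConcurrentHashMap as a stand-in for rmContext.getRMApps() and a String in place of RMAppImpl; the class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class DuplicateSubmissionSketch {
  // Stand-in for rmContext.getRMApps(); the String value replaces RMAppImpl.
  private final ConcurrentMap<String, String> apps = new ConcurrentHashMap<>();

  /** Registers the app, or throws if the same id has already been registered. */
  void register(String appId, String app) {
    // putIfAbsent returns the previous value, or null if the key was absent.
    if (apps.putIfAbsent(appId, app) != null) {
      throw new IllegalStateException(
          "Application with id " + appId + " is already present!");
    }
  }

  /** Convenience wrapper for the demo: true if the registration succeeded. */
  boolean tryRegister(String appId, String app) {
    try {
      register(appId, app);
      return true;
    } catch (IllegalStateException duplicate) {
      return false;
    }
  }

  public static void main(String[] args) {
    DuplicateSubmissionSketch sketch = new DuplicateSubmissionSketch();
    System.out.println(sketch.tryRegister("app_1", "first"));  // true
    System.out.println(sketch.tryRegister("app_1", "second")); // false
    System.out.println(sketch.tryRegister("app_2", "other"));  // true
  }
}
```

Because the check and the insert happen in one atomic call, two threads racing on the same id cannot both succeed, while submissions with different ids do not interfere with each other.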
Source analysis of rmContext.getDispatcher().getEventHandler().handle(new RMAppEvent(applicationId, RMAppEventType.START))
The Dispatcher associated with RMContext
RMContext uses an AsyncDispatcher.
protected RMContextImpl rmContext;
private Dispatcher rmDispatcher;
@Override
protected void serviceInit(Configuration conf) throws Exception {
.............
// register the handlers for all AlwaysOn services using setupDispatcher().
rmDispatcher = setupDispatcher();
addIfService(rmDispatcher);
rmContext.setDispatcher(rmDispatcher);
.............
}
/**
* Register the handlers for alwaysOn services
*/
private Dispatcher setupDispatcher() {
Dispatcher dispatcher = createDispatcher();
dispatcher.register(RMFatalEventType.class,
new ResourceManager.RMFatalEventDispatcher());
return dispatcher;
}
protected Dispatcher createDispatcher() {
return new AsyncDispatcher("RM Event dispatcher");
}
AsyncDispatcher registers event types together with the handler for each type
Event types and their handlers are stored in the eventDispatchers map.
protected final Map<Class<? extends Enum>, EventHandler> eventDispatchers;
public void register(Class<? extends Enum> eventType, EventHandler handler) {
EventHandler<Event> registeredHandler = (EventHandler)this.eventDispatchers.get(eventType);
LOG.info("Registering " + eventType + " for " + handler.getClass());
if (registeredHandler == null) {
this.eventDispatchers.put(eventType, handler);
} else {
AsyncDispatcher.MultiListenerHandler multiHandler;
if (!(registeredHandler instanceof AsyncDispatcher.MultiListenerHandler)) {
multiHandler = new AsyncDispatcher.MultiListenerHandler();
multiHandler.addHandler(registeredHandler);
multiHandler.addHandler(handler);
this.eventDispatchers.put(eventType, multiHandler);
} else {
multiHandler = (AsyncDispatcher.MultiListenerHandler)registeredHandler;
multiHandler.addHandler(handler);
}
}
}
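The register() method above keys handlers by the enum class of the event type and chains handlers when more than one is registered for the same type. A simplified sketch of that registry follows. It is not the Hadoop implementation: events are represented directly by enum constants, a plain list plays the role of MultiListenerHandler, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class HandlerRegistrySketch {
  enum AppEventType { START, KILL }

  // One entry per event-type enum class; the list stands in for MultiListenerHandler.
  private final Map<Class<?>, List<Consumer<Enum<?>>>> handlers = new HashMap<>();

  /** Adds a handler; a second handler for the same type is appended, not replaced. */
  void register(Class<? extends Enum<?>> eventType, Consumer<Enum<?>> handler) {
    handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
  }

  /** Delivers the event to every handler of its enum class; returns how many ran. */
  int dispatch(Enum<?> event) {
    List<Consumer<Enum<?>>> registered = handlers.get(event.getDeclaringClass());
    if (registered == null) {
      throw new IllegalStateException(
          "No handler registered for " + event.getDeclaringClass());
    }
    for (Consumer<Enum<?>> handler : registered) {
      handler.accept(event);
    }
    return registered.size();
  }

  /** Registers two handlers for one type and records the order in which they run. */
  static List<String> demo() {
    HandlerRegistrySketch registry = new HandlerRegistrySketch();
    List<String> seen = new ArrayList<>();
    registry.register(AppEventType.class, e -> seen.add("first:" + e));
    registry.register(AppEventType.class, e -> seen.add("second:" + e));
    registry.dispatch(AppEventType.START);
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Both handlers receive the START event, in registration order.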
RMActiveServices, an inner class of ResourceManager, registers event types with the AsyncDispatcher
@Override
protected void serviceInit(Configuration configuration) throws Exception {
rmDispatcher.register(SchedulerEventType.class, schedulerDispatcher);
// Register event handler for RmAppEvents
rmDispatcher.register(RMAppEventType.class,
new ApplicationEventDispatcher(rmContext));
// Register event handler for RmAppAttemptEvents
rmDispatcher.register(RMAppAttemptEventType.class,
new ApplicationAttemptEventDispatcher(rmContext));
// Register event handler for RmNodes
rmDispatcher.register(
RMNodeEventType.class, new NodeEventDispatcher(rmContext));
rmDispatcher.register(RMAppManagerEventType.class, rmAppManager);
rmDispatcher.register(AMLauncherEventType.class,
applicationMasterLauncher);
}
How AsyncDispatcher handles RMAppEventType events
The EventHandler returned by AsyncDispatcher: GenericEventHandler
GenericEventHandler.handle() does nothing more than put the event into eventQueue.
private EventHandler handlerInstance;
private final BlockingQueue<Event> eventQueue;
public EventHandler getEventHandler() {
if (this.handlerInstance == null) {
this.handlerInstance = new AsyncDispatcher.GenericEventHandler();
}
return this.handlerInstance;
}
class GenericEventHandler implements EventHandler<Event> {
GenericEventHandler() {
}
public void handle(Event event) {
if (!AsyncDispatcher.this.blockNewEvents) {
AsyncDispatcher.this.drained = false;
int qSize = AsyncDispatcher.this.eventQueue.size();
if (qSize != 0 && qSize % 1000 == 0) {
AsyncDispatcher.LOG.info("Size of event-queue is " + qSize);
}
int remCapacity = AsyncDispatcher.this.eventQueue.remainingCapacity();
if (remCapacity < 1000) {
AsyncDispatcher.LOG.warn("Very low remaining capacity in the event-queue: " + remCapacity);
}
try {
AsyncDispatcher.this.eventQueue.put(event);
} catch (InterruptedException var5) {
if (!AsyncDispatcher.this.stopped) {
AsyncDispatcher.LOG.warn("AsyncDispatcher thread interrupted", var5);
}
throw new YarnRuntimeException(var5);
}
}
}
}
AsyncDispatcher's background thread takes events from eventQueue and processes them
Creating and starting the background thread
private Thread eventHandlingThread;
@Override
protected void serviceStart() throws Exception {
//start all the components
super.serviceStart();
eventHandlingThread = new Thread(createThread());
eventHandlingThread.setName("AsyncDispatcher event handler");
eventHandlingThread.start();
}
The background thread takes each event from eventQueue and dispatches it to the matching EventHandler
Runnable createThread() {
return new Runnable() {
@Override
public void run() {
while (!stopped && !Thread.currentThread().isInterrupted()) {
drained = eventQueue.isEmpty();
// blockNewEvents is only set when dispatcher is draining to stop,
// adding this check is to avoid the overhead of acquiring the lock
// and calling notify every time in the normal run of the loop.
if (blockNewEvents) {
synchronized (waitForDrained) {
if (drained) {
waitForDrained.notify();
}
}
}
Event event;
try {
// Take the next event from eventQueue
event = eventQueue.take();
} catch(InterruptedException ie) {
if (!stopped) {
LOG.warn("AsyncDispatcher thread interrupted", ie);
}
return;
}
if (event != null) {
// Dispatch the event
dispatch(event);
}
}
}
};
}
The eventDispatchers map in AsyncDispatcher records the EventHandler for each event type. dispatch() looks up the handler registered for the event's type and passes the event to it.
@SuppressWarnings("unchecked")
protected void dispatch(Event event) {
//all events go thru this loop
if (LOG.isDebugEnabled()) {
LOG.debug("Dispatching the event " + event.getClass().getName() + "."
+ event.toString());
}
Class<? extends Enum> type = event.getType().getDeclaringClass();
try{
// Look up the EventHandler for this event type in eventDispatchers
EventHandler handler = eventDispatchers.get(type);
if(handler != null) {
// The handler dedicated to this event type processes the event
handler.handle(event);
} else {
throw new Exception("No handler for registered for " + type);
}
} catch (Throwable t) {
//TODO Maybe log the state of the queue
LOG.fatal("Error in dispatcher thread", t);
// If serviceStop is called, we should exit this thread gracefully.
if (exitOnDispatchException
&& (ShutdownHookManager.get().isShutdownInProgress()) == false
&& stopped == false) {
Thread shutDownThread = new Thread(createShutDownThread());
shutDownThread.setName("AsyncDispatcher ShutDown handler");
shutDownThread.start();
}
}
}
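Taken together, the pieces above form a producer/consumer pipeline: handle() only enqueues, and a single background thread takes events off the queue and routes them by event type. The following sketch shows that shape under simplifying assumptions. Events are plain enum constants rather than Event objects with getType(), the draining and shutdown logic is omitted, and MiniAsyncDispatcher and its members are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class MiniAsyncDispatcher {
  enum AppEventType { START }

  private final BlockingQueue<Enum<?>> eventQueue = new LinkedBlockingQueue<>();
  private final Map<Class<?>, Consumer<Enum<?>>> eventDispatchers = new ConcurrentHashMap<>();
  private volatile boolean stopped = false;
  private Thread eventHandlingThread;

  void register(Class<? extends Enum<?>> type, Consumer<Enum<?>> handler) {
    eventDispatchers.put(type, handler);
  }

  /** Producer side: only enqueues the event, nothing is processed here. */
  void handle(Enum<?> event) {
    eventQueue.add(event);
  }

  /** Consumer side: one thread takes events and routes them by enum class. */
  void start() {
    eventHandlingThread = new Thread(() -> {
      while (!stopped && !Thread.currentThread().isInterrupted()) {
        try {
          Enum<?> event = eventQueue.take();
          Consumer<Enum<?>> handler = eventDispatchers.get(event.getDeclaringClass());
          if (handler != null) {
            handler.accept(event);
          }
        } catch (InterruptedException ie) {
          return; // interrupted while waiting: exit the loop
        }
      }
    }, "mini dispatcher");
    eventHandlingThread.setDaemon(true);
    eventHandlingThread.start();
  }

  void stop() {
    stopped = true;
    eventHandlingThread.interrupt();
  }

  /** Enqueues START before the thread starts, then waits for its delivery. */
  static boolean demo() {
    MiniAsyncDispatcher dispatcher = new MiniAsyncDispatcher();
    CountDownLatch delivered = new CountDownLatch(1);
    dispatcher.register(AppEventType.class, e -> delivered.countDown());
    dispatcher.handle(AppEventType.START); // queued, not yet processed
    dispatcher.start();
    try {
      return delivered.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      dispatcher.stop();
    }
  }

  public static void main(String[] args) {
    System.out.println("START delivered: " + demo());
  }
}
```

The demo enqueues the event before the thread is started, which mirrors the comment in RMAppManager that START events enqueued early are processed once the dispatcher gets started.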
How ApplicationEventDispatcher handles RMAppEvent
The EventHandler registered for RMAppEventType is ApplicationEventDispatcher.
ApplicationEventDispatcher looks up the RMAppImpl that was created earlier and stored in RMContext, and delegates the RMAppEvent to it.
@Private
public static final class ApplicationEventDispatcher implements
EventHandler<RMAppEvent> {
private final RMContext rmContext;
public ApplicationEventDispatcher(RMContext rmContext) {
this.rmContext = rmContext;
}
@Override
public void handle(RMAppEvent event) {
ApplicationId appID = event.getApplicationId();
RMApp rmApp = this.rmContext.getRMApps().get(appID);
if (rmApp != null) {
try {
rmApp.handle(event);
} catch (Throwable t) {
LOG.error("Error in handling event type " + event.getType()
+ " for application " + appID, t);
}
}
}
}
How RMAppImpl handles RMAppEvent
@Override
public void handle(RMAppEvent event) {
this.writeLock.lock();
try {
ApplicationId appID = event.getApplicationId();
LOG.debug("Processing event for " + appID + " of type "
+ event.getType());
final RMAppState oldState = getState();
try {
/* keep the master in sync with the state machine */
this.stateMachine.doTransition(event.getType(), event);
} catch (InvalidStateTransitionException e) {
LOG.error("App: " + appID
+ " can't handle this event at current state", e);
onInvalidStateTransition(event.getType(), oldState);
}
// Log at INFO if we're not recovering or not in a terminal state.
// Log at DEBUG otherwise.
if ((oldState != getState()) &&
(((recoveredFinalState == null)) ||
(event.getType() != RMAppEventType.RECOVER))) {
LOG.info(String.format(STATE_CHANGE_MESSAGE, appID, oldState,
getState(), event.getType()));
} else if ((oldState != getState()) && LOG.isDebugEnabled()) {
LOG.debug(String.format(STATE_CHANGE_MESSAGE, appID, oldState,
getState(), event.getType()));
}
} finally {
this.writeLock.unlock();
}
}
The stateMachine of RMAppImpl
Each RMAppImpl has its own stateMachine, which is initialized when the RMAppImpl is instantiated.
stateMachineFactory.make() creates an InternalStateMachine, which is an inner class of StateMachineFactory.
// RMAppImpl.java
private final StateMachine<RMAppState, RMAppEventType, RMAppEvent> stateMachine;
public RMAppImpl(ApplicationId applicationId, RMContext rmContext,
Configuration config, String name, String user, String queue,
ApplicationSubmissionContext submissionContext, YarnScheduler scheduler,
ApplicationMasterService masterService, long submitTime,
String applicationType, Set<String> applicationTags,
List<ResourceRequest> amReqs, ApplicationPlacementContext
placementContext, long startTime) {
...................
this.stateMachine = stateMachineFactory.make(this);
...................
}
// StateMachineFactory.java
public StateMachine<STATE, EVENTTYPE, EVENT> make(OPERAND operand) {
return new StateMachineFactory.InternalStateMachine(operand, this.defaultInitialState);
}
State transitions
private class InternalStateMachine implements StateMachine<STATE, EVENTTYPE, EVENT> {
private final OPERAND operand;
private STATE currentState;
InternalStateMachine(OPERAND operand, STATE initialState) {
this.operand = operand;
this.currentState = initialState;
if (!StateMachineFactory.this.optimized) {
StateMachineFactory.this.maybeMakeStateMachineTable();
}
}
public synchronized STATE getCurrentState() {
return this.currentState;
}
public synchronized STATE doTransition(EVENTTYPE eventType, EVENT event) throws InvalidStateTransitonException {
this.currentState = StateMachineFactory.this.doTransition(this.operand, this.currentState, eventType, event);
return this.currentState;
}
}
StateMachineFactory#doTransition() looks up the transition for the current state and event type and uses it to handle the RMAppEvent
/**
* Effect a transition due to the effecting stimulus.
* @param state current state
* @param eventType trigger to initiate the transition
* @param cause causal eventType context
* @return transitioned state
*/
private STATE doTransition
(OPERAND operand, STATE oldState, EVENTTYPE eventType, EVENT event)
throws InvalidStateTransitionException {
// We can assume that stateMachineTable is non-null because we call
// maybeMakeStateMachineTable() when we build an InnerStateMachine ,
// and this code only gets called from inside a working InnerStateMachine .
Map<EVENTTYPE, Transition<OPERAND, STATE, EVENTTYPE, EVENT>> transitionMap
= stateMachineTable.get(oldState);
if (transitionMap != null) {
Transition<OPERAND, STATE, EVENTTYPE, EVENT> transition
= transitionMap.get(eventType);
if (transition != null) {
return transition.doTransition(operand, oldState, event, eventType);
}
}
throw new InvalidStateTransitionException(oldState, eventType);
}
RMAppImpl registers the Transition that handles RMAppEventType.START
.addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
RMAppEventType.START, new RMAppNewlySavingTransition())
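The two-level lookup in doTransition (current state first, then event type) can be sketched with nested maps. This is a simplified illustration, not the StateMachineFactory code: each table entry stores only the target state, whereas the real transitions also carry a hook object such as RMAppNewlySavingTransition. The NEW_SAVING to SUBMITTED entry is included only as an assumed second row for the example, and all class names are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

class StateTableSketch {
  enum State { NEW, NEW_SAVING, SUBMITTED }
  enum EventType { START, APP_NEW_SAVED }

  // state -> (event type -> next state), mirroring the two-level table lookup.
  private final Map<State, Map<EventType, State>> table = new EnumMap<>(State.class);
  private State current = State.NEW;

  /** Adds one row to the table and returns this for chaining. */
  StateTableSketch addTransition(State from, State to, EventType on) {
    table.computeIfAbsent(from, k -> new EnumMap<>(EventType.class)).put(on, to);
    return this;
  }

  /** Moves to the next state, or throws if no row matches (state, event type). */
  synchronized State doTransition(EventType on) {
    Map<EventType, State> transitions = table.get(current);
    State next = (transitions == null) ? null : transitions.get(on);
    if (next == null) {
      throw new IllegalStateException("Invalid event: " + on + " at " + current);
    }
    current = next;
    return current;
  }

  /** Builds a table with the START row from the text plus one assumed row. */
  static StateTableSketch rmAppLike() {
    return new StateTableSketch()
        .addTransition(State.NEW, State.NEW_SAVING, EventType.START)
        .addTransition(State.NEW_SAVING, State.SUBMITTED, EventType.APP_NEW_SAVED);
  }

  public static void main(String[] args) {
    StateTableSketch machine = rmAppLike();
    System.out.println(machine.doTransition(EventType.START));         // NEW_SAVING
    System.out.println(machine.doTransition(EventType.APP_NEW_SAVED)); // SUBMITTED
  }
}
```

A second START sent while the sketch is in NEW_SAVING finds no matching row and is rejected, which corresponds to the InvalidStateTransitionException path in RMAppImpl#handle().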