Hadoop Source Code Analysis (4): ApplicationMasterLauncher Source Code Analysis 2021SC@SDUSC
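1. Introduction to ApplicationMasterLauncher
The previous chapter introduced ApplicationMaster. ApplicationMasterLauncher is one part of ApplicationMaster management, and its main job is to communicate with the NodeManager in order to start the ApplicationMaster. When a user submits an application to the YARN ResourceManager, the ResourceManager first asks the resource scheduler for the resources needed to start the ApplicationMaster. Once those resources have been granted, ApplicationMasterLauncher contacts the corresponding NodeManager, which then starts the application's ApplicationMaster.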
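2. ApplicationMasterLauncher Workflow
ApplicationMasterLauncher mainly handles two event types: AMLauncherEventType.LAUNCH and AMLauncherEventType.CLEANUP. It follows the producer-consumer pattern: for each AMLauncherEventType.LAUNCH event generated by AttemptStoredTransition, it builds an AMLauncher of the matching type and puts it into the masterEvents queue. ApplicationMasterLauncher maintains a thread pool so that events are processed as quickly as possible. On a LAUNCH event it contacts the corresponding NodeManager and asks it to start the ApplicationMaster; on a CLEANUP event it contacts the corresponding NodeManager and asks it to kill the ApplicationMaster.
When ApplicationMasterLauncher starts, it launches the launcherHandlingThread thread. This thread takes tasks from masterEvents and submits them to the thread pool ThreadPoolExecutor launcherPool, which asynchronously runs the launch or cleanup method of the corresponding AMLauncher.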
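The workflow above can be illustrated with a minimal, self-contained sketch that uses only the JDK. The class LauncherWorkflowSketch, its EventType enum, and the string attempt IDs are hypothetical stand-ins, not YARN types; the tasks only record what they were asked to do, where a real AMLauncher would contact the NodeManager.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the launcher; none of these types are YARN classes.
class LauncherWorkflowSketch {
  enum EventType { LAUNCH, CLEANUP }

  private final BlockingQueue<Runnable> masterEvents = new LinkedBlockingQueue<>();
  private final ExecutorService launcherPool = Executors.newFixedThreadPool(2);
  private final List<String> handled = new CopyOnWriteArrayList<>();
  private final Thread dispatcher = new Thread(this::dispatchLoop, "dispatcher");

  // Consumer side: block on the queue and hand each task to the pool.
  private void dispatchLoop() {
    try {
      while (true) {
        launcherPool.execute(masterEvents.take());
      }
    } catch (InterruptedException e) {
      // Interrupted while waiting: leave the loop.
    }
  }

  // Producer side: wrap the event in a Runnable and enqueue it.
  void handle(EventType type, String attemptId, CountDownLatch done) {
    masterEvents.add(() -> {
      handled.add(type + ":" + attemptId); // a real task would call the NodeManager here
      done.countDown();
    });
  }

  static List<String> demo() {
    LauncherWorkflowSketch s = new LauncherWorkflowSketch();
    s.dispatcher.start();
    CountDownLatch done = new CountDownLatch(2);
    s.handle(EventType.LAUNCH, "attempt_1", done);
    s.handle(EventType.CLEANUP, "attempt_2", done);
    try {
      done.await(5, TimeUnit.SECONDS);
      s.dispatcher.interrupt();
      s.dispatcher.join();
      s.launcherPool.shutdown();
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    }
    List<String> sorted = new ArrayList<>(s.handled);
    Collections.sort(sorted); // pool threads may finish in either order
    return sorted;
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The event handler only enqueues work and returns immediately, so a slow NodeManager call delays a pool thread rather than the component that delivers events.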
3. ApplicationMasterLauncher Source Code Analysis
3.1 Main fields
launcherPool: the worker thread pool
launcherHandlingThread: the dispatcher thread
masterEvents: a blocking queue
context: the RM's context information
// org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher.java
// Thread pool of ApplicationMasterLauncher
private ThreadPoolExecutor launcherPool;
// Dispatcher thread, the core thread of this component: it monitors the
// masterEvents queue and hands tasks to launcherPool for execution.
private LauncherThread launcherHandlingThread;
// Blocking queue
private final BlockingQueue<Runnable> masterEvents
    = new LinkedBlockingQueue<Runnable>();
// The RM's context information: RMContextImpl
protected final RMContext context;
3.2 Initialization method
The serviceInit method mainly initializes the worker thread pool launcherPool.
@Override
protected void serviceInit(Configuration conf) throws Exception {
  // Read the number of launcher threads (default: 50)
  int threadCount = conf.getInt(
      YarnConfiguration.RM_AMLAUNCHER_THREAD_COUNT,
      YarnConfiguration.DEFAULT_RM_AMLAUNCHER_THREAD_COUNT);
  ThreadFactory tf = new ThreadFactoryBuilder()
      .setNameFormat("ApplicationMasterLauncher #%d")
      .build();
  // Build launcherPool, the worker thread pool
  launcherPool = new ThreadPoolExecutor(threadCount, threadCount, 1,
      TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
  // Set the thread factory used by launcherPool
  launcherPool.setThreadFactory(tf);
  Configuration newConf = new YarnConfiguration(conf);
  // Set the number of connection retries to the NodeManager (default: 10)
  newConf.setInt(CommonConfigurationKeysPublic.
      IPC_CLIENT_CONNECT_MAX_RETRIES_ON_SOCKET_TIMEOUTS_KEY,
      conf.getInt(YarnConfiguration.RM_NODEMANAGER_CONNECT_RETRIES,
          YarnConfiguration.DEFAULT_RM_NODEMANAGER_CONNECT_RETRIES));
  setConfig(newConf);
  super.serviceInit(newConf);
}
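The pool setup in serviceInit relies on Guava's ThreadFactoryBuilder. As a rough illustration of the same configuration using only the JDK, the sketch below builds a pool whose core and maximum sizes are equal and whose threads are named after the same pattern. NamedPoolSketch and its method names are hypothetical, and the thread count is passed in directly instead of being read from a Configuration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; mirrors the shape of the pool built in serviceInit.
class NamedPoolSketch {
  static ThreadPoolExecutor newLauncherPool(int threadCount) {
    AtomicInteger seq = new AtomicInteger();
    // JDK-only replacement for ThreadFactoryBuilder.setNameFormat(...)
    ThreadFactory tf = r ->
        new Thread(r, "ApplicationMasterLauncher #" + seq.getAndIncrement());
    // core == max with an unbounded queue: at most threadCount workers,
    // extra tasks wait in the queue.
    return new ThreadPoolExecutor(threadCount, threadCount, 1,
        TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>(), tf);
  }

  // Runs one task and reports the name of the worker thread that ran it.
  static String firstWorkerName() {
    ThreadPoolExecutor pool = newLauncherPool(2);
    try {
      Callable<String> task = () -> Thread.currentThread().getName();
      return pool.submit(task).get(5, TimeUnit.SECONDS);
    } catch (Exception e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(firstWorkerName());
  }
}
```

Because the core and maximum sizes are equal, the one-hour keep-alive only takes effect if allowCoreThreadTimeOut(true) is also called on the pool; by default core threads stay alive while idle.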
3.3 Constructor
This constructor is called while the ResourceManager runs its serviceInit method. Its main job is to create the dispatcher thread launcherHandlingThread.
public ApplicationMasterLauncher(RMContext context) {
  super(ApplicationMasterLauncher.class.getName());
  this.context = context;
  // Create the dispatcher thread: launcherHandlingThread
  this.launcherHandlingThread = new LauncherThread();
}
3.4 Start method
The serviceStart method mainly starts the dispatcher thread launcherHandlingThread.
@Override
protected void serviceStart() throws Exception {
  // Start the dispatcher thread
  launcherHandlingThread.start();
  super.serviceStart();
}
3.5 run method
LauncherThread is the dispatcher thread. Its run method takes tasks from the masterEvents queue and hands them to the worker thread pool launcherPool for execution.
private class LauncherThread extends Thread {
  public LauncherThread() {
    super("ApplicationMaster Launcher");
  }
  @Override
  public void run() {
    while (!this.isInterrupted()) {
      Runnable toLaunch;
      try {
        // Take the next task from masterEvents (blocks while the queue is empty)
        toLaunch = masterEvents.take();
        // Hand the task to the thread pool that processes masterEvents entries
        launcherPool.execute(toLaunch);
      } catch (InterruptedException e) {
        LOG.warn(this.getClass().getName() + " interrupted. Returning.");
        return;
      }
    }
  }
}
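The loop above has two exit paths: the isInterrupted() check, and the InterruptedException thrown by take(). The following sketch, built only from JDK classes, shows that interrupting such a thread ends the loop whether or not it is currently blocked on an empty queue. InterruptibleLoopSketch is a hypothetical name, and the excerpt in this article does not show how the real service stops its thread, so this is only one plausible way to do it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo of stopping a take()-based loop by interruption.
class InterruptibleLoopSketch {
  static boolean stopsOnInterrupt() {
    BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    Thread loop = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          queue.take().run(); // blocks: the queue stays empty in this demo
        } catch (InterruptedException e) {
          return; // woken up by interrupt(): leave the loop
        }
      }
    }, "loop");
    loop.start();
    loop.interrupt(); // either the flag check or take() notices this
    try {
      loop.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !loop.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("loop stopped: " + stopsOnInterrupt());
  }
}
```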
3.6 handle method
The handle method processes AMLauncherEvent events and dispatches on the event type. There are two kinds of task: LAUNCH and CLEANUP.
@Override
public synchronized void handle(AMLauncherEvent appEvent) {
  AMLauncherEventType event = appEvent.getType();
  RMAppAttempt application = appEvent.getAppAttempt();
  switch (event) {
  // LAUNCH event: start the ApplicationMaster
  case LAUNCH:
    launch(application);
    break;
  // CLEANUP event: kill the ApplicationMaster
  case CLEANUP:
    cleanup(application);
    break;
  default:
    break;
  }
}
3.7 launch method
The launch method builds a task of type LAUNCH and adds it to masterEvents. The dispatcher thread later takes the task from the queue and hands it to the worker thread pool.
// Build the AMLauncher that the worker thread pool will execute
protected Runnable createRunnableLauncher(RMAppAttempt application,
    AMLauncherEventType event) {
  Runnable launcher =
      new AMLauncher(context, application, event, getConfig());
  return launcher;
}
private void launch(RMAppAttempt application) {
  // Build a task of type LAUNCH
  Runnable launcher = createRunnableLauncher(application,
      AMLauncherEventType.LAUNCH);
  // Add the task to the queue
  masterEvents.add(launcher);
}
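Because createRunnableLauncher is protected, a subclass can replace the task that gets queued, for example with a stub that never contacts a NodeManager. The sketch below illustrates that factory-method idea with hypothetical classes (Launcher, RecordingLauncher) and string attempt IDs; it merges the LAUNCH and CLEANUP paths into one branch for brevity, which is an assumption and not how the real class is laid out.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical classes illustrating the overridable factory method.
class FactoryMethodSketch {
  enum EventType { LAUNCH, CLEANUP }

  static class Launcher {
    final BlockingQueue<Runnable> masterEvents = new LinkedBlockingQueue<>();

    // Factory method: subclasses decide what kind of task is queued.
    protected Runnable createRunnableLauncher(String attemptId, EventType type) {
      return () -> System.out.println(type + " " + attemptId
          + " (a real task would contact the NodeManager)");
    }

    public synchronized void handle(EventType type, String attemptId) {
      switch (type) {
      case LAUNCH:
      case CLEANUP:
        masterEvents.add(createRunnableLauncher(attemptId, type));
        break;
      default:
        break;
      }
    }
  }

  // Stub variant that records events without doing any remote call.
  static class RecordingLauncher extends Launcher {
    final StringBuilder log = new StringBuilder();

    @Override
    protected Runnable createRunnableLauncher(String attemptId, EventType type) {
      return () -> log.append(type).append(':').append(attemptId).append(';');
    }
  }

  static String demo() {
    RecordingLauncher launcher = new RecordingLauncher();
    launcher.handle(EventType.LAUNCH, "attempt_1");
    launcher.handle(EventType.CLEANUP, "attempt_1");
    Runnable task;
    while ((task = launcher.masterEvents.poll()) != null) {
      task.run(); // drain the queue on the calling thread, in FIFO order
    }
    return launcher.log.toString();
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```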
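3.8 AMLauncher
Once an AMLauncher has been built, it is handed to the worker thread pool for execution. It is responsible for handling the LAUNCH and CLEANUP events.
3.8.1 AMLauncher communication protocol
ContainerManagementProtocol is the communication protocol between the AM and the NM. Through this RPC protocol the AM asks the NM to start or stop Containers and queries the status of each Container. AMLauncher uses the same protocol to start and stop the AM's own Container.
The protocol mainly provides three RPC functions:
(1) startContainers: the AM uses this RPC to ask the NM to start Containers. It takes a StartContainersRequest, which wraps one StartContainerRequest per Container; each of those carries the local resources, environment variables, commands, Token, and other information needed to start the Container. The function returns a StartContainersResponse object that reports which Containers started successfully.
@Public
@Stable
StartContainersResponse startContainers(StartContainersRequest request)
    throws YarnException, IOException;
(2) stopContainers: the AM uses this RPC to ask the NM to kill Containers. It takes a StopContainersRequest that lists the ContainerIDs to kill and returns a StopContainersResponse object that reports which Containers were stopped successfully.
@Public
@Stable
StopContainersResponse stopContainers(StopContainersRequest request)
    throws YarnException, IOException;
(3) getContainerStatuses: the AM uses this RPC to query the running status of Containers. It takes a GetContainerStatusesRequest that wraps the IDs of the target Containers and returns a GetContainerStatusesResponse object that wraps their current status.
@Public
@Stable
GetContainerStatusesResponse getContainerStatuses(
    GetContainerStatusesRequest request) throws YarnException,
    IOException;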
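To make the batch style of the three RPCs concrete, here is a heavily simplified, in-memory sketch. Every type in it (ContainerManagement, BatchResponse, InMemoryNodeManager) is hypothetical, and plain strings stand in for container IDs, launch contexts, and statuses. The only point it carries is the shape: each call takes several containers at once and reports successes and per-container failures separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of a batch container-management protocol.
class ContainerProtocolSketch {
  // Successes and per-container failures are reported side by side.
  static class BatchResponse {
    final List<String> succeeded = new ArrayList<>();
    final Map<String, String> failed = new LinkedHashMap<>();
  }

  interface ContainerManagement {
    BatchResponse startContainers(List<String> containerIds);
    BatchResponse stopContainers(List<String> containerIds);
    Map<String, String> getContainerStatuses(List<String> containerIds);
  }

  // In-memory stand-in for a NodeManager; no RPC is involved.
  static class InMemoryNodeManager implements ContainerManagement {
    private final Set<String> running = new HashSet<>();

    public BatchResponse startContainers(List<String> containerIds) {
      BatchResponse response = new BatchResponse();
      for (String id : containerIds) {
        if (running.add(id)) {
          response.succeeded.add(id);
        } else {
          response.failed.put(id, "already running");
        }
      }
      return response;
    }

    public BatchResponse stopContainers(List<String> containerIds) {
      BatchResponse response = new BatchResponse();
      for (String id : containerIds) {
        if (running.remove(id)) {
          response.succeeded.add(id);
        } else {
          response.failed.put(id, "unknown container");
        }
      }
      return response;
    }

    public Map<String, String> getContainerStatuses(List<String> containerIds) {
      Map<String, String> statuses = new LinkedHashMap<>();
      for (String id : containerIds) {
        statuses.put(id, running.contains(id) ? "RUNNING" : "UNKNOWN");
      }
      return statuses;
    }
  }

  public static void main(String[] args) {
    InMemoryNodeManager nm = new InMemoryNodeManager();
    System.out.println(nm.startContainers(Arrays.asList("c1")).succeeded);
    System.out.println(nm.getContainerStatuses(Arrays.asList("c1", "c2")));
    System.out.println(nm.stopContainers(Arrays.asList("c1", "c2")).failed);
  }
}
```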
3.8.2 AMLauncher constructor
After the createRunnableLauncher method of ApplicationMasterLauncher is called, the resulting AMLauncher object is added to the masterEvents queue. It waits there until the dispatcher thread launcherHandlingThread takes it and hands it to the worker thread pool launcherPool for execution.
@SuppressWarnings("rawtypes")
private final EventHandler handler;
public AMLauncher(RMContext rmContext, RMAppAttempt application,
    AMLauncherEventType eventType, Configuration conf) {
  this.application = application;
  this.conf = conf;
  this.eventType = eventType;
  this.rmContext = rmContext;
  this.handler = rmContext.getDispatcher().getEventHandler();
  this.masterContainer = application.getMasterContainer();
  this.timelineServiceV2Enabled = YarnConfiguration.
      timelineServiceV2Enabled(conf);
}
3.8.3 AMLauncher run method
When the launcherPool thread pool of ApplicationMasterLauncher executes the AMLauncher, its run method handles the event according to eventType, that is, either LAUNCH or CLEANUP.
@SuppressWarnings("unchecked")
public void run() {
  switch (eventType) {
  // Handle the LAUNCH event: start the AM
  case LAUNCH:
    try {
      LOG.info("Launching master" + application.getAppAttemptId());
      // Try to start the Container
      launch();
      // Notify the attempt that it was launched (RMAppAttemptEvent LAUNCHED)
      handler.handle(new RMAppAttemptEvent(application.getAppAttemptId(),
          RMAppAttemptEventType.LAUNCHED, System.currentTimeMillis()));
    } catch(Exception ie) {
      onAMLaunchFailed(masterContainer.getId(), ie);
    }
    break;
  // Handle the CLEANUP event: perform the cleanup
  case CLEANUP:
    try {
      LOG.info("Cleaning master " + application.getAppAttemptId());
      // Perform the cleanup
      cleanup();
    } catch(IOException ie) {
      LOG.info("Error cleaning master ", ie);
    } catch (YarnException e) {
      StringBuilder sb = new StringBuilder("Container ");
      sb.append(masterContainer.getId().toString())
          .append(" is not handled by this NodeManager");
      if (!e.getMessage().contains(sb.toString())) {
        // Ignoring if container is already killed by Node Manager.
        LOG.info("Error cleaning master ", e);
      }
    }
    break;
  default:
    LOG.warn("Received unknown event-type " + eventType + ". Ignoring.");
    break;
  }
}
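The CLEANUP branch suppresses one specific error by matching the exception message against a fixed phrase. The sketch below isolates that check in a small helper. CleanupErrorFilterSketch and shouldLog are hypothetical names, and the null check on the message is an addition of this sketch: the listing above calls getMessage().contains(...) directly, which would throw a NullPointerException for an exception created without a message.

```java
// Hypothetical helper isolating the message-based filter of the CLEANUP branch.
class CleanupErrorFilterSketch {
  // Returns true when the error should be logged, false when it only says
  // that the NodeManager no longer handles this container.
  static boolean shouldLog(String containerId, Exception e) {
    String benign = "Container " + containerId
        + " is not handled by this NodeManager";
    String message = e.getMessage();
    // Null-safe variant: an exception without a message is always logged.
    return message == null || !message.contains(benign);
  }

  public static void main(String[] args) {
    System.out.println(shouldLog("c1",
        new Exception("Container c1 is not handled by this NodeManager")));
    System.out.println(shouldLog("c1", new Exception("connection refused")));
  }
}
```

Matching on message text is fragile: if the wording on the sending side changes, the filter silently stops matching and the error is logged again.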
3.8.4 AMLauncher launch method
In its launch method, AMLauncher connects to the NM and starts the Container through the startContainers RPC of ContainerManagementProtocol. This is the step in which ApplicationMasterLauncher, having received a LAUNCH event, contacts the corresponding NodeManager and asks it to start the ApplicationMaster.
private void launch() throws IOException, YarnException {
  // Establish the connection to the NM
  connect();
  // Get the containerId
  ContainerId masterContainerID = masterContainer.getId();
  // Get the application submission context
  ApplicationSubmissionContext applicationContext =
      application.getSubmissionContext();
  LOG.info("Setting up container " + masterContainer
      + " for AM " + application.getAppAttemptId());
  ContainerLaunchContext launchContext =
      createAMContainerLaunchContext(applicationContext, masterContainerID);
  StartContainerRequest scRequest =
      StartContainerRequest.newInstance(launchContext,
          masterContainer.getContainerToken());
  List<StartContainerRequest> list = new ArrayList<StartContainerRequest>();
  list.add(scRequest);
  StartContainersRequest allRequests =
      StartContainersRequest.newInstance(list);
  // Send the request and get the response
  StartContainersResponse response =
      containerMgrProxy.startContainers(allRequests);
  if (response.getFailedRequests() != null
      && response.getFailedRequests().containsKey(masterContainerID)) {
    Throwable t =
        response.getFailedRequests().get(masterContainerID).deSerialize();
    parseAndThrowException(t);
  } else {
    // The request succeeded
    LOG.info("Done launching container " + masterContainer + " for AM "
        + application.getAppAttemptId());
  }
}
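The body of parseAndThrowException is not shown in this article. As an illustration of the surrounding pattern only, the sketch below looks up one container in a failure map and rethrows the stored Throwable as a typed checked exception. FailedRequestSketch, AppException, and rethrow are hypothetical names, and the rule for choosing the exception type is an assumption of this sketch.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch: turn a per-container failure into a typed exception.
class FailedRequestSketch {
  // Stand-in for an application-level checked exception.
  static class AppException extends Exception {
    AppException(String message) {
      super(message);
    }
  }

  // Rethrow with the most specific checked type; wrap anything unexpected.
  static void rethrow(Throwable t) throws AppException, IOException {
    if (t instanceof AppException) {
      throw (AppException) t;
    }
    if (t instanceof IOException) {
      throw (IOException) t;
    }
    throw new IOException("Unexpected failure", t);
  }

  // Only the entry of the container we asked about matters.
  static void checkResponse(Map<String, Throwable> failed, String containerId)
      throws AppException, IOException {
    if (failed != null && failed.containsKey(containerId)) {
      rethrow(failed.get(containerId));
    }
  }

  static String outcome(Map<String, Throwable> failed, String containerId) {
    try {
      checkResponse(failed, containerId);
      return "ok";
    } catch (AppException e) {
      return "app:" + e.getMessage();
    } catch (IOException e) {
      return "io:" + e.getMessage();
    }
  }

  public static void main(String[] args) {
    Map<String, Throwable> failed =
        Collections.<String, Throwable>singletonMap("c1", new AppException("bad token"));
    System.out.println(outcome(failed, "c1"));
    System.out.println(outcome(null, "c1"));
  }
}
```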
3.8.5 AMLauncher cleanup method
When ApplicationMasterLauncher receives a CLEANUP event, it contacts the corresponding NodeManager and asks it to kill the ApplicationMaster.
private void cleanup() throws IOException, YarnException {
  // Establish the connection to the NM
  connect();
  // Get the ID of the container
  ContainerId containerId = masterContainer.getId();
  List<ContainerId> containerIds = new ArrayList<ContainerId>();
  containerIds.add(containerId);
  // Build the stop request
  StopContainersRequest stopRequest =
      StopContainersRequest.newInstance(containerIds);
  // Send the stop request
  StopContainersResponse response =
      containerMgrProxy.stopContainers(stopRequest);
  // Process the response
  if (response.getFailedRequests() != null
      && response.getFailedRequests().containsKey(containerId)) {
    Throwable t = response.getFailedRequests().get(containerId).deSerialize();
    parseAndThrowException(t);
  }
}
Later chapters will cover ApplicationMasterService and AMLivelinessMonitor in detail, to complete the analysis of ApplicationMaster.