一. 前言
处理来自ApplicationMaster的请求, 主要包括注册和心跳两种请求, 其中, 注册是ApplicationMaster启动时发生的行为, 注册请求包中
包含ApplicationMaster启动节点; 对外RPC端口号和tracking URL等信息; 而心跳则是周期性行为, 汇报信息包含所需资源描述、 待释放的Container列表、 黑名单列表等, 而AMS则为之返回新分配的Container、 失败的Container、 待抢占的Container列表等信息。
二. 接口协议
用于对所有提交的ApplicationMaster进行管理。
该组件响应所有来自AM的请求,实现了ApplicationMasterProtocol协议,这个协议是AM与RM通信的唯一协议。
主要包括以下任务:
注册新的AM、来自任意正在结束的AM的终止/取消注册请求、认证来自不同AM的所有请求,
确保合法的AM发送的请求传递给RM中的应用程序对象、获取来自所有运行AM的Container的分配和释放请求、异步的转发给Yarn调度器。
ApplicaitonMaster Service确保了任意时间点、任意AM只有一个线程可以发送请求给RM,因为在RM上所有来自AM的RPC请求都串行化了。
方法名称 | 描述 |
---|---|
registerApplicationMaster | 新的ApplicationMaster向RM注册 ApplicationMaster会提供RPC端口,url等信息给RM,响应信息会返回集群所能响应的最大资源能力 |
finishApplicationMaster | ApplicationMaster通知RM自己的状态为成功/失败 |
allocate | ApplicationMaster向ResourceManager申请资源/心跳 |
三.构造方法
通过ResourceManager 的serviceInit 方法构建.
public ApplicationMasterService(RMContext rmContext,
YarnScheduler scheduler) {
this(ApplicationMasterService.class.getName(), rmContext, scheduler);
}
public ApplicationMasterService(String name, RMContext rmContext,
YarnScheduler scheduler) {
super(name);
this.amLivelinessMonitor = rmContext.getAMLivelinessMonitor();
this.rScheduler = scheduler;
this.rmContext = rmContext;
// AMSProcessingChain通过责任链模式处理ApplicationMaster的注册
// 责任链上的processor的头节点目前是DefaultAMSProcessor
this.amsProcessingChain = new AMSProcessingChain(new DefaultAMSProcessor());
}
四. 属性
// AM监控
private final AMLivelinessMonitor amLivelinessMonitor;
// 调度器
private YarnScheduler rScheduler;
// 接口地址
protected InetSocketAddress masterServiceAddress;
// 服务实体
protected Server server;
protected final RecordFactory recordFactory = RecordFactoryProvider.getRecordFactory(null);
// 存储响应实体
private final ConcurrentMap<ApplicationAttemptId, AllocateResponseLock> responseMap = new ConcurrentHashMap<ApplicationAttemptId, AllocateResponseLock>();
// ApplicationAttemptId 状态
private final ConcurrentHashMap<ApplicationAttemptId, Boolean> finishedAttemptCache = new ConcurrentHashMap<>();
// RM信息
protected final RMContext rmContext;
// 存放AM的处理Chain
private final AMSProcessingChain amsProcessingChain;
// 是否启用timelineServiceV2 , 默认 false
private boolean timelineServiceV2Enabled;
五.serviceInit 方法
这里就是初始化了masterServiceAddress 服务地址: 0.0.0.0/0.0.0.0:8030
然后开始初始话 initializeProcessingChain
@Override
protected void serviceInit(Configuration conf) throws Exception {
// 构建 rpc 服务
// 0.0.0.0/0.0.0.0:8030
masterServiceAddress = conf.getSocketAddr(
YarnConfiguration.RM_BIND_HOST,
YarnConfiguration.RM_SCHEDULER_ADDRESS,
YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT);
// 初始化amsProcessingChain
initializeProcessingChain(conf);
}
- initializeProcessingChain
private void initializeProcessingChain(Configuration conf) {
amsProcessingChain.init(rmContext, null);
// 处理放置策略, 默认 拒绝
// yarn.resourcemanager.placement-constraints.handler : disabled
addPlacementConstraintHandler(conf);
// 从配置文件中获取 ApplicationMasterServiceProcessor 添加到amsProcessingChain 中.
List<ApplicationMasterServiceProcessor> processors = getProcessorList(conf);
if (processors != null) {
Collections.reverse(processors);
for (ApplicationMasterServiceProcessor p : processors) {
// Ensure only single instance of PlacementProcessor is included
if (p instanceof AbstractPlacementProcessor) {
LOG.warn("Found PlacementProcessor=" + p.getClass().getCanonicalName()
+ " defined in "
+ YarnConfiguration.RM_APPLICATION_MASTER_SERVICE_PROCESSORS
+ ", however PlacementProcessor handler should be configured "
+ "by using " + YarnConfiguration.RM_PLACEMENT_CONSTRAINTS_HANDLER
+ ", this processor will be ignored.");
continue;
}
this.amsProcessingChain.addProcessor(p);
}
}
}
六.serviceStart方法
核心就是启动server服务: BoYi-Pro.local/192.168.xx.xxx:8030
@Override
protected void serviceStart() throws Exception {
Configuration conf = getConfig();
YarnRPC rpc = YarnRPC.create(conf);
Configuration serverConf = conf;
// If the auth is not-simple, enforce it to be token-based.
serverConf = new Configuration(conf);
serverConf.set( CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, SaslRpcServer.AuthMethod.TOKEN.toString());
// ProtobufRpcEngin$Server ==> 0.0.0.0: 8030
this.server = getServer(rpc, serverConf, masterServiceAddress, this.rmContext.getAMRMTokenSecretManager());
// TODO more exceptions could be added later.
this.server.addTerseExceptions(ApplicationMasterNotRegisteredException.class);
// Enable service authorization?
if (conf.getBoolean(
CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION,
false)) {
InputStream inputStream =
this.rmContext.getConfigurationProvider()
.getConfigurationInputStream(conf,
YarnConfiguration.HADOOP_POLICY_CONFIGURATION_FILE);
if (inputStream != null) {
conf.addResource(inputStream);
}
refreshServiceAcls(conf, RMPolicyProvider.getInstance());
}
this.server.start();
// 刷新配置 BoYi-Pro.local/192.168.xx.xxx:8030
this.masterServiceAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
YarnConfiguration.RM_SCHEDULER_ADDRESS,
YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
server.getListenerAddress());
this.timelineServiceV2Enabled = YarnConfiguration.timelineServiceV2Enabled(conf);
super.serviceStart();
}
七. registerApplicationMaster
ApplicationMasterProtocol协议中定义的方法. 用于注册application.
获取注册请求, 校验通过之后, 构建response , 然后使用amsProcessingChain进行注册操作.
@Override
public RegisterApplicationMasterResponse registerApplicationMaster(
RegisterApplicationMasterRequest request) throws YarnException,
IOException {
AMRMTokenIdentifier amrmTokenIdentifier =
YarnServerSecurityUtils.authorizeRequest();
ApplicationAttemptId applicationAttemptId =
amrmTokenIdentifier.getApplicationAttemptId();
// 获取ApplicationId
ApplicationId appID = applicationAttemptId.getApplicationId();
// 是否注册过 ? 获取注册响应对象 锁????
// 因为在注册AM之前 会注册一个 app attempt
AllocateResponseLock lock = responseMap.get(applicationAttemptId);
if (lock == null) {
RMAuditLogger.logFailure(this.rmContext.getRMApps().get(appID).getUser(),
AuditConstants.REGISTER_AM, "Application doesn't exist in cache "
+ applicationAttemptId, "ApplicationMasterService",
"Error in registering application master", appID,
applicationAttemptId);
throwApplicationDoesNotExistInCacheException(applicationAttemptId);
}
// 同一时间只能有一个线程注册
// Allow only one thread in AM to do registerApp at a time.
synchronized (lock) {
AllocateResponse lastResponse = lock.getAllocateResponse();
if (hasApplicationMasterRegistered(applicationAttemptId)) {
// allow UAM re-register if work preservation is enabled
ApplicationSubmissionContext appContext =
rmContext.getRMApps().get(appID).getApplicationSubmissionContext();
if (!(appContext.getUnmanagedAM()
&& appContext.getKeepContainersAcrossApplicationAttempts())) {
String message =
AMRMClientUtils.APP_ALREADY_REGISTERED_MESSAGE + appID;
LOG.warn(message);
RMAuditLogger.logFailure(
this.rmContext.getRMApps().get(appID).getUser(),
AuditConstants.REGISTER_AM, "", "ApplicationMasterService",
message, appID, applicationAttemptId);
throw new InvalidApplicationMasterRequestException(message);
}
}
// 更新心跳时间
this.amLivelinessMonitor.receivedPing(applicationAttemptId);
// 将 response id 设置为0,以标识应用程序主机是否注册了相应的 attemptid
// Setting the response id to 0 to identify if the
// application master is register for the respective attemptid
lastResponse.setResponseId(0);
// 更新lastResponse
lock.setAllocateResponse(lastResponse);
RegisterApplicationMasterResponse response =
recordFactory.newRecordInstance(
RegisterApplicationMasterResponse.class);
// 执行注册操作
this.amsProcessingChain.registerApplicationMaster(amrmTokenIdentifier.getApplicationAttemptId(), request, response);
return response;
}
}
八.finishApplicationMaster
ApplicationMasterProtocol协议中定义的方法. 用于 App Master 通知ApplicationMasterService .
直接调用this.amsProcessingChain .finishApplicationMaster 执行注册操作.
@Override
public FinishApplicationMasterResponse finishApplicationMaster(
FinishApplicationMasterRequest request) throws YarnException,
IOException {
// 获取 applicationAttemptId
ApplicationAttemptId applicationAttemptId =
YarnServerSecurityUtils.authorizeRequest().getApplicationAttemptId();
// 获取 ApplicationId
ApplicationId appId = applicationAttemptId.getApplicationId();
// 获取RMApp
RMApp rmApp =
rmContext.getRMApps().get(applicationAttemptId.getApplicationId());
// Remove collector address when app get finished.
if (timelineServiceV2Enabled) {
((RMAppImpl) rmApp).removeCollectorData();
}
// checking whether the app exits in RMStateStore at first not to throw
// ApplicationDoesNotExistInCacheException before and after
// RM work-preserving restart.
if (rmApp.isAppFinalStateStored()) {
LOG.info(rmApp.getApplicationId() + " unregistered successfully. ");
return FinishApplicationMasterResponse.newInstance(true);
}
AllocateResponseLock lock = responseMap.get(applicationAttemptId);
if (lock == null) {
throwApplicationDoesNotExistInCacheException(applicationAttemptId);
}
// Allow only one thread in AM to do finishApp at a time.
synchronized (lock) {
if (!hasApplicationMasterRegistered(applicationAttemptId)) {
String message =
"Application Master is trying to unregister before registering for: "
+ appId;
LOG.error(message);
RMAuditLogger.logFailure(
this.rmContext.getRMApps()
.get(appId).getUser(),
AuditConstants.UNREGISTER_AM, "", "ApplicationMasterService",
message, appId,
applicationAttemptId);
throw new ApplicationMasterNotRegisteredException(message);
}
FinishApplicationMasterResponse response =
FinishApplicationMasterResponse.newInstance(false);
// finishedAttemptCache 是否存在applicationAttemptId
if (finishedAttemptCache.putIfAbsent(applicationAttemptId, true)
== null) {
// 没有处理过,直接处理
this.amsProcessingChain
.finishApplicationMaster(applicationAttemptId, request, response);
}
// 处理监控心跳
this.amLivelinessMonitor.receivedPing(applicationAttemptId);
return response;
}
}
九.allocate
还是调用amsProcessingChain.allocate 处理请求.
@Override
public AllocateResponse allocate(AllocateRequest request)
throws YarnException, IOException {
AMRMTokenIdentifier amrmTokenIdentifier = YarnServerSecurityUtils.authorizeRequest();
ApplicationAttemptId appAttemptId = amrmTokenIdentifier.getApplicationAttemptId();
// 更新心跳时间
this.amLivelinessMonitor.receivedPing(appAttemptId);
/*
如果缓存中没有数据,直接抛出异常.
check if its in cache
* */
AllocateResponseLock lock = responseMap.get(appAttemptId);
if (lock == null) {
String message =
"Application attempt " + appAttemptId
+ " doesn't exist in ApplicationMasterService cache.";
LOG.error(message);
throw new ApplicationAttemptNotFoundException(message);
}
synchronized (lock) {
AllocateResponse lastResponse = lock.getAllocateResponse();
if (!hasApplicationMasterRegistered(appAttemptId)) {
String message =
"AM is not registered for known application attempt: "
+ appAttemptId
+ " or RM had restarted after AM registered. "
+ " AM should re-register.";
throw new ApplicationMasterNotRegisteredException(message);
}
// Normally request.getResponseId() == lastResponse.getResponseId()
if (AMRMClientUtils.getNextResponseId(
request.getResponseId()) == lastResponse.getResponseId()) {
// heartbeat one step old, simply return lastReponse
return lastResponse;
} else if (request.getResponseId() != lastResponse.getResponseId()) {
throw new InvalidApplicationMasterRequestException(AMRMClientUtils
.assembleInvalidResponseIdExceptionMessage(appAttemptId,
lastResponse.getResponseId(), request.getResponseId()));
}
// 构建响应
AllocateResponse response = recordFactory.newRecordInstance(AllocateResponse.class);
// 关键点 ~~~~
this.amsProcessingChain.allocate( amrmTokenIdentifier.getApplicationAttemptId(), request, response);
// update AMRMToken if the token is rolled-up
MasterKeyData nextMasterKey =
this.rmContext.getAMRMTokenSecretManager().getNextMasterKeyData();
if (nextMasterKey != null
&& nextMasterKey.getMasterKey().getKeyId() != amrmTokenIdentifier
.getKeyId()) {
// 获取RM application
RMApp app = this.rmContext.getRMApps().get(appAttemptId.getApplicationId());
RMAppAttempt appAttempt = app.getRMAppAttempt(appAttemptId);
RMAppAttemptImpl appAttemptImpl = (RMAppAttemptImpl)appAttempt;
Token<AMRMTokenIdentifier> amrmToken = appAttempt.getAMRMToken();
if (nextMasterKey.getMasterKey().getKeyId() !=
appAttemptImpl.getAMRMTokenKeyId()) {
LOG.info("The AMRMToken has been rolled-over. Send new AMRMToken back"
+ " to application: " + appAttemptId.getApplicationId());
amrmToken = rmContext.getAMRMTokenSecretManager()
.createAndGetAMRMToken(appAttemptId);
appAttemptImpl.setAMRMToken(amrmToken);
}
response.setAMRMToken(org.apache.hadoop.yarn.api.records.Token
.newInstance(amrmToken.getIdentifier(), amrmToken.getKind()
.toString(), amrmToken.getPassword(), amrmToken.getService()
.toString()));
}
/*
* As we are updating the response inside the lock object so we don't
* need to worry about unregister call occurring in between (which
* removes the lock object).
*/
response.setResponseId(
AMRMClientUtils.getNextResponseId(lastResponse.getResponseId()));
lock.setAllocateResponse(response);
return response;
}
}