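Overview
ResourceTracker is the RPC protocol used for communication between the ResourceManager (RM) and the NodeManager (NM). Through ResourceTracker, a NodeManager registers with the ResourceManager, sends periodic heartbeat reports, and picks up commands issued by the RM in the heartbeat responses, such as reinitializing the node or cleaning up Containers. In this exchange the NM acts as the RPC client and the RM as the RPC server. The process follows a pull model: the slave node (NM) always initiates the call, whether it is registering with the RM or sending a periodic report.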
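The ResourceTracker protocol defines three kinds of calls:
- registerNodeManager: the NodeManager registers with the ResourceManager;
- nodeHeartbeat: the NodeManager sends its periodic heartbeat report;
- unRegisterNodeManager: the NodeManager unregisters from the ResourceManager.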
/**
 * This is used by the Node Manager to register/nodeHeartbeat/unregister with
 * the ResourceManager.
 */
public interface ResourceTracker {
  @Idempotent
  RegisterNodeManagerResponse registerNodeManager(
      RegisterNodeManagerRequest request) throws YarnException, IOException;

  @AtMostOnce
  NodeHeartbeatResponse nodeHeartbeat(NodeHeartbeatRequest request)
      throws YarnException, IOException;

  @Idempotent
  UnRegisterNodeManagerResponse unRegisterNodeManager(
      UnRegisterNodeManagerRequest request) throws YarnException, IOException;
}
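The pull model can be illustrated with a minimal sketch. Everything in it is hypothetical: ToyTracker, ToyResourceManager, and the string commands are simplified stand-ins, not Hadoop classes. It only shows that the RM never calls the NM, and that any command for a node waits until that node's next heartbeat.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical, simplified stand-in for the ResourceTracker protocol. */
interface ToyTracker {
    void register(String nodeId);
    String heartbeat(String nodeId);   // returns the RM's command for this node
    void unregister(String nodeId);
}

/** Toy RM: the RPC server side. It never calls the NM; it only answers. */
class ToyResourceManager implements ToyTracker {
    private final Map<String, Queue<String>> pendingCommands = new HashMap<>();

    @Override
    public void register(String nodeId) {
        pendingCommands.putIfAbsent(nodeId, new ArrayDeque<>());
    }

    /** Queue a command; it is delivered on the node's next heartbeat. */
    void enqueueCommand(String nodeId, String command) {
        Queue<String> queue = pendingCommands.get(nodeId);
        if (queue == null) {
            throw new IllegalStateException("unknown node: " + nodeId);
        }
        queue.add(command);
    }

    @Override
    public String heartbeat(String nodeId) {
        Queue<String> queue = pendingCommands.get(nodeId);
        if (queue == null) {
            return "RESYNC";           // assumption: unknown nodes are told to re-register
        }
        String command = queue.poll();
        return command == null ? "NORMAL" : command;
    }

    @Override
    public void unregister(String nodeId) {
        pendingCommands.remove(nodeId);
    }
}
```

In this toy version a command queued with enqueueCommand() stays on the RM until the node's next heartbeat() call returns it, which is the essence of the pull model described above.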
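Heartbeat-driven scheduling on the RM side
Through the nodeHeartbeat() method, a NodeManager reports its resource status to the ResourceManager (for example, currently available resources, resources in use, and resources that have been released). This RPC causes the ResourceManager to call nodeUpdate(), which performs one round of resource scheduling for that node: it takes a suitable application resource request from the queues it maintains and places it on this NodeManager to run. Here "suitable" means that the request violates neither the queue's maximum resource limit nor the remaining resources of this NodeManager.
On the RM side, the class that implements the ResourceTracker protocol is ResourceTrackerService. ResourceTrackerService#nodeHeartbeat() ultimately triggers nodeUpdate().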
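The "suitable request" rule can be sketched as follows. This is a simplified illustration under stated assumptions, not the scheduler's actual code: FitCheck, Request, and pick() are hypothetical names, and resources are reduced to a single dimension (memory in MB).

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical single-dimension resource model (memory in MB only). */
class FitCheck {
    /** Hypothetical pending request: which app asks for how much memory. */
    static final class Request {
        final String app;
        final long memoryMb;

        Request(String app, long memoryMb) {
            this.app = app;
            this.memoryMb = memoryMb;
        }
    }

    /**
     * Returns the first pending request that respects both limits:
     * the queue's headroom (max minus used) and the node's free memory.
     */
    static Optional<Request> pick(List<Request> pending,
                                  long queueUsedMb, long queueMaxMb,
                                  long nodeFreeMb) {
        long queueHeadroomMb = queueMaxMb - queueUsedMb;
        for (Request request : pending) {
            if (request.memoryMb <= queueHeadroomMb
                    && request.memoryMb <= nodeFreeMb) {
                return Optional.of(request);
            }
        }
        return Optional.empty();   // nothing fits on this node right now
    }
}
```

A request is skipped as soon as it breaks either limit, so a large request at the head of the list does not prevent a smaller one behind it from being placed on the node.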
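Source analysis of heartbeat scheduling on the RM side
The ResourceTrackerService#nodeHeartbeat() method
In nodeHeartbeat(), ResourceTrackerService creates an RMNodeEvent of type STATUS_UPDATE (an RMNodeStatusEvent) and hands it to the dispatcher: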
public NodeHeartbeatResponse nodeHeartbeat(NodeHeartbeatRequest request)
    throws YarnException, IOException {
  // Heartbeat response
  NodeHeartbeatResponse nodeHeartBeatResponse =
      YarnServerBuilderUtils.newNodeHeartbeatResponse(
          getNextResponseId(lastNodeHeartbeatResponse.getResponseId()),
          NodeAction.NORMAL, null, null, null, null, nextHeartBeatInterval);
  rmNode.setAndUpdateNodeHeartbeatResponse(nodeHeartBeatResponse);
  .....
  // 4. Send status to RMNode, saving the latest response.
  RMNodeStatusEvent nodeStatusEvent =
      new RMNodeStatusEvent(nodeId, remoteNodeStatus);
  if (request.getLogAggregationReportsForApps() != null
      && !request.getLogAggregationReportsForApps().isEmpty()) {
    nodeStatusEvent.setLogAggregationReportsForApps(request
        .getLogAggregationReportsForApps());
  }
  this.rmContext.getDispatcher().getEventHandler().handle(nodeStatusEvent);
  ......
  return nodeHeartBeatResponse;
}
For a node in the RUNNING state, StatusUpdateWhenHealthyTransition handles RMNodeEvents of type STATUS_UPDATE:
// Transitions from RUNNING state
.addTransition(NodeState.RUNNING,
    EnumSet.of(NodeState.RUNNING, NodeState.UNHEALTHY),
    RMNodeEventType.STATUS_UPDATE,
    new StatusUpdateWhenHealthyTransition())
StatusUpdateWhenHealthyTransition in turn creates a SchedulerEvent of type NODE_UPDATE (a NodeUpdateSchedulerEvent):
......
if (rmNode.nextHeartBeat) {
  rmNode.nextHeartBeat = false;
  rmNode.context.getDispatcher().getEventHandler().handle(
      new NodeUpdateSchedulerEvent(rmNode));
}
.......
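The chain so far (heartbeat, then STATUS_UPDATE, then NODE_UPDATE) can be condensed into a small sketch. All names here (EventChainDemo, ToyNode, schedulerHandled) are hypothetical, events are delivered synchronously rather than through a dispatcher, and the step that sets the flag back to true is an assumption, since the snippets above only show it being cleared.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature of the heartbeat -> STATUS_UPDATE -> NODE_UPDATE chain. */
class EventChainDemo {
    enum EventType { STATUS_UPDATE, NODE_UPDATE }

    /** Toy node: the flag mirrors rmNode.nextHeartBeat in the snippet above. */
    static final class ToyNode {
        final String id;
        boolean nextHeartBeat = true;

        ToyNode(String id) {
            this.id = id;
        }
    }

    /** Records every event in the order it was raised. */
    final List<String> trace = new ArrayList<>();

    /** Step 1: the tracker service turns a heartbeat into a STATUS_UPDATE event. */
    void nodeHeartbeat(ToyNode node) {
        trace.add(EventType.STATUS_UPDATE + ":" + node.id);
        onStatusUpdate(node);
    }

    /** Step 2: the node transition raises NODE_UPDATE only while the flag is set. */
    void onStatusUpdate(ToyNode node) {
        if (node.nextHeartBeat) {
            node.nextHeartBeat = false;
            trace.add(EventType.NODE_UPDATE + ":" + node.id);
        }
    }

    /** Step 3 (assumption): the flag is re-armed once the scheduler has handled the node. */
    void schedulerHandled(ToyNode node) {
        node.nextHeartBeat = true;
    }
}
```

With this gating, a node that heartbeats twice before its previous NODE_UPDATE has been handled produces only one scheduler event, which keeps at most one pending update per node in the sketch.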
FairScheduler handles the NODE_UPDATE scheduler event:
@Override
public void handle(SchedulerEvent event) {
  switch (event.getType()) {
  ......
  case NODE_UPDATE:
    if (!(event instanceof NodeUpdateSchedulerEvent)) {
      throw new RuntimeException("Unexpected event type: " + event);
    }
    NodeUpdateSchedulerEvent nodeUpdatedEvent = (NodeUpdateSchedulerEvent)event;
    nodeUpdate(nodeUpdatedEvent.getRMNode());
    break;
  .......
  }
}
The FairScheduler#nodeUpdate method
protected void nodeUpdate(RMNode nm) {
  writeLock.lock();
  try {
    long start = getClock().getTime();
    super.nodeUpdate(nm);
    FSSchedulerNode fsNode = getFSSchedulerNode(nm.getNodeID());
    attemptScheduling(fsNode);
    long duration = getClock().getTime() - start;
    fsOpDurations.addNodeUpdateDuration(duration);
  } finally {
    writeLock.unlock();
  }
}
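The shape of nodeUpdate() above (take the write lock, time the work, release the lock in finally) is sketched below in isolation. TimedUpdate and its members are hypothetical names, and the clock is injected as a LongSupplier purely so that the measured duration is easy to check; this is not how FairScheduler obtains its clock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongSupplier;

/** Hypothetical helper showing the lock-and-measure shape of nodeUpdate(). */
class TimedUpdate {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final LongSupplier clock;   // injectable time source, in milliseconds
    private long totalDurationMs = 0;
    private int updates = 0;

    TimedUpdate(LongSupplier clock) {
        this.clock = clock;
    }

    /** Runs one scheduling attempt under the write lock and records its duration. */
    void nodeUpdate(Runnable schedulingAttempt) {
        lock.writeLock().lock();
        try {
            long start = clock.getAsLong();
            schedulingAttempt.run();
            totalDurationMs += clock.getAsLong() - start;
            updates++;
        } finally {
            lock.writeLock().unlock();   // released even if the attempt throws
        }
    }

    long totalDurationMs() {
        return totalDurationMs;
    }

    int updates() {
        return updates;
    }

    boolean isWriteLocked() {
        return lock.isWriteLocked();
    }
}
```

Because the duration is recorded only after the attempt returns normally, a failed attempt in this sketch contributes nothing to the metric, yet the lock is still released.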
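For the source analysis from attemptScheduling() onward, see: "yarn3.2 源码分析之RM端assignContainer流程" (YARN 3.2 source analysis: the assignContainer flow on the RM side).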