In NMSimulator.java:
public void middleStep() throws Exception {
  // we check the lifetime for each running containers
  ContainerSimulator cs = null;
  synchronized(completedContainerList) {
    while ((cs = containerQueue.poll()) != null) {
      runningContainers.remove(cs.getId());
      completedContainerList.add(cs.getId());
      LOG.debug("Container {} has completed", cs.getId());
    }
  }
  // send heart beat
  NodeHeartbeatRequest beatRequest =
      Records.newRecord(NodeHeartbeatRequest.class);
  beatRequest.setLastKnownNMTokenMasterKey(masterKey);
  NodeStatus ns = Records.newRecord(NodeStatus.class);
  ns.setContainersStatuses(generateContainerStatusList());
  ns.setNodeId(node.getNodeID());
  ns.setKeepAliveApplications(new ArrayList<ApplicationId>());
  ns.setResponseId(responseId++);
  ns.setNodeHealthStatus(NodeHealthStatus.newInstance(true, "", 0));
  // set node & containers utilization
  if (resourceUtilizationRatio > 0 && resourceUtilizationRatio <= 1) {
    int pMemUsed = Math.round(node.getTotalCapability().getMemorySize()
        * resourceUtilizationRatio);
    float cpuUsed = node.getTotalCapability().getVirtualCores()
        * resourceUtilizationRatio;
    ResourceUtilization resourceUtilization = ResourceUtilization.newInstance(
        pMemUsed, pMemUsed, cpuUsed);
    ns.setContainersUtilization(resourceUtilization);
    ns.setNodeUtilization(resourceUtilization);
  }
  beatRequest.setNodeStatus(ns);
  NodeHeartbeatResponse beatResponse =
      rm.getResourceTrackerService().nodeHeartbeat(beatRequest);
  ......
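Both the NM and AM simulator classes extend TaskRunner.Task and override its firstStep(), middleStep(), and lastStep() methods.
The NM's middleStep is where the heartbeat is sent to the RM. As the code shows, the NM produces the heartbeat by calling nodeHeartbeat directly, in-process, on the object returned by the RM's getResourceTrackerService().
In a real YARN deployment, by contrast, the NM invokes the RM's nodeHeartbeat method remotely through the RPC mechanism (which still runs over TCP underneath).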
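To make the "direct call instead of RPC" point concrete, here is a minimal sketch of the pattern. It is not the SLS code: HeartbeatTracker, InProcessTracker, StepTask, and SimulatedNode are hypothetical names invented for illustration, and the three-step lifecycle is only loosely modelled on the step methods described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the RM-side heartbeat endpoint (not a YARN class).
interface HeartbeatTracker {
  int nodeHeartbeat(String nodeId, int responseId);
}

// In-process "RM": the simulated node holds a plain object reference to it,
// so a heartbeat is an ordinary method call with no socket or RPC proxy.
class InProcessTracker implements HeartbeatTracker {
  final List<String> received = new ArrayList<>();

  @Override
  public int nodeHeartbeat(String nodeId, int responseId) {
    received.add(nodeId + "#" + responseId);
    return responseId + 1; // id the caller should send with its next beat
  }
}

// Simplified three-phase task (assumed shape, for illustration only).
abstract class StepTask {
  abstract void firstStep();
  abstract void middleStep();
  abstract void lastStep();

  // Runs firstStep once, middleStep `beats` times, then lastStep once.
  final void runAll(int beats) {
    firstStep();
    for (int i = 0; i < beats; i++) {
      middleStep();
    }
    lastStep();
  }
}

class SimulatedNode extends StepTask {
  private final String nodeId;
  private final HeartbeatTracker tracker;
  private int responseId = 0;
  final List<String> log = new ArrayList<>();

  SimulatedNode(String nodeId, HeartbeatTracker tracker) {
    this.nodeId = nodeId;
    this.tracker = tracker;
  }

  @Override
  void firstStep() {
    log.add("first");
  }

  @Override
  void middleStep() {
    // The whole "heartbeat": a direct call on the in-process tracker.
    responseId = tracker.nodeHeartbeat(nodeId, responseId);
  }

  @Override
  void lastStep() {
    log.add("last");
  }

  int lastResponseId() {
    return responseId;
  }
}
```

Calling new SimulatedNode("node-1", new InProcessTracker()).runAll(3) would deliver three heartbeats to the tracker without any network traffic. Swapping InProcessTracker for an implementation backed by an RPC client would be the only change needed to move from the simulated setup to a remote one, which is roughly the difference the paragraph above describes.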
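For the AM:
I never quite tracked down how the AM sends its heartbeat; my best guess is that it is the following method (to be verified later).
In AMSimulator.java: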
@Override
public void middleStep() throws Exception {
  if (isAMContainerRunning) {
    // process responses in the queue
    processResponseQueue();
    // send out request
    sendContainerRequest();
    // check whether finish
    checkStop();
  }
}
Here too the work happens in middleStep, which calls sendContainerRequest. That method is overridden in the subclasses MRAMSimulator.java and StreamAMSimulator.java:
MRAMSimulator.java:
  ......
  final AllocateRequest request = createAllocateRequest(ask);
  if (totalContainers == 0) {
    request.setProgress(1.0f);
  } else {
    request.setProgress((float) finishedContainers / totalContainers);
  }
  UserGroupInformation ugi =
      UserGroupInformation.createRemoteUser(appAttemptId.toString());
  Token<AMRMTokenIdentifier> token = rm.getRMContext().getRMApps()
      .get(appAttemptId.getApplicationId())
      .getRMAppAttempt(appAttemptId).getAMRMToken();
  ugi.addTokenIdentifier(token.decodeIdentifier());
  AllocateResponse response = ugi.doAs(
      new PrivilegedExceptionAction<AllocateResponse>() {
        @Override
        public AllocateResponse run() throws Exception {
          return rm.getApplicationMasterService().allocate(request);
        }
      });
  if (response != null) {
    responseQueue.put(response);
  }
}
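The AM likewise talks to the RM by calling it directly. rm.getRMContext().getRMApps() is only used to look up the application attempt's AMRM token; the request itself goes through rm.getApplicationMasterService().allocate(request). In a real YARN deployment, allocate is also the call that serves as the AM-to-RM heartbeat.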
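The AM-side flow in the excerpt (compute progress, call allocate in-process, queue any non-null response) can be sketched in plain Java as follows. This is an illustration under assumptions, not the SLS implementation: AllocateService and SimulatedAppMaster are hypothetical names, the response is reduced to a String, the token and doAs handling is left out, and the queue uses add rather than put to keep the sketch free of checked exceptions.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the RM-side allocate endpoint (not a YARN class).
interface AllocateService {
  String allocate(float progress);
}

class SimulatedAppMaster {
  private final AllocateService service;
  private final int totalContainers;
  private int finishedContainers = 0;
  final BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();

  SimulatedAppMaster(AllocateService service, int totalContainers) {
    this.service = service;
    this.totalContainers = totalContainers;
  }

  void containerFinished() {
    finishedContainers++;
  }

  // Same guard as in the excerpt: report 1.0 when there are no containers
  // at all, otherwise the finished fraction (cast avoids integer division).
  float progress() {
    if (totalContainers == 0) {
      return 1.0f;
    }
    return (float) finishedContainers / totalContainers;
  }

  // One request cycle: call the service directly and queue the response
  // so that a later step can process it.
  void sendContainerRequest() {
    String response = service.allocate(progress());
    if (response != null) {
      responseQueue.add(response);
    }
  }
}
```

For example, an instance created with a service of p -> "progress=" + p and four total containers would queue one response per sendContainerRequest call, each carrying the progress at the time of the call.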
As you can see, the SLS code really is very simple.