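In Hadoop, the JT (JobTracker) and the TT (TaskTracker) communicate through a heartbeat mechanism. The JT implements InterTrackerProtocol, which defines that mechanism. A heartbeat is simply an RPC request: the JT acts as the server and the TT as the client. The TT calls the JT's heartbeat method over RPC to report its own state, and the JT uses the return value to send instructions back to the TT.
The heartbeat serves three purposes:
1) Determine whether the TT is alive
2) Report the TT's resource usage and the state of its running tasks
3) Deliver instructions to the TT (run a task, kill a task, and so on)
The rest of this post reads through the source code involved in a heartbeat call.
First, be clear that a heartbeat is the TT calling a method on the JT, not the JT calling the TT. The TT invokes the JT's heartbeat method from its own transmitHeartBeat method.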
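The direction of the call is easier to see in a toy model. The sketch below is not Hadoop code: HeartbeatProtocol, HeartbeatReply, ToyJobTracker and ToyTaskTracker are hypothetical names, and the string "LAUNCH_TASK" stands in for a real action object. It only illustrates that the tracker initiates every exchange and that commands travel back in the return value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for InterTrackerProtocol: one call, one reply.
interface HeartbeatProtocol {
    HeartbeatReply heartbeat(String trackerName, int freeSlots, short responseId);
}

// Hypothetical reply type: the next response id plus the commands for the tracker.
final class HeartbeatReply {
    final short responseId;
    final List<String> actions;

    HeartbeatReply(short responseId, List<String> actions) {
        this.responseId = responseId;
        this.actions = actions;
    }
}

// Toy "JobTracker": it never calls the tracker, it only answers heartbeats.
final class ToyJobTracker implements HeartbeatProtocol {
    @Override
    public HeartbeatReply heartbeat(String trackerName, int freeSlots, short responseId) {
        List<String> actions = new ArrayList<>();
        if (freeSlots > 0) {
            actions.add("LAUNCH_TASK"); // command piggybacked on the return value
        }
        return new HeartbeatReply((short) (responseId + 1), actions);
    }
}

// Toy "TaskTracker": the client side that starts every exchange.
final class ToyTaskTracker {
    private final HeartbeatProtocol jobTracker;
    private short lastResponseId = 0;

    ToyTaskTracker(HeartbeatProtocol jobTracker) {
        this.jobTracker = jobTracker;
    }

    // Sends one heartbeat and returns whatever the server asked us to do.
    List<String> sendHeartbeat(int freeSlots) {
        HeartbeatReply reply = jobTracker.heartbeat("tracker_1", freeSlots, lastResponseId);
        lastResponseId = reply.responseId;
        return reply.actions;
    }

    short lastResponseId() {
        return lastResponseId;
    }
}
```

Because the server only ever replies, a tracker that stops calling is indistinguishable from a dead one, which is what makes purpose 1) above work.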
1.TaskTracker.transmitHeartBeat:
- // Send Counters in the status once every COUNTER_UPDATE_INTERVAL
- boolean sendCounters;
- if (now > (previousUpdate + COUNTER_UPDATE_INTERVAL)) {
- sendCounters = true;
- previousUpdate = now;
- }
- else {
- sendCounters = false;
- }
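The same throttling check can be isolated for experimentation. In this minimal sketch, CounterThrottle is a hypothetical class and the clock value is passed in explicitly so the behaviour is easy to test; the interval used by the caller is arbitrary and not taken from Hadoop.

```java
// Hypothetical helper mirroring the sendCounters check with an injectable clock value.
final class CounterThrottle {
    private final long intervalMillis;
    private long previousUpdate;

    CounterThrottle(long intervalMillis, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.previousUpdate = startMillis;
    }

    // Returns true at most once per interval and remembers when it last did so.
    boolean shouldSendCounters(long now) {
        if (now > previousUpdate + intervalMillis) {
            previousUpdate = now;
            return true;
        }
        return false;
    }
}
```

Note that the comparison is strict, so a call made exactly one interval after the last update does not yet send counters.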
2.TaskTracker.transmitHeartBeat:
- // Check if the last heartbeat got through...
- // if so then build the heartbeat information for the JobTracker;
- // else resend the previous status information.
- //
- if (status == null) {
- synchronized (this) {
- status = new TaskTrackerStatus(taskTrackerName, localHostname,
- httpPort,
- cloneAndResetRunningTaskStatuses(
- sendCounters),
- taskFailures,
- localStorage.numFailures(),
- maxMapSlots,
- maxReduceSlots);
- }
- } else {
- LOG.info("Resending 'status' to '" + jobTrackAddr.getHostName() +
- "' with reponseId '" + heartbeatResponseId);
- }
- private synchronized List<TaskStatus> cloneAndResetRunningTaskStatuses(
- boolean sendCounters) {
- List<TaskStatus> result = new ArrayList<TaskStatus>(runningTasks.size());
- for(TaskInProgress tip: runningTasks.values()) {
- TaskStatus status = tip.getStatus();
- status.setIncludeCounters(sendCounters);
- // send counters for finished or failed tasks and commit pending tasks
- if (status.getRunState() != TaskStatus.State.RUNNING) {
- status.setIncludeCounters(true);
- }
- result.add((TaskStatus)status.clone());
- status.clearStatus();
- }
- return result;
- }
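The snapshot pattern in cloneAndResetRunningTaskStatuses can be sketched on a stripped-down type. MiniStatus and StatusSnapshot below are hypothetical; the sketch assumes that clearing a status means emptying its diagnostic text, and it keeps only the fields needed to show the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stripped-down task status: only the fields this sketch needs.
final class MiniStatus implements Cloneable {
    boolean running;
    boolean includeCounters;
    String diagnosticInfo;

    MiniStatus(boolean running, String diagnosticInfo) {
        this.running = running;
        this.diagnosticInfo = diagnosticInfo;
    }

    @Override
    public MiniStatus clone() {
        try {
            return (MiniStatus) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }
}

final class StatusSnapshot {
    // Copies every live status for sending, then clears the transient part of the live one.
    static List<MiniStatus> cloneAndReset(List<MiniStatus> live, boolean sendCounters) {
        List<MiniStatus> result = new ArrayList<>(live.size());
        for (MiniStatus s : live) {
            // Tasks that are no longer running always carry their counters.
            s.includeCounters = sendCounters || !s.running;
            result.add(s.clone());
            s.diagnosticInfo = ""; // assumed meaning of clearStatus() in this sketch
        }
        return result;
    }
}
```

The copy is taken before the reset, so the heartbeat carries the diagnostic text while the live object starts accumulating fresh information.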
3.TaskTrackerStatus():
- public TaskTrackerStatus(String trackerName, String host,
- int httpPort, List<TaskStatus> taskReports,
- int taskFailures, int dirFailures,
- int maxMapTasks, int maxReduceTasks) {
- this.trackerName = trackerName;
- this.host = host;
- this.httpPort = httpPort;
- this.taskReports = new ArrayList<TaskStatus>(taskReports);
- this.taskFailures = taskFailures;
- this.dirFailures = dirFailures;
- this.maxMapTasks = maxMapTasks;
- this.maxReduceTasks = maxReduceTasks;
- this.resStatus = new ResourceStatus();
- this.healthStatus = new TaskTrackerHealthStatus();
- }
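1) taskReports: the status of every task currently on this TT. Whether counters are included depends on the sendCounters value computed in the earlier step.
2) taskFailures: the total number of failed tasks on this TT (reset on restart). It helps the JT decide whether to assign tasks to this TT, since a higher failure count suggests that tasks are more likely to fail there.
3) dirFailures: the number of directories configured in mapred.local.dir that are unusable (covered in detail later).
4) maxMapTasks/maxReduceTasks: the maximum number of map and reduce slots this TT may use. The caller passes its maxMapSlots and maxReduceSlots fields here.
With the status object initialized, we return to TaskTracker.transmitHeartBeat.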
4.TaskTracker.transmitHeartBeat:
- // Check if we should ask for a new Task
- //
- boolean askForNewTask;
- long localMinSpaceStart;
- synchronized (this) {
- askForNewTask =
- ((status.countOccupiedMapSlots() < maxMapSlots ||
- status.countOccupiedReduceSlots() < maxReduceSlots) &&
- acceptNewTasks);
- localMinSpaceStart = minSpaceStart;
- }
- if (askForNewTask) {
- askForNewTask = enoughFreeSpace(localMinSpaceStart);
- long freeDiskSpace = getFreeSpace();
- long totVmem = getTotalVirtualMemoryOnTT();
- long totPmem = getTotalPhysicalMemoryOnTT();
- long availableVmem = getAvailableVirtualMemoryOnTT();
- long availablePmem = getAvailablePhysicalMemoryOnTT();
- long cumuCpuTime = getCumulativeCpuTimeOnTT();
- long cpuFreq = getCpuFrequencyOnTT();
- int numCpu = getNumProcessorsOnTT();
- float cpuUsage = getCpuUsageOnTT();
- status.getResourceStatus().setAvailableSpace(freeDiskSpace);
- status.getResourceStatus().setTotalVirtualMemory(totVmem);
- status.getResourceStatus().setTotalPhysicalMemory(totPmem);
- status.getResourceStatus().setMapSlotMemorySizeOnTT(
- mapSlotMemorySizeOnTT);
- status.getResourceStatus().setReduceSlotMemorySizeOnTT(
- reduceSlotSizeMemoryOnTT);
- status.getResourceStatus().setAvailableVirtualMemory(availableVmem);
- status.getResourceStatus().setAvailablePhysicalMemory(availablePmem);
- status.getResourceStatus().setCumulativeCpuTime(cumuCpuTime);
- status.getResourceStatus().setCpuFrequency(cpuFreq);
- status.getResourceStatus().setNumProcessors(numCpu);
- status.getResourceStatus().setCpuUsage(cpuUsage);
- }
The first step, status.countOccupiedMapSlots(), returns the number of map slots already occupied on this TT:
- /**
- * Get the number of occupied map slots.
- * @return the number of occupied map slots
- */
- public int countOccupiedMapSlots() {
- int mapSlotsCount = 0;
- for (TaskStatus ts : taskReports) {
- if (ts.getIsMap() && isTaskRunning(ts)) {
- mapSlotsCount += ts.getNumSlots();
- }
- }
- return mapSlotsCount;
- }
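Putting the slot count and the first askForNewTask expression together gives the following minimal sketch. SlotTask and SlotCheck are hypothetical names, and the running flag stands in for whatever isTaskRunning decides in the real class.

```java
import java.util.List;

// Hypothetical task record: its kind, its slot count, and whether it still holds slots.
final class SlotTask {
    final boolean isMap;
    final int numSlots;
    final boolean running;

    SlotTask(boolean isMap, int numSlots, boolean running) {
        this.isMap = isMap;
        this.numSlots = numSlots;
        this.running = running;
    }
}

final class SlotCheck {
    // Sums the slots held by running tasks of the requested kind.
    static int occupied(List<SlotTask> tasks, boolean wantMap) {
        int count = 0;
        for (SlotTask t : tasks) {
            if (t.isMap == wantMap && t.running) {
                count += t.numSlots;
            }
        }
        return count;
    }

    // Ask for work only if some slot of either kind is free and the tracker accepts tasks.
    static boolean askForNewTask(List<SlotTask> tasks, int maxMapSlots,
                                 int maxReduceSlots, boolean acceptNewTasks) {
        boolean freeSlot = occupied(tasks, true) < maxMapSlots
                || occupied(tasks, false) < maxReduceSlots;
        return freeSlot && acceptNewTasks;
    }
}
```

A task may hold more than one slot, which is why the count adds numSlots rather than counting tasks.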
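localMinSpaceStart = minSpaceStart: minSpaceStart is set by the mapred.local.dir.minspacestart parameter and defaults to 0, meaning no limit. It appears to be the minimum free space that the local directories must have before the TT will take new tasks. As the code that follows shows, this value can change askForNewTask.
When askForNewTask is true after the slot check, meaning the TT could in principle take a new task, it is evaluated again against localMinSpaceStart: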
- /**
- * Check if any of the local directories has enough
- * free space (more than minSpace)
- *
- * If not, do not try to get a new task assigned
- * @return
- * @throws IOException
- */
- private boolean enoughFreeSpace(long minSpace) throws IOException {
- if (minSpace == 0) {
- return true;
- }
- return minSpace < getFreeSpace();
- }
- private long getFreeSpace() throws IOException {
- long biggestSeenSoFar = 0;
- String[] localDirs = localStorage.getDirs();
- for (int i = 0; i < localDirs.length; i++) {
- DF df = null;
- if (localDirsDf.containsKey(localDirs[i])) {
- df = localDirsDf.get(localDirs[i]);
- } else {
- df = new DF(new File(localDirs[i]), fConf);
- localDirsDf.put(localDirs[i], df);
- }
- long availOnThisVol = df.getAvailable();
- if (availOnThisVol > biggestSeenSoFar) {
- biggestSeenSoFar = availOnThisVol;
- }
- }
- //Should ultimately hold back the space we expect running tasks to use but
- //that estimate isn't currently being passed down to the TaskTrackers
- return biggestSeenSoFar;
- }
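One detail worth noticing is that getFreeSpace returns the largest free-space figure across the local directories, so a single directory with enough room is sufficient. The sketch below separates that decision from the disk probing. FreeSpaceCheck is a hypothetical class, and java.io.File.getUsableSpace() is used here only as a stand-in for Hadoop's DF helper.

```java
import java.io.File;

final class FreeSpaceCheck {
    // Largest free-space figure across all volumes; 0 if there are none.
    static long largest(long[] availablePerDir) {
        long biggestSeenSoFar = 0;
        for (long avail : availablePerDir) {
            if (avail > biggestSeenSoFar) {
                biggestSeenSoFar = avail;
            }
        }
        return biggestSeenSoFar;
    }

    // A threshold of 0 disables the check; otherwise one roomy volume is enough.
    static boolean enoughFreeSpace(long minSpace, long[] availablePerDir) {
        if (minSpace == 0) {
            return true;
        }
        return minSpace < largest(availablePerDir);
    }

    // Assumption: File.getUsableSpace() replaces the DF lookup used by the TaskTracker.
    static long[] probe(String[] dirs) {
        long[] out = new long[dirs.length];
        for (int i = 0; i < dirs.length; i++) {
            out[i] = new File(dirs[i]).getUsableSpace();
        }
        return out;
    }
}
```

A call such as FreeSpaceCheck.enoughFreeSpace(threshold, FreeSpaceCheck.probe(dirs)) then combines both halves.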
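Next, the TT gathers its resource information: total virtual memory, total physical memory, available virtual and physical memory, CPU usage, and so on. These values are set on status and sent to the JT. Note that in the code above this happens only inside the if (askForNewTask) block, that is, only when the slot check passed.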
5.TaskTracker.transmitHeartBeat:
- //add node health information
- TaskTrackerHealthStatus healthStatus = status.getHealthStatus();
- synchronized (this) {
- if (healthChecker != null) {
- healthChecker.setHealthStatus(healthStatus);
- } else {
- healthStatus.setNodeHealthy(true);
- healthStatus.setLastReported(0L);
- healthStatus.setHealthReport("");
- }
- }
6.TaskTracker.transmitHeartBeat:
- //
- // Xmit the heartbeat
- //
- HeartbeatResponse heartbeatResponse = jobClient.heartbeat(status,
- justStarted,
- justInited,
- askForNewTask,
- heartbeatResponseId);
7.TaskTracker.transmitHeartBeat:
- //
- // The heartbeat got through successfully!
- //
- heartbeatResponseId = heartbeatResponse.getResponseId();
- synchronized (this) {
- for (TaskStatus taskStatus : status.getTaskReports()) {
- if (taskStatus.getRunState() != TaskStatus.State.RUNNING &&
- taskStatus.getRunState() != TaskStatus.State.UNASSIGNED &&
- taskStatus.getRunState() != TaskStatus.State.COMMIT_PENDING &&
- !taskStatus.inTaskCleanupPhase()) {
- if (taskStatus.getIsMap()) {
- mapTotal--;
- } else {
- reduceTotal--;
- }
- myInstrumentation.completeTask(taskStatus.getTaskID());
- runningTasks.remove(taskStatus.getTaskID());
- }
- }
- // Clear transient status information which should only
- // be sent once to the JobTracker
- for (TaskInProgress tip: runningTasks.values()) {
- tip.getStatus().clearStatus();
- }
- }
- // Force a rebuild of 'status' on the next iteration
- status = null;
- return heartbeatResponse;
myInstrumentation.completeTask(taskStatus.getTaskID()) increments this TT's count of completed tasks, and runningTasks.remove(taskStatus.getTaskID()) removes the task from the runningTasks map, so runningTasks holds only tasks that have not yet finished.
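The pruning condition can be sketched on its own. PruneSketch, its State enum and its Report class are hypothetical simplifications of TaskStatus. The sketch iterates over the reports that were sent rather than over the live map, as the loop above does, which suggests that a task finishing after the snapshot was taken stays in the map until a later heartbeat reports its final state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PruneSketch {
    // Hypothetical subset of task run states.
    enum State { RUNNING, UNASSIGNED, COMMIT_PENDING, SUCCEEDED, FAILED }

    // Hypothetical report: what was sent to the server for one task.
    static final class Report {
        final String id;
        final State state;
        final boolean inCleanup;

        Report(String id, State state, boolean inCleanup) {
            this.id = id;
            this.state = state;
            this.inCleanup = inCleanup;
        }
    }

    // Same shape as the four-part condition in the heartbeat code.
    static boolean isFinished(Report r) {
        return r.state != State.RUNNING
                && r.state != State.UNASSIGNED
                && r.state != State.COMMIT_PENDING
                && !r.inCleanup;
    }

    // Removes every reported-finished task from the running map; returns the removed ids.
    static List<String> prune(Map<String, Report> runningTasks, List<Report> reported) {
        List<String> removed = new ArrayList<>();
        for (Report r : reported) {
            if (isFinished(r) && runningTasks.remove(r.id) != null) {
                removed.add(r.id);
            }
        }
        return removed;
    }
}
```

A task in a terminal state that is still in its cleanup phase is kept, since its cleanup attempt has yet to run.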
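Next, the transient information (diagnosticInfo) in each TaskInProgress's TaskStatus is cleared. According to the comment on clearStatus(), diagnosticInfo is temporary diagnostic information that accompanies a status update sent from a Task to the TaskTracker, or from the TaskTracker to the JobTracker, so it has to be cleared once it has been sent.
That completes the TaskTracker's side of sending a heartbeat. The method returns a HeartbeatResponse object, which is the reply to the heartbeat.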
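The status == null check at the start and the status = null assignment at the end together form a simple resend protocol: if the RPC throws, status is never reset, so the next call sends the same object again. A minimal sketch of that idea follows. ResendSketch and its Transport interface are hypothetical, and a string stands in for TaskTrackerStatus.

```java
import java.io.IOException;

final class ResendSketch {
    // Hypothetical transport: returns the next response id, or throws if the call is lost.
    interface Transport {
        short send(String status, short responseId) throws IOException;
    }

    private final Transport transport;
    private String status;        // null means "build a fresh status"
    private short responseId = 0;
    private int builds = 0;

    ResendSketch(Transport transport) {
        this.transport = transport;
    }

    // One heartbeat attempt.
    void transmit() throws IOException {
        if (status == null) {
            builds++;
            status = "status#" + builds;
        }
        // If send throws, 'status' stays non-null and is resent unchanged next time.
        responseId = transport.send(status, responseId);
        status = null; // success: force a rebuild on the next call
    }

    int builds() {
        return builds;
    }

    short responseId() {
        return responseId;
    }
}
```

With a transport that fails once, two calls to transmit() build the status only a single time, which matches the "Resending 'status'" branch in step 2.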