As is well known, the JobTracker node relies on its configured task scheduler (TaskScheduler) to assign tasks to a given TaskTracker node. The scheduler, however, only decides which job's tasks, and how many of them, go to that TaskTracker; it cannot decide which specific task of a job the TaskTracker receives. Moreover, for any concrete TaskTracker node, every job can tell which of its map tasks are local to that TaskTracker and which are non-local (for reduce tasks there is no local/non-local distinction). Concretely: when the scheduler decides to assign a job's local map task to a TaskTracker, it calls obtainNewLocalMapTask() on that job's JobInProgress object; to assign a non-local map task it calls obtainNewNonLocalMapTask(). Relative to the TaskTracker's physical position in the cluster, the job may have several local and several non-local map tasks, and which particular one is handed out is decided by JobInProgress. When the scheduler assigns a reduce task of a job to a TaskTracker, it calls obtainNewReduceTask() on the corresponding JobInProgress object. How exactly JobInProgress picks a local map task, a non-local map task, or a reduce task is the focus of the rest of this article.
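Before diving in, a minimal sketch may help fix this division of labor: the scheduler picks the job and the entry point, JobInProgress picks the concrete task. The helper below is hypothetical (real schedulers such as JobQueueTaskScheduler live inside the org.apache.hadoop.mapred package and carry far more bookkeeping); only the three JobInProgress methods it calls are real.

// A minimal sketch, not Hadoop source: the order in which a simple
// scheduler could call the three JobInProgress entry points when
// assigning one task to a heartbeating TaskTracker.
class AssignmentSketch {
  static Task assignOneTask(JobInProgress job, TaskTrackerStatus tts,
                            int clusterSize, int numUniqueHosts) throws IOException {
    // 1. Prefer a map task whose input is on this node or its rack.
    Task t = job.obtainNewLocalMapTask(tts, clusterSize, numUniqueHosts);
    if (t != null) return t;
    // 2. Fall back to a map task with no locality to this tracker.
    t = job.obtainNewNonLocalMapTask(tts, clusterSize, numUniqueHosts);
    if (t != null) return t;
    // 3. Reduce tasks have no locality, so there is a single entry point.
    return job.obtainNewReduceTask(tts, clusterSize, numUniqueHosts);
  }
}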
1. Assigning a Job's Map Tasks
Map tasks come in local and non-local flavors because a map task's input data and the TaskTracker node that executes it may or may not sit in the same place in the cluster. "Local" and "non-local" are always relative to the TaskTracker that runs (or is about to run) the task: when the scheduler decides to hand a job's local map task to a TaskTracker, JobInProgress looks for map tasks whose input data lives on the same machine or the same rack as that TaskTracker; those tasks are local to it. It is worth noting that at job-initialization time every map task is already pre-assigned for locality: based on the physical location of its input data, the task is mounted onto the corresponding physical node(s). The source of that step:
- <span xmlns="http://www.w3.org/1999/xhtml" style="">private Map<Node, List<TaskInProgress>> createCache(JobClient.RawSplit[] splits, int maxLevel) {
- Map<Node, List<TaskInProgress>> cache = new IdentityHashMap<Node, List<TaskInProgress>>(maxLevel);
- for (int i = 0; i < splits.length; i++) {
- String[] splitLocations = splits[i].getLocations();//获取该数据切片坐在的物理位置(多个副本)
- if (splitLocations.length == 0) {
- nonLocalMaps.add(maps[i]);
- continue;
- }
- //针对每一个副本的物理位置
- for(String host: splitLocations) {
- //解析副本在集群中的哪一个节点上
- Node node = jobtracker.resolveAndAddToTopology(host);
- LOG.info("tip:" + maps[i].getTIPId() + " has split on node:" + node);
- for (int j = 0; j < maxLevel; j++) {
- List<TaskInProgress> hostMaps = cache.get(node);
- if (hostMaps == null) {
- hostMaps = new ArrayList<TaskInProgress>();
- cache.put(node, hostMaps);
- hostMaps.add(maps[i]);//将Map任务挂载到该节点上
- }
- //去重,避免一个节点挂载了两个相同的Map任务
- if (hostMaps.get(hostMaps.size() - 1) != maps[i]) {
- hostMaps.add(maps[i]);
- }
- node = node.getParent();//获取节点的父节点(由于maxLevel的值是2,所以父节点就是rack节点)
- }
- }
- }
- return cache;
- }</span>
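To make the mounting concrete, here is a self-contained toy version of the same two-level idea with simplified stand-in types (the Node class below is an assumption for illustration, not org.apache.hadoop.net.Node): with maxLevel = 2, a split's task becomes reachable both through its host node and through that host's rack node.

import java.util.*;

// Toy illustration of the two-level mounting done by createCache().
class ToyTopology {
  static class Node {
    final String name; final Node parent;
    Node(String name, Node parent) { this.name = name; this.parent = parent; }
    public String toString() { return name; }
  }

  public static void main(String[] args) {
    Node rack1 = new Node("/rack1", null);
    Node host1 = new Node("/rack1/host1", rack1);
    int maxLevel = 2;                          // host level + rack level
    Map<Node, List<String>> cache = new IdentityHashMap<>();

    String tip = "task_0001_m_000000";         // one map task, one replica on host1
    Node node = host1;
    for (int level = 0; level < maxLevel; ++level) {
      cache.computeIfAbsent(node, k -> new ArrayList<>()).add(tip);
      node = node.parent;                      // climb from host to rack
    }
    // The task is now reachable both via its host and via its rack:
    System.out.println(cache.get(host1));      // [task_0001_m_000000]
    System.out.println(cache.get(rack1));      // [task_0001_m_000000]
  }
}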
After this pre-processing, the Node-to-map-task mapping is stored in the job's nonRunningMapCache attribute. When JobInProgress assigns a local map task to a TaskTracker, it only needs to resolve which cluster node the TaskTracker corresponds to; with that node it can pull a map task out of nonRunningMapCache, and that task is local to the TaskTracker. When JobInProgress assigns a non-local map task, it takes all rack nodes in the cluster (except the TaskTracker's own rack) and uses those rack nodes to pull a map task from nonRunningMapCache; such a task is non-local to the TaskTracker. As the source above shows, the local/non-local boundary is determined by maxLevel, i.e. the allowed physical distance between a map task's input data and the TaskTracker. In the version discussed here (Hadoop-0.20.2.0) maxLevel defaults to 2 and can be set in the JobTracker's configuration via mapred.task.cache.levels. The code above also gives a precise definition of the purely non-local map task: one whose recorded input locations are empty. That does not mean the task has no input. Hadoop lets users define their own data splits (by implementing InputSplit), and RawSplit does not directly store the locations of the map operation's input but wraps the real InputSplit. Two practical consequences follow. First, when writing a custom InputSplit for a map task, consider whether that task can still be a local task for some TaskTracker (for example, a map task whose input spans multiple nodes can never be local). Second, a custom InputSplit may carry a small amount of input data for the map operation directly: say a map task needs two inputs, one large and one of only a few hundred or thousand bytes; the user can embed the small one in the InputSplit, since storing such tiny data in HDFS is clearly not worthwhile. These location-less map tasks are kept in the nonLocalMaps attribute.
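As a sketch of that second point, a custom split could embed the small input inline while still reporting the big input's block locations, so the locality pre-assignment above keeps working. The class below is hypothetical (SmallSideDataSplit is not part of Hadoop); it assumes only the old-API org.apache.hadoop.mapred.InputSplit interface (getLength(), getLocations(), plus Writable):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.mapred.InputSplit;

// Hypothetical sketch, not Hadoop source: a split that carries a small
// side-data payload inline, while locality is still decided by the
// block locations of the large input file.
public class SmallSideDataSplit implements InputSplit {
  private String bigFilePath;   // path of the large input
  private long offset, length;  // the block of the large input to read
  private String[] hosts;       // replicas of that block -> drive locality
  private byte[] sideData;      // a few hundred bytes embedded directly

  public SmallSideDataSplit() { } // required for Writable deserialization

  public SmallSideDataSplit(String bigFilePath, long offset, long length,
                            String[] hosts, byte[] sideData) {
    this.bigFilePath = bigFilePath; this.offset = offset;
    this.length = length; this.hosts = hosts; this.sideData = sideData;
  }

  public long getLength() { return length; }

  // createCache() consumes these locations: returning the big file's
  // block hosts keeps the map task schedulable as a local task.
  public String[] getLocations() { return hosts; }

  public byte[] getSideData() { return sideData; }

  public void write(DataOutput out) throws IOException {
    out.writeUTF(bigFilePath);
    out.writeLong(offset);
    out.writeLong(length);
    out.writeInt(sideData.length);
    out.write(sideData);
    // hosts are deliberately not serialized; locations are only used on
    // the JobTracker side and are not shipped to the task.
  }

  public void readFields(DataInput in) throws IOException {
    bigFilePath = in.readUTF();
    offset = in.readLong();
    length = in.readLong();
    sideData = new byte[in.readInt()];
    in.readFully(sideData);
    hosts = new String[0];
  }
}

The design choice worth noting: getLocations() reports the large file's replica hosts, so the task stays eligible for local scheduling even though part of its input travels inside the split itself.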
1.1 Assigning a Local Map Task
Assigning a local map task to a TaskTracker is fairly simple for JobInProgress, with one exceptional case: if the TaskTracker cannot be resolved to a Node in the cluster topology, this "local" assignment is handled as a non-local assignment instead. The source:
- <span xmlns="http://www.w3.org/1999/xhtml" style="">public synchronized Task obtainNewLocalMapTask(TaskTrackerStatus tts,int clusterSize, int numUniqueHosts) throws IOException {
- if (!tasksInited.get()) {
- return null;
- }
- //为当前的计算节点获取一个本地map任务
- int target = findNewMapTask(tts, clusterSize, numUniqueHosts, maxLevel, status.mapProgress());
- if (target == -1) {
- return null;
- }
- Task result = maps[target].getTaskToRun(tts.getTrackerName());
- if (result != null) {
- addRunningTaskToTIP(maps[target], result.getTaskID(), tts, true);
- }
- return result;
- }
- </span>
- <span xmlns="http://www.w3.org/1999/xhtml" style="">/**
- * 为当前的计算节点从作业的map任务集中选取一个合适的任务;
- * 参数maxCacheLevel决定了当前分配的是本地任务还是非本地任务
- */
- private synchronized int findNewMapTask(final TaskTrackerStatus tts, final int clusterSize, final int numUniqueHosts, final int maxCacheLevel, final double avgProgress) {
- ...
- Node node = jobtracker.getNode(tts.getHost()); //根据当前计算节点的主机/IP来获取其在集群拓扑结构中对应的位置节点
- //
- // I) Non-running TIP :
- // 1. check from local node to the root [bottom up cache lookup]
- // i.e if the cache is available and the host has been resolved
- // (node!=null)
- if (node != null) {
- Node key = node; //当前待分配的map任务的输入数据所在的节点
- int level = 0;
- // maxCacheLevel might be greater than this.maxLevel if findNewMapTask is
- // called to schedule any task (local, rack-local, off-switch or speculative)
- // tasks or it might be NON_LOCAL_CACHE_LEVEL (i.e. -1) if findNewMapTask is
- // (i.e. -1) if findNewMapTask is to only schedule off-switch/speculative
- // tasks
- int maxLevelToSchedule = Math.min(maxCacheLevel, maxLevel);
- for (level = 0;level < maxLevelToSchedule; ++level) {
- List <TaskInProgress> cacheForLevel = nonRunningMapCache.get(key); //获取节点key上还未分配的map任务
- if (cacheForLevel != null) {
- tip = findTaskFromList(cacheForLevel, tts, numUniqueHosts,level == 0); //从一个map任务集中为当前的计算节点找到一个合适的任务
- if (tip != null) {
- // Add to running cache
- scheduleMap(tip);
- // remove the cache if its empty
- if (cacheForLevel.size() == 0) {
- nonRunningMapCache.remove(key);
- }
- return tip.getIdWithinJob();
- }
- }
- key = key.getParent();
- }
- // Check if we need to only schedule a local task (node-local/rack-local)
- if (level == maxCacheLevel) {
- return -1;
- }
- }
- ...
- }</span>
- <span xmlns="http://www.w3.org/1999/xhtml" style="">private synchronized TaskInProgress findTaskFromList(Collection<TaskInProgress> tips, TaskTrackerStatus ttStatus, int numUniqueHosts, boolean removeFailedTip) {
- Iterator<TaskInProgress> iter = tips.iterator();
- while (iter.hasNext()) {
- TaskInProgress tip = iter.next();
- // Select a tip if
- // 1. runnable : still needs to be run and is not completed
- // 2. ~running : no other node is running it
- // 3. earlier attempt failed : has not failed on this host
- // and has failed on all the other hosts
- // A TIP is removed from the list if
- // (1) this tip is scheduled
- // (2) if the passed list is a level 0 (host) cache
- // (3) when the TIP is non-schedulable (running, killed, complete)
- if (tip.isRunnable() && !tip.isRunning()) {
- // check if the tip has failed on this host
- if (!tip.hasFailedOnMachine(ttStatus.getHost()) || tip.getNumberOfFailedMachines() >= numUniqueHosts) {
- // check if the tip has failed on all the nodes
- iter.remove();
- return tip;
- }
- else if (removeFailedTip) {
- // the case where we want to remove a failed tip from the host cache
- // point#3 in the TIP removal logic above
- iter.remove();
- }
- } else {
- // see point#3 in the comment above for TIP removal logic
- iter.remove();
- }
- }
- return null;
- }
- </span>
1.2 Assigning a Non-Local Map Task
Assigning a non-local map task to a TaskTracker is considerably more involved than assigning a local one. JobInProgress proceeds in this order: 1) pick a non-local task from nonRunningMapCache; 2) failing that, pick a task from nonLocalMaps; 3) failing that, check whether the job has hasSpeculativeMaps set. If not, no non-local map task is assigned to this TaskTracker. If it is set, choose among the local or non-local map tasks currently being executed by other TaskTracker nodes, again in priority order: first look for a local map task in runningMapCache, then for a non-local map task in runningMapCache, and finally for a non-local map task in nonLocalRunningMaps; if nothing is found even then, no map task is assigned to this TaskTracker at all. The source:
public synchronized Task obtainNewNonLocalMapTask(TaskTrackerStatus tts, int clusterSize, int numUniqueHosts)
    throws IOException {
  if (!tasksInited.get()) {
    return null;
  }
  int target = findNewMapTask(tts, clusterSize, numUniqueHosts, NON_LOCAL_CACHE_LEVEL, status.mapProgress());
  if (target == -1) {
    return null;
  }
  Task result = maps[target].getTaskToRun(tts.getTrackerName());
  if (result != null) {
    addRunningTaskToTIP(maps[target], result.getTaskID(), tts, true);
  }
  return result;
}

private synchronized int findNewMapTask(final TaskTrackerStatus tts, final int clusterSize, final int numUniqueHosts, final int maxCacheLevel, final double avgProgress) {
  ...
  Collection<Node> nodesAtMaxLevel = jobtracker.getNodesAtMaxLevel();
  // get the node parent at max level
  Node nodeParentAtMaxLevel = (node == null) ? null : JobTracker.getParentNode(node, maxLevel - 1);
  for (Node parent : nodesAtMaxLevel) {
    // skip the parent that has already been scanned
    if (parent == nodeParentAtMaxLevel) {
      continue;
    }
    List<TaskInProgress> cache = nonRunningMapCache.get(parent);
    if (cache != null) {
      tip = findTaskFromList(cache, tts, numUniqueHosts, false);
      if (tip != null) {
        // Add to the running cache
        scheduleMap(tip);
        // remove the cache if empty
        if (cache.size() == 0) {
          nonRunningMapCache.remove(parent);
        }
        LOG.info("Choosing a non-local task " + tip.getTIPId());
        return tip.getIdWithinJob();
      }
    }
  }
  // 3. Search non-local tips for a new task
  tip = findTaskFromList(nonLocalMaps, tts, numUniqueHosts, false);
  if (tip != null) {
    // Add to the running list
    scheduleMap(tip);
    LOG.info("Choosing a non-local task " + tip.getTIPId());
    return tip.getIdWithinJob();
  }
  // II) Running TIP :
  if (hasSpeculativeMaps) {
    long currentTime = System.currentTimeMillis();
    // 1. Check bottom up for speculative tasks from the running cache
    if (node != null) {
      Node key = node;
      for (int level = 0; level < maxLevel; ++level) {
        Set<TaskInProgress> cacheForLevel = runningMapCache.get(key);
        if (cacheForLevel != null) {
          tip = findSpeculativeTask(cacheForLevel, tts, avgProgress, currentTime, level == 0);
          if (tip != null) {
            if (cacheForLevel.size() == 0) {
              runningMapCache.remove(key);
            }
            return tip.getIdWithinJob();
          }
        }
        key = key.getParent();
      }
    }
    // 2. Check breadth-wise for speculative tasks
    for (Node parent : nodesAtMaxLevel) {
      // ignore the parent which is already scanned
      if (parent == nodeParentAtMaxLevel) {
        continue;
      }
      Set<TaskInProgress> cache = runningMapCache.get(parent);
      if (cache != null) {
        tip = findSpeculativeTask(cache, tts, avgProgress, currentTime, false);
        if (tip != null) {
          // remove empty cache entries
          if (cache.size() == 0) {
            runningMapCache.remove(parent);
          }
          LOG.info("Choosing a non-local task " + tip.getTIPId() + " for speculation");
          return tip.getIdWithinJob();
        }
      }
    }
    // 3. Check non-local tips for speculation
    tip = findSpeculativeTask(nonLocalRunningMaps, tts, avgProgress, currentTime, false);
    if (tip != null) {
      LOG.info("Choosing a non-local task " + tip.getTIPId() + " for speculation");
      return tip.getIdWithinJob();
    }
  }
  return -1;
}
2. Assigning a Job's Reduce Tasks
Because a reduce task's input comes from the outputs of all of the job's map tasks, and each TaskTracker stores its map outputs locally, the input of a reduce task will in almost all cases not reside on any single TaskTracker node. So for reduce tasks there is no local/non-local distinction with respect to any TaskTracker, and assigning one is quite simple, resembling the assignment of a non-local map task: JobInProgress first picks a task directly from nonRunningReduces; if none is found it checks whether the job set hasSpeculativeReduces, and if not, nothing is assigned; if it was set, a reduce task currently being executed by some other TaskTracker node is picked from runningReduces and handed to this TaskTracker. The corresponding source:
public synchronized Task obtainNewReduceTask(TaskTrackerStatus tts, int clusterSize, int numUniqueHosts) throws IOException {
  if (status.getRunState() != JobStatus.RUNNING) {
    return null;
  }
  // Ensure we have sufficient map outputs ready to shuffle before
  // scheduling reduces
  if (!scheduleReduces()) {
    return null;
  }
  int target = findNewReduceTask(tts, clusterSize, numUniqueHosts, status.reduceProgress());
  if (target == -1) {
    return null;
  }
  Task result = reduces[target].getTaskToRun(tts.getTrackerName());
  if (result != null) {
    addRunningTaskToTIP(reduces[target], result.getTaskID(), tts, true);
  }
  return result;
}

private synchronized int findNewReduceTask(TaskTrackerStatus tts, int clusterSize, int numUniqueHosts, double avgProgress) {
  if (numReduceTasks == 0) {
    return -1;
  }
  String taskTracker = tts.getTrackerName();
  TaskInProgress tip = null;
  // Update the last-known clusterSize
  this.clusterSize = clusterSize;
  if (!shouldRunOnTaskTracker(taskTracker)) {
    return -1;
  }
  long outSize = resourceEstimator.getEstimatedReduceInputSize();
  long availSpace = tts.getResourceStatus().getAvailableSpace();
  if (availSpace < outSize) {
    LOG.warn("No local disk space for reduce task. TaskTracker[" + taskTracker + "] has " + availSpace + " bytes free; but we expect reduce input to take " + outSize);
    return -1; // see if a different TIP might work better.
  }
  // 1. check for a never-executed reduce tip
  // reducers don't have a cache and so pass -1 to explicitly call that out
  tip = findTaskFromList(nonRunningReduces, tts, numUniqueHosts, false);
  if (tip != null) {
    scheduleReduce(tip);
    return tip.getIdWithinJob();
  }
  // 2. check for a reduce tip to be speculated
  if (hasSpeculativeReduces) {
    tip = findSpeculativeTask(runningReduces, tts, avgProgress, System.currentTimeMillis(), false);
    if (tip != null) {
      scheduleReduce(tip);
      return tip.getIdWithinJob();
    }
  }
  return -1;
}
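A side note on the scheduleReduces() gate seen in obtainNewReduceTask(): in this version it amounts to a "slow start" check, i.e. reduces are held back until a configured fraction of the job's maps have finished, so the shuffle has output to pull. The standalone sketch below assumes that behavior and the mapred.reduce.slowstart.completed.maps key with a 5% default; verify both against your source tree:

// A minimal sketch of the reduce slow-start gate, assuming Hadoop 0.20
// semantics. slowstartFraction corresponds to the value of
// mapred.reduce.slowstart.completed.maps (assumed default: 0.05).
class SlowStartSketch {
  static boolean scheduleReduces(int finishedMapTasks, int numMapTasks,
                                 float slowstartFraction) {
    int threshold = (int) Math.ceil(slowstartFraction * numMapTasks);
    return finishedMapTasks >= threshold;
  }
}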
private synchronized TaskInProgress findSpeculativeTask(Collection<TaskInProgress> list, TaskTrackerStatus ttStatus, double avgProgress, long currentTime, boolean shouldRemove) {
  Iterator<TaskInProgress> iter = list.iterator();
  while (iter.hasNext()) {
    TaskInProgress tip = iter.next();
    // should never be true! (since we delete completed/failed tasks)
    if (!tip.isRunning()) {
      iter.remove();
      continue;
    }
    // this TaskTracker node is not already running the task
    if (!tip.hasRunOnMachine(ttStatus.getHost(), ttStatus.getTrackerName())) {
      if (tip.hasSpeculativeTask(currentTime, avgProgress)) {
        // In case of shared list we don't remove it. Since the TIP failed
        // on this tracker can be scheduled on some other tracker.
        if (shouldRemove) {
          iter.remove(); // this tracker is never going to run it again
        }
        return tip;
      }
    } else {
      // Check if this tip can be removed from the list.
      // If the list is shared then we should not remove.
      if (shouldRemove) {
        // This tracker will never speculate this tip
        iter.remove();
      }
    }
  }
  return null;
}
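For completeness, findSpeculativeTask() delegates the actual "is this attempt lagging?" decision to TaskInProgress.hasSpeculativeTask(currentTime, avgProgress). The simplified sketch below assumes the 0.20-era criterion of a minimum runtime plus a progress gap; the constant values and the reduction to two conditions are assumptions for illustration, not the exact Hadoop method:

// Simplified sketch of the speculation criterion: a running task is a
// speculation candidate only if it has run long enough and trails the
// job's average progress by a wide margin. Values below are assumed.
class SpeculationCheck {
  static final double SPECULATIVE_GAP = 0.2;      // assumed 20% gap
  static final long SPECULATIVE_LAG = 60 * 1000;  // assumed 60s minimum runtime

  static boolean hasSpeculativeTask(double progress, double avgProgress,
                                    long startTime, long currentTime) {
    return (currentTime - startTime) >= SPECULATIVE_LAG
        && (avgProgress - progress) >= SPECULATIVE_GAP;
  }
}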
A job executing on a Hadoop cluster goes through four main phases: setup, map, reduce, and cleanup, and in every one of them a task attempt may fail on its TaskTracker node. Besides finishing successfully, a task attempt running in a TaskTracker's JVM can hit several abnormal endings: 1) it fails inside the JVM; 2) the JVM process is stopped by the operating system; 3) the JobTracker node asks for the attempt to be killed. Each of these makes the attempt fail and puts it into one of three states: FAILED, FAILED_UNCLEAN, or KILLED_UNCLEAN. This raises the question of which state a failed attempt actually ends up in. That is easy to determine (a compact code summary follows the list below):
1) If a task attempt hits an exception or error in the JVM and cannot continue, and it leaves the JVM after calling abortTask() on the OutputCommitter of the job the task belongs to, the attempt enters the FAILED state;
2) If a task attempt hits an exception or error in the JVM and cannot continue, but leaves the JVM without having called abortTask() on the job's OutputCommitter, the attempt enters the FAILED_UNCLEAN state;
3) If a task attempt is abruptly stopped while running normally in the JVM (e.g. the JVM process is stopped by the OS, or forcibly terminated on the TaskTracker node's orders), there is no chance to call abortTask() on the job's OutputCommitter, so the attempt enters the KILLED_UNCLEAN state.
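The sketch below (a hypothetical helper, not Hadoop source) restates the three cases above as a decision on two booleans:

// Compact restatement of the three outcomes, keyed on whether the attempt
// was stopped externally and whether OutputCommitter.abortTask() got to
// run before the JVM exited.
class AttemptOutcome {
  enum State { FAILED, FAILED_UNCLEAN, KILLED_UNCLEAN }

  static State classify(boolean stoppedExternally, boolean abortTaskRan) {
    if (stoppedExternally) {
      return State.KILLED_UNCLEAN;              // no chance to call abortTask()
    }
    return abortTaskRan ? State.FAILED          // failed, but cleaned up in-JVM
                        : State.FAILED_UNCLEAN; // failed leaving dirty output
  }
}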
The remainder of this article focuses on how the JobTracker node handles task attempts in the FAILED_UNCLEAN and KILLED_UNCLEAN states.
After a task attempt stops executing, the TaskTracker reports its state to the JobTracker node. As noted earlier, the JobTracker never processes these status reports itself but hands them to the corresponding JobInProgress. For attempts in the FAILED_UNCLEAN or KILLED_UNCLEAN state, JobInProgress stores them in the matching to-be-cleaned task queue (a job keeps two such queues, one for map-type attempts and one for reduce-type attempts) and later hands them to a suitable TaskTracker node to perform the cleanup, i.e. the TaskCleanup task mentioned earlier. The handling is straightforward; the source:
- <span xmlns="http://www.w3.org/1999/xhtml" style=""><span xmlns="http://www.w3.org/1999/xhtml" style=""><span xmlns="http://www.w3.org/1999/xhtml" style=""><span xmlns="http://www.w3.org/1999/xhtml" style="">class JobInProgress {
- ...
- public synchronized void updateTaskStatus(TaskInProgress tip, TaskStatus status) {
- ...
- if (state == TaskStatus.State.FAILED_UNCLEAN || state == TaskStatus.State.KILLED_UNCLEAN) {
- tip.incompleteSubTask(taskid, this.status);
- // add this task, to be rescheduled as cleanup attempt
- if (tip.isMapTask()) {
- mapCleanupTasks.add(taskid);
- } else {
- reduceCleanupTasks.add(taskid);
- }
- // Remove the task entry from jobtracker
- jobtracker.removeTaskEntry(taskid);
- }
- ...
- }
- ...
- }
- </span></span></span></span>
As the previous post explained, when a job has TaskCleanup tasks pending, they are scheduled ahead of its regular map/reduce tasks. The corresponding scheduling policy is equally simple; the source:
- <span xmlns="http://www.w3.org/1999/xhtml" style=""><span xmlns="http://www.w3.org/1999/xhtml" style=""><span xmlns="http://www.w3.org/1999/xhtml" style=""><span xmlns="http://www.w3.org/1999/xhtml" style="">public Task obtainTaskCleanupTask(TaskTrackerStatus tts, boolean isMapSlot) throws IOException {
- if (!tasksInited.get()) {
- return null;
- }
- synchronized (this) {
- if (this.status.getRunState() != JobStatus.RUNNING || jobFailed || jobKilled) {
- return null;
- }
- String taskTracker = tts.getTrackerName();
- if (!shouldRunOnTaskTracker(taskTracker)) {
- return null;
- }
- TaskAttemptID taskid = null;
- TaskInProgress tip = null;
- if (isMapSlot) {
- if (!mapCleanupTasks.isEmpty()) {
- taskid = mapCleanupTasks.remove(0);
- tip = maps[taskid.getTaskID().getId()];
- }
- } else {
- if (!reduceCleanupTasks.isEmpty()) {
- taskid = reduceCleanupTasks.remove(0);
- tip = reduces[taskid.getTaskID().getId()];
- }
- }
- if (tip != null) {
- return tip.addRunningTask(taskid, taskTracker, true);
- }
- return null;
- }
- }
- </span></span></span></span>
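Putting the two halves of this article together: a scheduler filling one free slot would consult the cleanup queue before asking for a regular task. The wrapper below is hypothetical; the JobInProgress methods it calls are the real entry points discussed above:

// Hypothetical wrapper: cleanup attempts take priority over regular tasks
// when filling one free slot of the given type.
class SlotFillSketch {
  static Task fillSlot(JobInProgress job, TaskTrackerStatus tts, boolean isMapSlot,
                       int clusterSize, int numUniqueHosts) throws IOException {
    Task cleanup = job.obtainTaskCleanupTask(tts, isMapSlot);
    if (cleanup != null) {
      return cleanup;                      // TaskCleanup wins over map/reduce
    }
    if (isMapSlot) {
      Task t = job.obtainNewLocalMapTask(tts, clusterSize, numUniqueHosts);
      return (t != null) ? t
           : job.obtainNewNonLocalMapTask(tts, clusterSize, numUniqueHosts);
    }
    return job.obtainNewReduceTask(tts, clusterSize, numUniqueHosts);
  }
}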