Task运行过程分析1

1、Task运行过程概述
在MapReduce计算框架中,一个应用程序被划分成Map和Reduce两个计算阶段,它们分别由一个或者多个Map Task和Reduce Task组成。其中,每个Map Task处理输入数据集合中的一片数据(InputSplit),并将产生的若干个数据片段写到本地磁盘上,而Reduce Task则从每个Map Task上远程拷贝相应的数据片段,经分组聚集和归约后,将结果写到HDFS上作为最终结果,如下图所示:
这里写图片描述
总体上看,Map Task与Reduce Task之间的数据传输采用了pull模型。为了能够容错,Map Task将中间计算结果存放到本地磁盘上,而Reduce Task则通过Http请求从各个Map Task端拖取(pull)相应的输入数据。为了更好地支持大量Reduce Task并发从Map Task端拷贝数据,Hadoop采用了Jetty Server作为HTTP Server处理并发数据读请求。

对于Map Task而言,它的执行过程可概述为:首先,通过用户提供的InputFormat将对应的InputSplit解析成一系列key/value,并依次交给用户编写的map()函数处理;接着按照指定的Partitioner对数据分片,以确定每个key/value将交给哪个Reduce Task处理;之后将数据交给用户定义的Combiner进行一次本地规约(用户没有定义则直接跳过);最后将处理结果保存到本地磁盘上。
对于Reduce Task而言,由于它的输入数据来自各个Map Task,因此首先需要通过HTTP请求从各个已经运行完成的Map Task上拷贝对应的数据分片,待所有数据拷贝完成后,再以key为关键字对所有数据进行排序,通过排序,key 相同的记录聚集到一起形成若干分组,然后将每组数据交给用户编写的reduce()函数处理,并将数据结果直接写到HDFS上作为最终输出结果。

这里插一句,当我在想这个流程的时候,有几个问题一直困扰着我。。。
Map Task如何启动的?
Map Task的输入从哪里来?
Reduce Task如何知道自己处理哪些数据?
自己查阅了相关资料,也有一些自己的理解,如果不正确,希望指出来。。。

1、首先Map Task如何启动
实际上,TaskTracker初始化时,会初始化并启动两个TaskLauncher类型的线程,mapLauncher,reduceLauncher。在TaskTracker从JobTracher获取到任务后,对应的会把任务添加到两个TaskLauncher的Queue中。
在org.apache.hadoop.mapred.TaskTracker.java中的initialize()函数中初始化并启动这两个线程

 mapLauncher = new TaskLauncher(TaskType.MAP, maxMapSlots);
 reduceLauncher = new TaskLauncher(TaskType.REDUCE, maxReduceSlots);
 mapLauncher.start();
 reduceLauncher.start();

TaskLauncher是一个线程,一直检查tasksToLaunch列表中是否有任务,有的情况下取出执行。
TaskLauncher是org.apache.hadoop.mapred.TaskTracker.java的一个内部类。。。

 class TaskLauncher extends Thread {
    private IntWritable numFreeSlots;
    private final int maxSlots;
    private List<TaskInProgress> tasksToLaunch;

    public TaskLauncher(TaskType taskType, int numSlots) {
      this.maxSlots = numSlots;
      this.numFreeSlots = new IntWritable(numSlots);
      this.tasksToLaunch = new LinkedList<TaskInProgress>();
      setDaemon(true);
      setName("TaskLauncher for " + taskType + " tasks");
    }

    public void addToTaskQueue(LaunchTaskAction action) {
      synchronized (tasksToLaunch) {
        TaskInProgress tip = registerTask(action, this);
        tasksToLaunch.add(tip);
        tasksToLaunch.notifyAll();
      }
    }

    public void cleanTaskQueue() {
      tasksToLaunch.clear();
    }

    public void addFreeSlots(int numSlots) {
      synchronized (numFreeSlots) {
        numFreeSlots.set(numFreeSlots.get() + numSlots);
        assert (numFreeSlots.get() <= maxSlots);
        LOG.info("addFreeSlot : current free slots : " + numFreeSlots.get());
        numFreeSlots.notifyAll();
      }
    }

    void notifySlots() {
      synchronized (numFreeSlots) {
        numFreeSlots.notifyAll();
      }
    }

    int getNumWaitingTasksToLaunch() {
      synchronized (tasksToLaunch) {
        return tasksToLaunch.size();
      }
    }

    public void run() {
      while (!Thread.interrupted()) {
        try {
          TaskInProgress tip;
          Task task;
          synchronized (tasksToLaunch) {
            while (tasksToLaunch.isEmpty()) {
              tasksToLaunch.wait();
            }
            //get the TIP
            tip = tasksToLaunch.remove(0);
            task = tip.getTask();
            LOG.info("Trying to launch : " + tip.getTask().getTaskID() + 
                     " which needs " + task.getNumSlotsRequired() + " slots");
          }
          //wait for free slots to run
          synchronized (numFreeSlots) {
            boolean canLaunch = true;
            while (numFreeSlots.get() < task.getNumSlotsRequired()) {
              //Make sure that there is no kill task action for this task!
              //We are not locking tip here, because it would reverse the
              //locking order!
              //Also, Lock for the tip is not required here! because :
              // 1. runState of TaskStatus is volatile
              // 2. Any notification is not missed because notification is
              // synchronized on numFreeSlots. So, while we are doing the check,
              // if the tip is half way through the kill(), we don't miss
              // notification for the following wait().
              if (!tip.canBeLaunched()) {
                //got killed externally while still in the launcher queue
                LOG.info("Not blocking slots for " + task.getTaskID()
                    + " as it got killed externally. Task's state is "
                    + tip.getRunState());
                canLaunch = false;
                break;
              }
              LOG.info("TaskLauncher : Waiting for " + task.getNumSlotsRequired() + 
                       " to launch " + task.getTaskID() + ", currently we have " + 
                       numFreeSlots.get() + " free slots");
              numFreeSlots.wait();
            }
            if (!canLaunch) {
              continue;
            }
            LOG.info("In TaskLauncher, current free slots : " + numFreeSlots.get()+
                     " and trying to launch "+tip.getTask().getTaskID() + 
                     " which needs " + task.getNumSlotsRequired() + " slots");
            numFreeSlots.set(numFreeSlots.get() - task.getNumSlotsRequired());
            assert (numFreeSlots.get() >= 0);
          }
          synchronized (tip) {
            //to make sure that there is no kill task action for this
            if (!tip.canBeLaunched()) {
              //got killed externally while still in the launcher queue
              LOG.info("Not launching task " + task.getTaskID() + " as it got"
                + " killed externally. Task's state is " + tip.getRunState());
              addFreeSlots(task.getNumSlotsRequired());
              continue;
            }
            tip.slotTaken = true;
          }
          
评论 5
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值