Hadoop MapReduce: ReduceTask Execution (Part 6)

__海盗__


Article link: https://blog.csdn.net/lihm0_1/article/details/17142607


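In the previous articles we looked at the three phases of a reduce task (copy, sort, and reduce) one at a time. In fact all three are driven by a single function, and the phase boundaries in it are clearly marked, so walking through it should make the overall reduce flow easier to follow. The function is reached through Child.main -> taskFinal.run(job, umbilical):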

public void run(JobConf job, final TaskUmbilicalProtocol umbilical)
    throws IOException, InterruptedException, ClassNotFoundException {
    this.umbilical = umbilical;
    job.setBoolean("mapred.skip.on", isSkipping());
    // Register the three phases of the reduce task
    if (isMapOrReduce()) {
      copyPhase = getProgress().addPhase("copy");
      sortPhase  = getProgress().addPhase("sort");
      reducePhase = getProgress().addPhase("reduce");
    }
    // Start the communication thread that reports to the parent process
    TaskReporter reporter = new TaskReporter(getProgress(), umbilical,
        jvmContext);
    reporter.startCommunicationThread();
    boolean useNewApi = job.getUseNewReducer();
    initialize(job, getJobID(), reporter, useNewApi);


    // Check the task type; job cleanup, job setup, and task cleanup tasks return early
    if (jobCleanup) {
      runJobCleanupTask(umbilical, reporter);
      return;
    }
    if (jobSetup) {
      runJobSetupTask(umbilical, reporter);
      return;
    }
    if (taskCleanup) {
      runTaskCleanupTask(umbilical, reporter);
      return;
    }
    
    // Initialize the codec
    codec = initCodec();
    // Enter the copy phase (skipped when running in local mode)
    boolean isLocal = "local".equals(job.get("mapred.job.tracker", "local"));
    if (!isLocal) {
      reduceCopier = new ReduceCopier(umbilical, job, reporter);
      if (!reduceCopier.fetchOutputs()) {
        if(reduceCopier.mergeThrowable instanceof FSError) {
          throw (FSError)reduceCopier.mergeThrowable;
        }
        throw new IOException("Task: " + getTaskID() + 
            " - The reduce copier failed", reduceCopier.mergeThrowable);
      }
    }
    copyPhase.complete();                         // copy is already complete
    // Enter the sort phase
    setPhase(TaskStatus.Phase.SORT);
    statusUpdate(umbilical);


    final FileSystem rfs = FileSystem.getLocal(job).getRaw();
    RawKeyValueIterator rIter = isLocal
      ? Merger.merge(job, rfs, job.getMapOutputKeyClass(),
          job.getMapOutputValueClass(), codec, getMapFiles(rfs, true),
          !conf.getKeepFailedTaskFiles(), job.getInt("io.sort.factor", 100),
          new Path(getTaskID().toString()), job.getOutputKeyComparator(),
          reporter, spilledRecordsCounter, null)
      : reduceCopier.createKVIterator(job, rfs, reporter);
        
    // free up the data structures
    mapOutputFilesOnDisk.clear();
    
    sortPhase.complete();                         // sort is complete
    // Enter the reduce phase
    setPhase(TaskStatus.Phase.REDUCE); 
    statusUpdate(umbilical);
    Class keyClass = job.getMapOutputKeyClass();
    Class valueClass = job.getMapOutputValueClass();
    RawComparator comparator = job.getOutputValueGroupingComparator();


    if (useNewApi) {
      runNewReducer(job, umbilical, reporter, rIter, comparator, 
                    keyClass, valueClass);
    } else {
      runOldReducer(job, umbilical, reporter, rIter, comparator, 
                    keyClass, valueClass);
    }
    // Everything has finished; end communication with the TaskTracker
    done(umbilical, reporter);
  }
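
To make the hand-off between the sort and reduce phases more concrete, here is a minimal, self-contained sketch of the idea, not Hadoop's actual implementation. It assumes each fetched map output is an already-sorted list held in memory, merges those lists with a heap (roughly the role Merger.merge and createKVIterator play above), and then groups consecutive equal keys with a grouping comparator before applying a reduce step that simply sums values. The names ReducePhaseSketch, Pair, Cursor, merge, and reduce are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class ReducePhaseSketch {

    /** A (key, value) record; stands in for one entry of a map output (hypothetical type). */
    static final class Pair {
        final String key;
        final int value;
        Pair(String key, int value) { this.key = key; this.value = value; }
    }

    /** Cursor over one sorted segment; the heap orders cursors by their current key. */
    static final class Cursor {
        final Iterator<Pair> it;
        Pair current;
        Cursor(List<Pair> segment) { it = segment.iterator(); advance(); }
        void advance() { current = it.hasNext() ? it.next() : null; }
    }

    /** Sort phase (sketch): k-way merge of already-sorted segments into one sorted list. */
    static List<Pair> merge(List<List<Pair>> segments, Comparator<String> keyOrder) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            (a, b) -> keyOrder.compare(a.current.key, b.current.key));
        for (List<Pair> segment : segments) {
            Cursor c = new Cursor(segment);
            if (c.current != null) heap.add(c);   // skip empty segments
        }
        List<Pair> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();               // cursor with the smallest current key
            merged.add(c.current);
            c.advance();
            if (c.current != null) heap.add(c);   // re-insert until the segment is exhausted
        }
        return merged;
    }

    /** Reduce phase (sketch): group consecutive equal keys, then sum each group's values. */
    static Map<String, Integer> reduce(List<Pair> sorted, Comparator<String> grouping) {
        Map<String, Integer> out = new LinkedHashMap<>();
        int i = 0;
        while (i < sorted.size()) {
            String key = sorted.get(i).key;
            int sum = 0;
            while (i < sorted.size() && grouping.compare(key, sorted.get(i).key) == 0) {
                sum += sorted.get(i).value;
                i++;
            }
            out.put(key, sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three sorted segments, as if fetched from three map tasks.
        List<List<Pair>> segments = List.of(
            List.of(new Pair("a", 1), new Pair("c", 1)),
            List.of(new Pair("a", 1), new Pair("b", 1)),
            List.of(new Pair("b", 1), new Pair("c", 1), new Pair("c", 1)));
        Comparator<String> order = Comparator.naturalOrder();
        System.out.println(reduce(merge(segments, order), order)); // prints {a=2, b=2, c=3}
    }
}
```

The sketch keeps everything in memory for brevity. In the function above, the merged data is consumed through a RawKeyValueIterator rather than a materialized list, and the grouping comparator comes from job.getOutputValueGroupingComparator().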
