Hadoop Source Code Analysis: The Reduce Input Process

 public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
      while (context.nextKey()) { // is there another key group in the context?
        // If so, pass the current key and its group of values to the reduce method.
        reduce(context.getCurrentKey(), context.getValues(), context);
        // If a back up store is used, reset it
        // My current understanding: get the value iterator and check whether it is
        // ReduceContext's own ValueIterator; if so, a backup store may be in use, so reset it.
        Iterator<VALUEIN> iter = context.getValues().iterator();
        if(iter instanceof ReduceContext.ValueIterator) {
          ((ReduceContext.ValueIterator<VALUEIN>)iter).resetBackupStore();
        }
      }
    } finally {
      cleanup(context);
    }
  }
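To see what `run` is doing, it helps to model its grouping loop outside of Hadoop. The sketch below is not the real Hadoop API: it takes map output that is already sorted by key, groups consecutive entries with the same key, and performs one "reduce" (here, a sum) per group, mirroring what `context.nextKey()` / `context.getValues()` drive in the loop above.

```java
import java.util.*;

// Minimal sketch (not the real Hadoop API): group a sorted (key, value) stream
// and perform one reduce call per key group, as Reducer.run does.
public class GroupedReduceSketch {
    public static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> sorted) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int i = 0;
        while (i < sorted.size()) {                      // analogous to context.nextKey()
            String key = sorted.get(i).getKey();
            int sum = 0;
            while (i < sorted.size() && sorted.get(i).getKey().equals(key)) {
                sum += sorted.get(i).getValue();         // analogous to iterating context.getValues()
                i++;
            }
            result.put(key, sum);                        // one reduce(key, values) call
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> sorted = List.of(
            Map.entry("a", 1), Map.entry("a", 2), Map.entry("b", 5));
        System.out.println(sumByKey(sorted)); // {a=3, b=5}
    }
}
```

Because the input is sorted, a single linear pass is enough to find every group boundary; this is why the framework sorts before reducing.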

The reduce task has three phases: Shuffle, Sort, and Reduce.

  1. Shuffle
    The Reducer copies the sorted output from each Mapper using HTTP across the network.
    In other words, the reducer pulls data over the network via HTTP; shuffle is essentially this data-fetching process.

  2. Sort
    The framework merge-sorts the Reducer inputs by key (since different Mappers may have output the same key).
    The shuffle and sort phases occur simultaneously, i.e. while outputs are being fetched they are merged.
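The merge half of this phase is a classic k-way merge: each map output is already sorted, so a min-heap over the heads of all runs produces one globally sorted stream. The sketch below mirrors that idea (it is an illustration of the technique, not Hadoop's actual `Merger` implementation).

```java
import java.util.*;

// Sketch of the sort phase's k-way merge: several already-sorted runs
// (one per map output) are merged into one sorted stream with a min-heap.
public class KWayMergeSketch {
    public static List<String> merge(List<List<String>> sortedRuns) {
        // Heap entries are {run index, position within run},
        // ordered by the value they currently point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparing((int[] e) -> sortedRuns.get(e[0]).get(e[1])));
        for (int r = 0; r < sortedRuns.size(); r++) {
            if (!sortedRuns.get(r).isEmpty()) heap.add(new int[]{r, 0});
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(sortedRuns.get(top[0]).get(top[1]));      // emit the smallest head
            if (top[1] + 1 < sortedRuns.get(top[0]).size()) {
                heap.add(new int[]{top[0], top[1] + 1});      // advance that run
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> runs = List.of(
            List.of("apple", "cat"), List.of("bee", "dog"), List.of("ant"));
        System.out.println(merge(runs)); // [ant, apple, bee, cat, dog]
    }
}
```

Since each fetched map output is a sorted run, new runs can join the merge as they arrive, which is why fetching and merging can overlap.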

SecondarySort
To achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator.
The keys will be sorted using the entire composite key, but will be grouped using the grouping comparator to decide which keys and values are sent in the same call to reduce.
The grouping comparator is specified via Job.setGroupingComparatorClass(Class); the sort order is controlled by Job.setSortComparatorClass(Class).

Precedence for choosing the grouping comparator:
1. The user-defined grouping comparator
2. The user-defined sort comparator
3. The system default sort comparator
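The effect of the two comparators can be imitated without Hadoop: sort records by the full composite key (natural key, then value), but form "reduce" groups by the natural key alone. The names below are illustrative; only the sort-by-full-key / group-by-natural-key split corresponds to Job.setSortComparatorClass / Job.setGroupingComparatorClass.

```java
import java.util.*;

// Secondary-sort sketch: sort by (key, value) but group by key only,
// so each group's value list arrives already sorted.
public class SecondarySortSketch {
    public static Map<String, List<Integer>> groupSorted(List<Map.Entry<String, Integer>> records) {
        // "Sort comparator": the entire composite key (key, then value).
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(records);
        sorted.sort(Comparator.comparing((Map.Entry<String, Integer> e) -> e.getKey())
                              .thenComparing(e -> e.getValue()));
        // "Grouping comparator": the natural key only.
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : sorted) {
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return groups; // each value list is already sorted, as in a secondary sort
    }

    public static void main(String[] args) {
        System.out.println(groupSorted(List.of(
            Map.entry("a", 3), Map.entry("b", 1), Map.entry("a", 1))));
        // {a=[1, 3], b=[1]}
    }
}
```

The key point is that no per-group sort happens at reduce time: the global sort over the composite key already put each group's values in order.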

  3. Reduce
    In this phase the reduce(Object, Iterable, Reducer.Context) method is called for each <key, (collection of values)> in the sorted inputs.
    The output of the reduce task is typically written to a RecordWriter via Reducer.Context.write(Object, Object). The output of the Reducer is not re-sorted.
  public void run(JobConf job, TaskUmbilicalProtocol umbilical) throws IOException, InterruptedException, ClassNotFoundException {
        job.setBoolean("mapreduce.job.skiprecords", this.isSkipping());
        if (this.isMapOrReduce()) { // for a map or reduce task, register the three progress phases
            this.copyPhase = this.getProgress().addPhase("copy");
            this.sortPhase = this.getProgress().addPhase("sort");
            this.reducePhase = this.getProgress().addPhase("reduce");
        }
        TaskReporter reporter = this.startReporter(umbilical);
        boolean useNewApi = job.getUseNewReducer(); // whether the new reducer API is in use
        // perform initialization
        this.initialize(job, this.getJobID(), reporter, useNewApi);

        if (this.jobCleanup) {
            this.runJobCleanupTask(umbilical, reporter);
        } else if (this.jobSetup) {
            this.runJobSetupTask(umbilical, reporter);
        } else if (this.taskCleanup) {
            this.runTaskCleanupTask(umbilical, reporter);
        } else {
            this.codec = this.initCodec(); // obtain the compression codec for the map output files
            RawKeyValueIterator rIter = null;
            ShuffleConsumerPlugin shuffleConsumerPlugin = null;
            Class combinerClass = this.conf.getCombinerClass();
            CombineOutputCollector combineCollector = null != combinerClass ? new CombineOutputCollector(this.reduceCombineOutputCounter, reporter, this.conf) : null;
            Class<? extends ShuffleConsumerPlugin> clazz = job.getClass("mapreduce.job.reduce.shuffle.consumer.plugin.class", Shuffle.class, ShuffleConsumerPlugin.class);
            shuffleConsumerPlugin = (ShuffleConsumerPlugin)ReflectionUtils.newInstance(clazz, job);
            LOG.info("Using ShuffleConsumerPlugin: " + shuffleConsumerPlugin);
            Context shuffleContext = new Context(this.getTaskID(), job, FileSystem.getLocal(job), umbilical, super.lDirAlloc, reporter, this.codec, combinerClass, combineCollector, this.spilledRecordsCounter, this.reduceCombineInputCounter, this.shuffledMapsCounter, this.reduceShuffleBytes, this.failedShuffleCounter, this.mergedMapOutputsCounter, this.taskStatus, this.copyPhase, this.sortPhase, this, this.mapOutputFile, this.localMapFiles);
            shuffleConsumerPlugin.init(shuffleContext);
            rIter = shuffleConsumerPlugin.run();

            // Everything above is the shuffle fetch. Setting the details aside for now,
            // what matters is that it returns an iterator over the merged, sorted data.

            this.mapOutputFilesOnDisk.clear();
            this.sortPhase.complete();
            this.setPhase(Phase.REDUCE);
            this.statusUpdate(umbilical);
            Class keyClass = job.getMapOutputKeyClass();
            Class valueClass = job.getMapOutputValueClass();
            RawComparator comparator = job.getOutputValueGroupingComparator();
            if (useNewApi) {
                this.runNewReducer(job, umbilical, reporter, rIter, comparator, keyClass, valueClass);
            } else {
                this.runOldReducer(job, umbilical, reporter, rIter, comparator, keyClass, valueClass);
            }

            shuffleConsumerPlugin.close();
            this.done(umbilical, reporter);
        }
    }
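Note that the shuffle implementation is itself pluggable: the `job.getClass(...)` lookup above reads the `mapreduce.job.reduce.shuffle.consumer.plugin.class` property and falls back to the built-in `Shuffle` class. A minimal configuration fragment (the value shown is the default seen in the code above, not a recommendation):

```xml
<!-- mapred-site.xml: choose the ShuffleConsumerPlugin implementation.
     Defaults to the built-in Shuffle class if unset. -->
<property>
  <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
  <value>org.apache.hadoop.mapreduce.task.reduce.Shuffle</value>
</property>
```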
