How the map and reduce methods run in the MapReduce framework

风声2012

Original article: https://blog.csdn.net/zklth/article/details/5816195

The map and reduce methods in a MapReduce program override (not overload) the map and reduce methods of the Mapper and Reducer classes.

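By default, the framework calls these methods as follows:
- map is called once for each input <key,value> pair.
- reduce is called once for each distinct key and receives all of the values grouped under that key.

The implementation is in the Mapper and Reducer classes of the org.apache.hadoop.mapreduce package.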

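Mechanism: the run method of each class drives the loop.
- Mapper.run iterates over every input <key,value> pair and calls map for each one.
- Reducer.run iterates over the distinct keys and calls reduce once per key.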
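The difference between "once per pair" and "once per key" can be made concrete with a small trace that does not depend on Hadoop. This is a minimal sketch, and it makes a few simplifying assumptions:
- RunLoopTrace and its methods are hypothetical names.
- For simplicity, the same list of three pairs, already sorted by key, is fed to both loops.
- The code only counts how often loops shaped like Mapper.run and Reducer.run would invoke map and reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical trace, not Hadoop code: counts how often map() and reduce()
// would be invoked by loops shaped like Mapper.run() and Reducer.run().
class RunLoopTrace {

    // One map() call per key/value pair, as in Mapper.run().
    static int countMapCalls(List<String[]> pairs) {
        int calls = 0;
        for (String[] pair : pairs) {
            calls++; // map(pair[0], pair[1], context) would run here
        }
        return calls;
    }

    // One reduce() call per distinct key. Assumes the pairs are already
    // sorted by key, as they are when the reduce phase starts.
    static int countReduceCalls(List<String[]> sortedPairs) {
        int calls = 0;
        String previousKey = null;
        for (String[] pair : sortedPairs) {
            if (calls == 0 || !Objects.equals(pair[0], previousKey)) {
                calls++; // reduce(key, allValuesForThatKey, context) would run here
                previousKey = pair[0];
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"apple", "1"});
        pairs.add(new String[] {"apple", "1"});
        pairs.add(new String[] {"pear", "1"});
        System.out.println("map calls: " + countMapCalls(pairs));       // 3
        System.out.println("reduce calls: " + countReduceCalls(pairs)); // 2
    }
}
```

With two "apple" pairs and one "pear" pair, map runs three times but reduce runs only twice.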

See the Hadoop 0.20.1 source code:

Mapper class:

package org.apache.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

  public class Context
    extends MapContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
    public Context(Configuration conf, TaskAttemptID taskid,
                   RecordReader<KEYIN,VALUEIN> reader,
                   RecordWriter<KEYOUT,VALUEOUT> writer,
                   OutputCommitter committer,
                   StatusReporter reporter,
                   InputSplit split) throws IOException, InterruptedException {
      super(conf, taskid, reader, writer, committer, reporter, split);
    }
  }

  /**
   * Called once at the beginning of the task.
   */
  protected void setup(Context context) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * Called once for each key/value pair in the input split. Most applications
   * should override this, but the default is the identity function.
   */
  @SuppressWarnings("unchecked")
  protected void map(KEYIN key, VALUEIN value,
                     Context context) throws IOException, InterruptedException {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }

  /**
   * Called once at the end of the task.
   */
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * Expert users can override this method for more complete control over the
   * execution of the Mapper.
   * @param context
   * @throws IOException
   */
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    // One map() call per input record
    while (context.nextKeyValue()) {
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    cleanup(context);
  }
}
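Mapper.run is a template method: it calls setup once, then map once per record, then cleanup once. The sketch below imitates that shape with no Hadoop dependency so it can run on its own. It is only an illustration:
- SketchMapper and LineLengthMapper are hypothetical names.
- The List-based stand-ins for Context and RecordWriter are also hypothetical.
- Real code would extend org.apache.hadoop.mapreduce.Mapper and emit output through context.write.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free imitation of Mapper's template method. The class
// names and the List-based "context" are illustrative, not Hadoop APIs.
class SketchMapper<KI, VI, KO, VO> {
    final List<String> calls = new ArrayList<>();             // call order, for inspection
    final List<Map.Entry<KO, VO>> output = new ArrayList<>(); // stands in for context.write()

    protected void setup() { calls.add("setup"); }

    // Default is the identity function, as in Hadoop's Mapper.map().
    @SuppressWarnings("unchecked")
    protected void map(KI key, VI value) {
        calls.add("map");
        write((KO) key, (VO) value);
    }

    protected void cleanup() { calls.add("cleanup"); }

    protected void write(KO key, VO value) {
        output.add(new AbstractMap.SimpleEntry<>(key, value));
    }

    // Same shape as Mapper.run(): setup once, map once per record, cleanup once.
    public void run(List<Map.Entry<KI, VI>> records) {
        setup();
        for (Map.Entry<KI, VI> record : records) {
            map(record.getKey(), record.getValue());
        }
        cleanup();
    }
}

// Hypothetical override: emit each line together with its length.
class LineLengthMapper extends SketchMapper<Long, String, String, Integer> {
    @Override
    protected void map(Long offset, String line) {
        calls.add("map");
        write(line, line.length());
    }
}
```

Running LineLengthMapper over two records yields the call sequence setup, map, map, cleanup. The subclass only changes what happens per record; the loop itself still comes from run.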

Reducer class:

package org.apache.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.mapred.RawKeyValueIterator;

public class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {

  public class Context
    extends ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
    public Context(Configuration conf, TaskAttemptID taskid,
                   RawKeyValueIterator input,
                   Counter inputCounter,
                   RecordWriter<KEYOUT,VALUEOUT> output,
                   OutputCommitter committer,
                   StatusReporter reporter,
                   RawComparator<KEYIN> comparator,
                   Class<KEYIN> keyClass,
                   Class<VALUEIN> valueClass
                   ) throws IOException, InterruptedException {
      super(conf, taskid, input, inputCounter, output, committer, reporter,
            comparator, keyClass, valueClass);
    }
  }

  /**
   * Called once at the start of the task.
   */
  protected void setup(Context context) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * This method is called once for each key. Most applications will define
   * their reduce class by overriding this method. The default implementation
   * is an identity function.
   */
  @SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                        ) throws IOException, InterruptedException {
    for (VALUEIN value : values) {
      context.write((KEYOUT) key, (VALUEOUT) value);
    }
  }

  /**
   * Called once at the end of the task.
   */
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * Advanced application writers can use the
   * {@link #run(org.apache.hadoop.mapreduce.Reducer.Context)} method to
   * control how the reduce task works.
   */
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    // One reduce() call per distinct key, with all values for that key
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
    }
    cleanup(context);
  }
}
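Reducer.run advances with nextKey rather than nextKeyValue, so each reduce call receives every value for one key. The sketch below imitates that grouping on a list that is already sorted by key. It is an approximation, not Hadoop's implementation:
- SketchReducer and SumReducer are hypothetical names.
- Keys are compared with equals(), whereas Hadoop decides which keys belong together using the RawComparator passed to the Context.
- Each group is buffered in a list instead of being streamed.
- setup and cleanup are omitted for brevity.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free imitation of Reducer.run(). Assumes the input is
// already sorted by key and groups consecutive equal keys into one reduce() call.
class SketchReducer<K, V, KO, VO> {
    final List<Map.Entry<KO, VO>> output = new ArrayList<>(); // stands in for context.write()
    int reduceCalls = 0;

    // Default is the identity function, as in Hadoop's Reducer.reduce().
    @SuppressWarnings("unchecked")
    protected void reduce(K key, Iterable<V> values) {
        for (V value : values) {
            write((KO) key, (VO) value);
        }
    }

    protected void write(KO key, VO value) {
        output.add(new AbstractMap.SimpleEntry<>(key, value));
    }

    // Same shape as Reducer.run(): move to the next distinct key, then hand
    // reduce() every value that shares that key.
    public void run(List<Map.Entry<K, V>> sortedPairs) {
        int i = 0;
        while (i < sortedPairs.size()) {
            K key = sortedPairs.get(i).getKey();
            List<V> values = new ArrayList<>();
            while (i < sortedPairs.size() && sortedPairs.get(i).getKey().equals(key)) {
                values.add(sortedPairs.get(i).getValue());
                i++;
            }
            reduceCalls++;
            reduce(key, values);
        }
    }
}

// Hypothetical override: sum the counts for each word.
class SumReducer extends SketchReducer<String, Integer, String, Integer> {
    @Override
    protected void reduce(String word, Iterable<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        write(word, sum);
    }
}
```

For the sorted input apple=1, apple=2, pear=4, SumReducer makes two reduce calls and produces apple=3 and pear=4.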
