
Hadoop MapReduce Mapper and Reducer Source Code

Tags: Hadoop MapReduce
The key point is understanding what the run(), setup(), and cleanup() methods do.

Mapper


import java.io.IOException;

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

  /**
   * The <code>Context</code> passed on to the {@link Mapper} implementations.
   */
  public abstract class Context
    implements MapContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
  }
  
  /**
   * Called once at the beginning of the task.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * Called once for each key/value pair in the input split. Most applications
   * should override this, but the default is the identity function.
   */
  @SuppressWarnings("unchecked")
  protected void map(KEYIN key, VALUEIN value, 
                     Context context) throws IOException, InterruptedException {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }

  /**
   * Called once at the end of the task.
   */
  protected void cleanup(Context context
                         ) throws IOException, InterruptedException {
    // NOTHING
  }
  
  /**
   * Expert users can override this method for more complete control over the
   * execution of the Mapper.
   * @param context
   * @throws IOException
   */
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
      while (context.nextKeyValue()) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
      }
    } finally {
      cleanup(context);
    }
  }
}
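The run() method above drives the whole lifecycle: setup() once, then map() for every key/value pair in the input split, then cleanup() in a finally block. As an illustrative sketch of how these hooks are typically overridden (the class name TokenizerMapper and the WordCount-style logic are examples, not part of the source excerpt above), a mapper that emits (word, 1) for each token might look like:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative subclass: emits (word, 1) for every token in each input line.
public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void setup(Context context) {
    // Called once before any map() call -- typically used to read
    // job configuration or open per-task resources.
  }

  @Override
  protected void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, ONE);
    }
  }

  @Override
  protected void cleanup(Context context) {
    // Called once after the input split is exhausted -- typically used
    // to flush buffers or close resources opened in setup().
  }
}
```

Because the default map() is the identity function, overriding only map() is enough for most jobs; setup() and cleanup() are optional hooks. This sketch assumes Hadoop's MapReduce client libraries are on the classpath.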

Reducer

import java.io.IOException;
import java.util.Iterator;

public class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {

  /**
   * The <code>Context</code> passed on to the {@link Reducer} implementations.
   */
  public abstract class Context 
    implements ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
  }

  /**
   * Called once at the start of the task.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * This method is called once for each key. Most applications will define
   * their reduce class by overriding this method. The default implementation
   * is an identity function.
   */
  @SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                        ) throws IOException, InterruptedException {
    for(VALUEIN value: values) {
      context.write((KEYOUT) key, (VALUEOUT) value);
    }
  }

  /**
   * Called once at the end of the task.
   */
  protected void cleanup(Context context
                         ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * Advanced application writers can use the 
   * {@link #run(org.apache.hadoop.mapreduce.Reducer.Context)} method to
   * control how the reduce task works.
   */
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
      while (context.nextKey()) {
        reduce(context.getCurrentKey(), context.getValues(), context);
        // If a back up store is used, reset it
        Iterator<VALUEIN> iter = context.getValues().iterator();
        if(iter instanceof ReduceContext.ValueIterator) {
          ((ReduceContext.ValueIterator<VALUEIN>)iter).resetBackupStore();        
        }
      }
    } finally {
      cleanup(context);
    }
  }
}
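The Reducer's run() mirrors the Mapper's, but iterates per key rather than per record: reduce() is called once for each distinct key with an Iterable over all of that key's values. As a hedged sketch (the class name IntSumReducer and the summing logic are illustrative, not taken from the excerpt above), a reducer pairing with the WordCount-style mapper could be:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative subclass: sums the per-word counts emitted by the mapper.
public class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  private final IntWritable result = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    // values holds every count the framework grouped under this key.
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}
```

Note the resetBackupStore() call in run(): the values Iterable is backed by a streaming iterator (with an optional backup store for mark/reset), so it can generally be traversed only once per key. As with the mapper sketch, this assumes the Hadoop MapReduce libraries are available.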

