MapReduce Study 2-2: A Brief Walkthrough of the Mapper and Reducer Classes

Latest recommended article published on 2022-11-22 15:41:10

愿你被这个世界温暖相待

Category: # MapReduce Basics. Tags: mapreduce hadoop big data

Article link: https://blog.csdn.net/qq_43967413/article/details/122012875

- 1 Mapper class walkthrough
- 2 Reducer class walkthrough

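Writing a MapReduce program is essentially plugin development: you extend the Mapper and Reducer classes to implement the map phase and the reduce phase. The previous post, "MapReduce Study 2-1: Learning MapReduce Programs through the Official wordcount Example", wrote a simple MapReduce program. This post goes one step further and walks through the source of the Mapper and Reducer classes.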

1 Mapper class walkthrough

package org.apache.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.mapreduce.task.MapContextImpl;


@InterfaceAudience.Public
@InterfaceStability.Stable
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

  /**
   * The <code>Context</code> passed on to the {@link Mapper} implementations.
   */
  public abstract class Context
    implements MapContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
  }
  
  /**
   * Called once at the start of the MapTask, before any record is processed.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
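   * Called once for every key/value pair; in the wordcount example that means
   * map is called once per input line. The default implementation writes the
   * pair through unchanged, so applications should normally override it.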
   */
  @SuppressWarnings("unchecked")
  protected void map(KEYIN key, VALUEIN value, 
                     Context context) throws IOException, InterruptedException {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }

  /**
   * Called once at the end of the MapTask, after the last record is processed.
   */
  protected void cleanup(Context context
                         ) throws IOException, InterruptedException {
    // NOTHING
  }
  
  /**
   * Override this method for more complete control over the whole Mapper run.
   * As the call sequence below shows, this method decides when setup and
   * cleanup are invoked.
   */
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);  // runs once before any record is processed
    try {
      while (context.nextKeyValue()) {  // is there another key/value pair, e.g. another line?
        // if so, hand it to map
        map(context.getCurrentKey(), context.getCurrentValue(), context);  
      }
    } finally {
      cleanup(context);  // runs once before the task ends, even if map threw
    }
  }
}
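
The control flow of run() above is a template method: setup once, map once per record, cleanup in a finally block. The following is a minimal plain-Java sketch of that flow, not Hadoop code. MiniMapper, TokenizingMapper, and runWordCountMapper are hypothetical names invented for illustration, the Context object is replaced by a simple log list, and the offset arithmetic assumes a one-character line terminator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the run() template method; none of these types are Hadoop classes.
class MapperLifecycleSketch {

  /** Hypothetical mini-mapper mirroring setup / map / cleanup / run. */
  static class MiniMapper {
    final List<String> log = new ArrayList<>();

    void setup() { log.add("setup"); }

    // Default behaviour: pass the record through unchanged (identity).
    void map(long offset, String line) { log.add(offset + "=" + line); }

    void cleanup() { log.add("cleanup"); }

    // Same shape as Mapper.run: setup once, map per record, cleanup in finally.
    final void run(List<String> lines) {
      setup();
      try {
        long offset = 0;
        for (String line : lines) {
          map(offset, line);
          offset += line.length() + 1; // assumes a one-character line terminator
        }
      } finally {
        cleanup();
      }
    }
  }

  /** Word-count style override: records (word, 1) for every token on the line. */
  static class TokenizingMapper extends MiniMapper {
    @Override
    void map(long offset, String line) {
      for (String word : line.trim().split("\\s+")) {
        if (!word.isEmpty()) {
          log.add(word + "=1");
        }
      }
    }
  }

  static List<String> runWordCountMapper(List<String> lines) {
    MiniMapper mapper = new TokenizingMapper();
    mapper.run(lines);
    return mapper.log;
  }

  public static void main(String[] args) {
    // prints [setup, boy=1, apple=1, apple=1, cleanup]
    System.out.println(runWordCountMapper(Arrays.asList("boy apple", "apple")));
  }
}
```

Overriding only map, as TokenizingMapper does, is the common case; overriding run itself is only needed when the per-record loop has to change.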

2 Reducer class walkthrough



package org.apache.hadoop.mapreduce;

import java.io.IOException;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.task.annotation.Checkpointable;

import java.util.Iterator;


@Checkpointable
@InterfaceAudience.Public
@InterfaceStability.Stable
public class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {

  /**
   * The <code>Context</code> passed on to the {@link Reducer} implementations.
   */
  public abstract class Context 
    implements ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
  }

  /**
   * Called once at the start of the ReduceTask, before any reduce call.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
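   * Called once for each grouped key. "Grouped" means that reduce is not
   * invoked in the order in which the map phase wrote its output. In the
   * wordcount example the map phase emits the words of each line, but before
   * reduce is called those words are sorted by key, which for these words
   * means alphabetical order. If a line reads "boy apple", boy is emitted
   * before apple, yet reduce is called for apple first. Values are also
   * grouped automatically: one key with several values does not trigger
   * several reduce calls; instead all values for that key are exposed
   * through the single iterable values.
   * The default implementation is an identity function.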
   */
  @SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                        ) throws IOException, InterruptedException {
    for(VALUEIN value: values) {
      context.write((KEYOUT) key, (VALUEOUT) value);
    }
  }

  /**
   * Called once at the end of the ReduceTask, after the last reduce call.
   */
  protected void cleanup(Context context
                         ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * Override this method to control the entire Reducer run.
   */
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
      while (context.nextKey()) {  // is there another key (with its group of values)?
        reduce(context.getCurrentKey(), context.getValues(), context);
        // If a back up store is used, reset it
        Iterator<VALUEIN> iter = context.getValues().iterator();
        if(iter instanceof ReduceContext.ValueIterator) {
          ((ReduceContext.ValueIterator<VALUEIN>)iter).resetBackupStore();        
        }
      }
    } finally {
      cleanup(context);
    }
  }
}
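
The sort-then-group behaviour described in the reduce comment can be illustrated without Hadoop. The sketch below is an assumption-laden simulation, not the framework's implementation: a TreeMap stands in for the sort and grouping that happen before reduce, and ReduceGroupingSketch, reduce, and run are hypothetical names. It shows that "boy apple" plus a second "apple" leads to one reduce call per key, with apple handled first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for sort + group + reduce; none of these types are Hadoop classes.
class ReduceGroupingSketch {

  /** Hypothetical reduce step: sums the values of one key, as wordcount does. */
  static String reduce(String key, Iterable<Integer> values) {
    int sum = 0;
    for (int v : values) {
      sum += v;
    }
    return key + "=" + sum;
  }

  /** Sorts map output by key, groups values per key, then calls reduce once per key. */
  static List<String> run(List<String> mapOutputKeys) {
    // TreeMap keeps keys in sorted order, standing in for the framework's sort phase.
    Map<String, List<Integer>> grouped = new TreeMap<>();
    for (String word : mapOutputKeys) {
      // Each map output record is (word, 1), as in wordcount.
      grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
    }
    List<String> output = new ArrayList<>();
    // One iteration per distinct key, analogous to while (context.nextKey()).
    for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
      output.add(reduce(entry.getKey(), entry.getValue()));
    }
    return output;
  }

  public static void main(String[] args) {
    // prints [apple=2, boy=1]
    System.out.println(run(Arrays.asList("boy", "apple", "apple")));
  }
}
```

Even though boy arrives first in the input, apple is reduced first, and its two values reach reduce together in one call.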
