MapReduce Programming: A Comprehensive Exercise

皓洲

Article link: https://blog.csdn.net/weixin_43911945/article/details/115794845

Lab Environment

VMware virtual machine (CentOS 7)
Hadoop

Data

We have a set of car sales records. Each record has 39 fields, including the time, location, postal code, and vehicle type of the sale.
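Since every record should have exactly 39 fields, a mapper can cheaply discard malformed lines before doing any other work. The sketch below makes several assumptions that the original does not state: fields are separated by commas with no quoting, each record carries an age, and the caller knows which columns hold the vehicle model and the age. `SaleRecordParser`, `ModelAge`, `modelIdx` and `ageIdx` are hypothetical names.

```java
import java.util.Optional;

// Hypothetical (model, age) pair extracted from one sales record.
final class ModelAge {
    final String model;
    final int age;

    ModelAge(String model, int age) {
        this.model = model;
        this.age = age;
    }
}

// Hypothetical parser. Assumes comma-separated fields without quoting;
// modelIdx and ageIdx are assumed column positions in [0, 38].
final class SaleRecordParser {
    // The data description says every record has 39 fields.
    static final int EXPECTED_FIELDS = 39;

    static Optional<ModelAge> parse(String line, int modelIdx, int ageIdx) {
        // Limit -1 keeps trailing empty fields, so the field count stays accurate.
        String[] fields = line.split(",", -1);
        if (fields.length != EXPECTED_FIELDS) {
            return Optional.empty(); // malformed line: skip it
        }
        String model = fields[modelIdx].trim();
        if (model.isEmpty()) {
            return Optional.empty(); // no vehicle model recorded
        }
        try {
            int age = Integer.parseInt(fields[ageIdx].trim());
            return age < 0 ? Optional.empty() : Optional.of(new ModelAge(model, age));
        } catch (NumberFormatException e) {
            return Optional.empty(); // non-numeric age, e.g. a header row
        }
    }
}
```

In a mapper, an empty result would simply mean "emit nothing for this line", so a few bad records do not fail the whole job.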

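Task

Using the MapReduce framework, write a program that does the following:

Compute the age-group distribution of sales for each car model, and produce subtotals both per car model and per age group (sorting is not required).

Note: each age group covers 10 years (0~10, 11~20, 21~30, …).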
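The bucketing rule can be written as a small helper that turns an age into the label used in the output. This is a sketch assuming integer, non-negative ages and labels of the form 0~10, 11~20, 21~30; `AgeGroups.ageGroup` is a hypothetical name.

```java
// Maps an age to its 10-year bucket label: 0~10, 11~20, 21~30, ...
// Assumption: ages 0 through 10 form the first bucket, as in the task note.
final class AgeGroups {
    static String ageGroup(int age) {
        if (age < 0) {
            throw new IllegalArgumentException("age must be non-negative: " + age);
        }
        if (age <= 10) {
            return "0~10";
        }
        // 11..20 -> 11, 21..30 -> 21, and so on
        int lower = (age - 1) / 10 * 10 + 1;
        return lower + "~" + (lower + 9);
    }
}
```

For example, `ageGroup(20)` returns `"11~20"` and `ageGroup(21)` returns `"21~30"`, so bucket boundaries fall on the last year of each decade.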

The expected output format looks like this:

Model1,AgeGroup1,300
…
Model1,AgeGroup2,300
…
Model2,AgeGroup1,300
…
Model1,Subtotal,1800
…
Subtotal,AgeGroup1,1800
…

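Approach

The car model and the age group together form the key, so we need a custom data type that combines the two into a single composite key.

The output also ends with a subtotal per car model and a subtotal per age group. These are written at the very end, which means using a Reducer method other than reduce(). Let's look at the Reducer source code: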
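A Hadoop key type has to be serializable and sortable, which is what `org.apache.hadoop.io.WritableComparable` requires (`write`, `readFields`, `compareTo`). The sketch below keeps those method shapes but implements only `java.lang.Comparable`, so it compiles without Hadoop on the classpath; `ModelAgeKey` is a hypothetical name, and in a real job you would declare `implements WritableComparable<ModelAgeKey>` instead.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Objects;

// Composite key (car model, age group). In a real job, replace Comparable with
// org.apache.hadoop.io.WritableComparable<ModelAgeKey>; it is omitted here only
// so the sketch compiles without Hadoop.
class ModelAgeKey implements Comparable<ModelAgeKey> {
    private String model = "";
    private String ageGroup = "";

    // Hadoop creates keys by reflection, so a no-arg constructor is required.
    public ModelAgeKey() {}

    public ModelAgeKey(String model, String ageGroup) {
        this.model = model;
        this.ageGroup = ageGroup;
    }

    public String getModel() { return model; }
    public String getAgeGroup() { return ageGroup; }

    // Serialization: the field order in write() must match readFields().
    public void write(DataOutput out) throws IOException {
        out.writeUTF(model);
        out.writeUTF(ageGroup);
    }

    public void readFields(DataInput in) throws IOException {
        model = in.readUTF();
        ageGroup = in.readUTF();
    }

    // Sort by model first, then by age-group label (string order).
    @Override
    public int compareTo(ModelAgeKey o) {
        int c = model.compareTo(o.model);
        return c != 0 ? c : ageGroup.compareTo(o.ageGroup);
    }

    // The default HashPartitioner uses hashCode(), so equal keys must hash equally.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ModelAgeKey)) return false;
        ModelAgeKey k = (ModelAgeKey) o;
        return model.equals(k.model) && ageGroup.equals(k.ageGroup);
    }

    @Override
    public int hashCode() { return Objects.hash(model, ageGroup); }

    // Text output uses toString(), giving "model,ageGroup".
    @Override
    public String toString() { return model + "," + ageGroup; }
}
```

`TextOutputFormat` writes `key.toString()`, a tab, then the value, so with this key the counts would appear as `Model,AgeGroup<TAB>count` unless the separator is changed (for example through the `mapreduce.output.textoutputformat.separator` property).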

```java
// Excerpt of org.apache.hadoop.mapreduce.Reducer from the Apache Hadoop source
// (license header removed).
package org.apache.hadoop.mapreduce;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.mapreduce.task.annotation.Checkpointable;

@Checkpointable
@InterfaceAudience.Public
@InterfaceStability.Stable
public class Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {

  /**
   * The <code>Context</code> passed on to the {@link Reducer} implementations.
   */
  public abstract class Context 
    implements ReduceContext<KEYIN,VALUEIN,KEYOUT,VALUEOUT> {
  }

  /**
   * Called once at the start of the task.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * This method is called once for each key. Most applications will define
   * their reduce class by overriding this method. The default implementation
   * is an identity function.
   */
  @SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                        ) throws IOException, InterruptedException {
    for(VALUEIN value: values) {
      context.write((KEYOUT) key, (VALUEOUT) value);
    }
  }

  /**
   * Called once at the end of the task.
   */
  protected void cleanup(Context context
                         ) throws IOException, InterruptedException {
    // NOTHING
  }

  /**
   * Advanced application writers can use the 
   * {@link #run(org.apache.hadoop.mapreduce.Reducer.Context)} method to
   * control how the reduce task works.
   */
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
      while (context.nextKey()) {
        reduce(context.getCurrentKey(), context.getValues(), context);
        // If a back up store is used, reset it
        Iterator<VALUEIN> iter = context.getValues().iterator();
        if(iter instanceof ReduceContext.ValueIterator) {
          ((ReduceContext.ValueIterator<VALUEIN>)iter).resetBackupStore();
        }
      }
    } finally {
      // cleanup() runs exactly once, after the last key has been reduced
      cleanup(context);
    }
  }
}
```
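According to the Javadoc above, setup() runs once at the start of the task, reduce() runs once per key, and cleanup() runs once at the end. That ordering suggests one way to produce the subtotals: accumulate them in fields during reduce() and write them out in cleanup(). The original text is cut off at this point, so the following is a sketch of that idea, not necessarily the author's implementation. To run without Hadoop it models the lifecycle in plain Java: `SubtotalReducerSketch` is a hypothetical name, and each "write" appends a line to a list where a real Reducer would call `context.write(...)`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Dependency-free model of a Reducer that emits subtotals in cleanup().
// In a real job, reduce() and cleanup() would override the
// org.apache.hadoop.mapreduce.Reducer methods and call context.write(...).
final class SubtotalReducerSketch {
    // Running totals; they live for the whole reduce task, across reduce() calls.
    private final Map<String, Long> byModel = new LinkedHashMap<>();
    private final Map<String, Long> byAgeGroup = new LinkedHashMap<>();
    private final List<String> out = new ArrayList<>();

    // Called once per (model, ageGroup) key with all counts for that key.
    void reduce(String model, String ageGroup, Iterable<Long> counts) {
        long sum = 0;
        for (long c : counts) {
            sum += c; // 1s from the mapper, or partial sums from a combiner
        }
        out.add(model + "," + ageGroup + "," + sum);
        byModel.merge(model, sum, Long::sum);
        byAgeGroup.merge(ageGroup, sum, Long::sum);
    }

    // Called once after the last key: emit the subtotals gathered above.
    void cleanup() {
        byModel.forEach((m, n) -> out.add(m + ",Subtotal," + n));
        byAgeGroup.forEach((g, n) -> out.add("Subtotal," + g + "," + n));
    }

    List<String> output() { return out; }
}
```

Each reduce task calls cleanup() once for its own share of the keys. The subtotals are therefore global only when the job runs a single reduce task (for example `job.setNumReduceTasks(1)`); with several reducers, each would write its own partial subtotals.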
