Hadoop basics: reading the Hadoop Reducer class

差点儿90后

Hadoop's Reducer class has three main methods: setup, reduce, and cleanup. The code is as follows:

  /**
   * Called once at the start of the task.
   */
  protected void setup(Context context
                       ) throws IOException, InterruptedException {
    // NOTHING
  }


  /**
   * This method is called once for each key. Most applications will define
   * their reduce class by overriding this method. The default implementation
   * is an identity function.
   */
  @SuppressWarnings("unchecked")
  protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context
                        ) throws IOException, InterruptedException {
    for(VALUEIN value: values) {
      context.write((KEYOUT) key, (VALUEOUT) value);
    }
  }

  /**
   * Called once at the end of the task.
   */
  protected void cleanup(Context context
                         ) throws IOException, InterruptedException {
    // NOTHING
  }

When a job runs a reducer, the framework directly calls the Reducer's run method, whose code is as follows:

  /*
   * control how the reduce task works.
   */
  @SuppressWarnings("unchecked")
  public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
      // If a back up store is used, reset it
      ((ReduceContext.ValueIterator)
          (context.getValues().iterator())).resetBackupStore();
    }
    cleanup(context);
  }
}

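From the code above we can see that run calls setup once before the first key, reduce once for each key, and cleanup once after the last key. By default, setup and cleanup do nothing. So when overriding reduce alone does not meet the application's requirements, you can try adding logic to setup and cleanup.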
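
To make the call order concrete, here is a minimal, self-contained sketch in plain Java. It does not use Hadoop: `MiniReducer`, `SumWithTotalReducer`, and `ReducerLifecycleDemo` are hypothetical names invented for illustration, and the output list stands in for Hadoop's Context. It only mirrors the shape of run (setup once, reduce per key, cleanup once) and shows one possible use of the two hooks: resetting a running total in setup and emitting it in cleanup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Reducer (NOT the real class): it keeps
// only the three hooks and the run() loop, and collects output in a list.
abstract class MiniReducer<K, V> {
    protected final List<String> output = new ArrayList<>();

    // Called once before the first key; does nothing by default.
    protected void setup() { }

    // Called once for each key with all values grouped under that key.
    protected abstract void reduce(K key, Iterable<V> values);

    // Called once after the last key; does nothing by default.
    protected void cleanup() { }

    // Same call order as the run method quoted above.
    public final List<String> run(Map<K, List<V>> grouped) {
        setup();
        for (Map.Entry<K, List<V>> entry : grouped.entrySet()) {
            reduce(entry.getKey(), entry.getValue());
        }
        cleanup();
        return output;
    }
}

// Sums the values of each key, and uses setup/cleanup to emit a grand total,
// which reduce alone cannot do because it only ever sees one key at a time.
class SumWithTotalReducer extends MiniReducer<String, Integer> {
    private long grandTotal;

    @Override
    protected void setup() {
        grandTotal = 0; // initialise per-task state
    }

    @Override
    protected void reduce(String key, Iterable<Integer> values) {
        long sum = 0;
        for (int value : values) {
            sum += value;
        }
        grandTotal += sum;
        output.add(key + "\t" + sum);
    }

    @Override
    protected void cleanup() {
        output.add("TOTAL\t" + grandTotal); // emitted once, after all keys
    }
}

class ReducerLifecycleDemo {
    static List<String> runDemo() {
        // LinkedHashMap keeps insertion order so the key order is predictable.
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        grouped.put("a", Arrays.asList(1, 2, 3));
        grouped.put("b", Arrays.asList(10));
        return new SumWithTotalReducer().run(grouped);
    }

    public static void main(String[] args) {
        for (String line : runDemo()) {
            System.out.println(line);
        }
    }
}
```

In a real job the same idea would presumably be written by extending org.apache.hadoop.mapreduce.Reducer and overriding setup, reduce, and cleanup with the signatures quoted earlier, writing results through context.write instead of a list.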