The role of the partitioner in Hadoop MapReduce

chengzhewang

Category: Hadoop. Tags: hadoop, mapreduce

Article link: https://blog.csdn.net/chengzhewang/article/details/37653379

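1. In MapReduce, the records emitted by the map phase are sent to different reducers depending on their key. How does the framework decide which key goes to which reducer?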

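The answer is the partitioner. If none is configured, the default is HashPartitioner: it takes the hash of the key and computes it modulo the number of reduce tasks, and the result determines which reducer will process that key together with its list of values.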
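The hash-then-modulo rule can be sketched in plain Java without any Hadoop dependency. The class name `HashPartitionSketch` and the method `partitionFor` are made up for illustration; the expression itself follows what HashPartitioner does, including masking off the sign bit so that a negative `hashCode()` never yields a negative partition number.

```java
public class HashPartitionSketch {

    // Hypothetical helper: maps a key to a partition in [0, numReduceTasks).
    // The bitmask clears the sign bit, so negative hash codes stay in range.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4; // assumed number of reduce tasks
        for (String key : new String[] {"apple", "banana", "apple"}) {
            // The same key always prints the same reducer index.
            System.out.println(key + " -> reducer " + partitionFor(key, reducers));
        }
    }
}
```

Because the result depends only on the key and the reducer count, every occurrence of a key lands on the same reducer, no matter which mapper emitted it.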

package org.apache.hadoop.mapred;

public interface Partitioner<K2, V2> extends JobConfigurable {
  /** 
   * Get the partition number for a given key (hence record) given the total 
   * number of partitions, i.e. the number of reduce tasks for the job.
   *   
   * <p>Typically a hash function on all or a subset of the key.</p>
   *
   * @param key the key to be partitioned.
   * @param value the entry value.
   * @param numPartitions the total number of partitions.
   * @return the partition number for the <code>key</code>.
   */
  int getPartition(K2 key, V2 value, int numPartitions);
}

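Hadoop uses the value returned by getPartition to decide which reducer each mapper output record is sent to. Key/value pairs for which it returns the same value are sent to the same reducer.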
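To make the routing concrete, here is a small simulation in plain Java. It is only a sketch of the idea, not how Hadoop implements the shuffle: the class `PartitionRoutingSketch`, its `route` method, and the first-letter partitioning rule are all hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionRoutingSketch {

    // Hypothetical custom rule with the same shape as getPartition:
    // partition on the first character of the key (empty keys go to 0).
    static int getPartition(String key, String value, int numPartitions) {
        return key.isEmpty() ? 0 : key.charAt(0) % numPartitions;
    }

    // Groups key/value pairs by partition number, imitating how records
    // with the same return value end up at the same reducer.
    static Map<Integer, List<String>> route(String[][] records, int numPartitions) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String[] kv : records) {
            int partition = getPartition(kv[0], kv[1], numPartitions);
            byReducer.computeIfAbsent(partition, p -> new ArrayList<>())
                     .add(kv[0] + "=" + kv[1]);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        String[][] mapOutput = {
            {"apple", "1"}, {"banana", "2"}, {"avocado", "3"}
        };
        // With 2 partitions, keys starting with 'a' share one reducer.
        route(mapOutput, 2).forEach((reducer, pairs) ->
            System.out.println("reducer " + reducer + ": " + pairs));
    }
}
```

In a real job, the rule would live in a class implementing the Partitioner interface shown above, and the framework, not user code, would do the grouping.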