An Analysis of the Partitioner in Hadoop

s20082043

Category: hadoop

Article link: https://blog.csdn.net/s20082043/article/details/43490955

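The <key, value> pairs that a Mapper finally emits must be sent to a Reducer to be merged, and pairs with the same key are sent to the same Reducer. Which Reducer handles which key is decided by the Partitioner. Partitioner is an abstract class, defined as follows: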

public abstract class Partitioner<KEY, VALUE> {

  /**
   * Get the partition number for a given key (hence record) given the total
   * number of partitions i.e. number of reduce-tasks for the job.
   *
   * <p>Typically a hash function on a all or a subset of the key.</p>
   *
   * @param key the key to be partioned.
   * @param value the entry value.
   * @param numPartitions the total number of partitions.
   * @return the partition number for the <code>key</code>.
   */
  public abstract int getPartition(KEY key, VALUE value, int numPartitions);

}

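The inputs are a <key, value> pair from the map output and the number of Reducers; the output is the integer index of the Reducer assigned to that pair. In other words, the Partitioner decides which Reducer each Mapper output pair is sent to. The default Partitioner is HashPartitioner, which takes the key's hash value modulo the number of Reducers. Because the result depends only on the key and the number of Reducers, records with the same key are always assigned to the same Reducer. With N Reducers, the indices are 0, 1, 2, ..., N-1.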

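The relevant code in JobContext.java is shown below; it falls back to HashPartitioner when no Partitioner class is configured: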

/**
 * Get the {@link Partitioner} class for the job.
 *
 * @return the {@link Partitioner} class for the job.
 */
@SuppressWarnings("unchecked")
public Class<? extends Partitioner<?,?>> getPartitionerClass()
    throws ClassNotFoundException {
  return (Class<? extends Partitioner<?,?>>)
      conf.getClass(PARTITIONER_CLASS_ATTR, HashPartitioner.class);
}

The default implementation, HashPartitioner.java, is as follows:

public class HashPartitioner<K, V> extends Partitioner<K, V> {
  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K key, V value,
                          int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
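
The `& Integer.MAX_VALUE` mask in getPartition matters because hashCode() may be negative, and in Java the remainder of a negative number is negative, which would not be a valid Reducer index. The following standalone sketch reproduces the same arithmetic without depending on Hadoop. The class name HashPartitionDemo and the helper naivePartitionFor are made up for illustration and are not part of Hadoop.

```java
// Standalone sketch (no Hadoop dependency) of the arithmetic used by HashPartitioner.
// HashPartitionDemo and naivePartitionFor are hypothetical names for illustration only.
final class HashPartitionDemo {

    // Same expression as HashPartitioner.getPartition: clear the sign bit, then take the remainder.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Naive variant without the mask; it can return a negative index.
    static int naivePartitionFor(Object key, int numReduceTasks) {
        return key.hashCode() % numReduceTasks;
    }

    public static void main(String[] args) {
        Integer negativeKey = -7; // Integer.hashCode() returns the int value itself

        System.out.println(naivePartitionFor(negativeKey, 4)); // prints -3
        System.out.println(partitionFor(negativeKey, 4));      // prints 1

        // Equal keys always map to the same partition.
        System.out.println(partitionFor("a", 3) == partitionFor("a", 3)); // prints true
    }
}
```

With the mask applied, the result always falls in the range 0 to numReduceTasks - 1, matching the Reducer numbering described earlier.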

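You can extend the Partitioner abstract class to implement your own Partitioner, for example MyPartitioner, and register it on the job with job.setPartitionerClass(MyPartitioner.class); note that the method takes the class, not an instance.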
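
As a rough illustration of a custom Partitioner, the sketch below routes keys by their first character, so all keys that share a first character land on the same Reducer. To keep it compilable without Hadoop on the classpath, it declares a local stand-in Partitioner class with the same getPartition signature as the one shown above. FirstCharPartitioner, the String/Integer key and value types, and the handling of empty keys are all assumptions made for this example.

```java
// Local stand-in that mirrors the signature of org.apache.hadoop.mapreduce.Partitioner,
// so that this sketch compiles without Hadoop. In a real job, import Hadoop's class instead.
abstract class Partitioner<KEY, VALUE> {
    public abstract int getPartition(KEY key, VALUE value, int numPartitions);
}

// Hypothetical custom partitioner: keys with the same first character go to the same Reducer.
class FirstCharPartitioner extends Partitioner<String, Integer> {
    @Override
    public int getPartition(String key, Integer value, int numPartitions) {
        if (key.isEmpty()) {
            return 0; // arbitrary choice for empty keys (an assumption of this sketch)
        }
        // char is an unsigned 16-bit value, so the remainder is already in [0, numPartitions).
        return key.charAt(0) % numPartitions;
    }
}

final class FirstCharPartitionerDemo {
    public static void main(String[] args) {
        Partitioner<String, Integer> partitioner = new FirstCharPartitioner();
        for (String key : new String[] {"apple", "avocado", "banana"}) {
            // With 3 partitions: apple -> 1, avocado -> 1, banana -> 2
            System.out.println(key + " -> " + partitioner.getPartition(key, 1, 3));
        }
    }
}
```

In an actual Hadoop job the class would extend org.apache.hadoop.mapreduce.Partitioner with Writable types such as Text and IntWritable, and would be registered with job.setPartitionerClass(FirstCharPartitioner.class).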
