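1. In MapReduce, the map output is routed to different reducers according to its key. How is it decided which key goes to which reducer?
This is the job of the partitioner. The default is HashPartitioner, which takes the hash of the key modulo the number of reduce tasks; the result determines which reducer processes that key and its list of values.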
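The hash-then-modulo rule described above can be sketched in plain Java. This is a minimal standalone illustration, not Hadoop's source: the class name `HashPartitionSketch` and the method `partitionFor` are hypothetical, and the sign-bit mask is included because `hashCode()` can be negative, which would otherwise yield a negative partition index.

```java
// Standalone sketch of hash partitioning; it has no Hadoop dependency.
// HashPartitionSketch and partitionFor are hypothetical names for illustration.
final class HashPartitionSketch {

    // Clear the sign bit so a negative hashCode() cannot produce a negative index,
    // then take the remainder by the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // assumed reducer count for the demo
        for (String key : new String[] {"apple", "banana", "apple", "cherry"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the result depends only on the key's hash and the reducer count, both occurrences of "apple" map to the same reducer index.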
public interface Partitioner<K2, V2> extends JobConfigurable {
  /**
   * Get the partition number for a given key (hence record) given the total
   * number of partitions i.e. number of reduce-tasks for the job.
   *
   * <p>Typically a hash function on all or a subset of the key.</p>
   *
   * @param key the key to be partitioned.
   * @param value the entry value.
   * @param numPartitions the total number of partitions.
   * @return the partition number for the <code>key</code>.
   */
  int getPartition(K2 key, V2 value, int numPartitions);
}
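Hadoop uses the value returned by getPartition to decide which reducer each map output record is sent to; key/value pairs that yield the same return value are delivered to the same reducer.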
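To illustrate how the return value controls routing, here is a hedged sketch of a custom partitioner that hashes only a subset of the key. Everything in it is an assumption made for the example: `SimplePartitioner` is a hypothetical stand-in with the same method shape as the interface above (so the code compiles without Hadoop on the classpath), and the `prefix:rest` key layout is invented for the demo.

```java
// Partitions on the part of the key before the first ':' (an assumed composite-key layout).
// PrefixPartitioner is a hypothetical example class, not part of Hadoop.
final class PrefixPartitioner implements SimplePartitioner<String, Integer> {

    @Override
    public int getPartition(String key, Integer value, int numPartitions) {
        int sep = key.indexOf(':');
        // Fall back to the whole key when there is no separator.
        String prefix = (sep >= 0) ? key.substring(0, sep) : key;
        // Same masking and modulo as in default hash partitioning.
        return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        PrefixPartitioner partitioner = new PrefixPartitioner();
        int numPartitions = 4; // assumed reducer count for the demo
        for (String key : new String[] {"us:alice", "us:bob", "de:carol"}) {
            System.out.println(key + " -> reducer " + partitioner.getPartition(key, 1, numPartitions));
        }
    }
}

// Hypothetical stand-in that mirrors the method shape of Hadoop's Partitioner,
// declared here only so the sketch is self-contained.
interface SimplePartitioner<K, V> {
    int getPartition(K key, V value, int numPartitions);
}
```

With this rule, "us:alice" and "us:bob" produce the same partition number and therefore reach the same reducer, even though their full keys differ.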