A Beginner's MapReduce Source Code Analysis
Reduce
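ReduceTask.run();
The run method of class ReduceTask: it is called by YarnChild.
In run: RawKeyValueIterator rIter = null; This class is an iterator. The values argument of the reduce method in the Reducer class, reduce(key, Iterable values, context), is backed by it, so the values are also delivered one key/value pair at a time.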
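To make the "one key/value pair at a time" idea concrete, here is a minimal sketch that does not use Hadoop at all. The class ValuesIteratorSketch and the method reduceAll are hypothetical names chosen for illustration. It assumes the pairs arrive already sorted by key, and it collects each group's values into a list for brevity, whereas Hadoop hands the values to reduce lazily.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Hadoop code.
public class ValuesIteratorSketch {

    /**
     * Walks a key-sorted stream of pairs one pair at a time and emits
     * one "key=[values]" string per run of equal keys, which mimics
     * one reduce(key, values, context) call per group.
     */
    public static List<String> reduceAll(Iterator<Map.Entry<String, Integer>> sorted) {
        List<String> out = new ArrayList<>();
        Map.Entry<String, Integer> pending = sorted.hasNext() ? sorted.next() : null;
        while (pending != null) {
            String key = pending.getKey();
            List<Integer> values = new ArrayList<>();
            // Keep pulling single pairs while the key does not change.
            while (pending != null && pending.getKey().equals(key)) {
                values.add(pending.getValue());
                pending = sorted.hasNext() ? sorted.next() : null;
            }
            out.add(key + "=" + values);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("a", 1));
        pairs.add(new SimpleEntry<>("a", 2));
        pairs.add(new SimpleEntry<>("b", 3));
        System.out.println(reduceAll(pairs.iterator()));
    }
}
```

The inner loop is the part that corresponds to iterating over values inside a single reduce call: it stops as soon as the underlying iterator yields a different key.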
In run: ShuffleConsumerPlugin shuffleConsumerPlugin = null;
Class<? extends ShuffleConsumerPlugin> clazz =
job.getClass(MRConfig.SHUFFLE_CONSUMER_PLUGIN, Shuffle.class, ShuffleConsumerPlugin.class);
shuffleConsumerPlugin = ReflectionUtils.newInstance(clazz, job); This plugin is responsible for the shuffle. Shuffle.class is the default, and a custom shuffle implementation can be configured instead.
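The pattern used here (look up a class in the configuration, fall back to a default, instantiate it by reflection) can be sketched with plain Java. Everything in the sketch is hypothetical: the interface ShufflePlugin, the two implementations, and the configuration key "shuffle.plugin.class" are stand-ins, not Hadoop names, and a Map plays the role of the job configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of "configurable class with a default".
public class PluginLoadingSketch {

    public interface ShufflePlugin {
        String name();
    }

    public static class DefaultShuffle implements ShufflePlugin {
        public String name() { return "default"; }
    }

    public static class CustomShuffle implements ShufflePlugin {
        public String name() { return "custom"; }
    }

    /** Returns the configured plugin, or DefaultShuffle when nothing is configured. */
    public static ShufflePlugin load(Map<String, String> conf) {
        String className = conf.get("shuffle.plugin.class"); // assumed key
        try {
            Class<? extends ShufflePlugin> clazz = DefaultShuffle.class;
            if (className != null) {
                // asSubclass fails fast if the class is not a ShufflePlugin.
                clazz = Class.forName(className).asSubclass(ShufflePlugin.class);
            }
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create plugin " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(load(conf).name());
        conf.put("shuffle.plugin.class", CustomShuffle.class.getName());
        System.out.println(load(conf).name());
    }
}
```

The third argument of job.getClass in the original code plays the same role as asSubclass here: it constrains the configured class to the expected interface.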
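The run method in Shuffle.class
Start the map-completion events fetcher thread: eventFetcher.start(); this thread fetches map-completion events, so the reducer learns which map tasks have finished and whose output can be fetched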
Start the map-output fetcher threads
Wait for shuffle to complete successfully
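The three steps above can be sketched as a small producer/consumer program. This is only an assumed, simplified model, not the Hadoop implementation: the class ShufflePhasesSketch is hypothetical, the "event fetcher" simply enqueues map ids instead of asking for completion events, and the "fetchers" only count ids instead of copying map output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the three shuffle steps.
public class ShufflePhasesSketch {

    /** Returns the number of map outputs "copied" once the shuffle is complete. */
    public static int run(int numMaps, int numFetchers) {
        BlockingQueue<Integer> finishedMaps = new LinkedBlockingQueue<>();
        CountDownLatch pending = new CountDownLatch(numMaps);
        AtomicInteger copied = new AtomicInteger();

        // Step 1: the event fetcher announces finished map tasks.
        Thread eventFetcher = new Thread(() -> {
            for (int mapId = 0; mapId < numMaps; mapId++) {
                finishedMaps.add(mapId);
            }
        });
        eventFetcher.start();

        // Step 2: fetcher threads pick up finished maps and "copy" their output.
        List<Thread> fetchers = new ArrayList<>();
        for (int i = 0; i < numFetchers; i++) {
            Thread fetcher = new Thread(() -> {
                try {
                    while (true) {
                        finishedMaps.take();
                        copied.incrementAndGet();
                        pending.countDown();
                    }
                } catch (InterruptedException e) {
                    // Interrupted by the main thread once the shuffle is done.
                }
            });
            fetcher.setDaemon(true);
            fetcher.start();
            fetchers.add(fetcher);
        }

        // Step 3: wait until every map output has been handled.
        try {
            pending.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            for (Thread fetcher : fetchers) {
                fetcher.interrupt();
            }
        }
        return copied.get();
    }

    public static void main(String[] args) {
        System.out.println(run(5, 2));
    }
}
```

The point of the split is that learning which maps are done and copying their output proceed concurrently, so copying can begin before all map tasks have finished.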
Class keyClass = job.getMapOutputKeyClass();
Class valueClass = job.getMapOutputValueClass();
RawComparator comparator = job.getOutputValueGroupingComparator();
if (useNewApi) {
runNewReducer(job, umbilical, reporter, rIter, comparator,
keyClass, valueClass);
}
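RawComparator comparator = job.getOutputValueGroupingComparator(); This is the grouping comparator, which can be customized. It defines how keys are grouped and is set with setGroupingComparatorClass(Class<? extends RawComparator>). For example, by default identical keys form one group. The result of this comparator decides whether the next key counts as the same key, that is, whether it is handled by the same call to the reduce method.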
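As a rough illustration of what a grouping comparator changes, the sketch below counts how many reduce calls a sorted key sequence would produce under a given comparator. It uses java.util.Comparator rather than Hadoop's RawComparator, and the class GroupingSketch, the method countReduceCalls, and the "year-month" key format are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not Hadoop code.
public class GroupingSketch {

    /**
     * Counts groups in an already sorted key list: a new group (one reduce
     * call) starts whenever the grouping comparator says that the current
     * key differs from the previous one.
     */
    public static <K> int countReduceCalls(List<K> sortedKeys, Comparator<? super K> grouping) {
        int calls = 0;
        K previous = null;
        boolean first = true;
        for (K key : sortedKeys) {
            if (first || grouping.compare(previous, key) != 0) {
                calls++;
            }
            previous = key;
            first = false;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("2020-01", "2020-02", "2021-01");
        // Default-like behaviour: only identical keys share a group.
        System.out.println(countReduceCalls(keys, Comparator.<String>naturalOrder()));
        // Custom grouping: keys with the same year share a group.
        System.out.println(countReduceCalls(keys, Comparator.comparing((String k) -> k.substring(0, 4))));
    }
}
```

With natural ordering every distinct key gets its own reduce call. With the year-only comparator the two 2020 keys fall into one group, which is the effect a custom grouping comparator is meant to have.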
The method getOutputValueGroupingComparator() is implemented as follows: RawComparator getOutputValueGroupingComparator() {
Class<? extends RawComparator>