In the post http://bit1129.iteye.com/blog/2186325, I analyzed how hash based shuffle write works when the consolidateFiles option is enabled. This post, by contrast, looks at taking the Iterable
1. The HashShuffleWriter.write method is shown below
Before a partition's data is written to disk, a map-side combine is applied if one is configured
  /** Write a bunch of records to this task's output */
  override def write(records: Iterator[_ <: Product2[K, V]]): Unit = {
    // Run a map-side combine over the Iterable collection for the input partition
    val iter = if (dep.aggregator.isDefined) {
      if (dep.mapSideCombine) { // Combine on the map side only if both dep.aggregator and dep.mapSideCombine are set
        dep.aggregator.get.combineValuesByKey(records, context)
      } else {
        records
      }
    } else {
      require(!dep.mapSideCombine, "Map-side combine without Aggregator specified!")
      records
    }
    // Route each record to the writer of the bucket chosen by the partitioner
    for (elem <- iter) {
      val bucketId = dep.partitioner.getPartition(elem._1)
      shuffle.writers(bucketId).write(elem)
    }
  }
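The control flow of write can be tried outside Spark with a small sketch. Everything below is a hypothetical stand-in, not Spark's own code: SimpleAggregator, getPartition and the in-memory buckets are names invented for illustration, the aggregator is reduced to two functions, and the bucket writers are replaced by plain lists instead of disk files.

```scala
import scala.collection.mutable

object HashShuffleWriteSketch {
  // Hypothetical, simplified aggregator: only the two functions a map-side combine needs.
  final case class SimpleAggregator[K, V, C](
      createCombiner: V => C,
      mergeValue: (C, V) => C) {

    // Fold all values that share a key into a single combiner per key.
    def combineValuesByKey(records: Iterator[(K, V)]): Iterator[(K, C)] = {
      val combiners = mutable.LinkedHashMap.empty[K, C]
      records.foreach { case (k, v) =>
        combiners.get(k) match {
          case Some(c) => combiners(k) = mergeValue(c, v)
          case None    => combiners(k) = createCombiner(v)
        }
      }
      combiners.iterator
    }
  }

  // Assumed hash partitioning: non-negative key hash modulo the bucket count.
  def getPartition(key: Any, numBuckets: Int): Int = {
    val mod = key.hashCode % numBuckets
    if (mod < 0) mod + numBuckets else mod
  }

  // Same shape as write(): optionally combine, then append each record to its bucket.
  def write[K, V, C](
      records: Iterator[(K, V)],
      aggregator: Option[SimpleAggregator[K, V, C]],
      mapSideCombine: Boolean,
      numBuckets: Int): Vector[List[(K, Any)]] = {
    val iter: Iterator[(K, Any)] = aggregator match {
      case Some(agg) if mapSideCombine => agg.combineValuesByKey(records)
      case Some(_)                     => records
      case None =>
        require(!mapSideCombine, "Map-side combine without Aggregator specified!")
        records
    }
    // In-memory lists stand in for the per-bucket file writers.
    val buckets = Vector.fill(numBuckets)(mutable.ListBuffer.empty[(K, Any)])
    for (elem <- iter) {
      buckets(getPartition(elem._1, numBuckets)) += elem
    }
    buckets.map(_.toList)
  }
}
```

With a summing aggregator, two input records for the same key leave the map side as a single combined record; with mapSideCombine set to false, both records are passed to the bucket unchanged.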
2. dep.aggregator.get.combineValuesByKey(records, context) is called to perform the map-side combine
Here aggregator is an object of type Aggregator; constructing it requires the following parameters:
case class Aggrega