aggregateByKey
aggregateByKey is used much like combineByKey. combineByKey takes three function parameters:
createCombiner: V => C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C
aggregateByKey replaces createCombiner: V => C with an initial value of type C, so its three parameters are in effect:
zeroValue: C, mergeValue: (C, V) => C, mergeCombiners: (C, C) => C
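As an illustrative sketch of this correspondence (the SparkContext `sc` and the sample data are assumptions, not from the original), summing values per key can be written either way:

```scala
// Hypothetical SparkContext `sc`; the data is illustrative.
val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))

// combineByKey: the first value seen in a partition seeds the accumulator.
val sums1 = pairs.combineByKey(
  (v: Int) => v,                 // createCombiner: V => C
  (c: Int, v: Int) => c + v,     // mergeValue: (C, V) => C
  (c1: Int, c2: Int) => c1 + c2  // mergeCombiners: (C, C) => C
)

// aggregateByKey: the accumulator is seeded with zeroValue instead.
val sums2 = pairs.aggregateByKey(0)(_ + _, _ + _)
// Both produce ("a", 3), ("b", 3).
```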
Note:
zeroValue should normally be a neutral element such as 0, "" or Nil.
Otherwise the final result depends on the number of partitions:
mergeValue runs per partition, and every partition starts from its own copy of zeroValue;
if zeroValue is not neutral, it is folded in once per partition, so the result of the final mergeCombiners step varies with the partitioning.
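A sketch of this pitfall (the data, partition counts, and `sc` are assumptions for illustration):

```scala
// Two partitions: the non-neutral zeroValue 10 is folded in once per partition.
val twoParts = sc.parallelize(Seq(("k", 1), ("k", 1)), numSlices = 2)
twoParts.aggregateByKey(10)(_ + _, _ + _).collect()
// Per-partition sums are 10+1 = 11 and 10+1 = 11, merged to ("k", 22).

// One partition: zeroValue is folded in only once.
val onePart = sc.parallelize(Seq(("k", 1), ("k", 1)), numSlices = 1)
onePart.aggregateByKey(10)(_ + _, _ + _).collect()
// 10+1+1 = 12, giving ("k", 12) — a different answer for the same data.
```

With a neutral zeroValue (0 here), both partitionings would agree on ("k", 2).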
Source code
/**
 * Also implemented on top of combineByKeyWithClassTag
 */
def aggregateByKey[U: ClassTag](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V) => U,
    combOp: (U, U) => U): RDD[(K, U)] = self.withScope {
  ...
  combineByKeyWithClassTag[U]((v: V) => cleanedSeqOp(createZero(), v),
    cleanedSeqOp, combOp, partitioner)
}
def aggregateByKey[U](zeroValue: U)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
def aggregateByKey[U](zeroValue: U, numPartitions: Int)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
def aggregateByKey[U](zeroValue: U, partitioner: Partitioner)(seqOp: (U, V) => U, combOp: (U, U) => U): RDD[(K, U)]
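As a sketch of the numPartitions overload (the data and names are assumptions; Int.MinValue serves as the neutral element for max):

```scala
// Hypothetical RDD of (student, score) pairs.
val scores = sc.parallelize(Seq(("amy", 80), ("amy", 95), ("bob", 70)))

// Max score per key, shuffling the result into 4 partitions.
val maxPerKey = scores.aggregateByKey(Int.MinValue, numPartitions = 4)(
  (m, v) => math.max(m, v),    // seqOp: per-partition running max
  (m1, m2) => math.max(m1, m2) // combOp: merge the partition maxima
)
// maxPerKey.collect() contains ("amy", 95) and ("bob", 70).
```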
The examples mirror combineByKey.
See: Spark Operators [08]: combineByKey in Detail
Scala hands-on example
/***/
def avgScore(): Unit = {