fold
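fold is an action operator. It aggregates locally within each partition and then merges the per-partition results globally on the driver; the per-partition step ends up calling TraversableOnce.foldLeft.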
/**
* Aggregate the elements of each partition, and then the results for all the partitions, using a
* given associative function and a neutral "zero value". The function
* op(t1, t2) is allowed to modify t1 and return it as its result value to avoid object
* allocation; however, it should not modify t2.
*
* This behaves somewhat differently from fold operations implemented for non-distributed
* collections in functional languages like Scala. This fold operation may be applied to
* partitions individually, and then fold those results into the final result, rather than
* apply the fold to each element sequentially in some defined ordering. For functions
* that are not commutative, the result may differ from that of a fold applied to a
* non-distributed collection.
*
* @param zeroValue the initial value for the accumulated result of each partition for the `op`
* operator, and also the initial value for the combine results from different
* partitions for the `op` operator - this will typically be the neutral
* element (e.g. `Nil` for list concatenation or `0` for summation)
* @param op an operator used to both accumulate results within a partition and combine results
* from different partitions
*/
def fold(zeroValue: T)(op: (T, T) => T): T = withScope {
// Clone the zero value since we will also be serializing it as part of tasks
var jobResult = Utils.clone(zeroValue, sc.env.closureSerializer.newInstance())
val cleanOp = sc.clean(op)
// TODO: function used for the local (per-partition) aggregation; iter.fold calls TraversableOnce.foldLeft internally
val foldPartition = (iter: Iterator[T]) => iter.fold(zeroValue)(cleanOp)
// TODO: function used for the global aggregation of the per-partition results
val mergeResult = (_: Int, taskResult: T) => jobResult = op(jobResult, taskResult)
// TODO: runJob is called here, which shows that fold is an action operator
sc.runJob(this, foldPartition, mergeResult)
jobResult
}
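The two-stage behaviour described above can be imitated without Spark. The following is only a minimal sketch in plain Scala: `FoldSketch` and `partitionedFold` are hypothetical names made up for illustration, and the partitions are modelled as a `Seq[Seq[T]]` rather than a real RDD. It shows that zeroValue is applied once per partition and once more in the global merge, so a non-neutral zeroValue changes the result.

```scala
object FoldSketch {
  // Hypothetical helper, not part of Spark: imitates RDD.fold over data
  // that has already been split into partitions.
  def partitionedFold[T](partitions: Seq[Seq[T]], zeroValue: T)(op: (T, T) => T): T = {
    // Local step: each partition is folded starting from zeroValue
    // (the role of foldPartition in the source above).
    val partials = partitions.map(_.foldLeft(zeroValue)(op))
    // Global step: the partial results are merged, again starting from
    // zeroValue (the role of jobResult and mergeResult in the source above).
    partials.foldLeft(zeroValue)(op)
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq(1, 2), Seq(3, 4), Seq(5))
    // Neutral zero value: plain sum of all elements.
    println(partitionedFold(parts, 0)(_ + _))  // prints 15
    // Non-neutral zero value: 10 is added once per partition (3 times)
    // and once in the global merge, so 15 + 4 * 10.
    println(partitionedFold(parts, 10)(_ + _)) // prints 55
  }
}
```

In this sketch the partial results are always merged in partition order. In Spark, mergeResult is invoked as task results arrive, so for an op that is not commutative the real result may also depend on which task finishes first.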
sum
Defined in the DoubleRDDFunctions class.
It is implemented with the fold function, using 0.0 as the zero value.
def sum(): Double = self.withScope {
self.fold(0.0)(_ + _)
}
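A possible usage sketch, assuming spark-core is on the classpath and a local master is acceptable. `SumSketch`, `sumBothWays` and the app name are hypothetical names chosen for illustration. It calls sum() on an RDD[Double] and compares it with the equivalent explicit fold(0.0)(_ + _).

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SumSketch {
  // Hypothetical helper: computes the total via sum() and via an explicit fold.
  def sumBothWays(sc: SparkContext, xs: Seq[Double]): (Double, Double) = {
    val rdd = sc.parallelize(xs, numSlices = 2)
    // sum() is available on RDD[Double] through the implicit conversion
    // to DoubleRDDFunctions.
    (rdd.sum(), rdd.fold(0.0)(_ + _))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two worker threads is enough for a demo.
    val conf = new SparkConf().setMaster("local[2]").setAppName("sum-sketch")
    val sc = new SparkContext(conf)
    try {
      val (viaSum, viaFold) = sumBothWays(sc, Seq(1.0, 2.0, 3.0))
      println(s"sum = $viaSum, fold = $viaFold")
      // An RDD without partitions runs no tasks, so the zero value comes back.
      println(sc.emptyRDD[Double].sum())
    } finally {
      sc.stop()
    }
  }
}
```

Because 0.0 is the neutral element of addition, applying it once per partition and once more in the global merge does not change the total.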