1. fold()
Function prototype: fold(self, zeroValue, op)
Example: sum the elements of the sequence [1,2,3,4,5]
>>> nums = sc.parallelize([1,2,3,4,5])
>>> sumCnt = nums.fold(0, lambda x, y: x + y)
>>> print(sumCnt)
15
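Role of zeroValue: 1. it is the initial value; 2. it seeds the accumulator that holds the intermediate result. Because zeroValue is used once per partition and once more when the partition results are merged, it should be the neutral element of op (0 for addition).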
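Step-by-step breakdown of the accumulation (assuming a single partition; zeroValue below denotes the running accumulator):
1. [1,2,3,4,5], zeroValue = 0
2. currentVal = 1, zeroValue = 0
3. currentVal = 2, zeroValue = 1
4. currentVal = 3, zeroValue = 3
5. currentVal = 4, zeroValue = 6
6. currentVal = 5, zeroValue = 10
7. sumCnt = 5 + 10 = 15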
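The breakdown above covers a single partition. The following is a minimal sketch, in plain Python without Spark, of how fold behaves once the data is split across partitions. The helper simulate_fold and the two-way split in parts are hypothetical names introduced only for illustration; they are not part of the PySpark API.

```python
from functools import reduce

def simulate_fold(partitions, zero_value, op):
    """Mimic the shape of RDD.fold on plain Python lists (illustration only)."""
    # Fold each partition on its own, starting from zero_value.
    partials = [reduce(op, part, zero_value) for part in partitions]
    # Merge the per-partition results, starting from zero_value once more.
    return reduce(op, partials, zero_value)

add = lambda x, y: x + y
parts = [[1, 2], [3, 4, 5]]  # assumed split of [1,2,3,4,5] into two partitions

total = simulate_fold(parts, 0, add)     # neutral zero value
shifted = simulate_fold(parts, 10, add)  # 10 is added three times: 2 partitions + 1 merge
print(total, shifted)
```

With a zero value of 10 the result is 15 + 3 * 10 = 45 rather than 25, which is why the zero value should be neutral for op.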
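2. aggregate()
Function prototype: aggregate(self, zeroValue, seqOp, combOp)
seqOp: the function applied within each partition; it folds each element of the partition into that partition's accumulator
combOp: the function that merges the per-partition results once seqOp has finished on every partition, producing the final result
Example: compute the mean of the sequence [1,2,3,4,5]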
>>> nums = sc.parallelize([1,2,3,4,5])
>>> sumCnt = nums.aggregate(
...     (0, 0),  # zeroValue: (sum, count)
...     lambda partSumAndNum, elem: (partSumAndNum[0] + elem, partSumAndNum[1] + 1),  # seqOp
...     lambda part1Ret, part2Ret: (part1Ret[0] + part2Ret[0], part1Ret[1] + part2Ret[1]))  # combOp
>>> print(sumCnt[0] / float(sumCnt[1]))
3.0
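partSumAndNum: the running (sum, count) pair for one partition. For example, if part1 holds the elements [1,2,3,4,5], then after seqOp has processed the whole partition, part1's partSumAndNum = (15, 5). The second argument of seqOp is the current element of the partition, not the zero value.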
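To make the division of labor between seqOp and combOp concrete, here is a minimal sketch in plain Python without Spark. The helper simulate_aggregate and the two-way split in parts are hypothetical and exist only for illustration; real partitioning depends on the cluster configuration.

```python
from functools import reduce

def simulate_aggregate(partitions, zero_value, seq_op, comb_op):
    """Mimic the shape of RDD.aggregate on plain Python lists (illustration only)."""
    # seq_op folds the elements of each partition into an accumulator.
    partials = [reduce(seq_op, part, zero_value) for part in partitions]
    # comb_op merges the per-partition accumulators into the final result.
    return reduce(comb_op, partials, zero_value)

seq_op = lambda acc, elem: (acc[0] + elem, acc[1] + 1)  # acc is (sum, count)
comb_op = lambda a, b: (a[0] + b[0], a[1] + b[1])

parts = [[1, 2], [3, 4, 5]]  # assumed split of [1,2,3,4,5] into two partitions

# Per-partition (sum, count) pairs produced by seq_op alone.
partials = [reduce(seq_op, part, (0, 0)) for part in parts]
sum_cnt = simulate_aggregate(parts, (0, 0), seq_op, comb_op)
mean = sum_cnt[0] / float(sum_cnt[1])
print(partials, sum_cnt, mean)
```

Here the partitions yield (3, 2) and (12, 3), which comb_op merges into (15, 5), giving a mean of 3.0.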