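Differences and relationships between reduceByKey, foldByKey, aggregateByKey and combineByKey
Comparing the core logic of each operator in the Spark source (PairRDDFunctions)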
Operator | Delegates to (source) |
---|---|
reduceByKey( ) | combineByKeyWithClassTag[V] ((v:V)=>v,func,func) |
foldByKey( )( ) | combineByKeyWithClassTag[V] ((v:V)=>cleanedFunc(createZero(),v),cleanedFunc,cleanedFunc) |
aggregateByKey( )( , ) | combineByKeyWithClassTag[U] ((v:V)=>cleanedSeqOp(createZero(),v),cleanedSeqOp,combOp) |
combineByKey( , , ) | combineByKeyWithClassTag(createCombiner,mergeValue,mergeCombiners) |
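The delegation in the table can be sketched without Spark.

Below, `combineLocal` is a hypothetical, simplified stand-in for `combineByKeyWithClassTag` that works on a local `Seq` of partitions. It builds one combiner per key inside each partition, then merges the per-partition combiners. The `...Local` wrappers are also hypothetical; each passes the same three functions as the corresponding table row.

One simplification: Spark's `createZero()` deserializes a fresh copy of the zero value each time a key's combiner is created. This sketch reuses `zeroValue` directly, which is only safe because the examples use immutable values.

```scala
// Hypothetical local model of Spark's combineByKeyWithClassTag (plain Scala, no Spark).
// `partitions` stands in for an RDD's partitions; the result holds one combiner per key.
def combineLocal[K, V, C](
    partitions: Seq[Seq[(K, V)]],
    createCombiner: V => C,
    mergeValue: (C, V) => C,
    mergeCombiners: (C, C) => C): Map[K, C] = {
  // Map side: within each partition, start a combiner from a key's first value,
  // then fold that key's later values into it.
  val perPartition: Seq[Map[K, C]] = partitions.map { part =>
    part.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      acc.get(k) match {
        case Some(c) => acc.updated(k, mergeValue(c, v))
        case None    => acc.updated(k, createCombiner(v))
      }
    }
  }
  // Reduce side: merge the combiners that different partitions built for the same key.
  perPartition.flatMap(_.toSeq).groupBy(_._1).map { case (k, kcs) =>
    k -> kcs.map(_._2).reduce(mergeCombiners)
  }
}

// reduceByKey: the first value is the combiner; one function serves both merges.
def reduceByKeyLocal[K, V](ps: Seq[Seq[(K, V)]])(func: (V, V) => V): Map[K, V] =
  combineLocal[K, V, V](ps, (v: V) => v, func, func)

// foldByKey: like reduceByKey, but each partition seeds a key with zeroValue.
def foldByKeyLocal[K, V](ps: Seq[Seq[(K, V)]], zeroValue: V)(func: (V, V) => V): Map[K, V] =
  combineLocal[K, V, V](ps, (v: V) => func(zeroValue, v), func, func)

// aggregateByKey: result type U may differ from V; seqOp and combOp are separate.
def aggregateByKeyLocal[K, V, U](ps: Seq[Seq[(K, V)]], zeroValue: U)(
    seqOp: (U, V) => U, combOp: (U, U) => U): Map[K, U] =
  combineLocal[K, V, U](ps, (v: V) => seqOp(zeroValue, v), seqOp, combOp)

// combineByKey maps straight onto the model: all three functions come from the caller.
val parts = Seq(Seq("a" -> 1, "b" -> 2, "a" -> 3), Seq("a" -> 4, "b" -> 5))
val sums     = reduceByKeyLocal(parts)(_ + _)     // a -> 8, b -> 7
val folded   = foldByKeyLocal(parts, 10)(_ + _)   // a -> 28, b -> 27
val sumCount = aggregateByKeyLocal(parts, (0, 0))(
  (acc, v) => (acc._1 + v, acc._2 + 1),
  (x, y) => (x._1 + y._1, x._2 + y._2))           // a -> (8, 3), b -> (7, 2)
val distinctVals =
  combineLocal[String, Int, Set[Int]](parts, v => Set(v), _ + _, _ ++ _)
                                                  // a -> Set(1, 3, 4), b -> Set(2, 5)
```

The `folded` result shows one consequence of this delegation. The zero value goes into `createCombiner`, so it is applied once per key in every partition that contains that key. A non-neutral zero such as 10 is therefore counted twice for a key spread over two partitions.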
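Flexibility (from least to most general):
reduceByKey < foldByKey < aggregateByKey < combineByKey
Each step relaxes one constraint:
foldByKey adds a zero value to reduceByKey.
aggregateByKey lets the result type U differ from the value type V, and splits the merge into seqOp (within a partition) and combOp (across partitions).
combineByKey replaces the fixed zero value with createCombiner, which builds each key's initial combiner from its first value.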
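All four delegate to combineByKeyWithClassTag, whose mapSideCombine parameter defaults to true. Each of them therefore combines values on the map side, pre-aggregating within each partition before the shuffle. Of the four, only combineByKey (through its overload that takes a Partitioner) lets the caller pass mapSideCombine=false.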
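Map-side combine changes how many records cross the shuffle, not the final result, as long as the merge function is associative.

The toy model below illustrates this in plain Scala. `sumByKey` is a hypothetical helper, not a Spark API. It sums values per key, optionally pre-aggregating inside each partition first, and reports how many records would be shuffled.

```scala
// Hypothetical toy (plain Scala, not Spark): sum values by key over `partitions`,
// optionally pre-aggregating inside each partition before the "shuffle".
// Returns the final sums and the number of records that crossed the shuffle.
def sumByKey[K](partitions: Seq[Seq[(K, Int)]], mapSideCombine: Boolean): (Map[K, Int], Int) = {
  val shuffled: Seq[(K, Int)] =
    if (mapSideCombine)
      // One partial sum per distinct key per partition.
      partitions.flatMap(p => p.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum })
    else
      // Every raw record is shuffled as-is.
      partitions.flatten
  // Reduce side: the final merge is identical in both cases.
  val result = shuffled.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }
  (result, shuffled.size)
}

val skewed = Seq(Seq("a" -> 1, "a" -> 1, "a" -> 1, "b" -> 1), Seq("a" -> 1, "a" -> 1))
val (withCombine, nWith)       = sumByKey(skewed, mapSideCombine = true)   // 3 records shuffled
val (withoutCombine, nWithout) = sumByKey(skewed, mapSideCombine = false)  // 6 records shuffled
// Both results are a -> 5, b -> 1.
```

The saving grows with the number of times a key repeats within a partition. When keys are nearly unique per partition, pre-aggregation adds work without shrinking the shuffle.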