- foldByKey(zeroValue: V, numPartitions: Int)(func: (V, V) => V): RDD[(K, V)]
- foldByKey(zeroValue: V)(func: (V, V) => V): RDD[(K, V)]
- foldByKey(zeroValue: V, partitioner: Partitioner)(func: (V, V) => V): RDD[(K, V)]
foldByKey operates on an RDD[(K, V)] and folds (merges) the values V that share the same key K. The zeroValue parameter is first folded into each key's values to initialize them, and the fold function is then applied across the initialized values. Because zeroValue may be folded in once per partition, it should be the identity element of the fold function (0 for addition, 1 for multiplication).
scala> val rdd1 = sc.makeRDD(Array(
| ("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1)
| ))
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[12] at makeRDD at <console>:24
scala> /**
| * Sum the values for each key in rdd1. Note zeroValue = 0: each key's values
| * are first initialized with zeroValue, then the fold function (+) is applied.
| * For example, with ("A", 1) and ("A", 2): zeroValue is first folded into each
| * value, giving ("A", 1+0) and ("A", 2+0), i.e. ("A", 1) and ("A", 2); the fold
| * function is then applied across the initialized values, yielding ("A", 1+2),
| * i.e. ("A", 3).
| */
| rdd1.foldByKey(0)(_+_).collect()
res14: Array[(String, Int)] = Array((B,3), (A,3), (C,1))
// When the fold function is multiplication, zeroValue must be set to 1 (the multiplicative identity)
rdd1.foldByKey(1)(_*_).collect
res16: Array[(String, Int)] = Array((B,2), (A,2), (C,1))
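To make the zeroValue semantics concrete without a Spark cluster, the single-partition behavior of foldByKey can be sketched with plain Scala collections. `simFoldByKey` below is a hypothetical helper written for illustration, not part of the Spark API; within one partition, foldByKey behaves like a per-key foldLeft seeded with zeroValue:

```scala
// A minimal sketch of foldByKey semantics for a single partition,
// using plain Scala collections. `simFoldByKey` is a hypothetical
// name chosen for this example.
def simFoldByKey[K, V](pairs: Seq[(K, V)], zeroValue: V)(func: (V, V) => V): Map[K, V] =
  pairs.groupBy(_._1).map { case (k, kvs) =>
    // Each key's values are folded starting from zeroValue,
    // mirroring what foldByKey does within one partition.
    k -> kvs.map(_._2).foldLeft(zeroValue)(func)
  }

val data = Seq(("A", 1), ("A", 2), ("B", 1), ("B", 2), ("C", 1))
println(simFoldByKey(data, 0)(_ + _)) // addition with zeroValue = 0
println(simFoldByKey(data, 1)(_ * _)) // multiplication with zeroValue = 1
```

This also shows why zeroValue must be the identity of func: in real Spark, the seed is applied once per partition per key, so a non-identity seed (e.g. foldByKey(10)(_ + _)) would contribute 10 for every partition a key appears in, making the result depend on partitioning.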