def reduceByKey(func: (V, V) => V): RDD[(K, V)]
def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]
def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]
该函数用于将RDD[K,V]中每个K对应的V值根据映射函数来运算。
参数numPartitions用于指定分区数;
参数partitioner用于指定分区函数;
reduceByKey 算子示例
List<Tuple2<String,Integer>> list = Arrays.asList(
new Tuple2<String,Integer>("w1",1),
new Tuple2<String,Integer>("w2",2),
new Tuple2<String,Integer>("w3",3),
new Tuple2<String,Integer>("w2",22),
new Tuple2<String,Integer>("w1",11)
);
JavaPairRDD<String,Integer> pairRdd = javaSparkContext.parallelizePairs(list);
JavaPairRDD<String,Integer> result = pairRdd.reduceByKey(new Function2<Integer, Integer, Integer>() {
@Override
public Integer call(Integer integer, Integer integer2) throws Exception {
return integer+integer2;
}
},2);
System.out.println(result.collect());
//返回的结果 [(w3,3), (w1,12), (w2,24)]