Common Spark RDD operators - reduceByKey

Latest recommended article published on 2024-06-02 21:22:03

小哇666

Category: spark | Tags: spark

Article link: https://blog.csdn.net/qq_41712271/article/details/107748257

def reduceByKey(func: (V, V) => V): RDD[(K, V)]

def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]

def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]

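For an RDD[(K, V)], this function merges all values V that share the same key K into a single value, using the reduce function func. func should be associative and commutative, because Spark applies it to partial results within each partition before combining them across partitions.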
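To make the semantics concrete, here is a minimal plain-Java sketch of what reduceByKey computes. It does not use Spark: the class ReduceByKeySketch and its reduceByKey helper are hypothetical names chosen for illustration, and the input is an ordinary in-memory list holding the same pairs as the example later in this article.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java illustration of what reduceByKey computes; this is not Spark code.
final class ReduceByKeySketch {

    // Fold all values that share a key into one value using func.
    static <K, V> Map<K, V> reduceByKey(List<Map.Entry<K, V>> pairs, BinaryOperator<V> func) {
        Map<K, V> merged = new LinkedHashMap<>(); // keeps first-seen key order
        for (Map.Entry<K, V> pair : pairs) {
            // The first value for a key is stored as-is; later ones are combined via func.
            merged.merge(pair.getKey(), pair.getValue(), func);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
                Map.entry("w1", 1),
                Map.entry("w2", 2),
                Map.entry("w3", 3),
                Map.entry("w2", 22),
                Map.entry("w1", 11));
        System.out.println(reduceByKey(pairs, Integer::sum));
    }
}
```

Running main prints {w1=12, w2=24, w3=3}. The real operator produces the same key-to-value mapping, but distributes the work across partitions instead of using a single map.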

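The numPartitions parameter sets the number of partitions of the resulting RDD.

The partitioner parameter sets the Partitioner used to distribute keys across the partitions of the resulting RDD.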
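When only numPartitions is given, Spark partitions the result with a HashPartitioner of that size, which places each key according to a non-negative modulo of its hashCode. The sketch below mirrors that rule in plain Java; HashPartitionSketch and partitionFor are hypothetical names for illustration, not Spark classes.

```java
// Hypothetical helper mirroring hash partitioning; not Spark's own class.
final class HashPartitionSketch {

    // Map a key to a partition index in [0, numPartitions).
    static int partitionFor(Object key, int numPartitions) {
        if (key == null) {
            return 0; // null keys go to partition 0
        }
        int mod = key.hashCode() % numPartitions;
        // Java's % can return a negative value, so shift it into range.
        return mod < 0 ? mod + numPartitions : mod;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"w1", "w2", "w3"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 2));
        }
    }
}
```

With two partitions, "w1" and "w3" land in partition 0 and "w2" lands in partition 1. A custom Partitioner passed through the partitioner parameter replaces this rule with its own key-to-partition mapping.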

reduceByKey operator example

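import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function2;
import scala.Tuple2;

// javaSparkContext is assumed to be an already-created JavaSparkContext.
List<Tuple2<String, Integer>> list = Arrays.asList(
        new Tuple2<String, Integer>("w1", 1),
        new Tuple2<String, Integer>("w2", 2),
        new Tuple2<String, Integer>("w3", 3),
        new Tuple2<String, Integer>("w2", 22),
        new Tuple2<String, Integer>("w1", 11)
);

JavaPairRDD<String, Integer> pairRdd = javaSparkContext.parallelizePairs(list);

// Sum the values of each key; the resulting RDD has 2 partitions.
JavaPairRDD<String, Integer> result = pairRdd.reduceByKey(new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer v1, Integer v2) throws Exception {
        return v1 + v2;
    }
}, 2);

System.out.println(result.collect());
// Result: [(w3,3), (w1,12), (w2,24)] (the order of elements may vary)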
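The grouping in the printed result (w3 and w1 together, then w2) follows from how the work is split. The plain-Java sketch below simulates the two phases under stated assumptions: values are first combined inside each input partition, then partial results are routed to an output partition by key hash and merged again. The class TwoPhaseReduceSketch, its method names, and the way the five pairs are split into two input partitions are all hypothetical; Spark decides the real split itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Plain-Java simulation of a two-phase reduce; the input split below is hypothetical.
final class TwoPhaseReduceSketch {

    // Phase 1: combine values per key inside a single input partition.
    static Map<String, Integer> combineLocally(List<Map.Entry<String, Integer>> partition,
                                               BinaryOperator<Integer> func) {
        Map<String, Integer> local = new HashMap<>();
        for (Map.Entry<String, Integer> pair : partition) {
            local.merge(pair.getKey(), pair.getValue(), func);
        }
        return local;
    }

    // Phase 2: route each partial result to an output partition by key hash and merge again.
    static List<Map<String, Integer>> shuffleAndMerge(List<Map<String, Integer>> partials,
                                                      int numPartitions,
                                                      BinaryOperator<Integer> func) {
        List<Map<String, Integer>> output = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            output.add(new TreeMap<>()); // sorted only to make the printout stable
        }
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                int target = Math.floorMod(e.getKey().hashCode(), numPartitions);
                output.get(target).merge(e.getKey(), e.getValue(), func);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        BinaryOperator<Integer> sum = Integer::sum;
        // Hypothetical split of the five input pairs across two input partitions.
        List<Map.Entry<String, Integer>> part0 =
                List.of(Map.entry("w1", 1), Map.entry("w2", 2));
        List<Map.Entry<String, Integer>> part1 =
                List.of(Map.entry("w3", 3), Map.entry("w2", 22), Map.entry("w1", 11));
        List<Map<String, Integer>> partials =
                List.of(combineLocally(part0, sum), combineLocally(part1, sum));
        System.out.println(shuffleAndMerge(partials, 2, sum));
    }
}
```

Running main prints [{w1=12, w3=3}, {w2=24}]: output partition 0 holds w1 and w3, output partition 1 holds w2. Because each key's values may be combined in a different order and grouping depending on the split, the reduce function only gives a stable answer when it is associative and commutative, as addition is here.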
