reduceByKey, foldByKey, aggregateByKey, combineByKey

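The eldest sibling: combineByKey

It has an initial value, built by createCombiner from the first value of each key, and that initial value may have a different data structure than the input values. It is the most flexible of the four.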

combineByKeyWithClassTag(createCombiner, mergeValue, mergeCombiners,
      partitioner, mapSideCombine, serializer)(null)
// 3.4 Compute the average per key with combineByKey
    val list: List[(String, Int)] = List(("a", 88), ("b", 95), ("a", 91), ("b", 93), ("a", 95), ("b", 98))
    val rdd4: RDD[(String, Int)] = sc.makeRDD(list, 2)

    val value4: RDD[(String, (Int, Int))] = rdd4.combineByKey(
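      // createCombiner: the first value of a key in a partition becomes (sum, count)
      i => (i, 1),
      // mergeValue: fold another value of the same key into (sum, count) within a partition
      (res: (Int, Int), elem: Int) => (res._1 + elem, res._2 + 1),
      // mergeCombiners: add the sums and the counts coming from different partitions
      (res1: (Int, Int), res2: (Int, Int)) => (res1._1 + res2._1, res1._2 + res2._2)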
    )
    value4.collect().foreach(println)

    value4.mapValues({
      case (sum, count) => sum.toDouble / count
    }).collect().foreach(println)
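To make the three functions easier to follow, here is a minimal sketch that models them on plain Scala collections, without Spark. The names `CombineByKeyModel`, `partitions`, and `sums` are hypothetical, and the split of the six elements into two partitions (first three, last three) is an assumption made for illustration. It is only a model of the semantics, not how Spark implements shuffling.

```scala
// Plain-Scala model of combineByKey semantics (no Spark involved).
// All names here are illustrative, not part of the Spark API.
object CombineByKeyModel {
  def combineByKey[K, V, C](partitions: Seq[Seq[(K, V)]])(
      createCombiner: V => C,
      mergeValue: (C, V) => C,
      mergeCombiners: (C, C) => C): Map[K, C] = {
    // Step 1: inside each partition, the first value of a key goes through
    // createCombiner; every later value of that key goes through mergeValue.
    val perPartition: Seq[Map[K, C]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
        acc.get(k) match {
          case Some(c) => acc.updated(k, mergeValue(c, v))
          case None    => acc.updated(k, createCombiner(v))
        }
      }
    }
    // Step 2: combiners of the same key from different partitions are merged.
    perPartition.flatMap(_.toSeq).groupBy(_._1).map { case (k, kcs) =>
      k -> kcs.map(_._2).reduce(mergeCombiners)
    }
  }

  // Assumed partitioning of the sample data: first three, last three.
  val partitions: Seq[Seq[(String, Int)]] = Seq(
    Seq(("a", 88), ("b", 95), ("a", 91)),
    Seq(("b", 93), ("a", 95), ("b", 98))
  )

  // (sum, count) per key, using the same three functions as the example above.
  val sums: Map[String, (Int, Int)] = combineByKey(partitions)(
    (v: Int) => (v, 1),
    (c: (Int, Int), v: Int) => (c._1 + v, c._2 + 1),
    (c1: (Int, Int), c2: (Int, Int)) => (c1._1 + c2._1, c1._2 + c2._2)
  )

  def main(args: Array[String]): Unit = {
    sums.foreach(println)
    sums.map { case (k, (sum, count)) => k -> sum.toDouble / count }.foreach(println)
  }
}
```

In this model the merge step must add the two sums and the two counts from both combiners; mixing up `c1` and `c2` there would double-count one partition and ignore the other.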

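The second sibling: aggregateByKey

It has an initial value, and the within-partition and cross-partition logic may differ, so it is very flexible.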

combineByKeyWithClassTag[U]((v: V) => cleanedSeqOp(createZero(), v),
      cleanedSeqOp, combOp, partitioner)
// 3.3 Using aggregateByKey
    val rdd3: RDD[(String, Int)] = sc.makeRDD(List(("a", 1), ("a", 3), ("b", 5), ("b", 7), ("b", 2), ("b", 4), ("b", 6), ("a", 7)), 2)
    // 3.3 Take the maximum value per key within each partition, then sum those maxima across partitions
    rdd3.aggregateByKey(Int.MinValue)((res: Int, elem: Int) => math.max(res, elem)
      , (res: Int, elem: Int) => res + elem)
      .collect().foreach(println)

    // 3.3 The same computation written with combineByKey
    rdd3.combineByKey(
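      // In combineByKey the initial value is derived from an element of the collection,
      // so it takes the place of that element instead of being an extra starting value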
      //      i => Int.MinValue,
      //      i => i - Int.MinValue,
      i => i,
      (res: Int, elem: Int) => math.max(res, elem),
      (res: Int, elem: Int) => res + elem
    ).collect().foreach(println)
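The "maximum within each partition, then sum across partitions" logic can be traced by hand with plain Scala collections. This is a sketch under one assumption: the eight elements are split into two partitions as the first four and the last four. The name `AggregateTrace` is hypothetical.

```scala
// Hand trace of the aggregateByKey example on plain Scala collections.
// The partition layout below is an assumption made for illustration.
object AggregateTrace {
  val partitions: Seq[Seq[(String, Int)]] = Seq(
    Seq(("a", 1), ("a", 3), ("b", 5), ("b", 7)),
    Seq(("b", 2), ("b", 4), ("b", 6), ("a", 7))
  )

  // Within each partition: maximum per key, starting from the zero value Int.MinValue.
  val maxPerPartition: Seq[Map[String, Int]] = partitions.map { part =>
    part.groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).foldLeft(Int.MinValue)((res, elem) => math.max(res, elem))
    }
  }

  // Across partitions: sum the per-partition maxima of each key.
  val result: Map[String, Int] =
    maxPerPartition.flatMap(_.toSeq).groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).sum
    }

  def main(args: Array[String]): Unit = result.foreach(println)
}
```

Under the assumed layout, key a contributes maxima 3 and 7, and key b contributes 7 and 6, so the sums are 10 and 13.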

The third sibling: foldByKey

It has an initial value; the within-partition and cross-partition logic are the same.

combineByKeyWithClassTag[V]((v: V) => cleanedFunc(createZero(), v),
      cleanedFunc, cleanedFunc, partitioner)
// 3.2 Using foldByKey
    //3.1 Create the first RDD
    val list1: List[(String, Int)] = List(("a", 1), ("a", 3), ("a", 5), ("b", 7), ("b", 2), ("b", 4), ("b", 6), ("a", 7))
    val rdd2 = sc.makeRDD(list1, 2)

    //3.2 Word count: sum the values per key
    rdd2.foldByKey(0)(_ + _).collect().foreach(println)
    //    rdd2.foldByKey(10)(_ + _).collect().foreach(println)

    // 3.2 The same computation written with combineByKey
    rdd2.combineByKey(
      i => i,
      (res: Int, elem: Int) => res + elem,
      (res: Int, elem: Int) => res + elem
    ).collect().foreach(println)

    rdd2.combineByKey(
      i => i + 10,
      (res: Int, elem: Int) => res + elem,
      (res: Int, elem: Int) => res + elem
    ).collect().foreach(println)
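The difference between a zero value of 0 and a zero value of 10 is easier to see in a small model: the zero value is applied once per key in every partition where that key appears, which is also what `i => i + 10` does in the combineByKey version. The sketch below uses plain Scala collections; `FoldZeroTrace` and `foldByKeyModel` are hypothetical names, and the two-partition layout (first four elements, last four) is an assumption.

```scala
// Model of how the zero value of foldByKey enters the result.
// Names and partition layout are illustrative assumptions, not Spark API.
object FoldZeroTrace {
  val partitions: Seq[Seq[(String, Int)]] = Seq(
    Seq(("a", 1), ("a", 3), ("a", 5), ("b", 7)),
    Seq(("b", 2), ("b", 4), ("b", 6), ("a", 7))
  )

  def foldByKeyModel(zero: Int): Map[String, Int] = {
    // Within each partition: start every key from `zero`, then add its values.
    val perPartition: Seq[Map[String, Int]] = partitions.map { part =>
      part.groupBy(_._1).map { case (k, kvs) =>
        k -> kvs.map(_._2).foldLeft(zero)(_ + _)
      }
    }
    // Across partitions: add the partial sums; the zero value is not applied again.
    perPartition.flatMap(_.toSeq).groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).sum
    }
  }

  def main(args: Array[String]): Unit = {
    println(foldByKeyModel(0))
    println(foldByKeyModel(10))
  }
}
```

With both keys present in both assumed partitions, a zero value of 10 adds 20 to each key's total compared with a zero value of 0.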

The fourth sibling: reduceByKey

It has no initial value; the within-partition and cross-partition logic are the same.

combineByKeyWithClassTag[V]((v: V) => v, func, func, partitioner)
//3.1.1 Create the RDD
    val rdd1 = sc.makeRDD(List(("a", 1), ("b", 5), ("a", 5), ("b", 2)))

    //3.1.2 Sum the values that share the same key
    rdd1.reduceByKey(_ + _).collect().foreach(println)

    //3.1.3 The same computation written with combineByKey
    rdd1.combineByKey(
      i => i,
      (res: Int, elem: Int) => res + elem,
      (res: Int, elem: Int) => res + elem
    ).collect().foreach(println)

reduceByKey, foldByKey, and aggregateByKey can all be written with combineByKey.


package com.huc.Spark1.KeyAndValue

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Test05_ {
  def main(args: Array[String]): Unit = {
    //1. Create the SparkConf and set the application name
    val conf: SparkConf = new SparkConf().setAppName("SparkCore").setMaster("local[*]")

    //2. Create the SparkContext, the entry point for submitting a Spark application
    val sc: SparkContext = new SparkContext(conf)

    //3. Spark programming in Scala

    //3.1.1 Create the RDD
    val rdd1 = sc.makeRDD(List(("a", 1), ("b", 5), ("a", 5), ("b", 2)))

    //3.1.2 Sum the values that share the same key
    rdd1.reduceByKey(_ + _).collect().foreach(println)

    //3.1.3 The same computation written with combineByKey
    rdd1.combineByKey(
      i => i,
      (res: Int, elem: Int) => res + elem,
      (res: Int, elem: Int) => res + elem
    ).collect().foreach(println)


    println("+++++++++++++++++++++++++++++++")
    // 3.2 Using foldByKey
    //3.1 Create the first RDD
    val list1: List[(String, Int)] = List(("a", 1), ("a", 3), ("a", 5), ("b", 7), ("b", 2), ("b", 4), ("b", 6), ("a", 7))
    val rdd2 = sc.makeRDD(list1, 2)

    //3.2 Word count: sum the values per key
    rdd2.foldByKey(0)(_ + _).collect().foreach(println)
    //    rdd2.foldByKey(10)(_ + _).collect().foreach(println)

    // 3.2 The same computation written with combineByKey
    rdd2.combineByKey(
      i => i,
      (res: Int, elem: Int) => res + elem,
      (res: Int, elem: Int) => res + elem
    ).collect().foreach(println)

    rdd2.combineByKey(
      i => i + 10,
      (res: Int, elem: Int) => res + elem,
      (res: Int, elem: Int) => res + elem
    ).collect().foreach(println)


    println("_____________________________")
    // 3.3 Using aggregateByKey
    val rdd3: RDD[(String, Int)] = sc.makeRDD(List(("a", 1), ("a", 3), ("b", 5), ("b", 7), ("b", 2), ("b", 4), ("b", 6), ("a", 7)), 2)
    // 3.3 Take the maximum value per key within each partition, then sum those maxima across partitions
    rdd3.aggregateByKey(Int.MinValue)((res: Int, elem: Int) => math.max(res, elem)
      , (res: Int, elem: Int) => res + elem)
      .collect().foreach(println)

    // 3.3 The same computation written with combineByKey
    rdd3.combineByKey(
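      // In combineByKey the initial value is derived from an element of the collection,
      // so it takes the place of that element instead of being an extra starting value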
      //      i => Int.MinValue,
      //      i => i - Int.MinValue,
      i => i,
      (res: Int, elem: Int) => math.max(res, elem),
      (res: Int, elem: Int) => res + elem
    ).collect().foreach(println)


    println("*&************************")
    // 3.4 Compute the average per key with combineByKey
    val list: List[(String, Int)] = List(("a", 88), ("b", 95), ("a", 91), ("b", 93), ("a", 95), ("b", 98))
    val rdd4: RDD[(String, Int)] = sc.makeRDD(list, 2)

    val value4: RDD[(String, (Int, Int))] = rdd4.combineByKey(
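      // createCombiner: the first value of a key in a partition becomes (sum, count)
      i => (i, 1),
      // mergeValue: fold another value of the same key into (sum, count) within a partition
      (res: (Int, Int), elem: Int) => (res._1 + elem, res._2 + 1),
      // mergeCombiners: add the sums and the counts coming from different partitions
      (res1: (Int, Int), res2: (Int, Int)) => (res1._1 + res2._1, res1._2 + res2._2)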
    )
    value4.collect().foreach(println)

    value4.mapValues({
      case (sum, count) => sum.toDouble / count
    }).collect().foreach(println)


    //4. Close the connection
    sc.stop()
  }
}
