Spark functions explained: aggregate

Function prototype:

def aggregate[U](zeroValue: U)(seqOp: (U, T) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): U
Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into an U and one operation for merging two U's, as in scala.TraversableOnce. Both of these functions are allowed to modify and return their first argument instead of creating a new U to avoid memory allocation.
zeroValue
the initial value for the accumulated result of each partition for the seqOp operator, and also the initial value for the combine results from different partitions for the combOp operator - this will typically be the neutral element (e.g. Nil for list concatenation or 0 for summation)
seqOp
an operator used to accumulate results within a partition
combOp
an associative operator used to combine results from different partitions

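The aggregate function first folds the elements of each partition with seqOp, starting from the initial value (zeroValue), and then merges the per-partition results with the combine function (combOp), again starting from zeroValue. The type of the final result does not have to match the element type of the RDD.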
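
The two stages described above can be modelled without a cluster. The following is a minimal sketch in plain Scala, not Spark's implementation: `simulateAggregate` is a hypothetical helper introduced only for illustration, and each inner `List` stands in for one partition. It also uses a result type (a sum-and-count pair) that differs from the element type.

```scala
// Hypothetical helper that mimics the semantics of RDD.aggregate on local data.
// Each inner List plays the role of one partition; this is not Spark code.
def simulateAggregate[T, U](partitions: List[List[T]])(zeroValue: U)(
    seqOp: (U, T) => U,
    combOp: (U, U) => U): U = {
  // Stage 1: fold every partition separately, each starting from zeroValue.
  val partitionResults = partitions.map(p => p.foldLeft(zeroValue)(seqOp))
  // Stage 2: merge the partition results, again starting from zeroValue.
  partitionResults.foldLeft(zeroValue)(combOp)
}

// Element type is Int, result type is a (sum, count) pair.
val sumAndCount: (Int, Int) =
  simulateAggregate(List(List(1, 2, 3), List(4, 5, 6)))((0, 0))(
    (acc, x) => (acc._1 + x, acc._2 + 1),
    (a, b) => (a._1 + b._1, a._2 + b._2))
```

Here `sumAndCount` evaluates to (21, 6). Because (0, 0) is a neutral element for both operators, the way the data is split does not affect the result.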

Example:

scala> def seqOP(a:Int, b:Int) : Int = {
     |     val r = a*b
     |     println("seqOp: " + a + "\t" + b+"=>"+r)
     |     r
     |   }
seqOP: (a: Int, b: Int)Int

scala>   def combOp(a:Int, b:Int): Int = {
     |     val r= a+b
     |     println("combOp: " + a + "\t" + b+"=>"+r)
     |     r
     |   }
combOp: (a: Int, b: Int)Int

scala> val z = sc.parallelize(List(1, 2, 3, 4, 5, 6), 2)
z: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[9] at parallelize at <console>:27

scala> z.aggregate(3)(seqOP, combOp)
combOp: 3	18=>21
combOp: 21	360=>381
res20: Int = 381

Computation flow:

1. List(1,2,3,4,5,6) is split into two partitions: (1,2,3) and (4,5,6).

2. Apply seqOp to (1,2,3):

3 (initial value) * 1 => 3

3 (previous result) * 2 => 6

6 * 3 => 18

     Apply seqOp to (4,5,6):

3 (initial value) * 4 => 12

12 (previous result) * 5 => 60

60 * 6 => 360

3. Combine the partition results:

3 (initial value) + 18 (partition result) => 21

21 (previous result) + 360 (partition result) => 381
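
The arithmetic in this walkthrough can be checked with plain Scala collections. The sketch below assumes the split into (1,2,3) and (4,5,6) from step 1 and uses `foldLeft` as a stand-in for the work done inside each partition. The names `partition1`, `partition2`, `r1`, `r2` and `total` are illustrative only.

```scala
val zeroValue = 3
val partition1 = List(1, 2, 3)
val partition2 = List(4, 5, 6)

// Step 2: seqOp (multiplication) within each partition, seeded with zeroValue.
val r1 = partition1.foldLeft(zeroValue)(_ * _) // 3*1*2*3 = 18
val r2 = partition2.foldLeft(zeroValue)(_ * _) // 3*4*5*6 = 360

// Step 3: combOp (addition) across the partition results, seeded with zeroValue again.
val total = List(r1, r2).foldLeft(zeroValue)(_ + _) // 3 + 18 + 360 = 381
```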

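Notes:

1. The reduce function (seqOp) and the combine function (combOp) should satisfy the commutative and associative laws. Spark does not guarantee the order in which partition results are merged, so an order-sensitive combOp can give different results between runs.
2. As the definition of aggregate shows, the output type of the combine function must be the same as its input type: combOp has type (U, U) ⇒ U.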
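
A related point is how zeroValue is applied: once per partition and once more in the final merge. A non-neutral value such as the 3 passed to aggregate in the example therefore makes the result depend on the number of partitions. The sketch below models this with plain Scala lists. The helper `aggregateLocally` is hypothetical, and the partition boundaries are assumed, since Spark chooses the actual split.

```scala
// Hypothetical local model: product within each partition, sum across partitions.
def aggregateLocally(partitions: List[List[Int]], zeroValue: Int): Int =
  partitions.map(_.foldLeft(zeroValue)(_ * _)).foldLeft(zeroValue)(_ + _)

// One partition: 3 + 3*720 = 2163
val onePartition = aggregateLocally(List(List(1, 2, 3, 4, 5, 6)), 3)
// Two partitions: 3 + 18 + 360 = 381
val twoPartitions = aggregateLocally(List(List(1, 2, 3), List(4, 5, 6)), 3)
// Three partitions: 3 + 6 + 36 + 90 = 135
val threePartitions = aggregateLocally(List(List(1, 2), List(3, 4), List(5, 6)), 3)
```

The same data and the same operators give three different answers, which is why the documentation quoted at the top recommends a neutral element for zeroValue.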


Reference: http://www.iteblog.com/archives/1268



