RDD aggregate

Definition

The definition below is taken from the RDD API documentation:

aggregate[U](zeroValue: U)(seqOp: (U, T) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: ClassTag[U]): U
Aggregate the elements of each partition, and then the results for all the partitions, using given combine functions and a neutral “zero value”. This function can return a different result type, U, than the type of this RDD, T. Thus, we need one operation for merging a T into an U and one operation for merging two U’s, as in scala.TraversableOnce. Both of these functions are allowed to modify and return their first argument instead of creating a new U to avoid memory allocation.
zeroValue
the initial value for the accumulated result of each partition for the seqOp operator, and also the initial value for the combine results from different partitions for the combOp operator - this will typically be the neutral element (e.g. Nil for list concatenation or 0 for summation)
seqOp
an operator used to accumulate results within a partition
combOp
an associative operator used to combine results from different partitions

Experiment 1: Basic usage

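The API description is fairly clear. aggregate first aggregates the elements inside each partition. It then combines the per-partition results using the combine function and zeroValue. We supply two functions:
- seqOp folds the elements of one partition into an accumulator of type U.
- combOp merges two accumulators coming from different partitions.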
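The two phases can be spelled out without a cluster. The following is a minimal, Spark-free sketch in plain Scala, not Spark's implementation, and it makes one assumption: each inner Seq stands in for one partition. `AggregateSketch` and `simulateAggregate` are hypothetical names.

Real Spark differs in two ways. It runs seqOp on the executors, giving each partition its own copy of zeroValue. It then merges the partition results on the driver as tasks finish.

The second call in `main` shows the "different result type U" point from the definition. There, T is Int but U is a (sum, count) pair.

```scala
// Spark-free simulation of RDD.aggregate semantics (illustrative assumption, not Spark code).
// Each inner Seq plays the role of one partition.
object AggregateSketch {
  def simulateAggregate[T, U](partitions: Seq[Seq[T]], zeroValue: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U = {
    // Phase 1: fold every partition independently, each starting from zeroValue
    val perPartition: Seq[U] = partitions.map(_.foldLeft(zeroValue)(seqOp))
    // Phase 2: merge the per-partition results, starting from zeroValue once more
    perPartition.foldLeft(zeroValue)(combOp)
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9, 10))
    // Max inside each partition, then sum across partitions
    println(simulateAggregate(parts, 0)((a, b) => math.max(a, b), _ + _)) // 19
    // U differs from T: accumulate (sum, count) pairs to compute an average
    val (sum, count) = simulateAggregate(parts, (0, 0))(
      (acc, x) => (acc._1 + x, acc._2 + 1),
      (a, b) => (a._1 + b._1, a._2 + b._2))
    println(sum.toDouble / count) // 5.5
  }
}
```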

Experiment code

Open spark-shell and run Experiment 1 (when copying and pasting the code below, remove the comments first):

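// Show which partition (index) each element belongs to
def myfunc[T](index: Int, iter: Iterator[T]): Iterator[(Int, T)] = {
  var res = List[(Int, T)]()
  for (x <- iter)
    res = (index, x) :: res  // prepend, so each partition's elements come out in reverse order
  res.iterator
}
val data = sc.parallelize(1 to 10, 3)  // 3 partitions
data.mapPartitionsWithIndex(myfunc).collect
// seqOp: keep the larger value within a partition; combOp: add up the per-partition results
data.aggregate(0)((a, b) => if (a > b) a else b, _ + _)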

Results

[Figure: Experiment 1 results]

Analysis

[Figure: result analysis]
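Because the analysis screenshot is not available, the following plain-Scala sketch retraces Experiment 1 step by step. `Experiment1Trace` and `slice` are hypothetical names. The slicing rule is an assumption: to the best of my knowledge, it mirrors how Spark's ParallelCollectionRDD splits a sequence of length n into k slices, with slice i covering indices [i*n/k, (i+1)*n/k).

Under that rule, 1 to 10 splits into 1..3, 4..6, and 7..10. Because myfunc prepends to a List, mapPartitionsWithIndex should list each partition's elements in reverse order. seqOp (max, starting from 0) then yields 3, 6 and 10. combOp (+, starting from 0 again) adds them to 19.

```scala
// Retrace of Experiment 1 in plain Scala (no Spark required).
object Experiment1Trace {
  // Assumed slicing rule: slice i covers indices [i*n/k, (i+1)*n/k)
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] =
    (0 until numSlices).map { i =>
      val start = (i.toLong * data.length / numSlices).toInt
      val end = ((i + 1).toLong * data.length / numSlices).toInt
      data.slice(start, end)
    }

  def main(args: Array[String]): Unit = {
    val parts = slice(1 to 10, 3)
    parts.zipWithIndex.foreach { case (p, i) => println(s"partition $i: ${p.mkString(",")}") }
    // seqOp = max, starting from zeroValue 0 in every partition
    val maxima = parts.map(_.foldLeft(0)((a, b) => if (a > b) a else b))
    println(s"per-partition results: ${maxima.mkString(",")}") // 3,6,10
    // combOp = +, starting from zeroValue 0 once more
    println(s"aggregate result: ${maxima.foldLeft(0)(_ + _)}") // 19
  }
}
```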

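Experiment 2: zeroValue

According to the API documentation, zeroValue is the initial value that seqOp starts from in every partition. It is also the initial value that combOp starts from when the per-partition results are merged.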
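One consequence of that rule is that zeroValue enters the result k + 1 times for an RDD with k partitions: once per partition in seqOp, plus once in combOp. This is why the API recommends a neutral element such as 0 for summation.

The sketch below checks the count in plain Scala, assuming the same three partitions as 1 to 10 split three ways and using + for both seqOp and combOp. `ZeroValueCount` and `sumWithZero` are hypothetical names. The result should be 55 + 4 * zeroValue.

```scala
// Shows that zeroValue is applied once per partition plus once in combOp (local simulation).
object ZeroValueCount {
  // Sum inside each partition from `zero`, then sum the partition results from `zero` again
  def sumWithZero(partitions: Seq[Seq[Int]], zero: Int): Int =
    partitions.map(_.foldLeft(zero)(_ + _)).foldLeft(zero)(_ + _)

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9, 10))
    for (z <- Seq(0, 1, 10)) {
      // With k = 3 partitions, zeroValue is added k + 1 = 4 times: 55 + 4 * z
      println(s"zeroValue=$z -> ${sumWithZero(parts, z)}")
    }
  }
}
```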

Experiment code

Open spark-shell and run Experiment 2 (when copying and pasting the code below, remove the comments first):

// seqOp: keep the larger of the accumulator and the current element, and log the call
def seqOp(arg1: Int, arg2: Int): Int = {
  val res = if (arg1 > arg2) arg1 else arg2
  println("seqOp:" + arg1 + "," + arg2 + "=>" + res)
  res
}
// combOp: add two partition results, and log the call
def combOp(arg1: Int, arg2: Int): Int = {
  println("combOp:" + arg1 + "," + arg2 + "=>" + (arg1 + arg2))
  arg1 + arg2
}
// Show which partition (index) each element belongs to
def myfunc[T](index: Int, iter: Iterator[T]): Iterator[(Int, T)] = {
  var res = List[(Int, T)]()
  for (x <- iter)
    res = (index, x) :: res
  res.iterator
}
val data = sc.parallelize(1 to 10, 3)
data.mapPartitionsWithIndex(myfunc).collect
data.aggregate(11)(seqOp, combOp)

Results

[Figure: Experiment 2 results]

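Analysis

Admittedly, the zeroValue in this experiment is an extreme choice, since it is larger than every element. Try 5 or 6 instead.
[Figure: result analysis]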
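Assuming the same three partitions as in Experiment 1 (1..3, 4..6, 7..10), the arithmetic for each suggested zeroValue can be checked locally. The sketch below reuses the max and sum logic of seqOp and combOp without the println calls. `Experiment2Trace` and `run` are hypothetical names.

With zeroValue 11, every partition's max is 11, so combOp computes 11 + 11 + 11 + 11 = 44. With 5, the partition results are 5, 6 and 10, giving 5 + 5 + 6 + 10 = 26. With 6, they are 6, 6 and 10, giving 6 + 6 + 6 + 10 = 28.

```scala
// Local retrace of Experiment 2 for several zeroValues (no Spark required).
object Experiment2Trace {
  // Same logic as the experiment's seqOp (max) and combOp (sum), minus the logging
  def seqOp(a: Int, b: Int): Int = if (a > b) a else b
  def combOp(a: Int, b: Int): Int = a + b

  // zeroValue seeds every partition's fold and the final merge
  def run(partitions: Seq[Seq[Int]], zero: Int): Int =
    partitions.map(_.foldLeft(zero)(seqOp)).foldLeft(zero)(combOp)

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9, 10))
    for (z <- Seq(11, 5, 6)) println(s"zeroValue=$z -> ${run(parts, z)}") // 44, 26, 28
  }
}
```

In real Spark, the combOp calls may occur in a different order, because partition results are merged as tasks finish. Since combOp here is addition, the final value does not depend on that order.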


References:
[1] http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html#aggregate
[2] http://www.iteblog.com/archives/1268
