Understanding the Spark API in Depth: aggregate

This post walks through, step by step, how the aggregate function of the Spark API computes its result.
 
val z = sc.parallelize(List("a","b","c","d","e","f"), 3)
println("result:"+z.aggregate("X")((x,y)=> "1"+x+y ,(x,y)=>"2"+ x+y))
 
Step 1: the data is sliced into partitions
   Result:
   a,b  -- Partition 1
   c,d  -- Partition 2
   e,f  -- Partition 3
Step 2: each partition is processed in parallel with the first function (seqOp)
      Partition 1 in detail:
      Step 2.1: apply "1" + x + y to the first element of Partition 1 (x = zero value, y = first element of Partition 1)
        result: "1" + x + y = "1" + "X" + "a" = 1Xa
      Step 2.2: apply "1" + x + y to the second element of Partition 1 (x = result of the previous step, y = second element of Partition 1)
        result: "1" + x + y = "1" + "1Xa" + "b" = 11Xab
      The same function is applied in turn to every remaining element of the partition. If the partition had a third element Z, its final result would be 111XabZ.

      In the same way:
      Partition 2 = 11Xcd
      Partition 3 = 11Xef
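
The per-partition step can be reproduced without a cluster. The sketch below is only an illustration: it assumes plain Scala lists standing in for the three partitions (the names SeqOpSketch, partitions and perPartition are made up for this example) and folds each one from the zero value "X" with the same function as above.

```scala
// Plain Scala collections stand in for RDD partitions; no Spark is needed.
object SeqOpSketch {
  // Same function as in the example: "1", then the accumulator, then the element.
  val seqOp: (String, String) => String = (x, y) => "1" + x + y

  // Hypothetical hand-written partitions mirroring the slicing in step 1.
  val partitions: List[List[String]] =
    List(List("a", "b"), List("c", "d"), List("e", "f"))

  // Each partition is folded left to right, starting from the zero value "X".
  val perPartition: List[String] = partitions.map(_.foldLeft("X")(seqOp))

  def main(args: Array[String]): Unit =
    perPartition.foreach(println) // prints 11Xab, 11Xcd, 11Xef
}
```

Every element adds one "1" in front, which is why a partition with three elements would end up with three leading 1s.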
     
Step 3: the per-partition results are merged with the second function (combOp)
      Step 3.1: apply "2" + x + y to the result of Partition 1 (x = zero value, y = result of Partition 1)
        result 1 = "2" + x + y = "2" + "X" + "11Xab" = 2X11Xab

      Step 3.2: apply "2" + x + y to the result of Partition 2 (x = result of the previous step, y = result of Partition 2)
        result 2 = "2" + x + y = "2" + "2X11Xab" + "11Xcd" = 22X11Xab11Xcd

      Step 3.3: apply "2" + x + y to the result of Partition 3 (x = result of the previous step, y = result of Partition 3)
        result 3 = "2" + x + y = "2" + "22X11Xab11Xcd" + "11Xef" = 222X11Xab11Xcd11Xef

      Grouping the characters of the final result gives XXXX222111111abcdef. The order in which the partition results are merged is not fixed, so the order of the letters may vary between runs, but the count of each character is always the same.
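
The merge step, and the claim that only the order can change, can be checked with a small local sketch. It assumes the three partition results are already known and tries every order in which they could arrive; CombOpSketch, merge and allOrders are names invented for this illustration, not part of Spark.

```scala
// Local simulation of the merge step; no Spark is needed.
object CombOpSketch {
  // Same function as in the example: "2", then the accumulator, then the next result.
  val combOp: (String, String) => String = (x, y) => "2" + x + y

  // Per-partition results taken from step 2.
  val partitionResults: List[String] = List("11Xab", "11Xcd", "11Xef")

  // Merge the results in the given order, starting again from the zero value "X".
  def merge(results: List[String]): String = results.foldLeft("X")(combOp)

  // Result when the partitions happen to arrive in order 1, 2, 3.
  val inOrder: String = merge(partitionResults)

  // Results for every possible arrival order.
  val allOrders: Set[String] = partitionResults.permutations.map(merge).toSet

  // Characters of each result sorted by code point (digits, then X, then letters).
  val sortedChars: Set[String] = allOrders.map(_.sorted)

  def main(args: Array[String]): Unit = {
    println(inOrder)        // 222X11Xab11Xcd11Xef
    println(allOrders.size) // 6
    println(sortedChars)    // Set(111111222XXXXabcdef)
  }
}
```

The six orderings give six different strings, yet all of them contain four X, three 2, six 1 and the letters a to f. The fourth X comes from the zero value being used once more in the merge step.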


Aggregate

 

def aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): U
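
To make the roles of T (element type) and U (result type) in this signature concrete, here is a minimal local sketch. aggregateLocal is a hypothetical helper written for this post, not a Spark API: it takes the partitions as explicit sequences, folds each one with seqOp, then folds the partial results with combOp, using the zero value in both places.

```scala
object AggregateShapeSketch {
  // Hypothetical local stand-in for RDD.aggregate.
  // T is the element type, U is the type of the accumulated result.
  def aggregateLocal[T, U](partitions: Seq[Seq[T]], zeroValue: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U =
    partitions
      .map(_.foldLeft(zeroValue)(seqOp)) // one partial result per partition
      .foldLeft(zeroValue)(combOp)       // merge the partial results

  // Here T = Int and U = (Int, Int), holding (sum, count).
  val sumAndCount: (Int, Int) =
    aggregateLocal(Seq(Seq(1, 2, 3), Seq(4, 5, 6)), (0, 0))(
      (acc, n) => (acc._1 + n, acc._2 + 1),
      (a, b) => (a._1 + b._1, a._2 + b._2))

  def main(args: Array[String]): Unit =
    println(sumAndCount) // (21,6)
}
```

The point of the two functions is that seqOp mixes a U with a T, while combOp only ever sees values of type U.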

 

val z = sc.parallelize(List(1,2,3,4,5,6), 2)
z.aggregate(0)(math.max(_, _), _ + _)
res40: Int = 9

 

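With two partitions, (1,2,3) and (4,5,6), seqOp yields the per-partition maxima 3 and 6, and combOp adds them to the zero value: 0 + 3 + 6 = 9.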
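
The same arithmetic can be traced locally. The sketch below assumes the list is split into the two partitions (1,2,3) and (4,5,6); MaxThenSumSketch and run are names made up for this illustration. It also shows what happens with a non-zero zero value, since that value is used once per partition and once more in the final merge.

```scala
// Local simulation of aggregate(zero)(math.max(_, _), _ + _); no Spark is needed.
object MaxThenSumSketch {
  // Assumed split of List(1,2,3,4,5,6) into 2 partitions.
  val partitions: List[List[Int]] = List(List(1, 2, 3), List(4, 5, 6))

  def run(zero: Int): Int = {
    // seqOp: maximum within each partition, starting from the zero value.
    val maxima = partitions.map(_.foldLeft(zero)((acc, n) => math.max(acc, n)))
    // combOp: sum of the per-partition maxima, starting again from the zero value.
    maxima.foldLeft(zero)(_ + _)
  }

  def main(args: Array[String]): Unit = {
    println(run(0)) // 9  = 0 + 3 + 6
    println(run(5)) // 16 = 5 + 5 + 6
  }
}
```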

 

 

val z = sc.parallelize(List("a","b","c","d","e","f"), 3)

println("result:"+z.aggregate("X")((x,y)=> "1"+x+y ,(x,y)=>"2"+ x+y))

 

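Step 1: data partitioning

   a,b -- partition 1

   c,d -- partition 2

   e,f -- partition 3

Step 2: process each partition

    Process the first element of the partition with "1" + x + y (x is the zero value, y is the first element of the partition)

    Result: "1" + "X" + "a" = 1Xa

    Process the second element of the partition with "1" + x + y (x is the result of the previous step, y is the second element of the partition)

    "1" + "1Xa" + "b" = 11Xab

    The remaining elements are processed the same way, one after another.

The results for the other two partitions are

  11Xcd

  11Xef

Step 3: apply "2" + x + y to the results above

"2" + ("2" + ("2" + "X" + "11Xab") + "11Xcd") + "11Xef" = 222X11Xab11Xcd11Xef, whose characters are XXXX222111111abcdef

The actual order of the letters may differ from the expected one, but the number of each character in the result is the same.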

 

BUG: every zero value has length 1


     