Notes on rdd.aggregateByKey()

Reposted notes on rdd.aggregateByKey()

rdd1 = rdd1.aggregateByKey(aTuple, lambda a,b: (a[0] + b,    a[1] + 1),
                                   lambda a,b: (a[0] + b[0], a[1] + b[1]))

The following describes what each a and b pair above means, so that you can visualize what is happening:

First lambda expression, for the within-partition reduction step:
   a: a TUPLE that holds (runningSum, runningCount).
   b: a SCALAR that holds the next value.

Second lambda expression, for the cross-partition reduction step:
   a: a TUPLE that holds (runningSum, runningCount).
   b: a TUPLE that holds (nextPartitionsSum, nextPartitionsCount).
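
The two steps described above can be imitated in plain Python, without Spark, to see which arguments each lambda receives. This is only a minimal sketch: `simulate_aggregate_by_key` and the sample (genre, rating) pairs are hypothetical names and data made up for illustration, and partitions are modelled as ordinary lists.

```python
def simulate_aggregate_by_key(partitions, zero, seq_op, comb_op):
    """Imitate aggregateByKey on a list of partitions of (key, value) pairs."""
    # Step 1 (within partition): fold each value into a per-key accumulator,
    # starting from the zero value.
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            acc[key] = seq_op(acc.get(key, zero), value)
        partials.append(acc)

    # Step 2 (across partitions): merge the per-key accumulators.
    merged = {}
    for acc in partials:
        for key, partial in acc.items():
            merged[key] = comb_op(merged[key], partial) if key in merged else partial
    return merged


# Hypothetical sample data: two partitions of (genre, rating) pairs.
partitions = [
    [("Fantasy", 4.0), ("Romance", 3.0), ("Fantasy", 2.0)],
    [("Fantasy", 3.0), ("Romance", 5.0)],
]

sum_count = simulate_aggregate_by_key(
    partitions,
    (0, 0),
    lambda a, b: (a[0] + b, a[1] + 1),        # a: (runningSum, runningCount), b: next value
    lambda a, b: (a[0] + b[0], a[1] + b[1]),  # a and b: (sum, count) tuples
)
averages = {genre: total / count for genre, (total, count) in sum_count.items()}

print(sum_count)  # {'Fantasy': (9.0, 3), 'Romance': (8.0, 2)}
print(averages)   # {'Fantasy': 3.0, 'Romance': 4.0}
```

Note that the first lambda mixes a tuple with a scalar, while the second always receives two tuples of the same shape.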

For example, to collect the sum and count of ratings for each genre:

sumcntRDD_combined = combined_rdd.aggregateByKey((0, 0),
                                   lambda acc, rating: (acc[0] + rating, acc[1] + 1),
                                   lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1]))

# output (truncated): [('Children', (31426.5, 9208)), ('Fantasy', (41312.5, 11834)), ('Romance', (63552.0, 18124)), ...

genre_result = sumcntRDD_combined.mapValues(lambda value: value[0] / value[1])
print("The average rating for each genre is: ")
print(genre_result.take(5))
# output:
# The average rating for each genre is: 
# [('Children', 3.412956125108601), ('Fantasy', 3.4910005070136894), ('Romance', 3.5065107040388437), ('Action', 3.447984331646809), ('Thriller', 3.4937055799183425)]

Second approach

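# The merge lambda has to keep returning a (sum, count) tuple, so the division is chained in mapValues.
genre_result = combined_rdd.aggregateByKey((0, 0),
                                   lambda acc, rating: (acc[0] + rating, acc[1] + 1),
                                   lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])
                                   ).mapValues(lambda value: value[0] / value[1])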
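
It is tempting to divide inside the cross-partition lambda, but that lambda must return the same (sum, count) shape it receives, because it can be applied several times per key, or not at all when a key has a single partial result. The plain-Python sketch below illustrates this. `merge_partials`, `dividing_merge`, and `tuple_merge` are hypothetical helpers, and merging left to right is an assumption made for illustration; the order in which Spark merges partial results is not something the caller controls.

```python
def merge_partials(partials, comb_op):
    """Merge the per-partition (sum, count) partials of one key, left to right."""
    result = partials[0]
    for partial in partials[1:]:
        result = comb_op(result, partial)
    return result


def dividing_merge(a, b):
    # Divides too early: returns a float instead of a (sum, count) tuple.
    return (a[0] + b[0]) / (a[1] + b[1])


def tuple_merge(a, b):
    # Keeps the (sum, count) shape, so it can be applied any number of times.
    return (a[0] + b[0], a[1] + b[1])


three = [(8.0, 2), (4.0, 2), (3.0, 1)]

# One partial: the merge function is never called, so a tuple comes back.
print(merge_partials(three[:1], dividing_merge))  # (8.0, 2)

# Two partials: a float comes back, so results differ in type between keys.
print(merge_partials(three[:2], dividing_merge))  # 3.0

# Three partials: the second merge receives a float and fails.
failed = False
try:
    merge_partials(three, dividing_merge)
except TypeError:
    failed = True
print(failed)  # True

# Shape-preserving merge, with the division done once at the end.
total, count = merge_partials(three, tuple_merge)
print(total / count)  # 3.0
```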
