Big Data Spark "Mushroom Cloud" Campaign, Lesson 81: Walkthrough of the Spark GraphX Comprehensive Case Homework and In-Depth Source Code Analysis
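Aggregation is the most important kind of operation in a distributed system. In GraphX, the core aggregation operator is `aggregateMessages`.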
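The `tripletFields` parameter of `aggregateMessages` specifies which fields should be included in the `EdgeContext` passed to the `sendMsg` function. If not all fields are needed (for example, `TripletFields.Src` when `sendMsg` reads only the source vertex attribute), specifying this lets GraphX populate only those fields and choose a cheaper join strategy, which can improve performance.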
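import org.apache.spark.graphx._

// Assumes graph: Graph[(String, Int), ED], where each vertex attribute is (name, age)
val oldestFollowers: VertexRDD[(String, Int)] = graph.aggregateMessages[(String, Int)](
  triplet => { // Map Function
    // Send message to destination vertex containing name and age (as one tuple)
    triplet.sendToDst((triplet.srcAttr._1, triplet.srcAttr._2))
  },
  // Compare ages and keep the older follower
  (a, b) => if (a._2 > b._2) a else b, // Reduce Function
  TripletFields.Src // sendMsg reads only the source vertex attribute
)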
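To make the two phases of `aggregateMessages` concrete, here is a minimal local sketch in plain Scala, with no Spark dependency and no distribution. It is not GraphX's implementation: `LocalEdge` and `aggregateMessagesLocal` are hypothetical names, the edge direction (source follows destination) is an assumption, and the sample users are made up for illustration. It only mirrors the semantics: the send function runs once per edge and may emit a message to the destination, messages are merged per target vertex with the reduce function, and vertices that receive no message are absent from the result.

```scala
// Local, single-machine sketch of the aggregateMessages pattern (not Spark code).
// Hypothetical names: LocalEdge, aggregateMessagesLocal.
object AggregateMessagesSketch {
  type VertexId = Long

  // One directed edge; assumed meaning: srcId follows dstId
  final case class LocalEdge(srcId: VertexId, dstId: VertexId)

  // sendMsg runs once per edge and may produce a message for the destination;
  // mergeMsg combines all messages that arrive at the same vertex.
  def aggregateMessagesLocal[VD, A](
      vertices: Map[VertexId, VD],
      edges: Seq[LocalEdge],
      sendMsg: (VD, VD) => Option[A], // (srcAttr, dstAttr) => optional message to dst
      mergeMsg: (A, A) => A
  ): Map[VertexId, A] = {
    val messages: Seq[(VertexId, A)] = edges.flatMap { e =>
      sendMsg(vertices(e.srcId), vertices(e.dstId)).map(msg => e.dstId -> msg)
    }
    // Group by target vertex and reduce; vertices with no messages do not appear
    messages.groupBy(_._1).map { case (vid, msgs) =>
      vid -> msgs.map(_._2).reduce(mergeMsg)
    }
  }

  // Made-up sample data: vertex attribute = (name, age)
  val users: Map[VertexId, (String, Int)] = Map(
    (1L, ("Alice", 28)),
    (2L, ("Bob", 27)),
    (3L, ("Charlie", 65)),
    (4L, ("David", 42))
  )
  val follows: Seq[LocalEdge] = Seq(
    LocalEdge(2L, 1L), LocalEdge(3L, 1L), LocalEdge(4L, 1L),
    LocalEdge(1L, 2L), LocalEdge(4L, 3L)
  )

  // Same logic as the GraphX example: send the follower's (name, age), keep the oldest
  val oldest: Map[VertexId, (String, Int)] =
    aggregateMessagesLocal[(String, Int), (String, Int)](
      users,
      follows,
      (src, _) => Some(src),
      (a, b) => if (a._2 > b._2) a else b
    )

  def main(args: Array[String]): Unit =
    oldest.toSeq.sortBy(_._1).foreach { case (vid, (name, age)) =>
      println(s"vertex $vid: oldest follower is $name ($age)")
    }
}
```

Vertex 4 has no followers in this sample, so, as in GraphX, it simply has no entry in `oldest` rather than a default value.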
Today's homework: study the differences between the various kinds of join in Spark.
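As one possible starting point, and assuming "join" here refers to GraphX's vertex joins `joinVertices` and `outerJoinVertices`, the sketch below models their semantics with plain Scala `Map`s (no Spark). `joinVerticesLocal` and `outerJoinVerticesLocal` are hypothetical helpers and the data is made up. The key contrast: `joinVertices` updates only matched vertices and keeps the attribute type, while `outerJoinVertices` calls the map function for every vertex with an `Option` and may change the attribute type.

```scala
// Local sketch (plain Scala Maps, not Spark) of GraphX vertex-join semantics.
// Hypothetical helper names: joinVerticesLocal, outerJoinVerticesLocal.
object VertexJoinSketch {
  type VertexId = Long

  // joinVertices-style: only vertices present in `table` are updated;
  // unmatched vertices keep their original attribute. Type VD is unchanged.
  def joinVerticesLocal[VD, U](vertices: Map[VertexId, VD], table: Map[VertexId, U])(
      mapFunc: (VertexId, VD, U) => VD): Map[VertexId, VD] =
    vertices.map { case (vid, attr) =>
      vid -> table.get(vid).map(u => mapFunc(vid, attr, u)).getOrElse(attr)
    }

  // outerJoinVertices-style: mapFunc runs for every vertex and receives an
  // Option (None when unmatched); the result type VD2 may differ from VD.
  def outerJoinVerticesLocal[VD, U, VD2](vertices: Map[VertexId, VD], other: Map[VertexId, U])(
      mapFunc: (VertexId, VD, Option[U]) => VD2): Map[VertexId, VD2] =
    vertices.map { case (vid, attr) => vid -> mapFunc(vid, attr, other.get(vid)) }

  // Made-up sample data
  val ages: Map[VertexId, Int] = Map((1L, 28), (2L, 27), (3L, 65))
  val outDegrees: Map[VertexId, Int] = Map((1L, 2), (3L, 1)) // vertex 2 has no entry

  // Vertex 2 is unmatched, so it keeps its original age
  val joined: Map[VertexId, Int] =
    joinVerticesLocal(ages, outDegrees)((_, age, deg) => age + deg)

  // Every vertex gets a new (age, degree) attribute; a missing degree becomes 0
  val outerJoined: Map[VertexId, (Int, Int)] =
    outerJoinVerticesLocal(ages, outDegrees)((_, age, degOpt) => (age, degOpt.getOrElse(0)))
}
```

The same keep-versus-`Option` distinction shows up in RDD pair joins: `join` drops unmatched keys, while `leftOuterJoin` keeps them and wraps the right-hand value in an `Option`.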