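Use case: when analyzing a graph, you often need to aggregate information from the vertices connected to a given vertex. For example, for a vertex A you may want to summarize an attribute, such as age, of every vertex that has an edge pointing to A.
To improve performance, the primary aggregation operator was changed from graph.mapReduceTriplets to the new graph.aggregateMessages.
1. Generate a graph with 100 vertices and a random number of edges. The vertex attribute is a Double representing the vertex's age (here simply the vertex ID converted to Double); the edge attribute is an Int.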
scala> import org.apache.spark.graphx.{Graph, VertexRDD}
scala> import org.apache.spark.graphx.util.GraphGenerators
scala> val graph: Graph[Double, Int] = GraphGenerators.logNormalGraph(sc, numVertices = 100).mapVertices( (id, _) => id.toDouble )
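2. Build the vertex set olderFollowers: VertexRDD[(Int, Double)], where the Int is the number of followers that are older than the vertex and the Double is the sum of those followers' ages.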
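scala> val olderFollowers: VertexRDD[(Int, Double)] = graph.aggregateMessages[(Int, Double)](
  triplet => {
    // Send a message only when the follower (src) is older than the vertex it points to (dst)
    if (triplet.srcAttr > triplet.dstAttr) {
      // The message is a single tuple: (count of 1, follower's age)
      triplet.sendToDst((1, triplet.srcAttr))
    }
  },
  // Merge two messages for the same vertex: add the counts and add the ages
  (a, b) => (a._1 + b._1, a._2 + b._2)
)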
In the code above, the tuple (1, triplet.srcAttr) is the message sent to dst.
triplet => … can be viewed as the map phase.
(a, b) => … can be viewed as the reduce phase.
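The map and reduce phases can be mimicked with plain Scala collections, which may make the message flow easier to follow. This is only a sketch of the semantics, not how GraphX executes the operator; the object name AggregateMessagesSketch and the sample ages and edges are made up for illustration.

```scala
// Plain Scala sketch (no Spark required). All data below is hypothetical sample data.
object AggregateMessagesSketch {
  // Vertex id -> age
  val ages: Map[Long, Double] = Map(1L -> 30.0, 2L -> 20.0, 3L -> 40.0)
  // Each edge is (srcId, dstId), meaning src follows dst
  val edges: Seq[(Long, Long)] = Seq((1L, 2L), (3L, 2L), (2L, 3L), (3L, 1L))

  // "Map" phase: every edge whose source is older than its destination
  // emits the message (1, srcAge) addressed to the destination.
  val messages: Seq[(Long, (Int, Double))] = edges.collect {
    case (src, dst) if ages(src) > ages(dst) => (dst, (1, ages(src)))
  }

  // "Reduce" phase: merge all messages addressed to the same vertex
  // by adding the counts and adding the ages.
  val olderFollowers: Map[Long, (Int, Double)] =
    messages.groupBy(_._1).map { case (dst, msgs) =>
      dst -> msgs.map(_._2).reduce((a, b) => (a._1 + b._1, a._2 + b._2))
    }
}
```

In this sample, vertex 2 receives messages from vertices 1 and 3, vertex 1 receives one from vertex 3, and vertex 3 receives none, so it does not appear in the result at all.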
3. Compute the average age of the older followers.
scala> val avgAgeOfOlderFollowers: VertexRDD[Double] =
  olderFollowers.mapValues((id, value) => value match { case (count, totalAge) => totalAge / count })
scala> avgAgeOfOlderFollowers.collect.foreach(println(_))
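Only vertices that received at least one message appear in the aggregated result, so the division by count never sees a zero. The plain Scala sketch below illustrates the same averaging arithmetic and one possible way to represent vertices without older followers; the object name AverageAgeSketch, the sample values, and the choice of Option are assumptions for illustration.

```scala
// Plain Scala sketch (no Spark required). All data below is hypothetical sample data.
object AverageAgeSketch {
  // Stand-in for the aggregation output: vertex id -> (count, totalAge)
  val olderFollowers: Map[Long, (Int, Double)] = Map(1L -> (1, 40.0), 2L -> (2, 70.0))
  // Every vertex id in the graph, including one that received no message
  val allVertexIds: Seq[Long] = Seq(1L, 2L, 3L)

  // Same arithmetic as the mapValues call: totalAge / count
  val avgAge: Map[Long, Double] =
    olderFollowers.map { case (id, (count, totalAge)) => id -> totalAge / count }

  // Vertices with no older followers are absent, so represent them as None
  val avgForAll: Seq[(Long, Option[Double])] =
    allVertexIds.map(id => id -> avgAge.get(id))
}
```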