Spark GraphX graph operations: pregel (an enhanced aggregateMessages)

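Contents

1. Pregel API:

2. Code implementation:

Using pregel to find the minimum cost from the source vertex to every vertex

Using pregel to find the maximum depth from the source vertex to every vertex


1. Pregel API: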

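A graph is an inherently recursive data structure, because a vertex's properties may depend on those of its neighbors, which in turn depend on the properties of their own neighbors. Many important graph algorithms therefore iteratively recompute each vertex's properties until they reach a stable state.

The Pregel operator in GraphX is a bulk-synchronous parallel messaging abstraction constrained to the topology of the graph. It executes in a series of supersteps in which vertices receive the sum of their inbound messages from the previous superstep, compute a new value for the vertex property, and then send messages to neighboring vertices in the next superstep. Messages are computed in parallel as a function of the edge triplet, and the message computation can use the attributes of both the source and the destination vertex. Vertices that receive no message are skipped within a superstep. Iteration stops when no messages remain, and the final graph is returned.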

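Definition of pregel:

def pregel[A]
    (initialMsg: A, // the message every vertex receives in the first iteration
    maxIter: Int = Int.MaxValue, // maximum number of iterations
    activeDir: EdgeDirection = EdgeDirection.Out
)(
    // Vertex program: runs on a vertex and computes its new attribute from the
    // vertex ID, the current attribute and the inbound message. In the first
    // iteration the inbound message is initialMsg and every vertex runs it once;
    // afterwards only vertices that received a message in the previous iteration run it.
    vprog: (VertexId, VD, A) => VD,
    // Applied to the out edges of vertices that received a message; produces the messages to send.
    sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
    // Combines two messages bound for the same vertex.
    mergeMsg: (A, A) => A
)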

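Rough outline of the implementation:

// Step 0: run vprog once on every vertex with initialMsg, so every vertex attribute is updated once.
var g = mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg)).cache()
// Compute the first round of messages.
var messages = g.mapReduceTriplets(sendMsg, mergeMsg)
var activeMessages = messages.count()
// Loop until no messages remain or maxIter is reached.
var i = 0
while (activeMessages > 0 && i < maxIter) {
    // Deliver the messages: only vertices that received one run vprog.
    g = g.joinVertices(messages)(vprog).cache()
    val oldMessages = messages
    // Send new messages, skipping edges where neither side received a message.
    messages = g.mapReduceTriplets(
        sendMsg,
        mergeMsg,
        Some((oldMessages, activeDir))
    ).cache()
    activeMessages = messages.count()
    i += 1
}
g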
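The outline above depends on a running Spark context. As a rough way to experiment with the superstep loop on its own, the following is a minimal sketch on plain Scala collections. `LocalPregel`, `Edge`, `Triplet` and `demo` are illustrative names, not GraphX API. The sketch assumes every edge endpoint appears in the vertex map, and it treats an edge as active when either endpoint received a message in the previous round, which is a simplification of `activeDir`.

```scala
object LocalPregel {
  // A directed edge, and the "triplet" view that also carries both endpoint attributes.
  final case class Edge[ED](src: Long, dst: Long, attr: ED)
  final case class Triplet[VD, ED](srcId: Long, srcAttr: VD, dstId: Long, dstAttr: VD, attr: ED)

  def run[VD, ED, A](
      vertices: Map[Long, VD],
      edges: Seq[Edge[ED]],
      initialMsg: A,
      maxIter: Int = Int.MaxValue
  )(
      vprog: (Long, VD, A) => VD,
      sendMsg: Triplet[VD, ED] => Iterator[(Long, A)],
      mergeMsg: (A, A) => A
  ): Map[Long, VD] = {
    // One round of sendMsg + mergeMsg over the edges that touch an active vertex.
    def send(g: Map[Long, VD], active: Set[Long]): Map[Long, A] =
      edges.iterator
        .filter(e => active(e.src) || active(e.dst))
        .flatMap(e => sendMsg(Triplet(e.src, g(e.src), e.dst, g(e.dst), e.attr)))
        .toSeq
        .groupBy(_._1)
        .map { case (id, msgs) => id -> msgs.map(_._2).reduce(mergeMsg) }

    // Superstep 0: every vertex runs vprog once with initialMsg.
    var g = vertices.map { case (id, attr) => id -> vprog(id, attr, initialMsg) }
    var messages = send(g, g.keySet)
    var i = 0
    while (messages.nonEmpty && i < maxIter) {
      // Only vertices that received a message run vprog.
      g = g.map { case (id, attr) =>
        id -> messages.get(id).map(m => vprog(id, attr, m)).getOrElse(attr)
      }
      messages = send(g, messages.keySet)
      i += 1
    }
    g
  }

  // Hypothetical example: push the largest upstream value along the chain 1 -> 2 -> 3.
  def demo(): Map[Long, Int] =
    run(Map(1L -> 3, 2L -> 1, 3L -> 2), Seq(Edge(1L, 2L, ()), Edge(2L, 3L, ())), Int.MinValue)(
      (_, attr, msg) => math.max(attr, msg),
      t => if (t.srcAttr > t.dstAttr) Iterator((t.dstId, t.srcAttr)) else Iterator.empty,
      (a, b) => math.max(a, b)
    )

  def main(args: Array[String]): Unit =
    demo().toSeq.sortBy(_._1).foreach(println)
}
```

In `demo`, `Int.MinValue` plays the role of a neutral initial message: `math.max(attr, Int.MinValue)` leaves every attribute unchanged in superstep 0, so only real messages alter the result.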

An example of the pregel algorithm: associate the graph with some initial scores, then have each vertex spread its score outward according to its out-degree while keeping a share for itself:

// Attach each vertex's out-degree to its attribute
val graphWithDegree = graph.outerJoinVertices(graph.outDegrees) {
    case (vid, name, deg) => (name, deg match {
        case Some(d) => d + 0.0
        case None => 1.25
    })
}

// Join the graph with the initial scores (scoreRDD is not shown in this post; vertices without a score get 0.0)
val graphWithScoreAndDegree = graphWithDegree.outerJoinVertices(scoreRDD) {
    case (vid, (name, deg), score) => (name, deg, score.getOrElse(0.0))
}

graphWithScoreAndDegree.vertices.foreach(x => println("++++++++++++id:" + x._1 + "; deg: " + x._2._2 + "; score:" + x._2._3))

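First step of the algorithm: 0.0 (the initial value passed in as initialMsg) is added to each vertex's value, which leaves it unchanged, and the result is then divided by the vertex's out-degree. This step is important and must not be overlooked: when designing vprog, consider whether this initial pass will affect the result.

Source of this explanation: https://www.jianshu.com/p/d9170a0723e4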
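The pregel call for the score example is not shown in this post, so the following is only a sketch of the first step as described above. The `vprog` here is an assumption reconstructed from that description (add the message to the score, then divide by the out-degree), and `InitialMessageEffect` and `superstepZero` are made-up names. It runs on plain Scala collections rather than on a GraphX graph.

```scala
object InitialMessageEffect {
  // Vertex attribute laid out like graphWithScoreAndDegree: (name, outDegree, score).
  type Attr = (String, Double, Double)

  // Assumed vertex program: add the incoming message to the score, then divide by the out-degree.
  def vprog(id: Long, attr: Attr, msg: Double): Attr =
    (attr._1, attr._2, (attr._3 + msg) / attr._2)

  // Superstep 0: pregel runs vprog on every vertex with initialMsg before any real message exists.
  def superstepZero(vertices: Map[Long, Attr], initialMsg: Double): Map[Long, Attr] =
    vertices.map { case (id, attr) => (id, vprog(id, attr, initialMsg)) }

  def main(args: Array[String]): Unit = {
    val vertices: Map[Long, Attr] = Map((1L, ("a", 2.0, 1.0)), (2L, ("b", 4.0, 1.0)))
    superstepZero(vertices, 0.0).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Even though adding 0.0 changes nothing, the division still happens, so after superstep 0 the scores already differ from the initial ones. That is the effect the paragraph above warns about.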

 

2. Code implementation:

Using pregel to find the minimum cost from the source vertex to every vertex

package homeWork

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object MapGraphX5 {

  def main(args: Array[String]): Unit = {
    // Set up the runtime environment
    val conf = new SparkConf().setAppName("Pregel API GraphX").setMaster("local")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")

    // Build the graph
    val myVertices = sc.parallelize(Array((1L, 0), (2L, 0), (3L, 0), (4L, 0),
      (5L, 0)))
    val myEdges = sc.makeRDD(Array(Edge(1L, 2L, 2.5),
      Edge(2L, 3L, 3.6), Edge(3L, 4L, 4.5),
      Edge(4L, 5L, 0.1), Edge(3L, 5L, 5.2)
    ))
    val myGraph = Graph(myVertices, myEdges)

    // Set the source vertex
    val sourceId: VertexId = 1L
    // Initialize the distances: 0.0 for the source vertex, positive infinity for every other vertex
    val initialGraph = myGraph.mapVertices((id, _) =>
      if (id == sourceId) 0.0 else Double.PositiveInfinity)

    // Signature of pregel as shown by the IDE, for reference:
    //   def pregel[A](
    //       initialMsg: A,
    //       maxIterations: Int = /* compiled code */,
    //       activeDirection: EdgeDirection = /* compiled code */
    //   )(
    //       vprog: (VertexId, VD, A) => VD,
    //       sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
    //       mergeMsg: (A, A) => A
    //   )(implicit evidence: ClassTag[A]): Graph[VD, ED]

    val sssp: Graph[Double, Double] = initialGraph.pregel(
      // initialMsg
      Double.PositiveInfinity
      // maxIterations and activeDirection use their default values
    )(
      // vprog: keep the smaller of the current distance and the incoming one
      (id, dist, newDist) => math.min(dist, newDist),
      // sendMsg: looking for the minimum cost from vertex 1L to every vertex
      triplet => {
        if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
          // If (source distance + edge weight) is smaller than the destination's
          // current distance, send the sum to the destination so it can update
          Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
        } else {
          Iterator.empty
        }
      },
      // mergeMsg: keep the smallest of the messages bound for the same vertex
      (a, b) => math.min(a, b)
    )

    sssp.vertices.collect.foreach(println(_))

    sc.stop()
  }
}
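To sanity-check what the job above should produce, the same three functions can be replayed on plain Scala collections. This is a sketch under assumptions: `SsspTrace` and `supersteps` are made-up names, the edge list is copied from the example, and every edge is examined in every round rather than only the active ones, which does not change the final distances for this graph.

```scala
object SsspTrace {
  // (source, destination, weight), copied from myEdges in the example.
  val edges: Seq[(Long, Long, Double)] = Seq(
    (1L, 2L, 2.5), (2L, 3L, 3.6), (3L, 4L, 4.5), (4L, 5L, 0.1), (3L, 5L, 5.2))

  // Returns the distance map at the start and after every superstep that delivered messages.
  def supersteps(sourceId: Long): List[Map[Long, Double]] = {
    val ids = edges.flatMap(e => Seq(e._1, e._2)).distinct
    // Superstep 0 with initialMsg = +Infinity leaves the initial distances unchanged.
    var dist: Map[Long, Double] =
      ids.map(id => id -> (if (id == sourceId) 0.0 else Double.PositiveInfinity)).toMap
    var history = List(dist)
    var changed = true
    while (changed) {
      // sendMsg followed by mergeMsg: the best candidate distance per destination.
      val messages: Map[Long, Double] = edges
        .collect { case (src, dst, w) if dist(src) + w < dist(dst) => dst -> (dist(src) + w) }
        .groupBy(_._1)
        .map { case (id, ms) => id -> ms.map(_._2).min }
      // vprog: keep the smaller of the old distance and the received one.
      dist = dist.map { case (id, d) =>
        id -> math.min(d, messages.getOrElse(id, Double.PositiveInfinity))
      }
      changed = messages.nonEmpty
      if (changed) history = dist :: history
    }
    history.reverse
  }

  def main(args: Array[String]): Unit =
    supersteps(1L).zipWithIndex.foreach { case (d, step) =>
      println(s"after superstep $step: " + d.toSeq.sortBy(_._1).mkString(", "))
    }
}
```

In this trace vertex 5 first receives roughly 11.3 through the edge 3→5 and is then improved to roughly 10.7 through 3→4→5, which shows why the loop keeps running until no vertex can be improved.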

Using pregel to find the maximum depth from the source vertex to every vertex

package pregel

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object Demo2 {

  def main(args: Array[String]): Unit = {

    // Set up the runtime environment
    val conf = new SparkConf().setAppName("Pregel API GraphX").setMaster("local")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")

    // Build the graph
    val myVertices = sc.parallelize(Array((1L, "Zhang San"), (2L, "Li Si"), (3L, "Wang Wu"), (4L, "Qian Liu"),
      (5L, "Leader")))
    val myEdges = sc.makeRDD(Array(Edge(1L, 2L, "friend"),
      Edge(2L, 3L, "friend"), Edge(3L, 4L, "friend"),
      Edge(4L, 5L, "superior-subordinate"), Edge(3L, 5L, "superior-subordinate")
    ))

    val myGraph = Graph(myVertices, myEdges)

    // Replace every vertex attribute with a depth of 0
    val g = myGraph.mapVertices((vid, vd) => 0)

    val newGraph: Graph[Int, String] = g.pregel(0)(
      // vprog: take the received depth; sendMsg only ever sends values larger than the current one
      (id, attr, maxValue) => maxValue,
      // sendMsg: offer the destination a depth one larger than the source's
      triplet => {
        if (triplet.srcAttr + 1 > triplet.dstAttr) {
          Iterator((triplet.dstId, triplet.srcAttr + 1))
        } else {
          Iterator.empty
        }
      },
      // mergeMsg: keep the largest depth bound for the same vertex
      (a: Int, b: Int) => math.max(a, b)
    )

    newGraph.vertices.collect.foreach(println(_))

    sc.stop()
  }

}
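The depth computation above can be traced by hand in the same way. The sketch below replays it on plain Scala collections. `DepthTrace` and `maxDepths` are made-up names, the edge list is copied from the example, and all edges are examined in every round, which is a simplification.

```scala
object DepthTrace {
  // (source, destination), copied from myEdges in the example; the edge labels are not needed.
  val edges: Seq[(Long, Long)] = Seq((1L, 2L), (2L, 3L), (3L, 4L), (4L, 5L), (3L, 5L))

  def maxDepths(): Map[Long, Int] = {
    val ids = edges.flatMap { case (s, d) => Seq(s, d) }.distinct
    // Superstep 0: vprog overwrites every attribute with initialMsg = 0.
    var depth: Map[Long, Int] = ids.map(id => id -> 0).toMap

    // sendMsg followed by mergeMsg: the largest proposed depth per destination.
    def send(current: Map[Long, Int]): Map[Long, Int] =
      edges
        .collect { case (src, dst) if current(src) + 1 > current(dst) => dst -> (current(src) + 1) }
        .groupBy(_._1)
        .map { case (id, ms) => id -> ms.map(_._2).max }

    var messages = send(depth)
    while (messages.nonEmpty) {
      // vprog: a vertex that received a message takes the received value.
      depth = depth ++ messages
      messages = send(depth)
    }
    depth
  }

  def main(args: Array[String]): Unit =
    maxDepths().toSeq.sortBy(_._1).foreach(println)
}
```

Vertex 5 ends at depth 4 (the path 1→2→3→4→5) rather than 3 (via 3→5), because larger values keep replacing smaller ones. The loop terminates only because this graph has no cycle; on a cyclic graph the depths would keep growing, so a bounded `maxIterations` would be needed.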


 
