PageRank in Spark GraphX

The PageRank source code in spark-graphx is spread across two files:
one is spark-1.6.\graphx\src\main\scala\org\apache\spark\graphx\Pregel.scala, and the other is spark-1.6.\graphx\src\main\scala\org\apache\spark\graphx\lib\PageRank.scala.

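This PageRank implementation can be invoked in two ways.

The first (static) takes a parameter, numIter, that fixes the number of iterations: whatever the intermediate results look like, the algorithm stops after numIter iterations and returns the resulting graph. The Scaladoc in PageRank.scala gives this pseudocode for a fixed number of iterations: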
* {{{
* var PR = Array.fill(n)( 1.0 )
* val oldPR = Array.fill(n)( 1.0 )
* for( iter <- 0 until numIter ) {
* swap(oldPR, PR)
* for( i <- 0 until n ) {
* PR[i] = alpha + (1 - alpha) * inNbrs[i].map(j => oldPR[j] / outDeg[j]).sum
* }
* }
* }}}
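
The pseudocode above can be turned into a small runnable sketch in plain Scala, without Spark. This is only an illustration under assumptions: the object name `StaticPageRankSketch`, the function `staticPageRank` and the adjacency-list input `outLinks` are made up for this example and are not part of GraphX. Like the pseudocode, it starts every rank at 1.0, whereas the GraphX code initializes ranks to resetProb.

```scala
object StaticPageRankSketch {

  /**
   * Fixed-iteration PageRank on an adjacency list (hypothetical helper).
   * outLinks(j) lists the vertices that vertex j links to.
   */
  def staticPageRank(
      outLinks: Array[Seq[Int]],
      numIter: Int,
      alpha: Double = 0.15): Array[Double] = {
    val n = outLinks.length
    val outDeg = outLinks.map(_.size)
    var pr = Array.fill(n)(1.0)
    for (_ <- 0 until numIter) {
      val oldPR = pr
      // incoming(i) accumulates oldPR(j) / outDeg(j) over all in-neighbours j of i.
      // Vertices without out-links never enter the inner loop, so there is no
      // division by zero.
      val incoming = Array.fill(n)(0.0)
      for (j <- 0 until n; i <- outLinks(j)) {
        incoming(i) += oldPR(j) / outDeg(j)
      }
      pr = incoming.map(sum => alpha + (1 - alpha) * sum)
    }
    pr
  }
}
```

For example, on the two-vertex graph 0 → 1, one iteration leaves vertex 0 with rank alpha (it has no in-links) and gives vertex 1 the rank alpha + (1 - alpha) * 1.0.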
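
The second (dynamic) takes a parameter, tol: iteration continues until the difference between the results of two successive iterations is smaller than tol, that is, until the ranks have converged, and only then is the resulting graph returned. The Scaladoc pseudocode for this variant:
* {{{
* var PR = Array.fill(n)( 1.0 )
* val oldPR = Array.fill(n)( 0.0 )
* while( max(abs(PR - oldPR)) > tol ) {
* swap(oldPR, PR)
* for( i <- 0 until n if abs(PR[i] - oldPR[i]) > tol ) {
* PR[i] = alpha + (1 - alpha) * inNbrs[i].map(j => oldPR[j] / outDeg[j]).sum
* }
* }
* }}}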
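
The convergence-based loop can also be sketched in plain Scala. Again this is an assumption-laden illustration, not GraphX code: `DynamicPageRankSketch`, `pageRankUntilConvergence` and `outLinks` are hypothetical names. It is also simplified relative to the pseudocode, since it updates every vertex in each round instead of only the vertices whose rank changed by more than tol.

```scala
object DynamicPageRankSketch {

  /**
   * Iterates until the largest per-vertex rank change is at most tol
   * (hypothetical helper). Returns the ranks and the number of iterations run.
   */
  def pageRankUntilConvergence(
      outLinks: Array[Seq[Int]],
      tol: Double,
      alpha: Double = 0.15): (Array[Double], Int) = {
    require(tol > 0.0, "tol must be positive, otherwise the loop may not terminate")
    val n = outLinks.length
    val outDeg = outLinks.map(_.size)
    var pr = Array.fill(n)(1.0)
    var maxChange = Double.PositiveInfinity
    var iterations = 0
    while (maxChange > tol) {
      val oldPR = pr
      val incoming = Array.fill(n)(0.0)
      for (j <- 0 until n; i <- outLinks(j)) {
        incoming(i) += oldPR(j) / outDeg(j)
      }
      pr = incoming.map(sum => alpha + (1 - alpha) * sum)
      // Largest absolute change of any single vertex in this round.
      maxChange =
        if (n == 0) 0.0
        else pr.indices.map(i => math.abs(pr(i) - oldPR(i))).max
      iterations += 1
    }
    (pr, iterations)
  }
}
```

The caller does not choose the number of rounds here; it falls out of tol. A smaller tol generally means more iterations and ranks closer to the fixed point.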

First, the source code that implements the static method:

```
// Excerpt from object PageRank (PageRank.scala). The file imports
// scala.reflect.ClassTag, org.apache.spark.Logging and org.apache.spark.graphx._

def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED], numIter: Int,
    resetProb: Double = 0.15): Graph[Double, Double] =
{
  runWithOptions(graph, numIter, resetProb)
}

/**
 * Run PageRank for a fixed number of iterations returning a graph
 * with vertex attributes containing the PageRank and edge
 * attributes the normalized edge weight.
 *
 * @tparam VD the original vertex attribute type (not used)
 * @tparam ED the original edge attribute type (not used)
 *
 * @param graph the graph on which to compute PageRank
 * @param numIter the number of iterations of PageRank to run
 * @param resetProb the random reset probability (alpha)
 * @param srcId the source vertex for a Personalized Page Rank (optional)
 *
 * @return the graph with each vertex containing the PageRank and each edge
 *         containing the normalized weight.
 */
def runWithOptions[VD: ClassTag, ED: ClassTag](
    graph: Graph[VD, ED], numIter: Int, resetProb: Double = 0.15,
    srcId: Option[VertexId] = None): Graph[Double, Double] =
{
  val personalized = srcId.isDefined
  val src: VertexId = srcId.getOrElse(-1L)

  // Initialize the PageRank graph: each edge attribute becomes
  // 1.0 / (out-degree of the edge's source vertex) and each vertex attribute
  // becomes resetProb. In the personalized case only the source vertex gets
  // resetProb; every other vertex starts at 0.0.
  var rankGraph: Graph[Double, Double] = graph
    // Compute each vertex's out-degree and join it onto the original vertices,
    // so that the vertex attribute becomes the out-degree.
    .outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
    // Set the edge weight from the source vertex's out-degree. TripletFields.Src
    // declares that only the source-vertex part of each triplet is read, so the
    // other fields do not need to be shipped.
    .mapTriplets( e => 1.0 / e.srcAttr, TripletFields.Src )
    // Set the initial vertex values.
    .mapVertices { (id, attr) =>
      if (!(id != src && personalized)) resetProb else 0.0
    }

  // Kronecker delta, used only in the personalized case.
  def delta(u: VertexId, v: VertexId): Double = { if (u == v) 1.0 else 0.0 }

  var iteration = 0
  var prevRankGraph: Graph[Double, Double] = null
  while (iteration < numIter) {
    rankGraph.cache()
    // Each vertex sends rank * edge weight along its out-edges; messages
    // arriving at the same destination are summed.
    val rankUpdates = rankGraph.aggregateMessages[Double](
      ctx => ctx.sendToDst(ctx.srcAttr * ctx.attr), _ + _, TripletFields.Src)
    prevRankGraph = rankGraph
    val rPrb = if (personalized) {
      (src: VertexId, id: VertexId) => resetProb * delta(src, id)
    } else {
      (src: VertexId, id: VertexId) => resetProb
    }

    rankGraph = rankGraph.joinVertices(rankUpdates) {
      (id, oldRank, msgSum) => rPrb(src, id) + (1.0 - resetProb) * msgSum
    }.cache()

    rankGraph.edges.foreachPartition(x => {}) // also materializes rankGraph.vertices
    logInfo(s"PageRank finished iteration $iteration.")
    prevRankGraph.vertices.unpersist(false)
    prevRankGraph.edges.unpersist(false)

    iteration += 1
  }
  rankGraph
}
```

GraphX ships this one implementation, but real-world use is a different matter.
You need to implement your own PageRank to suit your own business requirements. Because this algorithm fixes the number of iterations, the PageRank values may not have converged by the time it stops. The dynamic version:

```
/**
 * Dynamic PageRank.
 *
 * @tparam VD the original vertex attribute (not used)
 * @tparam ED the original edge attribute (not used)
 *
 * @param graph the graph on which to compute PageRank
 * @param tol the convergence tolerance
 * @param resetProb the random reset probability (alpha)
 *
 * @return the graph with each vertex containing the PageRank and each edge
 *         containing the normalized weight.
 */
def runUntilConvergence[VD: ClassTag, ED: ClassTag](
    graph: Graph[VD, ED], tol: Double, resetProb: Double = 0.15): Graph[Double, Double] =
{
  runUntilConvergenceWithOptions(graph, tol, resetProb)
}

/**
 * @tparam VD the original vertex attribute (not used)
 * @tparam ED the original edge attribute (not used)
 *
 * @param graph the graph on which to compute PageRank
 * @param tol the tolerance allowed at convergence (smaller => more accurate).
 * @param resetProb the random reset probability (alpha)
 * @param srcId the source vertex for a Personalized Page Rank (optional)
 *
 * @return the graph with each vertex containing the PageRank and each edge
 *         containing the normalized weight.
 */
def runUntilConvergenceWithOptions[VD: ClassTag, ED: ClassTag](
    graph: Graph[VD, ED], tol: Double, resetProb: Double = 0.15,
    srcId: Option[VertexId] = None): Graph[Double, Double] =
{
  val personalized = srcId.isDefined
  val src: VertexId = srcId.getOrElse(-1L)

  // Initialization follows the static method, except that each vertex
  // carries a (rank, delta) pair instead of a single rank.
  val pagerankGraph: Graph[(Double, Double), Double] = graph
    // Associate the degree with each vertex
    .outerJoinVertices(graph.outDegrees) {
      (vid, vdata, deg) => deg.getOrElse(0)
    }
    // Set the edge weight to 1.0 / (out-degree of the source vertex)
    .mapTriplets( e => 1.0 / e.srcAttr )
    // Set the initial (rank, delta) of each vertex
    .mapVertices { (id, attr) =>
      if (id == src) (resetProb, Double.NegativeInfinity) else (0.0, 0.0)
    }
    .cache()

  // Vertex program: updates a vertex's rank from the sum of its incoming
  // messages and records how much the rank changed.
  def vertexProgram(id: VertexId, attr: (Double, Double), msgSum: Double): (Double, Double) = {
    val (oldPR, lastDelta) = attr
    val newPR = oldPR + (1.0 - resetProb) * msgSum
    (newPR, newPR - oldPR)
  }

  // Vertex program for the personalized case: only the source vertex keeps
  // its previous rank as the teleport term.
  def personalizedVertexProgram(id: VertexId, attr: (Double, Double),
      msgSum: Double): (Double, Double) = {
    val (oldPR, lastDelta) = attr
    var teleport = oldPR
    val delta = if (src == id) 1.0 else 0.0
    teleport = oldPR * delta

    val newPR = teleport + (1.0 - resetProb) * msgSum
    val newDelta = if (lastDelta == Double.NegativeInfinity) newPR else newPR - oldPR
    (newPR, newDelta)
  }

  // A vertex only forwards its change while that change is still above tol.
  def sendMessage(edge: EdgeTriplet[(Double, Double), Double]) = {
    if (edge.srcAttr._2 > tol) {
      Iterator((edge.dstId, edge.srcAttr._2 * edge.attr))
    } else {
      Iterator.empty
    }
  }

  def messageCombiner(a: Double, b: Double): Double = a + b

  // The initial message delivered to every vertex. In the non-personalized
  // case a vertex starting at rank 0.0 gets
  // 0.0 + (1.0 - resetProb) * resetProb / (1.0 - resetProb) = resetProb.
  val initialMessage = if (personalized) 0.0 else resetProb / (1.0 - resetProb)

  // Execute a dynamic version of Pregel.
  val vp = if (personalized) {
    (id: VertexId, attr: (Double, Double), msgSum: Double) =>
      personalizedVertexProgram(id, attr, msgSum)
  } else {
    (id: VertexId, attr: (Double, Double), msgSum: Double) =>
      vertexProgram(id, attr, msgSum)
  }

  Pregel(pagerankGraph, initialMessage, activeDirection = EdgeDirection.Out)(
    vp, sendMessage, messageCombiner)
    .mapVertices((vid, attr) => attr._1)
}
```

What if an application needs to use its own edge weights rather than the weighting GraphX applies? That has to be handled when the graph is initialized; I will add an implementation in a later update.
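
As a rough idea of what handling edge weights at initialization could look like, here is a plain-Scala sketch, not GraphX code and not the author's promised implementation. It assumes each edge carries a positive, application-defined weight and rescales it by the total weight leaving its source vertex, which plays the role of `1.0 / e.srcAttr` in the code above. The names `WeightedEdgeInit`, `WeightedEdge` and `normalizeBySource` are hypothetical.

```scala
object WeightedEdgeInit {

  /** A directed edge with a raw, application-defined weight (hypothetical type). */
  final case class WeightedEdge(src: Long, dst: Long, weight: Double)

  /**
   * Rescales every edge weight so that the weights leaving any one source
   * vertex sum to 1.0.
   */
  def normalizeBySource(edges: Seq[WeightedEdge]): Seq[WeightedEdge] = {
    require(edges.forall(_.weight > 0.0), "edge weights must be positive")
    // Total outgoing weight per source vertex.
    val totalOut: Map[Long, Double] =
      edges.groupBy(_.src).map { case (src, es) => src -> es.map(_.weight).sum }
    edges.map(e => e.copy(weight = e.weight / totalOut(e.src)))
  }
}
```

When all weights are equal, this reduces to 1 / out-degree, the same value the GraphX code assigns. In GraphX itself, one possible route (an assumption, not verified against the author's plan) would be to sum outgoing weights per vertex with `aggregateMessages` and `sendToSrc`, join the sums onto the vertices, and divide in `mapTriplets`.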
