Data file download link: https://pan.baidu.com/s/1AHSbfrweBPfkqPyWAImwAw
Extraction code: gpyv
Spark GraphX Algorithms
1. Connected Components algorithm example
Reference link: https://www.jianshu.com/p/8b0a4ce52703
package suanfa
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.{SparkConf, SparkContext}
/**
  * @Author Bright
  * @Date 2020/11/26
  * @Description
  */
object ConnectedComponentsDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("ConnectedComponentsDemo")
    val sc = new SparkContext(conf)
    // Load the vertex and edge files and print their raw contents
    val people = sc.textFile("in/people.csv")
    people.collect.foreach(println)
    val links = sc.textFile("in/links.csv")
    links.collect.foreach(println)
    case class Person(name: String, age: Int)
    // Each people line is "id,name,age"
    val peopleRDD = people.map(x => x.split(",")).map(x => (x(0).toLong, Person(x(1), x(2).toInt)))
    // Each links line is "srcId,dstId,attribute"
    val linksRDD = links.map { x =>
      val row = x.split(",")
      // Index the split fields (row), not the characters of the raw line (x)
      Edge(row(0).toLong, row(1).toLong, row(2))
    }
    // Build the graph
    val graph = Graph(peopleRDD, linksRDD)
    graph.vertices.collect.foreach(println)
    graph.triplets.collect.foreach(println)
    // Run connectedComponents
    val cc = graph.connectedComponents
    cc.vertices.collect.foreach(println)
    // Analysis:
    // cc:(4,1) peopleRDD:(4,Person(Dave,25))
    // (id,mincc,people):(4,1,Person(Dave,25))
    // (mincc,people.get.name,people.get.age):(1,Dave,25)
    // people is an Option; .get assumes every vertex id appears in people.csv
    val newGraph = cc.outerJoinVertices(peopleRDD)((id, mincc, people) => (mincc, people.get.name, people.get.age))
    newGraph.vertices.collect.foreach(println)
    // Analysis:
    // cc:(4,1) => cc.vertices.map(_._2) = 1
    // newGraph:(4,(1,Dave,25)) => id2._1 = 1
    cc.vertices.map(_._2).collect.distinct.foreach(id => {
      // Keep only the vertices whose component id equals the current id
      val sub = newGraph.subgraph(vpred = (id1, id2) => id2._1 == id)
      sub.triplets.collect.foreach(println)
    })
    sc.stop()
  }
}
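The result shows that, after the computation, each vertex's attribute is the smallest vertex id in the connected component that contains it. For example, the smallest vertex id in the component containing vertex 11 is 10, and the smallest vertex id in the component containing vertex 4 is 1.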
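To make the "smallest vertex id per component" labelling concrete, here is a minimal plain-Scala sketch that needs no Spark. It is not GraphX's implementation, only an illustration of the idea. The object `MinLabelPropagation`, the helper `componentLabels`, and the sample graph are all hypothetical, chosen so that vertex 11 ends up labelled 10 and vertex 4 ends up labelled 1, as in the description above.

```scala
object MinLabelPropagation {
  // Hypothetical helper (not a GraphX API): labels every vertex with the
  // smallest vertex id in its component, treating edges as undirected.
  def componentLabels(vertices: Seq[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    // Every vertex starts out labelled with its own id.
    var labels: Map[Long, Long] = vertices.map(v => v -> v).toMap
    var changed = true
    // Repeat until a full pass over the edges changes nothing.
    while (changed) {
      changed = false
      for ((src, dst) <- edges) {
        val smallest = math.min(labels(src), labels(dst))
        if (labels(src) != smallest || labels(dst) != smallest) {
          // Both endpoints adopt the smaller of their two labels.
          labels = labels + (src -> smallest) + (dst -> smallest)
          changed = true
        }
      }
    }
    labels
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample graph with two components: {1, 2, 4} and {10, 11}.
    val vertices = Seq(1L, 2L, 4L, 10L, 11L)
    val edges = Seq((2L, 1L), (4L, 2L), (11L, 10L))
    componentLabels(vertices, edges).toSeq.sortBy(_._1).foreach(println)
  }
}
```

The returned map plays the role of `cc.vertices` in the example: the key is the vertex id and the value is the component id.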
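Extension
The result of connectedComponents tells you which vertices belong to the same connected component, so a large graph can be split into several connected subgraphs.
Analysis:
The vertex attributes of the graph returned by connectedComponents no longer carry the original information, so it has to be joined back with the original data, for example val newGraph = cc.outerJoinVertices(peopleRDD)((id, cc, p)=>(cc,p.get.name,p.get.age))
cc.vertices.map(_._2).collect.distinct returns the smallest vertex id of each connected component, that is, one id per component
With each component's smallest vertex id, the subgraph method extracts the corresponding connected subgraph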
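The three steps above can be sketched in plain Scala collections, without Spark, to show what each one produces. This is a sketch under stated assumptions, not the GraphX code path: `SplitByComponent` and its helpers are hypothetical names, the `labels` map stands in for `cc.vertices`, and the sample data is made up except for Dave (25), who appears in the comments of the example.

```scala
object SplitByComponent {
  // Stand-in for the Person case class used in the example.
  final case class Person(name: String, age: Int)

  // Step 1: re-attach the original attributes to the component labels.
  // An Option is kept instead of calling .get, so a missing person does not throw.
  def joinLabels(labels: Map[Long, Long], people: Map[Long, Person]): Map[Long, (Long, Option[Person])] =
    labels.map { case (id, cc) => id -> (cc, people.get(id)) }

  // Step 2: the distinct labels are the smallest vertex id of each component.
  def componentIds(labels: Map[Long, Long]): Set[Long] = labels.values.toSet

  // Step 3: keep only the edges whose endpoints both carry the given label.
  def subgraphEdges(
      labels: Map[Long, Long],
      edges: Seq[(Long, Long, String)],
      cc: Long
  ): Seq[(Long, Long, String)] =
    edges.filter { case (src, dst, _) => labels(src) == cc && labels(dst) == cc }

  def main(args: Array[String]): Unit = {
    // Made-up labels and edges for illustration only.
    val labels = Map(1L -> 1L, 2L -> 1L, 4L -> 1L, 10L -> 10L, 11L -> 10L)
    val people = Map(4L -> Person("Dave", 25))
    val edges = Seq((2L, 1L, "friend"), (4L, 2L, "friend"), (11L, 10L, "friend"))
    println(joinLabels(labels, people)(4L))
    for (cc <- componentIds(labels).toSeq.sorted)
      println(s"component $cc: ${subgraphEdges(labels, edges, cc)}")
  }
}
```

In the Spark version, step 3 filters on vertices through `vpred`, and GraphX then drops every edge that loses an endpoint. The edge filter here has the same effect on the edge set.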
2. Pregel algorithm: shortest distance from vertex 5 to every other vertex
Pregel algorithm reference link:
https://blog.csdn.net/hanweileilei/article/details/89764466
package suanfa
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
/**
  * @Author Bright
  * @Date 2020/11/26
  * @Description
  */
object PregelDemo {
  def main(args: Array[String]): Unit = {
    // 1. Create the SparkContext
    val conf = new SparkConf().setAppName("PregelDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // 2. Create the vertices
    val vertexArray = Array(
      (1L, ("Alice", 28)),
      (2L, ("Bob", 27)),
      (3L, ("Charlie", 65)),
      (4L, ("David", 42)),
      (5L, ("Ed", 55)),
      (6L, ("Fran", 50))
    )
    // 3. Create the edges; the edge attribute is the distance between the two adjacent vertices
    val edgeArray = Array(
      Edge(2L, 1L, 7),
      Edge(2L, 4L, 2),
      Edge(3L, 2L, 4),
      Edge(3L, 6L, 3),
      Edge(4L, 1L, 1),
      Edge(2L, 5L, 2),
      Edge(5L, 3L, 8),
      Edge(5L, 6L, 3)
    )
    val edgeRDD: RDD[Edge[Int]] = sc.makeRDD(edgeArray)
    val vertexRDD: RDD[(VertexId, (String, Int))] = sc.makeRDD(vertexArray)
    // 4. Build the graph and initialize distances: 0 for the source vertex, infinity for all others
    val graph = Graph(vertexRDD, edgeRDD)
    val srcVertexId = 5L
    val initialGraph = graph.mapVertices { case (vid, (name, age)) => if (vid == srcVertexId) 0.0 else Double.PositiveInfinity }
    // 5. Call pregel
    val pregelGraph = initialGraph.pregel(
      Double.PositiveInfinity, // initial message delivered to every vertex
      Int.MaxValue,            // maximum number of iterations
      EdgeDirection.Out        // active direction
    )(
      // vprog: keep the smaller of the current distance and the received message
      (vid: VertexId, vd: Double, distMsg: Double) => {
        val minDist = math.min(vd, distMsg)
        println(s"vertex ${vid}, attribute ${vd}, received message ${distMsg}, merged attribute ${minDist}")
        minDist
      },
      // sendMsg: send along an edge only if it gives the destination a shorter distance
      (edgeTriplet: EdgeTriplet[Double, Int]) => {
        if (edgeTriplet.srcAttr + edgeTriplet.attr < edgeTriplet.dstAttr) {
          println(s"vertex ${edgeTriplet.srcId} sends vertex ${edgeTriplet.dstId} message ${edgeTriplet.srcAttr + edgeTriplet.attr}")
          Iterator[(VertexId, Double)]((edgeTriplet.dstId, edgeTriplet.srcAttr + edgeTriplet.attr))
        } else {
          Iterator.empty
        }
      },
      // mergeMsg: when several messages reach the same vertex, keep the minimum
      (msg1: Double, msg2: Double) => math.min(msg1, msg2)
    )
    pregelGraph.triplets.collect.foreach(println)
    sc.stop()
  }
}
Output:
vertex 2, attribute Infinity, received message Infinity, merged attribute Infinity
vertex 5, attribute 0.0, received message Infinity, merged attribute 0.0
vertex 3, attribute Infinity, received message Infinity, merged attribute Infinity
vertex 6, attribute Infinity, received message Infinity, merged attribute Infinity
vertex 1, attribute Infinity, received message Infinity, merged attribute Infinity
vertex 4, attribute Infinity, received message Infinity, merged attribute Infinity
vertex 5 sends vertex 6 message 3.0
vertex 5 sends vertex 3 message 8.0
vertex 3, attribute Infinity, received message 8.0, merged attribute 8.0
vertex 6, attribute Infinity, received message 3.0, merged attribute 3.0
vertex 3 sends vertex 2 message 12.0
vertex 2, attribute Infinity, received message 12.0, merged attribute 12.0
vertex 2 sends vertex 4 message 14.0
vertex 2 sends vertex 1 message 19.0
vertex 1, attribute Infinity, received message 19.0, merged attribute 19.0
vertex 4, attribute Infinity, received message 14.0, merged attribute 14.0
vertex 4 sends vertex 1 message 15.0
vertex 1, attribute 19.0, received message 15.0, merged attribute 15.0
((2,12.0),(1,15.0),7)
((2,12.0),(4,14.0),2)
((3,8.0),(2,12.0),4)
((3,8.0),(6,3.0),3)
((4,14.0),(1,15.0),1)
((2,12.0),(5,0.0),2)
((5,0.0),(3,8.0),8)
((5,0.0),(6,3.0),3)
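The rounds visible in the output can be replayed without Spark. The sketch below is a simplified plain-Scala simulation of the superstep loop, not the GraphX `pregel` implementation. `PregelSketch`, `WeightedEdge`, and `shortestDistances` are hypothetical names. It also departs from Pregel in two ways: it skips the initial message sent to every vertex, and it scans all edges in each round rather than only those of vertices that just received a message. For this graph the final distances are the same.

```scala
object PregelSketch {
  final case class WeightedEdge(src: Long, dst: Long, weight: Double)

  def shortestDistances(vertexIds: Seq[Long], edges: Seq[WeightedEdge], source: Long): Map[Long, Double] = {
    // Initial vertex attributes: 0 for the source, infinity everywhere else.
    var dist: Map[Long, Double] =
      vertexIds.map(v => v -> (if (v == source) 0.0 else Double.PositiveInfinity)).toMap
    var active = true
    while (active) {
      // sendMsg: an edge emits a message only if it offers its destination a shorter path.
      val messages: Seq[(Long, Double)] = edges.collect {
        case WeightedEdge(s, d, w) if dist(s) + w < dist(d) => d -> (dist(s) + w)
      }
      // mergeMsg: keep the minimum of all messages addressed to the same vertex.
      val merged: Map[Long, Double] =
        messages.groupBy(_._1).map { case (v, ms) => v -> ms.map(_._2).min }
      // vprog: each receiving vertex keeps the smaller of its value and the merged message.
      dist = dist ++ merged.map { case (v, m) => v -> math.min(dist(v), m) }
      // Stop once a round produces no messages.
      active = merged.nonEmpty
    }
    dist
  }

  def main(args: Array[String]): Unit = {
    // Same edges and source vertex as the PregelDemo example.
    val edges = Seq(
      WeightedEdge(2L, 1L, 7), WeightedEdge(2L, 4L, 2), WeightedEdge(3L, 2L, 4),
      WeightedEdge(3L, 6L, 3), WeightedEdge(4L, 1L, 1), WeightedEdge(2L, 5L, 2),
      WeightedEdge(5L, 3L, 8), WeightedEdge(5L, 6L, 3)
    )
    shortestDistances(1L to 6L, edges, 5L).toSeq.sortBy(_._1).foreach(println)
  }
}
```

On the example graph this yields the distances shown in the triplets above: 15.0 for vertex 1, 12.0 for vertex 2, 8.0 for vertex 3, 14.0 for vertex 4, 0.0 for vertex 5, and 3.0 for vertex 6.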