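GraphX exposes a graph through three main views:
- the vertex set (vertices)
- the edge set (edges)
- the triplets (triplets), each of which combines an edge with the attributes of its source and destination vertices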
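To make the triplet view more concrete, here is a minimal plain-Scala sketch that needs no Spark. The names SimpleEdge and SimpleTriplet and the sample data are hypothetical and only loosely mirror the GraphX types; the point is that a triplet is an edge joined with the attributes of both of its endpoints.

```scala
// Hypothetical stand-ins for GraphX's Edge and EdgeTriplet (not the real classes).
final case class SimpleEdge(src: Long, dst: Long, attr: Int)
final case class SimpleTriplet(srcAttr: String, attr: Int, dstAttr: String)

// Vertex set: id -> attribute
val sampleVertices: Map[Long, String] = Map(1L -> "a", 2L -> "b", 3L -> "c")
// Edge set: directed edges carrying an Int attribute
val sampleEdges: Seq[SimpleEdge] = Seq(SimpleEdge(1L, 2L, 10), SimpleEdge(3L, 2L, 20))

// Triplets: each edge joined with the attributes of its source and destination vertices
val sampleTriplets: Seq[SimpleTriplet] =
  sampleEdges.map(e => SimpleTriplet(sampleVertices(e.src), e.attr, sampleVertices(e.dst)))

sampleTriplets.foreach(t => println(s"${t.srcAttr} -[${t.attr}]-> ${t.dstAttr}"))
```

In GraphX itself the same join is done for you: graph.triplets yields EdgeTriplet values with srcAttr, attr, and dstAttr fields.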
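The following example is meant to be run in a Spark environment (for instance spark-shell, where sc is already defined):
1. Build the graph shown in the figure above and print the vertices that satisfy a condition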
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// Vertices: (id, (name, age))
val vertexRDD: RDD[(Long, (String, Int))] = sc.parallelize(Seq(
  (1L, ("alex", 23)), (2L, ("bob", 27)), (3L, ("charlie", 65)),
  (4L, ("david", 42)), (5L, ("ed", 55)), (6L, ("fran", 50))))
// Edges: Edge(srcId, dstId, attr)
val edgeRDD: RDD[Edge[Int]] = sc.parallelize(Seq(
  Edge(4L, 1L, 1), Edge(2L, 1L, 7), Edge(2L, 4L, 2), Edge(3L, 2L, 4),
  Edge(5L, 2L, 2), Edge(5L, 3L, 8), Edge(5L, 6L, 3), Edge(3L, 6L, 3)))
// Build the graph from the vertex set and the edge set
val graph: Graph[(String, Int), Int] = Graph(vertexRDD, edgeRDD)
// Print the vertices whose age is greater than 30
graph.vertices.filter { case (id, (name, age)) => age > 30 }
  .collect.foreach { case (id, (name, age)) => println(s"$name is $age") }
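The output is as follows (the order of the lines may differ between runs):
charlie is 65
david is 42
ed is 55
fran is 50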
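Filtering graph.vertices only returns vertices. If the goal is instead to keep a smaller graph containing just those users, one possible sketch uses the subgraph operator with a vertex predicate. It assumes the graph value from step 1 is still in scope in the same spark-shell session; the name olderGraph is introduced here purely for illustration.

```scala
import org.apache.spark.graphx._

// Assumes `graph` is the Graph[(String, Int), Int] built in step 1.
// Keep only vertices with age > 30; edges touching a removed vertex are dropped as well.
val olderGraph: Graph[(String, Int), Int] = graph.subgraph(vpred = (id, attr) => attr._2 > 30)

// Print the relationships that survive the restriction
olderGraph.triplets.collect.foreach { t =>
  println(s"${t.srcAttr._1} likes ${t.dstAttr._1}")
}
```

With the sample data, vertices 3, 4, 5 and 6 remain, together with the three edges whose endpoints both lie in that set.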
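2. Use the triplets to print the relationships between users
// Each triplet carries the edge plus the attributes of its source and destination vertices
for (triplet <- graph.triplets.collect) {
  println(s"${triplet.srcAttr._1} likes ${triplet.dstAttr._1}")
}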
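The output is as follows (the order of the lines may differ between runs):
david likes alex
bob likes alex
bob likes david
charlie likes bob
ed likes bob
ed likes charlie
ed likes fran
charlie likes fran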
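Because a triplet also exposes the edge attribute as attr, the same view can be narrowed by edge value. The sketch below assumes the graph from step 1 is in scope and treats the Int edge attribute as a weight with a threshold of 5; both the interpretation and the threshold are assumptions made for illustration, since the text does not say what the attribute represents.

```scala
import org.apache.spark.graphx._

// Assumes `graph` from step 1. Reading the edge attribute as a weight is an assumption.
val strong: Array[EdgeTriplet[(String, Int), Int]] = graph.triplets.filter(t => t.attr > 5).collect

for (t <- strong) {
  println(s"${t.srcAttr._1} -> ${t.dstAttr._1} (edge attribute ${t.attr})")
}
```

With the sample edges, only 2 -> 1 (attribute 7) and 5 -> 3 (attribute 8) pass the filter.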
3. Compute the in-degree and out-degree of every vertex and use them to print how many followers each user has (here, the number of users who like them)
case class User(name: String, age: Int, inDeg: Int, outDeg: Int)
// Create a user graph with both degrees initialised to 0
val initialUserGraph: Graph[User, Int] = graph.mapVertices { case (id, (name, age)) => User(name, age, 0, 0) }
// Fill in the degrees; vertices without incoming (or outgoing) edges are missing
// from inDegrees (or outDegrees), which is why getOrElse(0) is needed
val userGraph = initialUserGraph.outerJoinVertices(initialUserGraph.inDegrees) {
  case (id, u, inDegOpt) => User(u.name, u.age, inDegOpt.getOrElse(0), u.outDeg)
}.outerJoinVertices(initialUserGraph.outDegrees) {
  case (id, u, outDegOpt) => User(u.name, u.age, u.inDeg, outDegOpt.getOrElse(0))
}
// Print how many users like each user (the in-degree)
for ((id, property) <- userGraph.vertices.collect) {
  println(s"User $id is called ${property.name} and is liked by ${property.inDeg} people.")
}
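The output is as follows (the order of the lines may differ between runs):
User 1 is called alex and is liked by 2 people.
User 2 is called bob and is liked by 2 people.
User 3 is called charlie and is liked by 1 people.
User 4 is called david and is liked by 1 people.
User 5 is called ed and is liked by 0 people.
User 6 is called fran and is liked by 2 people.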
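Step 3 also stores the out-degree on every User, although the loop only prints the in-degree. As a small follow-up sketch, assuming userGraph and the User case class from step 3 are still in scope, the two fields can be compared to find users who like exactly as many people as like them. The name balanced is introduced here for illustration.

```scala
import org.apache.spark.graphx._

// Assumes `userGraph` (Graph[User, Int]) from step 3 is still in scope.
// Select the users whose in-degree equals their out-degree and keep only their names.
val balanced: Array[String] = userGraph.vertices.filter {
  case (id, u) => u.inDeg == u.outDeg
}.map {
  case (id, u) => u.name
}.collect

balanced.foreach(name => println(s"$name likes as many people as like them"))
```

With the sample graph this selects bob (in-degree 2, out-degree 2) and david (in-degree 1, out-degree 1).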