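I. Data
1. Data relationship diagram
2. Data description
Each vertex represents a member of the social network. For example, vertex 3 has vertex id 3, member name Charlie, and member age 65.
The direction of an edge encodes the relationship between a followed member and a fan: the arrowhead points at the member being followed, and the member at the tail of the arrow is that member's fan.
The number on an edge is the number of times the fan has clicked on the followed member.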
3. Vertex table
ID | Attributes (name, age) |
1 | (Alice,28) |
2 | (Bob,27) |
3 | (Charlie,65) |
4 | (David,42) |
5 | (Ed,55) |
6 | (Fran,50) |
4. Edge table
Source ID | Destination ID | Attribute (clicks) |
2 | 1 | 7 |
2 | 4 | 2 |
3 | 2 | 4 |
3 | 6 | 3 |
4 | 1 | 1 |
5 | 2 | 2 |
5 | 3 | 8 |
5 | 6 | 3 |
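The arrow direction is easy to read backwards, so here is a minimal plain-Scala sketch (no Spark required) that reads the two tables the way the description defines them: the source of an edge is the fan, the destination is the member being followed. The helper names `followersOf` and `followedBy` are hypothetical and exist only for this illustration.

```scala
// Edge table as (sourceId, destinationId, clicks): the source is the fan,
// the destination is the member being followed.
val edges: Seq[(Long, Long, Int)] = Seq(
  (2L, 1L, 7), (2L, 4L, 2), (3L, 2L, 4), (3L, 6L, 3),
  (4L, 1L, 1), (5L, 2L, 2), (5L, 3L, 8), (5L, 6L, 3)
)

// Vertex table reduced to id -> name.
val names: Map[Long, String] = Map(
  1L -> "Alice", 2L -> "Bob", 3L -> "Charlie",
  4L -> "David", 5L -> "Ed", 6L -> "Fran"
)

// Followers of a member: sources of all edges that point at that member.
def followersOf(id: Long): Set[String] =
  edges.collect { case (src, dst, _) if dst == id => names(src) }.toSet

// Members that a fan follows: destinations of the fan's outgoing edges.
def followedBy(id: Long): Set[String] =
  edges.collect { case (src, dst, _) if src == id => names(dst) }.toSet

println(s"Followers of Alice: ${followersOf(1L).toSeq.sorted.mkString(", ")}")
println(s"Ed follows: ${followedBy(5L).toSeq.sorted.mkString(", ")}")
```

Under this reading Alice is followed by Bob and David, while Ed follows Bob, Charlie and Fran and has no followers of his own.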
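II. Requirements
- Build the fans graph.
- Find the vertices whose age is greater than 30.
- Find the edges whose attribute is greater than 5.
- Add 20 to the age of every vertex.
- Multiply the attribute of every edge by 3.
- Extract the subgraph of vertices whose age is greater than 30.
- Create a new graph that uses User as the vertex type.
- Find the oldest follower of each member.
- Compute the average age of each member's followers.
- Find the shortest distance from vertex 5 to every other vertex.
III. Implementation
1. Build the fans graph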
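// Imports needed by the snippets in this section; `sc` is assumed to be an existing SparkContext (for example, the one provided by spark-shell)
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
// Define the vertices and edges: the vertices are an Array of tuples, the edges an Array of Edge objects
// Vertex attribute type VD: (String, Int)
val vertexArray = Array(
  (1L, ("Alice", 28)),
  (2L, ("Bob", 27)),
  (3L, ("Charlie", 65)),
  (4L, ("David", 42)),
  (5L, ("Ed", 55)),
  (6L, ("Fran", 50))
)
// Edge attribute type ED: Int
val edgeArray = Array(
  Edge(2L, 1L, 7),
  Edge(2L, 4L, 2),
  Edge(3L, 2L, 4),
  Edge(3L, 6L, 3),
  Edge(4L, 1L, 1),
  Edge(5L, 2L, 2),
  Edge(5L, 3L, 8),
  Edge(5L, 6L, 3)
)
// Build the vertex RDD and the edge RDD
val vertexRDD: RDD[(Long, (String, Int))] = sc.parallelize(vertexArray)
val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)
// Build the graph Graph[VD, ED]
val graph: Graph[(String, Int), Int] = Graph(vertexRDD, edgeRDD)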
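The construction above relies on a ready-made `sc`. As a rough sketch of how the same graph could be built and inspected outside spark-shell, the script below creates its own context and prints the triplet view, which joins each edge with the attributes of both endpoints. The local master `local[*]` and the app name `FansGraphSketch` are assumptions made for this illustration, and Spark with GraphX must be on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Assumption: run locally on all cores; the app name is arbitrary.
val conf = new SparkConf().setAppName("FansGraphSketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

val vertices = sc.parallelize(Seq(
  (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)),
  (4L, ("David", 42)), (5L, ("Ed", 55)), (6L, ("Fran", 50))
))
val edges = sc.parallelize(Seq(
  Edge(2L, 1L, 7), Edge(2L, 4L, 2), Edge(3L, 2L, 4), Edge(3L, 6L, 3),
  Edge(4L, 1L, 1), Edge(5L, 2L, 2), Edge(5L, 3L, 8), Edge(5L, 6L, 3)
))
val graph: Graph[(String, Int), Int] = Graph(vertices, edges)

// Basic sanity checks on the size of the graph.
val vertexCount = graph.numVertices
val edgeCount = graph.numEdges

// Turn each triplet into a sentence: source (fan) -> destination (followed member).
val descriptions = graph.triplets
  .map(t => s"${t.srcAttr._1} follows ${t.dstAttr._1} (${t.attr} clicks)")
  .collect()

println(s"$vertexCount vertices, $edgeCount edges")
descriptions.foreach(println)

sc.stop()
```

Note that `sc.stop()` shuts the context down, so it should be dropped if the remaining snippets are run in the same session.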
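2. Find the vertices whose age is greater than 30
// Using positional tuple accessors
graph.vertices.filter(_._2._2 > 30).collect().foreach(v => println(v))
// Equivalent, using pattern matching for readability
graph.vertices.filter { case (id, (name, age)) => age > 30 }.collect().foreach(v => println(v))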
3. Find the edges whose attribute is greater than 5
graph.edges.filter(e => e.attr > 5).collect().foreach(v => println(v))
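4. Add 20 to the age of every vertex
// Note: the new vertex attribute is only the increased age, so the result is a Graph[Int, Int] and the name is dropped
graph.mapVertices((vid, attr) => attr._2 + 20).vertices.collect().foreach(v => println(v))
// Equivalent, using pattern matching for readability
graph.mapVertices { case (id, (name, age)) => age + 20 }.vertices.collect().foreach(v => println(v))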
5. Multiply the attribute of every edge by 3
graph.mapEdges(e => e.attr * 3).edges.collect().foreach(v => println(v))
6. Extract the subgraph of vertices whose age is greater than 30
val subGraph: Graph[(String, Int), Int] = graph.subgraph(vpred = (id, vd) => vd._2 > 30)
subGraph.vertices.collect.foreach(v => println(s"${v._2._1} is ${v._2._2}"))
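The snippet above prints only the remaining vertices, but `subgraph` with a vertex predicate also drops every edge that touches a removed vertex. The plain-Scala sketch below (no Spark required) mirrors that rule on the same data so the expected result can be checked by hand; the names `keptVertices` and `keptEdges` are hypothetical.

```scala
// Vertex ages and the edge table as (sourceId, destinationId, clicks).
val ages: Map[Long, Int] =
  Map(1L -> 28, 2L -> 27, 3L -> 65, 4L -> 42, 5L -> 55, 6L -> 50)
val edges: Seq[(Long, Long, Int)] = Seq(
  (2L, 1L, 7), (2L, 4L, 2), (3L, 2L, 4), (3L, 6L, 3),
  (4L, 1L, 1), (5L, 2L, 2), (5L, 3L, 8), (5L, 6L, 3)
)

// Vertex predicate: keep members older than 30.
val keptVertices: Set[Long] =
  ages.collect { case (id, age) if age > 30 => id }.toSet

// An edge survives only if both of its endpoints survive.
val keptEdges: Seq[(Long, Long, Int)] =
  edges.filter { case (src, dst, _) => keptVertices(src) && keptVertices(dst) }

println(s"Vertices: ${keptVertices.toSeq.sorted.mkString(", ")}")
keptEdges.foreach { case (src, dst, clicks) => println(s"$src -> $dst ($clicks)") }
```

On this data, vertices 3, 4, 5 and 6 remain, together with the three edges 3 -> 6, 5 -> 3 and 5 -> 6. David (vertex 4) stays in the subgraph but ends up isolated.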
7. Create a new graph that uses User as the vertex type
case class User(name: String, age: Int, inDeg: Int, outDeg: Int)
// Create a new graph whose vertex attribute type VD is User, converted from graph
val initialUserGraph: Graph[User, Int] = graph.mapVertices { case (id, (name, age)) => User(name, age, 0, 0) }
// Join initialUserGraph with its inDegrees and outDegrees (both VertexRDDs) and fill in the inDeg and outDeg fields
val userGraph: Graph[User, Int] = initialUserGraph.outerJoinVertices(initialUserGraph.inDegrees) {
  case (id, u, inDegOpt) => User(u.name, u.age, inDegOpt.getOrElse(0), u.outDeg)
}.outerJoinVertices(initialUserGraph.outDegrees) {
  case (id, u, outDegOpt) => User(u.name, u.age, u.inDeg, outDegOpt.getOrElse(0))
}
userGraph.vertices.collect.foreach(v => println(s"${v._2.name} inDeg: ${v._2.inDeg} outDeg: ${v._2.outDeg}"))
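The `getOrElse(0)` calls matter because `inDegrees` and `outDegrees` contain no entry for a vertex with zero edges in that direction, so the outer join hands back `None` for it. A minimal plain-Scala sketch of the same bookkeeping (no Spark required, with hypothetical names `inDeg`, `outDeg` and `degrees`) shows which vertices need the default.

```scala
val vertexIds: Seq[Long] = Seq(1L, 2L, 3L, 4L, 5L, 6L)
val edges: Seq[(Long, Long, Int)] = Seq(
  (2L, 1L, 7), (2L, 4L, 2), (3L, 2L, 4), (3L, 6L, 3),
  (4L, 1L, 1), (5L, 2L, 2), (5L, 3L, 8), (5L, 6L, 3)
)

// Count edges per destination and per source. A vertex without any
// edge in that direction is simply absent from the resulting map.
val inDeg: Map[Long, Int] = edges.groupBy(_._2).map { case (id, es) => (id, es.size) }
val outDeg: Map[Long, Int] = edges.groupBy(_._1).map { case (id, es) => (id, es.size) }

// Outer-join style lookup: fall back to 0 when a vertex has no entry.
val degrees: Map[Long, (Int, Int)] =
  vertexIds.map(id => (id, (inDeg.getOrElse(id, 0), outDeg.getOrElse(id, 0)))).toMap

vertexIds.foreach { id =>
  val (in, out) = degrees(id)
  println(s"vertex $id inDeg: $in outDeg: $out")
}
```

Here vertex 5 has no incoming edges and vertices 1 and 6 have no outgoing edges, which are exactly the cases where the default of 0 is used.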
8. Find the oldest follower of each member
val oldestFollower: VertexRDD[(String, Int)] = graph.aggregateMessages[(String, Int)](
  // Map step: send the source vertex's (name, age) to the destination vertex
  triplet => {
    triplet.sendToDst((triplet.srcAttr._1, triplet.srcAttr._2))
  },
  // Reduce step: keep the older of two followers
  (a, b) => if (a._2 > b._2) a else b
)
// oldestFollower.collect.foreach(v => println(v)) prints, among others, (3,(Ed,55))
userGraph.vertices.leftJoin(oldestFollower) { (id, user, optOldestFollower) =>
  optOldestFollower match {
    case None => s"${user.name} does not have any followers."
    case Some((name, age)) => s"${name} is the oldest follower of ${user.name}."
  }
}.collect.foreach { case (id, str) => println(str) }
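To make the map and reduce steps of `aggregateMessages` concrete, the following plain-Scala sketch (no Spark required) performs the same two steps on ordinary collections: one message per edge, then a pairwise merge per destination. It is an illustration of the idea under these assumptions, not a description of how GraphX executes it.

```scala
val users: Map[Long, (String, Int)] = Map(
  1L -> ("Alice", 28), 2L -> ("Bob", 27), 3L -> ("Charlie", 65),
  4L -> ("David", 42), 5L -> ("Ed", 55), 6L -> ("Fran", 50)
)
val edges: Seq[(Long, Long, Int)] = Seq(
  (2L, 1L, 7), (2L, 4L, 2), (3L, 2L, 4), (3L, 6L, 3),
  (4L, 1L, 1), (5L, 2L, 2), (5L, 3L, 8), (5L, 6L, 3)
)

// "Send" step: each edge delivers the source's (name, age) to the destination.
val messages: Seq[(Long, (String, Int))] =
  edges.map { case (src, dst, _) => (dst, users(src)) }

// "Merge" step: per destination, keep the message with the larger age.
val oldestFollower: Map[Long, (String, Int)] =
  messages.groupBy(_._1).map { case (dst, msgs) =>
    (dst, msgs.map(_._2).reduce((a, b) => if (a._2 > b._2) a else b))
  }

users.keys.toSeq.sorted.foreach { id =>
  oldestFollower.get(id) match {
    case None => println(s"${users(id)._1} does not have any followers.")
    case Some((name, _)) => println(s"$name is the oldest follower of ${users(id)._1}.")
  }
}
```

Vertex 5 (Ed) receives no message at all, which corresponds to the `None` branch of the `leftJoin` above.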
9. Compute the average age of each member's followers
val averageAge: VertexRDD[Double] = graph.aggregateMessages[(Int, Double)](
  // Map step: send (1, age) of the source vertex to the destination vertex
  triplet => {
    triplet.sendToDst((1, triplet.srcAttr._2.toDouble))
  },
  // Reduce step: sum the follower counts and the total ages
  (a, b) => ((a._1 + b._1), (a._2 + b._2))
  // Divide the total age by the follower count to get the average
).mapValues((id, p) => p._2 / p._1)
userGraph.vertices.leftJoin(averageAge) { (id, user, optAverageAge) =>
  optAverageAge match {
    case None => s"${user.name} does not have any followers."
    case Some(avgAge) => s"The average age of ${user.name}'s followers is $avgAge."
  }
}.collect.foreach { case (id, str) => println(str) }
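The average is computed from a (count, total age) pair because an average cannot be merged pairwise, while counts and sums can. The plain-Scala sketch below (no Spark required) reproduces the arithmetic on the same data; the name `averageAge` is reused only for readability.

```scala
val ages: Map[Long, Int] =
  Map(1L -> 28, 2L -> 27, 3L -> 65, 4L -> 42, 5L -> 55, 6L -> 50)
val edges: Seq[(Long, Long, Int)] = Seq(
  (2L, 1L, 7), (2L, 4L, 2), (3L, 2L, 4), (3L, 6L, 3),
  (4L, 1L, 1), (5L, 2L, 2), (5L, 3L, 8), (5L, 6L, 3)
)

// "Send" step: each edge delivers (1, follower age) to the followed member.
val messages: Seq[(Long, (Int, Double))] =
  edges.map { case (src, dst, _) => (dst, (1, ages(src).toDouble)) }

// "Merge" step: add up counts and ages, then divide once at the end.
val averageAge: Map[Long, Double] =
  messages.groupBy(_._1).map { case (dst, msgs) =>
    val (count, total) = msgs.map(_._2).reduce((a, b) => (a._1 + b._1, a._2 + b._2))
    (dst, total / count)
  }

averageAge.toSeq.sortBy(_._1).foreach { case (id, avg) =>
  println(s"vertex $id: average follower age $avg")
}
```

For example, Alice (vertex 1) is followed by Bob (27) and David (42), so her followers' average age is (27 + 42) / 2 = 34.5.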
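10. Find the shortest distance from vertex 5 to every other vertex
val sourceId: VertexId = 5L
// Double.PositiveInfinity (positive infinity) marks vertices that have not been reached yet;
// Double.NegativeInfinity is its negative counterpart
val initialGraph: Graph[Double, Int] = graph.mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)
val sssp: Graph[Double, Int] = initialGraph.pregel(Double.PositiveInfinity)(
  // Vertex program: keep the smaller of the current distance and the incoming one
  (id, dist, newDist) => math.min(dist, newDist),
  triplet => {
    // Relax the edge: send a message only if the path through the source is shorter
    if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
      Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
    } else {
      Iterator.empty
    }
  },
  (a, b) => math.min(a, b) // Merge messages: keep the shortest distance
)
println(sssp.vertices.collect.mkString("\n"))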
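As a cross-check of the Pregel result, the plain-Scala sketch below (no Spark required) runs the same relaxation rule in rounds until nothing changes. It is a simplified stand-in for the supersteps, under the assumption that every edge is examined in every round, whereas Pregel only lets vertices that received a message send again. Edge weights are the click counts from the edge table.

```scala
val edges: Seq[(Long, Long, Int)] = Seq(
  (2L, 1L, 7), (2L, 4L, 2), (3L, 2L, 4), (3L, 6L, 3),
  (4L, 1L, 1), (5L, 2L, 2), (5L, 3L, 8), (5L, 6L, 3)
)
val sourceId = 5L

// Start with distance 0 at the source and infinity everywhere else.
var dist: Map[Long, Double] = Seq(1L, 2L, 3L, 4L, 5L, 6L)
  .map(id => (id, if (id == sourceId) 0.0 else Double.PositiveInfinity))
  .toMap

var changed = true
while (changed) {
  // Send step: an edge produces a message only if it offers a shorter path.
  val messages = edges.collect {
    case (src, dst, w) if dist(src) + w < dist(dst) => (dst, dist(src) + w)
  }
  // Merge step: per destination, keep the smallest proposed distance.
  val merged = messages.groupBy(_._1).map { case (dst, ms) => (dst, ms.map(_._2).min) }
  // Apply step: every merged value is already smaller than the old distance.
  changed = merged.nonEmpty
  dist = dist ++ merged
}

dist.toSeq.sortBy(_._1).foreach { case (id, d) => println(s"($id,$d)") }
```

On this data the loop settles after three productive rounds. Vertex 1 is first reached through vertex 2 at distance 9 and then improved to 5 through the path 5 -> 2 -> 4 -> 1.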