1. Map operations in GraphX
def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED]
def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2]
def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2]
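These methods do not change the structure of the graph, so the resulting graph can reuse the structural indices of the original graph. For that reason, do not use graph.vertices.map and then rebuild the graph to get the same result: the rebuilt graph cannot reuse those indices.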
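To make the contrast concrete, here is a minimal sketch that applies the same transformation both ways. The object name, the sample data, and the doubling function are hypothetical and only serve as illustration. Both graphs end up with the same vertex attributes; the difference is that only the mapVertices version reuses the original graph's structure.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object MapVerticesVsRddMap {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("MapVerticesVsRddMap").setMaster("local"))
    sc.setLogLevel("WARN")

    // Hypothetical sample data (not from the original article)
    val vertices: RDD[(VertexId, Int)] =
      sc.parallelize(Seq((1L, 10), (2L, 20), (3L, 30)))
    val edges: RDD[Edge[String]] =
      sc.parallelize(Seq(Edge(1L, 2L, "a"), Edge(2L, 3L, "b")))
    val graph: Graph[Int, String] = Graph(vertices, edges)

    // Preferred: only the vertex attributes are replaced
    val viaMapVertices: Graph[Int, String] =
      graph.mapVertices((_, attr) => attr * 2)

    // Same values, but the graph is rebuilt from plain RDDs
    val newVertices: RDD[(VertexId, Int)] =
      graph.vertices.map { case (id, attr) => (id, attr * 2) }
    val viaRebuild: Graph[Int, String] = Graph(newVertices, graph.edges)

    val a = viaMapVertices.vertices.collect().sortBy(_._1).toList
    val b = viaRebuild.vertices.collect().sortBy(_._1).toList
    assert(a == b) // both approaches yield the same vertex attributes
    a.foreach(println)

    sc.stop()
  }
}
```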
mapEdges: transform each edge attribute in the graph using the map function.
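Example: note that the function passed to mapEdges takes an Edge object x as input and returns the new edge attribute. In this example the attribute type stays the same (String) and only the value changes, but the function may also return a value of a different type.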
val sheyouGraph = graph.mapEdges(x => {if("roommate".equals(x.attr)) "sheyou" else x.attr})
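As a sketch of the "different type" case, the following turns String edge labels into Int weights. The object name, the sample graph, and the weight values 2 and 1 are made up for illustration.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object MapEdgesTypeChange {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("MapEdgesTypeChange").setMaster("local"))
    sc.setLogLevel("WARN")

    // Hypothetical sample data: names on vertices, relationship labels on edges
    val vertices: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "Alice"), (2L, "Bob"), (3L, "Carol")))
    val edges: RDD[Edge[String]] =
      sc.parallelize(Seq(Edge(1L, 2L, "roommate"), Edge(2L, 3L, "friend")))
    val graph: Graph[String, String] = Graph(vertices, edges)

    // Edge[String] => Int: the edge attribute type changes from String to Int
    val weighted: Graph[String, Int] =
      graph.mapEdges(e => if (e.attr == "roommate") 2 else 1)

    // Convert to tuples before collecting so the result is easy to compare
    val result = weighted.edges
      .map(e => (e.srcId, e.dstId, e.attr))
      .collect().sortBy(_._1).toList
    assert(result == List((1L, 2L, 2), (2L, 3L, 1)))
    result.foreach(println)

    sc.stop()
  }
}
```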
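mapVertices: transform each vertex attribute in the graph using the map function.
This works much like mapEdges, except that the function passed to mapVertices takes two arguments, the vertex ID and the vertex attribute (not a vertex object), and returns the new vertex attribute: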
val oneAttrGraph = graph.mapVertices((id, attr) => {attr._1+ " is:"+attr._2})
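The one-liner above assumes a graph whose vertex attribute is a pair. A self-contained sketch, with a hypothetical object name and a two-vertex sample graph, could look like this:

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object MapVerticesExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("MapVerticesExample").setMaster("local"))
    sc.setLogLevel("WARN")

    // Hypothetical sample data: vertex attribute is a (String, Int) pair
    val users: RDD[(VertexId, (String, Int))] =
      sc.parallelize(Seq((1L, ("Alice", 28)), (2L, ("Bob", 27))))
    val edges: RDD[Edge[Int]] = sc.parallelize(Seq(Edge(2L, 1L, 7)))
    val graph: Graph[(String, Int), Int] = Graph(users, edges)

    // (VertexId, (String, Int)) => String: the vertex attribute type changes
    val oneAttrGraph: Graph[String, Int] =
      graph.mapVertices((_, attr) => attr._1 + " is:" + attr._2)

    val result = oneAttrGraph.vertices.collect().sortBy(_._1).toList
    assert(result == List((1L, "Alice is:28"), (2L, "Bob is:27")))
    result.foreach(println)

    sc.stop()
  }
}
```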
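mapTriplets: transform each edge attribute using the map function, passing it the adjacent vertex attributes as well.
In other words, mapTriplets differs from mapEdges only in that the map function can also read the attributes of the edge's two endpoint vertices; what gets changed is still only the edge attribute. If the transformation does not depend on vertex attributes, just use mapEdges.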
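A minimal sketch of a transformation that really needs the vertex attributes: the new edge attribute is computed from both endpoints. Treating the Int in each vertex attribute as an age is an assumption made for this illustration, as are the object name and the sample data.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object MapTripletsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("MapTripletsExample").setMaster("local"))
    sc.setLogLevel("WARN")

    // Hypothetical sample data: (name, age) on vertices, an Int on edges
    val users: RDD[(VertexId, (String, Int))] =
      sc.parallelize(Seq(
        (1L, ("Alice", 28)), (2L, ("Bob", 27)), (4L, ("David", 42))))
    val edges: RDD[Edge[Int]] =
      sc.parallelize(Seq(Edge(2L, 1L, 7), Edge(4L, 1L, 1)))
    val graph: Graph[(String, Int), Int] = Graph(users, edges)

    // New edge attribute = absolute age gap between source and destination.
    // Vertex attributes are read but left unchanged.
    val ageGap: Graph[(String, Int), Int] =
      graph.mapTriplets(t => math.abs(t.srcAttr._2 - t.dstAttr._2))

    val result = ageGap.edges
      .map(e => (e.srcId, e.dstId, e.attr))
      .collect().sortBy(_._1).toList
    assert(result == List((2L, 1L, 1), (4L, 1L, 14)))
    result.foreach(println)

    sc.stop()
  }
}
```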
What is a Triplet:
Triplet is short for EdgeTriplet, which extends Edge and represents: An edge along with the vertex attributes of its neighboring vertices. An EdgeTriplet has five fields: srcId, dstId, and attr (inherited from Edge), plus srcAttr and dstAttr.
graph.mapTriplets(triplet => {.....})
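To see the five fields side by side, one option is to read them from graph.triplets. The sketch below uses a hypothetical object name and a single-edge sample graph, and copies the fields into a tuple before collecting.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object TripletFieldsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("TripletFieldsExample").setMaster("local"))
    sc.setLogLevel("WARN")

    // Hypothetical sample data
    val users: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "Alice"), (2L, "Bob")))
    val edges: RDD[Edge[Int]] = sc.parallelize(Seq(Edge(2L, 1L, 7)))
    val graph: Graph[String, Int] = Graph(users, edges)

    // Copy the five EdgeTriplet fields into a plain tuple
    val fields = graph.triplets
      .map(t => (t.srcId, t.srcAttr, t.attr, t.dstId, t.dstAttr))
      .collect().toList

    assert(fields == List((2L, "Bob", 7, 1L, "Alice")))
    fields.foreach { case (srcId, srcAttr, attr, dstId, dstAttr) =>
      println(s"$srcId ($srcAttr) --[$attr]--> $dstId ($dstAttr)")
    }

    sc.stop()
  }
}
```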
Source of this explanation: https://www.jianshu.com/p/d9170a0723e4
2. Complete code example
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
object MapGraphX {
  def main(args: Array[String]): Unit = {
    // Set up the Spark runtime environment
    val conf = new SparkConf().setAppName("SimpleGraphX").setMaster("local")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")

    // Define the users vertices
    val users: RDD[(VertexId, (String, Int))] =
      sc.parallelize(Array(
        (1L, ("Alice", 28)),
        (2L, ("Bob", 27)),
        (3L, ("Charlie", 65)),
        (4L, ("David", 42)),
        (5L, ("Ed", 55)),
        (6L, ("Fran", 50))
      ))

    // Define the relationships edges
    val relationships: RDD[Edge[Int]] =
      sc.parallelize(Array(
        Edge(2L, 1L, 7),
        Edge(2L, 4L, 2),
        Edge(3L, 2L, 4),
        Edge(3L, 6L, 3),
        Edge(4L, 1L, 1),
        Edge(5L, 2L, 2),
        Edge(5L, 3L, 8),
        Edge(5L, 6L, 3)
      ))

    // Default user, in case an edge refers to a user that does not exist
    val defaultUser = ("John Doe", 0)

    println("(1) Build the graph from the vertex and edge data above")
    // Build the initial Graph
    val graph: Graph[(String, Int), Int] = Graph(users, relationships, defaultUser)
    // Alternative that rebuilds the graph from RDDs (not recommended, see section 1):
    //val newVertices = graph.vertices.map { case (id, attr) => (id, mapUdf(id, attr)) }
    //val newGraph = Graph(newVertices, graph.edges)
    graph.vertices.collect.foreach(println(_))

    println("Create a new graph from the existing one")
    // mapVertices: keep the name and double the Int field of every vertex
    val graph2: Graph[(String, Int), Int] =
      graph.mapVertices((vid: VertexId, attr: (String, Int)) => (attr._1, 2 * attr._2))
    graph2.vertices.collect.foreach(println(_))

    println("(2) Use mapEdges to visit every edge, add an extra attribute value, and build a new graph")
    // The edge attribute changes from Int to (Int, Int)
    val graph3: Graph[(String, Int), (Int, Int)] = graph.mapEdges(e => (e.attr, 100))
    graph3.edges.collect.foreach(println(_))

    println("(3) Use mapTriplets to visit every triplet, add an extra attribute value, and return a new graph")
    // t.srcAttr and t.dstAttr are also available here, although this example does not use them
    val graph4: Graph[(String, Int), (Int, Int)] = graph.mapTriplets(t => (t.attr, 10))
    graph4.edges.collect.foreach(println(_))

    // Release Spark resources
    sc.stop()
  }
}