Spark GraphX Graph Computation (Part 4)

Spark GraphX: Comparing and Summarizing Graph Construction Methods by Example


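To build a graph, you can call Graph(), which looks like a constructor.
When a Scala class or object defines an apply() method, the method name can be omitted at the call site, so Graph.apply() can be written as Graph(). Graph() therefore looks like a constructor call, but it actually invokes apply() on the Graph companion object.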
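The apply() shorthand described above can be sketched in plain Scala, without Spark. Point is a hypothetical class introduced only for illustration; it is not part of GraphX.

```scala
// Hypothetical class, used only to illustrate the apply() shorthand.
class Point(val x: Int, val y: Int)

object Point {
  // Point(1, 2) is rewritten by the compiler to Point.apply(1, 2).
  def apply(x: Int, y: Int): Point = new Point(x, y)
}

object ApplyDemo {
  def main(args: Array[String]): Unit = {
    val a = Point(1, 2)       // shorthand form
    val b = Point.apply(1, 2) // explicit form, same method
    println(a.x == b.x && a.y == b.y)
  }
}
```

Graph(vertices, edges) follows the same pattern: the call is resolved to the apply() method of the Graph companion object.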
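The resilient distributed dataset (RDD) is the basic building block of Spark programs; it provides flexible, efficient, parallel data processing together with fault tolerance. In GraphX the base class of a graph is Graph, which contains two RDDs: an edge RDD and a vertex RDD.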

Case 1: Analyzing collaboration data

Vertices can be built in two ways:
Way 1: as a plain RDD;
Way 2: as the optimized VertexRDD.
VertexRDD[VD] can be understood as an extended and optimized RDD[(VertexId, VD)].

import org.apache.spark.graphx.impl.EdgeRDDImpl
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

/**
  * 1. Create the edges
  * 2. Create the vertices
  * 3. Create the graph
  */
object CreateGraph {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("SparkGraphX_helloworld")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")
    // Vertex construction, way 1: a plain RDD of (VertexId, attribute) pairs
    val users: RDD[(VertexId, (String, String))] = sc.parallelize(Array((3L, ("rxin", "student")),
      (7L, ("jg", "postdoc")),
      (5L, ("franklin", "prof")),
      (2L, ("isnoic", "prof"))))
    users.foreach(println(_))
    println("====")
    // Vertex construction, way 2: a VertexRDD built from that RDD
    val user1: VertexRDD[(String, String)] = VertexRDD[(String, String)](users)
    user1.foreach(println(_))

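Edges can be built in two ways:
Way 1: as a plain RDD;
Way 2: as an EdgeRDD.

EdgeRDD[ED] can be understood as an extended and optimized RDD[Edge[ED]]; in GraphX the edges of a graph are stored as an edge RDD whose edge attributes have type ED.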

    // Edge construction, way 1: a plain RDD of Edge objects
    // RDD[Edge[String]]
    val relationship: RDD[Edge[String]] = sc.parallelize(Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"), Edge(2L, 5L, "colleague"), Edge(5L, 7L, "PI")))
    relationship.foreach(println(_))
    println("====")
    // Edge construction, way 2: an EdgeRDD built from that RDD
    val relationship1: EdgeRDDImpl[String, Nothing] = EdgeRDD.fromEdges(relationship)
    relationship1.foreach(println(_))
    println("====")
    val relationship2: RDD[(VertexId, VertexId)] = sc.parallelize(Array((3L, 7L), (5L, 3L), (2L, 5L), (5L, 7L)))
    relationship2.foreach(println(_))
    //    (2,5)
    //    (5,7)
    //    (3,7)
    //    (5,3)

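A graph can be built in three ways:
Way 1: Graph.apply, from a vertex RDD and an edge RDD;
Way 2: Graph.fromEdgeTuples, from (source, destination) vertex ID pairs;
Way 3: Graph.fromEdges, from edges that carry attributes.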

    // 1. Build the graph from the vertices and relationships (way 1: Graph.apply)
    val defaultVertex = ("Jack", "Missing")
    val graph = Graph(users, relationship, defaultVertex)
    graph.vertices.collect.foreach(println(_))
    //    (5,(franklin,prof))
    //    (2,(isnoic,prof))
    //    (3,(rxin,student))
    //    (7,(jg,postdoc))
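    // 2. Build the graph with fromEdgeTuples, from (source, destination) vertex ID pairs.
    // No vertex RDD is supplied, so defaultValue: VD is assigned to every vertex mentioned by an edge;
    // the edge attribute is an Int (PartitionID is only a type alias for Int).
    val graph3: Graph[(String, String), Int] = Graph.fromEdgeTuples[(String, String)](relationship2, defaultVertex)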
    graph3.vertices.collect.foreach(println(_))
    //    (5,(Jack,Missing))
    //    (2,(Jack,Missing))
    //    (3,(Jack,Missing))
    //    (7,(Jack,Missing))
    // 3. Build the graph with fromEdges, from an RDD of edges; vertices again get the default attribute
    val graph4: Graph[(String, String), String] = Graph.fromEdges(relationship,defaultVertex)
    graph4.vertices.collect.foreach(println(_))
  }
}
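The role of the default vertex attribute in Graph.apply can be sketched as follows. This is a minimal example under stated assumptions: it needs Spark with GraphX on the classpath, and vertex ID 9 is a hypothetical ID that appears only in the edge RDD, chosen here for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object DefaultVertexDemo {
  // Builds a small graph with Graph.apply and returns every vertex attribute.
  def vertexAttrs(sc: SparkContext): Map[VertexId, (String, String)] = {
    val users: RDD[(VertexId, (String, String))] = sc.parallelize(Seq(
      (3L, ("rxin", "student")),
      (7L, ("jg", "postdoc"))))
    // Vertex 9 (hypothetical) is referenced by an edge but has no entry in users.
    val edges: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(3L, 7L, "collab"),
      Edge(3L, 9L, "collab")))
    val graph = Graph(users, edges, ("Jack", "Missing"))
    graph.vertices.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("DefaultVertexDemo")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")
    try {
      val attrs = vertexAttrs(sc)
      println(attrs(3L)) // attribute taken from the vertex RDD
      println(attrs(9L)) // vertex 9 is expected to carry the default attribute
    } finally sc.stop()
  }
}
```

With Graph.apply only the vertices missing from the vertex RDD receive the default, whereas fromEdgeTuples and fromEdges take no vertex RDD, so every vertex receives it.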

Case 2: Analyzing social network data

Social network data analysis:
[Figure: social network data]
Graph definition:
[Figure: graph definition]
Analysis code:

// Vertex definition
val myVertices=sc.makeRDD(Array((1L,"Ann"),(2L,"Bill"),(3L,"Charles"),(4L,"Diane"),(5L,"Went to gym this morning")))
// Edge definition
val myEdge=sc.makeRDD(Array(Edge(1L,2L,"is_friends-with"),Edge(2L,3L,"is_friends-with"),Edge(3L,4L,"is_friends-with"),Edge(4L,5L,"Like-status"),Edge(3L,5L,"write-status")))
// Graph definition
val myGraph=Graph(myVertices,myEdge)
 myGraph.vertices.collect

Full code:


// Build a graph from the social network data
val myVertices=sc.makeRDD(Array((1L,"Ann"),(2L,"Bill"),(3L,"Charles"),(4L,"Diane"),(5L,"Went to gym this morning")))
val myEdge=sc.makeRDD(Array(Edge(1L,2L,"is_friend"),Edge(2L,3L,"is_friend"),Edge(3L,4L,"is_friend"),Edge(4L,5L,"Like-Status"),Edge(3L,5L,"write-status")))
myEdge.collect
val myGraph=Graph(myVertices,myEdge)
myGraph.vertices.collect
myGraph.edges.collect

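// Vertex construction
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD
// 1. A plain RDD of (VertexId, attribute) pairs
val v1: RDD[(VertexId, String)] =
  sc.parallelize(Array((1L,"Ann"),(2L,"Bill"),(3L,"Charles"),(4L,"Diane"),(5L,"Went to gym this morning")))
// 2. A VertexRDD built from that RDD
val v2: VertexRDD[String] = VertexRDD[String](v1)
// Equivalent, letting the compiler infer the type parameter:
// val v2 = VertexRDD(v1)
// General pattern (SomeType, loadData, someFile and reduceFunc are placeholders):
// val someData: RDD[(VertexId, SomeType)] = loadData(someFile)
// val vset = VertexRDD(someData)
// // If there were redundant values in someData we would use a reduceFunc
// val vset2 = VertexRDD(someData, reduceFunc)
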
Edge construction:
import org.apache.spark.graphx.{Edge, EdgeRDD}
val r1: RDD[Edge[String]] =sc.parallelize(Array(Edge(1L,2L,"is_friend"),Edge(2L,3L,"is_friend"),Edge(3L,4L,"is_friend"),Edge(4L,5L,"Like-Status"),Edge(3L,5L,"write-status")))
val r2:EdgeRDD[String]=EdgeRDD.fromEdges(r1)
r1.collect
r2.collect

Graph construction:
1. apply
val myGraph=Graph(myVertices,myEdge)
val myGraph1=Graph.apply(myVertices,myEdge)
2. fromEdgeTuples
val defaultUser=("jack","none")
val r3:RDD[(VertexId, VertexId)]=sc.parallelize(Array((1L,2L), (2L,3L), (3L,4L), (4L,5L),(3L,5L)))
val myGraph4=Graph.fromEdgeTuples[(String,String)](r3,defaultUser)
myGraph4.vertices.collect.foreach(println _)
myGraph4.edges.collect.foreach(println _)

For comparison, fromEdges:
val defaultUser=("jack","none")
val myGraph3=Graph.fromEdges(r2,defaultUser)
myGraph3.vertices.collect.foreach(println _)
myGraph3.edges.collect.foreach(println _)
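The difference between the two edge-based constructors shows up in the edge attributes. The sketch below is a self-contained illustration, assuming Spark with GraphX on the classpath; the object name, the small edge lists, and the choice of PartitionStrategy.RandomVertexCut for uniqueEdge are assumptions made for this example, not part of the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy, VertexId}
import org.apache.spark.rdd.RDD

object EdgeAttrDemo {
  // fromEdgeTuples: edges carry an Int attribute; with uniqueEdge set,
  // duplicate (src, dst) pairs are merged and their counts summed.
  def tupleEdges(sc: SparkContext): Set[(VertexId, VertexId, Int)] = {
    // The pair (1, 2) is deliberately listed twice.
    val pairs: RDD[(VertexId, VertexId)] =
      sc.parallelize(Seq((1L, 2L), (1L, 2L), (2L, 3L)))
    val g: Graph[String, Int] = Graph.fromEdgeTuples(
      pairs, "none", uniqueEdge = Some(PartitionStrategy.RandomVertexCut))
    g.edges.collect().map(e => (e.srcId, e.dstId, e.attr)).toSet
  }

  // fromEdges: the attribute stored on each Edge is kept as-is.
  def attributedEdges(sc: SparkContext): Set[(VertexId, VertexId, String)] = {
    val edges: RDD[Edge[String]] = sc.parallelize(Seq(
      Edge(1L, 2L, "is_friend"), Edge(2L, 3L, "is_friend")))
    val g: Graph[String, String] = Graph.fromEdges(edges, "none")
    g.edges.collect().map(e => (e.srcId, e.dstId, e.attr)).toSet
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("EdgeAttrDemo")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")
    try {
      tupleEdges(sc).foreach(println)      // (src, dst, count) triples
      attributedEdges(sc).foreach(println) // (src, dst, label) triples
    } finally sc.stop()
  }
}
```

fromEdgeTuples suits data where only connectivity is known, while fromEdges is the natural choice when each relationship already has a label or weight.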
