Structural operators
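// Run in spark-shell, where sc (the SparkContext) is predefined
import org.apache.spark.graphx._
// Create the vertex RDD: (vertexId, (name, age))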
val users = sc.makeRDD(Array(
(1L, ("Alice", 28)),
(2L, ("Bob", 27)),
(3L, ("Charlie", 65)),
(4L, ("David", 42)),
(5L, ("Ed", 55)),
(6L, ("Fran", 50))
))
// Create the edge RDD describing the relationships between vertices: Edge(srcId, dstId, attr)
val relation = sc.makeRDD(Array(
Edge(2L, 1L, 7),
Edge(3L, 2L, 4),
Edge(3L, 6L, 3),
Edge(5L, 6L, 3),
Edge(5L, 3L, 8),
Edge(5L, 2L, 2),
Edge(2L, 4L, 2),
Edge(4L, 1L, 1)
))
val relationGraph = Graph(users,relation)
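Many of the calls below print triplets. A triplet is an edge joined with the attributes of its source and destination vertices. As a rough mental model, and not GraphX's actual implementation, that join can be sketched in plain Scala with no Spark dependency; SimpleEdge, SimpleTriplet and TripletModel are hypothetical names introduced only for this illustration.

```scala
// Plain-Scala model of a property graph and its triplets (no Spark needed).
// SimpleEdge / SimpleTriplet are hypothetical stand-ins for GraphX's Edge / EdgeTriplet.
object TripletModel {
  type User = (String, Int) // (name, age), same shape as the vertex attributes above

  case class SimpleEdge(srcId: Long, dstId: Long, attr: Int)
  case class SimpleTriplet(srcId: Long, srcAttr: User, dstId: Long, dstAttr: User, attr: Int)

  val users: Map[Long, User] = Map(
    (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)),
    (4L, ("David", 42)), (5L, ("Ed", 55)), (6L, ("Fran", 50)))

  val relation: Seq[SimpleEdge] = Seq(
    SimpleEdge(2L, 1L, 7), SimpleEdge(3L, 2L, 4), SimpleEdge(3L, 6L, 3),
    SimpleEdge(5L, 6L, 3), SimpleEdge(5L, 3L, 8), SimpleEdge(5L, 2L, 2),
    SimpleEdge(2L, 4L, 2), SimpleEdge(4L, 1L, 1))

  // Join every edge with the attributes of its source and destination vertices
  def triplets(vs: Map[Long, User], es: Seq[SimpleEdge]): Seq[SimpleTriplet] =
    es.map(e => SimpleTriplet(e.srcId, vs(e.srcId), e.dstId, vs(e.dstId), e.attr))

  def main(args: Array[String]): Unit =
    triplets(users, relation).foreach(println)
}
```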
reverse
Reverses the direction of every edge. Vertex attributes and edge attributes are left unchanged.
val reGraph=relationGraph.reverse
// collect brings the results back to the driver so they print in the shell
reGraph.triplets.collect.foreach(println)
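To make the effect of reverse concrete, here is a plain-Scala model over an ordinary edge list, using a hypothetical SimpleEdge class rather than Spark's Edge: each edge's endpoints are swapped and its attribute is carried over unchanged, so reversing twice gives back the original edges.

```scala
// Plain-Scala model of `reverse` on an edge list (not GraphX's implementation):
// source and destination swap places, the edge attribute is carried over unchanged.
object ReverseModel {
  case class SimpleEdge(srcId: Long, dstId: Long, attr: Int) // hypothetical stand-in for Edge

  val relation: Seq[SimpleEdge] = Seq(
    SimpleEdge(2L, 1L, 7), SimpleEdge(3L, 2L, 4), SimpleEdge(3L, 6L, 3),
    SimpleEdge(5L, 6L, 3), SimpleEdge(5L, 3L, 8), SimpleEdge(5L, 2L, 2),
    SimpleEdge(2L, 4L, 2), SimpleEdge(4L, 1L, 1))

  def reverse(es: Seq[SimpleEdge]): Seq[SimpleEdge] =
    es.map(e => SimpleEdge(e.dstId, e.srcId, e.attr))

  def main(args: Array[String]): Unit =
    reverse(relation).foreach(println) // first line printed: SimpleEdge(1,2,7)
}
```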
subgraph
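Builds a subgraph. subgraph(epred, vpred) keeps only the vertices that satisfy the vertex predicate vpred, and only the edges that satisfy the edge predicate epred and connect two retained vertices. Both predicates default to accepting everything.
Edge subgraph:
Keep only the edges whose source vertex is younger than 65; every vertex is retained.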
val epGraph=relationGraph.subgraph(epred=ep=>ep.srcAttr._2<65)
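// Print the triplets: an edge-only filter leaves the vertex set unchanged
epGraph.triplets.collect.foreach(println)
Vertex subgraph:
Keep only the vertices younger than 65. Vertex 3 (aged 65) is removed, and so is every edge that touches it.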
val vpGraph=relationGraph.subgraph(vpred=(id,attr)=>attr._2<65)
vpGraph.vertices.collect.foreach(println)
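The rule behind both subgraphs can be modelled with ordinary Scala collections. The sketch below uses a hypothetical subgraph function, not GraphX's implementation, and its edge predicate receives a plain edge instead of a triplet (looking up users(e.srcId) plays the role of srcAttr). It filters vertices with vpred first, then keeps an edge only if both endpoints survived and epred accepts it. With the sample data, the edge subgraph keeps all 6 vertices and 6 of the 8 edges, while the vertex subgraph drops vertex 3 and its three edges (3→2, 3→6, 5→3).

```scala
// Plain-Scala model of subgraph(epred, vpred); `subgraph` here is a hypothetical
// function over a vertex map and an edge list, not the GraphX method.
object SubgraphModel {
  type User = (String, Int)
  case class SimpleEdge(srcId: Long, dstId: Long, attr: Int)

  val users: Map[Long, User] = Map(
    (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)),
    (4L, ("David", 42)), (5L, ("Ed", 55)), (6L, ("Fran", 50)))

  val relation: Seq[SimpleEdge] = Seq(
    SimpleEdge(2L, 1L, 7), SimpleEdge(3L, 2L, 4), SimpleEdge(3L, 6L, 3),
    SimpleEdge(5L, 6L, 3), SimpleEdge(5L, 3L, 8), SimpleEdge(5L, 2L, 2),
    SimpleEdge(2L, 4L, 2), SimpleEdge(4L, 1L, 1))

  // Keep vertices that pass vpred, then keep edges that pass epred AND whose
  // two endpoints both survived the vertex filter.
  def subgraph(vs: Map[Long, User], es: Seq[SimpleEdge],
               epred: SimpleEdge => Boolean,
               vpred: (Long, User) => Boolean): (Map[Long, User], Seq[SimpleEdge]) = {
    val keptV = vs.filter { case (id, attr) => vpred(id, attr) }
    val keptE = es.filter(e => keptV.contains(e.srcId) && keptV.contains(e.dstId) && epred(e))
    (keptV, keptE)
  }

  // Edge subgraph: source vertex younger than 65; all vertices are kept
  val (epV, epE) = subgraph(users, relation, e => users(e.srcId)._2 < 65, (_, _) => true)

  // Vertex subgraph: vertices younger than 65; edges touching a removed vertex go too
  val (vpV, vpE) = subgraph(users, relation, _ => true, (_, attr) => attr._2 < 65)

  def main(args: Array[String]): Unit = {
    println(s"edge subgraph:   ${epV.size} vertices, ${epE.size} edges")
    println(s"vertex subgraph: ${vpV.size} vertices, ${vpE.size} edges")
  }
}
```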
Join operators
joinVertices
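Loads data from an external RDD and uses it to update vertex attributes. The map function runs only for vertices that have a matching entry in the RDD; all other vertices keep their original attribute, and the attribute type cannot change.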
val address=sc.makeRDD(Array((1L,"qq.com"),(2L,"163.com"),(3L,"139.com")))
val joinGraph=relationGraph.joinVertices(address)((id,v,a)=>(v._1+"@"+a,v._2))
joinGraph.vertices.collect.foreach(println)
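The joinVertices semantics can be sketched over plain Scala maps; the joinVertices function here is a hypothetical model, not the GraphX method. Vertices 1 to 3 get an address appended, and vertices 4 to 6 come back unchanged because address has no entry for them.

```scala
// Plain-Scala model of joinVertices (hypothetical function, not the GraphX method).
object JoinVerticesModel {
  type User = (String, Int)

  val users: Map[Long, User] = Map(
    (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)),
    (4L, ("David", 42)), (5L, ("Ed", 55)), (6L, ("Fran", 50)))

  val address: Map[Long, String] = Map((1L, "qq.com"), (2L, "163.com"), (3L, "139.com"))

  // f runs only for vertices that have a match in `data`; the others keep their
  // attribute. The result type is still User, because joinVertices cannot change it.
  def joinVertices[U](vs: Map[Long, User], data: Map[Long, U])(
      f: (Long, User, U) => User): Map[Long, User] =
    vs.map { case (id, attr) =>
      (id, data.get(id).map(u => f(id, attr, u)).getOrElse(attr))
    }

  val joined: Map[Long, User] =
    joinVertices(users, address)((id, v, a) => (v._1 + "@" + a, v._2))

  def main(args: Array[String]): Unit =
    joined.toSeq.sortBy(_._1).foreach(println)
}
```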
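outerJoinVertices
Like joinVertices, but the map function runs for every vertex and receives the external value as an Option: Some(value) when the vertex has a match and None otherwise. The new vertex attribute type may differ from the old one.
// a is an Option[String]; vertices without an address keep their plain name
val outerGraph=relationGraph.outerJoinVertices(address)((id,v,a)=>(a.map(addr=>v._1+"@"+addr).getOrElse(v._1),v._2))
outerGraph.vertices.collect.foreach(println)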
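A matching plain-Scala model of outerJoinVertices (a hypothetical function, not the GraphX method) makes its two differences visible: the map function sees an Option for every vertex, and the result type may change, here from (String, Int) to a plain String. Concatenating the Option directly, as in v._1+"@"+a, produces strings such as "Alice@Some(qq.com)" and "Ed@None", so the sketch also shows that naive variant next to the unwrapped one.

```scala
// Plain-Scala model of outerJoinVertices (hypothetical function, not the GraphX method).
object OuterJoinModel {
  type User = (String, Int)

  val users: Map[Long, User] = Map(
    (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)),
    (4L, ("David", 42)), (5L, ("Ed", 55)), (6L, ("Fran", 50)))

  val address: Map[Long, String] = Map((1L, "qq.com"), (2L, "163.com"), (3L, "139.com"))

  // f runs for EVERY vertex and receives Option[U]: Some(value) on a match, None otherwise.
  // The result type VD2 is free, so the vertex attribute type may change.
  def outerJoinVertices[U, VD2](vs: Map[Long, User], data: Map[Long, U])(
      f: (Long, User, Option[U]) => VD2): Map[Long, VD2] =
    vs.map { case (id, attr) => (id, f(id, attr, data.get(id))) }

  // Attribute type changes from (String, Int) to String; unmatched vertices keep the bare name
  val emails: Map[Long, String] =
    outerJoinVertices(users, address)((id, v, a) => a.map(addr => v._1 + "@" + addr).getOrElse(v._1))

  // Concatenating the Option directly exposes the Some/None wrapper in the output
  val naive: Map[Long, String] =
    outerJoinVertices(users, address)((id, v, a) => v._1 + "@" + a)

  def main(args: Array[String]): Unit = {
    emails.toSeq.sortBy(_._1).foreach(println)
    naive.toSeq.sortBy(_._1).foreach(println)
  }
}
```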
Applying the GraphX API
Store each vertex's in-degree and out-degree in its vertex attribute.
In-degree:
case class User(name:String,age:Int,inDeg:Int,outDeg:Int)
val g1=relationGraph.outerJoinVertices(relationGraph.inDegrees){
case(id,u,indeg)=>User(u._1,u._2,indeg.getOrElse(0),0)}
g1.triplets.collect.foreach(println)
Out-degree:
val g2=g1.outerJoinVertices(relationGraph.outDegrees){
case(id,u,outdeg)=>User(u.name,u.age,u.inDeg,outdeg.getOrElse(0))}
g2.triplets.collect.foreach(println)
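The degree computation can be checked by hand with a plain-Scala model (DegreeModel and SimpleEdge are hypothetical names, not GraphX code). Like GraphX's inDegrees and outDegrees, the degree maps below leave out vertices whose degree is 0, which is why both joins need getOrElse(0). With the sample edges, Ed (vertex 5) ends up with in-degree 0 and out-degree 3.

```scala
// Plain-Scala model of the degree computation (hypothetical names, not GraphX code).
object DegreeModel {
  case class SimpleEdge(srcId: Long, dstId: Long, attr: Int)
  case class User(name: String, age: Int, inDeg: Int, outDeg: Int) // same shape as above

  val users: Map[Long, (String, Int)] = Map(
    (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)),
    (4L, ("David", 42)), (5L, ("Ed", 55)), (6L, ("Fran", 50)))

  val relation: Seq[SimpleEdge] = Seq(
    SimpleEdge(2L, 1L, 7), SimpleEdge(3L, 2L, 4), SimpleEdge(3L, 6L, 3),
    SimpleEdge(5L, 6L, 3), SimpleEdge(5L, 3L, 8), SimpleEdge(5L, 2L, 2),
    SimpleEdge(2L, 4L, 2), SimpleEdge(4L, 1L, 1))

  // In-degree: edges arriving at a vertex; out-degree: edges leaving it.
  // Vertices with degree 0 are absent from these maps, as with GraphX's degree RDDs.
  val inDegrees: Map[Long, Int] = relation.groupBy(_.dstId).map { case (id, es) => (id, es.size) }
  val outDegrees: Map[Long, Int] = relation.groupBy(_.srcId).map { case (id, es) => (id, es.size) }

  // Counterpart of the two outerJoinVertices calls: a missing degree becomes 0
  val withDegrees: Map[Long, User] = users.map { case (id, (name, age)) =>
    (id, User(name, age, inDegrees.getOrElse(id, 0), outDegrees.getOrElse(id, 0)))
  }

  def main(args: Array[String]): Unit =
    withDegrees.toSeq.sortBy(_._1).foreach(println)
}
```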
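PageRank (PR) algorithm
- Computes a relative score of a web page's importance and authority from the quantity and quality of the links pointing to it. The often-quoted 0 to 10 range is the scale Google once displayed publicly; GraphX returns unscaled rank values.
- In essence, PageRank measures the importance of each vertex (web page) in a graph: a vertex scores highly when many vertices, or highly ranked ones, link to it.
- GraphX provides a PageRank API for computing the PageRank of a graph.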
class Graph[VD, ED] { def pageRank(tol: Double, resetProb: Double = 0.15): Graph[Double, Double] }
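tol: the tolerance allowed at convergence. Iteration stops once no vertex's rank changes by more than tol between rounds, so smaller values give more precise results at the cost of more iterations.
resetProb: the random reset probability, i.e. the chance that a random surfer jumps to a random page instead of following a link (1 minus the damping factor); defaults to 0.15.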
val pr=relationGraph.pageRank(0.01,0.1)
// Each vertex attribute of pr is now that vertex's PageRank score
pr.vertices.collect.foreach(println)
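The update rule behind tol and resetProb can be sketched as a plain-Scala iteration. This is a simplified model, not GraphX's implementation: every vertex starts at 1.0, each round sets rank(v) = resetProb + (1 - resetProb) * Σ rank(u) / outDeg(u) over the in-edges u → v, and the loop stops once no rank moves by more than tol. Vertices without out-edges (1 and 6 here) do not pass their rank on, so the scores are unnormalized. GraphX may differ in details such as initialization and normalization, so its numbers need not match this sketch exactly. PageRankModel, SimpleEdge and maxIter are hypothetical names.

```scala
// Simplified plain-Scala PageRank sketch (hypothetical names, not GraphX's implementation).
object PageRankModel {
  case class SimpleEdge(srcId: Long, dstId: Long, attr: Int)

  val vertexIds: Seq[Long] = Seq(1L, 2L, 3L, 4L, 5L, 6L)

  val relation: Seq[SimpleEdge] = Seq(
    SimpleEdge(2L, 1L, 7), SimpleEdge(3L, 2L, 4), SimpleEdge(3L, 6L, 3),
    SimpleEdge(5L, 6L, 3), SimpleEdge(5L, 3L, 8), SimpleEdge(5L, 2L, 2),
    SimpleEdge(2L, 4L, 2), SimpleEdge(4L, 1L, 1))

  // rank(v) = resetProb + (1 - resetProb) * sum over in-edges (u -> v) of rank(u) / outDeg(u)
  // Repeats until no rank changes by more than tol (or maxIter rounds have run).
  def pageRank(ids: Seq[Long], es: Seq[SimpleEdge], tol: Double,
               resetProb: Double = 0.15, maxIter: Int = 1000): Map[Long, Double] = {
    val outDeg: Map[Long, Int] = es.groupBy(_.srcId).map { case (id, out) => (id, out.size) }
    var ranks: Map[Long, Double] = ids.map(id => (id, 1.0)).toMap
    var delta = Double.MaxValue
    var iter = 0
    while (delta > tol && iter < maxIter) {
      // Each edge sends rank(src) / outDeg(src) to its destination
      val contribs: Map[Long, Double] = es.groupBy(_.dstId).map { case (dst, in) =>
        (dst, in.map(e => ranks(e.srcId) / outDeg(e.srcId)).sum)
      }
      val next = ids.map(id => (id, resetProb + (1 - resetProb) * contribs.getOrElse(id, 0.0))).toMap
      delta = ids.map(id => math.abs(next(id) - ranks(id))).max
      ranks = next
      iter += 1
    }
    ranks
  }

  def main(args: Array[String]): Unit =
    pageRank(vertexIds, relation, tol = 0.01, resetProb = 0.1)
      .toSeq.sortBy { case (_, rank) => -rank }
      .foreach(println)
}
```

Because the sample graph has no cycles, this sketch reaches its fixed point after a handful of rounds, with vertex 1 (Alice) ranked highest and vertex 5 (Ed, no incoming edges) left at resetProb.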