Spark Components: GraphX Study Notes 4 - Structural Operators: mask

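This article shows how to compute the connected components of a graph with Apache Spark's GraphX library and how to remove invalid vertices with subgraph filtering and mask, with the implementation code and its output.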

More code: https://github.com/xubo245/SparkLearning


1. Explanation


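connectedComponents source: returns a graph in which each vertex value is the lowest vertex ID in that vertex's connected component. The original vertex attributes are discarded; edge attributes are kept.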

  /**
   * Compute the connected component membership of each vertex and return a graph with the vertex
   * value containing the lowest vertex id in the connected component containing that vertex.
   *
   * @see [[org.apache.spark.graphx.lib.ConnectedComponents$#run]]
   */
  def connectedComponents(): Graph[VertexId, ED] = {
    ConnectedComponents.run(graph)
  }
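
To make the "lowest vertex id" wording concrete, here is a minimal plain-Scala sketch of that labelling rule. It is a model for illustration only, not the GraphX implementation: the object name, the `components` helper, and the five-vertex sample graph are all hypothetical, and it assumes every edge endpoint appears in the vertex set.

```scala
object LowestIdComponents {
  // Plain-Scala model (not GraphX code): label every vertex with the lowest
  // vertex id reachable from it, treating edges as undirected.
  // Assumes every edge endpoint is present in `vertices`.
  def components(vertices: Set[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    // Start with each vertex labelled by its own id.
    var labels: Map[Long, Long] = vertices.map(v => v -> v).toMap
    var changed = true
    // Repeat until a full pass over the edges changes nothing.
    while (changed) {
      changed = false
      for ((src, dst) <- edges) {
        val low = math.min(labels(src), labels(dst))
        if (labels(src) != low || labels(dst) != low) {
          labels = labels + (src -> low) + (dst -> low)
          changed = true
        }
      }
    }
    labels
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical graph with two components: {1, 2, 3} and {8, 9}.
    val vertices = Set(1L, 2L, 3L, 8L, 9L)
    val edges = Seq((2L, 3L), (1L, 2L), (9L, 8L))
    components(vertices, edges).toSeq.sortBy(_._1).foreach(println)
  }
}
```

In this sketch vertices 1, 2 and 3 end up labelled 1, and vertices 8 and 9 end up labelled 8, which mirrors how the vertex value in the returned graph identifies the component.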


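mask source: returns the subgraph common to the current graph and the other graph, keeping the vertex and edge attributes of the current graph.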

  /**
   * Restricts the graph to only the vertices and edges that are also in `other`, but keeps the
   * attributes from this graph.
   * @param other the graph to project this graph onto
   * @return a graph with vertices and edges that exist in both the current graph and `other`,
   * with vertex and edge data from the current graph
   */
  def mask[VD2: ClassTag, ED2: ClassTag](other: Graph[VD2, ED2]): Graph[VD, ED]
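
The contract in that doc comment can be sketched with ordinary Scala collections. The `SimpleGraph` type below is a hypothetical stand-in, not a GraphX class, and it assumes at most one edge per (srcId, dstId) pair, which GraphX itself does not require. The point it illustrates is that membership is decided by `other` while attribute values always come from the current graph.

```scala
object MaskModel {
  // Hypothetical stand-in for a GraphX graph: vertex attributes keyed by id,
  // edge attributes keyed by (srcId, dstId). Assumes one edge per pair.
  final case class SimpleGraph[VD, ED](
      vertices: Map[Long, VD],
      edges: Map[(Long, Long), ED]) {

    // Keep only vertices and edges that also exist in `other`;
    // attribute values are taken from this graph, never from `other`.
    def mask[VD2, ED2](other: SimpleGraph[VD2, ED2]): SimpleGraph[VD, ED] =
      SimpleGraph(
        vertices.filter { case (id, _) => other.vertices.contains(id) },
        edges.filter { case (key, _) => other.edges.contains(key) })
  }

  def main(args: Array[String]): Unit = {
    val current = SimpleGraph(
      Map(1L -> "a", 2L -> "b", 3L -> "c"),
      Map((1L, 2L) -> "x", (2L, 3L) -> "y"))
    // `other` has different attribute types; only its ids matter.
    val other = SimpleGraph(
      Map(1L -> 10, 2L -> 20),
      Map((1L, 2L) -> 0.5))
    val masked = current.mask(other)
    println(masked.vertices)
    println(masked.edges)
  }
}
```

Here vertex 3 and edge (2, 3) are dropped because `other` does not contain them, and the surviving vertices keep the string attributes "a" and "b" rather than the integers stored in `other`.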



2. Code:

/**
 * @author xubo
 * ref http://spark.apache.org/docs/1.5.2/graphx-programming-guide.html
 * time 20160503
 */

package org.apache.spark.graphx.learning
import org.apache.spark._
import org.apache.spark.graphx._
// To make some of the examples work we will also need RDD
import org.apache.spark.rdd.RDD

object GraphOperatorsStructuralMask {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("GraphOperatorsStructuralMask").setMaster("local[4]")
    // Create the SparkContext; local[4] runs Spark locally with 4 worker threads
    val sc = new SparkContext(conf)

    // Create an RDD for the vertices
    val users: RDD[(VertexId, (String, String))] =
      sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
        (5L, ("franklin", "prof")), (2L, ("istoica", "prof")),
        (4L, ("peter", "student"))))
    // Create an RDD for edges
    val relationships: RDD[Edge[String]] =
      sc.parallelize(Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
        Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"),
        Edge(4L, 0L, "student"), Edge(5L, 0L, "colleague")))
    // Define a default user in case a relationship refers to a missing user
    val defaultUser = ("John Doe", "Missing")
    // Build the initial Graph
    val graph = Graph(users, relationships, defaultUser)
    // Notice that there is a user 0 (for which we have no information) connected to users
    // 4 (peter) and 5 (franklin).
    println("vertices:")
    // The edge predicate is true for every edge (no edge has srcId 100), so this prints all vertices
    graph.subgraph(each => each.srcId != 100L).vertices.collect.foreach(println)
    println("\ntriplets:")
    graph.triplets.map(
      triplet => triplet.srcAttr._1 + " is the " + triplet.attr + " of " + triplet.dstAttr._1).collect.foreach(println(_))
    graph.edges.collect.foreach(println)

    // Run Connected Components
    val ccGraph = graph.connectedComponents() // Vertex attribute is now the component ID; the (name, role) tuple is gone
    // Remove missing vertices as well as the edges connected to them
    val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")
    // Restrict the answer to the valid subgraph
    val validCCGraph = ccGraph.mask(validGraph)
    println("\nccGraph:");
    println("vertices:");
    ccGraph.vertices.collect.foreach(println)
    println("edegs:");
    ccGraph.edges.collect.foreach(println)
    println("\nvalidGraph:");
    validGraph.vertices.collect.foreach(println)
    println("\nvalidCCGraph:");
    validCCGraph.vertices.collect.foreach(println)
    // Release Spark resources once the job is done
    sc.stop()
  }

}

Analysis:

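First run connectedComponents on the graph to obtain a new graph, ccGraph. Then apply subgraph to the original graph to obtain validGraph, which drops the "Missing" vertex and its edges. Finally, call mask on ccGraph with validGraph to take their intersection, so that only valid vertices remain and each keeps its component ID.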
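
The order of these steps matters: the component IDs are computed on the full graph, so valid vertices keep ID 0 even though vertex 0 itself is removed. The plain-Scala sketch below compares that order with the alternative of filtering first and labelling afterwards. It is a model under stated assumptions, not GraphX code: the `components` helper is hypothetical, and only the vertex and edge IDs are copied from the example graph.

```scala
object MaskVersusRecompute {
  // Lowest-id labelling over undirected edges; a plain-Scala model,
  // not the implementation GraphX uses.
  def components(vertices: Set[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    var labels: Map[Long, Long] = vertices.map(v => v -> v).toMap
    var changed = true
    while (changed) {
      changed = false
      for ((src, dst) <- edges) {
        val low = math.min(labels(src), labels(dst))
        if (labels(src) != low || labels(dst) != low) {
          labels = labels + (src -> low) + (dst -> low)
          changed = true
        }
      }
    }
    labels
  }

  // Vertex and edge ids copied from the article's example graph.
  val allVertices: Set[Long] = Set(0L, 2L, 3L, 4L, 5L, 7L)
  val allEdges: Seq[(Long, Long)] =
    Seq((3L, 7L), (5L, 3L), (2L, 5L), (5L, 7L), (4L, 0L), (5L, 0L))

  // Vertex 0 is the only one carrying the "Missing" default attribute.
  val validVertices: Set[Long] = allVertices - 0L
  val validEdges: Seq[(Long, Long)] =
    allEdges.filter { case (s, d) => validVertices(s) && validVertices(d) }

  // Article's order: label the full graph, then keep only valid vertices.
  val maskedLabels: Map[Long, Long] =
    components(allVertices, allEdges).filter { case (id, _) => validVertices(id) }

  // Alternative order: filter first, then label the smaller graph.
  val recomputedLabels: Map[Long, Long] = components(validVertices, validEdges)

  def main(args: Array[String]): Unit = {
    println(maskedLabels.toSeq.sortBy(_._1))
    println(recomputedLabels.toSeq.sortBy(_._1))
  }
}
```

With the first order every valid vertex is labelled 0, matching the validCCGraph output in section 3. With the second order, vertex 4 (peter) becomes its own component because its only edge led to vertex 0, and the remaining vertices are labelled 2. Which behaviour is wanted depends on whether connections through missing vertices should count.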


3. Results:


vertices:
(4,(peter,student))
(0,(John Doe,Missing))
(5,(franklin,prof))
(2,(istoica,prof))
(3,(rxin,student))
(7,(jgonzal,postdoc))

triplets:
rxin is the collab of jgonzal
istoica is the colleague of franklin
franklin is the advisor of rxin
franklin is the pi of jgonzal
peter is the student of John Doe
franklin is the colleague of John Doe
Edge(3,7,collab)
Edge(2,5,colleague)
Edge(5,3,advisor)
Edge(5,7,pi)
Edge(4,0,student)
Edge(5,0,colleague)

ccGraph:
vertices:
(4,0)
(0,0)
(5,0)
(2,0)
(3,0)
(7,0)
edegs:
Edge(3,7,collab)
Edge(2,5,colleague)
Edge(5,3,advisor)
Edge(5,7,pi)
Edge(4,0,student)
Edge(5,0,colleague)

validGraph:
(4,(peter,student))
(5,(franklin,prof))
(2,(istoica,prof))
(3,(rxin,student))
(7,(jgonzal,postdoc))

validCCGraph:
(4,0)
(5,0)
(2,0)
(3,0)
(7,0)


References

[1] http://spark.apache.org/docs/1.5.2/graphx-programming-guide.html

[2] https://github.com/xubo245/SparkLearning

