A First Look at GraphFrame

GraphX is built on the RDD API and has no Python API; GraphFrame is built on DataFrames and does support Python.

“GraphFrames is a DataFrame-based external Spark package that provides performance optimizations and also additional functionalities such as motif finding.”

“While the GraphX framework is based on the RDD API, GraphFrames is an external Spark package built on top of the DataFrames API. It inherits the performance advantages of DataFrames using the catalyst optimizer. It can be used in the Java, Scala, and Python programming languages. GraphFrames provides additional functionalities over GraphX such as motif finding, DataFrame-based serialization, and graph queries. GraphX does not provide the Python API, but GraphFrames exposes the Python API as well.”

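To get started with GraphFrame, work through the following steps:
1. Import the jar: graphframes-0.5.0-spark2.1-s_2.11.jar
There are several ways to do this.
1) $SPARK_HOME/bin/spark-shell --jars graphframes-0.5.0-spark2.1-s_2.11.jar
   (for the local jar file; --packages expects Maven coordinates instead, e.g. --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11)
2) Copy the jar directly into the $SPARK_HOME/jars directory
3) Add the jar to the project in IDEA
Methods 1) and 2) apply to spark-shell; 3) applies to IDEA.
2. Sample code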
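For method 3), if the IDEA project is built with sbt, the jar can also be resolved as a managed dependency instead of being added by hand. The following build.sbt is a minimal sketch, not a verified setup: it assumes the 0.5.0 artifact is served from the Spark Packages repository at the URL shown, and the Scala and Spark patch versions are illustrative choices made only to match the jar name.

```scala
// build.sbt (sketch; every version except the GraphFrames one is an assumption)
scalaVersion := "2.11.8"

// Assumed location of the GraphFrames 0.5.0 artifact (Spark Packages repository)
resolvers += "Spark Packages Repo" at "https://repos.spark-packages.org/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"    % "2.1.0",
  "org.apache.spark" %% "spark-graphx" % "2.1.0",
  // Single % because the Scala version is already part of the version string
  "graphframes"      %  "graphframes"  % "0.5.0-spark2.1-s_2.11"
)
```

After reloading the sbt project in IDEA, import org.graphframes._ should resolve without copying the jar into the project.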

import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

import org.graphframes._

object Graphs {
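  def main(args: Array[String]): Unit = {
    // Keep unnecessary Spark log output off the terminal
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)

    // When running directly from the IDE rather than via spark-submit or spark-shell,
    // a master must be set as well, e.g. .master("local[*]")
    val spark = SparkSession
      .builder()
      .appName("Graphs")
      .getOrCreate()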

    val vertex = spark.createDataFrame(List(
      ("1","Jacob",48),
      ("2","Jessica",45),
      ("3","Andrew",25),
      ("4","Ryan",53),
      ("5","Emily",22),
      ("6","Lily",52)
    )).toDF("id", "name", "age")

    vertex.show()
//    +---+-------+---+
//    | id|   name|age|
//    +---+-------+---+
//    |  1|  Jacob| 48|
//    |  2|Jessica| 45|
//    |  3| Andrew| 25|
//    |  4|   Ryan| 53|
//    |  5|  Emily| 22|
//    |  6|   Lily| 52|
//    +---+-------+---+

    val edges = spark.createDataFrame(List(
      ("6","1","Sister"),
      ("1","2","Husband"),
      ("2","1","Wife"),
      ("5","1","Daughter"),
      ("5","2","Daughter"),
      ("3","1","Son"),
      ("3","2","Son"),
      ("4","1","Friend"),
      ("1","5","Father"),
      ("1","3","Father"),
      ("2","5","Mother"),
      ("2","3","Mother")
    )).toDF("src", "dst", "relationship")
    edges.show()
//      +---+---+------------+
//      |src|dst|relationship|
//      +---+---+------------+
//      |  6|  1|      Sister|
//      |  1|  2|     Husband|
//      |  2|  1|        Wife|
//      |  5|  1|    Daughter|
//      |  5|  2|    Daughter|
//      |  3|  1|         Son|
//      |  3|  2|         Son|
//      |  4|  1|      Friend|
//      |  1|  5|      Father|
//      |  1|  3|      Father|
//      |  2|  5|      Mother|
//      |  2|  3|      Mother|
//      +---+---+------------+

    // Build the graph from the vertex and edge DataFrames
    val graph = GraphFrame(vertex, edges)
    graph.vertices.show()
    graph.edges.show()
    graph.vertices.groupBy().min("age").show()
//    +--------+
//    |min(age)|
//    +--------+
//    |      22|
//    +--------+

    // Motif finding
    val motifs = graph.find("(a)-[e]->(b); (b)-[e2]->(a)")
    motifs.show()
//    +--------------+--------------+--------------+--------------+
//    |             a|             e|             b|            e2|
//    +--------------+--------------+--------------+--------------+
//    |  [1,Jacob,48]| [1,2,Husband]|[2,Jessica,45]|    [2,1,Wife]|
//    |[2,Jessica,45]|    [2,1,Wife]|  [1,Jacob,48]| [1,2,Husband]|
//    |  [5,Emily,22]|[5,1,Daughter]|  [1,Jacob,48]|  [1,5,Father]|
//    |  [5,Emily,22]|[5,2,Daughter]|[2,Jessica,45]|  [2,5,Mother]|
//    | [3,Andrew,25]|     [3,1,Son]|  [1,Jacob,48]|  [1,3,Father]|
//    | [3,Andrew,25]|     [3,2,Son]|[2,Jessica,45]|  [2,3,Mother]|
//    |  [1,Jacob,48]|  [1,5,Father]|  [5,Emily,22]|[5,1,Daughter]|
//    |  [1,Jacob,48]|  [1,3,Father]| [3,Andrew,25]|     [3,1,Son]|
//    |[2,Jessica,45]|  [2,5,Mother]|  [5,Emily,22]|[5,2,Daughter]|
//    |[2,Jessica,45]|  [2,3,Mother]| [3,Andrew,25]|     [3,2,Son]|
//    +--------------+--------------+--------------+--------------+

    // filter results
    motifs.filter("b.age > 30").show()
//    +--------------+--------------+--------------+-------------+
//    |             a|             e|             b|           e2|
//    +--------------+--------------+--------------+-------------+
//    |  [1,Jacob,48]| [1,2,Husband]|[2,Jessica,45]|   [2,1,Wife]|
//    |[2,Jessica,45]|    [2,1,Wife]|  [1,Jacob,48]|[1,2,Husband]|
//    |  [5,Emily,22]|[5,1,Daughter]|  [1,Jacob,48]| [1,5,Father]|
//    |  [5,Emily,22]|[5,2,Daughter]|[2,Jessica,45]| [2,5,Mother]|
//    | [3,Andrew,25]|     [3,1,Son]|  [1,Jacob,48]| [1,3,Father]|
//    | [3,Andrew,25]|     [3,2,Son]|[2,Jessica,45]| [2,3,Mother]|
//    +--------------+--------------+--------------+-------------+

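    // 3. Loading and saving GraphFrames
    // mode("overwrite") lets the example be rerun; without it the write fails if the path already exists
    graph.vertices.write.mode("overwrite").parquet("file:///Users/sws/IdeaProjects/JavaScala/src/main/scala/Data/vertices")
    graph.edges.write.mode("overwrite").parquet("file:///Users/sws/IdeaProjects/JavaScala/src/main/scala/Data/edges")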

    val verticesDF = spark.read.parquet("file:///Users/sws/IdeaProjects/JavaScala/src/main/scala/Data/vertices")
    val edgesDF = spark.read.parquet("file:///Users/sws/IdeaProjects/JavaScala/src/main/scala/Data/edges")
    val sameGraph = GraphFrame(verticesDF, edgesDF)

  }
}

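Key points:
1) import org.graphframes._
2) Creating the graph: val graph = GraphFrame(vertex, edges)
3) Both of the following are DataFrames:

   vertex.show()          // the plain DataFrame created by hand
   graph.vertices.show()  // the vertex DataFrame held by the GraphFrame
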
4) graph.find("(a)-[e]->(b); (b)-[e2]->(a)")
GraphFrame motif finding uses a DataFrame-based DSL for finding structural patterns.
This pattern searches for pairs of vertices a and b that are connected by edges in both directions. It returns a DataFrame of all such structures in the graph, with one column for each named element (vertex or edge) in the motif.
See the neo4j link for reference.
5) Query example:
motifs.filter("b.age > 30").show()
6) Reading and writing files:
Since GraphFrames are built on top of DataFrames, they inherit all DataFrame-supported DataSources. You can write GraphFrames to the Parquet, JSON, and CSV formats. The location can be local (file:///…), on HDFS, and so on.

graph.vertices.write.parquet("vertices")  // on HDFS (when HDFS is the default file system)
spark.read.parquet("vertices")
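
Returning to point 4): as a rough illustration of what the pattern "(a)-[e]->(b); (b)-[e2]->(a)" matches, the sketch below reproduces the search in plain Scala, without Spark or GraphFrames. The names MotifSketch, Vertex, Edge, byId and olderB are hypothetical and exist only for this illustration; the data is copied from the sample code. This is not how GraphFrames implements find, only a nested loop that should give the same matches on this small graph.

```scala
// Plain-Scala sketch (no Spark needed) of the motif
// "(a)-[e]->(b); (b)-[e2]->(a)": every edge e that has a reverse edge e2.
object MotifSketch {
  // Hypothetical stand-ins for rows of the vertex and edge DataFrames
  final case class Vertex(id: String, name: String, age: Int)
  final case class Edge(src: String, dst: String, relationship: String)

  val vertices: Seq[Vertex] = Seq(
    Vertex("1", "Jacob", 48), Vertex("2", "Jessica", 45), Vertex("3", "Andrew", 25),
    Vertex("4", "Ryan", 53), Vertex("5", "Emily", 22), Vertex("6", "Lily", 52)
  )

  val edges: Seq[Edge] = Seq(
    Edge("6", "1", "Sister"), Edge("1", "2", "Husband"), Edge("2", "1", "Wife"),
    Edge("5", "1", "Daughter"), Edge("5", "2", "Daughter"), Edge("3", "1", "Son"),
    Edge("3", "2", "Son"), Edge("4", "1", "Friend"), Edge("1", "5", "Father"),
    Edge("1", "3", "Father"), Edge("2", "5", "Mother"), Edge("2", "3", "Mother")
  )

  // Look up a vertex by its id, like joining an edge against the vertex table
  private val byId: Map[String, Vertex] = vertices.map(v => v.id -> v).toMap

  // One (a, e, b, e2) tuple per match, mirroring the columns returned by graph.find(...)
  val motifs: Seq[(Vertex, Edge, Vertex, Edge)] =
    for {
      e  <- edges
      e2 <- edges
      if e.dst == e2.src && e2.dst == e.src // e2 runs from b back to a
    } yield (byId(e.src), e, byId(e.dst), e2)

  // Same predicate as motifs.filter("b.age > 30")
  val olderB: Seq[(Vertex, Edge, Vertex, Edge)] = motifs.filter(_._3.age > 30)

  def main(args: Array[String]): Unit = {
    motifs.foreach { case (a, e, b, e2) =>
      println(s"${a.name} -[${e.relationship}]-> ${b.name} -[${e2.relationship}]-> ${a.name}")
    }
    println(s"matches: ${motifs.size}, with b.age > 30: ${olderB.size}")
  }
}
```

On this data the sketch finds 10 matches, 6 of which have b.age > 30, the same row counts as the two motif tables in the sample code. Ryan and Lily never appear, because each of them has only an outgoing edge to Jacob and no edge coming back.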