Running Spark GraphX Jobs on an E-MapReduce Cluster

weixin_34232744

Original link: https://yq.aliyun.com/articles/140685

Spark GraphX is a popular graph computing framework. If you use Alibaba Cloud's E-MapReduce service, you can run graph computing jobs on it with very little setup.

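The following uses PageRank as an example to show how to run a GraphX job. The example comes from the official Spark examples (examples/src/main/scala/org/apache/spark/examples/graphx/PageRankExample.scala). It calls the pageRank method of GraphOps directly to compute the ranks: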

import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.SparkSession

object PageRankExample {
  def main(args: Array[String]): Unit = {
    // Creates a SparkSession.
    val spark = SparkSession
      .builder
      .appName(s"${this.getClass.getSimpleName}")
      .getOrCreate()
    val sc = spark.sparkContext

    // $example on$
    // Load the edges as a graph
    val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
    // Run PageRank
    val ranks = graph.pageRank(0.0001).vertices
    // Join the ranks with the usernames
    val users = sc.textFile("data/graphx/users.txt").map { line =>
      val fields = line.split(",")
      (fields(0).toLong, fields(1))
    }
    val ranksByUsername = users.join(ranks).map {
      case (id, (username, rank)) => (username, rank)
    }
    // Print the result
    println(ranksByUsername.collect().mkString("\n"))
    // $example off$
    spark.stop()
  }
}
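
To make clearer what graph.pageRank(0.0001) computes, here is a minimal plain-Scala sketch of the update rule, with no Spark dependency. It is not GraphX's Pregel-based implementation. It assumes the rule rank(v) = 0.15 + 0.85 * (sum over incoming edges of rank(src) / outDegree(src)) and iterates until no rank changes by more than the tolerance. The edge list, the object name PageRankSketch, and the starting rank of 1.0 are assumptions made for illustration, not the contents of followers.txt.

```scala
object PageRankSketch {
  // Hypothetical edge list (srcId -> dstId), chosen for illustration only.
  val edges: Seq[(Long, Long)] = Seq((1L, 2L), (2L, 1L), (3L, 1L))

  // Simplified PageRank: repeat the update until the largest change is <= tol.
  def pageRank(
      edges: Seq[(Long, Long)],
      tol: Double,
      resetProb: Double = 0.15): Map[Long, Double] = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    // Number of outgoing edges per source vertex.
    val outDeg = edges.groupBy(_._1).map { case (v, es) => v -> es.size }

    var ranks = vertices.map(_ -> 1.0).toMap // assumed starting value
    var maxDelta = Double.MaxValue
    while (maxDelta > tol) {
      // Each edge passes rank(src) / outDegree(src) to its destination.
      val contribs = edges
        .map { case (src, dst) => dst -> ranks(src) / outDeg(src) }
        .groupBy(_._1)
        .map { case (v, cs) => v -> cs.map(_._2).sum }
      val next = vertices.map { v =>
        v -> (resetProb + (1.0 - resetProb) * contribs.getOrElse(v, 0.0))
      }.toMap
      maxDelta = vertices.map(v => math.abs(next(v) - ranks(v))).max
      ranks = next
    }
    ranks
  }

  def main(args: Array[String]): Unit =
    pageRank(edges, tol = 0.0001).toSeq.sortBy(_._1).foreach(println)
}
```

Under this rule a vertex with no incoming edges, such as vertex 3 here, ends up with a rank of exactly 0.15, and the ranks are not normalized to sum to 1.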

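Now let's run this example. It reads its input from the relative paths data/graphx/followers.txt and data/graphx/users.txt, which on the cluster are expected to resolve against the current user's HDFS home directory, so the sample data shipped with Spark has to be uploaded there first. Log in to the Master node of the E-MapReduce cluster and run the following commands in order: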

cd /usr/lib/spark-current
hadoop fs -mkdir -p data
hadoop fs -put data/graphx data/
run-example graphx.PageRankExample

After the job has been submitted and has finished running, it prints the following result:

(justinbieber,0.15)
(matei_zaharia,0.7013599933629602)
(ladygaga,1.390049198216498)
(BarackObama,1.4588814096664682)
(jeresig,0.9993442038507723)
(odersky,1.2973176314422592)
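
The pairs are printed in no particular order. As a small illustration, the plain-Scala sketch below sorts the printed pairs by rank, highest first. The object name SortRanks and the helper topByRank are hypothetical names introduced here. Inside the Spark job itself, a similar effect could presumably be had by sorting the RDD before collecting, for example with ranksByUsername.sortBy(_._2, ascending = false).

```scala
object SortRanks {
  // Pairs copied from the result printed by the example job.
  val printed: Seq[(String, Double)] = Seq(
    ("justinbieber", 0.15),
    ("matei_zaharia", 0.7013599933629602),
    ("ladygaga", 1.390049198216498),
    ("BarackObama", 1.4588814096664682),
    ("jeresig", 0.9993442038507723),
    ("odersky", 1.2973176314422592)
  )

  // Sort (username, rank) pairs by rank in descending order.
  def topByRank(pairs: Seq[(String, Double)]): Seq[(String, Double)] =
    pairs.sortBy { case (_, rank) => -rank }

  def main(args: Array[String]): Unit =
    topByRank(printed).foreach { case (user, rank) =>
      println(f"$user%-15s $rank%.4f")
    }
}
```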