2021SC@SDUSC
Contents
Application example
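Spark ships with an example, LiveJournalPageRank, that demonstrates PageRank. According to the comments in its code, the dataset needed to build the graph must be downloaded from http://snap.stanford.edu/data/soc-LiveJournal1.html.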
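LiveJournalPageRank takes the following arguments:
- Dataset file: the path of the file obtained by decompressing the downloaded soc-LiveJournal1.txt.gz, here D:\soc-LiveJournal1.txt;
- Output file: specified with the --output=<output_file> option;
- Number of partitions: specified with the --numEPart=<num_edge_partitions> option;
- Partition strategy: specified with the --partStrategy option, which accepts any one of RandomVertexCut, EdgePartition1D, EdgePartition2D, and CanonicalRandomVertexCut.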
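As a rough illustration of how arguments of this `--key=value` form can be handled, the sketch below splits them into a mutable map and then pulls out known options with `remove`, falling back to a default when a key is absent. The names `OptionParsingSketch` and `parseOptions`, and the default value `1` for `numEPart`, are hypothetical and chosen only for this example; they are not taken from Spark.

```scala
import scala.collection.mutable

// Hypothetical helper, not part of Spark: one possible way to parse
// arguments of the form --key=value into a mutable map.
object OptionParsingSketch {
  def parseOptions(args: Seq[String]): mutable.Map[String, String] = {
    val pairs = args.map { arg =>
      // Strip the leading dashes, then split "key=value" at '='
      arg.dropWhile(_ == '-').split('=') match {
        case Array(key, value) => key -> value
        case _ => throw new IllegalArgumentException("Invalid argument: " + arg)
      }
    }
    mutable.Map(pairs: _*)
  }

  def main(args: Array[String]): Unit = {
    val options = parseOptions(Seq("--numEPart=4", "--partStrategy=EdgePartition2D"))
    // remove returns an Option, so a default can be supplied for absent keys
    // (the default of 1 here is an assumption made for this sketch)
    val numEPart = options.remove("numEPart").map(_.toInt).getOrElse(1)
    val outFname = options.remove("output").getOrElse("")
    println(s"numEPart=$numEPart, output='$outFname', remaining=${options.keys.mkString(",")}")
  }
}
```

Because each recognized option is removed from the map as it is read, whatever is left over afterwards can be treated as an unrecognized option.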
The implementation of LiveJournalPageRank is shown in the code below.
object LiveJournalPageRank {
  def main(args: Array[String]): Unit = {
    if (args.length < 1) {
      System.exit(-1)
    }
    // Prepend the task type "pagerank" and delegate to Analytics
    Analytics.main(args.patch(0, List("pagerank"), 0))
  }
}
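The `patch` call in the code above may be unfamiliar, so here is a small standalone sketch of what it does. `patch(from, other, replaced)` replaces `replaced` elements starting at index `from` with the elements of `other`; with `replaced = 0` nothing is dropped, so the call simply inserts. The names `PatchSketch` and `prependTask`, and the sample argument values, are hypothetical.

```scala
// Hypothetical demo object, not part of Spark.
object PatchSketch {
  // Insert `task` at index 0 without replacing any existing element.
  def prependTask(args: Array[String], task: String): Array[String] =
    args.patch(0, List(task), 0)

  def main(args: Array[String]): Unit = {
    val original = Array("D:\\soc-LiveJournal1.txt", "--numEPart=4")
    // The task type ends up in front of the original arguments
    println(prependTask(original, "pagerank").mkString(" "))
  }
}
```

The original array is left unchanged; `patch` returns a new array with the task type as its first element.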
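LiveJournalPageRank.main in fact delegates to the main function of Analytics, which runs the PageRank, Connected Components, or Triangle Count example depending on taskType. Only the PageRank-related code is listed below.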
case "pagerank" =>
  val tol = options.remove("tol").map(_.toFloat).getOrElse(0.001F)
  val outFname = options.remove("output").getOrElse("")
  val numIterOpt = options.remove("numIter").map(_.toInt)
  // Any option still left in the map was not recognized
  options.foreach {
    case (opt, _) =>
      throw new IllegalArgumentException("Invalid option: " + opt)
  }
  println("============================")
  println("|         PageRank         |")
  println("============================")
  val sc = new SparkContext(conf.setAppName("PageRank(" + fname + ")")
    .setMaster("local[2]"))
  val unpartitionedGraph = GraphLoader.edgeListFile(sc, fname,
    numEdgePartitions = numEPart,
    edgeStorageLevel = edgeStorageLevel,
    vertexStorageLevel = vertexStorageLevel).cache()
  // Repartition the edges only if a partition strategy was given
  val graph = partitionStrategy.foldLeft(unpartitionedGraph)(_.partitionBy(_))
  println("GRAPHX: Number of vertices " + graph.vertices.count)
  println("GRAPHX: Number of edges " + graph.edges.count)
  // Fixed number of iterations if numIter was given, otherwise iterate to tolerance tol
  val pr = (numIterOpt match {
    case Some(numIter) => PageRank.run(graph, numIter)
    case None => PageRank.runUntilConvergence(graph, tol)
  }).vertices.cache()
  println("GRAPHX: Total rank: " + pr.map(_._2).reduce(_ + _))
  if (!outFname.isEmpty) {
    logWarning("Saving pageranks of pages to " + outFname)