创建图模型的方法有很多,除了GraphLoader.edgeListFile()外,还可以通过Graph.from.EdgeTuples方法来创建图。参考:
http://stackoverflow.com/questions/32396477/how-to-create-a-graph-from-a-csv-file-using-graph-fromedgetuples-in-spark-scala/32477121#32477121
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.graphx.GraphLoader
import scala.util.MurmurHash
import org.apache.spark.graphx.Graph
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.VertexId
object GraphFromFile {
def main(args: Array[String]) {
//create SparkContext
val sparkConf = new SparkConf().setAppName("GraphFromFile").setMaster("local[*]")
val sc = new SparkContext(sparkConf)
// read your file
/*suppose your data is like
v1 v3
v2 v1
v3 v4
v4 v2
v5 v3
*/
val file = sc.textFile("src/main/resources/textFile1.csv");
// create edge RDD of type RDD[(VertexId, VertexId)]
val edgesRDD: RDD[(VertexId, VertexId)] = file.map(line => line.split(" "))
.map(line =>
(MurmurHash.stringHash(line(0).toString), MurmurHash.stringHash(line(1).toString)))
// create a graph
val graph = Graph.fromEdgeTuples(edgesRDD, 1)
// you can see your graph
graph.triplets.collect.foreach(println)
}
}
MurmurHash.stringHash is used because file contains vertex in form of String . If its of Numeric type then it wont be required
操作:1. 查看节点数目graph.vertices.count.2.查看边数目graph.edges.count