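Taking a few commonly used lines of code as an example, this post walks through Spark's execution flow in detail, based on its source code.
The algorithm code, as entered in the spark shell, is as follows: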
(1)val lines = sc.textFile("README.md")
(2)val words = lines.flatMap(x => x.split(" "))
(3)val wordCounts = words.map(x => (x, 1))
(4)val cacheCounts = wordCounts.cache()
(5)val reduced = cacheCounts.reduceByKey((a, b) => a + b)
(6)reduced.saveAsTextFile("haha")
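Before going into the source, it may help to pin down what these six lines compute. Below is a minimal sketch using only the Scala standard library, with no Spark involved. It reproduces the *result* of each RDD step on a single machine. It does not reproduce how Spark executes them: there is no partitioning, no lazy evaluation, and no shuffle. The object name `LocalWordCount`, the method `wordCount`, and the two input lines are hypothetical and stand in for `sc.textFile("README.md")`. `reduceByKey` is approximated with `groupBy` followed by `reduce`. That gives the same per-key sums, but Spark's `reduceByKey` also combines values on the map side before shuffling.

```scala
// Local approximation of the spark-shell word count above.
// Plain Scala collections only; this is NOT the Spark RDD API.
object LocalWordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] = {
    // (2) flatMap: split each line on single spaces into words
    val words = lines.flatMap(x => x.split(" "))
    // (3) map: pair every word with the count 1
    val wordCounts = words.map(x => (x, 1))
    // (4) cache(): no local counterpart; a Seq is already fully materialized
    // (5) reduceByKey approximated as: group pairs by word, then sum the 1s
    wordCounts
      .groupBy(_._1)
      .map { case (word, pairs) => word -> pairs.map(_._2).reduce((a, b) => a + b) }
  }

  def main(args: Array[String]): Unit = {
    // (1) hypothetical in-memory input instead of sc.textFile("README.md")
    val lines = Seq("spark is fast", "spark is fun")
    val reduced = wordCount(lines)
    // (6) instead of saveAsTextFile, print each (word,count) tuple,
    // sorted by word only so that the printed order is deterministic
    reduced.toSeq.sortBy(_._1).foreach(println)
  }
}
```

Running `main` prints `(fast,1)`, `(fun,1)`, `(is,2)` and `(spark,2)`, one per line. This is the same `toString` form in which `saveAsTextFile` writes tuples. The real `saveAsTextFile("haha")`, however, creates a directory named `haha` containing part files, and the keys in it are not sorted.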
(To be continued)