Goal: read words from a file, count how many times each word appears, and sort the counts in descending order.
File contents (wc.txt):
tom jerry
henry jim
suse lusy
aaa bbb
ccc ddd
aaa eee
ccc eee
tom jim
henry jim
jim tom
Code:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

object wordcount {
  // Reduce Spark's console logging to warnings only
  Logger.getLogger("org").setLevel(Level.WARN)
  Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local").setAppName("wordcount")
    conf.set("spark.testing.memory", "2147480000")
    val sc = new SparkContext(conf)

    val inputFile = "D:\\dev\\bigdata\\wc.txt"
    val rdd = sc.textFile(inputFile)
    val result = rdd
      .flatMap(_.split(" "))           // split each line into words
      .filter(_.nonEmpty)              // drop empty tokens
      .map((_, 1))                     // pair each word with a count of 1
      .reduceByKey(_ + _)              // sum the counts for each word
      .sortBy(_._2, ascending = false) // sort by count, descending
    result.foreach(println)

    sc.stop() // release Spark resources when done
  }
}
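For reference, the same counting logic can be sketched with plain Scala collections, no Spark required, which is handy for checking expected counts locally. This is only an illustrative sketch; the sample lines below are a subset of wc.txt, and `LocalWordCount` is a hypothetical name, not part of the original program:

```scala
// Local sketch of the word-count pipeline using Scala collections only.
object LocalWordCount {
  def count(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.split(" "))                  // split each line into words
      .filter(_.nonEmpty)                     // drop empty tokens
      .groupBy(identity)                      // group equal words together
      .map { case (w, ws) => (w, ws.size) }   // count each group
      .toSeq
      .sortBy(-_._2)                          // sort by count, descending

  def main(args: Array[String]): Unit = {
    val lines = Seq("tom jerry", "henry jim", "jim tom")
    count(lines).foreach(println)
  }
}
```

The `groupBy` + `map` pair plays the role of Spark's `reduceByKey`: both collapse all occurrences of a word into a single (word, count) pair before sorting.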
Output:
(jim,4)
(tom,3)
(eee,2)
(ccc,2)
(aaa,2)
(henry,2)
(bbb,1)
(ddd,1)
(lusy,1)
(jerry,1)
(suse,1)