Here array is an Array[String] whose elements are lines of text.
Approach 1: array.flatMap(_.split(" ")).groupBy(x => x).mapValues(_.length).toList.sortBy(x => - x._2)
Approach 2: array.map(_.split(" ")).flatten.groupBy(x => x).map(mkv => (mkv._1,mkv._2.length)).toList.sortBy(x => - x._2)
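A minimal runnable sketch of both approaches, wrapped in an object so the two pipelines can be compared on the same input. The object name ArrayWordCount, the method names countV1/countV2 and the sample lines are hypothetical. It replaces mapValues with an explicit map over (word, occurrences) pairs, because Map.mapValues is deprecated as of Scala 2.13. The result is otherwise the same.

```scala
object ArrayWordCount {
  // Approach 1: flatMap splits every line into words in one step
  def countV1(lines: Array[String]): List[(String, Int)] =
    lines
      .flatMap(_.split(" "))                        // Array[String] of words
      .groupBy(x => x)                              // Map[String, Array[String]]
      .map { case (word, ws) => (word, ws.length) } // word -> number of occurrences
      .toList
      .sortBy(x => -x._2)                           // descending by count

  // Approach 2: map each line to Array[String], then flatten
  def countV2(lines: Array[String]): List[(String, Int)] =
    lines
      .map(_.split(" "))                            // Array[Array[String]]
      .flatten                                      // Array[String] of words
      .groupBy(x => x)
      .map { case (word, ws) => (word, ws.length) }
      .toList
      .sortBy(x => -x._2)

  def main(args: Array[String]): Unit = {
    val lines = Array("a b a", "c a b") // hypothetical sample input
    println(countV1(lines))
    println(countV2(lines))
  }
}
```

Words with equal counts can come out in any order, because the Map produced by groupBy has no defined iteration order.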
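Spark word count
Start HDFS, read the files in the directory wc, count the words and sort them by count. sortBy(_._2, false) sorts in descending order, and collect returns the result to the driver as an array.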
sc.textFile("hdfs://master:9000/wc").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortBy(_._2,false).collect
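The logic of the Spark pipeline can be checked without a cluster. The sketch below mirrors each step on a plain Scala Seq. It is an approximation, not Spark: groupMapReduce (Scala 2.13+) stands in for reduceByKey(_+_), a negated sort key stands in for sortBy(_._2, false), and the object name and sample lines are hypothetical stand-ins for the files under hdfs://master:9000/wc.

```scala
object LocalWordCount {
  // Mirrors: flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortBy(_._2,false).collect
  def wordCount(lines: Seq[String]): Array[(String, Int)] =
    lines
      .flatMap(_.split(" "))              // one element per word, like RDD.flatMap
      .map((_, 1))                        // (word, 1) pairs
      .groupMapReduce(_._1)(_._2)(_ + _)  // sum the 1s per word, like reduceByKey(_+_)
      .toSeq
      .sortBy(p => -p._2)                 // descending by count, like sortBy(_._2, false)
      .toArray                            // collect also returns an Array

  def main(args: Array[String]): Unit = {
    // hypothetical file contents
    val lines = Seq("spark hdfs spark", "spark hdfs yarn")
    wordCount(lines).foreach(println)
  }
}
```

In real Spark, reduceByKey combines values within each partition before shuffling, which is why it is preferred over groupByKey for counting. The local version only reproduces the result, not the distributed execution.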