[hadoop@hadoop01 ~]$ hdfs dfs -cat /user/hadoop/worddir/word.txt
Jason chen welcome TJ
TJ welcome Jason
Bye Bye
Bye
3. Write the WordCount program
Create a Scala object: WordCount
package com.jason
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
object WordCount {
  def main(args: Array[String]): Unit = {
    // Initialize the Spark configuration
    val sparkConf = new SparkConf().setAppName("WordCount sample")
    // .setMaster("192.168.1.10:4040").setMaster("local[2]").set("spark.testing.memory","2147480000");
    val sc = new SparkContext(sparkConf)
    // Read the input file from HDFS
    val rdd = sc.textFile("/user/hadoop/worddir/word.txt")
    // Split each line into (word, 1) pairs
    val tupleRDD = rdd.flatMap(line => line.split(" ").toList.map(word => (word.trim, 1)))
    // Sum the counts for each word
    val resultRDD: RDD[(String, Int)] = tupleRDD.reduceByKey((a, b) => a + b)
    resultRDD.foreach(elm => println(elm._1 + "=" + elm._2))
    // Keep the application alive briefly so the web UI can be inspected
    Thread.sleep(10000)
    sc.stop()
  }
}
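Before running on the cluster, the core transformation above (flatMap to (word, 1) pairs, then reduce by key) can be sanity-checked with plain Scala collections, no Spark required. This is a local sketch using the same sample lines as word.txt; `groupBy` plus a per-group sum stands in for `reduceByKey`:

```scala
// Local word count on Scala collections, mirroring the RDD pipeline above.
object LocalWordCount {
  def main(args: Array[String]): Unit = {
    val lines = List("Jason chen welcome TJ", "TJ welcome Jason", "Bye Bye", "Bye")
    // flatMap + map: same (word, 1) pairing as the RDD version
    val pairs = lines.flatMap(line => line.split(" ").toList.map(word => (word.trim, 1)))
    // groupBy + sum plays the role of reduceByKey((a, b) => a + b)
    val counts: Map[String, Int] =
      pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }
    counts.foreach { case (word, n) => println(word + "=" + n) }
  }
}
```

Running this prints the same counts the cluster job produces (Jason=2, TJ=2, Bye=3, chen=1, welcome=2), though in arbitrary order.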
4. View the output:
18/12/16 16:19:28 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 93 ms
18/12/16 16:19:28 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 94 ms
Jason=2
TJ=2
Bye=3
chen=1
welcome=2
18/12/16 16:19:28 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 1138 bytes result sent to driver
18/12/16 16:19:28 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 1138 bytes result sent to driver