Writing a WordCount program in IDEA and running it both on the local Windows machine and on a cluster
Running locally on Windows
The Scala program is as follows:
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object SparkWorldCount {
  def main(args: Array[String]): Unit = {
    // 1. Build the SparkContext
    val conf = new SparkConf()
    conf.setMaster("local")
    conf.setAppName("wc")
    val sc = new SparkContext(conf)
    // 2. Use the SparkContext to create an RDD from the input file
    val lines: RDD[String] = sc.textFile("hdfs://hadoop131:9000/wuzhishan/futoubang/hi.txt")
    // 3. Operate on the RDD, e.g. count the lines
    // val resCount: Long = lines.count()
    // println(resCount)
    // 4. Word count: split, pair each word with 1, then sum by key
    val words: RDD[String] = lines.flatMap(_.split(","))
    val wordAndOne: RDD[(String, Int)] = words.map((_, 1))
    val wordAndCount: RDD[(String, Int)] = wordAndOne.reduceByKey(_ + _)
    wordAndCount.saveAsTextFile("hdfs://hadoop131:9000/out")
    sc.stop()
  }
}
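The flatMap → map → reduceByKey pipeline above can be sketched with plain Scala collections, with no Spark cluster needed. This is only an illustration of the semantics, using a hypothetical two-line input; `groupBy` plus a per-group sum plays the role of `reduceByKey`:

```scala
object LocalWordCount {
  def main(args: Array[String]): Unit = {
    // Hypothetical input: two comma-separated lines, as hi.txt might contain
    val lines = Seq("hello,spark", "hello,hadoop")
    // Same shape as the RDD pipeline: flatMap -> map -> reduce by key
    val counts: Map[String, Int] =
      lines.flatMap(_.split(","))   // Seq("hello", "spark", "hello", "hadoop")
        .map((_, 1))                // pair every word with a count of 1
        .groupBy(_._1)              // group the (word, 1) pairs by word
        .map { case (w, ps) => (w, ps.map(_._2).sum) } // sum the 1s per word
    println(counts)
  }
}
```

Running this prints a map where "hello" has count 2 and "spark" and "hadoop" each have count 1, matching what `reduceByKey(_ + _)` would produce per partition-merged key.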
The job runs locally but writes to HDFS; view the processed result there.
Note: when running the program on Windows, you may get an error saying winutils.exe is missing from the bin directory of the Hadoop installation folder; just download it and place it there.
Link: https://pan.baidu.com/s/1CnV4VbEEPWxjs_QNpayrpA
Extraction code: g55a
Running on the cluster
Upload the program jar to the cluster
Submit the packaged jar to the Spark standalone cluster
[root@hadoop131 spark]# bin/spark-submit --master spark://hadoop131:7077 --class com.zpark.demo.WorkCount /root/yarn_sparkdemo-1.0-SNAPSHOT.jar
Submit the packaged jar to YARN
[root@hadoop131 spark]# bin/spark-submit --master yarn --class com.zpark.demo.WorkCount /root/yarn_sparkdemo-1.0-SNAPSHOT.jar
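The YARN submission above can also carry explicit deploy-mode and resource options. A sketch, assuming the same jar and class names as above (the flag values here are illustrative; tune them to your cluster):

```shell
# Submit to YARN in cluster mode with explicit resources.
bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 2 \
  --executor-memory 1g \
  --class com.zpark.demo.WorkCount \
  /root/yarn_sparkdemo-1.0-SNAPSHOT.jar
```

With `--deploy-mode cluster` the driver runs inside YARN rather than in the submitting shell, so the terminal can be closed after submission.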
View the result
[root@hadoop132 ~]# hadoop fs -cat /out/part-00000