1. WordCount program code
package com.first

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._ // pair-RDD implicits (needed on Spark < 1.3)

object WordCount {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      println("usage is com.first.WordCount <input> <output>")
      return
    }
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    // Alternative: build the context by hand instead of relying on spark-submit:
    // val sc = new SparkContext(args(0), "WordCount",
    //   System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_TEST_JAR")))

    // Split each line on whitespace, map every word to (word, 1),
    // then sum the counts per word
    val textFile = sc.textFile(args(0))
    val result = textFile
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    result.saveAsTextFile(args(1))
    // result.foreach(println) // print the pairs instead of saving, for debugging
    sc.stop()
  }
}
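Before submitting, the input file has to exist in HDFS. A minimal sketch, assuming a local file named word.txt and the relative paths used in step 2 (the local file name is an assumption, not something stated in the original):

# Create the input directory under the user's HDFS home and upload the file
hdfs dfs -mkdir -p wanginput
hdfs dfs -put word.txt wanginput/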
2. Submitting the job with spark-submit
In a terminal, change to Spark's bin directory and run the command below (several other submission modes exist as well):
./spark-submit --name WordCount1 --class com.first.WordCount --master yarn-cluster /home/hadoop/wangqiujie/wordcount2.jar wanginput/word.txt wangoutput

Here wanginput/word.txt (the input) and wangoutput (the output directory) are both HDFS paths relative to the submitting user's home directory.
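In yarn-cluster mode the driver runs inside the cluster, so the job is easiest to track through YARN itself. A minimal sketch of the usual checks (the application ID below is a placeholder, not a real ID from this run):

# List running applications and note the application ID
yarn application -list
# Fetch the aggregated container logs once the run finishes
# (application_1234567890_0001 is a placeholder)
yarn logs -applicationId application_1234567890_0001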
3. An exception appeared during the run: Exception in createBlockOutputStream
The cause was that the firewall on node 229 had not been turned off; a sketch of how to turn it off follows. (Other common exceptions are documented elsewhere.)
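How the firewall is stopped depends on the node's OS. A hedged sketch, assuming a CentOS 6-era node with iptables; this is an assumption about the environment, not something confirmed by the original post:

# On CentOS 6 / iptables-based nodes (run as root on the affected node, e.g. node 229)
service iptables stop      # stop the firewall now
chkconfig iptables off     # keep it off across reboots
# On systemd-based systems the equivalent would be:
# systemctl stop firewalld && systemctl disable firewalld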
4. After turning off the firewall, rerunning the command from step 2 succeeded, and the results can be viewed in HDFS, as shown below.
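saveAsTextFile writes one part-* file per partition plus a _SUCCESS marker into the output directory, so the (word, count) pairs can be inspected like this:

# List the output directory and print the word counts
hdfs dfs -ls wangoutput
hdfs dfs -cat wangoutput/part-*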