The following uses a WordCount program as an example.
First, write the WordCount code in IDEA:
package day0628

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount extends App {
  // Optionally pick the master from the first argument instead:
  // if (args.length < 1) {
  //   println("A master argument must be passed: local | yarn")
  //   System.exit(-1)
  // }
  // var sparkConf: SparkConf = null
  // if (args(0) == "local") {
  //   sparkConf = new SparkConf().setMaster("local[4]").setAppName("wc")
  // } else {
  //   sparkConf = new SparkConf().setMaster("yarn").setAppName("wc")
  // }

  // Leave the master unset here so it can be supplied via spark-submit --master.
  private val sparkConf: SparkConf = new SparkConf()
  private val sc = new SparkContext(sparkConf)

  // Read the input file, split each line on spaces, and count each word.
  private val result: RDD[(String, Int)] = sc.textFile(args(0))
    .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

  result.collect().foreach(println)
  sc.stop()
}
/*
(hive,2)
(spark,5)
(hadoop,4)
(mr,4)
*/
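Because the SparkConf above sets no master, the same jar can also be smoke-tested locally by supplying the master on the command line before going to YARN. A minimal sketch, assuming the jar is in the current directory and the input is a local file (both paths are placeholders):

```shell
# Local smoke test: run with 4 local threads instead of YARN.
# The single program argument is the input path read by sc.textFile.
spark-submit \
  --class day0628.WordCount \
  --master 'local[4]' \
  Spark-1.0-SNAPSHOT-jar-with-dependencies.jar \
  file:///tmp/wc.txt
```

The quoting around `local[4]` only guards against shell glob expansion of the brackets.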
Next, once the program runs correctly, package it and copy the generated jar to the cluster.
Then execute the spark-submit command on the cluster to submit the job to YARN.
The command is as follows:
spark-submit \
  --class day0628.WordCount \
  --master yarn \
  --name wc \
  /tmp/b07/Spark-1.0-SNAPSHOT-jar-with-dependencies.jar \
  /kgc/wc.txt
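The command above runs in the default client deploy mode, so the word counts print to the submitting terminal. A hedged variant for cluster mode, where the driver runs inside a YARN container and its output must be fetched from the YARN logs (the application id below is a placeholder printed by spark-submit):

```shell
# Submit with the driver running inside YARN (cluster deploy mode).
spark-submit \
  --class day0628.WordCount \
  --master yarn \
  --deploy-mode cluster \
  --name wc \
  /tmp/b07/Spark-1.0-SNAPSHOT-jar-with-dependencies.jar \
  /kgc/wc.txt

# In cluster mode the driver's println output is in the container logs:
yarn logs -applicationId <application_id>
```

In cluster mode the client can disconnect after submission, which is why it is the usual choice for production jobs, while client mode is convenient for interactive debugging.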