Code

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Scala02_WordCountOnYarn {
  def main(args: Array[String]): Unit = {
    // No master is set here: it is supplied by spark-submit (--master yarn).
    val conf: SparkConf = new SparkConf().setAppName("wordcount")
    val sc = new SparkContext(conf)
    val resRDD: RDD[(String, Int)] = sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
    resRDD.saveAsTextFile(args(1))
    sc.stop()
  }
}
```

Submit

```shell
bin/spark-submit \
--class com.aura.spark.day01.Scala02_WordCountOnYarn \
--master yarn \
--executor-memory 2G \
--total-executor-cores 8 \
--deploy-mode cluster \
/home/hadoop/jar/WordCount.jar \
/word_in /word_out
```

Explanation of the submit parameters

- `\` continues the command on the next line.
- `--class` takes the fully qualified name of the main class.
- `--master yarn` runs the job on YARN.
- `--executor-memory` sets the memory available to each executor.
- `--total-executor-cores` sets the total number of CPU cores across all executors. Note that this option only takes effect in standalone and Mesos modes; on YARN, use `--num-executors` together with `--executor-cores` instead.
- `--deploy-mode` runs the driver in `cluster` or `client` mode.
- `/home/hadoop/jar/WordCount.jar` is the local path to the application jar.
- `/word_in` is the input path on HDFS.
- `/word_out` is the output path on HDFS.
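To see what the `flatMap` → `map` → `reduceByKey` chain computes without a cluster, here is a minimal local sketch on plain Scala collections (the `WordCountSketch` object name is hypothetical, and `groupBy` plus a per-group sum stands in for Spark's `reduceByKey`):

```scala
object WordCountSketch {
  // Split lines into words, pair each word with 1, then sum counts per word --
  // the same logic the Spark job runs distributed across executors.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))                          // one element per word
      .map((_, 1))                                    // pair each word with a count of 1
      .groupBy(_._1)                                  // local analogue of reduceByKey's shuffle
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("hello spark", "hello yarn"))
    println(counts) // Map(hello -> 2, spark -> 1, yarn -> 1)
  }
}
```

The difference in the real job is that `reduceByKey` combines counts partially on each partition before shuffling, rather than grouping all pairs first as `groupBy` does here.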