Spark Operations: Writing a Demo and Submitting Jobs
1 Running Spark's bundled example program
Computing the value of Pi.
Submit the job in YARN mode (this is the approach used on CDH). With --deploy-mode client the driver runs on the submitting machine; cluster mode would run it inside YARN instead.
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
/opt/cloudera/parcels/CDH/lib/spark/examples/jars/spark-examples_2.11-2.2.0-cdh6.0.1.jar \
100
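The trailing 100 is the number of slices (parallel tasks) the sampling is split across. Under the hood, SparkPi estimates Pi by Monte Carlo sampling; below is a minimal sketch of the idea (simplified, not the exact bundled source):
import org.apache.spark.{SparkConf, SparkContext}
import scala.math.random

object PiSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PiSketch"))
    // The trailing command-line argument ("100" above) becomes the slice count.
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    // Count random points in the unit square that land inside the unit circle.
    val count = sc.parallelize(1 to n, slices).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y <= 1) 1 else 0
    }.reduce(_ + _)
    // The ratio of hits approximates the circle's area, Pi/4.
    println(s"Pi is roughly ${4.0 * count / n}")
    sc.stop()
  }
}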
2 Creating the sparkdemo project: WordCount
2.1 Main dependencies
The ${scala.version} and ${spark.version} properties are assumed to be defined under <properties> in the POM, with values matching the cluster (e.g. a Scala 2.11.x release for spark-core_2.11, and 2.2.0 for CDH's Spark).
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
2.2 WordCount code
package com.nml

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Note: setMaster in code takes precedence over spark-submit's --master,
    // so with local[*] here the job runs locally even when submitted to YARN.
    // Remove it to actually run on the cluster (and read the input from HDFS,
    // since a file:// path must exist on every executor node).
    val conf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
    val sc = new SparkContext(conf)
    // file:// reads from the local filesystem.
    // Beforehand, create /opt/nml/wc.txt on node3 and put a few words in it.
    val lines = sc.textFile("file:///opt/nml/wc.txt")
    val words = lines.flatMap(_.split(" "))   // split each line into words
    val wordOne = words.map((_, 1))           // pair each word with a count of 1
    val wordSum = wordOne.reduceByKey(_ + _)  // sum the counts per word
    val result = wordSum.collect()            // fetch the results to the driver
    result.foreach(println)
    sc.stop()
  }
}
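For a genuine YARN run, executors on other nodes cannot read node3's local file, so the input should live on HDFS. A minimal sketch of the same job against HDFS (the paths here are hypothetical; adjust them to your cluster):
package com.nml

import org.apache.spark.{SparkConf, SparkContext}

object WordCountHdfs {
  def main(args: Array[String]): Unit = {
    // No setMaster: the master comes from spark-submit (--master yarn).
    val sc = new SparkContext(new SparkConf().setAppName("WordCountHdfs"))
    // Hypothetical HDFS input/output paths.
    sc.textFile("hdfs:///user/root/wc.txt")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///user/root/wc_out")
    sc.stop()
  }
}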
2.3 Packaging the code with Maven
Run the Maven package goal (mvn package; note that compiling Scala sources requires a Scala build plugin in the POM, such as scala-maven-plugin).
The build produces Spark01-1.0-SNAPSHOT.jar.
Upload the jar to node3 with Xftp.
2.4 Submitting the job
From a shell on node3 (the [root@node3 ~]# prompt):
spark-submit \
--class com.nml.WordCount \
--master yarn \
--deploy-mode client \
/opt/nml/Spark01-1.0-SNAPSHOT.jar
or, with the driver and executor resources made explicit (note that --num-executors is YARN-specific, and since no --master is given here it falls back to spark-defaults.conf or local mode):
spark-submit \
--class com.nml.WordCount \
--num-executors 3 \
--driver-memory 1G \
--executor-memory 1G \
--executor-cores 3 \
/opt/nml/Spark01-1.0-SNAPSHOT.jar
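These flags also map to SparkConf keys and can be set in code instead; a sketch of the mapping (with the caveat that spark.driver.memory set in code has no effect in client mode, because the driver JVM has already started, so keep passing --driver-memory to spark-submit):
import org.apache.spark.{SparkConf, SparkContext}

object WordCountTuned {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WordCountTuned")
      .set("spark.executor.instances", "3") // --num-executors (YARN only)
      .set("spark.executor.memory", "1g")   // --executor-memory
      .set("spark.executor.cores", "3")     // --executor-cores
    val sc = new SparkContext(conf)
    // Trivial job body, just so the example runs end to end.
    println(sc.parallelize(1 to 100).map(_ * 2).sum())
    sc.stop()
  }
}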
2.5 Execution output
20/05/25 15:12:32 INFO scheduler.DAGScheduler: ResultStage 1 (collect at WordCount.scala:15) finished in 0.055 s
20/05/25 15:12:32 INFO scheduler.DAGScheduler: Job 0 finished: collect at WordCount.scala:15, took 0.397266 s
(scala,2)
(waterdrop,1)
(spark,3)
(hadoop,2)
20/05/25 15:12:32 INFO spark.SparkContext: Invoking stop() from shutdown hook