1. First Spark program: WordCount
Step 1: create a SparkContext.
setMaster: sets the run mode
setAppName: sets the application name
val sparkConf = new SparkConf().setMaster("local").setAppName("SparkWordCountApp")
val sc = new SparkContext(sparkConf)
sortByKey(false): sorts by key; true means ascending, false means descending
saveAsTextFile: saves the RDD to the target directory
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
  val sparkConf = new SparkConf().setMaster("local").setAppName("SparkWordCountApp")
  val sc = new SparkContext(sparkConf)
  val rdd = sc.textFile("file:///home/hadoop/IdeaProjects/sparksql-train/data/in.txt")
  /**
   * Sort the result by word count, in descending order
   */
  rdd.flatMap(_.split(",")).map(word => (word, 1))
    .reduceByKey(_ + _).map(x => (x._2, x._1)).sortByKey(false)
    .map(x => (x._2, x._1))
    .saveAsTextFile("file:///home/hadoop/IdeaProjects/sparksql-train/data/out")
  sc.stop()
}
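The swap-sort-swap pattern above (map to (count, word), sortByKey, map back) exists only to sort by value. A more direct sketch uses RDD.sortBy, which sorts by an arbitrary key function; this is an equivalent alternative, not the original code:

```scala
// Same pipeline, sorted by count descending via sortBy instead of
// the double key/value swap around sortByKey.
rdd.flatMap(_.split(","))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .sortBy(_._2, ascending = false)
  .saveAsTextFile("file:///home/hadoop/IdeaProjects/sparksql-train/data/out")
```

Both versions produce the same (word, count) output; sortBy simply hides the swap internally.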
2. Using spark-shell in local mode
Start it:
spark-shell --master (the mode to run in)
[hadoop@hadoop000 shell]$ spark-shell --master local
Define an RDD:
scala> val rdd = sc.textFile("file:///home/hadoop/data/a.txt")
rdd: org.apache.spark.rdd.RDD[String] = file:///home/hadoop/data/a.txt MapPartitionsRDD[3] at textFile at <console>:24
Output the contents:
scala> rdd.collect
res2: Array[String] = Array(welcome flink flink, zwb welcome)
Run WordCount and print the result (the lines contain no commas, so each whole line counts as a single "word"):
scala> rdd.flatMap(_.split(",")).map(word => (word, 1)).reduceByKey(_ + _).map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1)).collect()
res5: Array[(String, Int)] = Array((welcome flink flink,1), (zwb welcome,1))
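The same transformation chain can be traced on plain Scala collections, without a Spark shell, which is handy for checking the logic. This is a sketch, not from the original: groupBy plus a sum stands in for reduceByKey, and here the text is split on spaces so real per-word counts appear:

```scala
object WordCountTrace extends App {
  val lines = Seq("welcome flink flink", "zwb welcome")

  // flatMap -> pair -> "reduceByKey" (groupBy + sum) -> sort by count descending
  val counts = lines
    .flatMap(_.split(" "))
    .map(word => (word, 1))
    .groupBy(_._1)
    .map { case (word, pairs) => (word, pairs.map(_._2).sum) }
    .toSeq
    .sortBy(-_._2)

  // Prints the words with their counts, highest first,
  // e.g. (welcome,2) and (flink,2) before (zwb,1).
  println(counts)
}
```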
3. Submitting an application with spark-submit in local mode
First, take a look at the official documentation:
spark-submit page on the Spark website
./bin/spark-submit \
  --class [main-class] \
  --master [master-url] \
  --deploy-mode [deploy-mode] \
  --conf [key=value] \
  ... # other options
  [application-jar] \
  [application-arguments]
Now write our own:
./bin/spark-submit \
  --class com.imooc.bigdata.SparkWordCountAppV2 \
  --master local \
  /home/hadoop/data/jars/sparksql-train-1.0.jar \
  file:///home/hadoop/IdeaProjects/sparksql-train/data/in.txt file:///home/hadoop/IdeaProjects/sparksql-train/data/out
It runs successfully!
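Note that the V1 program in section 1 hard-codes its paths, while this submit command passes the input and output paths as application arguments, so SparkWordCountAppV2 presumably reads them from args. The class name is from the source; the body below is an assumed sketch:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch of SparkWordCountAppV2: the same WordCount pipeline,
// but with input/output paths taken from the command line.
object SparkWordCountAppV2 {
  def main(args: Array[String]): Unit = {
    // Assumed convention: args(0) = input path, args(1) = output path
    val Array(input, output) = args.take(2)
    // Master and app name are supplied by spark-submit, not hard-coded here
    val sc = new SparkContext(new SparkConf())
    sc.textFile(input)
      .flatMap(_.split(","))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))
      .saveAsTextFile(output)
    sc.stop()
  }
}
```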
4. Submitting an application with spark-submit on YARN
See the "Running on YARN" page on the Spark website.
First, note that Spark must be pointed at the Hadoop client configuration directory via HADOOP_CONF_DIR; it is best to put this in your environment variables:
export HADOOP_CONF_DIR=/home/hadoop/app/hadoop/etc/hadoop
Official template:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue thequeue \
    examples/jars/spark-examples*.jar \
    10
Our command:
./bin/spark-submit \
  --class com.imooc.bigdata.SparkWordCountAppV2 \
  --master yarn \
  --name SparkWordCountAppV2 \
  /home/hadoop/data/jars/sparksql-train-1.0.jar \
  file:///home/hadoop/IdeaProjects/sparksql-train/data/in.txt file:///home/hadoop/IdeaProjects/sparksql-train/data/out
It runs successfully!
5. Submitting an application with spark-submit in standalone mode
1. Copy the conf/slaves.template template to create a slaves file:
cp slaves.template slaves
Add the worker machine addresses at the end of the file.
2. Copy the conf/spark-env.sh.template template to create spark-env.sh:
cp spark-env.sh.template spark-env.sh
Here we just need to add the master host address.
3. Start Spark:
./sbin/start-all.sh
4. Now write the spark-submit command:
./bin/spark-submit \
  --class com.imooc.bigdata.SparkWordCountAppV2 \
  --master spark://192.168.0.133:7077 \
  /home/hadoop/data/jars/sparksql-train-1.0.jar \
  file:///home/hadoop/IdeaProjects/sparksql-train/data/in.txt file:///home/hadoop/data/out
After the run, check the output directory.
Done.