Running your first Spark program
1. Run the following from the Spark directory
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://hadoop01:7077 \
--executor-memory 1G \
--total-executor-cores 1 \
examples/jars/spark-examples_2.11-2.3.2.jar \
10
--master spark://hadoop01:7077 : sets the Master address to hadoop01 (port 7077)
--executor-memory 1G : sets the memory of each executor to 1 GB
--total-executor-cores 1 : sets the total number of CPU cores used across all executors to 1
2. Start spark-shell
bin/spark-shell
3. Use spark-shell to read a file from HDFS
First create the file words.txt
cd /export/servers/spark
vi words.txt
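If you would rather not type the file by hand in vi, a here-document can create it in one step. The three lines of sample text below are an assumption for illustration only; the steps above do not specify what words.txt should contain.

```shell
# Hypothetical sample content; replace it with your own text.
cat > words.txt <<'EOF'
hello spark
hello hadoop
hello world
EOF

# Print the file to confirm what was written.
cat words.txt
```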
4. On hadoop01, create the HDFS directory /spark/test and upload words.txt to it
hadoop fs -mkdir -p /spark/test
hadoop fs -put words.txt /spark/test
5. Integrate Spark with HDFS
cd conf
vi spark-env.sh
6. Add this line: export HADOOP_CONF_DIR=/export/servers/hadoop-2.7.4/etc/hadoop
Then distribute spark-env.sh to the other nodes:
scp spark-env.sh hadoop02:/export/servers/spark/conf
scp spark-env.sh hadoop03:/export/servers/spark/conf
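Spark finds HDFS through the Hadoop client configuration files in HADOOP_CONF_DIR, so a wrong path may only show up later, when a read fails. A quick check like the following can catch it early. The function name check_hadoop_conf is a hypothetical helper, and the sketch assumes core-site.xml and hdfs-site.xml sit directly in that directory, as in a standard Hadoop 2.x layout.

```shell
# Returns 0 only if the directory holds both Hadoop client config files.
check_hadoop_conf() {
  dir=$1
  [ -f "$dir/core-site.xml" ] && [ -f "$dir/hdfs-site.xml" ]
}

# Default path taken from the export line above; adjust if your layout differs.
conf_dir=${HADOOP_CONF_DIR:-/export/servers/hadoop-2.7.4/etc/hadoop}

if check_hadoop_conf "$conf_dir"; then
  echo "OK: $conf_dir contains core-site.xml and hdfs-site.xml"
else
  echo "WARNING: $conf_dir is missing core-site.xml or hdfs-site.xml"
fi
```

The same check can be run on hadoop02 and hadoop03 after the scp step.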
7. Restart the Hadoop services
stop-all.sh
start-all.sh
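8. Restart the Spark services from spark/sbin (the ./ prefix keeps the shell from picking up Hadoop's scripts of the same name from PATH)
./stop-all.sh
./start-all.sh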
9. Start spark-shell
bin/spark-shell --master local[2]
10. Write Scala code to count how many times each word occurs
sc.textFile("/spark/test/words.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).collect
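To sanity-check the word count, the same counting can be reproduced locally with standard shell tools. This is only a sketch on hypothetical sample text (the contents of words.txt are not given above), not a replacement for the Spark job: tr plays the role of flatMap(_.split(" ")), and sort | uniq -c plays the role of map((_,1)).reduceByKey(_+_).

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)

# Hypothetical sample text, one space between words.
printf 'hello spark\nhello hadoop\nhello world\n' > "$workdir/words.txt"

# Put one word per line, group identical words, count each group,
# then print "word count" pairs.
counts=$(tr ' ' '\n' < "$workdir/words.txt" | sort | uniq -c | awk '{print $2, $1}')
echo "$counts"
```

If the file in HDFS holds the same text, collect in spark-shell should return the same word and count pairs, though not necessarily in the same order.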
11. Exit spark-shell
:quit