------------------------------------------------------------------
Preface
1. Linux user: centos, with root-level (sudo) command privileges.
2. Before installing, create /soft:
sudo mkdir /soft
sudo chown -R centos:centos /soft
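A quick sanity check that the ownership change took effect:
ls -ld /soft
# the listing should show centos centos as the owner and group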
------------------------------------------------------------------
1. Download mirrors
http://mirrors.shu.edu.cn/apache/
http://mirrors.tuna.tsinghua.edu.cn/apache/
Target release:
spark-2.2.3-bin-hadoop2.7.tgz
------------------------------------------------------------------
2. Use SecureCRT to copy spark-2.2.3-bin-hadoop2.7.tgz from the Mac to the Linux host, extract it, and create a symlink
For SecureCRT usage, see: https://blog.csdn.net/With__Sunshine/article/details/88534083
put /Users/mac/Downloads/spark-2.2.3-bin-hadoop2.7.tgz /home/centos
tar -zxvf /home/centos/spark-2.2.3-bin-hadoop2.7.tgz -C /soft
cd /soft
ln -s spark-2.2.3-bin-hadoop2.7 spark
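To verify the extraction and the symlink (a quick check; timestamps and permissions will differ):
ls -l /soft
# should contain: spark -> spark-2.2.3-bin-hadoop2.7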
------------------------------------------------------------------
3. Rename the configuration file templates
cd /soft/spark/conf
mv slaves.template slaves
mv spark-env.sh.template spark-env.sh
------------------------------------------------------------------
4. Edit the slaves file and add the worker nodes (see the note below the list)
vim slaves
Content to add:
s101
s102
s103
s104
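Each line names one Worker host; since s101 is also listed, it will run a Worker alongside the Master. Every hostname must resolve on all nodes, e.g. via /etc/hosts entries like the following (the IP addresses here are placeholders for illustration):
192.168.1.101 s101
192.168.1.102 s102
192.168.1.103 s103
192.168.1.104 s104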
------------------------------------------------------------------
5. Edit the spark-env.sh file and add the following settings:
vim spark-env.sh
Content to add:
SPARK_MASTER_HOST=s101
SPARK_MASTER_PORT=7077
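Optionally, spark-env.sh can also cap per-node resources. A minimal sketch using standard Spark standalone variables (the values are examples only; size them to your machines):
SPARK_WORKER_CORES=2      # CPU cores each Worker offers to executors
SPARK_WORKER_MEMORY=2g    # memory each Worker offers to executors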
------------------------------------------------------------------
6. Distribute the Spark package
cd /soft
scp -r spark-2.2.3-bin-hadoop2.7/ centos@s102:/soft
scp -r spark-2.2.3-bin-hadoop2.7/ centos@s103:/soft
scp -r spark-2.2.3-bin-hadoop2.7/ centos@s104:/soft
ssh s102
cd /soft
ln -s spark-2.2.3-bin-hadoop2.7 spark
exit
ssh s103
cd /soft
ln -s spark-2.2.3-bin-hadoop2.7 spark
exit
ssh s104
cd /soft
ln -s spark-2.2.3-bin-hadoop2.7 spark
exit
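Since the three copies are identical, the same distribution can be done in one loop (an equivalent sketch, assuming passwordless ssh from s101 to the workers):
for h in s102 s103 s104; do
  scp -r /soft/spark-2.2.3-bin-hadoop2.7/ centos@$h:/soft
  ssh centos@$h "ln -s /soft/spark-2.2.3-bin-hadoop2.7 /soft/spark"
done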
------------------------------------------------------------------
7. Configure environment variables on each of s101, s102, s103, and s104
sudo vim /etc/profile
Content to add:
#SPARK_HOME
export SPARK_HOME=/soft/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
After adding, reload the profile:
source /etc/profile
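To confirm the variables took effect (a quick check on any node):
echo $SPARK_HOME          # should print /soft/spark
spark-submit --version    # should print the Spark version banner (2.2.3)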
------------------------------------------------------------------
8. Configure JAVA_HOME on each of s101, s102, s103, and s104
vim /soft/spark/sbin/spark-config.sh
Append as the last line:
export JAVA_HOME=/soft/jdk
Note: without this, running /soft/spark/sbin/start-all.sh fails with errors like:
s101: failed to launch: nice -n 0 /soft/spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://s101:7077
s101: JAVA_HOME is not set
Reference: https://blog.csdn.net/a904364908/article/details/81281995
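With JAVA_HOME in place, the cluster can be started from s101 and checked (start-all.sh is the standard Spark standalone launcher; jps ships with the JDK):
/soft/spark/sbin/start-all.sh
jps    # s101 should list Master (and Worker, since s101 is in slaves); s102-s104 should list Worker
The Master web UI should then be reachable at http://s101:8080 and show all four workers as ALIVE.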
------------------------------------------------------------------
9. Submit a job and run the example program
cd /soft/spark
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://s101:7077 \
--executor-memory 1G \
--total-executor-cores 2 \
./examples/jars/spark-examples_2.11-2.2.3.jar \
100
Notes:
1. Each parameter of the submit command ends its line with \ as a continuation; the last parameter must not end with \.
2. If the submitted job requests more resources than the cluster can provide, an exception like the following may appear:
19/03/17 19:24:05 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
org.apache.spark.SparkException: Could not find AppClient.
For a workaround, see: https://blog.csdn.net/u013709270/article/details/78879869
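On success, the driver output should end with a line of the form below (the exact digits vary from run to run):
Pi is roughly 3.14...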
------------------------------------------------------------------
10. Start the Spark shell
/soft/spark/bin/spark-shell \
--master spark://s101:7077 \
--executor-memory 1g \
--total-executor-cores 2
Example (word count):
scala>sc.textFile("./wordcount.txt")
.flatMap(_.split(" "))
.map((_,1))
.reduceByKey(_+_)
.collect
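The example assumes an input file exists. A minimal way to create one before starting the shell (note: with a plain file path in a cluster, the file must exist at the same path on every worker node, or put it in HDFS and use an HDFS path instead):
echo "hello spark hello world" > ./wordcount.txt
Also, the Scala 2.11 REPL does not treat a leading-dot line as a continuation, so type the chain on a single line or wrap it in :paste mode.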
------------------------------------------------------------------
11. In the Scala REPL, command-line operations such as viewing help and quitting:
scala>:help
scala>:quit
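When finished, the standalone cluster can be stopped from s101 with the matching script:
/soft/spark/sbin/stop-all.sh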