1. Install Scala and Spark
Extract the Scala and Spark archives:
tar -zxvf scala-2.11.8.tgz
tar -zxvf spark-2.4.4-bin-hadoop2.6.tgz
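The environment variables in the next step point to /usr/local/src, so both directories should end up there. If the tarballs were downloaded to a different directory, a sketch like the following (the download location is an assumption) extracts them straight into place:
tar -zxvf scala-2.11.8.tgz -C /usr/local/src/
tar -zxvf spark-2.4.4-bin-hadoop2.6.tgz -C /usr/local/src/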
2. Configure environment variables
Configure the environment variables in ~/.bashrc (run on Master, slave1, and slave2 respectively):
export SCALA_HOME=/usr/local/src/scala-2.11.8
export SPARK_HOME=/usr/local/src/spark-2.4.4-bin-hadoop2.6
export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin
Note: reload the environment variables with: source ~/.bashrc
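To confirm both tools are on the PATH, a quick check (the versions shown are the ones installed in this guide) is:
scala -version          # should report Scala 2.11.8
spark-submit --version  # should report Spark 2.4.4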
3. Modify the Spark configuration files
Modify the Spark configuration files (spark-env.sh and slaves):
Enter Spark's conf directory: cd spark-2.4.4-bin-hadoop2.6/conf/
Rename spark-env.sh.template and slaves.template to spark-env.sh and slaves respectively:
mv spark-env.sh.template spark-env.sh
mv slaves.template slaves
Configure slaves: run vim slaves
Replace the last line, localhost, with slave1 and slave2, one hostname per line.
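The resulting slaves file should end with the two worker hostnames, for example:
slave1
slave2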
Configure spark-env.sh (a sketch of its entries is given after the hosts mapping below):
Because the hosts file already contains the hostname mappings from the earlier Hadoop setup, the master can be referred to by its hostname Master directly:
Check the IP-to-hostname mapping: run vim /etc/hosts
XX.XX.XX.XX Master
XX.XX.XX.XX slave1
XX.XX.XX.XX slave2
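The guide does not list the exact spark-env.sh entries, so here is a minimal sketch; the JDK and Hadoop paths are assumptions and must be adjusted to the actual installation:
export JAVA_HOME=/usr/local/src/jdk1.8.0_XXX                     # assumption: replace with the real JDK directory
export SCALA_HOME=/usr/local/src/scala-2.11.8
export HADOOP_CONF_DIR=/usr/local/src/hadoop-2.6.5/etc/hadoop    # assumption: Hadoop install path
export SPARK_MASTER_HOST=Master
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=1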
4. Configure the Spark worker nodes
Copy the scala and spark directories to the slave1 and slave2 nodes:
scp -rp spark-2.4.4-bin-hadoop2.6/ root@slave1:/usr/local/src/
scp -rp spark-2.4.4-bin-hadoop2.6/ root@slave2:/usr/local/src/
scp -rp scala-2.11.8/ root@slave1:/usr/local/src/
scp -rp scala-2.11.8/ root@slave2:/usr/local/src/
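A quick way to confirm the copies landed on both workers (assuming passwordless SSH was already set up during the Hadoop installation):
ssh slave1 ls /usr/local/src/
ssh slave2 ls /usr/local/src/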
5. Start Spark
Hadoop must be started first:
cd hadoop-2.6.5/
cd sbin/
./start-all.sh
Then start Spark:
cd spark-2.4.4-bin-hadoop2.6/
cd sbin/
./start-all.sh
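Both Hadoop and Spark ship a script named start-all.sh, so calling them by full path avoids any ambiguity (paths assume both are installed under /usr/local/src):
/usr/local/src/hadoop-2.6.5/sbin/start-all.sh
/usr/local/src/spark-2.4.4-bin-hadoop2.6/sbin/start-all.sh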
Check the running processes on Master:
jps:
Master
NameNode
SecondaryNameNode
Jps
ResourceManager
Check the running processes on the worker nodes:
jps:
Worker
Jps
NodeManager
DataNode
6. Access via browser
Open http://Master:8080 in a browser; the Spark Master web UI should list slave1 and slave2 as ALIVE workers.
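Other web UIs from this setup worth checking (default ports; hostnames follow the mapping above):
http://Master:8080    Spark standalone Master UI
http://Master:8088    YARN ResourceManager UI
http://Master:50070   HDFS NameNode UI (Hadoop 2.x default)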
7. Verification
(1) Local mode:
Run in the bin directory:
./run-example --master local[2] SparkPi 10
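The console output is verbose; the line to look for is the Pi estimate, which can be filtered out like this:
./run-example --master local[2] SparkPi 10 2>&1 | grep "Pi is roughly"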
(2) Standalone cluster mode:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://Master:7077 examples/jars/spark-examples_2.11-2.4.4.jar 100
(3) Spark on YARN:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster examples/jars/spark-examples_2.11-2.4.4.jar 100
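In cluster deploy mode the Pi estimate is written to the driver container's log rather than the submitting console. Provided YARN log aggregation is enabled, it can be retrieved afterwards with (the application ID below is a placeholder printed by spark-submit):
yarn logs -applicationId <applicationId> | grep "Pi is roughly"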