I. Preparation
1. Prepare three servers (virtual machines):
weekend110 | 192.168.2.100
weekend01  | 192.168.2.101
weekend02  | 192.168.2.102
2. Hadoop is already installed and starts normally.
II. Installation and Deployment
1. Install Scala and Spark on one machine (weekend110) first
Install Scala:
Download the package from the official site, upload it to the VM, and extract it: tar -zxvf soft/scala-2.11.8.tgz -C /home/hadoop/app
Add it to the environment variables:
export SCALA_HOME=/home/hadoop/app/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
# Although Spark bundles its own Scala, installing Scala separately is still recommended
Install Spark:
Download the package from the official site, upload it to the VM, and extract it: tar -zxvf soft/spark-2.4.0-bin-hadoop2.7.tgz -C /home/hadoop/app
Add it to the environment variables:
export SPARK_HOME=/home/hadoop/app/spark-2.4.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
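With both sets of variables added to the profile, a quick sanity check confirms the tools are on the PATH (the expected version strings assume the packages above):

```shell
# Reload the profile so the new variables take effect
source ~/.bash_profile
# Expect: Scala code runner version 2.11.8 ...
scala -version
# Expect: version 2.4.0 in the banner
spark-submit --version
```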
Edit the configuration files:
1) Edit spark-env.sh
[hadoop@weekend110 ~]$ cd /home/hadoop/app/spark-2.4.0-bin-hadoop2.7/conf
[hadoop@weekend110 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@weekend110 conf]$ vi spark-env.sh
[hadoop@weekend110 conf]$ grep -Ev "^$|#" /home/hadoop/app/spark-2.4.0-bin-hadoop2.7/conf/spark-env.sh
HADOOP_CONF_DIR=/home/hadoop/app/hadoop-2.7.0/etc/hadoop/
JAVA_HOME=/home/hadoop/app/jdk1.8.0_231
SCALA_HOME=/home/hadoop/app/scala-2.11.8
SPARK_MASTER_HOST=weekend110
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=1g
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
YARN_CONF_DIR=/home/hadoop/app/hadoop-2.7.0/etc/hadoop
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 \
-Dspark.history.retainedApplications=30 \
-Dspark.history.fs.logDirectory=hdfs://weekend110:9000/test"
2) Edit the slaves file
[hadoop@weekend110 conf]$ vi slaves
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ grep -Ev "^$|#" /home/hadoop/app/spark-2.4.0-bin-hadoop2.7/conf/slaves
weekend01
weekend02
3) Configure the JobHistoryServer
[hadoop@weekend110 conf]$ mv spark-defaults.conf.template spark-defaults.conf
[hadoop@weekend110 conf]$ vi spark-defaults.conf
[hadoop@weekend110 conf]$ grep -Ev "^$|#" /home/hadoop/app/spark-2.4.0-bin-hadoop2.7/conf/spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://weekend110:9000/test
spark.yarn.historyServer.address weekend110:18080
spark.history.ui.port 18080
Note: the directory on HDFS must be created in advance.
Parameter descriptions:
spark.eventLog.dir: all information produced while an application runs is recorded under the path specified by this property;
spark.history.ui.port=18080: the history server web UI is served on port 18080;
spark.history.fs.logDirectory=hdfs://weekend110:9000/test: with this property set, there is no need to pass the path explicitly to start-history-server.sh; the Spark History Server page shows only the information under this path;
spark.history.retainedApplications=30: the number of applications whose history is retained; when the limit is exceeded, the oldest application data is evicted. This is the number of applications held in memory, not the number shown on the page.
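The event log directory referenced by spark.eventLog.dir and SPARK_HISTORY_OPTS can be created ahead of time like this (path as configured above; HDFS must already be running):

```shell
# Create the Spark event log directory on HDFS; -p also creates parent dirs
hdfs dfs -mkdir -p hdfs://weekend110:9000/test
# Confirm it exists before starting the history server
hdfs dfs -ls hdfs://weekend110:9000/
```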
4) Distribute both packages to weekend01 and weekend02:
[hadoop@weekend110 app]$ scp -r /home/hadoop/app/scala-2.11.8 hadoop@weekend01:/home/hadoop/app
[hadoop@weekend110 app]$ scp -r /home/hadoop/app/scala-2.11.8 hadoop@weekend02:/home/hadoop/app
[hadoop@weekend110 app]$ scp -r /home/hadoop/app/spark-2.4.0-bin-hadoop2.7 hadoop@weekend01:/home/hadoop/app
[hadoop@weekend110 app]$ scp -r /home/hadoop/app/spark-2.4.0-bin-hadoop2.7 hadoop@weekend02:/home/hadoop/app
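The four scp commands above can also be generated from a small loop; the sketch below only prints the commands so they can be checked first (drop the echo to actually copy). Hostnames and paths are the ones used above.

```shell
# Print the distribution commands for every worker host and package
dist_cmds() {
  for host in weekend01 weekend02; do
    for dir in scala-2.11.8 spark-2.4.0-bin-hadoop2.7; do
      echo "scp -r /home/hadoop/app/$dir hadoop@$host:/home/hadoop/app"
    done
  done
}
dist_cmds
```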
5) Load the environment variables on weekend01 and weekend02, and source them to take effect:
[hadoop@weekend110 app]$ scp /home/hadoop/.bash_profile hadoop@weekend01:/home/hadoop
[hadoop@weekend110 app]$ scp /home/hadoop/.bash_profile hadoop@weekend02:/home/hadoop
[hadoop@weekend01 app]$ source /home/hadoop/.bash_profile
[hadoop@weekend02 app]$ source /home/hadoop/.bash_profile
III. Starting Services and Verification
1. Rename the scripts:
To avoid a clash with Hadoop's start/stop-all.sh scripts, rename the start/stop-all.sh scripts under spark/sbin/:
[hadoop@weekend110 sbin]$ mv start-all.sh start-spark-all.sh
[hadoop@weekend110 sbin]$ mv stop-all.sh stop-spark-all.sh
2. Start Spark. Before doing so, start the Hadoop services (HDFS + YARN):
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ sbin/start-spark-all.sh
After Spark starts:
a Master process runs on the configured master node, weekend110;
a Worker process runs on the configured slave node weekend01;
a Worker process runs on the configured slave node weekend02.
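This can be verified from weekend110 without logging in to each node, assuming passwordless ssh between the machines (which start-spark-all.sh already requires):

```shell
# Master should be running locally on weekend110
jps | grep Master
# One Worker on each slave node
ssh weekend01 'jps | grep Worker'
ssh weekend02 'jps | grep Worker'
```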
3. Start the history server:
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ sbin/start-history-server.sh
4. All processes are now running:
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ jps
3920 ResourceManager
4034 NodeManager
3763 SecondaryNameNode
5187 Master
3588 DataNode
5269 HistoryServer
3452 NameNode
7375 Jps
5. Run the example job that ships with Spark
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
./examples/jars/spark-examples_2.11-2.4.0.jar \
100
6. View the result
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ bin/spark-submit \
> --class org.apache.spark.examples.SparkPi \
> --master yarn \
> --deploy-mode client \
> ./examples/jars/spark-examples_2.11-2.4.0.jar \
> 100
20/07/31 02:16:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/07/31 02:16:24 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Pi is roughly 3.1418083141808313
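Once the job finishes, its event log should appear under the configured directory and in the history server UI. A quick check against the port configured above (18080 for the history server):

```shell
# History server web UI should answer on the configured port (expect 200)
curl -s -o /dev/null -w "%{http_code}\n" http://weekend110:18080
# Event log files written by the finished application
hdfs dfs -ls hdfs://weekend110:9000/test
```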