Spark (1): Distributed Cluster Installation, Deployment, and Verification

Contents

I. Preparation

II. Installation and Deployment

III. Starting Services and Verification


I. Preparation

1. Prepare three servers (virtual machines):

weekend110    192.168.2.100
weekend01     192.168.2.101
weekend02     192.168.2.102

2. Hadoop is already installed and can be started normally.

II. Installation and Deployment

1. First install Scala and Spark on one machine (weekend110)

Install Scala:

Download the package from the official website, upload it to the virtual machine, and extract it: tar -zxvf soft/scala-2.11.8.tgz -C /home/hadoop/app
Add it to the environment variables:
export SCALA_HOME=/home/hadoop/app/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
# Spark ships with its own Scala, but installing it separately is still recommended
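
To pick up the new variables and confirm the installation, one quick check (assuming the export lines above were appended to ~/.bash_profile, as is done for the worker nodes later in this article) is:

[hadoop@weekend110 ~]$ source ~/.bash_profile
[hadoop@weekend110 ~]$ scala -version    # should report Scala 2.11.8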

Install Spark:

Download the package from the official website, upload it to the virtual machine, and extract it: tar -zxvf soft/spark-2.4.0-bin-hadoop2.7.tgz -C /home/hadoop/app
Add it to the environment variables:
export SPARK_HOME=/home/hadoop/app/spark-2.4.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
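
Likewise, after re-sourcing ~/.bash_profile, a quick sanity check that Spark is on the PATH might be:

[hadoop@weekend110 ~]$ spark-submit --version    # should report Spark 2.4.0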

Modify the configuration files:

1) Modify spark-env.sh

[hadoop@weekend110 ~]$ cd /home/hadoop/app/spark-2.4.0-bin-hadoop2.7/conf
[hadoop@weekend110 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@weekend110 conf]$ vi spark-env.sh
[hadoop@weekend110 conf]$ grep -Ev "^$|#" /home/hadoop/app/spark-2.4.0-bin-hadoop2.7/conf/spark-env.sh 
HADOOP_CONF_DIR=/home/hadoop/app/hadoop-2.7.0/etc/hadoop/
JAVA_HOME=/home/hadoop/app/jdk1.8.0_231
SCALA_HOME=/home/hadoop/app/scala-2.11.8
SPARK_MASTER_HOST=weekend110
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=1g
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
YARN_CONF_DIR=/home/hadoop/app/hadoop-2.7.0/etc/hadoop
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 \
-Dspark.history.retainedApplications=30 \
-Dspark.history.fs.logDirectory=hdfs://weekend110:9000/test"

2) Modify the slaves configuration file

[hadoop@weekend110 conf]$ vi slaves 
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ grep -Ev "^$|#" /home/hadoop/app/spark-2.4.0-bin-hadoop2.7/conf/slaves 
weekend01
weekend02

3) Configure the Spark HistoryServer

[hadoop@weekend110 conf]$ mv spark-defaults.conf.template spark-defaults.conf

[hadoop@weekend110 conf]$ vi spark-defaults.conf
[hadoop@weekend110 conf]$ grep -Ev "^$|#" /home/hadoop/app/spark-2.4.0-bin-hadoop2.7/conf/spark-defaults.conf
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://weekend110:9000/test
spark.yarn.historyServer.address weekend110:18080
spark.history.ui.port            18080

Note: the directory on HDFS needs to be created in advance.
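
For example, the event-log directory configured above could be created like this (a sketch; adjust the path if your layout differs, and it assumes the hdfs command is on the PATH):

[hadoop@weekend110 ~]$ hdfs dfs -mkdir -p /test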

Parameter descriptions:

spark.eventLog.dir: everything an application records while it runs is written under the path specified by this property;

spark.history.ui.port=18080: the web UI is served on port 18080;

spark.history.fs.logDirectory=hdfs://weekend110:9000/test: once this property is configured, there is no need to specify the path explicitly when running start-history-server.sh; the Spark History Server page only shows information from the specified path;

spark.history.retainedApplications=30: the number of application history records to retain; once this limit is exceeded, the oldest application information is removed. This is the number of applications held in memory, not the number shown on the page.

4) Distribute both to the weekend01 and weekend02 machines:

[hadoop@weekend110 app]$ scp -r /home/hadoop/app/scala-2.11.8 hadoop@weekend01:/home/hadoop/app
[hadoop@weekend110 app]$ scp -r /home/hadoop/app/scala-2.11.8 hadoop@weekend02:/home/hadoop/app

[hadoop@weekend110 app]$ scp -r /home/hadoop/app/spark-2.4.0-bin-hadoop2.7 hadoop@weekend01:/home/hadoop/app
[hadoop@weekend110 app]$ scp -r /home/hadoop/app/spark-2.4.0-bin-hadoop2.7 hadoop@weekend02:/home/hadoop/app
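
With only two workers the four scp commands above are fine as they are; for more nodes the copy could be scripted, for example with a simple loop (a sketch using the hostnames above):

[hadoop@weekend110 app]$ for host in weekend01 weekend02; do
>   scp -r /home/hadoop/app/scala-2.11.8 /home/hadoop/app/spark-2.4.0-bin-hadoop2.7 hadoop@$host:/home/hadoop/app
> done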

5) Set up the environment variables on weekend01 and weekend02, then source them so they take effect:

[hadoop@weekend110 app]$ scp /home/hadoop/.bash_profile hadoop@weekend01:/home/hadoop
[hadoop@weekend110 app]$ scp /home/hadoop/.bash_profile hadoop@weekend02:/home/hadoop

[hadoop@weekend01 app]$ source /home/hadoop/.bash_profile
[hadoop@weekend02 app]$ source /home/hadoop/.bash_profile
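
To double-check that both workers picked up the environment, a quick remote check could look like this (assuming passwordless SSH between the nodes, which the scp commands above already rely on):

[hadoop@weekend110 app]$ for host in weekend01 weekend02; do
>   ssh hadoop@$host 'source ~/.bash_profile; scala -version; which spark-submit'
> done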

III. Starting Services and Verification

1. Rename conflicting scripts:
To avoid a conflict with Hadoop's start/stop-all.sh scripts, rename the start/stop-all.sh scripts under spark/sbin/:

[hadoop@weekend110 sbin]$ mv start-all.sh start-spark-all.sh
[hadoop@weekend110 sbin]$ mv stop-all.sh stop-spark-all.sh
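
Alternatively, the stock per-component scripts can be used instead of the renamed start-spark-all.sh, which sidesteps the name clash entirely (in Spark 2.4 the worker script is still named start-slaves.sh):

[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ sbin/start-master.sh
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ sbin/start-slaves.sh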

2. Start Spark. Before starting it, bring up the Hadoop services (HDFS + YARN) first.
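
If HDFS and YARN are not up yet, they can be started with Hadoop's own scripts, for example (assuming /home/hadoop/app/hadoop-2.7.0/sbin is on the PATH):

[hadoop@weekend110 ~]$ start-dfs.sh
[hadoop@weekend110 ~]$ start-yarn.sh

Then start Spark: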

[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ sbin/start-spark-all.sh

After Spark has started:

a Master process is started on the configured master node, weekend110;
a Worker process is started on the configured worker node weekend01;
a Worker process is started on the configured worker node weekend02 (see the remote check below).
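
The two Workers can be verified remotely, for example (assuming jps is available on the remote PATH):

[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ ssh weekend01 jps
[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ ssh weekend02 jps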

3. Start the history server:

[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ sbin/start-history-server.sh

4. All of the processes are now running:

[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ jps
3920 ResourceManager
4034 NodeManager
3763 SecondaryNameNode
5187 Master
3588 DataNode
5269 HistoryServer
3452 NameNode
7375 Jps

5. Run the example job that ships with Spark

[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
./examples/jars/spark-examples_2.11-2.4.0.jar \
100
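
Since the standalone cluster is also running, the same example could instead be submitted directly to the Spark master rather than YARN, for example (master URL per the spark-env.sh above):

[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://weekend110:7077 \
./examples/jars/spark-examples_2.11-2.4.0.jar \
100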

6. Check the results

[hadoop@weekend110 spark-2.4.0-bin-hadoop2.7]$ bin/spark-submit \
> --class org.apache.spark.examples.SparkPi \
> --master yarn \
> --deploy-mode client \
> ./examples/jars/spark-examples_2.11-2.4.0.jar \
> 100
20/07/31 02:16:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/07/31 02:16:24 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Pi is roughly 3.1418083141808313

http://192.168.2.100:8088/   (YARN ResourceManager web UI)

http://192.168.2.100:18080/  (Spark History Server web UI)

http://192.168.2.100:50070/  (HDFS NameNode web UI)
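
Besides the web UIs, the results can also be checked from the command line, for example:

[hadoop@weekend110 ~]$ yarn application -list -appStates FINISHED    # the SparkPi run should be listed
[hadoop@weekend110 ~]$ hdfs dfs -ls /test                            # event logs written by the application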
