Setting Up a Spark Standalone Cluster
Spark Standalone mode requires Spark to be installed on every node of the cluster. The cluster roles are assigned as follows:
Node          | Role
centoshadoop1 | Master
centoshadoop2 | Worker
centoshadoop3 | Worker
centoshadoop4 | Worker
1. Download the Scala package
Download it from:
https://www.scala-lang.org/download/2.12.7.html
Run the following commands to install Scala:
mkdir -p /home/hadoop/scala
Extract the scala-2.12.7.tgz package into the scala installation directory:
tar -zxvf ~/tools/scala-2.12.7.tgz -C /home/hadoop/scala/
Configure the Scala environment variables:
vi ~/.bash_profile
# scala
export SCALA_HOME=/home/hadoop/scala/scala-2.12.7
export PATH=$PATH:$SCALA_HOME/bin
Reload the profile to apply the changes:
source ~/.bash_profile
Verify the installation from any directory: scala -version
Scala code runner version 2.12.7 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.
From any directory, run scala to enter the REPL:
scala> val str:String="yanghong"
str: String = yanghong
2. Download Spark
Download it from:
http://spark.apache.org/downloads.html
Under "Choose a Spark release", select your version, 2.4.5.
Under "Choose a package type", select the package pre-built for Hadoop 2.7 and later.
Click the .tgz link next to "Download Spark" to download it.
Run the following commands to install Spark:
mkdir -p /home/hadoop/spark
Extract the spark-2.4.5-bin-hadoop2.7.tgz package into the ~/spark installation directory:
tar -zxvf spark-2.4.5-bin-hadoop2.7.tgz -C /home/hadoop/spark/
Edit the slaves configuration file.
The slaves file must contain the hostname of every Worker node to be started, one hostname per line. Change into the conf directory:
cd /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/conf
cp slaves.template slaves
cp spark-env.sh.template spark-env.sh
vi slaves
centoshadoop2
centoshadoop3
centoshadoop4
The configuration above designates these three nodes as the cluster's Worker nodes.
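The slaves file can also be written in one step with a heredoc; a minimal sketch, assuming the installation paths used in this guide (under the hadoop user's home directory):

```shell
# Write the three Worker hostnames to conf/slaves in one step.
# CONF_DIR defaults to the Spark conf path used in this guide.
CONF_DIR=${CONF_DIR:-$HOME/spark/spark-2.4.5-bin-hadoop2.7/conf}
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/slaves" <<'EOF'
centoshadoop2
centoshadoop3
centoshadoop4
EOF
```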
Edit the spark-env.sh configuration file and add the following:
export JAVA_HOME=/usr/local/java/jdk1.8.0_192
export SPARK_MASTER_IP=centoshadoop1
export SPARK_MASTER_PORT=7077
Explanation of the properties above:
JAVA_HOME: the path to the JDK. If every node in the cluster already sets JAVA_HOME in /etc/profile, this entry can be omitted and Spark will read it automatically at startup; to be safe, it is recommended to set it here anyway.
SPARK_MASTER_IP: the hostname or IP address of the cluster's Master node, here centoshadoop1.
SPARK_MASTER_PORT: the port the Master node listens on; the default is 7077.
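The three settings can likewise be appended to spark-env.sh in one step; a sketch assuming the paths used in this guide (adjust JAVA_HOME to your own JDK location):

```shell
# Append the three spark-env.sh settings from this guide in one step.
# The JAVA_HOME path is the one used in this guide; change it to match
# your own JDK installation.
CONF_DIR=${CONF_DIR:-$HOME/spark/spark-2.4.5-bin-hadoop2.7/conf}
mkdir -p "$CONF_DIR"
cat >> "$CONF_DIR/spark-env.sh" <<'EOF'
export JAVA_HOME=/usr/local/java/jdk1.8.0_192
export SPARK_MASTER_IP=centoshadoop1
export SPARK_MASTER_PORT=7077
EOF
```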
Copy the Spark installation files to the other nodes:
scp -r ~/spark/ hadoop@centoshadoop2:~
scp -r ~/spark/ hadoop@centoshadoop3:~
scp -r ~/spark/ hadoop@centoshadoop4:~
scp -r ~/scala/ hadoop@centoshadoop2:~
scp -r ~/scala/ hadoop@centoshadoop3:~
scp -r ~/scala/ hadoop@centoshadoop4:~
scp ~/.bash_profile hadoop@centoshadoop2:~
scp ~/.bash_profile hadoop@centoshadoop3:~
scp ~/.bash_profile hadoop@centoshadoop4:~
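The copies above can be collapsed into one loop; a sketch using the hostnames and hadoop user from this guide. Each command is printed with echo as a dry run, so the plan can be reviewed first — remove the echo to actually copy:

```shell
# Dry-run sketch: print one scp command per file per Worker node.
# Hostnames and the hadoop user are the ones used in this guide.
workers="centoshadoop2 centoshadoop3 centoshadoop4"
for host in $workers; do
  echo scp -r ~/spark/ "hadoop@$host:~"
  echo scp -r ~/scala/ "hadoop@$host:~"
  echo scp ~/.bash_profile "hadoop@$host:~"
done | tee scp_plan.txt
```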
Start the Spark cluster
On centoshadoop1, change into the Spark installation directory and run the following commands to start the cluster:
cd /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7
sbin/start-all.sh
The startup output looks like this:
starting org.apache.spark.deploy.master.Master, logging to
/home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-centoshadoop1.out
centoshadoop3: starting org.apache.spark.deploy.worker.Worker, logging to
/home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-centoshadoop3.out
centoshadoop2: starting org.apache.spark.deploy.worker.Worker, logging to
/home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-centoshadoop2.out
centoshadoop1: starting org.apache.spark.deploy.worker.Worker, logging to
/home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-centoshadoop1.out
Monitor the logs on each node:
cd /home/hadoop/spark/spark-2.4.5-bin-hadoop2.7/logs/
Master node log:
tail -f spark-hadoop-org.apache.spark.deploy.master.Master-1-centoshadoop1.out
20/03/25 12:51:17 INFO SecurityManager: Changing modify acls to: hadoop
20/03/25 12:51:17 INFO SecurityManager: Changing view acls groups to:
20/03/25 12:51:17 INFO SecurityManager: Changing modify acls groups to:
20/03/25 12:51:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
20/03/25 12:51:17 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
20/03/25 12:51:17 INFO Master: Starting Spark master at spark://centoshadoop1:7077
20/03/25 12:51:17 INFO Master: Running Spark version 2.4.5
20/03/25 12:51:18 INFO Utils: Successfully started service 'MasterUI' on port 8080.
20/03/25 12:51:18 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://centoshadoop1:8080
20/03/25 12:51:18 INFO Master: I have been elected leader! New state: ALIVE
Worker node log:
tail -f spark-hadoop-org.apache.spark.deploy.worker.Worker-1-centoshadoop2.out
If a Worker cannot reach the Master, its log shows an error like:
Caused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: No route to host: centoshadoop1/192.168.227.140:7077
This error is caused by the firewall on the Master node blocking Spark's ports; run the following commands on centoshadoop1 to open them:
firewall-cmd --zone=public --add-port=8080/tcp --permanent
firewall-cmd --zone=public --add-port=7077/tcp --permanent
firewall-cmd --reload
Open http://192.168.227.140:8080/ in a browser to view the Spark web UI.
To avoid problems later, always start the Spark cluster on the node specified by the SPARK_MASTER_IP property in spark-env.sh.
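Once the cluster is up, a quick end-to-end check is to submit the bundled SparkPi example to the Master. The jar name below matches the spark-2.4.5-bin-hadoop2.7 distribution (check your examples/jars directory if it differs); the command is printed rather than executed here so it can be reviewed first:

```shell
# Sketch: build the spark-submit command for the bundled SparkPi example.
# SPARK_HOME defaults to the installation path used in this guide.
SPARK_HOME=${SPARK_HOME:-$HOME/spark/spark-2.4.5-bin-hadoop2.7}
echo "$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://centoshadoop1:7077 \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.5.jar 100" | tee sparkpi_cmd.txt
```

Run the printed command on centoshadoop1 (or `sh sparkpi_cmd.txt`); if the cluster is healthy, the driver output ends with a line like "Pi is roughly 3.14...".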