Building a Spark Standalone Cluster
This guide uses Spark 3.1.2 as the example version.
Unless otherwise noted, run the commands below on every node.
I. System Resources and Component Planning
Node | Hostname | CPU/Memory | NIC | Disk | IP Address | OS |
---|---|---|---|---|---|---|
Master | master | 2C/4G | ens33 | 128G | 192.168.0.10 | CentOS7 |
Worker1 | worker1 | 2C/4G | ens33 | 128G | 192.168.0.11 | CentOS7 |
Worker2 | worker2 | 2C/4G | ens33 | 128G | 192.168.0.12 | CentOS7 |
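If the hostnames in the plan have not been applied yet, one way to set them is hostnamectl, run once on the corresponding node (the names below come from the table above):
hostnamectl set-hostname master    # run on 192.168.0.10
hostnamectl set-hostname worker1   # run on 192.168.0.11
hostnamectl set-hostname worker2   # run on 192.168.0.12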
II. System Software Installation and Configuration
1. Install basic packages
yum -y install vim lrzsz bash-completion
2. Configure name resolution
echo 192.168.0.10 master >> /etc/hosts
echo 192.168.0.11 worker1 >> /etc/hosts
echo 192.168.0.12 worker2 >> /etc/hosts
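A quick way to verify that every name resolves, assuming the entries above are in place:
for host in master worker1 worker2; do getent hosts $host; done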
3. Configure NTP
yum -y install chrony
systemctl start chronyd
systemctl enable chronyd
systemctl status chronyd
chronyc sources
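Optionally, the Worker nodes can sync time from the Master rather than from public pools; a minimal sketch, assuming the stock /etc/chrony.conf (comment out its default server/pool lines on the workers first):
echo 'allow 192.168.0.0/24' >> /etc/chrony.conf    # on master: let the cluster subnet query it
echo 'server master iburst' >> /etc/chrony.conf    # on worker1/worker2: use master as the source
systemctl restart chronyd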
4. Disable SELinux and the firewall
systemctl stop firewalld
systemctl disable firewalld
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
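To confirm both changes took effect:
getenforce                     # prints Permissive now, Disabled after a reboot
systemctl is-active firewalld  # prints inactive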
III. Building the Spark Standalone Cluster
1. Set up passwordless SSH
On the Master node only, configure passwordless SSH to all nodes:
ssh-keygen -t rsa
for host in master worker1 worker2; do ssh-copy-id -i ~/.ssh/id_rsa.pub $host; done
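If an unattended run is preferred, ssh-keygen can be made non-interactive, and a loop can confirm that no hop still asks for a password:
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa    # non-interactive variant of the step above
for host in master worker1 worker2; do ssh -o BatchMode=yes $host hostname; done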
2. Install the JDK
Download the JDK:
Reference: https://www.oracle.com/java/technologies/javase/javase-jdk8-downloads.html
Extract the JDK archive:
tar -xf /root/jdk-8u291-linux-x64.tar.gz -C /usr/local/
Set the environment variables for the current shell:
export JAVA_HOME=/usr/local/jdk1.8.0_291/
export PATH=$PATH:/usr/local/jdk1.8.0_291/bin/
Add the environment variables to /etc/profile so they persist across logins:
export JAVA_HOME=/usr/local/jdk1.8.0_291/
PATH=$PATH:/usr/local/jdk1.8.0_291/bin/
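One way to append these lines and load them into the current shell (the quoted EOF keeps $PATH literal in the file):
cat >> /etc/profile << 'EOF'
export JAVA_HOME=/usr/local/jdk1.8.0_291/
PATH=$PATH:/usr/local/jdk1.8.0_291/bin/
EOF
source /etc/profile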
Check the Java version:
java -version
3. Install Spark
Download Spark:
Reference: http://spark.apache.org/downloads.html
Extract the Spark archive:
tar -zxf /root/spark-3.1.2-bin-hadoop3.2.tgz -C /usr/local/
Set the environment variable for the current shell:
export PATH=$PATH:/usr/local/spark-3.1.2-bin-hadoop3.2/bin/:/usr/local/spark-3.1.2-bin-hadoop3.2/sbin/
Add the environment variable to /etc/profile:
PATH=$PATH:/usr/local/spark-3.1.2-bin-hadoop3.2/bin/:/usr/local/spark-3.1.2-bin-hadoop3.2/sbin/
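As with the JDK, the line can be appended and loaded in one step:
cat >> /etc/profile << 'EOF'
PATH=$PATH:/usr/local/spark-3.1.2-bin-hadoop3.2/bin/:/usr/local/spark-3.1.2-bin-hadoop3.2/sbin/
EOF
source /etc/profile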
4. Configure the Spark Standalone cluster
Create the spark-env.sh file:
cat > /usr/local/spark-3.1.2-bin-hadoop3.2/conf/spark-env.sh << EOF
export JAVA_HOME=/usr/local/jdk1.8.0_291/
SPARK_MASTER_HOST=master
SPARK_MASTER_PORT=7077
EOF
Create the workers file listing the Worker nodes:
cat > /usr/local/spark-3.1.2-bin-hadoop3.2/conf/workers << EOF
worker1
worker2
EOF
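Optionally, spark-env.sh can also cap what each Worker offers to executors; a sketch sized for the 2C/4G nodes in the plan (the values are assumptions, adjust to your hardware):
cat >> /usr/local/spark-3.1.2-bin-hadoop3.2/conf/spark-env.sh << EOF
SPARK_WORKER_CORES=2     # cores each Worker offers to executors
SPARK_WORKER_MEMORY=3g   # memory per Worker, leaving headroom for the OS
EOF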
5. Start the Spark Standalone cluster
Option 1:
On the Master node, start the whole cluster:
start-all.sh
Option 2:
On the Master node, start the Spark Master:
start-master.sh
On each Worker node, start the Spark Worker:
start-worker.sh spark://master:7077
Check the Spark processes on each node (expect a Master process on the master and a Worker process on each worker):
jps
6. Access the Spark Standalone web UIs
Master UI:
http://192.168.0.10:8080
Worker UI:
http://192.168.0.11:8081
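To smoke-test the cluster, the bundled SparkPi example can be submitted from any node; the jar path below assumes the stock spark-3.1.2-bin-hadoop3.2 layout:
spark-submit \
  --master spark://master:7077 \
  --class org.apache.spark.examples.SparkPi \
  /usr/local/spark-3.1.2-bin-hadoop3.2/examples/jars/spark-examples_2.12-3.1.2.jar 100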
7. Stop the Spark Standalone cluster
Option 1:
On the Master node, stop the whole cluster:
stop-all.sh
Option 2:
On each Worker node, stop the Spark Worker:
stop-worker.sh
On the Master node, stop the Spark Master:
stop-master.sh