1. Prerequisites
Before setting up the Spark distributed cluster, first complete the Hadoop distributed cluster setup.
Hadoop distributed cluster setup guide: https://blog.csdn.net/weixin_40595394/article/details/105410543
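Before moving on, it helps to confirm the Hadoop cluster is actually running. A minimal check with jps, assuming a standard HDFS + YARN deployment (process IDs will differ on your machines):
[hadoop@hadoop1 ~]$ jps
# Expect daemons such as NameNode, SecondaryNameNode and ResourceManager on the master
[hadoop@hadoop2 ~]$ jps
# Expect DataNode and NodeManager on each slave node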
2. Install Spark
Spark installation package: https://pan.baidu.com/s/1vGn6KcKAYTKu9P6YcFpajg (extraction code: kt6l)
[hadoop@hadoop1 Downloads]$ tar -zxvf spark-2.4.4-bin-hadoop2.6.tgz
[hadoop@hadoop1 Downloads]$ mv spark-2.4.4-bin-hadoop2.6 spark
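A quick sanity check that the archive unpacked correctly; the renamed directory should contain Spark's standard layout:
[hadoop@hadoop1 Downloads]$ ls spark
# Expect subdirectories such as bin, conf, examples, jars and sbin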
3. Configure environment variables
[hadoop@hadoop1 ~]$ vim ~/.bashrc
export SPARK_HOME=/home/hadoop/Downloads/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
[hadoop@hadoop1 ~]$ source ~/.bashrc
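To verify that the variables took effect in the current shell:
[hadoop@hadoop1 ~]$ echo $SPARK_HOME
/home/hadoop/Downloads/spark
[hadoop@hadoop1 ~]$ which spark-submit
/home/hadoop/Downloads/spark/bin/spark-submit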
4. Configure Spark
The slaves file
[hadoop@hadoop1 conf]$ cp slaves.template slaves
[hadoop@hadoop1 conf]$ vim slaves
# Replace localhost with the hostnames of the slave (worker) nodes
hadoop2
hadoop3
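These hostnames must be resolvable from the master. If you are not running DNS, /etc/hosts on every node should map them to the nodes' IP addresses, along the lines of the entries below (hadoop3's address is a placeholder; use your machines' real IPs):
192.168.159.130 hadoop1
192.168.159.129 hadoop2
192.168.159.131 hadoop3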
The spark-env.sh file
[hadoop@hadoop1 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@hadoop1 conf]$ vim spark-env.sh
# Append the following at the end of the file
# On the master node
export JAVA_HOME=/home/hadoop/Downloads/java/jdk1.8.0_211
export HADOOP_CONF_DIR=/home/hadoop/Downloads/hadoop/hadoop/etc/hadoop
export SPARK_LOCAL_DIRS=/home/hadoop/Downloads/spark
export SPARK_MASTER_IP=192.168.159.130
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_HOST=192.168.159.130
export SPARK_LOCAL_IP=192.168.159.130
export SPARK_HOME=/home/hadoop/Downloads/spark
# On the slave (worker) nodes
export JAVA_HOME=/home/hadoop/Downloads/java/jdk1.8.0_211
export HADOOP_CONF_DIR=/home/hadoop/Downloads/hadoop/hadoop/etc/hadoop
export SPARK_LOCAL_DIRS=/home/hadoop/Downloads/spark
export SPARK_MASTER_IP=192.168.159.130
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_HOST=192.168.159.130
export SPARK_LOCAL_IP=192.168.159.129  # the current slave node's own IP address
export SPARK_HOME=/home/hadoop/Downloads/spark
With configuration complete, copy the spark directory to the slave nodes:
[hadoop@hadoop1 ~]$ scp -r /home/hadoop/Downloads/spark hadoop@hadoop2:/home/hadoop/Downloads/spark
[hadoop@hadoop1 ~]$ scp -r /home/hadoop/Downloads/spark hadoop@hadoop3:/home/hadoop/Downloads/spark
Note: on each slave node, change SPARK_LOCAL_IP in spark-env.sh to that node's own IP address.
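Instead of editing each slave's copy by hand, one option is to patch the value over SSH with sed (hadoop3's IP below is a placeholder; substitute the node's real address):
[hadoop@hadoop1 ~]$ ssh hadoop2 "sed -i 's#^export SPARK_LOCAL_IP=.*#export SPARK_LOCAL_IP=192.168.159.129#' /home/hadoop/Downloads/spark/conf/spark-env.sh"
[hadoop@hadoop1 ~]$ ssh hadoop3 "sed -i 's#^export SPARK_LOCAL_IP=.*#export SPARK_LOCAL_IP=192.168.159.131#' /home/hadoop/Downloads/spark/conf/spark-env.sh"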
5. Start the cluster
Go into Spark's sbin directory, then start the master on hadoop1 and a worker on each slave node:
[hadoop@hadoop1 sbin]$ ./start-master.sh
[hadoop@hadoop2 sbin]$ ./start-slave.sh spark://192.168.159.130:7077
[hadoop@hadoop3 sbin]$ ./start-slave.sh spark://192.168.159.130:7077
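Alternatively, if passwordless SSH is set up from the master to all slaves (as it already is for a Hadoop cluster), the whole cluster can be started from hadoop1 in one step; start-all.sh launches the master locally and a worker on every host listed in conf/slaves:
[hadoop@hadoop1 sbin]$ ./start-all.sh
# ./stop-all.sh shuts the whole cluster down again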
Check whether the Spark cluster has started:
[hadoop@hadoop1 ~]$ jps
# On the master node
3868 Master
# On the slave nodes
3597 Worker
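If a Worker process is missing on a node, the log under the Spark installation's logs directory usually explains why; the file name should follow the pattern spark-<user>-org.apache.spark.deploy.worker.Worker-1-<hostname>.out, for example:
[hadoop@hadoop2 ~]$ tail -n 50 /home/hadoop/Downloads/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoop2.out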
On the master node, open a browser and go to http://192.168.159.130:8080 (the master's IP) to view the Spark cluster details.
The Spark distributed cluster setup is now complete.
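As a final smoke test, submit the bundled SparkPi example to the cluster; the jar path below matches the Spark 2.4.4 / Scala 2.11 binary distribution:
[hadoop@hadoop1 ~]$ spark-submit --class org.apache.spark.examples.SparkPi \
    --master spark://192.168.159.130:7077 \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.4.jar 100
The job should finish and print an approximate value of pi to the console, and the completed application will appear in the master's web UI.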