1. Preface
Earlier parts of this series on building the big-data cluster:
Hadoop+Spark+Zookeeper High-Availability Cluster Setup (Part 1)
Hadoop+Spark+Zookeeper High-Availability Cluster Setup (Part 2)
Hadoop+Spark+Zookeeper High-Availability Cluster Setup (Part 3)
Hadoop+Spark+Zookeeper High-Availability Cluster Setup (Part 4)
This post picks up where the environment preparation left off and builds the Spark cluster that has not been set up yet.
2. Preparation
- Download Scala 2.11.12 (the version referenced in the configuration below) on every node and extract it
- Download Spark 2.4.6 on every node and extract it; the extracted directory is spark-2.4.6-bin-hadoop2.6
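For reference, a rough sketch of the download and extraction on one node, assuming everything lives under /home/hadoop/software and the usual Lightbend / Apache archive URLs (adjust to your mirror and versions):
cd /home/hadoop/software
wget https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz
wget https://archive.apache.org/dist/spark/spark-2.4.6/spark-2.4.6-bin-hadoop2.6.tgz
tar -zxvf scala-2.11.12.tgz
tar -zxvf spark-2.4.6-bin-hadoop2.6.tgz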
3. Configure Spark System Variables
Configure the following on every node.
3.1 Configure the Spark and Scala system variables
vi /etc/profile
#scala
export SCALA_HOME=/home/hadoop/software/scala-2.11.12
export PATH=$PATH:$SCALA_HOME/bin
#spark
export SPARK_HOME=/home/hadoop/software/spark-2.4.6-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin
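After saving, reload the profile so the new variables take effect in the current shell, and optionally confirm the versions:
source /etc/profile
scala -version
spark-submit --version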
3.2 Configure the Spark environment variables
cd /home/hadoop/software/spark-2.4.6-bin-hadoop2.6/conf
cp spark-env.sh.template spark-env.sh
cp slaves.template slaves
vi spark-env.sh
For the high-availability cluster, add the following:
export JAVA_HOME=/home/hadoop/software/jdk1.8.0_212
export SCALA_HOME=/home/hadoop/software/scala-2.11.12
export HADOOP_HOME=/home/hadoop/software/hadoop-2.6.5
export HADOOP_CONF_DIR=/home/hadoop/software/hadoop-2.6.5/etc/hadoop
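# With ZooKeeper-based HA the active master is elected, so a fixed SPARK_MASTER_IP is left commented out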
#export SPARK_MASTER_IP=master001
export SPARK_WORKER_MEMORY=2048M
export SPARK_EXECUTOR_MEMORY=2048M
export SPARK_DIST_CLASSPATH=$(/home/hadoop/software/hadoop-2.6.5/bin/hadoop classpath)
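# ZooKeeper-based master HA: recovery mode, the ZooKeeper ensemble, and the znode that stores recovery state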
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master001:2181,master002:2181,slave001:2181,slave002:2181,slave003:2181 -Dspark.deploy.zookeeper.dir=/spark"
3.3 Configure the Spark worker nodes
vi slaves
Insert:
slave001
slave002
slave003
Distribute the directory to the remaining hosts in the cluster (the scp commands below are run from /home/hadoop/software on master001):
scp -r spark-2.4.6-bin-hadoop2.6 hadoop@master002:/home/hadoop/software/
scp -r spark-2.4.6-bin-hadoop2.6 hadoop@slave001:/home/hadoop/software/
scp -r spark-2.4.6-bin-hadoop2.6 hadoop@slave002:/home/hadoop/software/
scp -r spark-2.4.6-bin-hadoop2.6 hadoop@slave003:/home/hadoop/software/
4. Start the Spark Cluster
4.1 Start the ZooKeeper service on the three slave nodes (from the ZooKeeper bin directory on each):
./zkServer.sh start
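Each node's role can then be checked (one node should report leader, the others follower):
./zkServer.sh status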
4.2 Start the HDFS cluster on master001 (from $HADOOP_HOME/sbin):
./start-dfs.sh
4.3 Start the Spark master on master001 (from $SPARK_HOME/sbin):
./start-master.sh
4.4 Start the Spark master on master002 (also from $SPARK_HOME/sbin):
./start-master.sh
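At this point jps on both master nodes should list a Master process; which one becomes the active master is decided by the ZooKeeper election:
jps    # a Master process should appear on master001 and on master002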
4.5 Start the worker (slave) nodes (run on master001, from $SPARK_HOME/sbin):
./start-slaves.sh
5. Verify the Cluster
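A minimal verification sketch, assuming the default standalone ports (7077 for the masters, 8080 for the web UI) and the examples jar that ships with the Spark 2.4.6 distribution:
jps    # Master should be listed on master001/master002, Worker on slave001-003
# Web UI: the active master reports Status: ALIVE, the other Status: STANDBY
# http://master001:8080  and  http://master002:8080
# Submit the bundled SparkPi example, pointing at both masters
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master001:7077,master002:7077 \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.6.jar 100
Killing the Master process on the active node and watching the standby's web UI switch to ALIVE confirms that the ZooKeeper-based failover is working.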