Spark High Availability (HA)
HA based on a file-system directory
(1) Edit the conf/spark-env.sh configuration file on the Master server
export JAVA_HOME=/usr/local/java/jdk1.8.0_11
export SPARK_MASTER_HOST=hadoop1
export SPARK_MASTER_PORT=7077
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/usr/local/spark/recovery"
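(2) Create the recovery directory (/usr/local/spark/recovery, the path set in spark.deploy.recoveryDirectory)
(3) Start the Spark cluster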
./sbin/start-all.sh
(4) Start spark-shell
./bin/spark-shell --master spark://hadoop1:7077
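Note: if the Master goes down, restart it on that single node with ./sbin/start-master.sh. Spark then reads the files under /usr/local/spark/recovery and restores the state of the registered workers and applications, including the running spark-shell.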
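
The directory and restart steps above can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names prepare_recovery_dir and restart_master are hypothetical, and SPARK_HOME defaulting to /usr/local/spark is inferred from the recovery path used in this section.

```shell
#!/bin/sh
# Sketch only. SPARK_HOME and RECOVERY_DIR defaults are assumptions
# based on the paths used in the steps above; override them as needed.
SPARK_HOME="${SPARK_HOME:-/usr/local/spark}"
RECOVERY_DIR="${RECOVERY_DIR:-$SPARK_HOME/recovery}"

# Step (2): create the recovery directory and confirm that the
# current user can write to it (the Master stores its state there).
prepare_recovery_dir() {
    mkdir -p "$1" && [ -w "$1" ]
}

# After a crash: start the Master again on the same node. With
# recoveryMode=FILESYSTEM it reloads state from the recovery directory.
restart_master() {
    "$SPARK_HOME/sbin/start-master.sh"
}

# Typical use on hadoop1:
#   prepare_recovery_dir "$RECOVERY_DIR" && "$SPARK_HOME/sbin/start-all.sh"
#   ... later, if the Master process has died ...
#   restart_master
```

Checking that the directory is writable before starting the cluster is a design choice of this sketch, since a Master that cannot write its state has nothing to recover from later.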
HA based on ZooKeeper
(1) Edit the conf/spark-env.sh configuration file
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop1:2181,hadoop2:2181,hadoop3:2181 -Dspark.deploy.zookeeper.dir=/spark"
(2) Edit the conf/slaves configuration file
hadoop2
hadoop3
(3) Copy the configuration files to every node over SSH
(4) Start the Spark cluster
./sbin/start-all.sh
(5) On hadoop2, start a second Master on that node alone
./sbin/start-master.sh
Note: while the Master on hadoop1 is working, the Master on hadoop2 stays in standby. If the hadoop1 Master server goes down, the hadoop2 Master takes over and replaces it.
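
For the ZooKeeper setup, the copy step and the client connection can be sketched in shell as follows. The helper names push_conf and master_url are hypothetical, and SPARK_HOME=/usr/local/spark on every node is an assumption. Listing both Masters in one URL (spark://host1:port1,host2:port2) lets a client such as spark-shell find whichever Master is currently active.

```shell
#!/bin/sh
# Sketch only. Assumes Spark is installed under the same SPARK_HOME
# on every node and that passwordless SSH to each host is set up.
SPARK_HOME="${SPARK_HOME:-/usr/local/spark}"

# Step (3): copy the edited configuration files to the given hosts.
push_conf() {
    for host in "$@"; do
        scp "$SPARK_HOME/conf/spark-env.sh" "$SPARK_HOME/conf/slaves" \
            "$host:$SPARK_HOME/conf/"
    done
}

# Build a multi-master URL from a port and a list of Master hosts,
# e.g. spark://hadoop1:7077,hadoop2:7077
master_url() {
    port="$1"; shift
    url=""
    for host in "$@"; do
        url="${url:+$url,}$host:$port"
    done
    printf 'spark://%s\n' "$url"
}

# Typical use:
#   push_conf hadoop2 hadoop3
#   "$SPARK_HOME/bin/spark-shell" --master "$(master_url 7077 hadoop1 hadoop2)"
```

If the copied spark-env.sh still contains SPARK_MASTER_HOST=hadoop1, that line probably needs to be changed or removed on hadoop2 before the second Master is started there.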