The setup is largely the same as the non-HA deployment; the difference is an added ZooKeeper cluster, which Spark uses to achieve high availability (automatic master failover).
1. Environment Preparation
Node Name | Node Role
---|---
master1 | master
master2 | standby master
worker1 | worker, zookeeper
worker2 | worker, zookeeper
worker3 | worker, zookeeper
The ZooKeeper cluster is deployed on worker1, worker2, and worker3; a configuration sketch follows.
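A minimal zoo.cfg sketch for that three-node ensemble. The dataDir path is an assumption; adjust it to your install, and note that each node also needs a myid file under dataDir:

```
# conf/zoo.cfg on worker1-3 (dataDir is an assumed path)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper      # each node also needs $dataDir/myid containing 1, 2, or 3
clientPort=2181
server.1=worker1:2888:3888   # 2888 = quorum port, 3888 = leader-election port
server.2=worker2:2888:3888
server.3=worker3:2888:3888
```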
2. Configuration Files
Edit spark-env.sh and add the line below, which switches master recovery to ZooKeeper and points Spark at the quorum on the three workers:

```bash
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=worker1:2181,worker2:2181,worker3:2181 -Dspark.deploy.zookeeper.dir=/spark/ha"
```
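This spark-env.sh change must be present on both master1 and master2, since either one may become the active master. Also, for start-all.sh on master1 to bring up all three workers, conf/workers (named slaves before Spark 3.0) must list the worker hosts:

```
worker1
worker2
worker3
```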
3. Starting the Services
On master1, start the Spark cluster services:

```bash
sh /data/spark-3.1.3-bin-hadoop3.2/sbin/start-all.sh
```
On master2, start the standby master:

```bash
sh /data/spark-3.1.3-bin-hadoop3.2/sbin/start-master.sh
```
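A quick process check across all nodes, assuming passwordless ssh between them (already required for start-all.sh to work):

```bash
# Expect: Master on master1 and master2; Worker and QuorumPeerMain on worker1-3
for host in master1 master2 worker1 worker2 worker3; do
  echo "== $host =="
  ssh "$host" jps
done
```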
4. Verification
- Visit master1:8081: the master shows as ALIVE.
- Visit master2:8081: the master shows as STANDBY.
- Stop the master service on master1 with stop-master.sh: master2 takes over and switches to ALIVE.
- Restore the master service on master1: the roles do not switch back; master2 stays ALIVE and master1 now shows STANDBY.

The sketch below exercises this failover.
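The failover exercise, using the scripts shipped with Spark (paths as in step 3):

```bash
# On master1: stop the ALIVE master; after a brief recovery period, master2:8081 flips to ALIVE
sh /data/spark-3.1.3-bin-hadoop3.2/sbin/stop-master.sh

# On master1 again: bring the master back; it rejoins as STANDBY, and the roles do not flip back
sh /data/spark-3.1.3-bin-hadoop3.2/sbin/start-master.sh
```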
5. Testing
```bash
bin/spark-sql --master spark://master1:7077 \
  --num-executors 15 \
  --executor-memory 2g \
  --executor-cores 1 \
  --driver-memory 1g \
  --driver-cores 2 \
  --total-executor-cores 14 \
  --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true \
  --conf spark.sql.shuffle.partitions=20
```
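One caveat: pointing the driver at master1 alone means new applications cannot be submitted while master1 is down. Spark standalone accepts a comma-separated master list, so a more failover-friendly submission looks like this sketch (resource flags trimmed for brevity):

```bash
# The driver registers with whichever listed master is currently ALIVE
bin/spark-sql --master spark://master1:7077,master2:7077 \
  --total-executor-cores 14 --executor-memory 2g \
  --conf spark.sql.shuffle.partitions=20
```

A quick `SELECT 1;` inside the spark-sql shell is enough to confirm that executors are being scheduled.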