Spark Standalone HA Configuration

High Availability
By default, standalone scheduling clusters are resilient to Worker failures (insofar as Spark itself is resilient to losing work by moving it to other workers). However, the scheduler uses a Master to make scheduling decisions, and this (by default) creates a single point of failure: if the Master crashes, no new applications can be created. In order to circumvent this, we have two high availability schemes, detailed below.

Standby Masters with ZooKeeper
Overview

Utilizing ZooKeeper to provide leader election and some state storage, you can launch multiple Masters in your cluster connected to the same ZooKeeper instance. One will be elected “leader” and the others will remain in standby mode. If the current leader dies, another Master will be elected, recover the old Master’s state, and then resume scheduling. The entire recovery process (from the time the first leader goes down) should take between 1 and 2 minutes. Note that this delay only affects scheduling new applications – applications that were already running during Master failover are unaffected.

Learn more about getting started with ZooKeeper in the ZooKeeper documentation.

Configuration

In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env using this configuration:

| System property | Meaning |
|---|---|
| spark.deploy.recoveryMode | Set to ZOOKEEPER to enable standby Master recovery mode (default: NONE). |
| spark.deploy.zookeeper.url | The ZooKeeper cluster URL (e.g., 192.168.1.100:2181,192.168.1.101:2181). |
| spark.deploy.zookeeper.dir | The directory in ZooKeeper to store recovery state (default: /spark). |

Possible gotcha: If you have multiple Masters in your cluster but fail to correctly configure the Masters to use ZooKeeper, the Masters will fail to discover each other and think they’re all leaders. This will not lead to a healthy cluster state (as all Masters will schedule independently).
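
As a concrete illustration, a spark-env.sh entry enabling this recovery mode might look like the following sketch; the ZooKeeper hostnames zk1, zk2 and zk3 are placeholders for your own ensemble:

```
# spark-env.sh on every Master node; zk1/zk2/zk3 are placeholder ZooKeeper hosts
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 -Dspark.deploy.zookeeper.dir=/spark"
```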

Details

After you have a ZooKeeper cluster set up, enabling high availability is straightforward. Simply start multiple Master processes on different nodes with the same ZooKeeper configuration (ZooKeeper URL and directory). Masters can be added and removed at any time.

In order to schedule new applications or add Workers to the cluster, they need to know the IP address of the current leader. This can be accomplished by simply passing in a list of Masters where you used to pass in a single one. For example, you might start your SparkContext pointing to spark://host1:port1,host2:port2. This would cause your SparkContext to try registering with both Masters – if host1 goes down, this configuration would still be correct as we’d find the new leader, host2.
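
For example, assuming two Masters on placeholder hosts master1 and master2, an interactive shell could be pointed at both; the driver registers with whichever Master is currently the leader:

```
# master1/master2 are placeholder hostnames; 7077 is the default Master port
$SPARK_HOME/bin/spark-shell --master spark://master1:7077,master2:7077
```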

There’s an important distinction to be made between “registering with a Master” and normal operation. When starting up, an application or Worker needs to be able to find and register with the current lead Master. Once it successfully registers, though, it is “in the system” (i.e., stored in ZooKeeper). If failover occurs, the new leader will contact all previously registered applications and Workers to inform them of the change in leadership, so they need not even have known of the existence of the new Master at startup.

Due to this property, new Masters can be created at any time, and the only thing you need to worry about is that new applications and Workers can find it to register with in case it becomes the leader. Once registered, you’re taken care of.

Single-Node Recovery with Local File System
Overview

ZooKeeper is the best way to go for production-level high availability, but if you just want to be able to restart the Master if it goes down, FILESYSTEM mode can take care of it. When applications and Workers register, they have enough state written to the provided directory so that they can be recovered upon a restart of the Master process.

Configuration

In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env using this configuration:

| System property | Meaning |
|---|---|
| spark.deploy.recoveryMode | Set to FILESYSTEM to enable single-node recovery mode (default: NONE). |
| spark.deploy.recoveryDirectory | The directory in which Spark will store recovery state, accessible from the Master’s perspective. |
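
A minimal spark-env.sh sketch for this mode, assuming a placeholder recovery directory of /var/spark/recovery on the Master node, could be:

```
# /var/spark/recovery is a placeholder path that must exist and be writable by the Master
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/var/spark/recovery"
```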
Details

This solution can be used in tandem with a process monitor/manager like monit, or just to enable manual recovery via restart.
While filesystem recovery seems straightforwardly better than not doing any recovery at all, this mode may be suboptimal for certain development or experimental purposes. In particular, killing a master via stop-master.sh does not clean up its recovery state, so whenever you start a new Master, it will enter recovery mode. This could increase the startup time by up to 1 minute if it needs to wait for all previously-registered Workers/clients to timeout.
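
If a clean start is preferred, one option is to clear the stale state manually before restarting, assuming the placeholder recovery directory from the sketch above:

```
# remove stale recovery state so the new Master starts without entering recovery mode
rm -rf /var/spark/recovery/*
$SPARK_HOME/sbin/start-master.sh
```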
While it’s not officially supported, you could mount an NFS directory as the recovery directory. If the original Master node dies completely, you could then start a Master on a different node, which would correctly recover all previously registered Workers/applications (equivalent to ZooKeeper recovery). Future applications will have to be able to find the new Master, however, in order to register.

The text above is taken from the official documentation. In practice, most production environments run Spark on YARN instead, so treat standalone HA as something to study and experiment with on your own.

The installation steps for Spark Standalone HA are roughly as follows:

1. Configure passwordless SSH between the nodes and install Java and Spark on every node.

2. Configure the Spark environment variables, for example by adding the following to .bashrc:
```
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
```

3. Edit the Spark configuration: create a spark-env.sh file on every node, for example:
```
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=<CPU cores per node>
export SPARK_WORKER_MEMORY=<memory per node>
export SPARK_WORKER_INSTANCES=1
export SPARK_DAEMON_MEMORY=<memory for the Master and Worker daemons>
```
(Do not pin SPARK_MASTER_HOST to a single address when running multiple Masters.)

4. Install and start the ZooKeeper cluster.

5. Enable high availability by adding the following to spark-env.sh on every Master node:
```
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=<ZooKeeper ensemble addresses>:2181 -Dspark.deploy.zookeeper.dir=/spark"
```

6. Start the Spark Master on the primary Master node:
```
$SPARK_HOME/sbin/start-master.sh
```

7. Start one or more standby Masters on other nodes with the same command (if a standby runs on the same host, give it a different web UI port with --webui-port):
```
$SPARK_HOME/sbin/start-master.sh
```

8. Start a Spark Worker on every worker node, passing the full list of Masters:
```
$SPARK_HOME/sbin/start-worker.sh spark://<master1>:7077,<master2>:7077
```

9. Test the failover: stop the Spark Master process on the active node and verify that a standby Master takes over management of the cluster.

These are the installation steps for Spark Standalone HA.
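
To observe the failover in the last step, one simple check (assuming the default Master web UI port 8080 and placeholder hostnames master1/master2) is to query each Master's web UI; its /json endpoint reports the Master's status as ALIVE, STANDBY, or RECOVERING:

```
# the leading Master reports "status" : "ALIVE", standbys report "STANDBY"
curl -s http://master1:8080/json/ | grep '"status"'
curl -s http://master2:8080/json/ | grep '"status"'
```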