[etcd] Raft Theory Study (Part 3): Cluster Membership and Configuration Changes

6 Cluster membership changes

Up until now we have assumed that the cluster configuration (the set of servers participating in the consensus
algorithm) is fixed. In practice, it will occasionally be necessary to change the configuration, for example to replace
servers when they fail or to change the degree of replication. Although this can be done by taking the entire cluster
off-line, updating configuration files, and then restarting the cluster, this would leave the cluster unavailable during
the changeover. In addition, if there are any manual steps, they risk operator error. In order to avoid these issues,
we decided to automate configuration changes and incorporate them into the Raft consensus algorithm.

For the configuration change mechanism to be safe, there must be no point during the transition where it
is possible for two leaders to be elected for the same term. Unfortunately, any approach where servers switch
directly from the old configuration to the new configuration is unsafe. It isn’t possible to atomically switch all of
the servers at once, so the cluster can potentially split into two independent majorities during the transition (see Figure
10).
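
You can see why a direct switch is unsafe by counting majorities. The Python sketch below assumes a hypothetical cluster growing from three servers to five. Servers 1 and 2 still use C_old, while servers 3, 4 and 5 have already adopted C_new. The helper `is_majority` is illustrative only and is not part of any Raft library.

```python
def is_majority(voters: set, config: set) -> bool:
    """True if `voters` contains a strict majority of the servers in `config`."""
    return len(voters & config) > len(config) // 2


# Hypothetical 3 -> 5 server change; which servers have switched is assumed for illustration.
c_old = {1, 2, 3}
c_new = {1, 2, 3, 4, 5}

group_a = {1, 2}     # still using C_old
group_b = {3, 4, 5}  # already using C_new

print(is_majority(group_a, c_old))  # True: 2 of 3
print(is_majority(group_b, c_new))  # True: 3 of 5
print(group_a & group_b)            # set(): the two groups share no server
```

Each group can elect its own leader for the same term. No server votes in both elections, so nothing prevents two leaders in one term.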

In order to ensure safety, configuration changes must use a two-phase approach. There are a variety of ways
to implement the two phases. For example, some systems (e.g., [22]) use the first phase to disable the old configuration
so it cannot process client requests; then the second phase enables the new configuration. In Raft the cluster
first switches to a transitional configuration we call joint consensus; once the joint consensus has been committed,
the system then transitions to the new configuration. The joint consensus combines both the old and new configurations:

• Log entries are replicated to all servers in both configurations.
• Any server from either configuration may serve as leader.
• Agreement (for elections and entry commitment) requires separate majorities from both the old and new configurations.
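
The agreement rule in the last bullet can be written as a predicate. This is a minimal sketch that assumes configurations are plain sets of server IDs. `has_majority` and `joint_agreement` are hypothetical names.

```python
def has_majority(votes: set, config: frozenset) -> bool:
    """True if `votes` includes more than half of the servers in `config`."""
    return len(votes & config) > len(config) // 2


def joint_agreement(votes: set, c_old: frozenset, c_new: frozenset) -> bool:
    """Joint-consensus rule: separate majorities of C_old AND of C_new are required."""
    return has_majority(votes, c_old) and has_majority(votes, c_new)


c_old = frozenset({"a", "b", "c"})
c_new = frozenset({"c", "d", "e"})

print(joint_agreement({"a", "b"}, c_old, c_new))            # False: no majority of C_new
print(joint_agreement({"c", "d", "e"}, c_old, c_new))       # False: no majority of C_old
print(joint_agreement({"a", "b", "c", "d"}, c_old, c_new))  # True: 3/3 of C_old, 2/3 of C_new
```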

The joint consensus allows individual servers to transition between configurations at different times without compromising
safety. Furthermore, joint consensus allows the cluster to continue servicing client requests throughout the configuration change.

Cluster configurations are stored and communicated using special entries in the replicated log; Figure 11 illustrates
the configuration change process. When the leader receives a request to change the configuration from C_old
to C_new, it stores the configuration for joint consensus (C_old,new in the figure) as a log entry and replicates that
entry using the mechanisms described previously. Once a given server adds the new configuration entry to its log,
it uses that configuration for all future decisions (a server always uses the latest configuration in its log, regardless
of whether the entry is committed). This means that the leader will use the rules of C_old,new to determine when the
log entry for C_old,new is committed. If the leader crashes, a new leader may be chosen under either C_old or C_old,new,
depending on whether the winning candidate has received C_old,new. In any case, C_new cannot make unilateral decisions
during this period.
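
The following sketch shows how a server acts on the latest configuration in its log, whether or not that entry is committed. The `Entry` type is an assumption made for illustration, and so is the encoding of C_old,new as a tuple of two voter sets. Neither is the paper's or etcd's actual data structure.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Hypothetical encoding: a configuration is a tuple of voter sets.
# A plain configuration holds one set; the joint configuration C_old,new holds two.
Config = Tuple[FrozenSet[str], ...]


@dataclass(frozen=True)
class Entry:
    term: int
    command: str = ""
    config: Optional[Config] = None  # only set on configuration entries


def current_config(log: list, initial: Config) -> Config:
    """Latest configuration entry in the log, whether or not it is committed."""
    for entry in reversed(log):
        if entry.config is not None:
            return entry.config
    return initial


def is_committed(acks: set, config: Config) -> bool:
    """An entry commits only with a majority of every voter set in `config`."""
    return all(len(acks & group) > len(group) // 2 for group in config)


c_old = frozenset({"a", "b", "c"})
c_new = frozenset({"c", "d", "e"})
log = [Entry(1, "x=1"), Entry(2, config=(c_old, c_new))]  # C_old,new not yet committed

cfg = current_config(log, initial=(c_old,))
print(cfg == (c_old, c_new))                    # True: the uncommitted entry already applies
print(is_committed({"a", "b", "c"}, cfg))       # False: only 1 of 3 in C_new
print(is_committed({"a", "b", "c", "d"}, cfg))  # True: majorities of both sets
```

The leader judges the C_old,new entry by C_old,new's own rules. As a result, acknowledgements from C_old alone are never enough to commit it.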

Once C_old,new has been committed, neither C_old nor C_new can make decisions without approval of the other, and the
Leader Completeness Property ensures that only servers with the C_old,new log entry can be elected as leader. It is
now safe for the leader to create a log entry describing C_new and replicate it to the cluster. Again, this configuration
will take effect on each server as soon as it is seen. When the new configuration has been committed under
the rules of C_new, the old configuration is irrelevant and servers not in the new configuration can be shut down. As
shown in Figure 11, there is no time when C_old and C_new can both make unilateral decisions; this guarantees safety.
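
A small driver loop summarizes the order of the two phases. It is a sketch that assumes each configuration is a tuple of voter sets. It also assumes `acks_for`, a hypothetical map from each configuration entry to the servers that have stored it. Elections and crashes are ignored.

```python
def quorum(acks: set, config: tuple) -> bool:
    """Majority of every voter set in `config` (one set normally, two when joint)."""
    return all(len(acks & group) > len(group) // 2 for group in config)


def reconfigure(c_old: frozenset, c_new: frozenset, acks_for: dict) -> list:
    """Walk the two phases C_old -> C_old,new -> C_new.

    Returns the configuration entries that committed, in order. C_new is only
    considered after C_old,new has committed."""
    committed = []
    for cfg in ((c_old, c_new), (c_new,)):
        # Each entry is judged by its own rules, since it takes effect on append.
        if not quorum(acks_for.get(cfg, set()), cfg):
            break  # stay in the current phase; the next entry is not appended yet
        committed.append(cfg)
    return committed


c_old = frozenset({"a", "b", "c"})
c_new = frozenset({"c", "d", "e"})
joint, final = (c_old, c_new), (c_new,)

print(len(reconfigure(c_old, c_new, {joint: {"a", "b", "c", "d"}, final: {"c", "d", "e"}})))  # 2
print(len(reconfigure(c_old, c_new, {joint: {"a", "b", "c"}})))  # 0: no C_new majority yet
```

The loop stops at the first uncommitted entry. This mirrors the rule that C_new is appended only after C_old,new has committed.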

There are three more issues to address for reconfiguration. The first issue is that new servers may not initially
store any log entries. If they are added to the cluster in this state, it could take quite a while for them to catch
up, during which time it might not be possible to commit new log entries. In order to avoid availability gaps,
Raft introduces an additional phase before the configuration change, in which the new servers join the cluster
as non-voting members (the leader replicates log entries to them, but they are not considered for majorities). Once
the new servers have caught up with the rest of the cluster, the reconfiguration can proceed as described above.
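
In effect, the non-voting phase excludes catching-up servers from the commit calculation until they are close to the leader. The sketch below is only an illustration. `Progress`, `commit_candidate`, and the `max_lag` threshold are hypothetical, and the text does not specify a particular catch-up test.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    match_index: int = 0  # highest log index known to be stored on that server
    voter: bool = True    # False for a non-voting member that is still catching up


def commit_candidate(progress: dict) -> int:
    """Highest index stored on a majority of voters; non-voting members are ignored.
    (Simplified: real Raft also requires the entry to be from the leader's current term.)"""
    matches = sorted((p.match_index for p in progress.values() if p.voter), reverse=True)
    return matches[len(matches) // 2]


def caught_up(p: Progress, leader_last_index: int, max_lag: int) -> bool:
    """Hypothetical catch-up test: within `max_lag` entries of the leader's log."""
    return leader_last_index - p.match_index <= max_lag


progress = {
    "a": Progress(10),              # the leader itself
    "b": Progress(9),
    "c": Progress(3),
    "d": Progress(0, voter=False),  # new server, empty log
}
print(commit_candidate(progress))                                  # 9: "d" does not hold commits back
print(caught_up(progress["d"], leader_last_index=10, max_lag=2))  # False: keep it non-voting
```

Suppose `d` were counted as a voter while its log is still empty. The same calculation would then return 3, which is exactly the availability gap this phase avoids.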

The second issue is that the cluster leader may not be part of the new configuration. In this case, the leader steps
down (returns to follower state) once it has committed the C_new log entry. This means that there will be a period of
time (while it is committing C_new) when the leader is managing a cluster that does not include itself; it replicates log
entries but does not count itself in majorities. The leader transition occurs when C_new is committed because this is
the first point when the new configuration can operate independently (it will always be possible to choose a leader
from C_new). Before this point, it may be the case that only a server from C_old can be elected leader.

The third issue is that removed servers (those not in C_new) can disrupt the cluster. These servers will not receive
heartbeats, so they will time out and start new elections. They will then send RequestVote RPCs with new
term numbers, and this will cause the current leader to revert to follower state. A new leader will eventually be
elected, but the removed servers will time out again and the process will repeat, resulting in poor availability.

To prevent this problem, servers disregard RequestVote RPCs when they believe a current leader exists. Specifically,
if a server receives a RequestVote RPC within the minimum election timeout of hearing from a current leader, it does not update its term or grant its vote. This does not affect normal elections, where each server waits at least a minimum election timeout before starting an election. However, it helps avoid disruptions from removed servers: if a leader is able to get heartbeats to its
cluster, then it will not be deposed by larger term numbers.
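
A follower that records when it last heard from a leader can implement the vote-ignoring rule. This is a minimal illustration with several assumptions. `MIN_ELECTION_TIMEOUT` is a hypothetical value of 150 ms, and the clock is injectable so the example is deterministic. The `log_ok` flag stands in for the log up-to-date check of the election restriction. The sketch answers with a refusal rather than silently dropping the request; that is a design choice of the sketch.

```python
import time
from typing import Callable, Optional, Tuple

MIN_ELECTION_TIMEOUT = 0.150  # seconds; hypothetical value


class Server:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.current_term = 0
        self.voted_for: Optional[str] = None
        self.last_leader_contact: Optional[float] = None
        self.clock = clock

    def on_heartbeat(self, term: int) -> None:
        """AppendEntries from a leader (terms older than ours are ignored)."""
        if term < self.current_term:
            return
        if term > self.current_term:
            self.current_term, self.voted_for = term, None
        self.last_leader_contact = self.clock()

    def on_request_vote(self, term: int, candidate: str, log_ok: bool = True) -> Tuple[int, bool]:
        """Returns (current_term, vote_granted)."""
        if (self.last_leader_contact is not None
                and self.clock() - self.last_leader_contact < MIN_ELECTION_TIMEOUT):
            # A leader was heard from recently: keep the term and refuse the vote.
            return self.current_term, False
        if term > self.current_term:
            self.current_term, self.voted_for = term, None
        granted = (term == self.current_term and log_ok
                   and self.voted_for in (None, candidate))
        if granted:
            self.voted_for = candidate
        return self.current_term, granted


now = [0.0]  # fake clock so the example is deterministic
s = Server(clock=lambda: now[0])
s.on_heartbeat(term=3)            # heard from the leader at t = 0.0
now[0] = 0.05
print(s.on_request_vote(7, "x"))  # (3, False): removed server "x" is ignored
now[0] = 0.50
print(s.on_request_vote(7, "x"))  # (7, True): no leader contact within the minimum timeout
```

A removed server that keeps timing out therefore cannot push a higher term into a cluster whose leader is still delivering heartbeats.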
