by Shubheksha

Understanding the Raft consensus algorithm: an academic article summary

This post summarizes the Raft consensus algorithm presented in the paper In Search of An Understandable Consensus Algorithm by Diego Ongaro and John Ousterhout. All pull quotes are taken from that paper.

Raft:

Raft is a distributed consensus algorithm designed to be easy to understand. It solves the problem of getting multiple servers to agree on a shared state even in the face of failures. The shared state is usually a data structure backed by a replicated log. The system should remain fully operational as long as a majority of the servers are up.

Raft works by electing a leader in the cluster. The leader is responsible for accepting client requests and managing the replication of the log to other servers. The data flows only in one direction: from leader to other servers.

Raft decomposes consensus into three sub-problems:

  • Leader Election: A new leader needs to be elected in case of the failure of an existing one.

  • Log replication: The leader needs to keep the logs of all servers in sync with its own through replication.

  • Safety: If one of the servers has committed a log entry at a particular index, no other server can apply a different log entry for that index.

Basics:

Each server exists in one of the three states: leader, follower, or candidate.

In normal operation there is exactly one leader and all of the other servers are followers. Followers are passive: they issue no requests on their own but simply respond to requests from leaders and candidates. The leader handles all client requests (if a client contacts a follower, the follower redirects it to the leader). The third state, candidate, is used to elect a new leader.

Raft divides time into terms of arbitrary length, each beginning with an election. If a candidate wins the election, it remains the leader for the rest of the term. If the vote is split, then that term ends without a leader.

The term number increases monotonically. Each server stores the current term number which is also exchanged in every communication.

.. if one server’s current term is smaller than the other’s, then it updates its current term to the larger value. If a candidate or leader discovers that its term is out of date, it immediately reverts to follower state. If a server receives a request with a stale term number, it rejects the request.
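
The term rules in the quote above can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the `Server` class, its `state` strings, and the `on_message` method are names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    current_term: int = 0
    state: str = "follower"  # "follower", "candidate", or "leader" (hypothetical labels)

    def on_message(self, msg_term: int) -> bool:
        """Apply the term rules to an incoming request.

        Returns False when the request carries a stale term and is rejected.
        """
        if msg_term > self.current_term:
            # Our term is out of date: adopt the larger term and step down.
            self.current_term = msg_term
            self.state = "follower"
            return True
        if msg_term < self.current_term:
            # The sender is behind: reject its request.
            return False
        return True
```

A leader at term 3 that receives a message with term 5 ends up as a follower at term 5, and it then rejects any later request that still carries term 4.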

Raft makes use of two remote procedure calls (RPCs) to carry out its basic operation.

  • RequestVote is used by candidates during elections

  • AppendEntries is used by leaders to replicate log entries and also as a heartbeat (an AppendEntries call that carries no log entries, sent to signal that the leader is still up)

Leader election:

The leader periodically sends a heartbeat to its followers to maintain authority. A leader election is triggered when a follower times out while waiting for a heartbeat from the leader. This follower transitions to the candidate state and increments its term number. After voting for itself, it issues RequestVote RPCs in parallel to the other servers in the cluster. Three outcomes are possible:

  1. The candidate receives votes from the majority of the servers and becomes the leader. It then sends a heartbeat message to others in the cluster to establish authority.

  2. The candidate receives an AppendEntries RPC from another server claiming to be the leader and checks the term number it carries. If that term is at least as large as the candidate’s own, the candidate accepts the sender as the leader and returns to follower state. If the term is smaller, the candidate rejects the RPC and remains a candidate.

  3. The candidate neither loses nor wins. If more than one server becomes a candidate at the same time, the vote can be split with no clear majority. In this case a new election begins after one of the candidates times out.
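
The three outcomes can be sketched as a single decision function. This is a simplified illustration: the function names and the string return values are invented here, and the rule that a leader is accepted when its term is at least as large as the candidate's own follows the paper.

```python
from typing import Optional


def has_majority(votes: int, cluster_size: int) -> bool:
    """True when `votes` is strictly more than half of the cluster."""
    return votes > cluster_size // 2


def election_outcome(votes_for_me: int, cluster_size: int, my_term: int,
                     leader_term: Optional[int] = None) -> str:
    """Return the state a candidate ends up in (hypothetical labels).

    `leader_term` is the term of an AppendEntries RPC received from a server
    claiming to be leader, or None if no such RPC arrived.
    """
    if leader_term is not None and leader_term >= my_term:
        return "follower"   # outcome 2: another server legitimately won
    if has_majority(votes_for_me, cluster_size):
        return "leader"     # outcome 1: this candidate won
    return "candidate"      # outcome 3: no winner yet; a timeout starts a new election
```

In a five-server cluster, three votes (including the candidate's own) are enough to win, while two votes leave the candidate waiting for its timeout.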

Raft uses randomized election timeouts to ensure that split votes are rare and that they are resolved quickly. To prevent split votes in the first place, election timeouts are chosen randomly from a fixed interval (e.g., 150–300ms). This spreads out the servers so that in most cases only a single server will time out; it wins the election and sends heartbeats before any other servers time out. The same mechanism is used to handle split votes. Each candidate restarts its randomized election timeout at the start of an election, and it waits for that timeout to elapse before starting the next election; this reduces the likelihood of another split vote in the new election.
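
A small sketch of how randomized timeouts spread servers out. The 150 to 300 ms interval is the example from the quote; the function names and the seeded `random.Random` instance are assumptions made for this illustration.

```python
import random
from typing import Dict

ELECTION_TIMEOUT_MS = (150, 300)  # example interval quoted from the paper


def new_election_timeout(rng: random.Random) -> float:
    """Pick a fresh timeout uniformly from the fixed interval."""
    low, high = ELECTION_TIMEOUT_MS
    return rng.uniform(low, high)


def first_to_time_out(timeouts: Dict[str, float]) -> str:
    """The server with the smallest timeout becomes a candidate first."""
    return min(timeouts, key=timeouts.get)


rng = random.Random(42)  # seeded only so the sketch is repeatable
timeouts = {name: new_election_timeout(rng) for name in ("s1", "s2", "s3")}
candidate = first_to_time_out(timeouts)
```

Because each server draws its own value, two servers timing out at the same instant is unlikely, and a candidate that hits a split vote simply draws a new timeout before trying again.
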
Log Replication:

The client requests are assumed to be write-only for now. Each request consists of a command to be executed ideally by the replicated state machines of all the servers. When a leader gets a client request, it adds it to its own log as a new entry. Each entry in a log:

  • Contains the client-specified command

  • Has an index to identify the position of entry in the log (the index starts from 1)

  • Has a term number to logically identify when the entry was written
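
The three fields of a log entry map naturally onto a small record type. The `LogEntry` class and `append_command` helper below are hypothetical names used only to illustrate the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LogEntry:
    index: int    # position in the log, starting from 1
    term: int     # term in which the leader created the entry
    command: str  # client-specified command for the state machine


def append_command(log: List[LogEntry], term: int, command: str) -> LogEntry:
    """Leader-side step: add a client command to the end of the leader's own log."""
    entry = LogEntry(index=len(log) + 1, term=term, command=command)
    log.append(entry)
    return entry
```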


The leader needs to replicate the entry to all the follower nodes in order to keep the logs consistent. It issues AppendEntries RPCs to all other servers in parallel and retries them until all followers have safely replicated the new entry.

When the entry is replicated to a majority of servers by the leader that created it, it is considered committed. All the previous entries, including those created by earlier leaders, are also considered committed. The leader executes the entry once it is committed and returns the result to the client.
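
One way a leader might work out how far it can commit is sketched here. The names `match_index` (the highest entry known to be stored on each follower) and `advance_commit_index` are assumptions for this example. The check on the entry's term reflects the condition above that an entry is committed once the leader that created it has replicated it to a majority.

```python
from typing import Dict, List


def advance_commit_index(log_terms: List[int], match_index: Dict[str, int],
                         commit_index: int, current_term: int) -> int:
    """Return the new commit index for the leader.

    log_terms[i - 1] is the term of the entry at index i (indices start at 1).
    match_index maps each follower to the highest index it is known to store.
    """
    cluster_size = len(match_index) + 1  # the followers plus the leader itself
    for n in range(len(log_terms), commit_index, -1):
        # Count the leader plus every follower that already stores entry n.
        replicated = 1 + sum(1 for m in match_index.values() if m >= n)
        if replicated > cluster_size // 2 and log_terms[n - 1] == current_term:
            return n  # entry n and everything before it is committed
    return commit_index
```

With a five-server cluster where two followers hold entry 3, the leader plus those two form a majority, so entry 3 (and entries 1 and 2 before it) can be treated as committed.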

The leader keeps track of the highest index it knows to be committed in its log and sends it out with the AppendEntries RPCs to its followers. Once a follower learns that an entry has been committed, it applies the entry to its state machine, in log order.

Raft maintains the following properties, which together constitute the Log Matching Property
• If two entries in different logs have the same index and term, then they store the same command.
• If two entries in different logs have the same index and term, then the logs are identical in all preceding entries.

When sending an AppendEntries RPC, the leader includes the term number and index of the entry that immediately precedes the new entry. If the follower cannot find a match for this entry in its own log, it rejects the request to append the new entry.

This consistency check lets the leader conclude that whenever AppendEntries returns successfully from a follower, the follower’s log is identical to its own up to the index included in the RPC.
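
A follower-side sketch of the consistency check. The function signature and the `(term, command)` tuple layout are assumptions for this example rather than the paper's exact RPC definition. The sketch also drops conflicting entries before appending, which is how the paper lets the leader's log win.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command); the layout is an assumption of this sketch


def append_entries(log: List[Entry], prev_index: int, prev_term: int,
                   entries: List[Entry]) -> bool:
    """Handle AppendEntries on a follower. Indices start at 1; prev_index 0 means
    the new entries go at the very start of the log."""
    if prev_index > len(log):
        return False  # the follower has no entry at prev_index
    if prev_index > 0 and log[prev_index - 1][0] != prev_term:
        return False  # an entry exists at prev_index but its term differs
    for offset, entry in enumerate(entries):
        pos = prev_index + offset  # 0-based slot where this entry belongs
        if pos < len(log) and log[pos][0] != entry[0]:
            del log[pos:]  # conflict: drop this entry and everything after it
        if pos >= len(log):
            log.append(entry)
    return True
```

Repeating the same successful call leaves the log unchanged, which matters because the leader retries RPCs.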

But the logs of leaders and followers may become inconsistent in the face of leader crashes.

In Raft, the leader handles inconsistencies by forcing the followers’ logs to duplicate its own. This means that conflicting entries in follower logs will be overwritten with entries from the leader’s log.

The leader tries to find the last index where its log matches that of the follower, deletes extra entries if any, and adds the new ones.

The leader maintains a nextIndex for each follower, which is the index of the next log entry the leader will send to that follower. When a leader first comes to power, it initializes all nextIndex values to the index just after the last one in its log.

Whenever AppendEntries returns with a failure for a follower, the leader decrements that follower’s nextIndex and issues another AppendEntries RPC. Eventually, nextIndex reaches a point where the two logs match. AppendEntries succeeds at that point, and the follower removes any extraneous entries and appends the new ones from the leader’s log (if any). Hence, a successful AppendEntries response from a follower guarantees that the follower’s log is consistent with the leader’s.
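
The back-off on nextIndex can be simulated in memory. This sketch compares two lists of entry terms directly instead of sending RPCs, and the function names are invented for the example.

```python
from typing import List


def find_next_index(leader_terms: List[int], follower_terms: List[int]) -> int:
    """Return the nextIndex at which the consistency check first succeeds.

    Each list holds the term of every entry; the entry at index i sits at
    position i - 1.
    """
    next_index = len(leader_terms) + 1  # initial value when a leader takes over
    while next_index > 1:
        prev_index = next_index - 1
        prev_matches = (prev_index <= len(follower_terms)
                        and follower_terms[prev_index - 1] == leader_terms[prev_index - 1])
        if prev_matches:
            break
        next_index -= 1  # check failed: back up one entry and retry
    return next_index


def repair(leader_terms: List[int], follower_terms: List[int]) -> List[int]:
    """Keep the follower's matching prefix and replace the rest with the leader's entries."""
    next_index = find_next_index(leader_terms, follower_terms)
    return follower_terms[:next_index - 1] + leader_terms[next_index - 1:]
```

For a leader log with terms `[1, 1, 2, 3, 3]` and a follower log with terms `[1, 1, 2, 2, 2, 2]`, the loop settles on nextIndex 4, and the repaired follower log equals the leader's.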

With this mechanism, a leader does not need to take any special actions to restore log consistency when it comes to power. It just begins normal operation, and the logs automatically converge in response to failures of the AppendEntries consistency check. A leader never overwrites or deletes entries in its own log.
Safety:

Raft makes sure that the leader for a term has, in its log, all entries committed in previous terms. This is needed to ensure that all logs are consistent and the state machines execute the same set of commands.

During a leader election, the RequestVote RPC includes information about the candidate’s log. If the voter finds that its own log is more up-to-date than the candidate’s, it doesn’t vote for that candidate.

Raft determines which of two logs is more up-to-date by comparing the index and term of the last entries in the logs. If the logs have last entries with different terms, then the log with the later term is more up-to-date. If the logs end with the same term, then whichever log is longer is more up-to-date.
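
The comparison in the quote above translates almost directly into code. The function and parameter names are assumptions for this sketch.

```python
def candidate_log_ok(cand_last_term: int, cand_last_index: int,
                     my_last_term: int, my_last_index: int) -> bool:
    """True if the candidate's log is at least as up-to-date as the voter's,
    so the voter is allowed to grant its vote."""
    if cand_last_term != my_last_term:
        # Different last terms: the log with the later term is more up-to-date.
        return cand_last_term > my_last_term
    # Same last term: the longer log is more up-to-date.
    return cand_last_index >= my_last_index
```

A candidate whose last entry is from term 3 wins the comparison against a voter whose last entry is from term 2, even if the voter's log is longer.
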
Cluster membership:
For the configuration change mechanism to be safe, there must be no point during the transition where it is possible for two leaders to be elected for the same term. Unfortunately, any approach where servers switch directly from the old configuration to the new configuration is unsafe.

Raft uses a two-phase approach for altering cluster membership. First, it switches to an intermediate configuration called joint consensus. Then, once that is committed, it switches over to the new configuration.

The joint consensus allows individual servers to transition between configurations at different times without compromising safety. Furthermore, joint consensus allows the cluster to continue servicing client requests throughout the configuration change.

Joint consensus combines the new and old configurations as follows:

  • Log entries are replicated to all servers in both the configurations

  • Any server from old or new can become the leader

  • Agreement requires separate majorities from both old and new configurations
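
The "separate majorities" rule can be sketched with plain sets. The server names and the helper functions below are illustrative assumptions.

```python
from typing import Set


def majority_of(acks: Set[str], config: Set[str]) -> bool:
    """True if more than half of the servers in `config` have acknowledged."""
    return len(acks & config) > len(config) // 2


def joint_agreement(acks: Set[str], old: Set[str], new: Set[str]) -> bool:
    """During joint consensus, agreement needs a majority of the old
    configuration and, separately, a majority of the new one."""
    return majority_of(acks, old) and majority_of(acks, new)
```

With old = {a, b, c} and new = {c, d, e}, acknowledgements from a, b and c are not enough (only one server of the new configuration agreed), while acknowledgements from a, c and d satisfy both majorities.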


When a leader receives a configuration change message, it stores and replicates the entry for joint consensus C<old, new>. A server always uses the latest configuration in its log to make decisions, even if that configuration isn’t committed. Once C<old, new> is committed, only servers with C<old, new> in their logs can become leaders.

It is now safe for the leader to create a log entry describing C<new> and replicate it to the cluster. Again, this configuration will take effect on each server as soon as it is seen. When the new configuration has been committed under the rules of C<new>, the old configuration is irrelevant and servers not in the new configuration can be shut down.

A fantastic visualization of how Raft works can be found here.

More material such as talks, presentations, related papers and open-source implementations can be found here.

I have only dug into the details of the basic algorithm that makes up Raft and the safety guarantees it provides. The paper contains a lot more detail, and it is super approachable because the authors’ primary goal was understandability. I definitely recommend you read it even if you’ve never read any other paper before.

Translated from: https://www.freecodecamp.org/news/in-search-of-an-understandable-consensus-algorithm-a-summary-4bc294c97e0d/
