Raft Paper Translation (5.0): The Raft Consensus Algorithm

 The Raft consensus algorithm

        Raft is an algorithm for managing a replicated log of the form described in Section 2. Figure 2 summarizes the algorithm in condensed form for reference, and Figure 3 lists key properties of the algorithm; the elements of these figures are discussed piecewise over the rest of this section.

        Raft implements consensus by first electing a distinguished leader, then giving the leader complete responsibility for managing the replicated log. The leader accepts log entries from clients, replicates them on other servers, and tells servers when it is safe to apply log entries to their state machines. Having a leader simplifies the management of the replicated log. For example, the leader can decide where to place new entries in the log without consulting other servers, and data flows in a simple fashion from the leader to other servers. A leader can fail or become disconnected from the other servers, in which case a new leader is elected.

        Given the leader approach, Raft decomposes the consensus problem into three relatively independent subproblems, which are discussed in the subsections that follow:

  • Leader election: a new leader must be chosen when an existing leader fails (Section 5.2).
  • Log replication: the leader must accept log entries from clients and replicate them across the cluster, forcing the other logs to agree with its own (Section 5.3).
  • Safety: the key safety property for Raft is the State Machine Safety Property in Figure 3: if any server has applied a particular log entry to its state machine, then no other server may apply a different command for the same log index. Section 5.4 describes how Raft ensures this property; the solution involves an additional restriction on the election mechanism described in Section 5.2.

        After presenting the consensus algorithm, this section discusses the issue of availability and the role of timing in the system.


Figure 2: A condensed summary of the Raft consensus algorithm (excluding membership changes and log compaction). The server behavior in the upper-left box is described as a set of rules that trigger independently and repeatedly. Section numbers such as §5.2 indicate where particular features are discussed. A formal specification [31] describes the algorithm more precisely.

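State:

Persistent state on all servers:

(Updated on stable storage before responding to RPCs)

  • currentTerm: latest term the server has seen (initialized to 0 on first boot, increases monotonically)
  • votedFor: candidateId that received this server's vote in the current term (or null if none)
  • log[]: log entries; each entry contains a command for the state machine and the term when the entry was received by the leader (first index is 1)

Volatile state on all servers:

  • commitIndex: index of the highest log entry known to be committed (initialized to 0, increases monotonically)
  • lastApplied: index of the highest log entry applied to the state machine (initialized to 0, increases monotonically)

Volatile state on leaders:

(Reinitialized after election)

  • nextIndex[]: for each server, index of the next log entry to send to that server (initialized to the leader's last log index + 1)
  • matchIndex[]: for each server, index of the highest log entry known to be replicated on that server (initialized to 0, increases monotonically)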
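
To make the state listed above concrete, here is a minimal Python sketch of how it might be laid out. The names `LogEntry`, `RaftState`, `term_at`, and `become_leader` are illustrative assumptions, not part of the paper, and persistence to stable storage is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LogEntry:
    term: int      # term in which the leader received the entry
    command: str   # opaque state machine command (hypothetical encoding)


@dataclass
class RaftState:
    # Persistent state: a real server must write these to stable
    # storage before responding to an RPC (not modeled here).
    current_term: int = 0
    voted_for: Optional[int] = None
    log: List[LogEntry] = field(default_factory=list)  # log[0] is index 1

    # Volatile state on all servers.
    commit_index: int = 0
    last_applied: int = 0

    # Volatile state on leaders, rebuilt after every election win.
    next_index: Dict[int, int] = field(default_factory=dict)
    match_index: Dict[int, int] = field(default_factory=dict)

    def last_log_index(self) -> int:
        return len(self.log)

    def term_at(self, index: int) -> int:
        """Term of the entry at a 1-based index; 0 stands for 'no entry'."""
        return self.log[index - 1].term if index > 0 else 0

    def become_leader(self, peer_ids: List[int]) -> None:
        """Reinitialize the leader-only state after winning an election."""
        self.next_index = {p: self.last_log_index() + 1 for p in peer_ids}
        self.match_index = {p: 0 for p in peer_ids}
```

Because the paper's log starts at index 1, the sketch maps log index i to list position i - 1 and uses 0 to mean "before the first entry".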

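RequestVote RPC:

Arguments:

  • term: candidate's term
  • candidateId: ID of the candidate requesting the vote
  • lastLogIndex: index of the candidate's last log entry
  • lastLogTerm: term of the candidate's last log entry

Results: (what the candidate receives in reply)

  • term: the receiver's currentTerm, for the candidate to update itself
  • voteGranted: true means the candidate received the vote

Receiver implementation:

  • 1. Reply false if the request's term < the receiver's currentTerm
  • 2. If votedFor is null or equals candidateId, and the candidate's log is at least as up-to-date as the receiver's log, grant the vote

(§5.2, §5.4)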
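
The two receiver rules can be sketched as a single handler. This is a rough illustration under a few assumptions: `Server` and `handle_request_vote` are made-up names, the log is reduced to a list of terms, and "at least as up-to-date" is implemented as the comparison from §5.4.1 of the Raft paper (later last term wins; equal last terms are decided by log length), which this summary does not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Server:
    current_term: int = 0
    voted_for: Optional[int] = None
    log_terms: List[int] = field(default_factory=list)  # term per entry


def handle_request_vote(server: Server, term: int, candidate_id: int,
                        last_log_index: int, last_log_term: int) -> Tuple[int, bool]:
    """Return (currentTerm, voteGranted) for one RequestVote call."""
    # Rule 1: reject candidates from an older term.
    if term < server.current_term:
        return server.current_term, False

    # "All servers" rule: a newer term replaces ours and clears our vote.
    if term > server.current_term:
        server.current_term = term
        server.voted_for = None

    my_last_index = len(server.log_terms)
    my_last_term = server.log_terms[-1] if server.log_terms else 0

    # Assumed up-to-date check: compare (last term, last index) pairs.
    log_ok = (last_log_term, last_log_index) >= (my_last_term, my_last_index)

    # Rule 2: one vote per term, and only for a sufficiently fresh log.
    if server.voted_for in (None, candidate_id) and log_ok:
        server.voted_for = candidate_id
        return server.current_term, True
    return server.current_term, False
```

Note that a repeated request from the same candidate in the same term is granted again, which keeps the handler safe under retransmitted RPCs.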

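AppendEntries RPC: (sent by the leader to followers)

Arguments:

  • term: leader's term
  • leaderId: the leader's ID, so that followers can redirect clients
  • prevLogIndex: index of the log entry immediately preceding the new ones
  • prevLogTerm: term of the prevLogIndex entry
  • entries[]: log entries to store (empty for heartbeats)
  • leaderCommit: leader's commitIndex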
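
As an illustration of how these arguments relate to the leader's own log, the sketch below builds one AppendEntries request for a follower from that follower's nextIndex value. `AppendEntriesArgs` and `build_append_entries` are hypothetical names, entries are simplified to `(term, command)` tuples, and the follower side is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

Entry = Tuple[int, str]  # (term, command), a simplified log entry


@dataclass
class AppendEntriesArgs:
    term: int
    leader_id: int
    prev_log_index: int
    prev_log_term: int
    entries: List[Entry]
    leader_commit: int


def build_append_entries(term: int, leader_id: int, log: List[Entry],
                         next_index: int, commit_index: int) -> AppendEntriesArgs:
    """Build the request for a follower whose next expected index is next_index."""
    prev_index = next_index - 1
    # Log index i lives at list position i - 1; index 0 means "no entry".
    prev_term = log[prev_index - 1][0] if prev_index > 0 else 0
    # Everything from next_index onward is new to this follower.
    # An empty slice makes the request a heartbeat.
    return AppendEntriesArgs(term, leader_id, prev_index, prev_term,
                             log[prev_index:], commit_index)
```

For a follower that is fully caught up, next_index is one past the end of the log, so the same function yields an empty `entries` list, i.e. a heartbeat.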

Rules for Servers:

All Servers:

  • If commitIndex > lastApplied: increment lastApplied and apply log[lastApplied] to the state machine (§5.3)
  • If an RPC request or response contains term T > currentTerm: set currentTerm = T and convert to follower
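
Both rules for all servers are small enough to sketch directly. The dictionary layout and the names `apply_committed` and `observe_term` are assumptions made for this example; clearing `voted_for` on a term change is not stated in the rule itself but follows from votedFor being tracked per term.

```python
from typing import Any, Callable, Dict


def apply_committed(state: Dict[str, Any], apply: Callable[[Any], None]) -> None:
    """Apply every committed but not yet applied entry, in log order."""
    while state["commit_index"] > state["last_applied"]:
        state["last_applied"] += 1
        # Log index i is stored at list position i - 1.
        apply(state["log"][state["last_applied"] - 1])


def observe_term(state: Dict[str, Any], rpc_term: int) -> bool:
    """Step down if an RPC request or response carries a newer term.

    Returns True when the server adopted the newer term.
    """
    if rpc_term > state["current_term"]:
        state["current_term"] = rpc_term
        state["voted_for"] = None   # assumption: the vote belongs to the old term
        state["role"] = "follower"
        return True
    return False
```

Calling `observe_term` at the top of every RPC handler and every response handler is one simple way to make sure the second rule is never skipped.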

Followers:

  • Respond to RPCs from candidates and leaders
  • If the election timeout elapses without receiving an AppendEntries RPC from the current leader or granting a vote to a candidate: convert to candidate
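
The follower's timeout rule can be sketched with a small timer object. `ElectionTimer` is a hypothetical helper; the 150 to 300 ms range is the example interval the Raft paper gives in §5.2, used here only as a default, and the caller passes the current time in so that the logic stays testable.

```python
import random


class ElectionTimer:
    """Randomized election deadline; times are in seconds."""

    def __init__(self, now: float, low: float = 0.150, high: float = 0.300,
                 rng=random):
        self.low, self.high, self.rng = low, high, rng
        self.reset(now)

    def reset(self, now: float) -> None:
        # Call on a valid AppendEntries from the current leader
        # or after granting a vote to a candidate.
        self.deadline = now + self.rng.uniform(self.low, self.high)

    def expired(self, now: float) -> bool:
        # When this returns True, the follower converts to candidate.
        return now >= self.deadline
```

A follower loop would call `reset` from its RPC handlers and poll `expired` with a monotonic clock such as `time.monotonic()`.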

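Candidates:

  • On conversion to candidate, start an election:
    • Increment currentTerm
    • Vote for self
    • Reset the election timer
    • Send RequestVote RPCs to all other servers
  • If votes are received from a majority of servers: become leader
  • If an AppendEntries RPC is received from the new leader: convert to follower
  • If the election timeout elapses: start a new election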
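
The candidate steps can be sketched as one function. This is a deliberately simplified, synchronous version: a real implementation sends RequestVote RPCs in parallel and reacts to replies, AppendEntries messages, and timeouts as they arrive. `start_election`, the `request_vote` callback (returning `(term, vote_granted)`), and the `reset_timer` callback are all assumptions of this example.

```python
from typing import Any, Callable, Dict, List, Tuple


def start_election(state: Dict[str, Any], peer_ids: List[int],
                   request_vote: Callable[[int, int, int], Tuple[int, bool]],
                   reset_timer: Callable[[], None]) -> bool:
    """Run one election round; return True if this server became leader."""
    state["role"] = "candidate"
    state["current_term"] += 1          # increment currentTerm
    state["voted_for"] = state["id"]    # vote for self
    reset_timer()                       # reset the election timer
    votes = 1
    cluster_size = len(peer_ids) + 1

    for peer in peer_ids:
        term, granted = request_vote(peer, state["current_term"], state["id"])
        if term > state["current_term"]:
            # A newer term exists: step down immediately.
            state["current_term"] = term
            state["voted_for"] = None
            state["role"] = "follower"
            return False
        if granted:
            votes += 1

    if votes * 2 > cluster_size:        # strict majority of the whole cluster
        state["role"] = "leader"
        return True
    return False                        # stay candidate until the next timeout
```

The majority is counted against the full cluster size, including the candidate itself, so a five-server cluster needs three votes.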

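Leaders:

  • Upon election: send initial empty AppendEntries RPCs (heartbeats) to each server; repeat during idle periods to prevent election timeouts
  • If a command is received from a client: append the entry to the local log, and respond to the client after the entry has been applied to the state machine (§5.3)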
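
The point of the second leader rule is that the client is answered only after the entry has been applied, not when it is appended. The toy `Leader` class below is an assumption-heavy sketch of that ordering: replication and commitment are not modeled, and `on_applied` stands in for whatever mechanism reports that an index has been applied to the state machine.

```python
from typing import Any, Callable, Dict, List, Tuple


class Leader:
    def __init__(self, term: int):
        self.term = term
        self.log: List[Tuple[int, str]] = []   # (term, command) entries
        self.last_applied = 0
        self.pending: Dict[int, Callable[[Any], None]] = {}

    def heartbeat(self) -> Dict[str, Any]:
        # An AppendEntries request with no entries serves as a heartbeat.
        return {"term": self.term, "entries": []}

    def on_client_command(self, command: str,
                          reply: Callable[[Any], None]) -> int:
        """Append to the local log and remember whom to answer later."""
        self.log.append((self.term, command))
        index = len(self.log)
        self.pending[index] = reply
        return index

    def on_applied(self, index: int, result: Any) -> None:
        """Called once the entry at index was applied to the state machine."""
        self.last_applied = index
        reply = self.pending.pop(index, None)
        if reply is not None:
            reply(result)
```

Keeping the reply callbacks keyed by log index makes it easy to answer clients in the order their entries are applied.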


Figure 3: Raft guarantees that each of these properties is true at all times. The section numbers indicate where each property is discussed.

  • Election Safety: at most one leader can be elected in a given term. §5.2
  • Leader Append-Only: a leader never overwrites or deletes entries in its log; it only appends new entries. §5.3
  • Log Matching: if two logs contain an entry with the same index and term, then the logs are identical in all entries up through the given index. §5.3
  • Leader Completeness: if a log entry is committed in a given term, then that entry will be present in the logs of the leaders for all higher-numbered terms. §5.4
  • State Machine Safety: if a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index. §5.4.3
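
Two of the Figure 3 properties are easy to express as checks over recorded data, which can be handy in tests of a Raft implementation. The functions below are a sketch under assumed data shapes: logs are lists of `(term, command)` tuples, and leadership is recorded as `(term, leader_id)` observations. Neither helper comes from the paper.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, str]  # (term, command)


def log_matching_holds(log_a: List[Entry], log_b: List[Entry]) -> bool:
    """Check the Log Matching property for a pair of logs."""
    shared = min(len(log_a), len(log_b))
    # Find the highest index where both logs hold an entry of the same term.
    for i in range(shared, 0, -1):
        if log_a[i - 1][0] == log_b[i - 1][0]:
            # All entries up through index i must then be identical.
            # Lower matching indexes are covered by this same prefix.
            return log_a[:i] == log_b[:i]
    return True  # no index with matching terms, nothing to violate


def election_safety_holds(observations: List[Tuple[int, str]]) -> bool:
    """Check that no term was ever observed with two different leaders."""
    leader_of: Dict[int, str] = {}
    for term, leader in observations:
        if leader_of.setdefault(term, leader) != leader:
            return False
    return True
```

Checking only the highest matching index is enough because an identical prefix up to that index implies identical prefixes at every lower index.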
