5.4.2 Committing entries from previous terms
As described in Section 5.3, a leader knows that an entry from its current term is committed once that entry is stored on a majority of the servers. If a leader crashes before committing an entry, future leaders will attempt to finish replicating the entry. However, a leader cannot immediately conclude that an entry from a previous term is committed once it is stored on a majority of servers. Figure 8 illustrates a situation where an old log entry is stored on a majority of servers, yet can still be overwritten by a future leader.
To eliminate problems like the one in Figure 8, Raft never commits log entries from previous terms by counting replicas. Only log entries from the leader’s current term are committed by counting replicas;
once an entry from the current term has been committed in this way, then all prior entries are committed indirectly because of the Log Matching Property. There are some situations where a leader could safely conclude that an older log entry is committed (for example, if that entry is stored on every server), but Raft takes a more conservative approach for simplicity.
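The commit rule can be sketched as the check a leader runs when it advances its commit index. This is a minimal Python sketch under simplifying assumptions:

- `Entry`, `advance_commit_index`, and its parameters are hypothetical names.
- Indices are 1-based, as in the paper.
- `match_index` holds the highest index known to be replicated on each follower.
- The leader counts itself as one replica.

The function counts replicas only for entries whose term equals the leader's current term. Committing such an entry implicitly commits everything before it.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    term: int
    command: str


def advance_commit_index(log, match_index, commit_index, current_term, cluster_size):
    """Return the leader's new commit index (hypothetical helper, simplified).

    log: list of Entry; log[0] is log index 1.
    match_index: follower id -> highest index known to be stored on that follower.
    The leader always stores its whole log, so it counts as one replica.
    """
    # Scan from the newest entry down to just above the current commit index.
    for n in range(len(log), commit_index, -1):
        if log[n - 1].term != current_term:
            # Terms never decrease along a log, so every lower entry is from an
            # older term too; those are never committed by counting replicas.
            break
        replicas = 1 + sum(1 for m in match_index.values() if m >= n)
        if replicas * 2 > cluster_size:
            # Committing index n commits all earlier indices as well,
            # because of the Log Matching Property.
            return n
    return commit_index


# Figure 8(c)/(e): S1 leads in term 4 with log terms 1, 2, 4; index 1 is committed.
log = [Entry(1, "x"), Entry(2, "y"), Entry(4, "z")]
print(advance_commit_index(log, {"S2": 2, "S3": 2, "S4": 1, "S5": 1}, 1, 4, 5))  # 1
print(advance_commit_index(log, {"S2": 3, "S3": 3, "S4": 1, "S5": 1}, 1, 4, 5))  # 3
```

In the first call, the term-2 entry at index 2 is on three of five servers, but the commit index stays at 1. In the second call, S2 and S3 also hold the term-4 entry, so indices 2 and 3 become committed together.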
Figure 8: A time sequence showing why a leader cannot determine commitment using log entries from older terms.
In (a) S1 is leader and partially replicates the log entry at index 2.
In (b) S1 crashes; S5 is elected leader for term 3 with votes from S3, S4, and itself, and accepts a different entry at log index 2.
In (c) S5 crashes; S1 restarts, is elected leader, and continues replication. At this point, the log entry from term 2 has been replicated on a majority of the servers, but it is not committed.
If S1 crashes as in (d), S5 could be elected leader (with votes from S2, S3, and S4) and overwrite the entry with its own entry from term 3.
However, if S1 replicates an entry from its current term on a majority of the servers before crashing, as in (e), then this entry is committed (S5 cannot win an election). At this point all preceding entries in the log are committed as well.
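(Figure 8 shows what can actually happen: after (c), either (d) or (e) may occur. This is why a leader cannot immediately conclude that an entry from a previous term is committed, even when that entry is stored on a majority of servers.)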
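The scenario in Figure 8 can be replayed with a small model. This sketch makes several assumptions:

- Logs are lists of term numbers transcribed from the figure. This includes the term-4 entry that S1 appends at index 3 in (c), which the figure draws but the caption does not mention.
- S1 has crashed in both (d) and (e).
- Every live server votes purely on the up-to-date rule from Section 5.4.1: compare last terms first, then log lengths.
- Current terms and `votedFor` are ignored.

`naive_replicas`, `at_least_as_up_to_date`, and `votes_for` are hypothetical helpers.

```python
CLUSTER_SIZE = 5
MAJORITY = CLUSTER_SIZE // 2 + 1  # 3 of 5

# Logs as lists of terms; position 0 is log index 1 (transcribed from Figure 8).
state_c = {"S1": [1, 2, 4], "S2": [1, 2], "S3": [1, 2], "S4": [1], "S5": [1, 3]}
state_e = {"S1": [1, 2, 4], "S2": [1, 2, 4], "S3": [1, 2, 4], "S4": [1], "S5": [1, 3]}


def naive_replicas(logs, index, term):
    """Number of servers holding `term` at `index` (what naive counting would use)."""
    return sum(1 for log in logs.values()
               if len(log) >= index and log[index - 1] == term)


def at_least_as_up_to_date(cand, voter):
    """Election restriction: compare last terms, then log lengths."""
    cand_last = cand[-1] if cand else 0
    voter_last = voter[-1] if voter else 0
    if cand_last != voter_last:
        return cand_last > voter_last
    return len(cand) >= len(voter)


def votes_for(candidate, logs, alive):
    """Votes `candidate` can collect from live servers, including its own."""
    return sum(1 for s in alive
               if s == candidate or at_least_as_up_to_date(logs[candidate], logs[s]))


alive = ["S2", "S3", "S4", "S5"]  # assumption: S1 has crashed in (d) and (e)
print(naive_replicas(state_c, 2, 2) >= MAJORITY)    # True: naive counting says "committed"
print(votes_for("S5", state_c, alive) >= MAJORITY)  # True: in (d) S5 wins and overwrites index 2
print(votes_for("S5", state_e, alive) >= MAJORITY)  # False: in (e) S2 and S3 refuse their votes
```

The first two checks show why replica counting is unsafe for the term-2 entry. The third shows that a current-term entry on a majority blocks S5, as the caption for (e) states.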
Raft incurs this extra complexity in the commitment rules because log entries retain their original term numbers when a leader replicates entries from previous terms. In other consensus algorithms, if a new leader rereplicates entries from prior “terms,” it must do so with its new “term number.” Raft’s approach makes it easier to reason about log entries, since they maintain the same term number over time and across logs. In addition, new leaders in Raft send fewer log entries from previous terms than in other algorithms (other algorithms must send redundant log entries to renumber them before they can be committed).
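Because entries keep their original term numbers, a follower can store whatever the leader sends without rewriting it. The following is a simplified follower-side handler, not the paper's full AppendEntries RPC:

- `append_entries` is a hypothetical name.
- Logs and entries are lists of term numbers only.
- Current-term checks, commands, and commit-index updates are omitted.

The example shows the term-2 entry from Figure 8 arriving at S5 still tagged with term 2 and replacing S5's conflicting term-3 entry.

```python
def append_entries(log, prev_index, prev_term, entries):
    """Simplified follower side of AppendEntries (hypothetical helper).

    log, entries: lists of term numbers; log[0] is log index 1.
    Returns (success, resulting_log). Entries are stored with the term the
    leader sent, i.e. the term in which they were created, never renumbered.
    """
    # Consistency check: the follower must hold prev_term at prev_index.
    if prev_index > len(log) or (prev_index > 0 and log[prev_index - 1] != prev_term):
        return False, log  # the leader retries with a smaller prev_index
    new_log = list(log)
    for offset, term in enumerate(entries):
        i = prev_index + offset  # 0-based position for this entry
        if i < len(new_log) and new_log[i] != term:
            del new_log[i:]  # same index, different term: drop the conflicting suffix
        if i >= len(new_log):
            new_log.append(term)  # stored with its original term
    return True, new_log


print(append_entries([1, 3], 1, 1, [2, 4]))  # (True, [1, 2, 4]): S5's term-3 entry replaced
print(append_entries([1], 2, 2, [4]))        # (False, [1]): S4 lacks index 2, leader backs up
```

Under a scheme that renumbers old entries with the new leader's term, the follower at `[1, 2]` would also need index 2 resent, now relabeled. With Raft, that follower only needs the new term-4 entry appended.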