Notes on the Raft Paper

odd-point

Article link: https://blog.csdn.net/zyx112334/article/details/52739681

Reference: In Search of an Understandable Consensus Algorithm (Extended Version)

The term is Raft's logical clock. It is used to detect stale RPC requests and inconsistent log entries.
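
As a rough illustration of the term acting as a logical clock, here is a minimal sketch of the basic term check a server applies to every incoming RPC. The state fields `current_term`, `voted_for` and `role` are hypothetical names modeled on the paper's currentTerm/votedFor. Other per-RPC rules, such as a candidate stepping down on an AppendEntries from a leader of the same term, are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

FOLLOWER, CANDIDATE, LEADER = "follower", "candidate", "leader"


@dataclass
class ServerState:
    # Hypothetical field names mirroring the paper's currentTerm / votedFor.
    current_term: int = 0
    voted_for: Optional[int] = None
    role: str = FOLLOWER


def check_rpc_term(state: ServerState, rpc_term: int) -> bool:
    """Basic term check for any incoming RPC request or response.

    Returns False if the RPC is from an older term and must be rejected
    (for a request) or ignored (for a response).
    """
    if rpc_term < state.current_term:
        return False  # stale RPC from an out-of-date leader or candidate
    if rpc_term > state.current_term:
        # We are behind: adopt the newer term, clear our vote, step down.
        state.current_term = rpc_term
        state.voted_for = None
        state.role = FOLLOWER
    return True


s = ServerState(current_term=5, role=LEADER)
print(check_rpc_term(s, 3))                          # False: stale RPC
print(check_rpc_term(s, 7), s.current_term, s.role)  # True 7 follower
```

An old leader that wakes up after a partition learns from the first reply it sees that it is out of date, and it stops acting as leader.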

Clients can submit commands only to the leader.

When there is no leader, the system refuses client requests until a leader emerges.

Whenever AppendEntries returns successfully, the leader knows that the follower’s log is identical to its own log up through the new entries.
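
The leader side of this guarantee can be sketched roughly as follows. On success, the leader records how far the follower's log is known to match. On a failed consistency check, it backs up by one entry and retries. `ReplicationProgress` and its fields are hypothetical names for the paper's nextIndex[] and matchIndex[]. Indices are 1-based, as in the paper. Replies are assumed to have already passed the term check.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReplicationProgress:
    """Leader-side view of each follower (hypothetical names for the paper's
    nextIndex[] and matchIndex[]); log indices are 1-based."""
    next_index: Dict[int, int] = field(default_factory=dict)
    match_index: Dict[int, int] = field(default_factory=dict)


def on_append_entries_reply(progress: ReplicationProgress, peer: int,
                            prev_log_index: int, num_entries: int,
                            success: bool) -> None:
    """Update progress after a reply that already passed the term check."""
    if success:
        # The follower's log now equals ours up through the last entry sent.
        # max() guards against a delayed reply to an older request.
        progress.match_index[peer] = max(progress.match_index.get(peer, 0),
                                         prev_log_index + num_entries)
        progress.next_index[peer] = progress.match_index[peer] + 1
    else:
        # Consistency check failed: probe one entry earlier next time.
        progress.next_index[peer] = max(1, progress.next_index[peer] - 1)


p = ReplicationProgress(next_index={2: 11}, match_index={2: 0})
on_append_entries_reply(p, 2, prev_log_index=10, num_entries=0, success=False)
on_append_entries_reply(p, 2, prev_log_index=9, num_entries=1, success=True)
print(p.next_index[2], p.match_index[2])  # 11 10
```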

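Cases of AppendEntries (assuming the request is not rejected for carrying a stale term). In each diagram, x marks the first conflicting entry. Where the check succeeds, the row under the description shows the follower's log afterwards, with any entries copied from the leader in parentheses: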

follower:
-----------------
leader:
                        ---------
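The leader fails to find a consistency point (the follower's log does not reach prevLogIndex), so the leader retries from an earlier index.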

follower:
-------------x---
leader:
        -----x---------
The leader finds the consistency point; there is a conflict after it.
-------------(----------)

follower:
-----------------
leader:
        ---------------
The leader finds the consistency point; there is no conflict after it.
-----------------(------)

follower:
-----------------
leader:
        --x--
The leader finds the consistency point; there is a conflict after it.
----------(---)

follower:
-----------------
leader:
        -----
The leader finds the consistency point; there is no conflict after it, and the follower's entries beyond the leader's are kept.
-----------------
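
The cases above can be sketched as a follower-side handler. This is a minimal sketch, assuming each log entry is represented only by its term (a hypothetical simplification; real entries also carry a command). The sketch omits the term check and the commitIndex update. Note how a conflict truncates the follower's log. When nothing conflicts, entries past the ones sent are left alone, as in the last case.

```python
from typing import List


def handle_append_entries(log: List[int], prev_log_index: int,
                          prev_log_term: int, entries: List[int]) -> bool:
    """Follower side of the AppendEntries consistency check.

    `log` and `entries` hold only each entry's term; entry i (1-based, as in
    the paper) is log[i - 1]. `log` is modified in place. The RPC's term is
    assumed to have been checked already.
    """
    # The follower must hold an entry at prev_log_index with prev_log_term;
    # index 0 stands for the empty prefix and always matches.
    if prev_log_index > len(log):
        return False  # log too short: the leader backs up and retries
    if prev_log_index > 0 and log[prev_log_index - 1] != prev_log_term:
        return False  # term mismatch at prev_log_index
    for offset, term in enumerate(entries):
        index = prev_log_index + 1 + offset
        if index <= len(log):
            if log[index - 1] == term:
                continue  # already present, nothing to do
            del log[index - 1:]  # first conflict: drop it and all after it
        log.append(term)
    # If nothing conflicted, entries past the last one sent are kept.
    return True


follower = [1, 1, 2, 2, 2]
print(handle_append_entries(follower, 2, 1, [3, 3]), follower)
# True [1, 1, 3, 3]
```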

If desired, the protocol can be optimized to reduce the number of rejected AppendEntries RPCs. For example, when rejecting an AppendEntries request, the follower can include the term of the conflicting entry and the first index it stores for that term.

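This does not find the exact last point where the two logs agree, but it is guaranteed to reach some point of agreement (no later than the last one). It greatly reduces the number of probes the leader has to send. The drawback is that the AppendEntries RPC may then carry redundant entries, wasting bandwidth.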
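
A small simulation makes the trade-off concrete. This is a sketch under stated assumptions. Logs are lists of terms, and the helper names are hypothetical. The leader simply sets nextIndex to the follower's reported first index for the conflicting term. When the follower's log is too short to have a conflicting term, the follower reports its first missing index; that case is an assumption, not part of the quoted text. The simulation counts probes with and without the hint.

```python
from typing import List, Optional, Tuple


def conflict_hint(log: List[int], prev_log_index: int) -> Tuple[Optional[int], int]:
    """What a rejecting follower reports: (conflict_term, conflict_index).

    Entry i (1-based) has term log[i - 1]. If the log is too short, there is
    no conflicting term, so report None and the first missing index
    (assumed behavior for this case).
    """
    if prev_log_index > len(log):
        return None, len(log) + 1
    term = log[prev_log_index - 1]
    index = prev_log_index
    while index > 1 and log[index - 2] == term:
        index -= 1  # walk back to the first index holding `term`
    return term, index


def probes_until_match(leader: List[int], follower: List[int],
                       use_hint: bool) -> Tuple[int, int]:
    """Count AppendEntries probes until the consistency check passes.

    Returns (probes, prev_log_index at which the logs were found to agree).
    """
    next_index = len(leader) + 1
    probes = 0
    while True:
        probes += 1
        prev = next_index - 1
        if prev == 0 or (prev <= len(follower)
                         and follower[prev - 1] == leader[prev - 1]):
            return probes, prev
        if use_hint:
            # Skip every follower entry of the conflicting term at once.
            next_index = conflict_hint(follower, prev)[1]
        else:
            next_index -= 1  # the basic one-entry-at-a-time back-off


# Hypothetical logs; each number is the term of one entry.
leader = [1, 1, 1, 4, 4, 5, 5, 6, 6, 6]
follower = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3]
print(probes_until_match(leader, follower, use_hint=False))  # (8, 3)
print(probes_until_match(leader, follower, use_hint=True))   # (3, 3)
# The hint can undershoot: these logs agree up to index 3, but the hint
# backs up to index 1, so entries 2 and 3 are resent.
print(probes_until_match([1, 2, 2, 3], [1, 2, 2, 2, 2], use_hint=True))  # (2, 1)
```

The first pair shows the reduction in probes: one probe per conflicting term instead of one per entry. The last call shows the bandwidth cost: the agreement point found is earlier than the real one.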

If the logs have last entries with different terms, then the log with the later term is more up-to-date. If the logs end with the same term, then whichever log is longer is more up-to-date.

A voter grants its vote only if the candidate's log is at least as up-to-date as its own.
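
The comparison and the voting rule can be sketched together. This is a minimal sketch, not a full RequestVote handler. `VoterState` and its fields are hypothetical. Stepping down to follower and resetting the election timer when a vote is granted are omitted.

```python
from dataclasses import dataclass
from typing import Optional


def at_least_as_up_to_date(cand_last_term: int, cand_last_index: int,
                           my_last_term: int, my_last_index: int) -> bool:
    """The paper's up-to-date comparison of two logs' last entries."""
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term  # later last term wins
    return cand_last_index >= my_last_index   # same term: longer (or equal) wins


@dataclass
class VoterState:
    # Hypothetical fields; last_log_* describe the voter's own last entry.
    current_term: int
    voted_for: Optional[int]
    last_log_term: int
    last_log_index: int


def handle_request_vote(v: VoterState, term: int, candidate_id: int,
                        last_log_index: int, last_log_term: int) -> bool:
    if term < v.current_term:
        return False  # stale candidate
    if term > v.current_term:
        v.current_term, v.voted_for = term, None  # new term: vote is free again
    if v.voted_for not in (None, candidate_id):
        return False  # at most one vote per term
    if not at_least_as_up_to_date(last_log_term, last_log_index,
                                  v.last_log_term, v.last_log_index):
        return False  # our log is more up-to-date than the candidate's
    v.voted_for = candidate_id
    return True


v = VoterState(current_term=3, voted_for=None, last_log_term=3, last_log_index=5)
print(handle_request_vote(v, 4, candidate_id=1, last_log_index=4, last_log_term=3))  # False
print(handle_request_vote(v, 4, candidate_id=2, last_log_index=5, last_log_term=3))  # True
```

The comparison is equivalent to the tuple comparison `(cand_last_term, cand_last_index) >= (my_last_term, my_last_index)`. An equally up-to-date candidate gets the vote, not only a strictly more up-to-date one.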

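On first reading I kept puzzling over why voting should be restricted by this particular criterion. It only began to make sense in light of the Leader Completeness proof that follows.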

Leader Completeness: if a log entry is committed in a given term, then that entry will be present in the logs of the leaders for all higher-numbered terms.

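Proof by contradiction: suppose $Leader_T$ commits $Entry_T$ in term $T$, but for some term $U > T$ the log of $Leader_U$ does not contain $Entry_T$. Let $U$ be the smallest such term.

A leader never deletes or overwrites entries in its own log (Leader Append-Only), so $Entry_T$ was already absent from $Leader_U$'s log when it started its election.
Since $Entry_T$ is committed, it is stored on a majority of servers. $Leader_U$ won the votes of a majority in its election, and any two majorities overlap, so at least one server both stores $Entry_T$ and voted for $Leader_U$. Call this server $Voter$.
$Voter$ must have received $Entry_T$ from $Leader_T$ before voting for $Leader_U$. Otherwise, casting that vote would have raised $Voter$'s current term to $U$, and it would then have rejected $Leader_T$'s AppendEntries request carrying $Entry_T$, since $T < U$.
$Voter$ still held $Entry_T$ when it voted for $Leader_U$. Every leader from term $T$ through $U-1$ contains $Entry_T$ (by the choice of $U$), leaders never remove their own entries, and a follower only removes entries that conflict with the leader's.
Because $Voter$ granted its vote, the voting criterion leaves two possibilities:
1. $Voter$'s last log term equals $Leader_U$'s last log term, and $Leader_U$'s log is at least as long. Call this term $W$, with $T \le W < U$. Both $Voter$ and $Leader_U$ must have accepted AppendEntries requests from $Leader_W$ in term $W$, and those requests make a follower's log consistent with $Leader_W$'s. Hence $Voter$'s log is a prefix of $Leader_U$'s, so $Leader_U$ contained $Entry_T$ when it started its election. This contradicts the fact that it did not.
2. $Voter$'s last log term is smaller than $Leader_U$'s last log term. $Voter$'s last log term is at least $T$ (it holds $Entry_T$), so $Leader_U$'s last log term $W$ satisfies $W > T$, and also $W < U$. $Leader_W$ contains $Entry_T$ (by the choice of $U$), so when $Leader_W$ appended entries to $Leader_U$'s log, the AppendEntries consistency check ensured that $Entry_T$ was in $Leader_U$'s log as well. This again contradicts the fact that $Leader_U$ lacked $Entry_T$ when it started its election.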
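
The step that produces $Voter$ is the quorum-intersection argument, written out here for completeness. Let $N$ be the number of servers, $S_T$ the servers that store $Entry_T$, and $S_U$ the servers that voted for $Leader_U$; these symbols are introduced only for this derivation. Both sets are majorities, so:

$$
|S_T| \ge \left\lfloor \frac{N}{2} \right\rfloor + 1, \qquad |S_U| \ge \left\lfloor \frac{N}{2} \right\rfloor + 1
$$

$$
|S_T \cap S_U| \;\ge\; |S_T| + |S_U| - N \;\ge\; 2\left\lfloor \frac{N}{2} \right\rfloor + 2 - N \;\ge\; 1
$$

The last step uses $2\lfloor N/2 \rfloor \ge N - 1$. For example, with $N = 5$ each majority has at least 3 servers, and $3 + 3 - 5 = 1$.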

If a follower or candidate crashes, then future RequestVote and AppendEntries RPCs sent to it will fail. Raft handles these failures by retrying indefinitely.

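In this implementation, AppendEntries and RequestVote are not retried immediately after a failure. Instead, both calls are periodic actions of the leader and the candidate. An RPC that fails in the current period is simply retried in the next one, and each period only processes the results of calls that returned successfully. Here the leader's period is set to 50 ms, while the candidate's period is the election timeout. The benefit is that the leader's log replication and its heartbeats become a single mechanism, which keeps the code simpler.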
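
A minimal sketch of the leader's periodic loop described above. It assumes a hypothetical transport callback `send(peer, tick)` that builds the AppendEntries request (entries starting at nextIndex, or an empty heartbeat) and returns the follower's reply, or `None` if the RPC failed. For simplicity, RPCs are sent one after another here; a real implementation would send them in parallel. A candidate would follow the same pattern with RequestVote, once per election timeout.

```python
import time
from typing import Callable, List, Optional

# send(peer, tick) -> follower's reply (True/False), or None if the RPC failed
# (crash, timeout, lost message). Hypothetical transport interface.
SendFn = Callable[[int, int], Optional[bool]]
ReplyFn = Callable[[int, bool], None]


def leader_tick(tick: int, peers: List[int], send: SendFn,
                on_reply: ReplyFn) -> List[int]:
    """One leader period: one AppendEntries per peer via `send`, which is
    assumed to build the request. Returns the peers whose RPC failed."""
    failed = []
    for peer in peers:
        reply = send(peer, tick)
        if reply is None:
            failed.append(peer)     # no immediate retry: next period resends
        else:
            on_reply(peer, reply)   # only successful calls are processed
    return failed


def run_leader(peers: List[int], send: SendFn, on_reply: ReplyFn,
               ticks: int, interval: float = 0.05) -> List[List[int]]:
    """Call leader_tick every `interval` seconds (50 ms as in the text)."""
    history = []
    for tick in range(ticks):
        history.append(leader_tick(tick, peers, send, on_reply))
        time.sleep(interval)
    return history


replies = []


def flaky_send(peer: int, tick: int) -> Optional[bool]:
    # Hypothetical failure: peer 3 is unreachable during ticks 0 and 1.
    return None if peer == 3 and tick < 2 else True


print(run_leader([2, 3], flaky_send, lambda p, ok: replies.append((p, ok)),
                 ticks=3, interval=0))
# [[3], [3], []]
```

Peer 3 is never retried within a period; it simply catches up on the first period in which its RPC succeeds.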

Raft never commits log entries from previous terms by counting replicas. Only log entries from the leader’s current term are committed by counting replicas.

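Figure 8 of the paper explains this clearly: even if an entry from a term before the leader's current term is stored on a majority of servers, a later leader can still overwrite it. The leader can therefore base its commit decision only on how the entries added in its current term are replicated.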
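
The commit rule can be sketched roughly as follows. It assumes hypothetical inputs: the leader's log as a list of terms (1-based indices) and a `match_index` map covering the followers only, with the leader counting itself as one replica. The usage is loosely modeled on the Figure 8 scenario. Once an entry from the current term commits, every entry before it is committed too, by the Log Matching Property.

```python
from typing import Dict, List


def advance_commit_index(log_terms: List[int], current_term: int,
                         commit_index: int, match_index: Dict[int, int],
                         cluster_size: int) -> int:
    """Leader's commit rule; log_terms[i - 1] is the term of entry i.

    match_index covers followers only; the leader counts as one replica.
    """
    for n in range(len(log_terms), commit_index, -1):
        if log_terms[n - 1] != current_term:
            # Never commit an older-term entry by counting replicas. Terms
            # never decrease along a log, so no smaller n is from this term.
            break
        replicas = 1 + sum(1 for m in match_index.values() if m >= n)
        if replicas * 2 > cluster_size:
            return n  # entries before n are committed along with it
    return commit_index


# 5 servers, leader in term 4: entry 2 (term 2) is on 3 of 5 servers,
# but it may not be committed by counting replicas.
print(advance_commit_index([1, 2], current_term=4, commit_index=1,
                           match_index={2: 2, 3: 2, 4: 1, 5: 1},
                           cluster_size=5))  # 1
# Once entry 3 (term 4) reaches a majority, entries 2 and 3 commit together.
print(advance_commit_index([1, 2, 4], current_term=4, commit_index=1,
                           match_index={2: 3, 3: 3, 4: 1, 5: 1},
                           cluster_size=5))  # 3
```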