Formal Verification of Raft in TLA+

TLA+

------------------------------- MODULE Raft -------------------------------

EXTENDS Naturals, FiniteSets, Sequences, TLC

CONSTANTS Server, Value, Follower, Candidate, Leader, Nil, 
    RequestVoteRequest, RequestVoteResponse, AppendEntriesRequest, 
    AppendEntriesResponse

VARIABLES messages, elections, allLogs, currentTerm, state, votedFor, log, 
    commitIndex, votesResponded, votesGranted, voterLog, nextIndex, matchIndex

leaderVars == <<nextIndex, matchIndex, elections>>
logVars == <<log, commitIndex>>
candidateVars == <<votesResponded, votesGranted, voterLog>>
serverVars == <<currentTerm, state, votedFor>>
vars == <<messages, allLogs, serverVars, candidateVars, leaderVars, logVars>>

Quorum == {i \in SUBSET Server: Cardinality(i)*2 > Cardinality(Server)}

LastTerm(xlog) == IF Len(xlog) # 0 THEN xlog[Len(xlog)].term ELSE 0

WithMessage(m, msgs) == 
    IF m \in DOMAIN msgs THEN
        [msgs EXCEPT ![m] = msgs[m] + 1]
    ELSE
        msgs @@ (m :> 1)

WithoutMessage(m, msgs) ==
    IF m \in DOMAIN msgs THEN
        [msgs EXCEPT ![m] = msgs[m] - 1]
    ELSE
        msgs

Send(m) == messages' = WithMessage(m, messages)

Discard(m) == messages' = WithoutMessage(m, messages)

Reply(response, request) ==
    messages' = WithoutMessage(request, WithMessage(response, messages))

Min(m) == CHOOSE x \in m: \A y \in m: y >= x

Max(m) == CHOOSE x \in m: \A y \in m: y <= x


----

InitHistoryVars == 
    /\ elections = {}
    /\ allLogs = {}
    /\ voterLog = [i \in Server |-> [j \in {} |-> <<>> ]]

InitServerVars == 
    /\ currentTerm = [i \in Server |-> 1]
    /\ state = [i \in Server |-> Follower]
    /\ votedFor = [i \in Server |-> Nil]

InitCandidateVars == 
    /\ votesResponded = [i \in Server |-> {}]
    /\ votesGranted = [i \in Server |-> {}]

InitLeaderVars == 
    /\ nextIndex = [i \in Server |-> [j \in Server |-> 1]]
    /\ matchIndex = [i \in Server |-> [j \in Server |-> 0]]

InitLogVars == 
    /\ log = [i \in Server |-> <<>>]
    /\ commitIndex = [i \in Server |-> 0]

Init == 
    /\ messages = [ m \in {} |-> 0 ]
    /\ InitHistoryVars
    /\ InitServerVars
    /\ InitCandidateVars
    /\ InitLeaderVars
    /\ InitLogVars

Restart(i) ==
    /\ state' = [state EXCEPT ![i] = Follower]
    /\ votesResponded' = [votesResponded EXCEPT ![i] = {}]
    /\ votesGranted' = [votesGranted EXCEPT ![i] = {}]
    /\ voterLog' = [voterLog EXCEPT ![i] = [j \in {} |-> <<>>]]
    /\ nextIndex' = [nextIndex EXCEPT ![i] = [j \in Server |-> 1]]
    /\ matchIndex' = [matchIndex EXCEPT ![i] = [j \in Server |-> 0]]
    /\ commitIndex' = [commitIndex EXCEPT ![i] = 0]
    /\ UNCHANGED <<messages, currentTerm, votedFor, log, elections>>

Timeout(i) == 
    /\ state[i] \in {Follower, Candidate}
    /\ state' = [state EXCEPT ![i] = Candidate]
    /\ currentTerm' = [currentTerm EXCEPT ![i] = @ + 1]
    /\ votedFor' = [votedFor EXCEPT ![i] = Nil]
    /\ votesResponded' = [votesResponded EXCEPT ![i] = {}]
    /\ votesGranted' = [votesGranted EXCEPT ![i] = {}]
    /\ voterLog' = [voterLog EXCEPT ![i] = [j \in {} |-> <<>> ]]
    /\ UNCHANGED <<messages, leaderVars, logVars>>

RequestVote(i, j) == 
    /\ state[i] = Candidate
    /\ j \notin votesResponded[i]
    /\ Send([mtype         |-> RequestVoteRequest,
             mterm         |-> currentTerm[i],
             mlastLogTerm  |-> LastTerm(log[i]),
             mlastLogIndex |-> Len(log[i]),
             msource       |-> i,
             mdest         |-> j])
    /\ UNCHANGED <<serverVars, candidateVars, leaderVars, logVars>>

AppendEntries(i, j) == 
    /\ i # j
    /\ state[i] = Leader
    /\  LET prevLogIndex == nextIndex[i][j] - 1
            prevLogTerm == IF prevLogIndex # 0 THEN 
                log[i][prevLogIndex].term 
                ELSE 0
            lastEntry == Min({Len(log[i]), nextIndex[i][j]})
            entries == SubSeq(log[i], nextIndex[i][j], lastEntry)
        IN Send([
            mtype |-> AppendEntriesRequest,
            mterm |-> currentTerm[i],
            mprevLogIndex |-> prevLogIndex,
            mprevLogTerm |-> prevLogTerm,
            mentries |-> entries,
            mlog |-> log[i],
            mcommitIndex |-> Min({commitIndex[i], lastEntry}),
            msource |-> i,
            mdest |-> j
            ])
    /\ UNCHANGED <<serverVars, candidateVars, leaderVars, logVars>>

BecomeLeader(i) ==
    /\ state[i] = Candidate
    /\ votesGranted[i] \in Quorum
    /\ state' = [state EXCEPT ![i] = Leader]
    /\ nextIndex' = [nextIndex EXCEPT ![i] = [j \in Server |-> Len(log[i]) + 1]]
    /\ matchIndex' = [matchIndex EXCEPT ![i] = [j \in Server |-> 0]]
    /\ elections' = elections \union {[eterm |-> currentTerm[i], eleader |-> i, 
        elog |-> log[i], evotes |-> votesGranted[i], evoterLog |-> voterLog[i]]}
    /\ UNCHANGED <<messages, currentTerm, votedFor, candidateVars, logVars>>

ClientRequest(i, v) == 
    /\ state[i] = Leader
    /\  LET
            entry == [term |-> currentTerm[i], value |-> v]
            newLog == Append(log[i], entry)
        IN
            log' = [log EXCEPT ![i] = newLog]
    /\ UNCHANGED <<messages, serverVars, candidateVars, leaderVars, commitIndex>>

AdvanceCommitIndex(i) ==
    /\ state[i] = Leader
    /\  LET
            Agree(index) == {i} \union {k \in Server: matchIndex[i][k] >= index}
            agreeIndexes == {index \in 1..Len(log[i]): Agree(index) \in Quorum}
            newCommitIndex == 
                IF  /\ agreeIndexes # {}
                    /\ log[i][Max(agreeIndexes)].term = currentTerm[i]
                THEN
                    Max(agreeIndexes)
                ELSE
                    commitIndex[i]
        IN
            commitIndex' = [commitIndex EXCEPT ![i] = newCommitIndex]
    /\ UNCHANGED <<messages, serverVars, candidateVars, leaderVars, log>>

HandleRequestVoteRequest(i, j, m) ==
    LET 
        logOK == 
            \/ m.mlastLogTerm > LastTerm(log[i])
            \/  /\ m.mlastLogTerm = LastTerm(log[i])
                /\ m.mlastLogIndex >= Len(log[i])
        grant == 
            /\ m.mterm = currentTerm[i]
            /\ logOK
            /\ votedFor[i] \in {Nil, j}
    IN
        /\ m.mterm <= currentTerm[i]
        /\  \/ grant /\ votedFor' = [votedFor EXCEPT ![i] = j]
            \/ ~grant /\ UNCHANGED votedFor
        /\ Reply([mtype |-> RequestVoteResponse, mterm |-> currentTerm[i], 
            mvoteGranted |-> grant, mlog |-> log[i], msource |-> i, mdest |-> j], m)
        /\ UNCHANGED <<state, currentTerm, candidateVars, leaderVars, logVars>>

HandleRequestVoteResponse(i, j, m) ==
    /\ m.mterm = currentTerm[i]
    /\ votesResponded' = [votesResponded EXCEPT ![i] = votesResponded[i] \union {j}]
    /\  \/  /\ m.mvoteGranted
            /\ votesGranted' = [votesGranted EXCEPT ![i] = @ \union {j}]
            /\ voterLog' = [voterLog EXCEPT ![i] = voterLog[i] @@ (j:>m.mlog)]
        \/  /\ ~m.mvoteGranted
            /\ UNCHANGED <<votesGranted, voterLog>>
    /\ Discard(m)
    /\ UNCHANGED <<serverVars, votedFor, leaderVars, logVars>>

HandleAppendEntriesRequest(i, j, m) ==
    LET 
        logOK ==
            \/ m.mprevLogIndex = 0
            \/  /\  m.mprevLogIndex > 0
                /\  m.mprevLogIndex <= Len(log[i])
                /\  m.mprevLogTerm = log[i][m.mprevLogIndex].term
    IN
        /\ m.mterm <= currentTerm[i]
        /\  \/  /\
                    \/ m.mterm < currentTerm[i]
                    \/  /\ m.mterm = currentTerm[i]
                        /\ state[i] = Follower
                        /\ ~logOK
                /\ Reply([mtype |-> AppendEntriesResponse, mterm |-> currentTerm[i], 
                    msuccess |-> FALSE, mmatchIndex |-> 0, msource |-> i, mdest |-> j], m)
                /\ UNCHANGED <<serverVars, logVars>>
            \/  /\ m.mterm = currentTerm[i]
                /\ state[i] = Candidate
                /\ state' = [state EXCEPT ![i] = Follower]
                /\ UNCHANGED <<currentTerm, votedFor, logVars, messages>>
            \/  /\ m.mterm = currentTerm[i]
                /\ state[i] = Follower
                /\ logOK
                /\ 
                    LET
                        index == m.mprevLogIndex + 1
                    IN
                        \/  
                            /\  \/ m.mentries = <<>>
                                \/  /\ Len(log[i]) >= index
                                    /\ log[i][index].term = m.mentries[1].term
                            /\ commitIndex' = 
                                [commitIndex EXCEPT ![i] = m.mcommitIndex]
                            /\ Reply([mtype |-> AppendEntriesResponse, 
                                mterm |-> currentTerm[i], 
                                msuccess |-> TRUE, 
                                mmatchIndex |-> m.mprevLogIndex + Len(m.mentries), 
                                msource |-> i, mdest |-> j], m)
                            /\ UNCHANGED <<serverVars, logVars>>
                        \/
                            /\ m.mentries # <<>>
                            /\ Len(log[i]) >= index
                            /\ log[i][index].term # m.mentries[1].term
                            /\
                                LET new == [index2 \in 1..(Len(log[i])-1) |-> log[i][index2]]
                                IN  log' = [log EXCEPT ![i] = new]
                            /\ UNCHANGED <<serverVars, commitIndex, messages>>
                        \/
                            /\ m.mentries # <<>>
                            /\ Len(log[i]) = m.mprevLogIndex
                            /\ log' = [log EXCEPT ![i] = Append(log[i], m.mentries[1])]
                            /\ UNCHANGED <<serverVars, commitIndex, messages>>
        /\ UNCHANGED <<candidateVars, leaderVars>>

HandleAppendEntriesResponse(i, j, m) ==
    /\ m.mterm = currentTerm[i]
    /\  \/  /\ m.msuccess
            /\ nextIndex' = [nextIndex EXCEPT ![i][j] = m.mmatchIndex + 1]
            /\ matchIndex' = [matchIndex EXCEPT ![i][j] = m.mmatchIndex]
        \/  /\ ~m.msuccess
            /\ nextIndex' = [nextIndex EXCEPT ![i][j] = Max({nextIndex[i][j] - 1, 1})]
            /\ UNCHANGED matchIndex
    /\ Discard(m)
    /\ UNCHANGED <<serverVars, candidateVars, logVars, elections>>

UpdateTerm(i, j, m) ==
    /\ currentTerm'    = [currentTerm EXCEPT ![i] = m.mterm]
    /\ state'          = [state       EXCEPT ![i] = Follower]
    /\ votedFor'       = [votedFor    EXCEPT ![i] = Nil]
    /\ UNCHANGED <<messages, candidateVars, leaderVars, logVars>>

DropStaleResponse(i, j, m) ==
    /\ m.mterm < currentTerm[i]
    /\ Discard(m)
    /\ UNCHANGED <<serverVars, candidateVars, leaderVars, logVars>>

HandleMsg(m, i, j) == 
    \/ /\ m.mtype = RequestVoteRequest
       /\ HandleRequestVoteRequest(i, j, m)
    \/ /\ m.mtype = RequestVoteResponse
       /\ \/ DropStaleResponse(i, j, m)
          \/ HandleRequestVoteResponse(i, j, m)
    \/ /\ m.mtype = AppendEntriesRequest
       /\ HandleAppendEntriesRequest(i, j, m)
    \/ /\ m.mtype = AppendEntriesResponse
       /\ \/ DropStaleResponse(i, j, m)
          \/ HandleAppendEntriesResponse(i, j, m)

Receive(m) ==
    LET 
        i == m.mdest
        j == m.msource
    IN
        IF m.mterm > currentTerm[i] THEN
            UpdateTerm(i, j, m)
        ELSE
            HandleMsg(m, i, j)

DuplicateMessage(m) == 
    /\ Send(m)
    /\ UNCHANGED <<serverVars, leaderVars, candidateVars, logVars>>

DropMessage(m) == 
    /\ Discard(m)
    /\ UNCHANGED <<serverVars, leaderVars, candidateVars, logVars>>

Next == 
    /\  \/ \E i \in Server: Restart(i)
        \/ \E i \in Server: Timeout(i)
        \/ \E i, j \in Server: RequestVote(i, j)
        \/ \E i \in Server: BecomeLeader(i)
        \/ \E i \in Server, v \in Value: ClientRequest(i, v)
        \/ \E i \in Server: AdvanceCommitIndex(i)
        \/ \E i, j \in Server: AppendEntries(i, j)
        \/ \E m \in DOMAIN messages: Receive(m)
        \/ \E m \in DOMAIN messages: DuplicateMessage(m)
        \/ \E m \in DOMAIN messages: DropMessage(m)
    /\ allLogs' = allLogs \union {log[i]: i \in Server}

Spec == Init /\ [][Next]_vars

=============================================================================

TLA Toolbox model-checking parameters

AppendEntriesResponse <- [ model value ]
Follower <- [ model value ]
Leader <- [ model value ]
Nil <- [ model value ]
RequestVoteResponse <- [ model value ]
Candidate <- [ model value ]
RequestVoteRequest <- [ model value ]
AppendEntriesRequest <- [ model value ]
Value <- {1,2}
Server <- [ model value ] <symmetrical> {s1, s2, s3}
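For command-line TLC runs, the Toolbox assignments above correspond roughly to a configuration file like the sketch below (the behavior formula is assumed to be named Spec; the symmetry set for Server must instead be declared via a SYMMETRY definition added to the module, which is omitted here):

```
SPECIFICATION Spec
CONSTANTS
    Server = {s1, s2, s3}
    Value = {1, 2}
    Follower = Follower
    Candidate = Candidate
    Leader = Leader
    Nil = Nil
    RequestVoteRequest = RequestVoteRequest
    RequestVoteResponse = RequestVoteResponse
    AppendEntriesRequest = AppendEntriesRequest
    AppendEntriesResponse = AppendEntriesResponse
```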
  • Compared with Paxos, Raft is more meaningful for engineering practice, focusing on implementability and understandability. For a formal verification of Paxos, see the PaxosCommit TLA+ specification.
  • Raft strengthens the Leader's role and uses a time-based lease mechanism for leader election and role transitions, which also makes Basic Paxos's P2c constraint easier to realize. Raft assigns each Leader a Term; within each term there is at most one Leader, and every log entry produced in that term carries the term.
  • The example above only verifies the basic Raft consensus algorithm; it does not cover distributed membership changes, log compaction, or client interaction.
  • Restart models an unpredictable event that can happen at any time. After restarting, a node becomes a Follower and resets its candidate-related state (the votes other nodes cast for it: votesResponded, votesGranted, voterLog), its leader-specific state (nextIndex, matchIndex), and its commitIndex.
  • Timeout is likewise unpredictable in this model and can only occur on a Follower or Candidate. On timeout the node becomes a Candidate, increments its currentTerm, resets its vote votedFor, and resets its candidate-related state (votesResponded, votesGranted, voterLog).
  • RequestVote(i, j): Candidate i sends j a message requesting its vote. The message must carry the node's current Term and the Term and Index of its latest log entry.
  • AppendEntries(i, j): Leader i sends j an AppendEntriesRequest containing the node's current Term, the log entries starting at index nextIndex[i][j], the Term and Index of entry nextIndex[i][j] - 1, and the committed-log index commitIndex[i].
  • BecomeLeader(i): when Candidate i wins a majority of votes and becomes Leader, it sets nextIndex[i] to Len(log[i]) + 1 and matchIndex[i] to 0 for every server.
  • ClientRequest(i, v): a client asks Leader i to append v. In engineering practice other nodes usually forward the request to the Leader; that is omitted here.
  • AdvanceCommitIndex(i): Leader i advances its committed-log index commitIndex[i]. Because Raft guarantees log continuity, it finds the largest log index acknowledged by a majority, checks that the entry was created in the current term, and updates commitIndex accordingly.
  • HandleRequestVoteRequest(i, j, m): node i receives and processes RequestVoteRequest m from node j (if and only if m.mterm <= currentTerm[i]). A node only grants its vote to a candidate whose log is at least as up to date as its own (see grant), and it replies whether or not the vote is granted.
  • HandleRequestVoteResponse(i, j, m): node i receives and processes RequestVoteResponse m from node j (if and only if m.mterm = currentTerm[i]), using the tally to decide whether it can become Leader.
  • HandleAppendEntriesRequest(i, j, m): node i processes AppendEntriesRequest m from node j (if and only if m.mterm <= currentTerm[i]). Note that if m.mterm = currentTerm[i] /\ state[i] = Candidate, node i steps down to Follower. The prevLog information carried in m is used to check whether the log is valid. Three cases can arise: the entries already exist on node i, which replies to j to update matchIndex; the entries conflict, so node i rolls back its log; or the entries are appended normally. Rolling back one entry at a time on conflict is too inefficient; real systems have the Leader send a snapshot instead.
  • HandleAppendEntriesResponse(i, j, m): node i processes AppendEntriesResponse m from node j (if and only if m.mterm = currentTerm[i]), updating nextIndex and matchIndex according to whether the append succeeded.
  • UpdateTerm: on receiving any RPC, a node checks whether its currentTerm is up to date and updates currentTerm, state, and votedFor as needed.
  • DropStaleResponse: every node discards RPC responses whose Term is lower than its own.
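The AdvanceCommitIndex rule above can be sketched in Python (names are hypothetical; this mirrors the spec's Agree/Quorum logic under the assumption of 1-based protocol indices stored in a 0-based list):

```python
# Sketch of AdvanceCommitIndex: a leader may only advance commitIndex to an
# index replicated on a quorum AND whose entry was created in its own term.
def advance_commit_index(leader, servers, log, match_index, current_term, commit_index):
    # log: list of (term, value) pairs on the leader; protocol indices are 1-based
    for index in range(len(log), commit_index, -1):  # try the highest index first
        agree = {leader} | {s for s in servers if match_index.get(s, 0) >= index}
        if 2 * len(agree) > len(servers) and log[index - 1][0] == current_term:
            return index
    return commit_index
```

The term check on the last conjunct is what prevents a leader from committing an entry from an earlier term by counting replicas alone (the scenario in Figure 8 of the Raft paper).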

Safety

Time and availability

In Raft, the election timeout must satisfy the following condition:

broadcastTime << electionTimeout << MTBF

where broadcastTime is the time for a server to send RPCs to the other servers in parallel and receive their replies; electionTimeout is the election timeout; and MTBF is the average interval between two failures of a single server.
electionTimeout must be much larger than broadcastTime so that a follower does not start a new election merely because the leader's heartbeat has not yet arrived.
electionTimeout must be much smaller than MTBF so that during an election the servers still working can form a majority.
broadcastTime is generally in [0.5 ms, 20 ms], while MTBF is very large, at least on the order of months, so electionTimeout is generally chosen from [10 ms, 500 ms]. As a result, when the leader fails, a new leader can be elected within a short time.
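The timing inequality can be sketched as follows (the concrete numbers are illustrative assumptions, not requirements; each server draws a fresh random timeout so split votes are unlikely):

```python
import random

# broadcastTime << electionTimeout << MTBF: the election timeout range sits
# well above the RPC round-trip time, and randomization de-synchronizes
# candidates so elections converge quickly.
BROADCAST_TIME_MS = 5                    # typical RPC round trip (0.5-20 ms)
ELECTION_TIMEOUT_RANGE_MS = (150, 300)   # a common production range

def next_election_timeout_ms():
    return random.uniform(*ELECTION_TIMEOUT_RANGE_MS)
```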

Cluster Membership Changes

When the cluster's membership changes, the configuration on every server cannot be switched from old to new in a single step: servers switch at different times, which could produce two leaders at once, as in the figure below:
(figure: cluster_memship_wrong)
In the figure, Server 1 and Server 2 could elect one leader under the old configuration Cold, while Server 3, Server 4 and Server 5 elect another under Cnew, resulting in two leaders.

Raft completes the transition with a two-phase process:

  • Phase one: old and new configurations coexist, called joint consensus
  • Phase two: switch to the new configuration

(figure: raft_memship_right)

  • The leader first creates a Cold,new log entry and commits it (ensuring that a majority of the old servers and a majority of the new servers receive the entry);
  • The leader then creates a Cnew log entry and commits it, ensuring that a majority of the new servers receive the entry.

Several issues need to be considered during this process.

  • Newly added servers start with no log entries at all; after joining the cluster, they may spend a long time catching up on the log, which can keep the configuration-change entry from ever committing.

Raft adds an extra phase for this: during it, the new servers do not take part in elections but do receive log entries from the leader; only once they have caught up with the leader does the configuration change begin.

  • The current leader may not be part of the new configuration.

In this scenario, after committing the Cnew log entry (not counting itself when tallying log replicas), the leader steps down to follower.

  • Removed servers may disrupt the new cluster.

Removed servers no longer receive heartbeats from the new leader, so their election timers fire and they start new elections, which can push the new leader back into the follower state. Raft's solution: when a server receives an election RPC within the minimum electionTimeout of the last heartbeat it received from a leader, it rejects the vote. This does not affect normal elections, since every server waits at least the minimum electionTimeout before starting one, but it shields the cluster from interference by the old servers.
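The vote-rejection rule above can be sketched like this (names are hypothetical):

```python
import time

# A server rejects a RequestVote that arrives within the minimum election
# timeout of the last leader heartbeat, so servers removed from the
# configuration cannot depose a healthy leader with spurious elections.
MIN_ELECTION_TIMEOUT_S = 0.150

def should_reject_vote(last_leader_contact, now=None):
    now = time.monotonic() if now is None else now
    return now - last_leader_contact < MIN_ELECTION_TIMEOUT_S
```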

The approach above is hard to find in real implementations; etcd changes membership as follows.

The Raft author proposed a simpler method: add or remove only one member at a time. This guarantees that at no moment can two disjoint majorities exist, so the cluster can switch directly from the old member set to the new one.

The switch happens when the membership-change entry is written to disk, regardless of whether it is committed. The problem this timing introduces is that if the entry is ultimately not committed, the member set must be rolled back to the old one when the leader changes.
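Why single-server changes are safe can be checked exhaustively for a small cluster (a sketch; it enumerates all majorities of the old and new member sets and confirms every pair overlaps):

```python
from itertools import combinations

# Any majority of the old configuration and any majority of the new
# configuration share at least one server when they differ by one member,
# so two disjoint leaders can never exist during the switch.
def majorities(servers):
    n = len(servers)
    return [set(c) for r in range(n // 2 + 1, n + 1)
            for c in combinations(sorted(servers), r)]

old = {"s1", "s2", "s3"}
new = old | {"s4"}  # add exactly one server
assert all(a & b for a in majorities(old) for b in majorities(new))
```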

Log compaction

Raft's log grows as it handles more client requests; in a real system the log cannot be allowed to grow without bound, because:

  • Storage usage increases as the log grows
  • The longer the log, the longer a crashed server takes to replay it after restart

The log therefore needs periodic cleanup. Raft uses the simplest approach: snapshotting. Taking a snapshot persists the system's current state to storage, after which all log entries up to the snapshot point can be deleted.

(figure: Raft log compaction)

In Raft, each server snapshots independently, writing its state machine's current state to storage (that state is obtained by replaying the committed log entries). Besides the state machine's state, a Raft snapshot also needs some metadata:

  • The index and term of the last log entry included in the snapshot. These are recorded so that the AppendEntries consistency check still passes: when replicating the entry immediately following the snapshot, the AppendEntries RPC must carry the (index, term) of the preceding entry, which is exactly the snapshot's last entry, so the snapshot has to record that (index, term).
  • To support cluster membership changes, the snapshot metadata also stores the latest cluster configuration.

Once a server finishes a snapshot, it can delete the snapshot's last log entry together with everything before it, as well as any earlier snapshots.

Although servers snapshot independently, the leader may still need to send a complete snapshot to a follower: for example, when a follower's log ends before the leader's latest snapshot and the leader has already deleted the log entries covered by that snapshot, the leader can no longer catch the follower up by sending log entries and must send the full snapshot instead.

The leader sends snapshots through the InstallSnapshot RPC; a follower receiving it acts according to its state:

When the follower is missing some of the log entries covered by the snapshot

  • the follower deletes its entire log and resets its state machine

When the follower already has all the log entries covered by the snapshot

  • the follower deletes the log entries covered by the snapshot but keeps everything after it. Note: the paper does not say whether the follower still accepts the snapshot from the leader in this case; arguably it could take its own snapshot and reject the leader's InstallSnapshot RPC.
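The two follower cases can be sketched as one function (names are hypothetical; it assumes 1-based protocol indices stored in a 0-based list):

```python
# A follower handling InstallSnapshot: if its log already contains the
# snapshot's last entry with a matching term, only the covered prefix is
# dropped and the suffix is kept; otherwise the entire log is discarded and
# the state machine is reset from the snapshot.
def install_snapshot(log, snap_last_index, snap_last_term):
    # log: list of (term, value) pairs
    if snap_last_index <= len(log) and log[snap_last_index - 1][0] == snap_last_term:
        return log[snap_last_index:]   # keep entries after the snapshot point
    return []                          # follower was behind: wipe the log
```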

For Raft snapshots, the performance considerations are:

  • When to snapshot: snapshotting too often wastes disk I/O, while snapshotting too rarely lengthens replay time after a crash. One scheme is to snapshot once the log reaches a size threshold. Note: even if every server uses the same threshold, their snapshot points may still differ, because there is a delay between detecting that the log is too large and actually starting the snapshot, and that delay can vary per server.
  • Writing a snapshot takes a long time and must not block normal operation; copy-on-write techniques, such as Linux's fork, can be used.

Client Interaction

Raft clients send all requests to the leader. When a client starts, it picks a random server in the cluster:

  • If the chosen server is the leader, the client sends its requests to it
  • If the chosen server is not the leader, it tells the client the leader's address, and the client sends subsequent requests to that leader
  • If there is currently no leader, the client times out and retries other servers until it finds the leader

Raft aims to make client operations linearizable: each operation appears to execute instantaneously, exactly once, at some point between its invocation and its response. Exactly-once execution requires cooperation from the client: when the leader fails, the client must retry its request, since the request may not have executed before the crash. But the request may also have completed before the crash, in which case the new leader must recognize that it has already been executed and return the earlier result. This can be achieved by assigning every client request a unique id: if the leader finds the request has not been executed, it executes it; if it has, it returns the previous result.
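The unique-id deduplication scheme can be sketched as a wrapper around the state machine (all names are hypothetical; the state here is just a counter for illustration):

```python
# Exactly-once semantics: every client request carries a unique id, and the
# replicated state machine caches the result per id, so a request retried
# after a leader failover returns the cached result instead of executing twice.
class DedupStateMachine:
    def __init__(self):
        self.results = {}  # request_id -> cached result
        self.value = 0     # example state: a counter

    def apply(self, request_id, delta):
        if request_id in self.results:      # duplicate: do not re-execute
            return self.results[request_id]
        self.value += delta                 # execute exactly once
        self.results[request_id] = self.value
        return self.value
```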

Read-only requests can be served without writing to the log, but they may then return stale data, for example:

  • An old leader has been deposed but still believes it is the leader; a client that sends a read to it may get stale data

This can be solved with Lease Read.
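One common Lease Read scheme can be sketched as follows (a simplification under stated assumptions; names and constants are hypothetical):

```python
import time

# The leader answers reads locally only while it holds a lease that, under a
# bounded clock-drift assumption, is guaranteed to expire before any rival
# could be elected, so a deposed leader that still believes it is leader
# cannot serve stale reads.
ELECTION_TIMEOUT_S = 0.150
CLOCK_DRIFT_BOUND = 0.1  # assumed maximum relative clock drift (10%)

def lease_valid(last_quorum_ack, now=None):
    # last_quorum_ack: when the leader last heard from a quorum (heartbeats)
    now = time.monotonic() if now is None else now
    return now - last_quorum_ack < ELECTION_TIMEOUT_S * (1 - CLOCK_DRIFT_BOUND)
```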

Pros and cons

  • Multi-Paxos is harder to implement in engineering terms, but suits scale-out scenarios with multiple write points
  • In Raft all data is mastered by the Leader, which keeps the implementation relatively simple, but the Leader easily becomes the performance bottleneck, so Raft suits scale-up

On out-of-order execution

  • Both Raft and Multi-Paxos are built on Lamport's Replicated State Machine model, and out-of-order execution by itself violates that model's definition; put differently, one inherent performance limit of the Replicated State Machine model is precisely its totally-ordered-commands constraint
  • Parallel Raft's "out-of-order execution" rests on two major preconditions:
    • dependencies between log entries can be easily extracted and described
    • log entries executed "out of order" have no conflicting dependencies
