TLA+
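- RM: the set of resource managers
- Acceptors: the set of acceptors that receive proposals
- Majority: the set of majorities (quorums) of Acceptors
- Ballot: the set of ballot numbers
- rmState: the state of each RM
- aState: the state of each Acceptor
- msgs: the set of messages sent so far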
----------------------------- MODULE PaxosCommit ----------------------------
EXTENDS Integers

CONSTANT RM, Acceptors, Majority, Ballot
VARIABLES rmState, aState, msgs
CONSTANT prepared, aborted, none, phase1a, phase1b, phase2a, phase2b, commit,
         abort, working, committed

Max(s) == IF s # {} THEN CHOOSE x \in s: \A y \in s: x >= y
          ELSE -1

ASSUME
  /\ Ballot \subseteq Nat
  /\ 0 \in Ballot
  /\ Majority \subseteq SUBSET Acceptors
  /\ \A ms1, ms2 \in Majority: ms1 \intersect ms2 # {}

\* The set of all messages that can appear in msgs.
Messages ==
  [type: {phase1a}, ins: RM, bal: Ballot \ {0}]
    \union
  [type: {phase1b}, ins: RM, bal: Ballot \union {-1}, mbal: Ballot,
   val: {prepared, aborted, none}, acc: Acceptors]
    \union
  [type: {phase2a}, ins: RM, bal: Ballot, val: {prepared, aborted}]
    \union
  [type: {phase2b}, ins: RM, bal: Ballot, val: {prepared, aborted},
   acc: Acceptors]
    \union
  [type: {commit, abort}]

\* Type-correctness invariant.
PCTypeOK ==
  /\ rmState \in [RM -> {prepared, working, committed, aborted}]
  /\ aState \in [RM -> [Acceptors -> [mbal: Ballot, bal: Ballot \union {-1},
                                      val: {prepared, aborted, none}]]]
  /\ msgs \subseteq Messages

\* Initially every RM is working and no acceptor has accepted a value.
PCInit ==
  /\ rmState = [r \in RM |-> working]
  /\ aState = [r \in RM |-> [ac \in Acceptors |-> [mbal |-> 0, bal |-> -1,
                                                    val |-> none]]]
  /\ msgs = {}

\* Sending a message just adds it to msgs; messages are never removed.
Send(m) == msgs' = msgs \union {m}

\* RM r prepares and proposes "prepared" in ballot 0 of its own instance.
RMPrepare(r) ==
  /\ rmState[r] = working
  /\ rmState' = [rmState EXCEPT ![r] = prepared]
  /\ Send([type |-> phase2a, ins |-> r, bal |-> 0, val |-> prepared])
  /\ UNCHANGED aState

\* RM r aborts on its own and proposes "aborted" in ballot 0.
RMChooseToAbort(r) ==
  /\ rmState[r] = working
  /\ rmState' = [rmState EXCEPT ![r] = aborted]
  /\ Send([type |-> phase2a, ins |-> r, bal |-> 0, val |-> aborted])
  /\ UNCHANGED aState

\* RM r learns that the transaction was committed.
RMRcvCommitMsg(r) ==
  /\ [type |-> commit] \in msgs
  /\ rmState' = [rmState EXCEPT ![r] = committed]
  /\ UNCHANGED <<aState, msgs>>

\* RM r learns that the transaction was aborted.
RMRcvAbortMsg(r) ==
  /\ [type |-> abort] \in msgs
  /\ rmState' = [rmState EXCEPT ![r] = aborted]
  /\ UNCHANGED <<aState, msgs>>

\* A leader starts a new ballot bal for instance r.
Phase1a(bal, r) ==
  /\ Send([type |-> phase1a, ins |-> r, bal |-> bal])
  /\ UNCHANGED <<rmState, aState>>

Phase2a(bal, r) ==
  /\ ~\E m \in msgs:
        /\ m.type = phase2a
        /\ m.bal = bal
        /\ m.ins = r
  /\ \E ms \in Majority:
        LET
          mset == {m \in msgs:
                     /\ m.type = phase1b
                     /\ m.ins = r
                     /\ m.mbal = bal
                     /\ m.acc \in ms}
          maxbal == Max({m.bal: m \in mset})
          val ==
            IF maxbal = -1
              THEN aborted
              ELSE (CHOOSE m \in mset: m.bal = maxbal).val
        IN
          /\ \A ac \in ms: \E m \in mset: m.acc = ac
          /\ Send([type |-> phase2a, ins |-> r, bal |-> bal, val |-> val])
  /\ UNCHANGED <<rmState, aState>>

PCDecide ==
  /\ LET
       Decided(r, v) ==
         \E b \in Ballot, ms \in Majority:
           \A ac \in ms: [type |-> phase2b, ins |-> r, bal |-> b,
                          val |-> v, acc |-> ac] \in msgs
     IN
       \/ /\ \A r \in RM: Decided(r, prepared)
          /\ Send([type |-> commit])
       \/ /\ \E r \in RM: Decided(r, aborted)
          /\ Send([type |-> abort])
  /\ UNCHANGED <<rmState, aState>>

Phase1b(acc) ==
  /\ \E m \in msgs:
        /\ m.type = phase1a
        /\ aState[m.ins][acc].mbal < m.bal
        /\ aState' = [aState EXCEPT ![m.ins][acc].mbal = m.bal]
        /\ Send([type |-> phase1b, ins |-> m.ins, mbal |-> m.bal,
                 bal |-> aState[m.ins][acc].bal, val |-> aState[m.ins][acc].val,
                 acc |-> acc])
  /\ UNCHANGED rmState

Phase2b(acc) ==
  /\ \E m \in msgs:
        /\ m.type = phase2a
        /\ aState[m.ins][acc].mbal <= m.bal
        /\ aState' = [aState EXCEPT ![m.ins][acc].mbal = m.bal,
                                    ![m.ins][acc].bal = m.bal,
                                    ![m.ins][acc].val = m.val]
        /\ Send([type |-> phase2b, ins |-> m.ins, bal |-> m.bal,
                 val |-> m.val, acc |-> acc])
  /\ UNCHANGED rmState

PCNext ==
  \/ \E r \in RM:
       \/ RMPrepare(r)
       \/ RMChooseToAbort(r)
       \/ RMRcvCommitMsg(r)
       \/ RMRcvAbortMsg(r)
  \/ \E bal \in Ballot \ {0}, r \in RM: Phase1a(bal, r) \/ Phase2a(bal, r)
  \/ PCDecide
  \/ \E acc \in Acceptors: Phase1b(acc) \/ Phase2b(acc)

PCSpec ==
  PCInit /\ [][PCNext]_<<rmState, aState, msgs>>

\* PCTypeOK is an invariant: it must hold in every reachable state, hence the [].
THEOREM PCSpec => []PCTypeOK
=============================================================================
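
The closing theorem of the module concerns type correctness only. A further safety property one might ask TLC to check is that no RM ends up aborted while another is committed. The following is a minimal sketch and not part of the original module: the name PCConsistent is hypothetical, and the definition assumes it is placed inside PaxosCommit, before the closing ==== line, so that RM, rmState, aborted and committed are in scope.

```tla
\* Hypothetical addition (not in the original spec).
\* No two RMs may reach conflicting final states.
PCConsistent ==
  \A r1, r2 \in RM :
    ~ (rmState[r1] = aborted /\ rmState[r2] = committed)
```

In the Toolbox, PCConsistent could then be listed as an invariant next to PCTypeOK.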
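- See Wikipedia for a detailed description and explanation of the Paxos algorithm. The spec above is an abstract description of a complete Paxos decision followed by the final Commit (or Abort). It is used to verify the correctness and safety of the algorithm's constraints, but it does not guarantee liveness. Network faults such as message loss, delay, and duplication are not modeled explicitly: every message stays in msgs, and each action picks the messages it needs.
- Leader election is abstracted away. Once consensus is reached, a leader is assumed to exist and to carry out the subsequent steps, and every message the leader sends is a broadcast; see PCDecide.
- An RM has four states: working, prepared, committed, and aborted. Every RM starts in working. Once the proposals are decided, the commit step works as in Two-phase Commit (see PCDecide): if all RMs are prepared the outcome is Commit, and if any RM is aborted the outcome is Abort.
- When a timeout occurs and no value has been chosen, a new ballot is started; this is Phase1a.
- When an Acceptor receives a phase1a message, it responds only if the message's bal is larger than the largest ballot it has seen so far (mbal). Its phase1b reply carries the bal and val of the proposal it has already accepted, if any; this is Phase1b.
- Once a majority of Acceptors has answered a given bal with phase1b messages, all Acceptors are asked to accept a proposal; this is Phase2a. The val chosen there must be safe at bal, that is, it must satisfy P2c from the paper. This is the core of the algorithm.
- In Phase2b, an Acceptor accepts a proposal whose bal is at least as large as the largest ballot it has seen (mbal).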
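
The value selection in Phase2a (the P2c rule) can be tried in isolation. The module below is a small sketch under stated assumptions: the module name P2cExample and the operator PickVal are hypothetical, strings stand in for the model values of the real spec, and the two sample sets of phase1b replies are made up for illustration. Max and the IF/CHOOSE expression are copied from PaxosCommit.

```tla
---- MODULE P2cExample ----
EXTENDS Integers

\* Same definition as in PaxosCommit.
Max(s) == IF s # {} THEN CHOOSE x \in s: \A y \in s: x >= y
          ELSE -1

\* Hypothetical helper: the value Phase2a would propose, given the
\* phase1b replies mset collected from one majority.
PickVal(mset) ==
  LET maxbal == Max({m.bal : m \in mset})
  IN  IF maxbal = -1
        THEN "aborted"
        ELSE (CHOOSE m \in mset : m.bal = maxbal).val

\* One acceptor already accepted "prepared" in ballot 0, so the new
\* ballot must propose "prepared" again.
ASSUME PickVal({[acc |-> "a1", bal |-> -1, val |-> "none"],
                [acc |-> "a2", bal |-> 0,  val |-> "prepared"]}) = "prepared"

\* Nobody in the majority accepted anything, so the leader is free to
\* choose, and this spec makes it propose "aborted".
ASSUME PickVal({[acc |-> "a1", bal |-> -1, val |-> "none"],
                [acc |-> "a2", bal |-> -1, val |-> "none"]}) = "aborted"
====
```

Both assumptions follow directly from the definitions: in the first set the largest reported bal is 0, in the second it is -1.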
Ballot <- {0,1,2}
Majority <- {{a1, a2}, {a2, a3}, {a1, a3}}
Acceptors <- [ model value ] <symmetrical>{a1, a2, a3}
RM <- [ model value ] <symmetrical>{r1, r2}
prepared, aborted, none, phase1a, phase1b, phase2a, phase2b, commit, abort, working, committed <- [ model value ]
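
Outside the Toolbox, the same model can be written as a small wrapper module. This is a sketch, not the author's setup: the module name MCPaxosCommit and the operator names MCAcceptors, MCRM, MCBallot, MCMajority and Symmetry are hypothetical. It assumes a TLC configuration file that declares a1, a2, a3, r1, r2 and the eleven status constants as model values, substitutes the MC operators for Acceptors, RM, Ballot and Majority, and names Symmetry under SYMMETRY.

```tla
---- MODULE MCPaxosCommit ----
EXTENDS PaxosCommit, TLC

\* Model values for the three acceptors and two resource managers.
CONSTANTS a1, a2, a3, r1, r2

MCAcceptors == {a1, a2, a3}
MCRM        == {r1, r2}
MCBallot    == {0, 1, 2}

\* Every pair of acceptors is a majority; any two pairs intersect.
MCMajority  == {{a1, a2}, {a2, a3}, {a1, a3}}

\* Acceptors and RMs are interchangeable, which lets TLC merge
\* states that differ only by a renaming.
Symmetry == Permutations(MCAcceptors) \union Permutations(MCRM)
====
```

Permutations comes from the standard TLC module. MCMajority is unchanged by any renaming of the acceptors, which is what makes the symmetry set usable here.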
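Summary
- The original Paxos is a fairly abstract consensus algorithm. It covers Proposers and Acceptors going to sleep or waking up at arbitrary times, and given enough time it eventually reaches agreement (in practice this relies on one proposer eventually running unopposed). This makes it a relatively complete solution to the distributed consensus problem. However, it reaches consensus on a single value only and is rather theoretical, so it cannot be applied directly to real-world problems.
- Engineering practice today generally strengthens the role of the Leader, as in Multi-Paxos and Raft.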