1. Introduction
BFT: Byzantine Fault Tolerant
SMR: State Machine Replication
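- 1) PBFT: the gold standard for BFT SMR. Barbara Liskov's 2001 video introduction to PBFT is recommended viewing.
- 2) Tendermint: a modern BFT algorithm that uses a peer-to-peer gossip protocol among nodes.
- 3) SBFT: a BFT algorithm built on PBFT, with better scalability and best-case latency.
- 4) HotStuff: a BFT algorithm that provides linearity and responsiveness. LibraBFT is based on HotStuff.
All four algorithms can replace the leader through a view-change protocol. The main difference lies in when they do so:
- rotating leader: Tendermint and HotStuff rotate the leader regularly, so leader rotation (view change) is part of normal operation.
- stable leader: PBFT and SBFT keep the same leader unless a problem is detected, so one leader can stay in place for many commands/blocks.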
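2. PBFT in Sawtooth
Sawtooth PBFT uses a rotating-leader mechanism: the primary is replaced at regular intervals, and also whenever the secondary nodes determine that the current primary is faulty.
Sawtooth PBFT is a voting-based BFT algorithm. A network of $n = 3f + 1$ nodes tolerates at most $f$ faulty nodes, so at least 4 nodes are needed to reach consensus. In addition, blocks committed through PBFT consensus are final, so the network never forks.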
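The fault-tolerance arithmetic above can be worked through in a few lines. This is a minimal sketch; the function names max_faulty and quorum_size are hypothetical and are not part of any Sawtooth API.

```python
def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1 (hypothetical helper)."""
    if n < 4:
        raise ValueError("PBFT needs at least 4 nodes")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Number of matching votes, 2f + 1, that the protocol waits for."""
    return 2 * max_faulty(n) + 1


if __name__ == "__main__":
    for n in (4, 5, 6, 7, 10):
        print(f"n={n}: f={max_faulty(n)}, quorum={quorum_size(n)}")
```

Note that going from 4 to 5 or 6 nodes does not raise $f$; the tolerance only increases at 7, 10, 13, and so on.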
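Nodes in the network exchange many messages to reach consensus, commit blocks, and maintain a healthy leader node.
Terminology:
- primary node: the current leader.
- view change: the process by which Sawtooth switches to a new leader, either on the regular schedule or when the current leader is found to be faulty.
Roles of the nodes in the network:
- primary node: builds and publishes blocks.
- other nodes (also called secondary nodes): vote on blocks and monitor the health of the current leader.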
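2.1 View changes
view: the period during which a given node is the primary.
view change: the process of switching to a different primary node.
In a 4-node network, node0 is the primary in view0, node1 in view1, node2 in view2, and node3 in view3; from view4 onward the cycle starts again at node0.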
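The round-robin pattern in this example amounts to taking the view number modulo the number of nodes. A minimal sketch, assuming the member list is kept in a fixed order agreed on by all nodes (primary_for_view is a hypothetical name):

```python
from typing import Sequence


def primary_for_view(view: int, members: Sequence[str]) -> str:
    """Return the primary for a view, rotating through members in order."""
    if view < 0 or not members:
        raise ValueError("view must be >= 0 and members must be non-empty")
    return members[view % len(members)]


if __name__ == "__main__":
    nodes = ["node0", "node1", "node2", "node3"]
    for v in range(6):
        print(f"view{v} -> {primary_for_view(v, nodes)}")
```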
2.2 Sequence numbers
Besides moving through views, the network also moves through a series of sequence numbers. In Sawtooth PBFT, a node's sequence number equals the block number of the next block on the chain. For example, a node at sequence number 10 has committed block 9 and is currently evaluating block 10.
Every message also carries a sequence number that indicates which block the message is for. For example, a message with sequence number 10 refers to block number 10.
2.3 Information storage
Each node keeps the following key information in its state:
- List of PBFT member nodes in the network
- Current view number, which identifies the primary node
- Current sequence number, which is also the number of the block being processed
- The current head of the chain
- If in normal mode, the step of the algorithm it’s on
- Log of all blocks it has received
- Log of all messages it has received
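The state listed above could be modeled roughly as follows. This is only an illustrative sketch: the class and field names are hypothetical, the phase names are the ones used in section 2.5, and the starting sequence number of 1 is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Phase(Enum):
    PRE_PREPARING = "PrePreparing"
    PREPARING = "Preparing"
    COMMITTING = "Committing"
    FINISHING = "Finishing"


@dataclass
class PbftState:
    members: List[str]                     # PBFT member nodes in the network
    view: int = 0                          # current view; identifies the primary
    seq_num: int = 1                       # block number being processed (assumed start)
    chain_head: Optional[str] = None       # ID of the current chain head
    phase: Optional[Phase] = Phase.PRE_PREPARING  # None while view changing
    block_log: List[str] = field(default_factory=list)
    message_log: List[dict] = field(default_factory=list)

    @property
    def primary(self) -> str:
        """The primary implied by the current view number."""
        return self.members[self.view % len(self.members)]
```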
2.4 Message types
Sawtooth PBFT uses the following message types:
- PrePrepare: sent by the primary node when it publishes a new block.
- Prepare: broadcast by nodes in the Preparing phase.
- Commit: broadcast by each node in the Committing phase.
- ViewChange: sent by any node that suspects the current primary is faulty.
- NewView: sent by the node that will be the new primary to complete a view change.
- Seal: proves that a block was committed after $2f + 1$ nodes agreed to commit it.
- SealRequest: sent by a node that is requesting a consensus seal for the block that was committed at a given sequence number.
Once the Sawtooth PBFT algorithm has initialized successfully, a node operates in one of two modes:
- normal mode: used for processing blocks.
- view changing mode: used to switch to a different primary node.
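The message types and modes above map naturally onto enumerations. The sketch below is illustrative only; the PbftMessage fields (view, seq_num, signer_id, block_id) are assumptions chosen to match the surrounding description, not the actual Sawtooth wire format.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class MessageType(Enum):
    PRE_PREPARE = auto()
    PREPARE = auto()
    COMMIT = auto()
    VIEW_CHANGE = auto()
    NEW_VIEW = auto()
    SEAL = auto()
    SEAL_REQUEST = auto()


class Mode(Enum):
    NORMAL = auto()         # processing blocks
    VIEW_CHANGING = auto()  # switching to a different primary


@dataclass(frozen=True)
class PbftMessage:
    msg_type: MessageType
    view: int                       # view in which the message was sent
    seq_num: int                    # block number the message refers to
    signer_id: str                  # node that signed the message
    block_id: Optional[str] = None  # set for block-related messages


def is_for_current_block(msg: PbftMessage, node_seq_num: int) -> bool:
    """A message concerns the block being evaluated when sequence numbers match."""
    return msg.seq_num == node_seq_num
```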
2.5 Normal mode in Sawtooth PBFT
In normal mode, nodes check blocks and approve them to be committed to the blockchain. Nodes stay in normal mode unless a view change is required.
On each node, Sawtooth PBFT runs as a consensus engine, a separate process that handles consensus-related functionality and communicates with the validator through the consensus API.
In the accompanying figure, V is a validator, N1 is the primary node, and N2/N3/N4 are secondary nodes.
The procedure is as follows:
1) All nodes start in the PrePreparing phase. The purpose of this phase is for the primary node to publish a new block and endorse it with a PrePrepare message.
- The primary node sends a request to its validator to initialize a new block. After a configurable timeout (the block_publishing_delay setting in Sawtooth, 1000 ms by default), the primary node sends a request to the validator to finalize the block and broadcast it to the network.
- After receiving the block in a BlockNew update and ensuring that the block is valid, all nodes store the block in their PBFT logs.
- After receiving the BlockNew update, the primary broadcasts a PrePrepare message for that block to all nodes in the network. When the nodes receive this PrePrepare message, they make sure it is valid; if it is, they add it to their respective logs and move on to the Preparing phase.
2) In the Preparing phase, all secondary nodes (every node except the primary) broadcast a Prepare message that matches the accepted PrePrepare message. Each node adds its own Prepare to its log, then accepts Prepare messages from other nodes and adds them to its log as well. Once a node has $2f + 1$ Prepare messages in its log that match the accepted PrePrepare message, it moves on to the Committing phase.
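3) The Committing phase is similar to the Preparing phase: every node broadcasts a Commit message to all nodes in the network and waits until it has received $2f + 1$ Commit messages, then moves on to the Finishing phase. The main difference between the Preparing and Committing phases is that in the Committing phase the primary node also broadcasts a message.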
4) Once in the Finishing phase, each node tells its validator to commit the block for which it has a matching PrePrepare, $2f + 1$ Prepare messages, and $2f + 1$ Commit messages. Each node then waits for a BlockCommit notification from its validator to signal that the block has been successfully committed to the chain. After receiving this confirmation, the node updates its state:
- Increment its sequence number by 1.
- Update its current chain head to the block that was just committed.
- Reset its phase to PrePreparing.
The primary node then initializes a new block, and the whole process repeats.
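The four phases above can be sketched as a small state machine that counts distinct signers. This is a simplified illustration, not the Sawtooth implementation: the class and method names are hypothetical, message validation and signatures are omitted, and messages that arrive before the PrePrepare are dropped rather than logged.

```python
from enum import Enum


class Phase(Enum):
    PRE_PREPARING = 1
    PREPARING = 2
    COMMITTING = 3
    FINISHING = 4


class NormalModeNode:
    def __init__(self, n: int):
        self.quorum = 2 * ((n - 1) // 3) + 1  # 2f + 1
        self.seq_num = 1
        self._reset()

    def _reset(self):
        self.phase = Phase.PRE_PREPARING
        self.block_id = None
        self.prepares = set()  # signers of matching Prepare messages
        self.commits = set()   # signers of matching Commit messages

    def _matches(self, seq_num, block_id):
        return (self.block_id is not None and seq_num == self.seq_num
                and block_id == self.block_id)

    def _advance(self):
        if self.phase is Phase.PREPARING and len(self.prepares) >= self.quorum:
            self.phase = Phase.COMMITTING
        if self.phase is Phase.COMMITTING and len(self.commits) >= self.quorum:
            self.phase = Phase.FINISHING  # here the node asks its validator to commit

    def on_pre_prepare(self, seq_num, block_id):
        if self.phase is Phase.PRE_PREPARING and seq_num == self.seq_num:
            self.block_id = block_id
            self.phase = Phase.PREPARING

    def on_prepare(self, signer, seq_num, block_id):
        if self._matches(seq_num, block_id):
            self.prepares.add(signer)
            self._advance()

    def on_commit(self, signer, seq_num, block_id):
        if self._matches(seq_num, block_id):
            self.commits.add(signer)
            self._advance()

    def on_block_commit(self):
        """Validator confirmed the commit: move on to the next block."""
        if self.phase is Phase.FINISHING:
            self.seq_num += 1
            self._reset()
```

Using sets of signer IDs means a duplicate vote from the same node never counts twice toward the $2f + 1$ threshold.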
2.6 View changing mode in Sawtooth PBFT
A view change switches to a different primary node. A node starts a view change when any of the following occurs:
- The idle timeout expires. A node starts its idle timeout when it enters the PrePreparing phase. If the node receives a new block and a matching PrePrepare from the primary for its current sequence number before the timeout expires, it stops the timeout; otherwise, it starts a view change when the timeout expires.
- The commit timeout expires. A node starts its commit timeout when it enters the Preparing phase. If the node is able to move on to the Finishing phase and send a request to the validator to commit the block before the timeout expires, it stops the timeout; otherwise, it starts a view change when the timeout expires.
- The view change timeout expires. When a node starts a view change to view v, it starts its view change timeout. If the node completes the view change before the timeout expires, it stops the timeout; otherwise, it starts a new view change to view v+1.
- Multiple PrePrepare messages are received for the same view and the same sequence number but for different blocks. This means the primary is misbehaving; the behavior is invalid, so the node starts a new view change.
- A Prepare message is received from the primary. This also means the primary is misbehaving; the behavior is invalid, so the node starts a new view change.
- $f + 1$ ViewChange messages are received for the same view. This ensures that a node does not wait too long to start a view change: since at most $f$ nodes can be faulty at any given time, if more than $f$ nodes decide to start a view change, other nodes can safely join them to perform that view change.
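The last trigger, joining a view change once $f + 1$ nodes have asked for it, can be sketched as a per-view vote counter. The class name ViewChangeVotes and its record method are hypothetical and only illustrate the counting rule.

```python
from collections import defaultdict


class ViewChangeVotes:
    """Tracks which nodes have sent a ViewChange message for each view."""

    def __init__(self, n: int):
        self.f = (n - 1) // 3
        self.votes = defaultdict(set)  # view -> set of signer IDs

    def record(self, view: int, signer: str) -> bool:
        """Log a vote; return True once f + 1 distinct nodes want this view.

        With at most f faulty nodes, f + 1 votes guarantee that at least
        one honest node is asking for the view change.
        """
        self.votes[view].add(signer)
        return len(self.votes[view]) >= self.f + 1
```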
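When it starts a view change, a node does the following:
- 1) Update its mode to ViewChanging(v), where v is the view the node is changing to.
- 2) Stop both the idle timeout and the commit timeout, since they are not needed again until the view change has completed.
- 3) Stop the view change timeout if it has already been started; it will be restarted later with a new value.
- 4) Broadcast a ViewChange message for the new view.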
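A ViewChange message is accepted and added to the log if it meets the following conditions:
- It is for a later view than the node's current view.
- If the node is in ViewChanging(v) mode, the message's view must be greater than or equal to v.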
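The two acceptance conditions translate directly into a predicate. A minimal sketch, with a hypothetical function name and with changing_to set to None when the node is in normal mode:

```python
from typing import Optional


def accepts_view_change(msg_view: int, current_view: int,
                        changing_to: Optional[int] = None) -> bool:
    """Decide whether a ViewChange message should be logged.

    changing_to is v when the node is in ViewChanging(v) mode,
    or None when the node is in normal mode.
    """
    if msg_view <= current_view:
        return False  # not for a later view
    if changing_to is not None and msg_view < changing_to:
        return False  # older than the view change already in progress
    return True
```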
Once a node has received $2f + 1$ ViewChange messages for the new view, it starts its view change timeout. This timeout ensures that the new primary starts the new view in a timely manner. (In Sawtooth, the base duration is configured by the view_change_duration setting, 5000 ms by default. The timeout is computed as $(\text{DesiredViewNumber} - \text{CurrentViewNumber}) \times \text{ViewChangeDuration}$.)
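As a worked example of this formula, the sketch below computes the timeout in milliseconds using the 5000 ms default quoted above. The function name and the check that the desired view is later than the current one are assumptions added for illustration.

```python
def view_change_timeout_ms(desired_view: int, current_view: int,
                           view_change_duration_ms: int = 5000) -> int:
    """(DesiredViewNumber - CurrentViewNumber) * ViewChangeDuration."""
    if desired_view <= current_view:
        raise ValueError("desired view must be later than the current view")
    return (desired_view - current_view) * view_change_duration_ms


if __name__ == "__main__":
    # Each additional skipped view lengthens the wait by one base duration.
    for target in (1, 2, 3):
        print(target, view_change_timeout_ms(target, 0))
```

The timeout grows with the distance between the two views, so repeated failed view changes give each successive candidate primary more time to respond.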
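When the primary for the new view has received $2f + 1$ ViewChange messages, it broadcasts a NewView message to the network to indicate that the view change is complete. To prove that the view change is valid, the primary includes in the NewView message the $2f + 1$ signed ViewChange messages it collected (the primary's own "vote" counts among them), so that other nodes can verify them.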
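The check a receiving node might perform on the votes embedded in a NewView message can be sketched as follows. This is only an illustration under stated assumptions: votes are represented as (signer, view) pairs, signature verification is assumed to have happened already, the $2f + 1$ threshold follows the count given above, and the function name is hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def new_view_is_valid(new_view: int,
                      votes: Iterable[Tuple[str, int]],
                      members: Sequence[str]) -> bool:
    """Check that 2f + 1 distinct members voted for exactly this view."""
    f = (len(members) - 1) // 3
    signers = {
        signer for signer, view in votes
        if view == new_view and signer in members
    }
    return len(signers) >= 2 * f + 1
```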
If a node receives a valid NewView message from the new primary before its view change timeout expires, it does the following:
- 1) Stop the view change timeout.
- 2) Update its view to match the new value.
- 3) Switch back to normal mode.
If a node has not received a NewView message before the view change timeout expires, it stops the timeout and starts a new view change for view v+1 (where v is the view it was attempting to change to before).