HotStuff 2-phase Explained in Detail

Author: ganzr

Last modified: 2024-11-27 10:41:18


First published: 2024-11-27 10:40:52

Article link: https://blog.csdn.net/ganzr/article/details/144076516


Background

why 2-phase?

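In every leader-based BFT protocol under partial synchrony, at least two phases are needed to preserve safety when the leader fails. The second phase exists essentially to provide a lock mechanism for the proposal. With two phases, whenever a node commits a proposal, at least f+1 non-faulty nodes are guaranteed to be locked on that proposal, because a node that receives the phase-1 QC votes in phase 2 and locks on the proposal.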

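what is lock? In the 2-phase setting, a node that is locked on a proposal will, after a view change, refuse to vote for a new proposal from the new leader if that new proposal conflicts with the locked one, unless the node can see that at least f+1 nodes are also locked on the new proposal.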
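
The voting rule described above can be written as a small predicate. This is only a sketch under assumptions: the `Replica` class, the `conflicts` flag and the `locks_seen_on_new` counter are hypothetical names, and how a node detects a conflict or learns how many nodes are locked on the new proposal is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    """Hypothetical replica state: only the fields needed for the lock rule."""
    f: int                           # maximum number of faulty nodes tolerated
    locked_on: Optional[str] = None  # id of the proposal this node is locked on

    def may_vote(self, proposal: str, conflicts: bool, locks_seen_on_new: int) -> bool:
        """Return True if the node may vote for `proposal` after a view change.

        conflicts: whether `proposal` conflicts with the locked proposal
                   (conflict detection is assumed to happen elsewhere).
        locks_seen_on_new: number of nodes this node can see locked on `proposal`.
        """
        if self.locked_on is None:
            return True   # no lock, nothing to protect
        if not conflicts:
            return True   # the proposal is compatible with the lock
        # Conflicting proposal: vote only with evidence of at least f+1 locks on it.
        return locks_seen_on_new >= self.f + 1


r = Replica(f=1, locked_on="A")
print(r.may_vote("B", conflicts=True, locks_seen_on_new=1))  # False
print(r.may_vote("B", conflicts=True, locks_seen_on_new=2))  # True
```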

2-phase BFT

why not 1-phase?

With a single phase, a node commits as soon as it receives a QC for a proposal. Other nodes that never received that QC may then vote for a conflicting proposal and build a conflicting QC, which breaks safety.
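
The failure can be replayed with a toy scenario. All names and the concrete configuration (n = 4, f = 1, node 3 faulty) are assumptions chosen for illustration; the script only counts votes and does not model messages or signatures.

```python
# Toy scenario: n = 4 nodes, f = 1, quorum = n - f = 3.
N, F = 4, 1
QUORUM = N - F
BYZANTINE = {3}  # node 3 is faulty and may vote for anything


def has_qc(voters: set) -> bool:
    """A QC forms once a quorum of distinct nodes has voted."""
    return len(voters) >= QUORUM


committed = {}  # node id -> proposal it committed

# View 1: the leader proposes X; nodes 0, 1 and the faulty node 3 vote for it.
votes_x = {0, 1, 3}
# Only node 0 receives the resulting QC before the leader fails.
# Under a 1-phase rule, seeing the QC is enough to commit.
if has_qc(votes_x):
    committed[0] = "X"

# View 2: the new leader proposes a conflicting Y.
# Node 1 voted for X but never saw its QC, so without a lock it may vote for Y.
# Node 2 never saw X at all, and node 3 is faulty.
votes_y = {1, 2, 3}
if has_qc(votes_y):
    committed[1] = "Y"  # node 1 receives the QC for Y and commits it

safety_violated = len(set(committed.values())) > 1
print(committed, safety_violated)  # {0: 'X', 1: 'Y'} True
```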

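why 3-phase?

In a 2-phase protocol, the side effect of the lock mechanism is that nodes have to synchronize their locks when the leader fails, and this synchronization hurts responsiveness. HotStuff adds one extra phase to achieve optimistic responsiveness. During a 2-phase view change, a prepareQC may be held by only one non-faulty node, so the new leader must wait long enough to be sure of receiving that prepareQC. Otherwise the non-faulty node holding it stays locked indefinitely, which harms liveness.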

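what is responsive? A distributed system is responsive iff its transaction-processing latency depends only on the actual network delay. For example, in a 2-phase HotStuff, on every view change the new leader has to wait for a preset time to make sure it has received the high_qc of every non-faulty node, so its latency is determined by that preset time rather than by the actual network delay.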

3-phase HotStuff

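In the 3-phase protocol, a non-faulty node locks on a proposal only when it receives the lockedQC. At that point at least f+1 non-faulty nodes hold the proposal's prepareQC, so the new leader only needs to collect the high_qc of n-f nodes to be sure of receiving that prepareQC.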
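
The leader-side step can be sketched as follows. The `QC` record and the `pick_high_qc` helper are hypothetical names, and the sketch assumes that the list holds the high_qc values in arrival order; signature checks and timeouts are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QC:
    """Hypothetical quorum certificate: the view it was formed in and its block."""
    view: int
    block: str


def pick_high_qc(new_view_msgs: List[QC], n: int, f: int) -> Optional[QC]:
    """Return the highest-view QC among the first n-f reported high_qc values.

    Returns None while fewer than n-f messages have arrived.
    """
    quorum = n - f
    if len(new_view_msgs) < quorum:
        return None
    first = new_view_msgs[:quorum]  # the leader does not wait for the rest
    return max(first, key=lambda qc: qc.view)


msgs = [QC(3, "a"), QC(5, "b"), QC(4, "c"), QC(9, "late")]
print(pick_high_qc(msgs, n=4, f=1))      # QC(view=5, block='b')
print(pick_high_qc(msgs[:2], n=4, f=1))  # None
```

The fourth message is ignored on purpose: the leader proceeds after n-f messages, and the argument in the text is what makes this safe with respect to locks.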

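In the 3-phase protocol, the new leader only needs to collect n-f high_qc values to be sure of learning the highest lock. Proof (with n = 3f+1): if a node is locked, it has received a lockedQC, which means n-f nodes received the corresponding prepareQC, and at least n-2f = f+1 of them are non-faulty. If none of the n-f collected high_qc values came from these f+1 nodes, there would have to be at least (n-f) + (f+1) = n+1 nodes, contradicting the fact that there are only n nodes. Therefore the n-f high_qc values contain at least one copy of the highest prepareQC.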
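
The counting argument can also be checked by brute force for small systems. This is a sanity check rather than a proof; the function name is hypothetical and n = 3f+1 is assumed.

```python
from itertools import combinations


def every_quorum_hits_holder(f: int) -> bool:
    """Check that any n-f nodes include at least one of any f+1 prepareQC holders."""
    n = 3 * f + 1
    nodes = range(n)
    # Worst case: exactly f+1 non-faulty nodes hold the prepareQC.
    for holders in combinations(nodes, f + 1):
        holder_set = set(holders)
        # The leader may hear from any set of n-f nodes.
        for quorum in combinations(nodes, n - f):
            if holder_set.isdisjoint(quorum):
                return False
    return True


print(all(every_quorum_hits_holder(f) for f in (1, 2, 3)))  # True
```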

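why not 3-phase?

As shown above, the 3-phase protocol strengthens the lock condition: instead of locking after receiving a prepareQC and voting in phase 2, a node locks only after receiving a lockedQC. A commit is, in essence, a node concluding that n-f nodes are locked on the proposal, and HotStuff reaches that conclusion through the commitQC built from the phase-3 votes.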

Compared with the 2-phase design, HotStuff effectively inserts a lockedQC between the prepareQC and the commitQC, so that the lock can still be retrieved when the leader fails.

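The starting point of HotStuff-2 is that a separate lockedQC is not needed for the new leader to obtain the newest lock. No lock can be newer than one formed in the immediately preceding view, so if the new leader has received the prepareQC of the preceding view, it knows that it holds the newest lock, which resolves the liveness problem. If it has not received it, the leader has to wait for a preset maximal delay to make sure that the newest lock can reach it.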
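
The new leader's decision can be sketched as a single function. The name `leader_wait`, its parameters and the use of seconds are assumptions made for illustration; a real implementation would also handle message collection and timers.

```python
def leader_wait(new_view: int, highest_qc_view: int, max_delay: float) -> float:
    """Time (in seconds, by assumption) the new leader waits before proposing.

    new_view: the view the leader is about to lead.
    highest_qc_view: view of the highest prepareQC the leader currently holds.
    max_delay: the preset maximal message delay.
    """
    if highest_qc_view == new_view - 1:
        # The prepareQC comes from the immediately preceding view,
        # so no newer lock can exist: propose right away.
        return 0.0
    # Otherwise a newer lock might still be in flight: wait out the maximal delay.
    return max_delay


print(leader_wait(new_view=8, highest_qc_view=7, max_delay=2.0))  # 0.0
print(leader_wait(new_view=8, highest_qc_view=5, max_delay=2.0))  # 2.0
```

The first call corresponds to the optimistic case (no waiting after a successful view), the second to the case that follows a failed leader.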

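3-phase HotStuff can, in essence, commit as soon as it receives a commitQC. However, to keep sufficient responsiveness, that is, to make sure that no non-faulty node stays locked on a proposal for a long time, it additionally requires the QCs to come from consecutive views. Because the views are consecutive, no competing fork can exist, so a long-lived lock cannot occur.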

what is the difference between lock and commit?

A node commits a proposal only once it is sure that n-f nodes are locked on that proposal. In the 2-phase protocol, a node locks on the proposal after receiving its prepareQC and voting; when it then receives the phase-2 QC, it knows that n-f nodes are locked on the proposal and commits it. In the 3-phase protocol, a node locks on the proposal after receiving its lockedQC (the phase-2 QC) and commits it after receiving the phase-3 QC (the commitQC).
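
The two rules can be put side by side in one function. The function name `on_qc` and the string labels it returns are hypothetical; the sketch only maps "which phase's QC was received" to the action described in the text.

```python
def on_qc(total_phases: int, qc_phase: int) -> str:
    """Return the action a node takes on receiving the QC of phase `qc_phase`.

    total_phases: 2 for the 2-phase protocol, 3 for the 3-phase protocol.
    """
    if total_phases not in (2, 3):
        raise ValueError("total_phases must be 2 or 3")
    if not 1 <= qc_phase <= total_phases:
        raise ValueError("qc_phase out of range")

    lock_phase = 1 if total_phases == 2 else 2  # prepareQC vs. lockedQC
    commit_phase = total_phases                 # the last QC is the commitQC

    if qc_phase == commit_phase:
        return "commit"
    if qc_phase == lock_phase:
        return "lock"  # the node also votes in the next phase
    return "vote"      # 3-phase only: prepareQC leads to a vote, not a lock


for phases in (2, 3):
    print(phases, [on_qc(phases, p) for p in range(1, phases + 1)])
# 2 ['lock', 'commit']
# 3 ['vote', 'lock', 'commit']
```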

lock mechanism

PBFT: replicas broadcast and collect locks when the leader fails

Tendermint: the new leader waits for the locks of all non-faulty replicas during view change

DiemBFT-v4: replicas broadcast and collect locks when the leader fails

HotStuff: the new leader waits for the first n-f locks during view change

HotStuff-2: the new leader waits for the lock from the preceding view, or for the maximal message delay, during view change

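The lock mechanisms of these consensus protocols are summarized below.

Under 2-phase semantics: prepareQC is the phase-1 QC and commitQC is the phase-2 QC.

Under 3-phase semantics: prepareQC is the phase-1 QC, lockedQC is the phase-2 QC, and commitQC is the phase-3 QC.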

Protocol	When it locks	When it commits	Lock synchronization	Responsive	View-change complexity
PBFT	accepts the prepareQC and votes on it	accepts the commitQC	quadratic broadcast	yes	O(n^2)
Tendermint	accepts the prepareQC and votes on it	accepts the commitQC	waits for a predetermined delay	no	O(n)
DiemBFT-v4	accepts the prepareQC and votes on it	accepts the commitQC	quadratic broadcast on timeout	yes	O(n) normal case; O(n^2) on timeout
HotStuff	accepts the lockedQC	accepts the commitQC	waits for the first n-f highest prepareQC	yes	O(n)
HotStuff-2	accepts the prepareQC and votes on it	accepts the commitQC	waits until the lock from the last view arrives, or for the maximal delay	optimistic	O(n)

Properties

P0: 2-phase view regime
P1: optimistically no wait over sequences of decisions (optimistic responsiveness)
P2: optimistically linear communication
P3: balanced communication load over sequences of decisions
P4: O(n^2) worst-case communication

Name	Features	Properties	View-change complexity	Authenticator complexity (correct leader / leader failure)
PBFT	2-phase; replaces the leader when it fails; extra protocol for view change	P0, P1	O(n^2)	O(n^2) / O(n^3)
Tendermint	2-phase; leader rotation; maximal network delay in view change	P0, P2, P3	O(n)	O(n) / O(n)
HotStuff	3-phase; leader rotation; waits for the first n-f high_qc in view change	P1, P2, P3, P4	O(n)	O(n) / O(n)
DiemBFT-v4	2-phase; leader rotation; extra protocol for timeout (quadratic view change on timeout)	P0, P1, P2, P3	O(n^2)	O(n) / O(n)
HotStuff-2	2-phase; leader rotation; maximal network delay on timeout	P0, P1, P2, P3, P4	O(n)	O(n) / O(n)

References

[1] Eli Gafni. Round-by-round fault detectors (extended abstract): unifying synchrony and asynchrony. In Proceedings of the Seventeenth Annual ACM Symposium on Principles of Distributed Computing (PODC '98), 1998. https://dl.acm.org/doi/pdf/10.1145/277697.277724

[2] DiemBFT v4: State Machine Replication in the Diem Blockchain. https://developers.diem.com/papers/diem-consensus-state-machine-replication-in-the-diem-blockchain/2021-08-17.pdf

[3] HotStuff: BFT Consensus in the Lens of Blockchain. https://arxiv.org/pdf/1803.05069.pdf

[4] What is the difference between PBFT, Tendermint, HotStuff, and HotStuff-2?