[Paper Reading] The Honey Badger of BFT Protocols

  Paper title: The Honey Badger of BFT Protocols



The surprising success of cryptocurrencies has led to a surge of interest in deploying large scale, highly robust, Byzantine fault tolerant (BFT) protocols for mission-critical applications, such as financial transactions.


Although the conventional wisdom is to build atop a (weakly) synchronous protocol such as PBFT (or a variation thereof), such protocols rely critically on network timing assumptions, and only guarantee liveness when the network behaves as expected.


We argue these protocols are ill-suited for this deployment scenario.


We present an alternative, HoneyBadgerBFT, the first practical asynchronous BFT protocol, which guarantees liveness without making any timing assumptions.


We base our solution on a novel atomic broadcast protocol that achieves optimal asymptotic efficiency.


We present an implementation and experimental results to show our system can achieve throughput of tens of thousands of transactions per second, and scales to over a hundred nodes on a wide area network.


We even conduct BFT experiments over Tor, without needing to tune any parameters.


Unlike the alternatives, HoneyBadgerBFT simply does not care about the underlying network.

1. Introduction


Distributed fault tolerant protocols are promising solutions for mission-critical infrastructure, such as financial transaction databases.


Traditionally, they have been deployed at relatively small scale, and typically in a single administrative domain where adversarial attacks might not be a primary concern.


As a representative example, a deployment of Google’s fault tolerant lock service, Chubby [14], consists of five nodes, and tolerates up to two crash faults.


In recent years, a new embodiment of distributed systems called “cryptocurrencies” or “blockchains” has emerged, beginning with Bitcoin’s phenomenal success [43].


Such cryptocurrency systems represent a surprising and effective breakthrough [12], and open a new chapter in our understanding of distributed systems.


Cryptocurrency systems challenge our traditional belief about the deployment environment for fault tolerance protocols.


Unlike the classic “5 Chubby nodes within Google” environment, cryptocurrencies have revealed and stimulated a new demand for consensus protocols over a wide area network, among a large number of mutually distrustful nodes. Moreover, network connections can be far less predictable than in the classical LAN setting, or even adversarial.


This new setting poses interesting new challenges, and calls upon us to rethink the design of fault tolerant protocols.


Robustness is a first-class citizen.


Cryptocurrencies demonstrate the demand for and viability of an unusual operating point that prioritizes robustness above all else, even at the expense of performance.


In fact, Bitcoin provides terrible performance by distributed systems standards: a transaction takes on average 10 minutes to be committed, and the system as a whole achieves throughput on the order of 10 transactions per second.


However, in comparison with traditional fault tolerant deployment scenarios, cryptocurrencies thrive in a highly adversarial environment, where well-motivated and malicious attacks are expected (if not commonplace).


For this reason, many of Bitcoin’s enthusiastic supporters refer to it as the “Honey Badger of Money” [41].


We note that the demand for robustness is often closely related to the demand for decentralization — since decentralization would typically require the participation of a large number of diverse participants in a wide-area network.


Favor throughput over latency.


Most existing works on scalable fault tolerance protocols [6, 49] focus on optimizing scalability in a LAN environment controlled by a single administrative domain.


Since bandwidth provisioning is ample, these works often focus on reducing (cryptographic) computations and minimizing response time while under contention (i.e., requests competing for the same object).


In contrast, blockchains have stirred interest in a class of financial applications where response time and contention are not the most critical factors, e.g., payment and settlement networks [1].


In fact, some financial applications intentionally introduce delays in committing transactions to allow for possible rollback/chargeback operations.


Although these applications are not latency critical, banks and financial institutions have expressed interest in a high-throughput alternative of the blockchain technology, to be able to sustain high volumes of requests.

For example, Visa processes 2,000 tx/sec on average, with a peak of 59,000 tx/sec [1].


1.1 Our Contributions


Timing assumptions considered harmful.


Most existing Byzantine fault tolerant (BFT) systems, even those called “robust,” assume some variation of weak synchrony, where, roughly speaking, messages are guaranteed to be delivered within a certain bound ∆, but ∆ may be time-varying or unknown to the protocol designer.


We argue that protocols based on timing assumptions are unsuitable for decentralized, cryptocurrency settings, where network links can be unreliable, network speeds change rapidly, and network delays may even be adversarially induced.


First, the liveness properties of weakly synchronous protocols can fail completely when the expected timing assumptions are violated (e.g., due to a malicious network adversary).


To demonstrate this, we explicitly construct an adversarial “intermittently synchronous” network that violates the assumptions, such that existing weakly synchronous protocols such as PBFT [20] would grind to a halt (Section 3).


Second, even when the weak synchrony assumptions are satisfied in practice, weakly synchronous protocols degrade significantly in throughput when the underlying network is unpredictable.


Ideally, we would like a protocol whose throughput closely tracks the network’s performance even under rapidly changing network conditions.


Unfortunately, weakly synchronous protocols require timeout parameters that are finicky to tune, especially in cryptocurrency application settings; and when the chosen timeout values are either too long or too short, throughput can be hampered.


As a concrete example, we show that even when the weak synchrony assumptions are satisfied, such protocols are slow to recover from transient network partitions (Section 3).


Practical asynchronous BFT.


We propose HoneyBadgerBFT, the first BFT atomic broadcast protocol to provide optimal asymptotic efficiency in the asynchronous setting.


We therefore directly refute the prevailing wisdom that such protocols are necessarily impractical.

We make significant efficiency improvements on the best previously known asynchronous atomic broadcast protocol, due to Cachin et al. [15], which requires each node to transmit O(N^2) bits for each committed transaction, substantially limiting its throughput for all but the smallest networks.


This inefficiency has two root causes.


The first cause is redundant work among the parties.


However, a naïve attempt to eliminate the redundancy compromises the fairness property, and allows for targeted censorship attacks.


We invent a novel solution to overcome this problem by using threshold public-key encryption to prevent these attacks.


The second cause is the use of a suboptimal instantiation of the Asynchronous Common Subset (ACS) subcomponent.


We show how to efficiently instantiate ACS by combining existing but overlooked techniques: efficient reliable broadcast using erasure codes [18], and a reduction from ACS to reliable broadcast from the multi-party computation literature [9].


HoneyBadgerBFT’s design is optimized for a cryptocurrency-like deployment scenario where network bandwidth is the scarce resource, but computation is relatively ample.


This allows us to take advantage of cryptographic building blocks (in particular, threshold public-key encryption) that would be considered too expensive in a classical fault-tolerant database setting where the primary goal is to minimize response time even under contention.


In an asynchronous network, messages are eventually delivered but no other timing assumption is made.


Unlike existing weakly synchronous protocols where parameter tuning can be finicky, HoneyBadgerBFT does not care.


Regardless of how network conditions fluctuate, HoneyBadgerBFT’s throughput always closely tracks the network’s available bandwidth.


Imprecisely speaking, HoneyBadgerBFT eventually makes progress as long as messages eventually get delivered; moreover, it makes progress as soon as messages are delivered.


We formally prove the security and liveness of our HoneyBadgerBFT protocol, and show experimentally that it provides better throughput than the classical PBFT protocol [20] even in the optimistic case.


Implementation and large-scale experiments. We provide a full-fledged implementation of HoneyBadgerBFT, which we will release as free open source software in the near future. We demonstrate experimental results from an Amazon AWS deployment with more than 100 nodes distributed across 5 continents. To demonstrate its versatility and robustness, we also deployed HoneyBadgerBFT over the Tor anonymous relay network without changing any parameters, and present throughput and latency results.


1.2 Suggested Deployment Scenarios


Among numerous conceivable applications, we highlight two likely deployment scenarios that are sought after by banks, financial institutions, and advocates for fully decentralized cryptocurrencies.


Confederation cryptocurrencies.


The success of decentralized cryptocurrencies such as Bitcoin has inspired banks and financial institutions to inspect their transaction processing and settlement infrastructure in a new light.


“Confederation cryptocurrency” is an oft-cited vision [24, 25, 47], where a conglomerate of financial institutions jointly contribute to a Byzantine agreement protocol to allow fast and robust settlement of transactions.


Passions are running high that this approach will streamline today’s slow and clunky infrastructure for inter-bank settlement.


As a result, several new open source projects aim to build a suitable BFT protocol for this setting, such as IBM’s Open Blockchain and the Hyperledger project [40].


A confederation cryptocurrency would require a BFT protocol deployed over the wide-area network, possibly involving hundreds to thousands of consensus nodes.


In this setting, enrollment can easily be controlled, such that the set of consensus nodes is known a priori; this is often referred to as a “permissioned” blockchain.


Clearly HoneyBadgerBFT is a natural candidate for use in such confederation cryptocurrencies.


Applicability to permissionless blockchains.


By contrast, decentralized cryptocurrencies such as Bitcoin and Ethereum opt for a “permissionless” blockchain, where enrollment is open to anyone, and nodes may join and leave dynamically and frequently.

To achieve security in this setting, known consensus protocols rely on proofs-of-work to defeat Sybil attacks, and pay an enormous price in terms of throughput and latency; e.g., Bitcoin commits transactions every ∼10 min, and its throughput is limited to 7 tx/sec even when the current block size is maximized.


Several recent works have suggested the promising idea of leveraging either a slower, external blockchain such as Bitcoin or economic “proof-of-stake” assumptions involving the underlying currency itself [32, 32, 35, 37] to bootstrap faster BFT protocols, by selecting a random committee to perform BFT in every different epoch.


These approaches promise to achieve the best of both worlds, security in an open enrollment, decentralized network, and the throughput and response time matching classical BFT protocols.


Here too HoneyBadgerBFT is a natural choice since the randomly selected committee can be geographically heterogeneous.

2. Background and Related Work



Our overall goal is to build a replicated state machine, where clients generate and submit transactions and a network of nodes receives and processes them.


Abstracting away from application specific details (such as how to represent state and compute transitions), it suffices to build a globally consistent, totally ordered, append-only transaction log.


Traditionally, such a primitive is called total order or atomic broadcast [23]; in Bitcoin parlance, we would call it a blockchain.


Fault tolerant state machine replication protocols provide strong safety and liveness guarantees, allowing a distributed system to provide correct service in spite of network latency and the failure of some nodes.


A vast body of work has studied such protocols, offering different performance tradeoffs, tolerating different forms of failures and attacks, and making varying assumptions about the underlying network.


We explain below the most closely related efforts to ours.


2.1 Robust BFT Protocols


While Paxos [36], Raft [45], and many other well-known protocols tolerate crash faults, Byzantine fault tolerant protocols (BFT), beginning with PBFT [20], tolerate even arbitrary (e.g., maliciously) corrupted nodes.


Many subsequent protocols offer improved performance, often through optimistic execution that provides excellent performance when there are no faults, clients do not contend much, and the network is well-behaved, and at least some progress otherwise [2, 5, 33, 39, 51].


In general, BFT systems are evaluated in deployment scenarios where latency and CPU are the bottleneck [49], thus the most effective protocols reduce the number of rounds and minimize expensive cryptographic operations.


Clement et al [22] initiated a recent line of work [4, 6, 10, 21, 22, 50] by advocating improvement of the worst-case performance, providing service quality guarantees even when the system is under attack — even if this comes at the expense of performance in the optimistic case.


However, although the “Robust BFT” protocols in this vein gracefully tolerate compromised nodes, they still rely on timing assumptions about the underlying network.


Our work takes this approach further, guaranteeing good throughput even in a fully asynchronous network.


2.2 Randomized Agreement


Deterministic asynchronous protocols are impossible for most tasks [27].


While the vast majority of practical BFT protocols steer clear of this impossibility result by making timing assumptions, randomness (and, in particular, cryptography) provides an alternative route.


Indeed we know of asynchronous BFT protocols for a variety of tasks such as binary agreement (ABA), reliable broadcast (RBC), and more [13, 15, 16].


Our work is most closely related to SINTRA [17], a system implementation based on the asynchronous atomic broadcast protocol from Cachin et al (CKPS01) [15].


This protocol consists of a reduction from atomic broadcast (ABC) to common subset agreement (ACS), as well as a reduction from ACS to multi-value validated agreement (MVBA).


The key invention we contribute is a novel reduction from ABC to ACS that provides better efficiency (by an O(N) factor) through batching, while using threshold encryption to preserve censorship resilience (see Section 4.4).


We also obtain better efficiency by cherry-picking from the literature improved instantiations of subcomponents.


In particular, we sidestep the expensive MVBA primitive by using an alternative ACS [9] along with an efficient RBC [18] as explained in Section 4.4.

Table 1 compares the asymptotic performance of HoneyBadgerBFT with that of several other atomic broadcast protocols. Here “Comm. compl.” denotes the expected communication complexity (i.e., total bytes transferred) per committed transaction. Since PBFT relies on weak synchrony assumptions, it may fail to make progress at all in an asynchronous network. Protocols KS02 [34] and RC05 [46] are optimistic, falling back to an expensive recovery mode based on MVBA. As mentioned, the protocol of Cachin et al. (CKPS01) [15] can be improved using a more efficient ACS construction [9, 18]. We also obtain another O(N) improvement through our novel reduction.


Finally, King and Saia [30,31] have recently developed agreement protocols with less-than-quadratic number of messages by routing communications over a sparse graph.


However, extending these results to the asynchronous setting remains an open problem.


Table 1: Asymptotic communication complexity (bits per transaction, expected) for atomic broadcast protocols

3. The Gap Between Asynchronous and Weakly Synchronous Network Models




Almost all modern BFT protocols rely on timing assumptions (such as partial or weak synchrony) to guarantee liveness.


Purely asynchronous BFT protocols have received considerably less attention in recent years.


Consider the following argument, which, if it held, would justify this narrowed focus: [X] Weak synchrony assumptions are unavoidable, since in any network that violates these assumptions, even asynchronous protocols would provide unacceptable performance.


In this section, we make two counterarguments that refute the premise above.


First, we illustrate the theoretical separation between the asynchronous and weakly synchronous network models.


Specifically we construct an adversarial network scheduler that violates PBFT’s weak synchrony assumption (and indeed causes it to fail) but under which any purely asynchronous protocol (such as HoneyBadgerBFT) makes good progress.


Second, we make a practical observation: even when their assumptions are met, weakly synchronous protocols are slow to recover from a network partition once it heals, whereas asynchronous protocols make progress as soon as messages are delivered.


3.1 Many Forms of Timing Assumptions


Before proceeding we review the various standard forms of timing assumptions.


In an asynchronous network, the adversary can deliver messages in any order and at any time, but nonetheless must eventually deliver every message sent between correct nodes.


Nodes in an asynchronous network effectively have no use for “real time” clocks, and can only take actions based on the ordering of messages they receive.


The well-known FLP [27] result rules out the possibility of deterministic asynchronous protocols for atomic broadcast and many other tasks.


A deterministic protocol must therefore make some stronger timing assumptions.


A convenient (but very strong) network assumption is synchrony: a ∆-synchronous network guarantees that every message sent is delivered after at most a delay of ∆ (where ∆ is a measure of real time).


Weaker timing assumptions come in several forms.


In the unknown-∆ model, the protocol is unable to use the delay bound as a parameter.


Alternatively, in the eventually synchronous model, the message delay bound ∆ is only guaranteed to hold after some (unknown) instant, called the “Global Stabilization Time.”


Collectively, these two models are referred to as partial synchrony [26].


Yet another variation is weak synchrony [26], in which the delay bound is time varying, but eventually does not grow faster than a polynomial function of time [20].


In terms of feasibility, the above are equivalent — a protocol that succeeds in one setting can be systematically adapted for another.


In terms of concrete performance, however, adjusting for weak synchrony means gradually increasing the timeout parameter over time (e.g., by an “exponential back-off” policy).


As we show later, this results in delays when recovering from transient network partitions.


Protocols typically manifest these assumptions in the form of a timeout event.


For example, if parties detect that no progress has been made within a certain interval, then they take a corrective action such as electing a new leader.


Asynchronous protocols do not rely on timers, and make progress whenever messages are delivered, regardless of actual clock time.


Counting rounds in asynchronous networks.


Although the guarantee of eventual delivery is decoupled from notions of “real time,” it is nonetheless desirable to characterize the running time of asynchronous protocols.


The standard approach (e.g., as explained by Canetti and Rabin [19]) is for the adversary to assign each message a virtual round number, subject to the condition that every (r − 1)-message between correct nodes must be delivered before any (r + 1)-message is sent.


3.2 When Weak Synchrony Fails


We now proceed to describe why weakly synchronous BFT protocols can fail (or suffer from performance degradation) when network conditions are adversarial (or unpredictable).


This motivates why such protocols are unsuited for the cryptocurrency-oriented application scenarios described in Section 1.


A network scheduler that thwarts PBFT.


We use Practical Byzantine Fault Tolerance (PBFT) [20], the classic leader-based BFT protocol, as a representative example to describe how an adversarial network scheduler can cause a class of leader-based BFT protocols [4, 6, 10, 22, 33, 50] to grind to a halt.


At any given time, the designated leader is responsible for proposing the next batch of transactions.


If progress isn’t made, either because the leader is faulty or because the network has stalled, then the nodes attempt to elect a new leader.


The PBFT protocol critically relies on a weakly synchronous network for liveness.


We construct an adversarial scheduler that violates this assumption, and indeed prevents PBFT from making any progress at all, but for which HoneyBadgerBFT (and, in fact, any asynchronous protocol) performs well.


It is unsurprising that a protocol based on timing assumptions fails when those assumptions are violated; however, demonstrating an explicit attack helps motivate our asynchronous construction.


The intuition behind our scheduler is simple.


First, we assume that a single node has crashed.


Then, the network delays messages whenever a correct node is the leader, preventing progress and causing the next node in round-robin order to become the new leader.


When the crashed node is the next up to become the leader, the scheduler immediately heals the network partition and delivers messages very rapidly among the honest nodes; however, since the leader has crashed, no progress is made here either.


This attack violates the weak synchrony assumption because it must delay messages for longer and longer each cycle, since PBFT widens its timeout interval after each failed leader election.


On the other hand, it provides larger and larger periods of synchrony as well.


However, since these periods of synchrony occur at inconvenient times, PBFT is unable to make use of them.


Looking ahead, HoneyBadgerBFT, and indeed any asynchronous protocol, would be able to make progress during these opportunistic periods of synchrony.
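
The scheduler's effect can be illustrated with a toy model. The sketch below is not PBFT: it assumes only round-robin leader rotation, views that last exactly one timeout interval, and a timeout that doubles after every failed view. The function name `simulate_scheduler` and all of its parameters are hypothetical, chosen for illustration.

```python
def simulate_scheduler(n_nodes=4, crashed=0, views=8, base_timeout=1.0):
    """Toy model of the adversarial scheduler (hypothetical, not real PBFT).

    Assumptions: leaders rotate round-robin, each view lasts exactly one
    timeout interval, and the timeout doubles after every failed view.
    """
    timeout = base_timeout
    leader_commits = 0        # batches committed by the leader-based protocol
    synchronous_time = 0.0    # total time during which messages flowed promptly
    for view in range(views):
        leader = view % n_nodes
        # The scheduler delivers messages promptly only while the crashed
        # node is the designated leader.
        network_fast = (leader == crashed)
        if network_fast:
            synchronous_time += timeout
        if network_fast and leader != crashed:
            leader_commits += 1   # unreachable under this scheduler
        else:
            timeout *= 2          # failed view: widen the timeout
    return leader_commits, synchronous_time, timeout


commits, sync_time, final_timeout = simulate_scheduler()
print(commits, sync_time, final_timeout)
```

With the defaults, the leader-based model commits nothing, even though 17 time units of the run were synchronous (1 unit in the first cycle, 16 in the second). Those growing windows are what a protocol without a leader or timers could exploit.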


To confirm our analysis, we implemented this malicious scheduler as a proxy that intercepted and delayed all view change messages to the new leader, and tested it against a 1200 line Python implementation of PBFT.


The results and message logs we observed were consistent with the above analysis; our replicas became stuck in a loop requesting view changes that never succeeded.


In Appendix A we give a complete description of PBFT and explain how it behaves under this attack.


Slow recovery from network partitions.


Even if the weak synchrony assumption is eventually satisfied, protocols that rely on it may also be slow to recover from transient network partitions.


Consider the following scenario, which is simply a finite prefix of the attack described above: one node is crashed, and the network is temporarily partitioned for a duration of 2^D ∆.


Our scheduler heals the network partition precisely when it is the crashed node’s turn to become leader.


Since the timeout interval at this point is now 2^(D+1) ∆, the protocol must wait for another 2^(D+1) ∆ interval before beginning to elect a new leader, even though the network is synchronous during this interval.
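
As a worked illustration of this scenario, the sketch below tabulates the partition duration and the extra wait after healing, using the figures quoted above (a partition of 2^D ∆, healed when the timeout is 2^(D+1) ∆). The helper name `recovery_after_partition` is hypothetical.

```python
def recovery_after_partition(d, delta=1.0):
    """Return (partition_duration, extra_wait) for the scenario in the text:
    a partition lasting 2^D * delta that heals when the timeout interval
    has grown to 2^(D+1) * delta."""
    partition = (2 ** d) * delta
    extra_wait = (2 ** (d + 1)) * delta
    return partition, extra_wait


for d in range(1, 6):
    partition, extra_wait = recovery_after_partition(d)
    print(f"D={d}: partition={partition:g}, wait after healing={extra_wait:g}")
```

Under these figures the wait after healing is always twice the length of the partition itself, so a longer outage causes a proportionally longer stall even once the network is healthy again.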


The tradeoff between robustness and responsiveness. The behaviors observed above are not specific to PBFT; they are inherent to protocols that rely on timeouts to cope with crashes. Regardless of the protocol variant, a practitioner must tune their timeout policy according to some tradeoff. At one extreme (eventual synchrony), the practitioner makes a specific estimate about the network delay ∆. If the estimate is too low, then the system may make no progress at all; too high, and it does not utilize the available bandwidth. At the other extreme (weak synchrony), the practitioner avoids specifying any absolute delay, but nonetheless must choose a “gain” that affects how quickly the system tracks varying conditions. An asynchronous protocol avoids the need to tune such parameters.

4. The HoneyBadgerBFT Protocol


In this section we present HoneyBadgerBFT, the first asynchronous atomic broadcast protocol to achieve optimal asymptotic efficiency.


4.1 Problem Definition: Atomic Broadcast


We first define our network model and the atomic broadcast problem.


Our setting involves a network of N designated nodes, with distinct well-known identities (P_0 through P_{N−1}).


The nodes receive transactions as input, and their goal is to reach common agreement on an ordering of these transactions.


Our model particularly matches the deployment scenario of a “permissioned blockchain” where transactions can be submitted by arbitrary clients, but the nodes responsible for carrying out the protocol are fixed.


The atomic broadcast primitive allows us to abstract away any application-specific details, such as how transactions are to be interpreted (to prevent replay attacks, for example, an application might define a transaction to include signatures and sequence numbers).


For our purposes, transactions are simply unique strings.


In practice, clients would generate transactions and send them to all of the nodes, and consider them committed after collecting signatures from a majority of nodes.


To simplify our presentation, we do not explicitly model clients, but rather assume that transactions are chosen by the adversary and provided as input to the nodes.


Likewise, a transaction is considered committed once it is output by a node.


Our system model makes the following assumptions:


(Purely asynchronous network) We assume each pair of nodes is connected by a reliable authenticated point-to-point channel that does not drop messages. The delivery schedule is entirely determined by the adversary, but every message sent between correct nodes must eventually be delivered. We will be interested in characterizing the running time of protocols based on the number of asynchronous rounds (as described in Section 3.1). As the network may queue messages with arbitrary delay, we also assume nodes have unbounded buffers and are able to process all the messages they receive.


(Static Byzantine faults) The adversary is given complete control of up to f faulty nodes, where f is a protocol parameter.

Note that 3f + 1 ≤ N (which our protocol achieves) is the lower bound for broadcast protocols in this setting.


(Trusted setup) For ease of presentation, we assume that nodes may interact with a trusted dealer during an initial protocol-specific setup phase, which we will use to establish public keys and secret shares.

Note that in a real deployment, if an actual trusted party is unavailable, then a distributed key generation protocol could be used instead (cf. Boldyreva [11]).


All the distributed key generation protocols we know of rely on timing assumptions; fortunately these assumptions need only to hold during setup.
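
The fault bound 3f + 1 ≤ N from the static Byzantine faults assumption translates directly into a maximum tolerable f for a given network size. A minimal sketch follows; the helper names `max_faults` and `correct_quorum` are illustrative and not taken from the paper.

```python
def max_faults(n):
    """Largest f satisfying 3f + 1 <= n."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def correct_quorum(n):
    """N - f: the number of nodes that remain correct when f is maximal."""
    return n - max_faults(n)


for n in (4, 7, 10, 100):
    print(f"N={n}: f={max_faults(n)}, N-f={correct_quorum(n)}")
```

For example, a four-node network tolerates one Byzantine node, and a 100-node network tolerates 33.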

DEFINITION 1. An atomic broadcast protocol must satisfy the following properties, all of which should hold with high probability (as a function 1 − negl(λ) of a security parameter λ) in an asynchronous network and in spite of an arbitrary adversary:


• (Agreement) If any correct node outputs a transaction tx, then every correct node outputs tx.

• (Total Order) If one correct node has output the sequence of transactions ⟨tx_0, tx_1, …, tx_j⟩ and another has output ⟨tx'_0, tx'_1, …, tx'_{j'}⟩, then tx_i = tx'_i for i ≤ min(j, j').

• (Censorship Resilience) If a transaction tx is input to N − f correct nodes, then it is eventually output by every correct node.
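
The total order property can be read as a prefix check on the logs of any two correct nodes. The sketch below expresses that check on plain Python lists; the function names are hypothetical, and the check tests observed logs only (it is not part of the protocol).

```python
def agree_on_prefix(log_a, log_b):
    """Total order: the two logs must match on every index both have output."""
    k = min(len(log_a), len(log_b))
    return list(log_a[:k]) == list(log_b[:k])


def find_divergence(log_a, log_b):
    """Return the first index where the logs differ, or None if consistent."""
    for i, (a, b) in enumerate(zip(log_a, log_b)):
        if a != b:
            return i
    return None


fast_node = ["tx0", "tx1", "tx2", "tx3"]
slow_node = ["tx0", "tx1"]
print(agree_on_prefix(fast_node, slow_node))   # slow node is merely behind
print(find_divergence(fast_node, ["tx0", "txX"]))
```

A node that lags behind does not violate total order; only a differing entry at a shared index does.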




The censorship resilience property is a liveness property that prevents an adversary from blocking even a single transaction from being committed. This property has been referred to by other names, for example “fairness” by Cachin et al [15], but we prefer this more descriptive phrase.

Performance metrics. We will primarily be interested in analyzing the efficiency and transaction delay of our atomic broadcast protocol.

• (Efficiency) Assume that the input buffers of each honest node are sufficiently full (Ω(poly(N, λ))). Then efficiency is the expected communication cost for each node amortized over all committed transactions.



Since each node must output each transaction, O(1) efficiency (which our protocol achieves) is asymptotically optimal. The above definition of efficiency assumes the network is under load, reflecting our primary goal: to sustain high throughput while fully utilizing the network’s available bandwidth. Since we achieve good throughput by batching, our system uses more bandwidth per committed transaction during periods of low demand when transactions arrive infrequently. A stronger definition without this qualification would be appropriate if our goal was to minimize costs (e.g., for usage-based billing).

In practice, network links have limited capacity, and if more transactions are submitted than the network can handle, a guarantee on confirmation time cannot hold in general. Therefore we define transaction delay below relative to the number of transactions that have been input ahead of the transaction in question. A finite transaction delay implies censorship resilience.


• (Transaction delay) Suppose an adversary passes a transaction tx as input to N − f correct nodes. Let T be the “backlog”, i.e. the difference between the total number of transactions previously input to any correct node and the number of transactions that have been committed. Then transaction delay is the expected number of asynchronous rounds before tx is output by every correct node as a function of T.


4.2 Overview and Intuition


In HoneyBadgerBFT, nodes receive transactions as input and store them in their (unbounded) buffers.


The protocol proceeds in epochs, where after each epoch, a new batch of transactions is appended to the committed log.


At the beginning of each epoch, nodes choose a subset of the transactions in their buffer (by a policy we will define shortly), and provide them as input to an instance of a randomized agreement protocol.


At the end of the agreement protocol, the final set of transactions for this epoch is chosen.


At this high level, our approach is similar to existing asynchronous atomic broadcast protocols, and in particular to Cachin et al [15], the basis for a large scale transaction processing system (SINTRA).


Like ours, Cachin’s protocol is centered around an instance of the Asynchronous Common Subset (ACS) primitive.


Roughly speaking, the ACS primitive allows each node to propose a value, and guarantees that every node outputs a common vector containing the input values of at least N − 2f correct nodes.


It is trivial to build atomic broadcast from this primitive: each node simply proposes a subset of transactions from the front of its queue, and outputs the union of the elements in the agreed-upon vector.


However, there are two important challenges.


Challenge 1: Achieving censorship resilience.


The cost of ACS depends directly on the size of the transaction sets proposed by each node.


Since the output vector contains at least N − f such sets, we can therefore improve the overall efficiency by ensuring that nodes propose mostly disjoint sets of transactions, thus committing more distinct transactions in one batch for the same cost.


Therefore instead of simply choosing the first element(s) from its buffer (as in CKPS01 [15]), each node in our protocol proposes a randomly chosen sample, such that each transaction is, on average, proposed by only one node.


However, implemented naïvely, this optimization would compromise censorship resilience, since the ACS primitive allows the adversary to choose which nodes’ proposals are ultimately included.


The adversary could selectively censor a transaction by excluding whichever node(s) propose it.


We avoid this pitfall by using threshold encryption, which prevents the adversary from learning which transactions are proposed by which nodes, until after agreement is already reached.


The full protocol will be described in Section 4.3.


Challenge 2: Practical throughput.


Although the theoretical feasibility of asynchronous ACS and atomic broadcast has been known [9, 15, 17], their practical performance is not.


To the best of our knowledge, the only other work that implemented ACS was by Cachin and Poritz [17], who showed that they could attain a throughput of 0.4 tx/sec over a wide area network.


Therefore, an interesting question is whether such protocols can attain high throughput in practice.


In this paper, we show that by stitching together a carefully chosen array of sub-components, we can efficiently instantiate ACS and attain much greater throughput both asymptotically and in practice.


Notably, we improve the asymptotic cost (per node) of ACS from O(N^2) (as in Cachin et al [15, 17]) to O(1).


Since the components we cherry-pick have not been presented together before (to our knowledge), we provide a self-contained description of the whole construction in Section 4.4.


Modular protocol composition.


We are now ready to present our constructions formally.


Before doing so, we make a remark about the style of our presentation.


We define our protocols in a modular style, where each protocol may run several instances of other (sub)protocols.


The outer protocol can provide input to and receive output from the subprotocol.


A node may begin executing a (sub)protocol even before providing it input (e.g., if it receives messages from other nodes).


It is essential to isolate such (sub)protocol instances to ensure that messages pertaining to one instance cannot be replayed in another.


This is achieved in practice by associating to each (sub)protocol instance a unique string (a session identifier), tagging any messages sent or received in this (sub)protocol with this identifier, and routing messages accordingly.


We suppress these message tags in our protocol descriptions for ease of reading.


We use brackets to distinguish between tagged instances of a subprotocol.


For example, RBC[i] denotes an ith instance of the RBC subprotocol.
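As a minimal sketch of the tagging and routing described above, messages can be dispatched to per-instance inboxes keyed by a session identifier. The `Router` class, its method names, and the tuple-shaped session identifiers are hypothetical and are not taken from the prototype; the point is only that a message may be buffered for an instance before the local node has provided that instance any input.

```python
from collections import defaultdict, deque


class Router:
    """Routes tagged messages to (sub)protocol instances by session identifier."""

    def __init__(self):
        # One inbox per session identifier, created on first use, so a message
        # may arrive before the local node has provided input to that instance.
        self.inboxes = defaultdict(deque)

    def deliver(self, sid, sender, payload):
        """Called by the network layer for every incoming tagged message."""
        self.inboxes[sid].append((sender, payload))

    def drain(self, sid):
        """Return and clear all messages buffered for one instance."""
        inbox = self.inboxes[sid]
        messages = list(inbox)
        inbox.clear()
        return messages


router = Router()
# Session identifiers here are (epoch, subprotocol, index) tuples: an assumption.
router.deliver((7, "RBC", 1), sender=2, payload=("VAL", b"chunk"))
router.deliver((7, "RBC", 2), sender=2, payload=("VAL", b"other"))
router.deliver((8, "RBC", 1), sender=2, payload=("VAL", b"next epoch"))
```

Because the epoch is part of the identifier, a message recorded for RBC[1] in epoch 7 can never be consumed by RBC[1] in epoch 8.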


We implicitly assume that asynchronous communications between parties are over authenticated asynchronous channels.


In reality, such channels could be instantiated using TLS sockets, for example, as we discuss in Section 5.


To distinguish different message types sent between parties within a protocol, we use a label in typewriter font (e.g., VAL(m) indicates a message m of type VAL).


4.3 Constructing HoneyBadgerBFT from Asynchronous Common Subset


Building block: ACS.


Our main building block is a primitive called asynchronous common subset (ACS).


The theoretical feasibility of constructing ACS has been demonstrated in several works [9, 15].


In this section, we will present the formal definition of ACS and use it as a blackbox to construct HoneyBadgerBFT.


Later in Section 4.4, we will show that by combining several constructions that were somewhat overlooked in the past, we can instantiate ACS efficiently!

More formally, an ACS protocol satisfies the following properties:

• (Validity) If a correct node outputs a set v, then |v| ≥ N − f and v contains the inputs of at least N − 2f correct nodes.

• (Agreement) If a correct node outputs v, then every node outputs v.

• (Totality) If N − f correct nodes receive an input, then all correct nodes produce an output.


Building block: threshold encryption. A threshold encryption scheme TPKE is a cryptographic primitive that allows any party to encrypt a message to a master public key, such that the network nodes must work together to decrypt it. Once f + 1 correct nodes compute and reveal decryption shares for a ciphertext, the plaintext can be recovered; until at least one correct node reveals its decryption share, the attacker learns nothing about the plaintext. A threshold scheme provides the following interface:

• TPKE.Setup(1^λ) → PK, {SK_i} generates a public encryption key PK, along with secret keys for each party SK_i

• TPKE.Enc(PK, m) → C encrypts a message m

• TPKE.DecShare(SK_i, C) → σ_i produces the ith share of the decryption (or ⊥ if C is malformed)

• TPKE.Dec(PK, C, {i, σ_i}) → m combines a set of decryption shares {i, σ_i} from at least f + 1 parties to obtain the plaintext m (or, if C contains invalid shares, then the invalid shares are identified).


In our concrete instantiation, we use the threshold encryption scheme of Baek and Zheng [7]. This scheme is also robust (as required by our protocol), which means that even for an adversarially generated ciphertext C, at most one plaintext (besides ⊥) can be recovered. Note that we assume TPKE.Dec effectively identifies invalid decryption shares among the inputs. Finally, the scheme satisfies the obvious correctness properties, as well as a threshold version of the IND-CPA game.
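The (f + 1)-of-N recovery rule can be illustrated with plain Shamir secret sharing. This is a swapped-in toy and not the Baek and Zheng scheme: it shares a secret integer directly, has no ciphertexts, no robustness and no share verification, and the prime `P` and the function names are assumptions chosen only for illustration.

```python
import random

P = 2**127 - 1  # a Mersenne prime; field size chosen only for illustration


def share(secret, f, n, rng):
    """Split `secret` into n shares so that any f + 1 of them recover it."""
    # Random polynomial of degree f whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(P) for _ in range(f)]

    def poly(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P

    return [(i, poly(i)) for i in range(1, n + 1)]


def combine(shares):
    """Lagrange-interpolate the polynomial at x = 0 from f + 1 shares."""
    secret = 0
    for i, y_i in shares:
        num, den = 1, 1
        for j, _ in shares:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        secret = (secret + y_i * num * pow(den, -1, P)) % P
    return secret


f, n = 1, 4  # N = 3f + 1
shares = share(123456789, f, n, random.Random(0))
recovered = combine(shares[1:3])  # any f + 1 = 2 shares suffice
```

With f = 1, a single share is one point on a random line and so reveals nothing about the constant term, which mirrors the requirement that the plaintext stays hidden until at least one correct node reveals its decryption share.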

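Let B = Ω(λN^2 log N) be the batch size parameter.

// Step 1: Random selection and encryption
• let proposed be a random selection of ⌊B/N⌋ transactions from the first B elements of buf
• encrypt x := TPKE.Enc(PK, proposed)

// Step 2: Agreement on ciphertexts
• pass x as input to ACS[r] // see Figure 4
• receive {v_j}_{j∈S}, where S ⊂ [1..N], from ACS[r]

// Step 3: Decryption
• for each j ∈ S:
    let e_j := TPKE.DecShare(SK_i, v_j)
    multicast DEC(r, j, i, e_j)
    wait to receive at least f + 1 messages of the form DEC(r, j, k, e_{j,k})
    decode y_j := TPKE.Dec(PK, {(k, e_{j,k})})
• let block_r := sorted(∪_{j∈S} {y_j}), such that block_r is sorted in a canonical order (e.g., lexicographically)
• set buf := buf − block_r

Figure 1: The HoneyBadgerBFT protocol.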



Atomic broadcast from ACS.


We now describe in more detail our atomic broadcast protocol, defined in Figure 1.


As mentioned, this protocol is centered around an instance of ACS.


In order to obtain scalable efficiency, we choose a batching policy.


We let B be a batch size, and will commit Ω(B) transactions in each epoch.


Each node proposes B/N transactions from its queue.


To ensure that nodes propose mostly distinct transactions, we randomly select these transactions from the first B in each queue.

As we will see in Section 4.4, our ACS instantiation has a total communication cost of O(N^2|v| + λN^3 log N), where |v| bounds the size of any node’s input.

We therefore choose a batch size B = Ω(λN^2 log N) so that the contribution from each node (B/N) absorbs this additive overhead.
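To see why this batch size yields constant overhead, a short derivation may help. It assumes that each node's ACS input has size |v| = Θ(B/N), i.e., transactions of constant size, ignoring the fixed ciphertext overhead of the encrypted proposal.

```latex
\begin{align*}
\text{total cost per epoch} &= O\!\left(N^2 |v| + \lambda N^3 \log N\right)
  = O\!\left(N B + \lambda N^3 \log N\right), \\
\text{cost per node} &= O\!\left(B + \lambda N^2 \log N\right)
  = O(B) \quad \text{when } B = \Omega(\lambda N^2 \log N).
\end{align*}
```

Since Ω(B) transactions are committed in each epoch, the cost per node per committed transaction is then O(1).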


In order to prevent the adversary from influencing the outcome we use a threshold encryption scheme, as described below.


In a nutshell, each node chooses a set of transactions, and then encrypts it.


Each node then passes the encryption as input to the ACS subroutine.


The output of ACS is therefore a vector of ciphertexts.


The ciphertexts are decrypted once the ACS is complete.


This guarantees that the set of transactions is fully determined before the adversary learns the particular contents of the proposals made by each node.


This guarantees that an adversary cannot selectively prevent a transaction from being committed once it is in the front of the queue at enough correct nodes.
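The data flow of one epoch can be sketched in a few lines. This is a single-process illustration under loud assumptions: encryption and ACS are replaced by placeholders (proposals are passed in the clear, and the agreed index set `S` is simply supplied by hand), and the function names are hypothetical.

```python
import random


def propose(buf, B, N, rng):
    """Step 1: pick floor(B/N) transactions at random from the first B in buf."""
    window = buf[:B]
    return rng.sample(window, min(B // N, len(window)))


def finish_epoch(buf, agreed):
    """Step 3 (after decryption): build the block and prune the local buffer.

    `agreed` maps node index j -> decrypted proposal y_j for every j in S.
    """
    block = sorted(set().union(*agreed.values()))  # canonical (sorted) order
    committed = set(block)
    remaining = [tx for tx in buf if tx not in committed]
    return block, remaining


N, B = 4, 8
rng = random.Random(1)
bufs = {i: [f"tx{k:02d}" for k in range(12)] for i in range(N)}  # identical queues
proposals = {i: propose(bufs[i], B, N, rng) for i in range(N)}
S = [0, 1, 3]  # placeholder for the ACS output: N - f = 3 proposals included
block, bufs[0] = finish_epoch(bufs[0], {j: proposals[j] for j in S})
```

In the real protocol the proposals inside ACS are ciphertexts, so the choice of `S` is fixed before anyone can see which transactions each proposal contains.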


4.4 Instantiating ACS Efficiently





Cachin et al present a protocol we call CKPS01 that (implicitly) reduces ACS to multi-valued validated Byzantine agreement (MVBA) [15].

Roughly speaking, MVBA allows nodes to propose values satisfying a predicate, one of which is ultimately chosen.

The reduction is simple: the validation predicate says that the output must be a vector of signed inputs from at least N − f parties.

Unfortunately, the MVBA primitive becomes a bottleneck, because the only construction we know of incurs an overhead of O(N^3|v|).


We avoid this bottleneck by using an alternative instantiation of ACS that sidesteps MVBA entirely.


The instantiation we use is due to Ben-Or et al [9] and has, in our view, been somewhat overlooked.


In fact, it predates CKPS01 [15], and was initially developed for a mostly unrelated purpose (as a tool for achieving efficient asynchronous multi-party computation [9]).


This protocol is a reduction from ACS to reliable broadcast (RBC) and asynchronous binary Byzantine agreement (ABA).


Only recently do we know of efficient constructions for these subcomponents, which we explain shortly.


At a high level, the ACS protocol proceeds in two main phases.


In the first phase, each node Pi uses RBC to disseminate its proposed value to the other nodes, followed by ABA to decide on a bit vector that indicates which RBCs have successfully completed.


We now briefly explain the RBC and ABA constructions before explaining the Ben-Or protocol in more detail.


Communication-optimal reliable broadcast. An asynchronous reliable broadcast channel satisfies the following properties:

• (Agreement) If any two correct nodes deliver v and v′, then v = v′.

• (Totality) If any correct node delivers v, then all correct nodes deliver v.

• (Validity) If the sender is correct and inputs v, then all correct nodes deliver v.

While Bracha’s [13] classic reliable broadcast protocol requires O(N^2|v|) bits of total communication in order to broadcast a message of size |v|, Cachin and Tessaro [18] observed that erasure coding can reduce this cost to merely O(N|v| + λN^2 log N), even in the worst case. This is a significant improvement for large messages (i.e., when |v| ≫ λN log N), which (looking back to Section 4.3) guides our choice of batch size. The use of erasure coding here induces at most a small constant factor of overhead, equal to N/(N − 2f) < 3.
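The bound on the expansion factor can be checked in one line. The check assumes the usual resilience condition N ≥ 3f + 1, which is not restated in this passage.

```latex
\begin{align*}
N \ge 3f + 1 \;\Longrightarrow\; 2f \le \tfrac{2(N-1)}{3}
\;\Longrightarrow\; N - 2f \ge \tfrac{N+2}{3}
\;\Longrightarrow\; \frac{N}{N-2f} \le \frac{3N}{N+2} < 3 .
\end{align*}
```

For example, N = 4 and f = 1 give an expansion factor of 4/2 = 2.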


If the sender is correct, the total running time is three (asynchronous) rounds; and in any case, at most two rounds elapse between when the first correct node outputs a value and the last outputs a value.


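The reliable broadcast algorithm is shown in Figure 2.


• upon input(v) (if P_i = P_Sender):
    let {s_j}_{j∈[N]} be the blocks of an (N − 2f, N)-erasure coding scheme applied to v
    let h be a Merkle tree root computed over {s_j}
    send VAL(h, b_j, s_j) to each party P_j, where b_j is the jth Merkle tree branch
• upon receiving VAL(h, b_i, s_i) from P_Sender, multicast ECHO(h, b_i, s_i)
• upon receiving ECHO(h, b_j, s_j) from party P_j, check that b_j is a valid Merkle branch for root h and leaf s_j, and otherwise discard
• upon receiving valid ECHO(h, ·, ·) messages from N − f distinct parties:
    interpolate {s′_j} from any N − 2f leaves received
    recompute the Merkle root h′ and if h′ ≠ h then abort
    if READY(h) has not yet been sent, multicast READY(h)
• upon receiving f + 1 matching READY(h) messages, if READY has not yet been sent, multicast READY(h)
• upon receiving 2f + 1 matching READY(h) messages, wait for N − 2f ECHO messages, then decode v

Figure 2: Reliable broadcast (RBC) algorithm.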
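The Merkle branch check used by the reliable broadcast algorithm can be sketched with the standard library alone. The choice of SHA-256, the rule of duplicating the last node on odd-sized levels, and all function names are assumptions made for this illustration; the text does not specify the hash function or the tree layout.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def merkle_tree(leaves):
    """Return all tree levels, from hashed leaves up to the root level."""
    level = [_h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level = level + [level[-1]]
            levels[-1] = level
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_branch(levels, index):
    """Sibling hashes on the path from leaf `index` up to the root."""
    branch = []
    for level in levels[:-1]:
        branch.append(level[index ^ 1])
        index //= 2
    return branch


def verify_branch(root, leaf, index, branch):
    """Check that `leaf` sits at position `index` under `root`."""
    node = _h(leaf)
    for sibling in branch:
        node = _h(sibling + node) if index % 2 else _h(node + sibling)
        index //= 2
    return node == root


blocks = [b"s0", b"s1", b"s2", b"s3", b"s4"]  # stand-ins for erasure-coded blocks
levels = merkle_tree(blocks)
root = levels[-1][0]
branch_3 = merkle_branch(levels, 3)
```

A receiver holding only the root h, its block s_j and the branch b_j can run `verify_branch` without seeing any other block, which is what lets ECHO messages be validated individually.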





Binary Agreement. Binary agreement is a standard primitive that allows nodes to agree on the value of a single bit. More formally, binary agreement guarantees three properties:

• (Agreement) If any correct node outputs the bit b, then every correct node outputs b.

• (Termination) If all correct nodes receive input, then every correct node outputs a bit.

• (Validity) If any correct node outputs b, then at least one correct node received b as input.


The validity property implies unanimity: if all of the correct nodes receive the same input value b, then b must be the decided value.


On the other hand, if at any point two nodes receive different inputs, then the adversary may force the decision to either value even before the remaining nodes receive input.


We instantiate this primitive with a protocol from Mostéfaoui et al [42], which is based on a cryptographic common coin.


We defer explanation of this instantiation to the Appendix.


Its expected running time is O(1), and in fact it completes within O(k) rounds with probability 1 − 2^{−k}.


Agreeing on a subset of proposed values.


Putting the above pieces together, we use a protocol from Ben-Or et al [9] to agree on a set of values containing the entire proposals of at least N − f nodes.


At a high level, this protocol proceeds in two main phases.


In the first phase, each node P_i uses Reliable Broadcast to disseminate its proposed value to the other nodes.

In the second phase, N concurrent instances of binary Byzantine agreement are used to agree on a bit vector {b_j}_{j∈[1..N]}, where b_j = 1 indicates that P_j’s proposed value is included in the final set.


Actually, the simple description above conceals a subtle challenge, for which Ben-Or et al provide a clever solution.

A naïve attempt at an implementation of the above sketch would have each node wait for the first (N − f) broadcasts to complete, and then propose 1 for the binary agreement instances corresponding to those and 0 for all the others.


However, correct nodes might observe the broadcasts complete in a different order.


Since binary agreement only guarantees that the output is 1 if all the correct nodes unanimously propose 1, it is possible that the resulting bit vector could be empty.

To avoid this problem, nodes abstain from proposing 0 until they are certain that the final vector will have at least N − f bits set.

To provide some intuition for the flow of this protocol, we narrate several possible scenarios in Figure 3. The algorithm from Ben-Or et al [9] is given in Figure 4. The running time is O(log N) in expectation, since it must wait for all binary agreement instances to finish. When instantiated with the reliable broadcast and binary agreement constructions described above, the total communication complexity is O(N^2|v| + λN^3 log N) assuming |v| is the largest size of any node’s input.
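The O(log N) running time can be motivated with a union bound over the N binary agreement instances. The sketch below takes as its only input the tail bound stated for a single instance (completion within k rounds except with probability 2^{−k}), with the constant hidden in O(k) suppressed.

```latex
\begin{align*}
\Pr[\text{some instance is unfinished after } k \text{ rounds}]
  &\le N \cdot 2^{-k}, \\
k = \lceil \log_2 N \rceil + c
  &\;\Longrightarrow\; N \cdot 2^{-k} \le 2^{-c}.
\end{align*}
```

Summing the tail over c gives an expected completion time of at most log_2 N + O(1) rounds for the slowest instance, up to that same constant factor.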

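Figure 3: (Illustrated examples of ACS executions.) Each execution of our protocol involves running N concurrent instances of reliable broadcast (RBC), as well as N of Byzantine agreement (BA), which in turn use an expected constant number of common coins. We illustrate several possible examples of how these instances play out, from the viewpoint of Node 0. (a) In the ordinary case, Node 0 receives value V1 (Node 1’s proposed value) from the reliable broadcast at index 1. Node 0 therefore provides input “Yes” to BA1, which outputs “Yes.” (b) RBC2 takes too long to complete, and Node 0 has already received (N − f) “Yes” outputs, so it votes “No” for BA2. However, other nodes have seen RBC2 complete successfully, so BA2 results in “Yes” and Node 0 must wait for V2. (c) BA3 concludes with “No” before RBC3 completes.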


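• upon receiving input v_i, input v_i to RBC_i // see the RBC algorithm
• upon delivery of v_j from RBC_j, if input has not yet been provided to BA_j, then provide input 1 to BA_j (see Figure 11)
• upon delivery of value 1 from at least N − f instances of BA, provide input 0 to each instance of BA that has not yet been provided input
• once all instances of BA have completed, let C ⊂ [1..N] be the indexes of each BA that delivered 1. Wait for the output v_j for each RBC_j such that j ∈ C. Finally output ∪_{j∈C} v_j.

Figure 4: Common subset agreement protocol (from Ben-Or et al [9]).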
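The input rule of the common subset protocol (vote 1 when a broadcast is delivered, vote 0 only after N − f agreements have decided 1) can be sketched as local bookkeeping. The class and method names are hypothetical, indices run from 0, and the RBC and BA instances themselves are not implemented; their deliveries and decisions are fed in as events.

```python
class ACSInputs:
    """Tracks which BA instances this node has provided input to."""

    def __init__(self, n, f):
        self.n, self.f = n, f
        self.ba_input = {}   # j -> bit this node fed into BA_j
        self.ba_output = {}  # j -> bit decided by BA_j
        self.rbc_value = {}  # j -> value delivered by RBC_j

    def on_rbc_deliver(self, j, value):
        """RBC_j delivered v_j: vote 1 in BA_j unless input was already given."""
        self.rbc_value[j] = value
        self.ba_input.setdefault(j, 1)

    def on_ba_output(self, j, bit):
        """BA_j decided: after N - f ones, vote 0 in every BA without input."""
        self.ba_output[j] = bit
        if sum(self.ba_output.values()) >= self.n - self.f:
            for k in range(self.n):
                self.ba_input.setdefault(k, 0)

    def result(self):
        """Return the common subset, or None while still waiting."""
        if len(self.ba_output) < self.n:
            return None  # some BA instance has not completed
        chosen = [j for j in range(self.n) if self.ba_output[j] == 1]
        if any(j not in self.rbc_value for j in chosen):
            return None  # must wait for RBC_j whenever BA_j decided 1
        return {j: self.rbc_value[j] for j in chosen}


node = ACSInputs(n=4, f=1)
for j in range(3):
    node.on_rbc_deliver(j, f"v{j}")
for j in range(3):
    node.on_ba_output(j, 1)   # N - f = 3 ones: the node now votes 0 in BA_3
node.on_ba_output(3, 1)       # other nodes saw RBC_3 finish, so BA_3 decides 1
waiting = node.result()       # still None: v_3 has not been delivered here
node.on_rbc_deliver(3, "v3")
subset = node.result()
```

The usage lines replay the scenario in which a slow broadcast is voted against locally but still ends up in the agreed set, so the node has to wait for its value.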



4.5 Analysis



First we observe that the agreement and total order properties follow immediately from the definition of ACS and robustness of the TPKE scheme.

THEOREM 1. (Agreement and total order). The HoneyBadgerBFT protocol satisfies the agreement and total order properties, except for negligible probability.



These two properties follow immediately from properties of the high-level protocols, ACS and TPKE.


Each ACS instance guarantees that nodes agree on a vector of ciphertexts in each epoch (Step 2).


The robustness of TPKE guarantees that each correct node decrypts these ciphertexts to consistent values (Step 3).


This suffices to ensure agreement and total order.

THEOREM 2. (Complexity). Assuming a batch size of B = Ω(λN^2 log N), the running time for each HoneyBadgerBFT epoch is O(log N) in expectation, and the total expected communication complexity is O(B).


PROOF. The cost and running time of ACS is explained in Section 4.4. The N instances of threshold decryption incur one additional round and an additional cost of O(λN^2), which does not affect the overall asymptotic cost.


The HoneyBadgerBFT protocol may commit up to B transactions in a single epoch. However, the actual number may be less than this, since some correct nodes may propose overlapping transaction sets, others may respond too late, and corrupted nodes may propose an empty set. Fortunately, we prove (in the Appendix) that assuming each correct node’s queue is full, then B/4 serves as a lower bound for the expected number of transactions committed in an epoch.


THEOREM 3. (Efficiency). Assuming each correct node’s queue contains at least B distinct transactions, then the expected number of transactions committed in an epoch is at least B/4, resulting in constant efficiency.


Finally, we prove (in the Appendix) that the adversary cannot significantly delay the commit of any transaction.

THEOREM 4. (Censorship Resilience). Suppose an adversary passes a transaction tx as input to N − f correct nodes. Let T be the size of the “backlog”, i.e., the difference between the total number of transactions previously input to any correct node and the number of transactions that have been committed. Then tx is committed within O(T/B + λ) epochs except with negligible probability.




In this section we carry out several experiments and performance measurements using a prototype implementation of the HoneyBadgerBFT protocol.


Unless otherwise noted, numbers reported in this section are by default for the optimistic case where all nodes are behaving honestly.


First we demonstrate that HoneyBadgerBFT is indeed scalable by performing an experiment in a wide area network, including up to 104 nodes in five continents.


Even under these conditions, HoneyBadgerBFT can reach peak throughputs of thousands of transactions per second.


Furthermore, by a comparison with PBFT, a representative partially synchronous protocol, HoneyBadgerBFT performs only a small constant factor worse.


Finally, we demonstrate the feasibility of running asynchronous BFT over the Tor anonymous communication layer.


Implementation details.


We developed a prototype implementation of HoneyBadgerBFT in Python, using the gevent library for concurrent tasks.



For deterministic erasure coding, we use the zfec library [52], which implements Reed-Solomon codes. For instantiating the common coin primitive, we implement Boldyreva’s pairing-based threshold signature scheme [11]. For threshold encryption of transactions, we use Baek and Zheng’s scheme [7] to encrypt a 256-bit ephemeral key, followed by AES-256 in CBC mode over the actual payload.

We implement these threshold cryptography schemes using the Charm [3] Python wrappers for the PBC library [38]. For threshold signatures, we use the provided MNT224 curve, resulting in signatures (and signature shares) of only 65 bytes, and heuristically providing 112 bits of security. Our threshold encryption scheme requires a symmetric bilinear group: we therefore use the SS512 group, which heuristically provides 80 bits of security [44].


In our EC2 experiments, we use ordinary (unauthenticated) TCP sockets.


In a real deployment we would use TLS with both client and server authentication, adding insignificant overhead for long-lived sessions.


Similarly, in our Tor experiment, only one endpoint of each socket is authenticated (via the “hidden service” address).


Our theoretical model assumes nodes have unbounded buffers.


In practice, more resources could be added dynamically to a node whenever memory consumption reaches a watermark (e.g., whenever it is 75% full), though our prototype implementation does not yet include this feature.


Failure to provision an adequate buffer would count against the failure budget f .


5.1 Bandwidth Breakdown and Evaluation


We first analyze the bandwidth costs of our system.

In all experiments, we assume a constant transaction size of m_T = 250 bytes each, which would admit an ECDSA signature, two public keys, as well as an application payload (i.e., approximately the size of a typical Bitcoin transaction).

Our experiments use the parameter N = 4f, and each party proposes a batch of B/N transactions.

To model the worst case scenario, nodes begin with identical queues of size B. We record the running time as the time from the beginning of the experiment to when the (N − f)-th node outputs a value.


Bandwidth and breakdown findings.


The overall bandwidth consumed by each node consists of a fixed additive overhead as well as a transaction dependent overhead.


For all parameter values we considered, the additive overhead is dominated by an O(λN^2) term resulting from the threshold cryptography in the ABA phases and the decryption phase that follows.


The ABA phase involves each node transmitting 4N^2 signature shares in expectation.

Only the RBC phase incurs a transaction-dependent overhead, equal to the erasure coding expansion factor r = N/(N − 2f).

The RBC phase also contributes N^2 log N hashes to the overhead because of Merkle tree branches included in the ECHO messages.


The total communication cost (per node) is estimated as:




where m_E and m_D are respectively the size of a ciphertext and decryption share in the TPKE scheme, and m_S is the size of a TSIG signature share.
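The terms described in this subsection can be collected into a single per-node estimate. What follows is a reconstruction from the surrounding text rather than a quotation: the names m_all (total cost) and m_H (size of one hash) are symbols assumed here because the text does not define them, and the "1 +" accounts for the Merkle root sent alongside each branch of about log N hashes. The first term is the RBC payload (B transactions and N ciphertexts at expansion factor r); the second is the additive overhead (Merkle branches, decryption shares, and four signature shares in expectation).

```latex
\[
m_{\text{all}} \approx r\,(B\,m_T + N\,m_E)
  + N^2\bigl((1 + \log N)\,m_H + m_D + 4\,m_S\bigr)
\]
```

The first term grows with the batch size B while the second does not, which is why the transaction-dependent part dominates for large batches.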

The system’s effective throughput increases as we increase the proposed batch size B, such that the transaction-dependent portion of the cost dominates. As Figure 5 shows, for N = 128, for batch sizes up to 1024 transactions, the transaction-independent bandwidth still dominates the overall cost. However, when the batch size reaches 16384, the transaction-dependent portion begins to dominate, largely resulting from the RBC.ECHO stage where nodes transmit erasure-coded blocks.

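Figure 5: Estimated communication cost in megabytes (per node) for varying batch sizes. For small batch sizes, the fixed cost grows with O(N^2 log N). At saturation, the overhead factor approaches N/(N − 2f) < 3.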

5.2 Experiments on Amazon EC2



To see how practical our design is, we deployed our protocol on Amazon EC2 services and comprehensively tested its performance.

We ran HoneyBadgerBFT on 32, 40, 48, 56, 64, and 104 Amazon EC2 t2.medium instances uniformly distributed throughout its 8 regions spanning 5 continents. In our experiments, we varied the batch size such that each node proposed 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, or 131072 transactions.





Throughput. Throughput is defined as the number of transactions committed per unit of time. In our experiment, we use “confirmed transactions per second” as our measure unit if not specified otherwise. Figure 6 shows the relationship between throughput and total number of transactions proposed by all N parties. The fault tolerance parameter is set to be f = N/4.

Findings. From Figure 6 we can see for each setting, the throughput increases as the number of proposed transactions increases. We achieve throughput exceeding 20,000 transactions per second for medium size networks of up to 40 nodes. For a large 104 node network, we attain more than 1,500 transactions per second. Given an infinite batch size, all network sizes would eventually converge to a common upper bound, limited only by available bandwidth.

Although the total bandwidth consumed in the network increases (linearly) with each additional node, the additional nodes also contribute additional bandwidth capacity.


Throughput, latency, and scale tradeoffs. Latency is defined as the time interval between the time the first node receives a client request and when the (N − f )-th node finishes the consensus protocol. This is reasonable because the (N − f )-th node finishing the protocol implies the accomplishment of the consensus for the honest parties.
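The latency metric defined above is straightforward to compute from per-node timestamps. The function name and the sample numbers below are made up for illustration and are not measurements from the experiments.

```python
def epoch_latency(request_times, finish_times, f):
    """Latency as defined above, from per-node timestamps in seconds."""
    n = len(finish_times)
    start = min(request_times)             # first node receives the request
    end = sorted(finish_times)[n - f - 1]  # (N - f)-th node to finish
    return end - start


finish = [12.0, 9.5, 30.0, 10.0]  # node 2 is a straggler
latency = epoch_latency([1.0, 1.2, 1.1, 1.4], finish, f=1)
```

Taking the (N − f)-th finishing time means that up to f slow or faulty nodes do not inflate the measurement.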

Figure 7: Latency vs. throughput for experiments over wide area networks. Error bars indicate 95% confidence intervals.


Figure 7 shows the relationship between latency and throughput for different choices of N and f = N/4. The positive slopes indicate that our experiments have not yet fully saturated the available bandwidth, and we would attain better throughput even with larger batch sizes. Figure 7 also shows that latency increases as the number of nodes increases, largely stemming from the ABA phase of the protocol. In fact, at N = 104, for the range of batch sizes we tried, our system is CPU bound rather than bandwidth bound because our implementation is single threaded and must verify O(N^2) threshold signatures. Regardless, our largest experiment with 104 nodes completes in under 6 minutes.


Although more nodes (with equal bandwidth provisioning) could be added to the network without affecting maximum attainable throughput, the minimal bandwidth consumed to commit one batch (and therefore the latency) increases with O(N^2 log N). This constraint implies a limit on scalability, depending on the cost of bandwidth and users’ latency tolerance.



Comparison with PBFT. Figure 8 shows a comparison with the PBFT protocol, a classic BFT protocol for partially synchronous networks. We use the Python implementation from Croman et al [24], running on 8, 16, 32, and 64 nodes evenly distributed among Amazon AWS regions. Batch sizes were chosen to saturate the network’s available bandwidth.


Fundamentally, while PBFT and our protocol have the same asymptotic communication complexity in total, our protocol distributes this load evenly among the network links, whereas PBFT bottlenecks on the leader’s available bandwidth. Thus PBFT’s attainable throughput diminishes with the number of nodes, while HoneyBadgerBFT’s remains roughly constant.


Note that this experiment reflects only the optimistic case, with no faults or network interruptions. Even for small networks, HoneyBadgerBFT provides significantly better robustness under adversarial conditions as noted in Section 3. In particular, PBFT would achieve zero throughput against an adversarial asynchronous scheduler, whereas HoneyBadgerBFT would complete epochs at a regular rate.


5.3 Experiments over Tor


To demonstrate the robustness of HoneyBadgerBFT, we run the first instance (to our knowledge) of a fault tolerant consensus protocol carried out over Tor (the most successful anonymous communication network).


Tor adds significant and varying latency compared to our original AWS deployment.


Regardless, we show that we can run HoneyBadgerBFT without tuning any parameters.


Hiding HoneyBadgerBFT nodes behind the shroud of Tor may offer even better robustness.


Since it helps the nodes to conceal their IP addresses, it can help them avoid targeted network attacks and attacks involving their physical location.


Brief background on Tor.


The Tor network consists of approximately 6,500 relays, which are listed in a public directory service.


Tor enables “hidden services,” which are servers that accept connections via Tor in order to conceal their location.


When a client establishes a connection to a hidden service, both the client and the server construct 3-hop circuits to a common “rendezvous point.”


Thus each connection to a hidden service routes data through 5 randomly chosen relays.


Tor provides a means for relay nodes to advertise their capacity and utilization, and these self-reported metrics are aggregated by the Tor project.


According to these metrics, the total capacity of the network is ∼145 Gbps, and the current utilization is ∼65 Gbps.


Tor experiment setup.


We design our experiment setup such that we could run all N HoneyBadgerBFT nodes on a single desktop machine running the Tor daemon software, while being able to realistically reflect Tor relay paths.

To do this, we configured our machine to listen on N hidden services (one hidden service for each HoneyBadgerBFT node in our simulated network).


Since each HoneyBadgerBFT node forms a connection to each other node, we construct a total of N^2 Tor circuits per experiment, beginning and ending with our machine, and passing through 5 random relays.


In summary, all pairwise overlay links traverse real Tor circuits consisting of random relay nodes, designed so that the performance obtained is representative of a real HoneyBadgerBFT deployment over Tor (despite all simulated nodes running on a single host machine).


Since Tor provides a critical public service for many users, it is important to ensure that research experiments conducted on the live network do not adversely impact it.


We formed connections from only a single vantage point (and thus avoid receiving), and ran experiments of short duration (several minutes) and with small parameters (only 256 circuits formed in our largest experiment).



Figure 9 shows how latency changes with throughput. In contrast to our EC2 experiment where nodes have ample bandwidth, Tor circuits are limited by the slowest link in the circuit. We attain a maximum throughput of over 800 transactions per second over Tor.


In general, messages transmitted over Tor’s relay network tend to have significant and highly variable latency.


For instance, during our experiment on 8 parties proposing 16384 transactions per party, a single message can be delayed for 316.18 seconds and the delay variance is over 2208 while the average delay is only 12 seconds.


We stress that our protocol did not need to be tuned for such network conditions, as would a traditional eventually-synchronous protocol.




We have presented HoneyBadgerBFT, the first efficient and high-throughput asynchronous BFT protocol.


Through our implementation and experimental results we demonstrate that HoneyBadgerBFT can be a suitable component in incipient cryptocurrency-inspired deployments of fault tolerant transaction processing systems.


More generally, we believe our work demonstrates the promise of building dependable transaction processing systems based on asynchronous protocols.



A. Attacking PBFT





The PBFT protocol consists of two main workflows: a “fast path” that provides good performance in the optimistic case (when the network is synchronous and the leader functions correctly), and a “view-change” procedure to change leaders.


The fast path consists of three rounds of communication: PRE_PREPARE, PREPARE, and COMMIT.


The leader of a given view is responsible for totally ordering all requests.


Upon receiving a client request, the leader multicasts a PRE_PREPARE message specifying the request and a sequence number to all other replicas, who respond by multicasting a corresponding PREPARE message.

Replicas multicast a COMMIT message on receipt of 2f PREPARE messages (in addition to the corresponding PRE_PREPARE message), and execute the request on receipt of 2f + 1 COMMIT messages (including their own).

A replica increments its view number and multicasts a VIEW_CHANGE message to elect a new leader when a request takes too long to execute (i.e., longer than a timeout interval), when a previously initiated view change has taken too long, or when it receives f + 1 VIEW_CHANGE messages with a higher view number.


The leader of the next view is determined by the view number modulo the number of replicas (thus, leadership is transferred in a round-robin manner).

The new leader multicasts a NEW_VIEW message once it receives 2f + 1 VIEW_CHANGE messages, and includes them as proof of a valid view.


A replica accepts the NEW_VIEW message if its view number is equal to or greater than the replica's own current view number, and resumes processing messages as normal; messages with lower view numbers are ignored.


The timeout interval is initialized to a fixed value (∆), but increases by a factor of 2 with each consecutive unsuccessful leader election.
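The bookkeeping described above (quorum sizes, round-robin leaders, and the doubling timeout) can be summarized in a few lines of Python. This is only an illustrative sketch: the class and method names are hypothetical, and the fault bound f = ⌊(N − 1)/3⌋ is the usual PBFT assumption rather than something stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PbftParams:
    """Hypothetical container for the PBFT parameters described in the text."""
    n: int               # number of replicas
    base_timeout: float  # the initial timeout interval (Delta)

    @property
    def f(self) -> int:
        # Assumption: the standard bound N >= 3f + 1.
        return (self.n - 1) // 3

    def leader(self, view: int) -> int:
        # Leadership rotates round-robin: view number modulo replica count.
        return view % self.n

    def prepare_quorum(self) -> int:
        # PREPARE messages needed (besides PRE_PREPARE) before sending COMMIT.
        return 2 * self.f

    def commit_quorum(self) -> int:
        # COMMIT messages (including the replica's own) needed to execute.
        return 2 * self.f + 1

    def view_change_trigger(self) -> int:
        # VIEW_CHANGE messages with a higher view that force a replica to join.
        return self.f + 1

    def timeout(self, failed_elections: int) -> float:
        # The timeout doubles with each consecutive unsuccessful election.
        return self.base_timeout * (2 ** failed_elections)


params = PbftParams(n=4, base_timeout=1.0)
```

With four replicas, views 0 through 4 are led by replicas 0, 1, 2, 3, 0, so a faulty replica 0 regains leadership every fourth view, while the timeout that the scheduler must outlast keeps doubling.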


An intermittently synchronous network that thwarts PBFT.


The scheduler does not drop or reorder any messages, but simply delays delivering messages to whichever node is the current leader.


In particular, whenever the current leader is a faulty node, this means that messages among all honest nodes are delivered immediately.


Shortly, we provide a detailed illustration of how the PBFT protocol behaves under our attack.


To confirm our analysis, we implemented this malicious scheduler as a proxy that intercepted and delayed all view change messages to the new leader, and tested it against a 1200-line Python implementation of PBFT.


The results and message logs we observed were consistent with the above analysis; our replicas became stuck in a loop requesting view changes that never succeeded.


Since this scheduler is intermittently synchronous, any purely asynchronous protocol (including HoneyBadgerBFT) would make good progress during periods of synchrony, regardless of preceding intervals.



How PBFT behaves under attack.


In Figure 10, we illustrate our attack on PBFT.




We abbreviate client requests as “Req,” NEW_VIEW messages as “N,” VIEW_CHANGE messages as “V,” and PRE_PREPARE messages as “PP.” The subscript on a message indicates the view in which it was sent. Here, ? followed by a message indicates that this message has been broadcast to all other nodes (called replicas) by the replica specified by the column number, at the time specified by the row number multiplied by the fixed timeout interval ∆. Similarly, • followed by a message indicates that this message has been delivered to the replica specified by the column number, at the time specified by the row. As multiple VIEW_CHANGE messages for a given view are sent to each individual node, •V_n indicates the delivery of all VIEW_CHANGE messages with view number n. A red “X” appended to a delivered message indicates that the message is ignored because the view number does not match that replica’s current view. A “*” indicates that a timer has been started as a result of the delivered message. A “**” indicates that a replica’s view number has incremented as a result of the delivered message(s). A red region indicates that all broadcast operations from this replica at this time will be delayed by ∆. A pink region indicates that the receipt of all messages will be delayed by ∆.

In this example, the faulty replica 0 is initially the leader and withholds a PRE_PREPARE message for longer than the timeout period ∆. This triggers all nodes to increment their view counter and multicast a VIEW_CHANGE message for view number 1. The scheduler then delays the delivery of all VIEW_CHANGE messages to replica 1 (the leader in view 1). The view change operation for the remaining nodes times out, as they do not receive a valid NEW_VIEW message from replica 1. Nodes 0, 2, and 3 then increment their view counters to 2, and multicast another VIEW_CHANGE message. At this point, the VIEW_CHANGE messages for view 1 are delivered to replica 1, which responds by multicasting a NEW_VIEW and a PRE_PREPARE message in view 1. These messages are then delivered and subsequently ignored by all other nodes, as they have progressed to view number 2. Replica 1 then receives the VIEW_CHANGE messages for view 2, and increments its view counter accordingly. The scheduler then delays the delivery of all VIEW_CHANGE messages to replica 2, ensuring that the view change operation of all other nodes times out again. This process continues until the faulty replica 0 is again elected leader, at which point the scheduler delivers all messages at an accelerated rate while replica 0 withholds the corresponding NEW_VIEW and PRE_PREPARE messages to trigger another view change and repeat the cycle. The cycle may continue indefinitely so long as the scheduler withholds VIEW_CHANGE messages from the intended non-faulty leader for longer than the (exponentially increasing) timeout interval. This prevents any view change from succeeding and stops the protocol from making any progress, despite the fact that during the time intervals where replica 0 is the leader (0∆, 8∆, 64∆, …) all non-faulty replicas are able to communicate without any interference.


Intermittently synchronous networks.


To more clearly illustrate the difference between asynchronous networks, we introduce a new network performance assumption, ∆-intermittent synchrony, which is strictly weaker than even weak synchrony.


The idea is that a ∆-intermittently synchronous network approximates a ∆-synchronous network in the sense that on average it delivers messages at a rate of 1/∆.


However, the delivery rate may be unevenly distributed in time (e.g., “bursty”), delivering no messages at all during some time intervals and delivering messages rapidly during others.

DEFINITION 2. A network is ∆-intermittently synchronous if for any initial time $T_0$, and for any duration $D$, there exists an interval $[T_0, T_1]$ such that $T_1 - T_0 \ge D$ and the number of asynchronous rounds advanced during $[T_0, T_1]$ is at least $(T_1 - T_0)/\Delta$.
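To make the definition concrete, the following sketch checks the condition for one choice of initial time and duration against a finite trace of round completion times. The function names and the trace representation are assumptions made for illustration; the definition itself quantifies over every initial time and duration of an unbounded execution, which a finite check cannot establish.

```python
from bisect import bisect_left, bisect_right


def rounds_between(round_times, t0, t1):
    """Count asynchronous rounds that completed within [t0, t1].

    round_times must be sorted in non-decreasing order.
    """
    return bisect_right(round_times, t1) - bisect_left(round_times, t0)


def has_witness_interval(round_times, t0, d, delta):
    """Look for t1 with t1 - t0 >= d such that at least (t1 - t0) / delta
    rounds completed during [t0, t1]."""
    # The round count only increases at completion times, while the required
    # count grows continuously, so it suffices to try t1 = t0 + d and every
    # completion time after it.
    candidates = [t0 + d] + [t for t in round_times if t > t0 + d]
    return any(
        rounds_between(round_times, t0, t1) >= (t1 - t0) / delta
        for t1 in candidates
    )


steady = [1, 2, 3, 4, 5, 6]   # one round per time unit
bursty = [5, 5, 5, 5, 5]      # nothing for a while, then five rounds at once
stalled = [1, 2]              # progress stops after two rounds
```

The bursty trace passes the check for ∆ = 1 even though no round completes during the first five time units, which is exactly the uneven delivery pattern the definition is meant to allow.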


It is clearly the case that every ∆-synchronous network is also ∆-intermittently synchronous, since for every interval of duration ∆, messages sent prior to that interval are delivered by the end of that interval.


It is also clear that any intermittently synchronous network guarantees eventual delivery (i.e., it is no weaker than the asynchronous model).


Asynchronous protocols make progress whenever rounds of messages are delivered.


Since an intermittently synchronous network guarantees that messages are delivered on average within ∆, any asynchronous protocol also makes progress at an average rate of one round per ∆.




We now restate and prove the theorems originally stated in Section 4.5.


THEOREM 3. (Efficiency). Assuming each correct node’s queue contains at least $B$ distinct transactions, the expected number of transactions committed in an epoch is at least $B/4$, resulting in constant efficiency.

PROOF. First, we consider an experiment where the threshold-encrypted ciphertexts are replaced with encryptions of random plaintexts. In this case, the adversary does not learn any information about the proposed batch for each honest party. We will first show that in this experiment, the expected number of transactions committed in an epoch is at least $\frac{1}{4}B$.

Experiment 1. Each correct node selects a random subset of $B/N$ distinct transactions from buf[:B], where buf[:B] denotes the first $B$ elements in its queue. The adversary selects $N - 2f$ correct nodes; let $S$ denote the union of their proposed transactions. Recall that the ACS protocol guarantees that the agreed set contains at least the transactions proposed by $N - 2f$ correct nodes. Let $X_1$ denote the number of distinct transactions in $S$.

The contents of buf[:B] can be adversarially chosen, and clearly the worst case is when buf[:B] is identical for all honest parties, since otherwise $E[X_1]$ can only be greater.

We now consider a slightly different experiment where, instead of choosing $B/N$ distinct elements from buf[:B], each honest party chooses a set of $B/N$ elements from buf[:B] with replacement. The expected number of distinct elements in the agreed set can only be smaller in this stochastic process. Also note that we can bound $(N - 2f)(B/N) > B/3$ since $N > 3f$. Therefore, we will bound the number of distinct items in the agreed set in Experiment 1 with the following, much simpler experiment:

Experiment 2. Throw $B/3$ balls at $B$ bins. Let $X_2$ denote the number of bins with at least one ball. Clearly, $E[X_2] \le E[X_1]$.

We now bound $E[X_2]$. Since for each bin, the probability of being empty is $(1 - 1/B)^{B/3}$, the expected number of bins with at least one ball is

$$E[X_2] = B\left(1 - (1 - 1/B)^{B/3}\right) > B\left(1 - e^{-1/3}\right) > \tfrac{1}{4}B.$$
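As a sanity check on the bound for Experiment 2, the sketch below evaluates the closed-form expectation and compares it with a small Monte Carlo simulation. The function names, the choice of B = 300, the trial count, and the fixed seed are illustrative assumptions and are not taken from the paper.

```python
import random


def expected_nonempty_bins(bins, balls):
    """Closed form: each bin stays empty with probability (1 - 1/bins)**balls."""
    return bins * (1 - (1 - 1 / bins) ** balls)


def simulate_nonempty_bins(bins, balls, trials, seed=0):
    """Average number of bins hit when throwing balls uniformly at random."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0
    for _ in range(trials):
        # A set keeps only the distinct bins that received at least one ball.
        total += len({rng.randrange(bins) for _ in range(balls)})
    return total / trials


B = 300  # illustrative batch size, divisible by 3
exact = expected_nonempty_bins(B, B // 3)
estimate = simulate_nonempty_bins(B, B // 3, trials=2000)
print(f"closed form: {exact:.2f}, simulated: {estimate:.2f}, B/4: {B / 4:.2f}")
```

Both values should sit comfortably above B/4, reflecting that $1 - e^{-1/3} \approx 0.28$.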

We now claim that when the ciphertexts are instantiated with real encryptions rather than random ones, no polynomial-time adversary can cause the expected number of committed transactions in an epoch to be smaller than $B/4$. We can prove this by contradiction.

If some polynomial-time adversary $A$ can cause the expectation to be $B/4$ or smaller, then we can construct a distinguisher $D$ that can distinguish random vs. real ciphertexts by running $A$ for $\Omega(\lambda)$ many epochs. If the average number of transactions across these epochs is smaller than $\frac{1}{4}B$, $D$ guesses that the ciphertexts are real; otherwise it guesses they are random. By a standard Hoeffding bound, $D$ succeeds with probability $1 - \exp(-\Omega(\lambda))$. Note that we rely only on the semantic security (i.e., IND-CPA) of the underlying threshold encryption scheme (not on a stronger definition like IND-CCA2); this is because the adversary cannot decrypt any ciphertexts in an epoch until the ACS subprotocol completes.

THEOREM 4. (Censorship Resilience). Suppose an adversary passes a transaction tx as input to $N - f$ correct nodes. Let $T$ be the size of the “backlog”, i.e., the difference between the total number of transactions previously input to any correct node and the number of transactions that have been committed. Then tx is committed within $O(T/B + \lambda)$ epochs except with negligible probability.

At the beginning of each epoch, each correct node can be in one of two states: either (Type 1) tx appears in the front of its queue (i.e., the first $B$ elements), or else (Type 2) its queue has more than $B$ elements placed in front of tx.

The main idea is that in each epoch the adversary must include the proposals of either at least $\lceil N/6 \rceil$ Type 1 nodes (a Type 1 epoch), or at least $\lceil N/6 \rceil$ Type 2 nodes (a Type 2 epoch). In a Type 1 epoch, tx is committed with probability at least $1 - e^{-1/6}$. Clearly after $O(\lambda)$ such epochs, tx will likely have been committed. However, in a Type 2 epoch, we expect to clear at least $B(1 - e^{-1/6})$ transactions from the initial backlog.

We will therefore show that after $O(T/B + \lambda)$ Type 2 epochs, with high probability all $T$ transactions will have been committed.

LEMMA 1. After at most $O(T/B + \lambda)$ Type 2 epochs, $T$ transactions from the backlog will have been committed with high probability.

Let $\epsilon > 0$ be a constant, which we will use as a safety margin for our tail bound analysis. Let $X$ denote the total number of committed transactions after $k$ epochs as described. Using the expectation analysis from Theorem 3, the expected value of $X$ is $E[X] \ge kB/8$.

We choose the number of epochs to wait as $k = \max\left(\lambda, \frac{8T}{(1-\epsilon)B}\right)$, which ensures that $k \ge \lambda$ and that $E[X] - T \ge \epsilon E[X]$.
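The choice of $k$ can be checked numerically. The sketch below computes $k$ for a given backlog, batch size, security parameter, and margin, and verifies the margin condition using the lower bound $kB/8$ in place of $E[X]$ (if the condition holds for the lower bound, it holds for $E[X]$ as well). The function names and sample values are hypothetical, and rounding $k$ up to an integer is an added assumption.

```python
import math


def epochs_to_wait(backlog, batch_size, security_param, eps):
    """k = max(lambda, 8T / ((1 - eps) B)), rounded up to a whole epoch."""
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    return max(security_param,
               math.ceil(8 * backlog / ((1 - eps) * batch_size)))


def margin_holds(k, backlog, batch_size, eps):
    """Check E[X] - T >= eps * E[X] using the lower bound E[X] >= k B / 8."""
    lower_bound = k * batch_size / 8
    return lower_bound - backlog >= eps * lower_bound


# Sample values chosen only for illustration.
k_large_backlog = epochs_to_wait(backlog=10_000, batch_size=100,
                                 security_param=40, eps=0.5)
k_small_backlog = epochs_to_wait(backlog=10, batch_size=100,
                                 security_param=40, eps=0.5)
```

When the backlog is small, the security parameter dominates and $k = \lambda$; when the backlog is large, $k$ grows linearly in $T/B$, matching the $O(T/B + \lambda)$ bound in the lemma.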

Although the adversary may correlate its behavior from one epoch to the next, the bound on E[X] depends only on the random choices of the parties, which are independent. Therefore using Hoeffding’s inequality, we have
giving us the desired bound.








Realizing binary agreement from a common coin. Binary agreement allows nodes to agree on the value of a single bit. More formally, binary agreement guarantees three properties:

• (Agreement) If any correct node outputs the bit b, then every correct node outputs b.

• (Termination) If all correct nodes receive input, then every correct node outputs a bit.

• (Validity) If any correct node outputs b, then at least one correct node received b as input.





The validity property implies unanimity: if all of the correct nodes receive the same input value b, then b must be the decided value.

On the other hand, if at any point two nodes receive different inputs, then the adversary may force the decision to either value even before the remaining nodes receive input.

We instantiate this primitive with a protocol based on a cryptographic common coin, which essentially acts as a synchronizing gadget. The adversary only learns the value of the next coin after a majority of correct nodes have committed to a vote; if the coin matches the majority vote, then that is the decided value. The adversary can influence the majority vote each round, but only until the coin is revealed.

The Byzantine agreement algorithm from Mostéfaoui et al. [42] is shown in Figure 11. Its expected running time is $O(1)$, and in fact it completes within $O(k)$ rounds with probability $1 - 2^{-k}$. When instantiated with the common coin defined below, the total communication complexity is $O(\lambda N^2)$, since it uses a constant number of common coins.

Realizing a common coin from a threshold signature scheme. A common coin is a protocol that satisfies the following properties:

• If f + 1 parties call GetCoin(), then all parties eventually receive a common value, s.

• The value s is uniformly sampled in the range $\{0,1\}^\lambda$, and cannot be influenced by the adversary.

• Until at least one party calls GetCoin(), no information about s is revealed to the adversary.

Following Cachin et al. [16], a common coin can be realized from a unique threshold signature scheme. An (N, f)-threshold signature scheme involves distributing shares of a signing key, $sk_i$, to each of the N parties. A party holding secret key share $sk_i$ can compute a signature share on an arbitrary message m. Given f + 1 such signature shares for message m, anyone can combine the shares to produce a valid signature, which verifies under the public key pk. With fewer than f + 1 shares (i.e., unless at least one honest party deliberately computes and reveals a share), the adversary learns nothing. We rely on an additional uniqueness property, which guarantees that for a given public key pk, there exists exactly one valid signature on each message m.

The idea of Cachin et al. [16] is simply to use the threshold signature as a source of random bits, by signing a string that serves as the “name” of the coin. This naturally allows the protocol to be used to generate a sequence (or random-access table) of coins, and makes it convenient to use in modular subprotocols.


We assume that ThresholdCombine is robust, in the sense that if it is run with a set of more than f + 1 signature shares, it rejects any invalid ones. In particular, if 2f + 1 shares are provided, certainly a valid subset of f + 1 is among them. In practice, any incorrect shares detected this way can be used as evidence to incriminate a node.

Concretely, we use an efficient threshold scheme [11] based on bilinear groups and the Gap Diffie-Hellman assumption. We use TSIG to refer to this scheme. The common coin requires only one asynchronous round to complete, and the communication cost is $O(N\lambda)$ per node.
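The following toy sketch illustrates why uniqueness makes the coin well defined: any f + 1 shares on the coin's name combine to the same signature, so every party hashes the same value. It is not the TSIG scheme used in the paper. As assumptions for illustration, it replaces the pairing-based construction with Shamir-shared exponents in a tiny prime-order subgroup, omits share verification (so it has none of the robustness assumed of ThresholdCombine), and uses parameters far too small to be secure. All function names are hypothetical.

```python
import hashlib
import random

# Toy group: P = 2Q + 1 with P and Q prime. Insecure, for illustration only.
P, Q = 2039, 1019


def hash_to_group(name: bytes) -> int:
    """Map a coin name to an element of the order-Q subgroup of Z_P^*."""
    x = 1 + int.from_bytes(hashlib.sha256(name).digest(), "big") % (P - 1)
    return pow(x, 2, P)  # squares are exactly the subgroup of order Q


def deal_shares(n, f, rng):
    """Shamir-share a random key with a degree-f polynomial over Z_Q."""
    coeffs = [rng.randrange(Q) for _ in range(f + 1)]
    def poly(x):
        return sum(c * pow(x, k, Q) for k, c in enumerate(coeffs)) % Q
    return coeffs[0], {i: poly(i) for i in range(1, n + 1)}


def sign_share(sk_i: int, name: bytes) -> int:
    """Signature share: H(name) raised to the party's key share."""
    return pow(hash_to_group(name), sk_i, P)


def lagrange_at_zero(i, ids):
    """Lagrange coefficient for party i when interpolating at x = 0 (mod Q)."""
    num, den = 1, 1
    for j in ids:
        if j != i:
            num = num * j % Q
            den = den * (j - i) % Q
    return num * pow(den, -1, Q) % Q  # modular inverse needs Python 3.8+


def combine(shares: dict) -> int:
    """Interpolate in the exponent: f + 1 shares yield H(name)**sk."""
    ids = list(shares)
    sig = 1
    for i, share in shares.items():
        sig = sig * pow(share, lagrange_at_zero(i, ids), P) % P
    return sig


def coin_from_signature(sig: int) -> int:
    """Derive a single coin bit by hashing the unique signature."""
    width = (P.bit_length() + 7) // 8
    return hashlib.sha256(sig.to_bytes(width, "big")).digest()[0] & 1


master_key, key_shares = deal_shares(n=4, f=1, rng=random.Random(0))
coin_name = b"epoch-7/coin-0"  # hypothetical coin name
sig_shares = {i: sign_share(sk, coin_name) for i, sk in key_shares.items()}
sig_a = combine({i: sig_shares[i] for i in (1, 2)})
sig_b = combine({i: sig_shares[i] for i in (3, 4)})
```

Here `sig_a` and `sig_b` are computed from disjoint sets of f + 1 shares yet coincide, so `coin_from_signature` returns the same bit to every party. A real deployment would also need verifiable shares so that invalid ones can be rejected.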

