In Search of a Consensus Algorithm (Extended Version) 12

weixin_34294649

Original article: https://my.oschina.net/daidetian/blog/490992

10 Related work

There have been numerous publications related to consensus algorithms, many of which fall into one of the following categories:

Lamport’s original description of Paxos [15], and attempts to explain it more clearly [16, 20, 21].
Elaborations of Paxos, which fill in missing details and modify the algorithm to provide a better foundation for implementation [26, 39, 13].
Systems that implement consensus algorithms, such as Chubby [2, 4], ZooKeeper [11, 12], and Spanner [6]. The algorithms for Chubby and Spanner have not been published in detail, though both claim to be based on Paxos. ZooKeeper’s algorithm has been published in more detail, but it is quite different from Paxos.
Performance optimizations that can be applied to Paxos [18, 19, 3, 25, 1, 27].
Oki and Liskov’s Viewstamped Replication (VR), an alternative approach to consensus developed around the same time as Paxos. The original description [29] was intertwined with a protocol for distributed transactions, but the core consensus protocol has been separated in a recent update [22]. VR uses a leader-based approach with many similarities to Raft.

The greatest difference between Raft and Paxos is Raft’s strong leadership: Raft uses leader election as an essential part of the consensus protocol, and it concentrates as much functionality as possible in the leader. This approach results in a simpler algorithm that is easier to understand. For example, in Paxos, leader election is orthogonal to the basic consensus protocol: it serves only as a performance optimization and is not required for achieving consensus. However, this results in additional mechanism: Paxos includes both a two-phase protocol for basic consensus and a separate mechanism for leader election. In contrast, Raft incorporates leader election directly into the consensus algorithm and uses it as the first of the two phases of consensus. This results in less mechanism than in Paxos.
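
A minimal sketch of the voting rule behind that first phase, assuming a single in-memory server; the class `Voter`, its field names, and the helper `wins_election` are hypothetical. Following the rules in the Raft paper, a server grants at most one vote per term, and only to a candidate whose log is at least as up-to-date as its own; a candidate that gathers votes from a majority becomes leader for that term. RPC transport, election timers, and persistence are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Voter:
    # Hypothetical in-memory state for one server (names are illustrative only).
    current_term: int = 0
    voted_for: Optional[int] = None
    log_terms: list = field(default_factory=list)  # term of each entry; log index 1 is log_terms[0]

    def last_log(self):
        """Return (last_log_index, last_log_term), or (0, 0) for an empty log."""
        if not self.log_terms:
            return 0, 0
        return len(self.log_terms), self.log_terms[-1]

    def request_vote(self, term, candidate_id, last_log_index, last_log_term):
        """Handle a RequestVote request and return (current_term, vote_granted)."""
        if term < self.current_term:
            return self.current_term, False  # candidate is from a stale term
        if term > self.current_term:
            self.current_term = term  # adopt the newer term and forget any old vote
            self.voted_for = None
        my_index, my_term = self.last_log()
        # Later last term wins; with equal last terms, the longer log wins.
        up_to_date = (last_log_term, last_log_index) >= (my_term, my_index)
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return self.current_term, True
        return self.current_term, False


def wins_election(votes_granted, cluster_size):
    """A candidate (counting its own vote) needs a strict majority of the cluster."""
    return votes_granted > cluster_size // 2
```

Comparing `(last_log_term, last_log_index)` as a tuple is a compact way to express the up-to-date check, and allowing a repeat vote for the same `candidate_id` keeps retried requests harmless.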

Like Raft, VR and ZooKeeper are leader-based and therefore share many of Raft’s advantages over Paxos. However, Raft has less mechanism than VR or ZooKeeper because it minimizes the functionality in non-leaders. For example, log entries in Raft flow in only one direction: outward from the leader in AppendEntries RPCs. In VR, log entries flow in both directions (leaders can receive log entries during the election process); this results in additional mechanism and complexity. The published description of ZooKeeper also transfers log entries both to and from the leader, but the implementation is apparently more like Raft [35].
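
To illustrate the one-way flow described above, here is a sketch of a follower handling AppendEntries. It assumes an in-memory list log where log index 1 is stored at `log[0]`; `Follower` and `Entry` are hypothetical names. The follower only checks, truncates, and appends; its reply carries a term and a success flag, never log entries.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: int
    command: str


@dataclass
class Follower:
    # Hypothetical follower state; log index 1 is self.log[0].
    current_term: int = 0
    log: list = field(default_factory=list)
    commit_index: int = 0

    def append_entries(self, term, prev_log_index, prev_log_term, entries, leader_commit):
        """Handle AppendEntries and return (current_term, success).

        Entries only arrive here from the leader; the reply never carries
        entries back, so log data flows in one direction.
        """
        if term < self.current_term:
            return self.current_term, False  # request from a deposed leader
        self.current_term = term
        # Consistency check: our log must hold prev_log_index with prev_log_term.
        if prev_log_index > len(self.log):
            return self.current_term, False
        if prev_log_index > 0 and self.log[prev_log_index - 1].term != prev_log_term:
            return self.current_term, False
        for offset, entry in enumerate(entries):
            slot = prev_log_index + offset  # 0-based position of this entry
            if slot < len(self.log):
                if self.log[slot].term == entry.term:
                    continue  # already stored
                del self.log[slot:]  # conflict: drop this entry and all that follow
            self.log.append(entry)
        if leader_commit > self.commit_index:
            # Never commit past the last entry this request vouched for.
            self.commit_index = min(leader_commit, prev_log_index + len(entries))
        return self.current_term, True
```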

Raft has fewer message types than any other algorithm for consensus-based log replication that we are aware of. For example, we counted the message types VR and ZooKeeper use for basic consensus and membership changes (excluding log compaction and client interaction, as these are nearly independent of the algorithms). VR and ZooKeeper each define 10 different message types, while Raft has only 4 message types (two RPC requests and their responses). Raft’s messages are a bit more dense than the other algorithms’, but they are simpler collectively. In addition, VR and ZooKeeper are described in terms of transmitting entire logs during leader changes; additional message types will be required to optimize these mechanisms so that they are practical.
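
The four message types can be written down as plain records. The fields below follow the RPC summary in the Raft paper, rendered as snake_case Python dataclasses; the concrete types (for example `bytes` for a command) are our own assumption. A heartbeat is simply an AppendEntries request with no entries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogEntry:
    term: int        # term in which the leader created the entry
    command: bytes   # opaque state machine command (representation assumed)


@dataclass
class RequestVoteRequest:
    term: int            # candidate's term
    candidate_id: int    # candidate requesting the vote
    last_log_index: int  # index of candidate's last log entry
    last_log_term: int   # term of candidate's last log entry


@dataclass
class RequestVoteResponse:
    term: int           # receiver's current term, so the candidate can update itself
    vote_granted: bool  # True means the candidate received this vote


@dataclass
class AppendEntriesRequest:
    term: int            # leader's term
    leader_id: int       # lets followers redirect clients
    prev_log_index: int  # index of the entry immediately preceding the new ones
    prev_log_term: int   # term of that preceding entry
    entries: List[LogEntry] = field(default_factory=list)  # empty for heartbeats
    leader_commit: int = 0  # leader's commit index


@dataclass
class AppendEntriesResponse:
    term: int      # receiver's current term, so the leader can update itself
    success: bool  # True if the follower matched prev_log_index and prev_log_term


# Per the count above: two RPC requests and their two responses.
MESSAGE_TYPES = (RequestVoteRequest, RequestVoteResponse,
                 AppendEntriesRequest, AppendEntriesResponse)
```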

Raft’s strong leadership approach simplifies the algorithm, but it precludes some performance optimizations. For example, Egalitarian Paxos (EPaxos) can achieve higher performance under some conditions with a leaderless approach [27]. EPaxos exploits commutativity in state machine commands. Any server can commit a command with just one round of communication as long as other commands that are proposed concurrently commute with it. However, if commands that are proposed concurrently do not commute with each other, EPaxos requires an additional round of communication. Because any server may commit commands, EPaxos balances load well between servers and is able to achieve lower latency than Raft in WAN settings. However, it adds significant complexity to Paxos.
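
A toy model of the commutativity rule described above, not EPaxos itself. It assumes hypothetical key-value commands, treats two commands as non-commuting when they touch the same key and at least one of them writes, and only counts communication rounds. Real EPaxos also tracks dependencies among interfering commands and uses fast and slow quorums, all of which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    # Hypothetical key-value command, used only to illustrate commutativity.
    op: str   # "get" or "put"
    key: str


def commute(a: Command, b: Command) -> bool:
    """Commands commute unless they touch the same key and at least one writes."""
    if a.key != b.key:
        return True
    return a.op == "get" and b.op == "get"


def rounds_to_commit(cmd: Command, concurrent: list) -> int:
    """One round if every concurrently proposed command commutes with cmd,
    otherwise one additional round (a simplification of the rule in the text)."""
    return 1 if all(commute(cmd, other) for other in concurrent) else 2
```

The choice of "same key, at least one write" as the conflict relation is just one common way to define commutativity for a key-value store; other state machines would need their own relation.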

Several different approaches for cluster membership changes have been proposed or implemented in other work, including Lamport’s original proposal [15], VR [22], and SMART [24]. We chose the joint consensus approach for Raft because it leverages the rest of the consensus protocol, so that very little additional mechanism is required for membership changes. Lamport’s α-based approach was not an option for Raft because it assumes consensus can be reached without a leader. In comparison to VR and SMART, Raft’s reconfiguration algorithm has the advantage that membership changes can occur without limiting the processing of normal requests; in contrast, VR stops all normal processing during configuration changes, and SMART imposes an α-like limit on the number of outstanding requests. Raft’s approach also adds less mechanism than either VR or SMART.
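
Under joint consensus, the Raft paper requires agreement (for elections and for committing entries) from separate majorities of both the old and the new configuration. A minimal sketch of that quorum test, assuming configurations and acknowledgements are represented as Python sets of server IDs (a hypothetical representation; `is_majority` and `joint_quorum` are illustrative names):

```python
def is_majority(acks: set, config: set) -> bool:
    """True if acks contains a strict majority of the servers in config."""
    return len(acks & config) > len(config) // 2


def joint_quorum(acks: set, old: set, new: set) -> bool:
    """In the joint configuration C_old,new, a decision needs a majority of
    the old configuration AND, separately, a majority of the new one."""
    return is_majority(acks, old) and is_majority(acks, new)
```

Because neither configuration can decide alone while the joint configuration is in effect, the old and new members cannot reach conflicting decisions during the transition, and ordinary requests keep using the same replication path.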
