Paxos and Raft, and the Byzantine Generals Problem

小小毛毛虫~

Last modified 2022-02-16 11:03:03

Tags: zookeeper, java, distributed systems

First published 2022-02-16 09:20:30

Original link: https://www.jianshu.com/p/f1aea2d10f5b

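This article introduces two consensus algorithms for distributed systems, Paxos and Raft. Paxos was the first consensus algorithm to be proven correct; it involves the roles of proposer, acceptor, and learner, and reaches agreement through a two-phase process. Raft simplifies this by achieving consistency through leader election and log replication. In Raft, each node is a follower, a candidate, or a leader, and election plus log synchronization keep the system consistent.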

Raft - Animated demonstration of Raft

Paxos

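The Paxos problem is the problem of reaching consensus in a distributed system where nodes may fail (fault) but none are malicious (corrupt): messages may be lost, but they are never forged.

Note: the case where nodes may be corrupt is the Byzantine problem.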

Byzantine faults

This problem was posed in 1982 by Lamport, Shostak, and Pease: the problem of reaching a consensus among distributed units if some of them give misleading answers. The original problem concerns generals plotting a coup. Some generals lie about whether they will support a particular plan and what other generals told them. What percentage of liars can a decision-making algorithm tolerate and still correctly determine a consensus?

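The conclusion: to tolerate t traitors, the total number of generals must be greater than 3t, that is, at least 3t + 1. (A mathematical puzzle.)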
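
As a quick arithmetic check of the bound above, the following minimal sketch converts between the number of generals n and the number of traitors t under the condition n > 3t. The function names `max_traitors` and `min_generals` are hypothetical, chosen only for illustration.

```python
def max_traitors(n: int) -> int:
    """Largest t that satisfies n > 3t, i.e. floor((n - 1) / 3)."""
    if n < 1:
        raise ValueError("need at least one general")
    return (n - 1) // 3


def min_generals(t: int) -> int:
    """Smallest n that satisfies n > 3t, i.e. 3t + 1."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return 3 * t + 1


if __name__ == "__main__":
    for n in (3, 4, 7, 10):
        print(f"{n} generals tolerate up to {max_traitors(n)} traitor(s)")
```

Under this bound three generals cannot tolerate even one traitor, while four generals can tolerate exactly one.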

[Repost] Byzantine Faults & the Paxos Algorithm - CharyGao - 博客园 (cnblogs)

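Paxos was the first consensus algorithm to be proven correct. Its principle is based on two-phase commit, which it extends. The algorithm divides nodes into three types: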

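proposer: submits a proposal and waits for it to be approved as the final decision; this role is usually played by a client.
acceptor: votes on proposals; this role is usually played by a server. A proposal is chosen once more than half of the acceptors have voted for it.
learner: is told the outcome of a proposal and brings itself in line with it; it does not take part in voting. Both clients and servers can play this role.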

A single node can play more than one role in the protocol.

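Properties of Paxos:

One or more nodes can make proposals
The system must reach agreement on one of the proposals that were made
Agreement can be reached on at most one definite proposal
As long as more than half of the nodes are alive and can communicate with each other, the system as a whole can reach a consistent state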
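
The "more than half" condition works because any two majorities of the same cluster share at least one node, so two conflicting decisions cannot both gather a majority. The sketch below checks this by brute force for small clusters; `majority` and `majorities_always_intersect` are hypothetical helper names, not part of any Paxos library.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest number of nodes that is more than half of n."""
    return n // 2 + 1


def majorities_always_intersect(n: int) -> bool:
    """Brute-force check that every pair of majority quorums overlaps."""
    quorums = [set(q) for q in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, majority(n), n - majority(n), majorities_always_intersect(n))
```

With this definition a 5-node cluster needs 3 nodes for a majority and can therefore lose 2 nodes and still decide.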

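The two phases are prepare and commit (Lamport's papers call the second phase accept). The prepare phase settles which proposal everyone votes on; the commit phase settles the final value.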

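Put simply, after a proposer sends out a proposal it receives some replies, with two possible outcomes: either the proposal was accepted by a majority of nodes, or it was not. If it was not accepted, the proposer retries after a while.
Even when the proposer receives acceptances from a majority, it cannot treat this as final, because the acceptors themselves do not know that the proposal they just acknowledged holds a global majority.
A second round of confirmation is therefore necessary: once the proposer judges that the proposal has probably been accepted by a majority, it starts a new round to confirm it. This is the commit phase.
When the commit-phase proposal is sent out, the other nodes compare proposal values and return the largest one. If the replies the proposer receives carry no newer proposal, the value is locked in. If a reply carries a newer proposal, the proposer compares the proposals and switches to the larger one. If it does not receive enough replies, it must send the request again.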

Once a majority has accepted the same proposal value, a resolution is formed; this is the finally confirmed proposal.
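
The two phases can be made concrete with a minimal in-memory sketch of single-decree Paxos, following the textbook form of the protocol rather than any particular implementation. It assumes synchronous calls instead of a network, no message loss, and proposal numbers chosen by the caller. The names `Acceptor`, `prepare`, `accept`, and `propose` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = 0                 # highest proposal number promised so far
    accepted_n: int = 0               # number of the accepted proposal (0 = none)
    accepted_v: Optional[str] = None  # value of the accepted proposal

    def prepare(self, n: int) -> Optional[Tuple[int, Optional[str]]]:
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return (self.accepted_n, self.accepted_v)
        return None  # rejected: a higher or equal number was already promised

    def accept(self, n: int, v: str) -> bool:
        """Phase 2: accept unless a higher number has been promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run both phases once; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None  # not enough promises; the caller may retry with a larger n
    # If any acceptor already accepted a value, adopt the one with the
    # highest proposal number instead of our own value.
    _, prev_v = max(promises, key=lambda p: p[0])
    if prev_v is not None:
        value = prev_v
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


if __name__ == "__main__":
    cluster = [Acceptor() for _ in range(3)]
    print(propose(cluster, 1, "A"))
    print(propose(cluster, 2, "B"))
```

In the usage above the second proposer asks for "B" but ends up confirming "A", because a value that a majority has already accepted must be carried forward by later proposals.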

Raft is a consensus algorithm designed as a simpler, easier-to-understand alternative to Paxos.

Raft is a protocol for implementing distributed consensus.

It has three roles: leader, candidate, and follower.

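follower: every node starts in the follower state; if it hears nothing from a leader, it becomes a candidate.
candidate: requests votes from the other nodes; if it wins a majority of the votes, it becomes the leader. This process is leader election.
leader: every change to the system goes through the leader first.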
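
The role changes described above can be written as a small transition table. This is a sketch only: the event names (`election_timeout`, `won_majority`, `heard_from_leader`, `saw_higher_term`) are hypothetical labels for the conditions in the text and in the Raft design, not identifiers from a real library.

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


# (current role, event) -> next role; unlisted pairs keep the current role.
TRANSITIONS = {
    (Role.FOLLOWER, "election_timeout"): Role.CANDIDATE,    # no leader heard
    (Role.CANDIDATE, "election_timeout"): Role.CANDIDATE,   # split vote, retry
    (Role.CANDIDATE, "won_majority"): Role.LEADER,
    (Role.CANDIDATE, "heard_from_leader"): Role.FOLLOWER,
    (Role.LEADER, "saw_higher_term"): Role.FOLLOWER,
}


def next_role(role: Role, event: str) -> Role:
    """Return the role a node moves to when the given event occurs."""
    return TRANSITIONS.get((role, event), role)


if __name__ == "__main__":
    role = Role.FOLLOWER
    for event in ("election_timeout", "won_majority", "saw_higher_term"):
        role = next_role(role, event)
        print(event, "->", role.value)
```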

It consists of two basic processes:

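Leader election: each candidate starts an election after a randomized timeout; the candidate that collects the most votes (a majority) in the latest term becomes the leader.
Log synchronization: the leader's log is treated as the most up-to-date record in the system (the log records the events that have occurred), and the leader forces every follower to bring its log in line with it. (Log replication: each change request goes to the leader, and each change is added as an entry in the leader's log. The log entry is initially uncommitted, so it does not update the node's value. To commit the entry, the leader first replicates it to the follower nodes. The leader then waits until a majority of nodes have written the entry. The entry is then committed on the leader. The leader then notifies the followers that the entry is committed. The cluster has now come to consensus about the system state.)
The Raft consensus algorithm simplifies the management of log replicas by electing a leader; for example, log entries are only allowed to flow from the leader to the followers.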
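
A minimal sketch of one election round follows, assuming an in-memory cluster with no message loss. The timeout range of 150 to 300 is an illustrative choice, and the check that a candidate's log is sufficiently up to date is deliberately omitted. `Node`, `request_vote`, and `run_election` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    term: int = 0
    voted_for: Optional[int] = None

    def request_vote(self, candidate_id: int, term: int) -> bool:
        """Grant at most one vote per term."""
        if term > self.term:
            # A newer term resets the vote cast in the older term.
            self.term, self.voted_for = term, None
        if term == self.term and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False


def run_election(nodes: List[Node], rng: random.Random) -> Optional[int]:
    """The node whose randomized timeout fires first stands for election."""
    timeouts = {n.node_id: rng.uniform(150, 300) for n in nodes}
    candidate = min(nodes, key=lambda n: timeouts[n.node_id])
    new_term = candidate.term + 1
    # The candidate also asks itself, which models voting for itself.
    votes = sum(n.request_vote(candidate.node_id, new_term) for n in nodes)
    return candidate.node_id if votes > len(nodes) // 2 else None


if __name__ == "__main__":
    cluster = [Node(i) for i in range(5)]
    print("leader:", run_election(cluster, random.Random()))
```

Randomizing the timeouts makes it unlikely that several nodes become candidates at the same moment and split the vote.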

Below is an animated demonstration of Raft that makes it clear how Raft reaches consensus.

Log replication process:

(1) First client sends a change to leader

(2) The change is appended to the leader's log

(3) The change is sent to the followers on the next heartbeat

(4) The change is committed once the leader receives acknowledgements from a majority of followers, and a response is sent to the client at the same time

(5)
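
Steps (1) to (4) can be sketched as follows, under simplifying assumptions: the cluster is in memory, the leader resends its whole log on each round instead of only the missing entries, and the leader counts its own copy toward the majority. `Replica`, `append_entries`, and `replicate` are illustrative names, not a real Raft API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    log: List[str] = field(default_factory=list)
    commit_index: int = 0   # number of log entries known to be committed
    online: bool = True

    def append_entries(self, entries: List[str]) -> bool:
        """Overwrite the local log with the leader's log and acknowledge."""
        if not self.online:
            return False
        self.log = list(entries)
        return True


def replicate(leader: Replica, followers: List[Replica], change: str) -> bool:
    """Append a client change on the leader and try to commit it."""
    leader.log.append(change)                  # step 2: append to leader's log
    acks = 1                                   # the leader's own copy
    for follower in followers:                 # step 3: send to the followers
        if follower.append_entries(leader.log):
            acks += 1
    cluster_size = len(followers) + 1
    if acks > cluster_size // 2:               # step 4: majority has the entry
        leader.commit_index = len(leader.log)
        for follower in followers:
            if follower.online:
                follower.commit_index = leader.commit_index
        return True                            # respond to the client
    return False                               # entry stays uncommitted


if __name__ == "__main__":
    leader, followers = Replica(), [Replica(), Replica(online=False)]
    print(replicate(leader, followers, "SET 5"))
```

With one of two followers offline the change still commits, because two of the three nodes hold the entry.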

Raft