How to Implement Paxos

Latest recommended article published on 2024-05-21 09:30:00

Alibaba Cloud Technology (阿里云技术)

Article link: https://blog.csdn.net/weixin_43970890/article/details/125890370

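Paxos is a classic distributed consensus algorithm and is used as the standard example in many textbooks. Because it is so abstract, however, few people build consensus libraries on plain Paxos, and RAFT is the consensus algorithm with more industrial implementations. The RAFT paper (In Search of an Understandable Consensus Algorithm) is listed in the references below. By introducing a strong leader role, RAFT solves many of the engineering difficulties of implementing Paxos. It also introduces the log + state machine concept, which abstracts multi-node synchronization to a high level and solves many problems. I chose to go the other way and implement Paxos, mainly because: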

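Open-source Paxos implementations are few; the algorithm is a classic, its definitions are highly abstract (which suits a general-purpose library), and implementing it is a real challenge
Correctness does not depend on leader election, which suits fast switching of the writing node (leader preemption). In this implementation, with a single paxos group and 3 nodes over local loopback with in-memory storage, concurrent writes on all 3 nodes reach 16k/s, and the 10 ms leader lease optimization reaches 43k/s (measured on a 2018 13-inch MacBook Pro)
It has few implementation constraints and is highly extensible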

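The implementation draws on concepts from RAFT and on the implementation and architecture of phxpaxos. It implements the multi-paxos algorithm, with particular attention to thread safety and module abstraction. Networking, membership management, logging, snapshots, and storage are plugged in through interfaces. The algorithm is event-driven and header-only, which makes it easy to port and extend.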

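This article assumes the reader has some familiarity with the Paxos protocol. It does not spend much time on the derivation and proof of Paxos or on basic concepts, and focuses instead on engineering an implementation. Readers interested in the derivation and proof can consult the papers in the references.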

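What can you do with Paxos

Paxos is famous, but what cool things can you actually do once you have written a library for it?

Most directly, you can build a distributed system on top of Paxos that provides: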

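Strong consistency: every node holds the same data, even when writes are issued concurrently on multiple nodes
High availability: a 3-node Paxos system, for example, can tolerate the failure of any one node and keep serving requests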

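With the log + state machine built on a Paxos system, it is easy to implement a stateful, highly available service, such as a distributed KV store. Adding snapshots + membership management gives the service advanced features such as online migration and dynamically adding replicas. Tempted? Let's move on to the algorithm implementation.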
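The log + state machine idea can be sketched as follows. This is a minimal illustration, not code from zpaxos: kv_command, kv_state_machine, and replay are hypothetical names, and the log is assumed to already hold the values chosen by Paxos, indexed by instance id. The point it shows is that a deterministic state machine fed the same chosen log ends up in the same state on every node.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical command format for this sketch; not part of zpaxos.
struct kv_command {
    enum op_e { put, del } op;
    std::string key;
    std::string value; // Ignored for del.
};

// A deterministic state machine: the same commands applied in the same
// order always produce the same state.
class kv_state_machine {
public:
    // Apply the command chosen for `instance_id`. Returns false when the
    // instance is not the next expected one (a gap or a replay).
    bool apply(uint64_t instance_id, const kv_command &cmd) {
        if (instance_id != next_instance_)
            return false;
        if (cmd.op == kv_command::put)
            data_[cmd.key] = cmd.value;
        else
            data_.erase(cmd.key);
        ++next_instance_;
        return true;
    }

    const std::map<std::string, std::string> &data() const { return data_; }
    uint64_t next_instance() const { return next_instance_; }

private:
    uint64_t next_instance_ = 0; // Instance ids are assumed to start at 0.
    std::map<std::string, std::string> data_;
};

// Replay the not-yet-applied suffix of a chosen log (index == instance id).
inline void replay(const std::vector<kv_command> &log, kv_state_machine &sm) {
    for (uint64_t i = sm.next_instance(); i < log.size(); ++i)
        sm.apply(i, log[i]);
}
```

A node that has fallen behind can call replay with the log it has learned so far; only the entries past its own next_instance() are applied, so catching up is idempotent.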

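Code location

Talk is cheap, show me the code.

First, the repository link:

zpaxos GitHub repository

I habitually write foundational algorithm libraries directly as header files. This makes them easy to reference later and to port to other projects, and lets the compiler inline functions aggressively; the downside is longer compile times. To keep extra dependencies down, the published code bundles only a logging library (spdlog, likewise header-only). The unit tests are fairly simple, and interested readers are welcome to add more.

Core algorithm directory

Test code directory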

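Paxos algorithm basics

To avoid misunderstandings introduced by translation, the passages below are copied verbatim from Paxos Made Simple for reference.

Algorithm goal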

A consensus algorithm ensures that a single one among the proposed values is chosen

Only a value that has been proposed may be chosen,
Only a single value is chosen, and
A process never learns that a value has been chosen unless it actually has been.

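The goal of the most basic consensus algorithm is to have a group of peer nodes agree on a single value that all of them recognize. That value must have been proposed by one of the nodes, and once it is decided, every node can learn it.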

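Algorithm implementation

Many articles already cover the derivation and proof of the Paxos algorithm, so I will not repeat them here. The main goal of this article is to implement a Paxos library, so we focus on the code.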

Phase 1. (prepare)

A proposer selects a proposal number n and sends a prepare request with number n to a majority of acceptors.
If an acceptor receives a prepare request with number n greater than that of any prepare request to which it has already responded, then it responds to the request with a promise not to accept any more proposals numbered less than n and with the highest-numbered proposal (if any) that it has accepted.

Phase 2. (accept)

If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an accept request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals.
If an acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a prepare request having a number greater than n.
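The acceptor side of the two phases quoted above can be sketched as follows. This is a simplified illustration under stated assumptions, not the zpaxos acceptor: proposal numbers are plain integers starting at 1 (0 means "nothing promised yet"), values are strings, and the names acceptor, promise_reply, on_prepare, and on_accept are hypothetical. A real acceptor must also persist its state to stable storage before replying, which is omitted here.

```cpp
#include <cstdint>
#include <string>

// Reply to a prepare request (hypothetical type for this sketch).
struct promise_reply {
    bool promised;              // false: the prepare request was rejected
    bool has_accepted;          // true: a proposal had already been accepted
    uint64_t accepted_n;        // number of that accepted proposal
    std::string accepted_value; // value of that accepted proposal
};

class acceptor {
public:
    // Phase 1: promise only if n is greater than every prepare number
    // already responded to, and report the highest accepted proposal.
    promise_reply on_prepare(uint64_t n) {
        promise_reply reply{false, false, 0, std::string()};
        if (n <= promised_n_)
            return reply;
        promised_n_ = n;
        reply.promised = true;
        reply.has_accepted = has_accepted_;
        reply.accepted_n = accepted_n_;
        reply.accepted_value = accepted_value_;
        return reply;
    }

    // Phase 2: accept unless a prepare with a greater number was promised.
    bool on_accept(uint64_t n, const std::string &value) {
        if (n < promised_n_)
            return false;
        // Raising the promise here is a common implementation choice,
        // not something the quoted text requires.
        promised_n_ = n;
        has_accepted_ = true;
        accepted_n_ = n;
        accepted_value_ = value;
        return true;
    }

private:
    uint64_t promised_n_ = 0; // Assumption: 0 means no promise made yet.
    bool has_accepted_ = false;
    uint64_t accepted_n_ = 0;
    std::string accepted_value_;
};
```

Returning an explicit rejection instead of staying silent is an optimization choice: the protocol stays correct if rejected requests are simply ignored, but a rejection lets the proposer retry with a higher number sooner.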

The basic flow is these two rounds of voting. To implement the voting, we need to express the entities in the description as code.
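On the proposer side, the key rule in Phase 2 is how the value v is picked once a majority has answered. The sketch below illustrates that rule only; promise, is_majority, and choose_value are hypothetical names for this example, and values are assumed to be strings.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One successful prepare response, as seen by the proposer
// (hypothetical type for this sketch).
struct promise {
    bool has_accepted;          // the acceptor had accepted a proposal
    uint64_t accepted_n;        // number of that proposal
    std::string accepted_value; // value of that proposal
};

// A majority is strictly more than half of the acceptors.
inline bool is_majority(std::size_t replies, std::size_t cluster_size) {
    return replies > cluster_size / 2;
}

// Pick the value for the accept request: the value of the highest-numbered
// proposal reported in the responses, or the proposer's own value if no
// response reported a proposal.
inline std::string choose_value(const std::vector<promise> &promises,
                                const std::string &own_value) {
    const promise *best = nullptr;
    for (const promise &p : promises) {
        if (p.has_accepted &&
            (best == nullptr || p.accepted_n > best->accepted_n))
            best = &p;
    }
    return best != nullptr ? best->accepted_value : own_value;
}
```

The proposer is only free to propose its own value when no acceptor in the responding majority reports an accepted proposal; otherwise it must carry the reported value forward, which is what keeps a previously chosen value from being overwritten.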

Base class Cbase

base.h defines the entities the algorithm needs, mainly the ballot ballot_number_t, the value value_t, the acceptor state state_t, and the message passed between roles, message_t.
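A ballot that pairs a proposal id with a node id is usually compared lexicographically, so that two nodes can never produce equal ballots. The sketch below shows that ordering with a hypothetical stand-in type named ballot; the 64-bit field widths, the operators, and next_ballot are assumptions for illustration and are not taken from base.h.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for a (proposal id, node id) ballot.
struct ballot {
    uint64_t proposal_id;
    uint64_t node_id;
};

// Order by proposal id first; the node id breaks ties, so ballots
// issued by different nodes are never equal.
inline bool operator<(const ballot &a, const ballot &b) {
    return std::tie(a.proposal_id, a.node_id) <
           std::tie(b.proposal_id, b.node_id);
}

inline bool operator==(const ballot &a, const ballot &b) {
    return a.proposal_id == b.proposal_id && a.node_id == b.node_id;
}

// Build a ballot strictly greater than the highest one seen so far,
// owned by this node.
inline ballot next_ballot(const ballot &highest_seen, uint64_t self_node_id) {
    return ballot{highest_seen.proposal_id + 1, self_node_id};
}
```

With this ordering a proposer that was rejected can call next_ballot on the highest ballot it has observed and retry, knowing the new ballot outranks every ballot it knows about.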

struct ballot_number_t final {
    proposal_id_t proposal_id;
    node_id_t node_id;
};

struct value_t final {
    state_machine_id_t state_machine_id;
    utility::Cbuffer buffer;
};

struct state_t final {
    ballot_number_t promised, accepted;
    value_t value;
};

struct message_t final {
    enum type_e {
        noop = 0,
        prepare,
        prepare_promise,
        prepare_reject,
        accept,
        accept_accept,
        accept_reject,
        value_chosen,
        learn_ping,
        learn_pong,
        learn_request,
        learn_response
    } type;

    // Sender info.
    group_id_t group_id;
    instance_id_t instance_id;
    node_id_t node_id;

    /**
     * The following fields may be optional.
     */

    // As sequence number for reply.
    proposal_id_t proposal_id;

    ballot_number_t ballot;
    value_t value;

    // For learner data transmit.
    bool overload; // Used in ping & pong. This should be considered when sending a learn request.
    instance_id_t min_stored_instance_id; // Used in ping and pong.
    std::vector<learn_t> learn_batch;
    std::vector<Csnapshot::sha
