Redis (Part 9): Design and Implementation of Master-Slave Replication

Java后端架构V

Last modified: 2022-03-10 21:18:13

First published: 2021-01-03 22:55:37

Article link: https://blog.csdn.net/pisa8559/article/details/112161491

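This article discusses why Redis master-slave replication matters (data redundancy, failure recovery, load balancing, and read/write splitting) and introduces its basic operation and implementation principles. Replication involves configuring the master and the slaves, choosing a synchronization mode, and an asynchronous replication mechanism. The article walks through the synchronization flow, including the slaveof command, connection establishment, data synchronization, and error handling, and discusses how full and partial replication are implemented. It closes with how data is kept in sync continuously and what to watch for during synchronization.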

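The previous articles in this series covered how Redis implements its basic features.

Basic features alone, however, rarely make a project excellent. Real deployments have to cope with complex environments, high concurrency, and potentially large data volumes.

In short, a system today generally needs to support distributed deployment and have no single point of failure before it can be considered adequate.

Redis is a storage system, so a single point of failure is clearly unacceptable.

At a minimum it has to support read/write splitting, because most workloads are dominated by queries. Read/write splitting requires synchronizing the master's data to the slave nodes, so that the slaves can accept read requests and relieve the read pressure on the master.

Let's look at how Redis synchronizes data between master and slave. Master-slave synchronization is also known as data replication.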

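0. What master-slave replication is for

Data redundancy: replication provides a hot backup of the data, a form of redundancy in addition to persistence.

Failure recovery: when the master fails, a slave can take over serving requests, which allows fast recovery. In effect this is service redundancy.

Load balancing: combined with read/write splitting, the master serves writes and the slaves serve reads (the application connects to the master to write Redis data and to a slave to read it), which spreads the load across servers. In read-heavy, write-light workloads in particular, spreading reads over several slaves can greatly increase the concurrency the Redis deployment can handle.

Read/write splitting: the master handles writes and the slaves handle reads. Besides raising the load the deployment can carry, this lets the number of slaves be adjusted as demand changes.

Foundation for high availability: beyond the above, replication is what Sentinel and Cluster are built on, which is why it is regarded as the basis of Redis high availability.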
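
As a rough illustration of read/write splitting on the application side, the following C sketch sends every write to the master and rotates reads across the slaves. The rw_router type, the function names, and the addresses are hypothetical and not part of Redis or of any client library. A real client would also have to handle failed nodes and the fact that slaves can lag slightly behind the master.

```c
#include <stddef.h>

/* Hypothetical description of one deployment: one master, several slaves. */
typedef struct {
    const char *master;    /* address that receives every write */
    const char **slaves;   /* addresses that may serve reads */
    size_t slave_count;
    size_t next_slave;     /* round-robin cursor into slaves[] */
} rw_router;

/* Writes always go to the master. */
const char *route_write(const rw_router *r) {
    return r->master;
}

/* Reads rotate over the slaves; with no slave configured, fall back to
 * the master so that reads still succeed. */
const char *route_read(rw_router *r) {
    if (r->slave_count == 0) return r->master;
    const char *target = r->slaves[r->next_slave];
    r->next_slave = (r->next_slave + 1) % r->slave_count;
    return target;
}
```

Calling route_read repeatedly on a router with two slaves alternates between them, while route_write keeps returning the master.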

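1. Overview of Redis master-slave replication

In master-slave replication, instances fall into two roles: the master, and slaves that synchronize the master's data. The master accepts both reads and writes, and when a write changes data the change is propagated to the slaves automatically. A slave is normally read-only (it can be made writable through the slave-read-only parameter) and receives data from its master. A master can have many slaves, but a slave can have only one master. This yields two topologies: one master with multiple slaves (Figure 1), and chained replication, in which a slave in turn acts as the master of another slave, i.e. master -> slave -> slave (Figure 2).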
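
The two topologies can be modeled with a small tree structure. The sketch below is only an illustration: the node type, MAX_SLAVES, attach, and depth are made-up names, and no cycle detection is attempted. It enforces the one rule stated above, that a slave has exactly one master, by detaching a node from its old master before attaching it to a new one.

```c
#include <stddef.h>

#define MAX_SLAVES 8   /* arbitrary bound for this sketch, not a Redis limit */

typedef struct node {
    const char *name;
    struct node *master;              /* at most one master; NULL for a top-level master */
    struct node *slaves[MAX_SLAVES];  /* any number of slaves (bounded here) */
    size_t slave_count;
} node;

/* Remove `slave` from its current master's slave list, if it has one. */
static void detach(node *slave) {
    node *m = slave->master;
    if (m == NULL) return;
    for (size_t i = 0; i < m->slave_count; i++) {
        if (m->slaves[i] == slave) {
            m->slaves[i] = m->slaves[m->slave_count - 1];
            m->slave_count--;
            break;
        }
    }
    slave->master = NULL;
}

/* Make `slave` replicate from `master`. Returns 0 on success, or -1 if the
 * master's slave list is full (in which case nothing is changed). */
int attach(node *master, node *slave) {
    if (master->slave_count == MAX_SLAVES) return -1;
    detach(slave);
    master->slaves[master->slave_count++] = slave;
    slave->master = master;
    return 0;
}

/* Number of replication hops between a node and the top-level master. */
int depth(const node *n) {
    int d = 0;
    while (n->master) { n = n->master; d++; }
    return d;
}
```

With m as the top-level master, attaching a and b to m gives the one-master-many-slaves layout, and attaching c to a gives a chain in which depth reports 2 for c.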

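2. Setting up Redis master-slave replication in brief

1. First, you need at least two Redis server instances, either several instances on one machine or instances on several machines.

2. Configure the relationship on the slave with slaveof master_host master_port. (Running config rewrite afterwards writes it to the configuration file, so it does not have to be re-entered after every restart.)

3. Verify the setup with info replication.

The steps above apply at runtime. Alternatively, put the master/slave settings in redis.conf so they are loaded at startup.

When the master requires a password, set masterauth on the slave so that it can authenticate:

    masterauth 123456               # in the slave's redis.conf
    config set masterauth 123456    # or at runtime from the command line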

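3. How master-slave synchronization is implemented

The replication flow is roughly:

    1. slaveof starts the process by recording the master's address on the slave;
    2. the slave then establishes a connection to the master;
    3. the master decides whether to perform a full or a partial synchronization;
    4. the master prepares the data;
    5. the master sends the data to be synchronized to the slave;
    6. the slave applies the data it receives.

That is only the outline; the details deserve a closer look.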
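
Step 3 is where the master chooses between a full and a partial resynchronization. The sketch below shows the general shape of that decision, assuming the slave presents the run ID and replication offset it last saw: a partial resync is only possible when the run ID matches this master and the offset still lies inside the replication backlog. The master_view struct and the choose_sync function are hypothetical simplifications for illustration, not the Redis source.

```c
#include <string.h>

typedef enum { SYNC_FULL, SYNC_PARTIAL } sync_mode;

/* Hypothetical, simplified view of the master's replication state. */
typedef struct {
    const char *run_id;       /* identity of this master instance */
    int has_backlog;          /* 0 if no backlog buffer exists */
    long long backlog_start;  /* replication offset of the first backlog byte */
    long long backlog_len;    /* number of bytes currently held in the backlog */
} master_view;

/* Decide how to serve a slave that reports (slave_run_id, slave_offset).
 * Any doubt results in a full resynchronization. */
sync_mode choose_sync(const master_view *m,
                      const char *slave_run_id, long long slave_offset) {
    /* The slave was following a different master (or none at all). */
    if (slave_run_id == NULL || strcmp(slave_run_id, m->run_id) != 0)
        return SYNC_FULL;
    /* No history to replay from. */
    if (!m->has_backlog)
        return SYNC_FULL;
    /* The requested offset has already been overwritten, or is ahead of us. */
    if (slave_offset < m->backlog_start ||
        slave_offset > m->backlog_start + m->backlog_len)
        return SYNC_FULL;
    return SYNC_PARTIAL;
}
```

For a master whose backlog covers offsets 100 to 150, a slave asking for offset 120 under the same run ID gets a partial resync, while offset 99 or a different run ID falls back to a full one.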

3.1. The slaveof command: source walkthrough

slaveof is the entry point for enabling replication. Its command-table entry is:

{"slaveof",slaveofCommand,3,"ast",0,NULL,0,0,0,0,0},

// Usage: slaveof <master_host> <master_port>   establish the master-slave relationship
// slaveof no one   stop replicating and become a master
// replication.c
void slaveofCommand(client *c) {
    /* SLAVEOF is not allowed in cluster mode as replication is automatically
     * configured using the current address of the master node. */
    if (server.cluster_enabled) {
        addReplyError(c,"SLAVEOF not allowed in cluster mode.");
        return;
    }

    /* The special host/port combination "NO" "ONE" turns the instance
     * into a master. Otherwise the new master address is set. */
    // slaveof no one: stop replication
    if (!strcasecmp(c->argv[1]->ptr,"no") &&
        !strcasecmp(c->argv[2]->ptr,"one")) {
        if (server.masterhost) {
            // Detach from the current master and log the requesting client's info
            replicationUnsetMaster();
            sds client = catClientInfoString(sdsempty(),c);
            serverLog(LL_NOTICE,"MASTER MODE enabled (user request from '%s')",
                client);
            sdsfree(client);
        }
    } else {
        long port;

        if ((getLongFromObjectOrReply(c, c->argv[2], &port, NULL) != C_OK))
            return;

        /* Check if we are already attached to the specified master */
        // Already replicating from this exact host:port, so there is nothing to do
        if (server.masterhost && !strcasecmp(server.masterhost,c->argv[1]->ptr)
            && server.masterport == port) {
            serverLog(LL_NOTICE,"SLAVE OF would result into synchronization with the master we are already connected with. No operation performed.");
            addReplySds(c,sdsnew("+OK Already connected to specified master\r\n"));
            return;
        }
        /* There was no previous master or the user specified a different one,
         * we can continue. */
        // Record the new master
        replicationSetMaster(c->argv[1]->ptr, port);
        // Log the requesting client's info
        sds client = catClientInfoString(sdsempty(),c);
        serverLog(LL_NOTICE,"SLAVE OF %s:%d enabled (user request from '%s')",
            server.masterhost, server.masterport, client);
        sdsfree(client);
    }
    addReply(c,shared.ok);
}
// Attach to a new master
/* Set replication to the specified master address and port. */
void replicationSetMaster(char *ip, int port) {
    sdsfree(server.masterhost);
    server.masterhost = sdsnew(ip);
    server.masterport = port;
    if (server.master) freeClient(server.master);
    // Disconnect blocked clients: blocking operations are not safe once this instance is a slave
    disconnectAllBlockedClients(); /* Clients blocked in master, now slave. */
    // Disconnect all of our own slaves
    disconnectSlaves(); /* Force our slaves to resync with us as well. */
    // Discard the cached master
    replicationDiscardCachedMaster(); /* Don't try a PSYNC. */
    // Free the backlog so that chained slaves cannot partially resync
    freeReplicationBacklog(); /* Don't allow our chained slaves to PSYNC. */
    // Abort any replication handshake that is still in progress
    cancelReplicationHandshake();
    server.repl_state = REPL_STATE_CONNECT;
    server.master_repl_offset = 0;
    server.repl_down_since = 0;
}
// Detach from the master
/* Cancel replication, setting the instance as a master itself. */
void replicationUnsetMaster(void) {
    if (server.masterhost == NULL) return; /* Nothing to do. */
    sdsfree(server.masterhost);
    server.masterhost = NULL;
    if (server.master) {
        if (listLength(server.slaves) == 0) {
            /* If this instance is turned into a master and there are no
             * slaves, it inherits the replication offset from the master.
             * Under certain conditions this makes replicas comparable by
             * replication offset to understand what is the most updated. */
            server.master_repl_offset = server.master->reploff;
            freeReplicationBacklog();
        }
        freeClient(server.master);
    }
    replicationDiscardCachedMaster();
    cancelReplicationHandshake();
    server.repl_state = REPL_STATE_NONE;
}

// blocked.c, unblock and disconnect all blocked clients
/* Mass-unblock clients because something changed in the instance that makes
 * blocking no longer safe. For example clients blocked in list operations
 * in an instance which turns from master to slave is unsafe, so this function
 * is called when a master turns into a slave.
 *
 * The semantics is to send an -UNBLOCKED error to the client, disconnecting
 * it at the same time. */
void disconnectAllBlockedClients(void) {
    listNode *ln;
    listIter li;

    listRewind(server.clients,&li);
    while((ln = listNext(&li))) {
        client *c = listNodeValue(ln);

        if (c->flags & CLIENT_BLOCKED) {
            addReplySds(c,sdsnew(
                "-UNBLOCKED force unblock from blocking operation, "
                "instance state changed (master -> slave?)\r\n"));
            unblockClient(c);
            c->flags |= CLIENT_CLOSE_AFTER_REPLY;
        }
    }
}
// networking.c, disconnect all slave connections
/* Close all the slaves connections. This is useful in chained replication
 * when we resync with our own master and want to force all our slaves to
 * resync with us as well. */
void disconnectSlaves(void) {
    while (listLength(server.slaves)) {
        listNode *ln = listFirst(server.slaves);
        freeClient((client*)ln->value);
    }
}
// replication.c
/* Free a cached master, called when there are no longer the conditions for
 * a partial resync on reconnection. */
void replicationDiscardCachedMaster(void) {
    if (server.cached_master == NULL) return;

    serverLog(LL_NOTICE,"Discarding previously cached master state.");
    server.cached_master->flags &= ~CLIENT_MASTER;
    freeClient(server.cached_master);
    server.cached_master = NULL;
}
// replication.c
void freeReplicationBacklog(void) {
    serverAssert(listLength(server.slaves) == 0);
    zfree(server.repl_backlog);
    server.repl_backlog = NULL;
}
// replication.c
/* This function aborts a non blocking replication attempt if there is one
 * in progress, by canceling the non-blocking connect attempt or
 * the initial bulk transfer.
 *
 * If there was a replication handshake in progress 1 is returned and
 * the replication state (server.repl_state) set to REPL_STATE_CONNECT.
 *
 * Otherwise zero is returned and no operation is performed at all. */
int cancelReplicationHandshake(void) {
    if (server.repl_state == REPL_STATE_TRANSFER) {
        replicationAbortSyncTransfer();
        server.repl_state = REPL_STATE_CONNECT;
    } else if (server.repl_state == REPL_STATE_CONNECTING ||
               slaveIsInHandshakeState())
    {
        undoConnectWithMaster();
        server.repl_state = REPL_STATE_CONNECT;
    } else {
        return 0;
    }
    return 1;
}

// networking.c
/* Concatenate a string representing the state of a client in an human
 * readable format, into the sds string 's'. */
sds catClientInfoString(sds s, client *client) {
    char flags[16], events[3], *p;
    int emask;

    p = flags;
    if (client->flags & CLIENT_SLAVE) {
        if (client->flags & CLIENT_MONITOR)
            *p++ = 'O';
        else
            *p++ = 'S';
    }
    if (client->flags & CLIENT_MASTER) *p++ = 'M';
    if (client->flags & CLIENT_MULTI) *p++ = 'x';
    if (client->flags & CLIENT_BLOCKED) *p++ = 'b';
    if (client->flags & CLIENT_DIRTY_CAS) *p++ = 'd';
    if (client->flags & CLIENT_CLOSE_AFTER_REPLY) *p++ = 'c';
    if (client->flags & CLIENT_UNBLOCKED) *p++ = 'u';
    if (client->flags & CLIENT_CLOSE_ASAP) *p++ = 'A';
    if (client->flags & CLIENT_UNIX_SOCKET) *p++ = 'U';
    if (client->flags & CLIENT_READONLY) *p++ = 'r';
    if (p == flags) *p++ = 'N';
    *p++ = '\0';

    emask = client->fd == -1 ? 0 : aeGetFileEvents(server.el,client->fd);
    p = events;
    if (emask & AE_READABLE) *p++ = 'r';
    if (emask & AE_WRITABLE) *p++ = 'w';
    *p = '\0';
    // Variadic signature: sds sdscatfmt(sds s, char const *fmt, ...)
    return sdscatfmt(s,
        "id=%U addr=%s fd=%i name=%s age=%I idle=%I flags=%s db=%i sub=%i psub=%i multi=%i qbuf=%U qbuf-free=%U obl=%U oll=%U omem=%U events=%s cmd=%s",
        (unsigned long long) client->id,
        getClientPeerId(client),
        client->fd,
        client->name ? (char*)client->name->ptr : "",
        (long long)(server.unixtime - client->ctime),
        (long long)(server.unixtime - client->lastinteraction),
        flags,
        client->db->id,
        (int) dictSize(client->pubsub_channels),
        (int) listLength(client->pubsub_patterns),
        (client->flags & CLIENT_MULTI) ? client->mstate.count : -1,
        (unsigned long long) sdslen(client->querybuf),
        (unsigned long long) sdsavail(client->querybuf),
        (unsigned long long) client->bufpos,
        (unsigned long long) listLength(client->reply),
        (unsigned long long) getClientOutputBufferMemoryUsage(client),
        events,
        client->lastcmd ? client->lastcmd->name : "NULL");
}

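So slaveof only performs some simple validation, records the master's address, and returns. Who, then, does the actual synchronization?

The synchronization work is driven by a cron task.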
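
The split between the command handler and the cron task can be reduced to a small pattern: the handler records what should happen, and a periodic tick does the work. The following sketch uses hypothetical names (replica, handle_slaveof, replication_tick) and a cut-down state enum. It performs no network I/O and only shows how the state set by the command is picked up on the next tick.

```c
#include <stdio.h>

typedef enum {
    STATE_NONE,       /* not a slave */
    STATE_CONNECT,    /* master recorded, connection not started yet */
    STATE_CONNECTING  /* connection attempt in progress */
} repl_state;

typedef struct {
    char master_host[64];
    int master_port;
    repl_state state;
    int connect_attempts;  /* how many times the tick started a connection */
} replica;

/* Command handler: records the target and flips the state, nothing more. */
void handle_slaveof(replica *r, const char *host, int port) {
    snprintf(r->master_host, sizeof r->master_host, "%s", host);
    r->master_port = port;
    r->state = STATE_CONNECT;
}

/* Periodic tick standing in for the cron job. Returns 1 if it started a
 * connection attempt on this call, 0 if there was nothing to do. */
int replication_tick(replica *r) {
    if (r->state != STATE_CONNECT) return 0;
    r->connect_attempts++;   /* a real server would begin a non-blocking connect here */
    r->state = STATE_CONNECTING;
    return 1;
}
```

The handler returns immediately, so the client gets its reply without waiting for any network round trip; the connection is only started the next time the tick runs.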

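3.2. How is the synchronization task executed?

Replication is relatively expensive. If it ran inline with the processing of client commands, it could hurt throughput. Redis therefore drives replication from an asynchronous cron task.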
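
The cron function serverCron runs server.hz times per second, and slower jobs such as replicationCron are gated by the run_with_period macro that appears in the code below. The function here is a sketch of how such gating can work, based on counting loop iterations. The names due_this_loop and count_runs are illustrative, and the exact macro in the Redis source may differ in detail.

```c
/* Returns nonzero when a job with the given period should run on this
 * iteration of a cron loop that fires `hz` times per second. */
int due_this_loop(int period_ms, int hz, long long cronloops) {
    int tick_ms = 1000 / hz;              /* time between two cron iterations */
    if (period_ms <= tick_ms) return 1;   /* job is at least as frequent as the loop */
    return cronloops % (period_ms / tick_ms) == 0;
}

/* Count how often such a job runs during the first `loops` iterations. */
int count_runs(int period_ms, int hz, long long loops) {
    int runs = 0;
    for (long long i = 0; i < loops; i++)
        if (due_this_loop(period_ms, hz, i)) runs++;
    return runs;
}
```

With hz set to 10 the loop fires every 100 ms, so a job with a 1000 ms period runs on every tenth iteration, which works out to once per second.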

// server.c, initialize the server and register the cron
void initServer(void) {
    ...
    /* Create our timers, that's our main way to process background
     * operations. */
    // Register serverCron as a time event on the event loop so it runs periodically
    if (aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL) == AE_ERR) {
        serverPanic("Can't create event loop timers.");
        exit(1);
    }
    ...
}

// ae.c, add a time event
long long aeCreateTimeEvent(aeEventLoop *eventLoop, long long milliseconds,
        aeTimeProc *proc, void *clientData,
        aeEventFinalizerProc *finalizerProc)
{
    long long id = eventLoop->timeEventNextId++;
    aeTimeEvent *te;

    te = zmalloc(sizeof(*te));
    if (te == NULL) return AE_ERR;
    te->id = id;
    aeAddMillisecondsToNow(milliseconds,&te->when_sec,&te->when_ms);
    te->timeProc = proc;
    te->finalizerProc = finalizerProc;
    te->clientData = clientData;
    te->next = eventLoop->timeEventHead;
    eventLoop->timeEventHead = te;
    return id;
}
    
// server.c, main cron entry point; runs server.hz times per second (the return value is the delay in ms until the next run)
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    ...
    /* Replication cron function -- used to reconnect to master and
     * to detect transfer failures. */
    // Replication: (re)connect to the master; this is the entry point we care about
    run_with_period(1000) replicationCron();
    ...
    server.cronloops++;
    return 1000/server.hz;
}

// Key entry point: replicationCron()
// replication.c, replication cron job
/* Replication cron function, called 1 time per second. */
void replicationCron(void) {
    static long long replication_cron_loops = 0;

    /* Non blocking connection timeout? */
    // Connect/handshake timeout: abort the attempt (the state goes back to CONNECT, so it is retried)
    if (server.masterhost &&
        (server.repl_state == REPL_STATE_CONNECTING ||
         slaveIsInHandshakeState()) &&
         (time(NULL)-server.repl_transfer_lastio) > server.repl_timeout)
    {
        serverLog(LL_WARNING,"Timeout connecting to the MASTER...");
        cancelReplicationHandshake();
    }

    /* Bulk transfer I/O timeout? */
    // Bulk transfer timeout: abort the transfer (the state again goes back to CONNECT)
    if (server.masterhost && server.repl_state == REPL_STATE_TRANSFER &&
        (time(NULL)-server.repl_transfer_lastio) > server.repl_timeout)
    {
        serverLog(LL_WARNING,"Timeout receiving bulk data from MASTER... If the problem persists try to set the 'repl-timeout' parameter in redis.conf to a larger value.");
        cancelReplicationHandshake();
    }

    /* Timed out master when we are an already connected slave? */
    // Connected slave: nothing received from the master within repl_timeout, so drop the link
    if (server.masterhost && server.repl_state == REPL_STATE_CONNECTED &&
        (time(NULL)-server.master->lastinteraction) > server.repl_timeout)
    {
        serverLog(LL_WARNING,"MASTER timeout: no data nor PING received...");
        freeClient(server.master);
    }

    /* Check if we should connect to a MASTER */
    // 3.2.1. Right after slaveof sets a master the state is CONNECT, so a connection is attempted here
    if (server.repl_state == REPL_STATE_CONNECT) {
        serverLog(LL_NOTICE,"Connecting to MASTER %s:%d",
            server.masterhost, server.masterport);
        if (connectWithMaster() == C_OK) {
            serverLog(LL_NOTICE,"MASTER <-> SLAVE sync started");
        }
    }

    /* Send ACK to master from time to time.
     * Note that we do not send periodic acks to masters that don't
     * support PSYNC and replication offsets. */
    // 3.2.2. Send an ACK to the master on every run (only if the master supports PSYNC)
    if (server.masterhost && server.master &&
        !(server.master->flags & CLIENT_PRE_PSYNC))
        replicationSendAck();

    /* If we have attached slaves, PING them from time to time.
     * So slaves can implement an explicit timeout to masters, and will
     * be able to detect a link disconnection even if the TCP connection
     * will not actually go down. */
    listIter li;
    listNode *ln;
    robj *ping_argv[1];

    /* First, send PING according to ping_slave_period. */
    // 3.2.3. Send PING to the slaves
    // repl_ping_slave_period defaults to 10
    if ((replication_cron_loops % server.repl_ping_slave_period) == 0) {
        ping_argv[0] = createStringObject("PING",4);
        replicationFeedSlaves(server.slaves, server.slaveseldb,
            ping_argv, 1);
        decrRefCount(ping_argv[0]);
    }

    /* Second, send a newline to all the slaves in pre-synchronization
     * stage, that is, slaves waiting for the master to create the RDB file.
     * The newline will be ignored by the slave but will refresh the
     * last-io timer preventing a timeout. In this case we ignore the
     * ping period and refresh the connection once per second since certain
     * timeouts are set at a few seconds (example: PSYNC response). */
    // 3.2.4. Send a newline to slaves of this node that are still waiting for the RDB file
    listRewind(server.slaves,&li);
    while((ln = listNext(&li))) {
        client *slave = ln->value;

        if (slave->replstate == SLAVE_STATE_WAIT_BGSAVE_START ||
            (slave->replstate == SLAVE_STATE_WAIT_BGSAVE_END &&
             server.rdb_child_type != RDB_CHILD_TYPE_SOCKET))
        {
            if (write(slave->fd, "\n", 1) == -1) {
                /* Don't worry, it's just a ping. */
            }
        }
    }

    /* Disconnect timedout slaves. */
    // Disconnect slaves that have timed out
    if (listLength(server.slaves)) {
        listIter li;
        listNode *ln;

        listRewind(server.slaves,&li);
        while((ln = listNext(&li))) {
            client *slave = ln->value;

            if (slave->replstate != SLAVE_STATE_ONLINE) continue;
            if (slave->flags & CLIENT_PRE_PSYNC) continue;
            if ((server.unixtime - slave->repl_ack_time) > server.repl_timeout)
            {
                serverLog(LL_WARNING, "Disconnecting timedout slave: %s",
                    replicationGetSlaveName(slave));
                freeClient(slave);
            }
        }
    }

    /* If we have no attached slaves and there is a replication backlog
     * using memory, free it after some (configured) time. */
    // With no slaves attached, free the backlog after the configured idle time
    if (listLength(server.slaves) == 0 && server.repl_backlog_time_limit &&
        server.repl_backlog)
    {
        time_t idle = server.unixtime - server.repl_no_slaves_since;

        if (idle > server.repl_backlog_time_limit) {