Redis Source Code Analysis (8): Replication

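Redis master-slave replication provides high availability and also enables read/write splitting, so that slaves can absorb heavy read traffic. There are two ways to make a slave replicate from a master:

- In the configuration file: slaveof <masterip> <masterport>

- Or by running the command from a client: slaveof <masterip> <masterport>

Both do the same simple thing: they set server.repl_state to REDIS_REPL_CONNECT, which puts the slave in the initial replication state, and store the master's IP and port in the server's masterhost and masterport fields.

During replication, both the master and the slave are in fact state machines. The process does not run inside a single function; the event loop splits it into many function calls, each of which acts according to the current state.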

The states on the master side are:

/* Slave replication state - from the point of view of the master.
 * In SEND_BULK and ONLINE state the slave receives new updates
 * in its output queue. In the WAIT_BGSAVE state instead the server is waiting
 * to start the next background saving in order to send updates to it. */
#define REDIS_REPL_WAIT_BGSAVE_START 6 /* We need to produce a new RDB file. */
#define REDIS_REPL_WAIT_BGSAVE_END 7 /* Waiting RDB file creation to finish. */
#define REDIS_REPL_SEND_BULK 8 /* Sending RDB file to slave. */
#define REDIS_REPL_ONLINE 9 /* RDB file transmitted, sending just updates. */

The states on the slave side are:

/* Slave replication state - from the point of view of the slave. */
#define REDIS_REPL_NONE 0 /* No active replication */
#define REDIS_REPL_CONNECT 1 /* Must connect to master */
#define REDIS_REPL_CONNECTING 2 /* Connecting to master */
#define REDIS_REPL_RECEIVE_PONG 3 /* Wait for PING reply */
#define REDIS_REPL_TRANSFER 4 /* Receiving .rdb from master */
#define REDIS_REPL_CONNECTED 5 /* Connected to master */
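
To make the state-machine idea concrete, here is a minimal sketch of the slave-side transitions for a first successful sync. The state values mirror the defines above. The event names (EV_*) and the function slave_next_state are hypothetical and exist only for illustration; Redis itself does not have a single transition function like this, the transitions are spread across replicationCron and the socket callbacks.

```c
#include <assert.h>

/* Slave-side states, same numeric values as the REDIS_REPL_* defines. */
enum repl_state {
    REPL_NONE = 0, REPL_CONNECT = 1, REPL_CONNECTING = 2,
    REPL_RECEIVE_PONG = 3, REPL_TRANSFER = 4, REPL_CONNECTED = 5
};

/* Hypothetical events, named for illustration only. */
enum repl_event {
    EV_SLAVEOF,       /* slaveof was configured or issued */
    EV_CRON_CONNECT,  /* the cron started a non-blocking connect */
    EV_WRITABLE,      /* socket became writable, PING was sent */
    EV_SYNC_STARTED,  /* PONG read and the sync request accepted */
    EV_RDB_LOADED,    /* RDB fully received and loaded */
    EV_ERROR          /* any failure along the way */
};

/* Returns the state that follows `s` when `ev` happens. */
enum repl_state slave_next_state(enum repl_state s, enum repl_event ev) {
    assert((int)s >= REPL_NONE && (int)s <= REPL_CONNECTED);
    if (ev == EV_SLAVEOF) return REPL_CONNECT;  /* (re)start from scratch */
    if (s == REPL_NONE) return REPL_NONE;       /* not a slave: ignore */
    if (ev == EV_ERROR) return REPL_CONNECT;    /* retry on a later cron tick */
    switch (s) {
    case REPL_CONNECT:
        return ev == EV_CRON_CONNECT ? REPL_CONNECTING : s;
    case REPL_CONNECTING:
        return ev == EV_WRITABLE ? REPL_RECEIVE_PONG : s;
    case REPL_RECEIVE_PONG:
        return ev == EV_SYNC_STARTED ? REPL_TRANSFER : s;
    case REPL_TRANSFER:
        return ev == EV_RDB_LOADED ? REPL_CONNECTED : s;
    default:
        return s;                               /* already connected */
    }
}
```

Feeding the events EV_SLAVEOF, EV_CRON_CONNECT, EV_WRITABLE, EV_SYNC_STARTED and EV_RDB_LOADED in that order walks the slave from REPL_NONE to REPL_CONNECTED; an EV_ERROR at any point sends it back to REPL_CONNECT.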

In serverCron (which by default runs 10 times per second, i.e. every 100 ms), the replicationCron function is called to maintain the replication state.

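The sequence diagram below shows the interaction between slave and master. Because the state transitions are complex, it only covers the case where the first synchronization succeeds:

[Figure: sequence diagram of the slave/master interaction during the first successful sync]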


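The whole process can be divided into two steps: first, generating and transferring a snapshot (RDB); second, transferring the incremental backlog. Let's look at the replication code.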

    /* Replication cron function -- used to reconnect to master and
     * to detect transfer failures. */
    run_with_period(1000) replicationCron();
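In serverCron, replicationCron runs approximately once per second to maintain the replication relationship. Replication is complex, so we first walk through the sync process itself and then cover the other logic that maintains it.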
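
As an illustration of how a "run this roughly every N milliseconds" check can sit on top of a fixed-rate cron, here is a minimal sketch. The struct cron_state and the functions period_due and count_runs are hypothetical names for this example, and the tick rate of 10 per second is an assumption; the real run_with_period is a macro inside Redis and may differ in detail.

```c
#include <assert.h>

/* Hypothetical stand-in for the relevant server fields. */
struct cron_state {
    int hz;         /* cron calls per second (assumed 10 here) */
    int cronloops;  /* number of cron calls so far */
};

/* Returns 1 when a task with the given period (ms) is due on this tick. */
int period_due(const struct cron_state *c, int period_ms) {
    assert(c->hz > 0 && c->hz <= 1000);
    int tick_ms = 1000 / c->hz;          /* time between two cron calls */
    if (period_ms <= tick_ms) return 1;  /* faster than the tick: every call */
    return c->cronloops % (period_ms / tick_ms) == 0;
}

/* Simulates `seconds` worth of ticks and counts how often the task ran. */
int count_runs(int hz, int period_ms, int seconds) {
    struct cron_state c = { hz, 0 };
    int runs = 0;
    for (int i = 0; i < hz * seconds; i++) {
        if (period_due(&c, period_ms)) runs++;
        c.cronloops++;
    }
    return runs;
}
```

With 10 ticks per second, a 1000 ms task is due on every tenth tick, so the effective interval is only as precise as the tick itself. That is why the period is approximate rather than exact.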

1. The slave initiates the connection

In the replicationCron function:

    /* Check if we should connect to a MASTER */
    if (server.repl_state == REDIS_REPL_CONNECT) {
        redisLog(REDIS_NOTICE,"Connecting to MASTER %s:%d",
            server.masterhost, server.masterport);
        if (connectWithMaster() == REDIS_OK) {
            redisLog(REDIS_NOTICE,"MASTER <-> SLAVE sync started");
        }
    }

If the slave's replication state is REDIS_REPL_CONNECT, connectWithMaster() is called to start connecting to the master.

int connectWithMaster(void) {
    int fd;

    fd = anetTcpNonBlockConnect(NULL,server.masterhost,server.masterport);
    if (fd == -1) {
        redisLog(REDIS_WARNING,"Unable to connect to MASTER: %s",
            strerror(errno));
        return REDIS_ERR;
    }

    if (aeCreateFileEvent(server.el,fd,AE_READABLE|AE_WRITABLE,syncWithMaster,NULL) ==
            AE_ERR)
    {
        close(fd);
        redisLog(REDIS_WARNING,"Can't create readable event for SYNC");
        return REDIS_ERR;
    }

    server.repl_transfer_lastio = server.unixtime;
    server.repl_transfer_s = fd;
    server.repl_state = REDIS_REPL_CONNECTING;
    return REDIS_OK;
}
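This function does the following:

- Issues a non-blocking connect to the master

- Registers syncWithMaster as the handler for both read and write events on the connection socket

- Updates server.repl_transfer_lastio so that the connection is not killed by the timeout check

- Stores the connection socket in server.repl_transfer_s

- Sets the replication state to REDIS_REPL_CONNECTING

Once the connection is established, syncWithMaster is called back. Let's look at that function next.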

    char tmpfile[256], *err;
    int dfd, maxtries = 5;
    int sockerr = 0, psync_result;
    socklen_t errlen = sizeof(sockerr);
    REDIS_NOTUSED(el);
    REDIS_NOTUSED(privdata);
    REDIS_NOTUSED(mask);

    /* If this event fired after the user turned the instance into a master
     * with SLAVEOF NO ONE we must just return ASAP. */
    if (server.repl_state == REDIS_REPL_NONE) {
        close(fd);
        return;
    }
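This declares the local variables and checks the current replication state: if the instance was turned back into a master in the meantime (SLAVEOF NO ONE), the socket is closed and the function returns.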

    /* Check for errors in the socket. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &sockerr, &errlen) == -1)
        sockerr = errno;
    if (sockerr) {
        aeDeleteFileEvent(server.el,fd,AE_READABLE|AE_WRITABLE);
        redisLog(REDIS_WARNING,"Error condition on socket for SYNC: %s",
            strerror(sockerr));
        goto error;
    }

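This checks for socket errors. Because the connect was non-blocking, its outcome is only known at this point and is read through SO_ERROR.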
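
The connect-then-check pattern used so far can be sketched with plain POSIX calls. This is a minimal standalone example, not Redis code: the function names nonblock_connect and connect_result are hypothetical, and poll() stands in for the Redis event loop.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Starts a non-blocking connect to ip:port. Returns the fd, or -1 on
 * immediate failure. EINPROGRESS is the normal "still connecting" case. */
int nonblock_connect(const char *ip, int port) {
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fd);
        return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
        close(fd);
        return -1;
    }
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == -1 &&
        errno != EINPROGRESS) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Waits until fd is writable, then reads SO_ERROR. Returns 0 if the
 * connection succeeded, otherwise an errno value. */
int connect_result(int fd, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLOUT };
    int sockerr = 0;
    socklen_t errlen = sizeof(sockerr);
    assert(fd >= 0);
    int n = poll(&p, 1, timeout_ms);
    if (n == 0) return ETIMEDOUT;   /* never became writable */
    if (n == -1) return errno;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &sockerr, &errlen) == -1)
        sockerr = errno;
    return sockerr;
}
```

A caller would invoke nonblock_connect first and call connect_result only after the socket is reported writable. In Redis that second step happens inside the syncWithMaster callback rather than in a blocking poll().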

    /* If we were connecting, it's time to send a non blocking PING, we want to
     * make sure the master is able to reply before going into the actual
     * replication process where we have long timeouts in the order of
     * seconds (in the meantime the slave would block). */
    if (server.repl_state == REDIS_REPL_CONNECTING) {
        redisLog(REDIS_NOTICE,"Non blocking connect for SYNC fired the event.");
        /* Delete the writable event so that the readable event remains
         * registered and we can wait for the PONG reply. */
        aeDeleteFileEvent(server.el,fd,AE_WRITABLE);
        server.repl_state = REDIS_REPL_RECEIVE_PONG;
        /* Send the PING, don't check for errors at all, we have the timeout
         * that will take care about this. */
        syncWrite(fd,"PING\r\n",6,100);
        return;
    }

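This function is called back several times while the connection is being set up, and it acts according to the current replication state. In the REDIS_REPL_CONNECTING state, i.e. the connect has just been issued, it removes the write event handler from the socket, moves the state to REDIS_REPL_RECEIVE_PONG, sends PING synchronously, and returns.

PINGing the master is a double check that the master is alive and able to reply before the slave starts the blocking operations that follow.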

    /* Receive the PONG command. */
    if (server.repl_state == REDIS_REPL_RECEIVE_PONG) {
        char buf[1024];

        /* Delete the readable event, we no longer need it now that there is
         * the PING reply to read. */
        aeDeleteFileEvent(server.el,fd,AE_READABLE);

        /* Read the reply with explicit timeout. */
        buf[0] = '\0';
        if (syncReadLine(fd,buf,sizeof(buf),
            server.repl_syncio_timeout*1000) == -1)
        {
            redisLog(REDIS_WARNING,
                "I/O error reading PING reply from master: %s",
                strerror(errno));
            goto error;
        }

        /* We accept only two replies as valid, a positive +PONG reply
         * (we just check for "+") or an authentication error.
         * Note that older versions of Redis replied with "operation not
         * permitted" instead of using a proper error code, so we test
         * both. */
        if (buf[0] != '+' &&
            strncmp(buf,"-NOAUTH",7) != 0 &&
            strncmp(buf,"-ERR operation not permitted",28) != 0)
        {
            redisLog(REDIS_WARNING,"Error reply to PING from master: '%s'",buf);
            goto error;
        } else {
            redisLog(REDIS_NOTICE,
                "Master replied to PING, replication can continue...");
        }
    }


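In the REDIS_REPL_RECEIVE_PONG state, i.e. PING has just been sent and a PONG is expected, the function first removes the read event handler from the socket and then reads the reply synchronously. The accepted replies are +PONG or an authentication error (-NOAUTH, or the older "operation not permitted" message). If one of them is read, the handshake continues.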
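
The reply check can be isolated into a small predicate. This is a sketch that restates the three conditions from the excerpt above; the function name ping_reply_acceptable is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the reply to PING lets the handshake continue, 0 otherwise.
 * Authentication errors are accepted because AUTH is sent afterwards. */
int ping_reply_acceptable(const char *buf) {
    assert(buf != NULL);
    if (buf[0] == '+') return 1;                    /* +PONG */
    if (strncmp(buf, "-NOAUTH", 7) == 0) return 1;  /* auth required */
    if (strncmp(buf, "-ERR operation not permitted", 28) == 0)
        return 1;                                   /* older auth error */
    return 0;
}
```

Any other error reply, or an empty buffer, is rejected, which corresponds to the goto error branch in the original code.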

    /* AUTH with the master if required. */
    if(server.masterauth) {
        err = sendSynchronousCommand(fd,"AUTH",server.masterauth,NULL);
        if (err[0] == '-') {
            redisLog(REDIS_WARNING,"Unable to AUTH to MASTER: %s",err);
            sdsfree(err);
            goto error;
        }
        sdsfree(err);
    }
If server.masterauth is configured, an AUTH command is sent synchronously and the reply is awaited. If authentication fails, the function bails out.

    /* Set the slave port, so that Master's INFO command can list the
     * slave listening port correctly. */
    {
        sds port = sdsfromlonglong(server.port);
        err = sendSynchronousCommand(fd,"REPLCONF","listening-port",port,
                                         NULL);
        sdsfree(port);
        /* Ignore the error if any, not all the Redis versions support
         * REPLCONF listening-port. */
        if (err[0] == '-') {
            redisLog(REDIS_NOTICE,"(Non critical) Master does not understand REPLCONF listening-port: %s", err);
        }
        sdsfree(err);
    }
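A REPLCONF command is sent synchronously to tell the master the slave's listening port, so that the master's INFO output can list the slave correctly. An error reply is ignored, because not every Redis version supports REPLCONF listening-port.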
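
Both AUTH and REPLCONF above pass a NULL-terminated list of string arguments to sendSynchronousCommand. As an illustration of that calling convention, here is a minimal sketch that joins such a list into one space-separated, CRLF-terminated line. The function build_inline_command is hypothetical, and the assumption that the command goes out in this inline form is mine; this is not the actual sendSynchronousCommand implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Joins a NULL-terminated list of string arguments with spaces and
 * appends "\r\n". Returns the number of bytes written (excluding the
 * terminating NUL), or -1 if dst is too small. */
int build_inline_command(char *dst, size_t dstlen, ...) {
    va_list ap;
    size_t used = 0;
    const char *arg;

    assert(dst != NULL && dstlen > 0);
    dst[0] = '\0';
    va_start(ap, dstlen);
    while ((arg = va_arg(ap, const char *)) != NULL) {
        size_t arglen = strlen(arg);
        size_t need = arglen + (used ? 1 : 0);  /* separator if not first */
        if (used + need + 3 > dstlen) {         /* keep room for "\r\n" + NUL */
            va_end(ap);
            return -1;
        }
        if (used) dst[used++] = ' ';
        memcpy(dst + used, arg, arglen);
        used += arglen;
        dst[used] = '\0';
    }
    va_end(ap);
    if (used + 3 > dstlen) return -1;
    memcpy(dst + used, "\r\n", 3);              /* copies the NUL as well */
    return (int)(used + 2);
}
```

Calling build_inline_command(buf, sizeof(buf), "REPLCONF", "listening-port", "6379", (char *)NULL) fills buf with the line "REPLCONF listening-port 6379" followed by CRLF. The explicit NULL sentinel is what lets a variadic function know where the argument list ends.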

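2. The slave starts synchronizing

Everything above is preparation before the slave starts the sync. Next comes the actual sync request.

Redis 2.8 introduced partial resynchronization: when the replication link drops and the slave reconnects, a full resync is not necessarily required, which avoids heavyweight steps such as the master generating an RDB, transferring it over the network, and the slave loading it. The master keeps a buffer that holds the replication backlog, and the buffer size is configurable. When a slave requests synchronization, it sends the master run id and the backlog offset it has reached. On receiving the request, the master verifies that the run id matches its own and checks whether the requested offset still lies inside the buffer. If both checks pass, a partial resync is performed and the master sends the backlog starting from that offset. If either check fails, a full resync is required. Let's look at the code.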
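
The two checks described above can be written as a small predicate. This is a minimal sketch under stated assumptions: the struct backlog_view and the function can_partial_resync are hypothetical, and treating the offset one past the newest backlog byte as still valid (the slave is fully caught up) is an assumption of this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical view of the master's backlog bookkeeping. */
struct backlog_view {
    const char *runid;    /* the master's own run id */
    long long first_off;  /* replication offset of the oldest backlog byte */
    long long histlen;    /* number of valid bytes held in the backlog */
};

/* Returns 1 if a partial resync is possible, 0 if a full resync is needed. */
int can_partial_resync(const struct backlog_view *m,
                       const char *req_runid, long long req_offset) {
    assert(m != NULL && m->runid != NULL && req_runid != NULL);
    /* Check 1: the slave must have been replicating from this very master. */
    if (strcmp(m->runid, req_runid) != 0) return 0;
    /* Check 2: the requested offset must still be covered by the backlog. */
    if (req_offset < m->first_off) return 0;               /* already overwritten */
    if (req_offset > m->first_off + m->histlen) return 0;  /* beyond what exists */
    return 1;
}
```

For a backlog that starts at offset 100 and holds 50 bytes, requests for offsets 100 through 150 qualify, while 99 and 151 fall back to a full resync, as does any request carrying a different run id.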

    /* Try a partial resynchonization. If we don't have a cached master
     * slaveTryPartialResynchronization() will at least try to use PSYNC
     * to start a full resynchronization so that we get the master run id
     * and the global offset, to try a partial resync at the next
     * reconnection attempt. */
    psync_result = slaveTryPartialResynchronization(fd);

The slaveTryPartialResynchronization() function is called to attempt a partial resync.

    char *psync_runid;
    char psync_offset[32];
    sds reply;

    /* Initially set repl_master_initial_offset to -1 to mark the current
     * master run_id and offset as not valid. Later if we'll be able to do
     * a FULL resync using the PSYNC command we'll set the offset at the
     * right value, so that this information will be propagated to the
     * client structure representing the master into server.master. */
    server.repl_master_initial_offset = -1;

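As the comment explains, repl_master_initial_offset is first set to -1 to mark the current master run id and offset as invalid.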

    if (server.cached_master) {
        psync_runid = server.cached_master->replrunid;
        snprintf(psync_offset,sizeof(psync_offset),"%lld", server.cached_master->reploff+1);
        redisLog(REDIS_NOTICE,"Trying a partial resynchronization (request %s:%s).", psync_runid, psync_offset);
    } else {
        redisLog(REDIS_NOTICE,"