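Redis master-slave replication provides high availability and also enables read/write splitting, so that slaves can absorb a heavy read load. There are two ways to make a slave replicate from a master:
- In the configuration file: slaveof <masterip> <masterport>
- Or by running the command from a client: slaveof <masterip> <masterport>
Both paths do the same simple thing: they set server.repl_state to REDIS_REPL_CONNECT, the initial state of replication, and set server.masterhost and server.masterport to the master's IP and port.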
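As a minimal sketch of the state change both SLAVEOF paths perform, the C below uses a hypothetical, trimmed-down struct. `repl_server`, `set_master` and the `REPL_*` constants are illustrative names, not Redis's own. The sketch only records the master address and resets the state; any extra bookkeeping the real code does is not covered by this excerpt.

```c
#include <string.h>

/* Hypothetical subset of the slave-side fields kept in Redis's global
 * `server` struct (masterhost, masterport, repl_state). */
#define REPL_NONE    0
#define REPL_CONNECT 1

struct repl_server {
    char masterhost[256];
    int masterport;
    int repl_state;
};

/* Remember the master address and put the state machine at its initial
 * state, so that the next replicationCron() run starts connecting. */
static void set_master(struct repl_server *s, const char *ip, int port) {
    strncpy(s->masterhost, ip, sizeof(s->masterhost) - 1);
    s->masterhost[sizeof(s->masterhost) - 1] = '\0';
    s->masterport = port;
    s->repl_state = REPL_CONNECT;
}
```

Nothing touches the network here: the actual connection attempt is deferred to the cron function, which is why the SLAVEOF command itself returns immediately.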
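During replication, both the master and the slave are state machines. The process does not run inside one function: the event loop splits it into many separate calls, and each call performs the action that matches the current state.
Master-side states: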
/* Slave replication state - from the point of view of the master.
 * In SEND_BULK and ONLINE state the slave receives new updates
 * in its output queue. In the WAIT_BGSAVE state instead the server is waiting
 * to start the next background saving in order to send updates to it. */
#define REDIS_REPL_WAIT_BGSAVE_START 6 /* We need to produce a new RDB file. */
#define REDIS_REPL_WAIT_BGSAVE_END 7 /* Waiting RDB file creation to finish. */
#define REDIS_REPL_SEND_BULK 8 /* Sending RDB file to slave. */
#define REDIS_REPL_ONLINE 9 /* RDB file transmitted, sending just updates. */
Slave-side states:
/* Slave replication state - from the point of view of the slave. */
#define REDIS_REPL_NONE 0 /* No active replication */
#define REDIS_REPL_CONNECT 1 /* Must connect to master */
#define REDIS_REPL_CONNECTING 2 /* Connecting to master */
#define REDIS_REPL_RECEIVE_PONG 3 /* Wait for PING reply */
#define REDIS_REPL_TRANSFER 4 /* Receiving .rdb from master */
#define REDIS_REPL_CONNECTED 5 /* Connected to master */
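To make the slave-side state machine concrete, here is a sketch of the happy path of a first, full synchronization. The enum reuses the values defined above; `next_state` is a hypothetical helper. The RECEIVE_PONG to TRANSFER and TRANSFER to CONNECTED steps are assumptions based on the state comments, since the code for those transitions is not part of this excerpt.

```c
/* Slave-side states, values copied from the definitions above. */
enum slave_repl_state {
    REDIS_REPL_NONE = 0,
    REDIS_REPL_CONNECT = 1,
    REDIS_REPL_CONNECTING = 2,
    REDIS_REPL_RECEIVE_PONG = 3,
    REDIS_REPL_TRANSFER = 4,
    REDIS_REPL_CONNECTED = 5
};

/* Next state on success; each step happens in a different callback. */
static enum slave_repl_state next_state(enum slave_repl_state s) {
    switch (s) {
    case REDIS_REPL_CONNECT:      return REDIS_REPL_CONNECTING;   /* connectWithMaster() */
    case REDIS_REPL_CONNECTING:   return REDIS_REPL_RECEIVE_PONG; /* connect fired, PING sent */
    case REDIS_REPL_RECEIVE_PONG: return REDIS_REPL_TRANSFER;     /* handshake done, full resync */
    case REDIS_REPL_TRANSFER:     return REDIS_REPL_CONNECTED;    /* RDB received and loaded */
    default:                      return s;                       /* NONE and CONNECTED stay put */
    }
}
```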
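serverCron (which runs server.hz times per second, i.e. every 100 ms with the default hz of 10) calls replicationCron to maintain the replication state.
The sequence diagram of the slave/master interaction follows. Because the interaction involves many states, it only describes the path of a first, successful synchronization: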
/* Replication cron function -- used to reconnect to master and
 * to detect transfer failures. */
run_with_period(1000) replicationCron();
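serverCron runs replicationCron roughly once per second (the interval can stretch beyond 1 second when cron ticks are delayed) to maintain the replication link. Replication is complex, so we first walk through the synchronization process and then cover the rest of the maintenance logic.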
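The "roughly once per second" follows from how run_with_period counts cron ticks. The function below is a sketch assuming the Redis 2.8 macro behaves as follows (to the best of my knowledge): a task whose period is no longer than one tick runs on every tick, otherwise on every (period / tick)-th tick. `should_run` is a hypothetical name.

```c
/* Sketch of the run_with_period() test as a plain function. */
static int should_run(int period_ms, int hz, long long cronloops) {
    int tick_ms = 1000 / hz;            /* hz = 10 -> one tick every 100 ms */
    if (period_ms <= tick_ms) return 1; /* period shorter than a tick: always */
    return cronloops % (period_ms / tick_ms) == 0;
}
```

With hz = 10, run_with_period(1000) fires on every 10th tick. Since a tick can start late but never early, the real interval is about 1 second, occasionally more.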
1. The slave initiates the connection
In replicationCron():
/* Check if we should connect to a MASTER */
if (server.repl_state == REDIS_REPL_CONNECT) {
    redisLog(REDIS_NOTICE,"Connecting to MASTER %s:%d",
        server.masterhost, server.masterport);
    if (connectWithMaster() == REDIS_OK) {
        redisLog(REDIS_NOTICE,"MASTER <-> SLAVE sync started");
    }
}
If the slave's replication state is REDIS_REPL_CONNECT, it calls connectWithMaster() to connect to the master.
int connectWithMaster(void) {
    int fd;
    fd = anetTcpNonBlockConnect(NULL,server.masterhost,server.masterport);
    if (fd == -1) {
        redisLog(REDIS_WARNING,"Unable to connect to MASTER: %s",
            strerror(errno));
        return REDIS_ERR;
    }
    if (aeCreateFileEvent(server.el,fd,AE_READABLE|AE_WRITABLE,syncWithMaster,NULL) ==
            AE_ERR)
    {
        close(fd);
        redisLog(REDIS_WARNING,"Can't create readable event for SYNC");
        return REDIS_ERR;
    }
    server.repl_transfer_lastio = server.unixtime;
    server.repl_transfer_s = fd;
    server.repl_state = REDIS_REPL_CONNECTING;
    return REDIS_OK;
}
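This function does the following:
- Calls connect in non-blocking mode to connect to the master
- Registers syncWithMaster as the handler for both read and write events on the socket
- Updates server.repl_transfer_lastio, so the connection is not treated as timed out and killed
- Stores the connecting socket in server.repl_transfer_s
- Sets the replication state to REDIS_REPL_CONNECTING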
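Once the non-blocking connect completes (successfully or not), the socket becomes writable and the event loop calls syncWithMaster(aeEventLoop *el, int fd, void *privdata, int mask). Let's look at this function.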
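The connect-then-wait-for-writable pattern that connectWithMaster and syncWithMaster split across event-loop callbacks can be condensed into a small POSIX sketch. `nonblock_connect` plays the role anetTcpNonBlockConnect has here, and `connect_result` uses poll() as a stand-in for the AE_WRITABLE event, then reads SO_ERROR exactly as syncWithMaster does. Both helper names are hypothetical, and the sketch is IPv4 only.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Start a non-blocking TCP connect; returns the fd (the connection may
 * still be in progress) or -1 on immediate failure. */
static int nonblock_connect(const char *ip, int port) {
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fd);
        return -1;
    }
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1) {
        close(fd);
        return -1;
    }
    /* EINPROGRESS is the normal outcome for a non-blocking connect. */
    if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == -1 &&
        errno != EINPROGRESS) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Wait until fd is writable, then ask the kernel whether the connect
 * succeeded. Returns 0 on success or an errno value. */
static int connect_result(int fd, int timeout_ms) {
    struct pollfd p = { .fd = fd, .events = POLLOUT };
    int sockerr = 0;
    socklen_t errlen = sizeof(sockerr);
    int n = poll(&p, 1, timeout_ms);
    if (n == 0) return ETIMEDOUT;
    if (n == -1) return errno;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &sockerr, &errlen) == -1)
        sockerr = errno;
    return sockerr;
}
```

A refused connection would make connect_result return a value such as ECONNREFUSED, which corresponds to the `if (sockerr)` branch in syncWithMaster.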
char tmpfile[256], *err;
int dfd, maxtries = 5;
int sockerr = 0, psync_result;
socklen_t errlen = sizeof(sockerr);
REDIS_NOTUSED(el);
REDIS_NOTUSED(privdata);
REDIS_NOTUSED(mask);
/* If this event fired after the user turned the instance into a master
 * with SLAVEOF NO ONE we must just return ASAP. */
if (server.repl_state == REDIS_REPL_NONE) {
    close(fd);
    return;
}
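Declare local variables and check the current replication state: if the user turned the instance back into a master with SLAVEOF NO ONE (state REDIS_REPL_NONE) before this event fired, close the socket and return immediately.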
/* Check for errors in the socket. */
if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &sockerr, &errlen) == -1)
    sockerr = errno;
if (sockerr) {
    aeDeleteFileEvent(server.el,fd,AE_READABLE|AE_WRITABLE);
    redisLog(REDIS_WARNING,"Error condition on socket for SYNC: %s",
        strerror(sockerr));
    goto error;
}
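Check the socket for errors: getsockopt(SO_ERROR) reports whether the non-blocking connect failed. If it did, both event handlers are removed and the function jumps to the error path.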
/* If we were connecting, it's time to send a non blocking PING, we want to
 * make sure the master is able to reply before going into the actual
 * replication process where we have long timeouts in the order of
 * seconds (in the meantime the slave would block). */
if (server.repl_state == REDIS_REPL_CONNECTING) {
    redisLog(REDIS_NOTICE,"Non blocking connect for SYNC fired the event.");
    /* Delete the writable event so that the readable event remains
     * registered and we can wait for the PONG reply. */
    aeDeleteFileEvent(server.el,fd,AE_WRITABLE);
    server.repl_state = REDIS_REPL_RECEIVE_PONG;
    /* Send the PING, don't check for errors at all, we have the timeout
     * that will take care about this. */
    syncWrite(fd,"PING\r\n",6,100);
    return;
}
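While the connection is being set up, this function is called back several times, and each call acts according to the current replication state. In REDIS_REPL_CONNECTING, i.e. the connect has just completed, it removes the socket's write-event handler (the read handler stays registered to wait for the reply), switches the state to REDIS_REPL_RECEIVE_PONG, sends PING synchronously, and returns.
PINGing the master is a double check: it ensures the master is responsive before the slave enters the following steps, which block with timeouts on the order of seconds.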
/* Receive the PONG command. */
if (server.repl_state == REDIS_REPL_RECEIVE_PONG) {
    char buf[1024];
    /* Delete the readable event, we no longer need it now that there is
     * the PING reply to read. */
    aeDeleteFileEvent(server.el,fd,AE_READABLE);
    /* Read the reply with explicit timeout. */
    buf[0] = '\0';
    if (syncReadLine(fd,buf,sizeof(buf),
        server.repl_syncio_timeout*1000) == -1)
    {
        redisLog(REDIS_WARNING,
            "I/O error reading PING reply from master: %s",
            strerror(errno));
        goto error;
    }
    /* We accept only two replies as valid, a positive +PONG reply
     * (we just check for "+") or an authentication error.
     * Note that older versions of Redis replied with "operation not
     * permitted" instead of using a proper error code, so we test
     * both. */
    if (buf[0] != '+' &&
        strncmp(buf,"-NOAUTH",7) != 0 &&
        strncmp(buf,"-ERR operation not permitted",28) != 0)
    {
        redisLog(REDIS_WARNING,"Error reply to PING from master: '%s'",buf);
        goto error;
    } else {
        redisLog(REDIS_NOTICE,
            "Master replied to PING, replication can continue...");
    }
}
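In REDIS_REPL_RECEIVE_PONG, the PING has been sent and the slave is waiting for PONG. It first removes the socket's read-event handler and then reads the reply synchronously, with a timeout of server.repl_syncio_timeout seconds. Valid replies are +PONG (only the leading "+" is checked) or an authentication error (-NOAUTH, or the "-ERR operation not permitted" text used by older Redis versions). On a valid reply, the handshake continues.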
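The acceptance rule can be isolated into a small predicate. `ping_reply_ok` is a hypothetical name; the conditions are copied from the code above. An authentication error is tolerated at this point because AUTH is sent in the next step, so -NOAUTH only means the master requires a password.

```c
#include <string.h>

/* Accept "+..." (normally "+PONG"), -NOAUTH, or the legacy
 * "-ERR operation not permitted"; anything else aborts the handshake. */
static int ping_reply_ok(const char *buf) {
    return buf[0] == '+' ||
           strncmp(buf, "-NOAUTH", 7) == 0 ||
           strncmp(buf, "-ERR operation not permitted", 28) == 0;
}
```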
/* AUTH with the master if required. */
if(server.masterauth) {
    err = sendSynchronousCommand(fd,"AUTH",server.masterauth,NULL);
    if (err[0] == '-') {
        redisLog(REDIS_WARNING,"Unable to AUTH to MASTER: %s",err);
        sdsfree(err);
        goto error;
    }
    sdsfree(err);
}
If server.masterauth is configured, the slave sends AUTH synchronously and waits for the reply; if authentication fails, it takes the error path.
/* Set the slave port, so that Master's INFO command can list the
 * slave listening port correctly. */
{
    sds port = sdsfromlonglong(server.port);
    err = sendSynchronousCommand(fd,"REPLCONF","listening-port",port,
                                 NULL);
    sdsfree(port);
    /* Ignore the error if any, not all the Redis versions support
     * REPLCONF listening-port. */
    if (err[0] == '-') {
        redisLog(REDIS_NOTICE,"(Non critical) Master does not understand REPLCONF listening-port: %s", err);
    }
    sdsfree(err);
}
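The slave then sends REPLCONF listening-port synchronously, telling the master its listening port so that the master's INFO output can show it. An error reply is only logged, because not every Redis version supports this command.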
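sendSynchronousCommand itself is not shown in this excerpt. The sketch below assumes it joins its NULL-terminated arguments with single spaces and terminates the line with CRLF (an inline command), before reading one reply line synchronously. `build_inline_command` is a hypothetical helper that reproduces only the formatting part.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Join a NULL-terminated argument list into "arg1 arg2 ...\r\n".
 * Returns the number of bytes written (excluding the NUL), or -1 if
 * buf is too small. */
static int build_inline_command(char *buf, size_t size, ...) {
    va_list ap;
    size_t len = 0;
    const char *arg;
    va_start(ap, size);
    while ((arg = va_arg(ap, char *)) != NULL) {
        size_t alen = strlen(arg);
        size_t need = alen + (len ? 1 : 0);       /* separator space */
        if (len + need + 3 > size) {              /* room for "\r\n" + NUL */
            va_end(ap);
            return -1;
        }
        if (len) buf[len++] = ' ';
        memcpy(buf + len, arg, alen);
        len += alen;
    }
    va_end(ap);
    if (len + 3 > size) return -1;
    memcpy(buf + len, "\r\n", 3);                 /* also copies the NUL */
    return (int)(len + 2);
}
```

For example, the REPLCONF call above would produce the line "REPLCONF listening-port 6380\r\n" for a slave listening on port 6380. The terminator must be passed as `(char *)NULL`, since a bare `NULL` is not guaranteed to have pointer type in a variadic call.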
2. The slave starts synchronizing
Everything above is the slave's preparation work; next it issues the actual synchronization request.
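Redis 2.8 introduced partial resynchronization: when the master-slave link breaks and is re-established, a full resync is no longer always required, which avoids heavyweight work such as the master producing an RDB file, transferring it to the slave, and the slave loading it. The master keeps a backlog buffer of recent writes whose size is configurable (repl-backlog-size). When requesting synchronization, the slave sends the master's run id and the backlog offset it wants to resume from. On receiving the request, the master checks that the run id matches its own and that the offset still falls inside the backlog. If both checks pass, it performs a partial resync and sends the backlog starting at that offset; otherwise a full resync is required. Let's look at the code.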
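A simplified sketch of the master-side decision described above, assuming the backlog is tracked as the replication offset of its oldest byte plus the number of bytes it currently holds. `struct backlog`, its fields, and `can_partial_resync` are hypothetical names, not the Redis implementation.

```c
#include <string.h>

/* Hypothetical view of the master's backlog. */
struct backlog {
    long long off;      /* replication offset of the oldest byte held */
    long long histlen;  /* number of valid bytes currently held */
};

/* 1 if a request for (req_runid, req_offset) can be served from the
 * backlog, 0 if a full resync is required. */
static int can_partial_resync(const char *my_runid, const struct backlog *bl,
                              const char *req_runid, long long req_offset) {
    if (strcmp(req_runid, my_runid) != 0) return 0;   /* a different master */
    if (bl == NULL) return 0;                          /* no backlog yet */
    if (req_offset < bl->off) return 0;                /* already overwritten */
    if (req_offset > bl->off + bl->histlen) return 0;  /* beyond what we have */
    return 1;
}
```

An offset exactly one past the last stored byte (off + histlen) is accepted: the slave is fully caught up and there is simply nothing to send yet.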
/* Try a partial resynchonization. If we don't have a cached master
 * slaveTryPartialResynchronization() will at least try to use PSYNC
 * to start a full resynchronization so that we get the master run id
 * and the global offset, to try a partial resync at the next
 * reconnection attempt. */
psync_result = slaveTryPartialResynchronization(fd);
slaveTryPartialResynchronization() is called to attempt a partial resync. Its body begins as follows:
char *psync_runid;
char psync_offset[32];
sds reply;
/* Initially set repl_master_initial_offset to -1 to mark the current
 * master run_id and offset as not valid. Later if we'll be able to do
 * a FULL resync using the PSYNC command we'll set the offset at the
 * right value, so that this information will be propagated to the
 * client structure representing the master into server.master. */
server.repl_master_initial_offset = -1;
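As the comment explains, server.repl_master_initial_offset is first set to -1 to mark the master run id and offset as invalid. It receives its real value only once a full resync via PSYNC succeeds, and is then propagated to the client structure that represents the master (server.master).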
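When a cached master exists, the slave asks to resume at reploff + 1, the first byte it has not yet processed. The sketch below formats the resulting request line, assuming the command takes the form `PSYNC <runid> <offset>`; `struct cached_master` and `format_psync` are hypothetical stand-ins for the fields of server.cached_master used in the branch that follows.

```c
#include <stdio.h>

/* Hypothetical stand-in for the cached master client. */
struct cached_master {
    char replrunid[41];   /* 40-character run id + NUL */
    long long reploff;    /* last replication offset processed */
};

/* Write "PSYNC <runid> <reploff+1>" into buf; -1 if it does not fit. */
static int format_psync(char *buf, size_t size, const struct cached_master *cm) {
    int n = snprintf(buf, size, "PSYNC %s %lld", cm->replrunid, cm->reploff + 1);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```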
if (server.cached_master) {
psync_runid = server.cached_master->replrunid;
snprintf(psync_offset,sizeof(psync_offset),"%lld", server.cached_master->reploff+1);
redisLog(REDIS_NOTICE,"Trying a partial resynchronization (request %s:%s).", psync_runid, psync_offset);
} else {
redisLog(REDIS_NOTICE,"