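Environment: Redis source version 5.0.3. I annotated the source while reading it; the git repository is https://gitee.com/xiaoangg/redis_annotation
Corrections are welcome if you spot any mistakes.
Reference book: Redis Design and Implementation (《redis的设计与实现》)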
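Recommended reading
Setting up replication: https://blog.csdn.net/qq_16399991/article/details/99881319
How replication works: https://blog.csdn.net/qq_16399991/article/details/109748991
I. Implementation of replication
1.1 Setting the master's address and port
Sending the SLAVEOF command to a slave makes that slave replicate a master:
# Replicate the master at 127.0.0.1, port 6379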
SLAVEOF 127.0.0.1 6379
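The main job of SLAVEOF is to record the master's address and port on the slave; they are stored in the slave's masterhost and masterport attributes.
SLAVEOF is an asynchronous command: once the settings are stored it replies OK to the client, and the actual replication work only starts after OK has been returned.
The entry point of the SLAVEOF command is replication.c/replicaofCommand:
/**
* Implementation of the replication commands
* slaveof
* replicaof
*/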
void replicaofCommand(client *c) {
/**
* Replication commands are not allowed in cluster mode
*/
/* SLAVEOF is not allowed in cluster mode as replication is automatically
* configured using the current address of the master node. */
if (server.cluster_enabled) {
addReplyError(c,"REPLICAOF not allowed in cluster mode.");
return;
}
/* The special host/port combination "NO" "ONE" turns the instance
* into a master. Otherwise the new master address is set. */
if (!strcasecmp(c->argv[1]->ptr,"no") &&
!strcasecmp(c->argv[2]->ptr,"one")) { // SLAVEOF NO ONE stops replication
if (server.masterhost) {
replicationUnsetMaster();
sds client = catClientInfoString(sdsempty(),c);
serverLog(LL_NOTICE,"MASTER MODE enabled (user request from '%s')",
client);
sdsfree(client);
}
} else {
long port;
// read the master's port number from the arguments
if ((getLongFromObjectOrReply(c, c->argv[2], &port, NULL) != C_OK))
return;
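/**
* Check whether we are already connected to the specified master.
* If so, tell the client that we are already connected and do nothing else.
*/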
/* Check if we are already attached to the specified slave */
if (server.masterhost && !strcasecmp(server.masterhost,c->argv[1]->ptr)
&& server.masterport == port) {
serverLog(LL_NOTICE,"REPLICAOF would result into synchronization with the master we are already connected with. No operation performed.");
addReplySds(c,sdsnew("+OK Already connected to specified master\r\n"));
return;
}
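/**
* There was no previous master, or the user specified a different one, so we can continue.
* (replicationSetMaster stores the master's host and port in the masterhost and masterport attributes.)
*/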
/* There was no previous master or the user specified a different one,
* we can continue. */
replicationSetMaster(c->argv[1]->ptr, port);
// get the client info; catClientInfoString returns the client information as a string
sds client = catClientInfoString(sdsempty(),c);
serverLog(LL_NOTICE,"REPLICAOF %s:%d enabled (user request from '%s')",
server.masterhost, server.masterport, client);
sdsfree(client);
}
// reply OK
addReply(c,shared.ok);
}
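The three branches of replicaofCommand can be condensed into a small standalone function. This is only a sketch for illustration: mini_server, replicaof_action and replicaof_decide are hypothetical names introduced here, not Redis code, and the port is assumed to have been parsed already.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical outcome labels, not part of Redis. */
typedef enum {
    REPLICAOF_BECOME_MASTER,    /* "NO ONE": stop replicating */
    REPLICAOF_ALREADY_ATTACHED, /* same host and port as the current master */
    REPLICAOF_SET_MASTER        /* record a new master address */
} replicaof_action;

/* Minimal stand-in for the two server fields the command looks at. */
typedef struct {
    const char *masterhost;     /* NULL when the instance is a master */
    int masterport;
} mini_server;

/* Mirrors the order of the checks in the excerpt above:
 * 1) NO ONE, 2) already attached to this master, 3) set a new master. */
replicaof_action replicaof_decide(const mini_server *srv,
                                  const char *arg1, const char *arg2,
                                  long port)
{
    assert(srv != NULL && arg1 != NULL && arg2 != NULL);
    if (!strcasecmp(arg1, "no") && !strcasecmp(arg2, "one"))
        return REPLICAOF_BECOME_MASTER;
    if (srv->masterhost && !strcasecmp(srv->masterhost, arg1) &&
        srv->masterport == port)
        return REPLICAOF_ALREADY_ATTACHED;
    return REPLICAOF_SET_MASTER;
}
```

In all three cases the command itself only updates state and replies; the connection to the master is made later from the cron function.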
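1.2 Establishing the socket connection
After SLAVEOF has finished, the slave connects to the master using the configured IP address and port. The call path is server.c/serverCron > replication.c/replicationCron > replication.c/connectWithMaster.
If the connection to the master succeeds, the slave attaches a file event handler for replication work to the socket. This handler is replication.c/syncWithMaster.
replication.c/syncWithMaster handles the replication work that follows a successful connection, such as receiving the RDB file and receiving the write commands propagated by the master.
replication.c/connectWithMaster is shown below:
/**
* The slave connects to the master
*/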
int connectWithMaster(void) {
int fd;
// best-effort non-blocking connect to the master
fd = anetTcpNonBlockBestEffortBindConnect(NULL,
server.masterhost,server.masterport,NET_FIRST_BIND_ADDR);
if (fd == -1) {
serverLog(LL_WARNING,"Unable to connect to MASTER: %s",
strerror(errno));
return C_ERR;
}
// attach the handler that performs the replication work
if (aeCreateFileEvent(server.el,fd,AE_READABLE|AE_WRITABLE,syncWithMaster,NULL) ==
AE_ERR)
{
close(fd);
serverLog(LL_WARNING,"Can't create readable event for SYNC");
return C_ERR;
}
server.repl_transfer_lastio = server.unixtime;
server.repl_transfer_s = fd;
server.repl_state = REPL_STATE_CONNECTING;
return C_OK;
}
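1.3 Sending the PING command
The first thing the slave does after connecting to the master is send it a PING command:
- PING checks that the socket can be read from and written to normally;
- PING checks that the master is able to handle commands normally.
void syncWithMaster(aeEventLoop *el, int fd, void *privdata, int mask) {
//......
// send PING to the master to check that it is working normally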
/* Send a PING to check the master is able to reply without errors. */
if (server.repl_state == REPL_STATE_CONNECTING) {
serverLog(LL_NOTICE,"Non blocking connect for SYNC fired the event.");
/* Delete the writable event so that the readable event remains
* registered and we can wait for the PONG reply. */
aeDeleteFileEvent(server.el,fd,AE_WRITABLE);
server.repl_state = REPL_STATE_RECEIVE_PONG;
/* Send the PING, don't check for errors at all, we have the timeout
* that will take care about this. */
err = sendSynchronousCommand(SYNC_CMD_WRITE,fd,"PING",NULL);
if (err) goto write_error;
return;
}
// wait for the reply to PING
/* Receive the PONG command. */
if (server.repl_state == REPL_STATE_RECEIVE_PONG) {
err = sendSynchronousCommand(SYNC_CMD_READ,fd,NULL);
/* We accept only two replies as valid, a positive +PONG reply
* (we just check for "+") or an authentication error.
* Note that older versions of Redis replied with "operation not
* permitted" instead of using a proper error code, so we test
* both. */
if (err[0] != '+' &&
strncmp(err,"-NOAUTH",7) != 0 &&
strncmp(err,"-ERR operation not permitted",28) != 0)
{
serverLog(LL_WARNING,"Error reply to PING from master: '%s'",err);
sdsfree(err);
goto error;
} else {
serverLog(LL_NOTICE,
"Master replied to PING, replication can continue...");
}
sdsfree(err);
server.repl_state = REPL_STATE_SEND_AUTH;
}
//.....
}
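The reply check in the REPL_STATE_RECEIVE_PONG branch can be isolated into a pure function, which makes the three accepted replies easy to see. This is a minimal sketch; ping_reply_acceptable is a hypothetical helper name, not a function in replication.c.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the handshake may continue after this reply to PING,
 * 0 if the reply has to be treated as an error. */
int ping_reply_acceptable(const char *reply)
{
    assert(reply != NULL);
    if (reply[0] == '+')
        return 1;   /* positive reply such as +PONG */
    if (strncmp(reply, "-NOAUTH", 7) == 0)
        return 1;   /* master requires AUTH, handled in the next step */
    if (strncmp(reply, "-ERR operation not permitted", 28) == 0)
        return 1;   /* how older versions reported the same condition */
    return 0;       /* any other error aborts the handshake */
}
```

An authentication error is accepted here because it still proves the master is reachable and answering; the password is sent in the following step.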
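1.4 Authentication
After the slave receives the PONG reply, the next step is authentication.
If the slave has masterauth set, it sends an AUTH command to the master to authenticate.
[Figure: AUTH exchange between the slave and the master]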
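The AUTH step can be modelled as two transitions of the handshake state machine. The sketch below is a simplified model written for this article, not the code in syncWithMaster: the hs_ names are hypothetical, and it assumes that a reply starting with '-' means authentication failed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state labels modelled on the REPL_STATE_* names. */
typedef enum { HS_SEND_AUTH, HS_RECEIVE_AUTH, HS_SEND_PORT, HS_ERROR } hs_state;

/* Transition taken while in HS_SEND_AUTH.
 * masterauth is the configured password, or NULL when none is set. */
hs_state hs_send_auth(const char *masterauth)
{
    /* With a password the slave sends "AUTH <password>" and waits for the
     * reply; without one the AUTH step is skipped entirely. */
    return masterauth ? HS_RECEIVE_AUTH : HS_SEND_PORT;
}

/* Transition taken when the reply to AUTH arrives. */
hs_state hs_receive_auth(const char *reply)
{
    assert(reply != NULL);
    /* Assumption: any error reply aborts the handshake. */
    return reply[0] == '-' ? HS_ERROR : HS_SEND_PORT;
}
```

Either way a successful path ends in the state that sends the listening port, which is the next step of the handshake.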
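1.5 Sending the port information
After authentication, the slave executes REPLCONF listening-port <port-number> to tell the master which port the slave is listening on.
When the master receives this command, it records the slave's port in the slave_listening_port attribute of the client state.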
void syncWithMaster(aeEventLoop *el, int fd, void *privdata, int mask) {
//.......
// send REPLCONF to the master to announce the slave's port
/* Set the slave port, so that Master's INFO command can list the
* slave listening port correctly. */
if (server.repl_state == REPL_STATE_SEND_PORT) {
sds port = sdsfromlonglong(server.slave_announce_port ?
server.slave_announce_port : server.port);
err = sendSynchronousCommand(SYNC_CMD_WRITE,fd,"REPLCONF",
"listening-port",port, NULL);
sdsfree(port);
if (err) goto write_error;
sdsfree(err);
server.repl_state = REPL_STATE_RECEIVE_PORT;
return;
}
/* Receive REPLCONF listening-port reply. */
if (server.repl_state == REPL_STATE_RECEIVE_PORT) {
err = sendSynchronousCommand(SYNC_CMD_READ,fd,NULL);
/* Ignore the error if any, not all the Redis versions support
* REPLCONF listening-port. */
if (err[0] == '-') {
serverLog(LL_NOTICE,"(Non critical) Master does not understand "
"REPLCONF listening-port: %s", err);
}
sdsfree(err);
server.repl_state = REPL_STATE_SEND_IP;
}
//......
}
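1.6 Synchronization & command propagation
In this step the slave sends the PSYNC command to the master.
Once synchronization completes, the master enters the command propagation phase: it sends every write command it executes to all of its slaves, and the slaves execute the write commands they receive.
II. Implementation of the PSYNC command
PSYNC can be invoked in two ways:
- PSYNC ? -1: full resynchronization
If the slave has never replicated a master, or has executed SLAVEOF NO ONE (which cancels replication), it sends PSYNC ? -1.
- PSYNC <runid> <offset>: partial resynchronization
If the slave has replicated a master before, it sends PSYNC <runid> <offset>, where runid is the ID of the master and offset is the slave's current replication offset.
When the master receives PSYNC <runid> <offset>, it decides whether a partial resynchronization is possible and replies to the slave accordingly.
The master has three possible replies:
- +FULLRESYNC <runid> <offset>: perform a full resynchronization;
- +CONTINUE: perform a partial resynchronization;
- -ERR: the master does not support PSYNC, and the slave falls back to sending the SYNC command.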
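On the slave side, the three replies listed above can be told apart with a few prefix checks. The following is a minimal sketch under stated assumptions: classify_psync_reply and the psync_reply enum are hypothetical names, the reply is assumed to have its trailing CRLF already stripped, and every reply other than +FULLRESYNC and +CONTINUE is simply treated as "not supported".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum {
    PSYNC_FULLRESYNC,     /* +FULLRESYNC <runid> <offset> */
    PSYNC_CONTINUE,       /* +CONTINUE */
    PSYNC_NOT_SUPPORTED,  /* -ERR or anything else: fall back to SYNC */
    PSYNC_MALFORMED       /* hypothetical: +FULLRESYNC without valid fields */
} psync_reply;

/* Classify the master's reply to PSYNC. For +FULLRESYNC the run id and
 * offset are copied out; runid must have room for at least 41 bytes. */
psync_reply classify_psync_reply(const char *reply, char *runid,
                                 long long *offset)
{
    assert(reply != NULL && runid != NULL && offset != NULL);
    if (strncmp(reply, "+FULLRESYNC", 11) == 0) {
        /* A run id is 40 characters long, so %40s cannot overflow runid. */
        if (sscanf(reply + 11, " %40s %lld", runid, offset) == 2)
            return PSYNC_FULLRESYNC;
        return PSYNC_MALFORMED;
    }
    if (strncmp(reply, "+CONTINUE", 9) == 0)
        return PSYNC_CONTINUE;
    return PSYNC_NOT_SUPPORTED;
}
```

After PSYNC_FULLRESYNC the slave would store the returned run id and offset and wait for the RDB file; after PSYNC_CONTINUE it keeps its data and applies the backlog the master sends.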
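The entry point of the PSYNC command is replication.c/syncCommand, which calls masterTryPartialResynchronization to decide whether a partial resynchronization is possible:
/**
* This function handles the PSYNC command.
*
* Returns C_OK on success. Otherwise returns C_ERR and a full resynchronization is performed.
*/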
/* This function handles the PSYNC command from the point of view of a
* master receiving a request for partial resynchronization.
*
* On success return C_OK, otherwise C_ERR is returned and we proceed
* with the usual full resync. */
int masterTryPartialResynchronization(client *c) {
// psync_offset: the offset passed in the PSYNC command
long long psync_offset, psync_len;
// replication ID (run id) of the master the slave wants to continue from
char *master_replid = c->argv[1]->ptr;
char buf[128];
int buflen;
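/**
* (
* PSYNC syntax: PSYNC <MASTER_RUN_ID> <OFFSET>
* PSYNC ? -1 triggers a full resynchronization
* )
* Parse the replication offset requested by the slave.
* On a parse error, fall back to a full resynchronization (this should never happen, but it is handled robustly instead of aborting).
*/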
/* Parse the replication offset asked by the slave. Go to full sync
* on parse error: this should never happen but we try to handle
* it in a robust way compared to aborting. */
if (getLongLongFromObjectOrReply(c,c->argv[2],&psync_offset,NULL) !=
C_OK) goto need_full_resync;
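/**
* Is the replication ID passed to PSYNC the same as this master's replication ID?
* If the replication ID changed, this master has a different replication history and replication cannot continue
* (a full resynchronization is required).
*
* Note that there are two potentially valid replication IDs: ID1 and ID2. ID2, however, is only valid up to a specific offset.
*
*/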
/* Is the replication ID of this master the same advertised by the wannabe
* slave via PSYNC? If the replication ID changed this master has a
* different replication history, and there is no way to continue.
*
* Note that there are two potentially valid replication IDs: the ID1
* and the ID2. The ID2 however is only valid up to a specific offset. */
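if (strcasecmp(master_replid, server.replid) && // the requested ID differs from this master's current replication ID, and
(strcasecmp(master_replid, server.replid2) ||
psync_offset > server.second_replid_offset)) // it also differs from replid2, or the offset is past the point up to which replid2 is valid: full resync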
{
/* Run id "?" is used by slaves that want to force a full resync. */
if (master_replid[0] != '?') {
if (strcasecmp(master_replid, server.replid) &&
strcasecmp(master_replid, server.replid2)) // the ID passed to PSYNC matches neither replication ID stored by this server
{
serverLog(LL_NOTICE,"Partial resynchronization not accepted: "
"Replication ID mismatch (Replica asked for '%s', my "
"replication IDs are '%s' and '%s')",
master_replid, server.replid, server.replid2);
} else {
serverLog(LL_NOTICE,"Partial resynchronization not accepted: "
"Requested offset for second ID was %lld, but I can reply "
"up to %lld", psync_offset, server.second_replid_offset);
}
} else {
serverLog(LL_NOTICE,"Full resync requested by replica %s",
replicationGetSlaveName(c));
}
goto need_full_resync;
}
// does the replication backlog still hold the data we need? If not, do a full resynchronization
/* We still have the data our slave is asking for? */
if (!server.repl_backlog ||
psync_offset < server.repl_backlog_off ||
psync_offset > (server.repl_backlog_off + server.repl_backlog_histlen)) // the requested offset is not inside the replication backlog
{
serverLog(LL_NOTICE,
"Unable to partial resync with replica %s for lack of backlog (Replica request was: %lld).", replicationGetSlaveName(c), psync_offset);
if (psync_offset > server.master_repl_offset) {
serverLog(LL_WARNING,
"Warning: replica %s tried to PSYNC with an offset that is greater than the master replication offset.", replicationGetSlaveName(c));
}
goto need_full_resync;
}
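/**
* If we reach this point, a partial resynchronization can be performed:
* 1) mark the client as a slave
* 2) tell the client that we can continue with a partial resynchronization
* 3) send the backlog data to the slave
*/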
/* If we reached this point, we are able to perform a partial resync:
* 1) Set client state to make it a slave.
* 2) Inform the client we can continue with +CONTINUE
* 3) Send the backlog data (from the offset to the end) to the slave. */
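c->flags |= CLIENT_SLAVE; // flag the client as a slave
c->replstate = SLAVE_STATE_ONLINE; // replication state: online (the state a slave is in once the RDB transfer is done)
c->repl_ack_time = server.unixtime; // time of the last replication ACK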
c->repl_put_online_on_ack = 0;
listAddNodeTail(server.slaves,c);
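/**
* We cannot use the connection buffers, because at this stage they are used to accumulate new commands.
* But we are sure the socket send buffer is empty, so this write will never fail.
*/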
/* We can't use the connection buffers since they are used to accumulate
* new commands at this stage. But we are sure the socket send buffer is
* empty so this write will never fail actually. */
if (c->slave_capa & SLAVE_CAPA_PSYNC2) { // the client supports the PSYNC2 protocol
buflen = snprintf(buf,sizeof(buf),"+CONTINUE %s\r\n", server.replid);
} else {
buflen = snprintf(buf,sizeof(buf),"+CONTINUE\r\n");
}
// send +CONTINUE to the client
if (write(c->fd,buf,buflen) != buflen) {
freeClientAsync(c);
return C_OK;
}
psync_len = addReplyReplicationBacklog(c,psync_offset);
serverLog(LL_NOTICE,
"Partial resynchronization request from %s accepted. Sending %lld bytes of backlog starting from offset %lld.",
replicationGetSlaveName(c),
psync_len, psync_offset);
/* Note that we don't need to set the selected DB at server.slaveseldb
* to -1 to force the master to emit SELECT, since the slave already
* has this state from the previous connection with the master. */
refreshGoodSlavesCount();
return C_OK; /* The caller can return, no full resync needed. */
need_full_resync:
/* We need a full resync for some reason... Note that we can't
* reply to PSYNC right now if a full SYNC is needed. The reply
* must include the master offset at the time the RDB file we transfer
* is generated, so we need to delay the reply to that moment. */
return C_ERR;
}
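The two conditions that decide between partial and full resynchronization can be pulled out into a pure predicate, which makes the boundary cases easy to try. This is a sketch for illustration only: mini_master and can_partial_resync are hypothetical names, and the fields simply mirror the server.* fields used in the excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Stand-in for the server fields consulted by the PSYNC check. */
typedef struct {
    const char *replid;             /* current replication ID (ID1) */
    const char *replid2;            /* previous replication ID (ID2) */
    long long second_replid_offset; /* ID2 is valid up to this offset */
    int has_backlog;                /* non-zero if a backlog is allocated */
    long long backlog_off;          /* offset of the first byte in the backlog */
    long long backlog_histlen;      /* number of bytes held in the backlog */
} mini_master;

/* Returns 1 if a partial resynchronization is possible, 0 if a full one
 * is needed. Same two checks, in the same order, as the excerpt above. */
int can_partial_resync(const mini_master *m, const char *asked_id,
                       long long asked_off)
{
    assert(m != NULL && asked_id != NULL);
    /* 1) The requested ID must be ID1, or ID2 within its valid range. */
    if (strcasecmp(asked_id, m->replid) &&
        (strcasecmp(asked_id, m->replid2) ||
         asked_off > m->second_replid_offset))
        return 0;
    /* 2) The requested offset must still be covered by the backlog. */
    if (!m->has_backlog ||
        asked_off < m->backlog_off ||
        asked_off > m->backlog_off + m->backlog_histlen)
        return 0;
    return 1;
}
```

For example, with a backlog that starts at offset 100 and holds 900 bytes, offsets 100 through 1000 can be served from the backlog, while 99 and 1001 force a full resynchronization. A request of "? -1" fails the first check because "?" matches neither ID.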
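III. Heartbeat detection
During the command propagation phase, the slave sends REPLCONF ACK <replication_offset> to the master, by default once per second; replication_offset is the slave's current replication offset.
REPLCONF ACK serves three main purposes:
- Checking the network connection between master and slave
The master and slave check whether the network connection is healthy by sending and receiving REPLCONF ACK commands.
If the master has not received a REPLCONF ACK command from a slave for more than one second, it knows that something is wrong with the connection to that slave.
- Helping implement the min-slaves options
Redis's min-slaves-to-write and min-slaves-max-lag options prevent the master from executing write commands under unsafe conditions.
For example, suppose the master is configured as follows:
min-slaves-max-lag 10
min-slaves-to-write 3
Then the master refuses write commands whenever fewer than 3 slaves are connected with a lag of 10 seconds or less.
- Detecting lost commands
If commands propagated from the master to a slave are lost because of network problems, then when the slave sends REPLCONF ACK to the master, the master notices that the slave's replication offset is smaller than its own,
and it resends the missing part to the slave.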
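The min-slaves check boils down to counting the slaves whose last ACK is recent enough. Below is a minimal sketch of that idea; mini_slave, count_good_slaves and writes_allowed are hypothetical names, and treating a lag exactly equal to the limit as acceptable is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Stand-in for the per-slave state the master keeps. */
typedef struct {
    int online;            /* non-zero once the slave is in the online state */
    time_t last_ack_time;  /* when the last REPLCONF ACK arrived */
} mini_slave;

/* Count the slaves that are online and acknowledged within max_lag seconds. */
int count_good_slaves(const mini_slave *slaves, size_t n,
                      time_t now, time_t max_lag)
{
    assert(slaves != NULL || n == 0);
    int good = 0;
    for (size_t i = 0; i < n; i++) {
        time_t lag = now - slaves[i].last_ack_time;
        if (slaves[i].online && lag <= max_lag) good++;
    }
    return good;
}

/* Writes are accepted only while enough good slaves are attached. */
int writes_allowed(const mini_slave *slaves, size_t n, time_t now,
                   time_t max_lag, int min_to_write)
{
    return count_good_slaves(slaves, n, now, max_lag) >= min_to_write;
}
```

With the configuration from the example (max lag 10, at least 3 slaves), three slaves with lags of 5, 10 and 11 seconds give only two good slaves, so writes would be refused until the third slave acknowledges again.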
The call path for heartbeat detection is server.c/serverCron > replication.c/replicationCron > replication.c/replicationSendAck
// replication cron function
// runs once per second
void replicationCron(void) {
//..............
/**
* Send an ACK to the master from time to time.
* No ACK is sent if the master does not support PSYNC and replication offsets.
*/
/* Send ACK to master from time to time.
* Note that we do not send periodic acks to masters that don't
* support PSYNC and replication offsets. */
if (server.masterhost && server.master &&
!(server.master->flags & CLIENT_PRE_PSYNC))
replicationSendAck();
//..............
}
/* Send a REPLCONF ACK command to the master to inform it about the current
* processed offset. If we are not connected with a master, the command has
* no effects. */
void replicationSendAck(void) {
client *c = server.master;
if (c != NULL) {
c->flags |= CLIENT_MASTER_FORCE_REPLY;
addReplyMultiBulkLen(c,3);
addReplyBulkCString(c,"REPLCONF");
addReplyBulkCString(c,"ACK");
addReplyBulkLongLong(c,c->reploff);
c->flags &= ~CLIENT_MASTER_FORCE_REPLY;
}
}