前面几篇我们已经完全理解了redis的基本功能的实现了。
但单靠基本功能实现,往往还是称不上优秀的项目的。毕竟,我们现在面对的都是复杂的环境,高并发的场景,大数据量的可能。
简而言之,现在的系统一般都需要支持分布式部署,不存在单点问题,才算是一个合格的系统。
而redis作为一个存储系统,单点问题肯定是不行的。
最简单的,就是起码得支持读写分离功能,因为我们面临的许多问题,一般是面对大量的查询问题。而要做到读写分离功能,就是要把主节点的数据同步到从节点上。从而可以让从节点接受读请求,以减轻主节点的读压力。
就让我们来分析下 Redis 是如何进行主从同步数据的吧!主从同步,换个名称也就是数据复制。
0. 主从复制的作用
数据冗余:主从复制实现了数据的热备份,是持久化之外的一种数据冗余方式。
故障恢复:当主节点出现问题时,可以由从节点提供服务,实现快速的故障恢复;实际上是一种服务的冗余。
负载均衡:在主从复制的基础上,配合读写分离,可以由主节点提供写服务,由从节点提供读服务(即写Redis数据时应用连接主节点,读Redis数据时应用连接从节点),分担服务器负载;尤其是在写少读多的场景下,通过多个从节点分担读负载,可以大大提高Redis服务器的并发量。
读写分离:可以用于实现读写分离,主库写、从库读,读写分离不仅可以提高服务器的负载能力,同时可根据需求的变化,改变从库的数量;
高可用基石:除了上述作用以外,主从复制还是哨兵和集群能够实施的基础,因此说主从复制是Redis高可用的基础。
1. Redis 主从复制简介
在主从复制中,数据库分为两类,一类是主库(master),另一类是同步主库数据的从库(slave)。主库可以进行读写操作,当写操作导致数据变化时会自动同步到从库。而从库一般是只读的(特定情况也可以写,通过参数slave-read-only指定),并接受来自主库的数据,一个主库可拥有多个从库,而一个从库只能有一个主库。这样就使得redis的主从架构有了两种模式:一类是一主多从如下图1,二类是“链式主从复制”--主->从->主-从如下图2。
2. Redis 主从复制的操作步骤简略说明
1. 首先,你得有至少2个redis server 实例,单机多实例或者多机多实例皆可。
2. 配置主从关系,使用 slaveof master_host master_port; (config rewrite 可直接写入配置文件,避免每次都重新写)
3. 验证主从配置,使用 info Replication;
上面的操作步骤是进行实时操作的,也可以直接将 master/slave 配置放到 redis.conf 中,启动时直接加载。
当master需要使用密码进行访问时,可以使用命令 masterauth 进行授权。
masterauth 123456 # 写到redis.conf配置文件中
config set masterauth 123456 # 通过命令行进行授权
3. 主要同步的实现原理
主从复制大致流程为:
1. slaveof 是我们的开启方法,它会将master信息写入到从节点;
2. 然后与master进行建立连接;
3. 然后master决定复制方式是全量同步还是部分同步;
4. master进行数据准备;
5. 将需要同步的发送给slave节点;
6. 从节点执行发送过来的数据;
但是,我们需要进行深入理解。
3.1. slaveof 命令源码解析
slaveof 为我们操作开启主从复制开启了入口,其接口定义如下:
{"slaveof",slaveofCommand,3,"ast",0,NULL,0,0,0,0,0},
// 用法 slaveof <master_host> <master_port> 建立主从关系
// slaveof no one 取消主从同步
// replication.c
void slaveofCommand(client *c) {
/* SLAVEOF is not allowed in cluster mode as replication is automatically
* configured using the current address of the master node. */
if (server.cluster_enabled) {
addReplyError(c,"SLAVEOF not allowed in cluster mode.");
return;
}
/* The special host/port combination "NO" "ONE" turns the instance
* into a master. Otherwise the new master address is set. */
// slaveof no one, 取消主从同步
if (!strcasecmp(c->argv[1]->ptr,"no") &&
!strcasecmp(c->argv[2]->ptr,"one")) {
if (server.masterhost) {
// 取消当前的master关联,返回客户端目前状态信息,结束
replicationUnsetMaster();
sds client = catClientInfoString(sdsempty(),c);
serverLog(LL_NOTICE,"MASTER MODE enabled (user request from '%s')",
client);
sdsfree(client);
}
} else {
long port;
if ((getLongFromObjectOrReply(c, c->argv[2], &port, NULL) != C_OK))
return;
/* Check if we are already attached to the specified slave */
// 只能和一个 master 建立主从关系
if (server.masterhost && !strcasecmp(server.masterhost,c->argv[1]->ptr)
&& server.masterport == port) {
serverLog(LL_NOTICE,"SLAVE OF would result into synchronization with the master we are already connected with. No operation performed.");
addReplySds(c,sdsnew("+OK Already connected to specified master\r\n"));
return;
}
/* There was no previous master or the user specified a different one,
* we can continue. */
// 设置master信息
replicationSetMaster(c->argv[1]->ptr, port);
// 输出client状态信息
sds client = catClientInfoString(sdsempty(),c);
serverLog(LL_NOTICE,"SLAVE OF %s:%d enabled (user request from '%s')",
server.masterhost, server.masterport, client);
sdsfree(client);
}
addReply(c,shared.ok);
}
// 绑定新的master关联
/* Set replication to the specified master address and port. */
void replicationSetMaster(char *ip, int port) {
sdsfree(server.masterhost);
server.masterhost = sdsnew(ip);
server.masterport = port;
if (server.master) freeClient(server.master);
// slave 不进行阻塞客户端
disconnectAllBlockedClients(); /* Clients blocked in master, now slave. */
// 断开所有 slave 连接
disconnectSlaves(); /* Force our slaves to resync with us as well. */
// cacheMaster 丢弃
replicationDiscardCachedMaster(); /* Don't try a PSYNC. */
// 链式主从复制删除
freeReplicationBacklog(); /* Don't allow our chained slaves to PSYNC. */
// 断开正在连接slave请求
cancelReplicationHandshake();
server.repl_state = REPL_STATE_CONNECT;
server.master_repl_offset = 0;
server.repl_down_since = 0;
}
// 取消master关联
/* Cancel replication, setting the instance as a master itself. */
void replicationUnsetMaster(void) {
if (server.masterhost == NULL) return; /* Nothing to do. */
sdsfree(server.masterhost);
server.masterhost = NULL;
if (server.master) {
if (listLength(server.slaves) == 0) {
/* If this instance is turned into a master and there are no
* slaves, it inherits the replication offset from the master.
* Under certain conditions this makes replicas comparable by
* replication offset to understand what is the most updated. */
server.master_repl_offset = server.master->reploff;
freeReplicationBacklog();
}
freeClient(server.master);
}
replicationDiscardCachedMaster();
cancelReplicationHandshake();
server.repl_state = REPL_STATE_NONE;
}
// blocked.c, 解除所有的阻塞客户端
/* Mass-unblock clients because something changed in the instance that makes
* blocking no longer safe. For example clients blocked in list operations
* in an instance which turns from master to slave is unsafe, so this function
* is called when a master turns into a slave.
*
* The semantics is to send an -UNBLOCKED error to the client, disconnecting
* it at the same time. */
void disconnectAllBlockedClients(void) {
listNode *ln;
listIter li;
listRewind(server.clients,&li);
while((ln = listNext(&li))) {
client *c = listNodeValue(ln);
if (c->flags & CLIENT_BLOCKED) {
addReplySds(c,sdsnew(
"-UNBLOCKED force unblock from blocking operation, "
"instance state changed (master -> slave?)\r\n"));
unblockClient(c);
c->flags |= CLIENT_CLOSE_AFTER_REPLY;
}
}
}
// networking.c, 断开所有的 slave 连接
/* Close all the slaves connections. This is useful in chained replication
* when we resync with our own master and want to force all our slaves to
* resync with us as well. */
void disconnectSlaves(void) {
while (listLength(server.slaves)) {
listNode *ln = listFirst(server.slaves);
freeClient((client*)ln->value);
}
}
// replication.c
/* Free a cached master, called when there are no longer the conditions for
* a partial resync on reconnection. */
void replicationDiscardCachedMaster(void) {
if (server.cached_master == NULL) return;
serverLog(LL_NOTICE,"Discarding previously cached master state.");
server.cached_master->flags &= ~CLIENT_MASTER;
freeClient(server.cached_master);
server.cached_master = NULL;
}
// replication.c
void freeReplicationBacklog(void) {
serverAssert(listLength(server.slaves) == 0);
zfree(server.repl_backlog);
server.repl_backlog = NULL;
}
// replication.c
/* This function aborts a non blocking replication attempt if there is one
* in progress, by canceling the non-blocking connect attempt or
* the initial bulk transfer.
*
* If there was a replication handshake in progress 1 is returned and
* the replication state (server.repl_state) set to REPL_STATE_CONNECT.
*
* Otherwise zero is returned and no operation is perforemd at all. */
int cancelReplicationHandshake(void) {
if (server.repl_state == REPL_STATE_TRANSFER) {
replicationAbortSyncTransfer();
server.repl_state = REPL_STATE_CONNECT;
} else if (server.repl_state == REPL_STATE_CONNECTING ||
slaveIsInHandshakeState())
{
undoConnectWithMaster();
server.repl_state = REPL_STATE_CONNECT;
} else {
return 0;
}
return 1;
}
// networking.c
/* Concatenate a string representing the state of a client in an human
* readable format, into the sds string 's'. */
sds catClientInfoString(sds s, client *client) {
char flags[16], events[3], *p;
int emask;
p = flags;
if (client->flags & CLIENT_SLAVE) {
if (client->flags & CLIENT_MONITOR)
*p++ = 'O';
else
*p++ = 'S';
}
if (client->flags & CLIENT_MASTER) *p++ = 'M';
if (client->flags & CLIENT_MULTI) *p++ = 'x';
if (client->flags & CLIENT_BLOCKED) *p++ = 'b';
if (client->flags & CLIENT_DIRTY_CAS) *p++ = 'd';
if (client->flags & CLIENT_CLOSE_AFTER_REPLY) *p++ = 'c';
if (client->flags & CLIENT_UNBLOCKED) *p++ = 'u';
if (client->flags & CLIENT_CLOSE_ASAP) *p++ = 'A';
if (client->flags & CLIENT_UNIX_SOCKET) *p++ = 'U';
if (client->flags & CLIENT_READONLY) *p++ = 'r';
if (p == flags) *p++ = 'N';
*p++ = '\0';
emask = client->fd == -1 ? 0 : aeGetFileEvents(server.el,client->fd);
p = events;
if (emask & AE_READABLE) *p++ = 'r';
if (emask & AE_WRITABLE) *p++ = 'w';
*p = '\0';
// 可变参数定义: sds sdscatfmt(sds s, char const *fmt, ...)
return sdscatfmt(s,
"id=%U addr=%s fd=%i name=%s age=%I idle=%I flags=%s db=%i sub=%i psub=%i multi=%i qbuf=%U qbuf-free=%U obl=%U oll=%U omem=%U events=%s cmd=%s",
(unsigned long long) client->id,
getClientPeerId(client),
client->fd,
client->name ? (char*)client->name->ptr : "",
(long long)(server.unixtime - client->ctime),
(long long)(server.unixtime - client->lastinteraction),
flags,
client->db->id,
(int) dictSize(client->pubsub_channels),
(int) listLength(client->pubsub_patterns),
(client->flags & CLIENT_MULTI) ? client->mstate.count : -1,
(unsigned long long) sdslen(client->querybuf),
(unsigned long long) sdsavail(client->querybuf),
(unsigned long long) client->bufpos,
(unsigned long long) listLength(client->reply),
(unsigned long long) getClientOutputBufferMemoryUsage(client),
events,
client->lastcmd ? client->lastcmd->name : "NULL");
}
所以,slaveof 只是做简单的验证,然后设置了下 master 信息,然后就返回了。那么是谁在做同步的工作呢?
其实同步任务是由 cron 任务运行的。
3.2. 如何执行同步任务?
因为复制是比较耗性能的东西,如果和用户线程共享处理过程的话,将可能引起并发性能的。所以,redis使用异步 cron 任务的形式实现主从复制功能。
// server.c, 初始化server,注册 cron
void initServer(void) {
...
/* Create out timers, that's our main way to process background
* operations. */
// 添加 serverCron 到 eventLoop 中,以便后续可以执行定时脚本
if (aeCreateTimeEvent(server.el, 1, serverCron, NULL, NULL) == AE_ERR) {
serverPanic("Can't create event loop timers.");
exit(1);
}
...
}
// ae.c, 添加时间事件
long long aeCreateTimeEvent(aeEventLoop *eventLoop, long long milliseconds,
aeTimeProc *proc, void *clientData,
aeEventFinalizerProc *finalizerProc)
{
long long id = eventLoop->timeEventNextId++;
aeTimeEvent *te;
te = zmalloc(sizeof(*te));
if (te == NULL) return AE_ERR;
te->id = id;
aeAddMillisecondsToNow(milliseconds,&te->when_sec,&te->when_ms);
te->timeProc = proc;
te->finalizerProc = finalizerProc;
te->clientData = clientData;
te->next = eventLoop->timeEventHead;
eventLoop->timeEventHead = te;
return id;
}
// server.c, 主脚本运行入口, 每1秒运行1次
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
...
/* Replication cron function -- used to reconnect to master and
* to detect transfer failures. */
// 主从复制,连接 master,我们的入口
run_with_period(1000) replicationCron();
...
server.cronloops++;
return 1000/server.hz;
}
// 重点入口: replicationCron()
// replication.c, 主从复制定时脚本
/* Replication cron function, called 1 time per second. */
void replicationCron(void) {
static long long replication_cron_loops = 0;
/* Non blocking connection timeout? */
// 连接超时处理,取消重连
if (server.masterhost &&
(server.repl_state == REPL_STATE_CONNECTING ||
slaveIsInHandshakeState()) &&
(time(NULL)-server.repl_transfer_lastio) > server.repl_timeout)
{
serverLog(LL_WARNING,"Timeout connecting to the MASTER...");
cancelReplicationHandshake();
}
/* Bulk transfer I/O timeout? */
// 传输数据超时,取消重连
if (server.masterhost && server.repl_state == REPL_STATE_TRANSFER &&
(time(NULL)-server.repl_transfer_lastio) > server.repl_timeout)
{
serverLog(LL_WARNING,"Timeout receiving bulk data from MASTER... If the problem persists try to set the 'repl-timeout' parameter in redis.conf to a larger value.");
cancelReplicationHandshake();
}
/* Timed out master when we are an already connected slave? */
// slave 会话超时
if (server.masterhost && server.repl_state == REPL_STATE_CONNECTED &&
(time(NULL)-server.master->lastinteraction) > server.repl_timeout)
{
serverLog(LL_WARNING,"MASTER timeout: no data nor PING received...");
freeClient(server.master);
}
/* Check if we should connect to a MASTER */
// 3.2.1. 初次设置master时,一定会进行连接处理
if (server.repl_state == REPL_STATE_CONNECT) {
serverLog(LL_NOTICE,"Connecting to MASTER %s:%d",
server.masterhost, server.masterport);
if (connectWithMaster() == C_OK) {
serverLog(LL_NOTICE,"MASTER <-> SLAVE sync started");
}
}
/* Send ACK to master from time to time.
* Note that we do not send periodic acks to masters that don't
* support PSYNC and replication offsets. */
// 3.2.2. 每次定时任务执行,都会发生 ACK 给master
if (server.masterhost && server.master &&
!(server.master->flags & CLIENT_PRE_PSYNC))
replicationSendAck();
/* If we have attached slaves, PING them from time to time.
* So slaves can implement an explicit timeout to masters, and will
* be able to detect a link disconnection even if the TCP connection
* will not actually go down. */
listIter li;
listNode *ln;
robj *ping_argv[1];
/* First, send PING according to ping_slave_period. */
// 3.2.3. 发送 PING 请求
// 默认 repl_ping_slave_period: 10
if ((replication_cron_loops % server.repl_ping_slave_period) == 0) {
ping_argv[0] = createStringObject("PING",4);
replicationFeedSlaves(server.slaves, server.slaveseldb,
ping_argv, 1);
decrRefCount(ping_argv[0]);
}
/* Second, send a newline to all the slaves in pre-synchronization
* stage, that is, slaves waiting for the master to create the RDB file.
* The newline will be ignored by the slave but will refresh the
* last-io timer preventing a timeout. In this case we ignore the
* ping period and refresh the connection once per second since certain
* timeouts are set at a few seconds (example: PSYNC response). */
// 3.2.4. 向以当前节点为master的slaves 发送空行数据
listRewind(server.slaves,&li);
while((ln = listNext(&li))) {
client *slave = ln->value;
if (slave->replstate == SLAVE_STATE_WAIT_BGSAVE_START ||
(slave->replstate == SLAVE_STATE_WAIT_BGSAVE_END &&
server.rdb_child_type != RDB_CHILD_TYPE_SOCKET))
{
if (write(slave->fd, "\n", 1) == -1) {
/* Don't worry, it's just a ping. */
}
}
}
/* Disconnect timedout slaves. */
// 断开连接超时的 slaves
if (listLength(server.slaves)) {
listIter li;
listNode *ln;
listRewind(server.slaves,&li);
while((ln = listNext(&li))) {
client *slave = ln->value;
if (slave->replstate != SLAVE_STATE_ONLINE) continue;
if (slave->flags & CLIENT_PRE_PSYNC) continue;
if ((server.unixtime - slave->repl_ack_time) > server.repl_timeout)
{
serverLog(LL_WARNING, "Disconnecting timedout slave: %s",
replicationGetSlaveName(slave));
freeClient(slave);
}
}
}
/* If we have no attached slaves and there is a replication backlog
* using memory, free it after some (configured) time. */
// 如果没有slave 跟随当前节点,一段时间后将backlog 释放掉
if (listLength(server.slaves) == 0 && server.repl_backlog_time_limit &&
server.repl_backlog)
{
time_t idle = server.unixtime - server.repl_no_slaves_since;
if (idle > server.repl_backlog_time_limit) {