How Redis master-slave replication works

1. Overview

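The overall process is as follows:
1. Initialization
Once master and slave are configured, the slave sends a PSYNC command to the master whether it is connecting for the first time or reconnecting.
If it is a reconnection and the conditions for partial resynchronization are met (detailed in 3.1), the master sends the commands held in its in-memory backlog to the slave, which completes a partial resynchronization. Otherwise a full resynchronization is performed.
2. Normal replication
Every write to the master is sent to the slave over the network as a Redis command.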

2. Full resynchronization

2.1 Process

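  1. The slave sends PSYNC
  2. The master runs bgsave to generate an RDB snapshot file, and records every write command received from then on in a buffer
  3. The master sends the snapshot file to the slave while it keeps recording write commands
  4. The slave receives and saves the snapshot
  5. The slave loads the snapshot file into memory
  6. The slave receives the commands buffered on the master, which completes the synchronization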
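
The six steps above can be sketched as a toy simulation. This is only an illustration of the ordering (snapshot first, then buffered writes), not how Redis implements it: the `Master` and `Slave` classes, their method names, and the use of a plain dict in place of an RDB file are all assumptions made for the example.

```python
import copy


class Master:
    """Toy master: a dict as the dataset, plus a buffer for writes made after the snapshot."""

    def __init__(self):
        self.data = {}
        self.buffer = None  # list of commands while a snapshot is in flight, else None

    def write(self, key, value):
        self.data[key] = value
        if self.buffer is not None:
            # Step 2/3: keep recording writes that arrive after the snapshot was taken.
            self.buffer.append(("set", key, value))

    def start_full_resync(self):
        # Step 2: point-in-time snapshot (stands in for bgsave) and start buffering.
        self.buffer = []
        return copy.deepcopy(self.data)

    def drain_buffer(self):
        # Step 6: hand over the commands recorded since the snapshot.
        pending, self.buffer = self.buffer, None
        return pending


class Slave:
    def __init__(self):
        self.data = {"stale": "value"}

    def load_snapshot(self, snapshot):
        # Steps 4-5: drop the old dataset and load the snapshot.
        self.data = dict(snapshot)

    def apply(self, commands):
        for op, key, value in commands:
            if op == "set":
                self.data[key] = value


master, slave = Master(), Slave()
master.write("name", "xuan")
snapshot = master.start_full_resync()   # steps 1-2
master.write("counter", "1")            # a write that arrives during the transfer
slave.load_snapshot(snapshot)           # steps 3-5
slave.apply(master.drain_buffer())      # step 6
print(slave.data == master.data)
```

The write to `counter` is not in the snapshot, so the slave only catches up with the master after it applies the buffered commands.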

2.2 Example

Environment:
- master 127.0.0.1:7779
- slave 127.0.0.1:9303, process ID 10967, holding only one key

strace -p 10967 -s 1024 -o redis.strace.full

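Then connect to the slave and run slaveof 127.0.0.1 7779. The strace file shows the slave-side actions during synchronization as follows (only the important parts are excerpted):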

/* The client issues the slaveof command */
read(6, "*3\r\n$7\r\nslaveof\r\n$9\r\n127.0.0.1\r\n$4\r\n7779\r\n", 16384) = 42
/* Reply OK to the client */
write(6, "+OK\r\n", 5)
/* Connect to the master */
connect(7, {sa_family=AF_INET, sin_port=htons(7779), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
/* Check that the master is alive */
write(7, "PING\r\n", 6) 
read(7, "+", 1)                         = 1
read(7, "P", 1)                         = 1
read(7, "O", 1)                         = 1
read(7, "N", 1)                         = 1
read(7, "G", 1)                         = 1
read(7, "\r", 1)                        = 1
read(7, "\n", 1)                        = 1
/* Synchronization starts: send PSYNC to the master */
write(7, "PSYNC ? -1\r\n", 12)          = 12
/* The master tells the slave to perform a full resynchronization */
read(7, "+", 1)                         = 1
read(7, "F", 1)                         = 1
read(7, "U", 1)                         = 1
read(7, "L", 1)                         = 1
read(7, "L", 1)                         = 1
read(7, "R", 1)                         = 1
read(7, "E", 1)                         = 1
read(7, "S", 1)                         = 1
read(7, "Y", 1)                         = 1
read(7, "N", 1)                         = 1
read(7, "C", 1)                         = 1
/* Open a local temporary rdb file */
open("temp-1472206877.10967.rdb", O_WRONLY|O_CREAT|O_EXCL, 0644) = 8
/* Receive the rdb file sent by the master */
read(7, "REDIS0006\376\0\0\4name\4xuan\376\1\r\16HOTEL_JUMP_NUM\33\33\0\0\0\30\0\0\0\4\0\0\320\325\2\220\6\6\365\2\320\334\230(\7\6\370\377\377\336\260\222\330\261\317\371\345", 77) = 77
/* Write the received rdb data to the temporary rdb file */
write(8, "REDIS0006\376\0\0\4name\4xuan\376\1\r\16HOTEL_JUMP_NUM\33\33\0\0\0\30\0\0\0\4\0\0\320\325\2\220\6\6\365\2\320\334\230(\7\6\370\377\377\336\260\222\330\261\317\371\345", 77) = 77
/* Rename the temporary rdb file */
rename("temp-1472206877.10967.rdb", "dump.rdb") = 0
/* Open the local rdb file */
open("dump.rdb", O_RDONLY) = 9
/* Load the data from the rdb file into the slave */
read(9, "REDIS0006\376\0\0\4name\4xuan\376\1\r\16HOTEL_JUMP_NUM\33\33\0\0\0\30\0\0\0\4\0\0\320\325\2\220\6\6\365\2\320\334\230(\7\6\370\377\377\336\260\222\330\261\317\371\345", 4096) = 77
/* Sync finished successfully; write the log entry */
open("/tmp/redis.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 8
fstat(8, {st_mode=S_IFREG|0644, st_size=7627, ...}) = 0
write(8, "[10967] 26 Aug 18:21:17.450 * MASTER <-> SLAVE sync: Finished with success\n", 75) = 75

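The whole process matches the description in 2.1. Because we did not write to the master during the synchronization, the strace output does not show step 6 of 2.1.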
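
The first read in the trace shows how the slaveof command travels on the wire: an array header (`*3`) followed by one length-prefixed string per argument. As a minimal sketch, the helper below (the name `encode_command` is made up for this example) rebuilds that payload and checks that it has the same 42 bytes the trace reports.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as an array of length-prefixed bulk strings."""
    parts = [b"*%d\r\n" % len(args)]            # array header: number of arguments
    for arg in args:
        raw = arg.encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(raw), raw))  # length, then payload
    return b"".join(parts)


payload = encode_command("slaveof", "127.0.0.1", "7779")
print(payload)
print(len(payload))
```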

The slave's redis.log reflects the same process.

[10967] 26 Aug 18:21:17.250 * SLAVE OF 127.0.0.1:7779 enabled (user request)
[10967] 26 Aug 18:21:17.410 * Connecting to MASTER 127.0.0.1:7779
[10967] 26 Aug 18:21:17.413 * MASTER <-> SLAVE sync started
[10967] 26 Aug 18:21:17.415 * Non blocking connect for SYNC fired the event.
[10967] 26 Aug 18:21:17.418 * Master replied to PING, replication can continue...
[10967] 26 Aug 18:21:17.421 * Partial resynchronization not possible (no cached master)
[10967] 26 Aug 18:21:17.432 * Full resync from master: 1d13fbd06f644eeb4b50d65f11e65bffd9e596f6:43774
[10967] 26 Aug 18:21:17.444 * MASTER <-> SLAVE sync: receiving 77 bytes from master
[10967] 26 Aug 18:21:17.446 * MASTER <-> SLAVE sync: Flushing old data
[10967] 26 Aug 18:21:17.447 * MASTER <-> SLAVE sync: Loading DB in memory
[10967] 26 Aug 18:21:17.450 * MASTER <-> SLAVE sync: Finished with success
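
The "Full resync from master" line carries the two values the slave needs for a later partial resynchronization: the master run ID and the replication offset. A small sketch for pulling them out of such a log line follows; the regular expression assumes the run ID is 40 lowercase hex characters, as it is in this log.

```python
import re

# Assumption: run ID is 40 hex characters, followed by ":" and a decimal offset.
FULL_RESYNC = re.compile(r"Full resync from master: ([0-9a-f]{40}):(\d+)")

line = ("[10967] 26 Aug 18:21:17.432 * Full resync from master: "
        "1d13fbd06f644eeb4b50d65f11e65bffd9e596f6:43774")

match = FULL_RESYNC.search(line)
run_id, offset = match.group(1), int(match.group(2))
print(run_id, offset)
```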

3. Partial resynchronization

3.1 Conditions for partial resynchronization

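Key concepts:
- In-memory backlog: a buffer on the master that holds recently propagated write commands, so that the writes a slave missed while disconnected can be resent
- Replication offset: both master and slave keep an offset that records the current position in the replication stream
- Master run ID: the unique identifier of a master. The 1d13fbd06f644eeb4b50d65f11e65bffd9e596f6 in the redis.log of 2.2 is a master run ID.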

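After a network disconnection, the slave tries to reconnect to the master. A partial resynchronization is performed after reconnecting when all of the following conditions hold:
1. The master run ID recorded by the slave is the same as the run ID of the master it is connecting to
2. The slave's replication offset is behind the master's offset, for example slave at 1000 and master at 1100
3. The data from the slave's replication offset onward is still held in the master's in-memory backlog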
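
The three conditions can be written as a single predicate. This is a sketch under assumptions, not Redis source code: the function name, the parameter names, and `backlog_start` (the offset of the oldest byte still held in the backlog) are hypothetical, and equal offsets are treated as acceptable because nothing was missed in that case.

```python
def can_partial_resync(slave_master_id, slave_offset,
                       master_id, master_offset, backlog_start):
    """Return True when all three conditions listed above hold.

    backlog_start is a hypothetical name for the offset of the oldest
    byte still available in the master's in-memory backlog.
    """
    same_master = slave_master_id == master_id        # condition 1
    not_ahead = slave_offset <= master_offset         # condition 2 (equal: nothing missed)
    still_buffered = slave_offset >= backlog_start    # condition 3
    return same_master and not_ahead and still_buffered


# Slave at 1000, master at 1100, backlog still covers offset 900 onward.
print(can_partial_resync("abc", 1000, "abc", 1100, 900))
# Same offsets, but the backlog has already dropped everything before 1050.
print(can_partial_resync("abc", 1000, "abc", 1100, 1050))
```

If any one condition fails, the slave falls back to the full resynchronization described in section 2.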

3.2 Synchronization process

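Once a partial resynchronization is confirmed, the master sends the commands in its in-memory backlog to the slave over the network, which completes the partial resynchronization.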
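
To make the role of the backlog concrete, here is a toy model of a fixed-size buffer indexed by replication offset. It is a simplification for illustration only: the `Backlog` class, its method names, the 0-based offsets, and the plain-text commands are assumptions, and the real data structure in Redis differs.

```python
class Backlog:
    """Toy fixed-size buffer holding the most recent bytes of the replication stream."""

    def __init__(self, size):
        self.size = size
        self.buf = b""
        self.end_offset = 0   # offset just past the last byte ever written

    def feed(self, data: bytes):
        self.end_offset += len(data)
        self.buf = (self.buf + data)[-self.size:]   # keep only the newest `size` bytes

    @property
    def start_offset(self):
        # Offset of the oldest byte that is still available.
        return self.end_offset - len(self.buf)

    def read_from(self, offset):
        """Return the bytes from `offset` onward, or None when they are no longer held."""
        if offset < self.start_offset or offset > self.end_offset:
            return None
        return self.buf[offset - self.start_offset:]


backlog = Backlog(size=32)
backlog.feed(b"SET a 1\r\n")       # 9 bytes: offsets 0..8
backlog.feed(b"DEL h\r\n")         # 7 bytes: offsets 9..15
print(backlog.read_from(9))        # a slave that stopped at offset 9 gets the missed command
backlog.feed(b"x" * 40)            # enough new traffic to push the old bytes out
print(backlog.read_from(9))        # offset 9 is gone, so a full resync would be needed
```

The second lookup returning `None` corresponds to condition 3 of 3.1 failing: the longer a slave stays disconnected, the more likely its offset has already been pushed out of the buffer.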

3.3 Example

Environment:
- master 10.136.30.144:7779
- slave 10.136.31.213:9303, holding one key "h"

First we attach strace to the slave process. Then, to simulate a network outage, we add an iptables rule on the master machine that drops every packet sent to the slave.

/sbin/iptables -A OUTPUT -d 10.136.31.213 -j DROP

Then we delete the key h on the master:

del h

Finally, we remove the iptables rule to simulate the network recovering.

/sbin/iptables -F

Let's look at the slave's log first:

[25667] 26 Aug 15:29:33.241 # Connection with master lost.
[25667] 26 Aug 15:29:33.241 * Caching the disconnected master state.
[25667] 26 Aug 15:29:33.241 * Connecting to MASTER 10.136.30.144:7779
[25667] 26 Aug 15:29:33.241 * MASTER <-> SLAVE sync started
[25667] 26 Aug 15:29:54.240 # Error condition on socket for SYNC: Connection timed out
[25667] 26 Aug 15:29:54.262 * Connecting to MASTER 10.136.30.144:7779
[25667] 26 Aug 15:29:54.263 * MASTER <-> SLAVE sync started
[25667] 26 Aug 15:30:15.270 # Error condition on socket for SYNC: Connection timed out
[25667] 26 Aug 15:30:15.726 * Connecting to MASTER 10.136.30.144:7779
[25667] 26 Aug 15:30:15.726 * MASTER <-> SLAVE sync started
[25667] 26 Aug 15:30:36.728 # Error condition on socket for SYNC: Connection timed out
[25667] 26 Aug 15:30:37.272 * Connecting to MASTER 10.136.30.144:7779
[25667] 26 Aug 15:30:37.279 * MASTER <-> SLAVE sync started
[25667] 26 Aug 15:30:37.282 * Non blocking connect for SYNC fired the event.
[25667] 26 Aug 15:30:37.289 * Master replied to PING, replication can continue...
[25667] 26 Aug 15:30:37.293 * Trying a partial resynchronization (request 1d13fbd06f644eeb4b50d65f11e65bffd9e596f6:29265).
[25667] 26 Aug 15:30:37.300 * Successful partial resynchronization with master.
[25667] 26 Aug 15:30:37.302 * MASTER <-> SLAVE sync: Master accepted a Partial Resynchronization.

After the slave detects that it has lost the master, it keeps trying to reconnect. Once the connection succeeds, it attempts a partial resynchronization, which completes successfully.
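
The log line "Trying a partial resynchronization (request ...:29265)" names the run ID and offset the slave asks for. As a rough cross-check, the sketch below builds a PSYNC request in the same inline form as the `PSYNC ? -1` seen in 2.2, assuming the request carries exactly the values from that log line. The helper name `build_psync` is made up for this example.

```python
def build_psync(run_id: str, offset: int) -> bytes:
    """Build an inline PSYNC request of the form 'PSYNC <run id> <offset>'."""
    return f"PSYNC {run_id} {offset}\r\n".encode()


first_sync = build_psync("?", -1)   # the form used when no master state is cached
request = build_psync("1d13fbd06f644eeb4b50d65f11e65bffd9e596f6", 29265)
print(first_sync)
print(len(request))
```

Under that assumption the request is 54 bytes long, which agrees with the length of the PSYNC write in the strace excerpt of this section.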

The strace output reflects the same process; an excerpt follows:

/* Reconnect to the master */
connect(6, {sa_family=AF_INET, sin_port=htons(7779), sin_addr=inet_addr("10.136.30.144")}, 16) = -1 EINPROGRESS (Operation now in progress)
/* Check that the master is alive */
write(6, "PING\r\n", 6) 
read(6, "+", 1)                         = 1
read(6, "P", 1)                         = 1
read(6, "O", 1)                         = 1
read(6, "N", 1)                         = 1
read(6, "G", 1)                         = 1
read(6, "\r", 1)                        = 1
read(6, "\n", 1)                        = 1
/* The slave attempts a partial resynchronization and the master accepts */
write(6, "PSYNC 1d13fbd06f644eeb4b50d65f11"..., 54) = 54
read(6, "+", 1)                         = 1
read(6, "C", 1)                         = 1
read(6, "O", 1)                         = 1
read(6, "N", 1)                         = 1
read(6, "T", 1)                         = 1
read(6, "I", 1)                         = 1
read(6, "N", 1)                         = 1
read(6, "U", 1)                         = 1
read(6, "E", 1)                         = 1
read(6, "\r", 1)                        = 1
read(6, "\n", 1)                        = 1
/* Read the commands missed during the outage: del h */
read(6, "*1\r\n$4\r\nPING\r\n*2\r\n$3\r\ndel\r\n$1\r\nh"..., 16384) = 188
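
The last read shows the missed commands arriving as a stream of length-prefixed arrays. A minimal parser for that framing is sketched below. The `sample` bytes are a hand-written, complete stream modeled on the beginning of the truncated read above, not the actual 188 bytes, and the function name `parse_commands` is an assumption for this example.

```python
def parse_commands(stream: bytes):
    """Parse a byte stream of arrays of bulk strings into lists of arguments."""
    commands, pos = [], 0
    while pos < len(stream):
        if stream[pos:pos + 1] != b"*":
            raise ValueError("expected an array header")
        end = stream.index(b"\r\n", pos)
        argc = int(stream[pos + 1:end])          # number of arguments in this command
        pos = end + 2
        args = []
        for _ in range(argc):
            if stream[pos:pos + 1] != b"$":
                raise ValueError("expected a bulk string header")
            end = stream.index(b"\r\n", pos)
            length = int(stream[pos + 1:end])    # payload length in bytes
            start = end + 2
            args.append(stream[start:start + length].decode())
            pos = start + length + 2             # skip the payload and its trailing CRLF
        commands.append(args)
    return commands


# Hand-written sample modeled on the truncated read; not the real captured bytes.
sample = b"*1\r\n$4\r\nPING\r\n*2\r\n$3\r\ndel\r\n$1\r\nh\r\n"
print(parse_commands(sample))
```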

4. Notes

  1. This article mainly describes the replication process in Redis 2.8 and later; versions before 2.8 differ slightly.
  2. Reference: http://redis.io/topics/replication