Redis Lesson 7: The AKF X Axis, Master-Replica Replication, and Sentinel Mode

Latest recommended article published on 2023-05-08 18:03:34

darkness0604

Category: Redis

Article link: https://blog.csdn.net/darkness0604/article/details/104685057

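Without further ado, this time we set up Redis master-replica replication hands-on.

Environment setup

We use three Redis instances for this experiment. Besides the existing instance on port 6379, we need two more, on ports 6380 and 6381.
6379 is the master; 6380 and 6381 are the replicas.

Go back to the directory that was created when we unpacked Redis earlier and change into the utils directory: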

[root@dream01 redis-5.0.5]# cd utils/
[root@dream01 utils]# ll
总用量 76
-rw-rw-r-- 1 root root  593 5月  16 2019 build-static-symbols.tcl
-rw-rw-r-- 1 root root 1303 5月  16 2019 cluster_fail_time.tcl
-rw-rw-r-- 1 root root 1098 5月  16 2019 corrupt_rdb.c
drwxrwxr-x 2 root root 4096 5月  16 2019 create-cluster
-rwxrwxr-x 1 root root 2149 5月  16 2019 generate-command-help.rb
drwxrwxr-x 3 root root 4096 5月  16 2019 graphs
drwxrwxr-x 2 root root 4096 5月  16 2019 hashtable
drwxrwxr-x 2 root root 4096 5月  16 2019 hyperloglog
-rwxrwxr-x 1 root root 9567 5月  16 2019 install_server.sh
drwxrwxr-x 2 root root 4096 5月  16 2019 lru
-rw-rw-r-- 1 root root 1277 5月  16 2019 redis-copy.rb
-rwxrwxr-x 1 root root 1352 5月  16 2019 redis_init_script
-rwxrwxr-x 1 root root 1047 5月  16 2019 redis_init_script.tpl
-rw-rw-r-- 1 root root 1762 5月  16 2019 redis-sha1.rb
drwxrwxr-x 2 root root 4096 5月  16 2019 releasetools
-rwxrwxr-x 1 root root 3787 5月  16 2019 speed-regression.tcl
-rwxrwxr-x 1 root root  693 5月  16 2019 whatisdoing.sh

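Run install_server.sh once for 6380 and once for 6381 to generate the instances. The procedure is the same as when we first set up 6379, so it is not repeated here.

To make the experiment easy to observe, configure all three Redis instances to run in the foreground:

# run the Redis instance in the foreground, blocking the terminal
daemonize no

To demonstrate a few things later, temporarily turn off the AOF log on every instance:

# do not use the AOF log
appendonly no

# logfile is commented out so that log messages are printed to the console
#logfile /var/log/redis_6379.log

Next, start the three instances in three separate terminal windows: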

[root@dream01 redis]# redis-server /etc/redis/6379.conf

[root@dream01 redis]# redis-server /etc/redis/6380.conf

[root@dream01 redis]# redis-server /etc/redis/6381.conf

Next, connect to 6380 and 6381 with a client and run REPLICAOF on each to attach it to the master it should follow:

[root@dream01 ~]# redis-cli -p 6380
127.0.0.1:6380> REPLICAOF 127.0.0.1 6379
OK

[root@dream01 ~]# redis-cli -p 6381
127.0.0.1:6381> REPLICAOF 127.0.0.1 6379
OK

Now look at the log output on the 6379 master:

1393:M 26 Feb 2020 20:36:29.638 * Replica 127.0.0.1:6380 asks for synchronization
1393:M 26 Feb 2020 20:36:29.638 * Partial resynchronization not accepted: Replication ID mismatch
 (Replica asked for 'c1168133916febe61b1783b7bf36a4ba1222534f', my replication IDs are 'a36fe53e087fa567c2d81439a1df723ddeefe2c7'
  and '0000000000000000000000000000000000000000')
1393:M 26 Feb 2020 20:36:29.651 * Starting BGSAVE for SYNC with target: disk
1393:M 26 Feb 2020 20:36:29.677 * Background saving started by pid 1489
1489:C 26 Feb 2020 20:36:29.680 * DB saved on disk
1489:C 26 Feb 2020 20:36:29.681 * RDB: 4 MB of memory used by copy-on-write
1393:M 26 Feb 2020 20:36:29.737 * Background saving terminated with success
1393:M 26 Feb 2020 20:36:29.738 * Synchronization with replica 127.0.0.1:6380 succeeded
1393:M 26 Feb 2020 20:37:01.214 * Replica 127.0.0.1:6381 asks for synchronization
1393:M 26 Feb 2020 20:37:01.214 * Partial resynchronization not accepted: Replication ID mismatch
 (Replica asked for '07a4093c63c2c4d1d8b6dc47276cdea8a30496d2', my replication IDs are '0d64fc65222db22f93c58a2339dfb968e617b6d7' 
 and '0000000000000000000000000000000000000000')
1393:M 26 Feb 2020 20:37:01.214 * Starting BGSAVE for SYNC with target: disk
1393:M 26 Feb 2020 20:37:01.240 * Background saving started by pid 1519
1519:C 26 Feb 2020 20:37:01.249 * DB saved on disk
1519:C 26 Feb 2020 20:37:01.249 * RDB: 4 MB of memory used by copy-on-write
1393:M 26 Feb 2020 20:37:01.315 * Background saving terminated with success
1393:M 26 Feb 2020 20:37:01.316 * Synchronization with replica 127.0.0.1:6381 succeeded

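From this we can see that 6380 and 6381 have both asked to attach.
We can also see the sequence on the master: 6379 forks a child process to run BGSAVE, writes an RDB file to disk, and then transfers that RDB to the replica.

First check the keys on the 6379 master: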

127.0.0.1:6379> keys *
1) "a"
2) "q"
3) "p"
4) "b"
5) "c"

Then check the keys on 6380 and 6381:

127.0.0.1:6380> keys *
1) "a"
2) "q"
3) "c"
4) "b"
5) "p"

127.0.0.1:6381> keys *
1) "b"
2) "p"
3) "c"
4) "a"
5) "q"

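The data has all been synchronized. (KEYS returns keys in no particular order, which is why the ordering differs between instances.)

Master-replica data synchronization

The first time a replica attaches to a master

The first time a replica attaches to a master, it pulls the master's full data set.
What if the replica already held data? Once the master has transferred the RDB, the replica first flushes its own existing data and then loads what the master sent.
The replica's log shows this: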

1666:S 26 Feb 2020 21:05:52.406 * Partial resynchronization not possible (no cached master)
1666:S 26 Feb 2020 21:05:52.409 * Full resync from master: 9108c0c8452d3e680708a13e01b604801c30ab61:700
1666:S 26 Feb 2020 21:05:52.510 * MASTER <-> REPLICA sync: receiving 176 bytes from master
1666:S 26 Feb 2020 21:05:52.510 * MASTER <-> REPLICA sync: Flushing old data
1666:S 26 Feb 2020 21:05:52.510 * MASTER <-> REPLICA sync: Loading DB in memory
1666:S 26 Feb 2020 21:05:52.510 * MASTER <-> REPLICA sync: Finished with success

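When a replica re-attaches later

When a replica restarts and re-attaches to its master later, it normally only needs an incremental (partial) resynchronization.
This is not guaranteed, though.
The master keeps a replication backlog, a fixed-size buffer holding the most recent part of the write stream it has sent to its replicas. If a replica goes down and later reconnects, the master can replay just the part the replica missed, so only the increment has to be transferred.
The backlog has a size limit, set with repl-backlog-size:

# Set the replication backlog size. The backlog is a buffer that accumulates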
# replica data when replicas are disconnected for some time, so that when a replica
# wants to reconnect again, often a full resync is not needed, but a partial
# resync is enough, just passing the portion of data the replica missed while
# disconnected.
#
# The bigger the replication backlog, the longer the time the replica can be
# disconnected and later be able to perform a partial resynchronization.
#
# The backlog is only allocated once there is at least a replica connected.
#
# repl-backlog-size 1mb

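If so many writes happen while the replica is offline that the part it missed has already been overwritten in the backlog, then on reconnect the master checks the replica's offset, finds it can no longer serve an incremental sync, and falls back to a full resync.

How does it know whether this is the first time?

How does Redis tell a first-time attachment from a reconnect after a replica restart? The replica must keep a record of its agreement with the master somewhere.
It does. Inspect the RDB file of any replica: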

[root@dream01 6380]# redis-check-rdb dump.rdb 
[offset 0] Checking RDB file dump.rdb
[offset 26] AUX FIELD redis-ver = '5.0.5'
[offset 40] AUX FIELD redis-bits = '64'
[offset 52] AUX FIELD ctime = '1582722035'
[offset 67] AUX FIELD used-mem = '1860928'
[offset 85] AUX FIELD repl-stream-db = '0'
[offset 135] AUX FIELD repl-id = '9108c0c8452d3e680708a13e01b604801c30ab61'
[offset 151] AUX FIELD repl-offset = '280'
[offset 167] AUX FIELD aof-preamble = '0'
[offset 176] Checksum OK
[offset 176] \o/ RDB looks OK! \o/
[info] 0 keys read
[info] 0 expires
[info] 0 already expired

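Two attributes matter here:
1. repl-id: the replication ID, which identifies the master's replication history
2. repl-offset: the replication offset the replica has reached

repl-id records which master (which replication history) the replica was following, and repl-offset records how far it got. When the replica reconnects, the master checks whether that offset is still covered by its backlog and decides from that whether a full resync is needed.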
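
To make the offset check concrete, here is a minimal Python sketch of the decision described above. It is a simplified model and not Redis's actual code: the function name and parameters are made up for illustration, it assumes the backlog has been filling since offset 0, and it ignores the second replication ID that a real master also keeps (the all-zero ID in the master log earlier).

```python
def can_partial_resync(master_repl_id: str,
                       master_offset: int,
                       backlog_size: int,
                       replica_repl_id: str,
                       replica_offset: int) -> bool:
    """Return True if an incremental sync is possible in this simplified model.

    All names here are hypothetical; they only mirror repl-id, repl-offset
    and repl-backlog-size from the text.
    """
    # A replica that followed a different replication history must do a full resync.
    if replica_repl_id != master_repl_id:
        return False

    # Assumption: the backlog holds at most backlog_size of the latest bytes.
    bytes_in_backlog = min(master_offset, backlog_size)
    first_offset_kept = master_offset - bytes_in_backlog + 1

    # The replica asks for the byte right after the last one it processed.
    next_offset_wanted = replica_offset + 1

    # That byte must still be inside the backlog window (or be "nothing missed").
    return first_offset_kept <= next_offset_wanted <= master_offset + 1


if __name__ == "__main__":
    one_mb = 1024 * 1024
    repl_id = "9108c0c8452d3e680708a13e01b604801c30ab61"
    print(can_partial_resync(repl_id, 2_000_000, one_mb, repl_id, 1_500_000))
    print(can_partial_resync(repl_id, 2_000_000, one_mb, repl_id, 280))
```

In this model, with a 1 MB backlog and a master at offset 2,000,000, a replica that stopped at offset 1,500,000 can still catch up incrementally, while one that stopped at offset 280 has fallen out of the window and needs a full resync.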

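With AOF enabled, the resync is always full

Whether this is a bug or by design I cannot say, but in this setup (Redis 5.0.5), if an instance uses the AOF log, every resync after a replica restart is a full one.

Experiment:
Switch all three instances to AOF mode. Every replica restart then triggers a full resync: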

 1849:S 26 Feb 2020 21:41:57.751 * Full resync from master: a913622a8e9c173ae62e9ad6405d728f8d527cee:275

An AOF rewrite does write an RDB preamble first, but when we inspect the AOF file, the replication ID and offset records we saw earlier are not there:

[root@dream01 6381]# redis-check-aof appendonly.aof 
The AOF appears to start with an RDB preamble.
Checking the RDB preamble to start:
[offset 0] Checking RDB file appendonly.aof
[offset 26] AUX FIELD redis-ver = '5.0.5'
[offset 40] AUX FIELD redis-bits = '64'
[offset 52] AUX FIELD ctime = '1582724517'
[offset 67] AUX FIELD used-mem = '1861048'
[offset 83] AUX FIELD aof-preamble = '1'
[offset 85] Selecting DB ID 0
[offset 112] Checksum OK
[offset 112] \o/ RDB looks OK! \o/
[info] 3 keys read
[info] 0 expires
[info] 0 already expired
RDB preamble is OK, proceeding with AOF tail...
AOF analyzed: size=112, ok_up_to=112, diff=0
AOF is valid

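Additional settings

1. When the master's data set is large, a full resync takes a while. Redis lets you choose how the replica behaves in the meantime:

# yes: while the sync is in progress, the replica keeps serving its old data until the sync completes
# no: the replica replies with an error to most commands until the sync completes
replica-serve-stale-data yes

2. By default a Redis replica rejects writes and only serves reads. To change that, adjust this setting:

# yes (the default): the replica is read-only; set it to no to allow writes on the replica
replica-read-only yes

3. How the master ships the sync data to a replica:

(1) Disk-backed: the master writes an RDB file to disk and then sends that file to the replica over the network. (disk I/O + network I/O)
(2) Diskless: the master streams the RDB directly to the replica over the network. (network I/O only)

# yes: skip the disk and stream straight to the replica over the network
# no: write to disk first, then send over the network
repl-diskless-sync no

4. Redis lets developers choose how strong a consistency guarantee they want, and move toward stronger consistency if they need it:

# the master accepts writes only while at least this many replicas are connected and healthy
min-replicas-to-write 3
# a replica counts as healthy if its last acknowledgement arrived no more than this many seconds ago
min-replicas-max-lag 10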
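
The interaction of these two settings is easier to see in code. The following Python sketch models the rule as I understand it from the redis.conf comments; the function name and the list of lag values are hypothetical, and a real master tracks the lag of each replica itself from the acknowledgements it receives.

```python
from typing import Iterable


def master_accepts_writes(replica_lags_seconds: Iterable[float],
                          min_replicas_to_write: int,
                          min_replicas_max_lag: int) -> bool:
    """Simplified model of min-replicas-to-write / min-replicas-max-lag."""
    # Setting either option to 0 disables the check.
    if min_replicas_to_write == 0 or min_replicas_max_lag == 0:
        return True

    # A replica is "good" if its last acknowledgement is recent enough.
    good_replicas = sum(1 for lag in replica_lags_seconds
                        if lag <= min_replicas_max_lag)
    return good_replicas >= min_replicas_to_write


if __name__ == "__main__":
    # Two replicas, both healthy, but three are required: writes are refused.
    print(master_accepts_writes([1, 2], 3, 10))
    # Requiring only one good replica lets writes through.
    print(master_accepts_writes([1, 2], 1, 10))
```

Note that with only two replicas, as in this article's setup, a value of 3 for min-replicas-to-write would make the master refuse every write, so pick a value no larger than the number of replicas you actually run.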

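Manual failover

In a master-replica setup the master will go down sooner or later. Let us first handle that by hand.

Terminate 6379 manually.
6380 and 6381 now keep trying to reach the master, get nowhere, and just loop: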

1836:S 26 Feb 2020 22:03:23.840 # Error condition on socket for SYNC: Connection refused
1836:S 26 Feb 2020 22:03:24.848 * Connecting to MASTER 127.0.0.1:6379
1836:S 26 Feb 2020 22:03:24.848 * MASTER <-> REPLICA sync started
1836:S 26 Feb 2020 22:03:24.848 # Error condition on socket for SYNC: Connection refused
1836:S 26 Feb 2020 22:03:25.857 * Connecting to MASTER 127.0.0.1:6379
1836:S 26 Feb 2020 22:03:25.858 * MASTER <-> REPLICA sync started
1836:S 26 Feb 2020 22:03:25.858 # Error condition on socket for SYNC: Connection refused

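Since 6380 and 6381 can only serve reads, not writes, we urgently need a new master.

We manually promote 6380 to master and make 6381 follow 6380.

First stop 6380 from following its old master: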

127.0.0.1:6380> REPLICAOF no one
OK

The 6380 log shows that it has switched to master mode:

1873:M 26 Feb 2020 22:07:52.806 * MASTER MODE enabled (user request from 'id=3 addr=127.0.0.1:56873
 fd=8 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=36 qbuf-free=32732 obl=0 oll=0 omem=0 events=r cmd=replicaof')

Then make 6381 follow 6380:

127.0.0.1:6381> REPLICAOF 127.0.0.1 6380
OK

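That completes a manual failover.

Automatic failover with Sentinel

People are lazy, so naturally we want something that does this in our place.
Redis provides exactly that: Sentinel.
Let us configure it.

In the Redis source directory there is a sample Sentinel configuration file: sentinel.conf

Only two settings are really essential, so we keep the experiment minimal; see that file for everything else.

Since Sentinel is the monitor, it has to be highly available itself, so Sentinels are normally deployed as a group.
We configure three Sentinels, listening on 26379, 26380, and 26381.

Each one's configuration can look like this:

# port this Sentinel listens on
port 26379
# sentinel monitor <master-name> <master-ip> <master-port> <quorum>
# quorum: how many Sentinels must agree that the master is down
sentinel monitor mymaster 127.0.0.1 6379 2

# port this Sentinel listens on
port 26380
# sentinel monitor <master-name> <master-ip> <master-port> <quorum>
# quorum: how many Sentinels must agree that the master is down
sentinel monitor mymaster 127.0.0.1 6379 2

# port this Sentinel listens on
port 26381
# sentinel monitor <master-name> <master-ip> <master-port> <quorum>
# quorum: how many Sentinels must agree that the master is down
sentinel monitor mymaster 127.0.0.1 6379 2

Next, start the Sentinels.
Starting a Sentinel is much like starting a Redis server, except that you pass --sentinel to tell the process to run in Sentinel mode (you can also start it directly with redis-sentinel):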

[root@dream01 ~]# redis-server /etc/redis/26379.conf --sentinel

[root@dream01 ~]# redis-server /etc/redis/26380.conf --sentinel

[root@dream01 ~]# redis-server /etc/redis/26381.conf --sentinel

If the Redis ASCII-art logo appears as usual, the Sentinels are ready.
Each Sentinel's output now shows something like this:

1905:X 26 Feb 2020 22:29:13.179 * +sentinel sentinel 1b457bdcc457c091e803c2be3ead404f55dc2356 127.0.0.1 26379 @ mymaster 127.0.0.1 6379
1905:X 26 Feb 2020 22:29:14.314 * +sentinel sentinel b0a46b73b519cc2db7195cfd18bc808775032dba 127.0.0.1 26380 @ mymaster 127.0.0.1 6379
1905:X 26 Feb 2020 22:29:22.906 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
1905:X 26 Feb 2020 22:29:22.908 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379

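But when we configured the Sentinels we never told them about the other Sentinels or about the replicas. How did they show up here?
The replicas are easy to explain: the master keeps track of its replicas, so a Sentinel can learn about them by asking the master.
And the other Sentinels? Each Sentinel publishes to, and subscribes to, a Pub/Sub channel named __sentinel__:hello on the Redis instances it monitors, and the Sentinels discover one another through that channel.
Watch it with PSUBSCRIBE * (the capture below already lists 6381 as the master, so it appears to have been taken after the failover described later):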

127.0.0.1:6381> PSUBSCRIBE *
Reading messages... (press Ctrl-C to quit)
1) "psubscribe"
2) "*"
3) (integer) 1
1) "pmessage"
2) "*"
3) "__sentinel__:hello"
4) "127.0.0.1,26379,1b457bdcc457c091e803c2be3ead404f55dc2356,1,mymaster,127.0.0.1,6381,1"
1) "pmessage"
2) "*"
3) "__sentinel__:hello"
4) "127.0.0.1,26381,39ecf05bd86f90db7e037bbf91ca32b1daa73927,1,mymaster,127.0.0.1,6381,1"
1) "pmessage"
2) "*"
3) "__sentinel__:hello"
4) "127.0.0.1,26380,b0a46b73b519cc2db7195cfd18bc808775032dba,1,mymaster,127.0.0.1,6381,1"
1) "pmessage"
2) "*"
3) "__sentinel__:hello"
4) "127.0.0.1,26379,1b457bdcc457c091e803c2be3ead404f55dc2356,1,mymaster,127.0.0.1,6381,1"
1) "pmessage"
2) "*"
3) "__sentinel__:hello"
4) "127.0.0.1,26381,39ecf05bd86f90db7e037bbf91ca32b1daa73927,1,mymaster,127.0.0.1,6381,1"
1) "pmessage"
2) "*"
3) "__sentinel__:hello"
4) "127.0.0.1,26380,b0a46b73b519cc2db7195cfd18bc808775032dba,1,mymaster,127.0.0.1,6381,1"
1) "pmessage"
2) "*"
3) "__sentinel__:hello"
4) "127.0.0.1,26379,1b457bdcc457c091e803c2be3ead404f55dc2356,1,mymaster,127.0.0.1,6381,1"
1) "pmessage"
2) "*"
3) "__sentinel__:hello"
4) "127.0.0.1,26381,39ecf05bd86f90db7e037bbf91ca32b1daa73927,1,mymaster,127.0.0.1,6381,1"
1) "pmessage"
2) "*"
3) "__sentinel__:hello"
4) "127.0.0.1,26380,b0a46b73b519cc2db7195cfd18bc808775032dba,1,mymaster,127.0.0.1,6381,1"
1) "pmessage"
2) "*"
3) "__sentinel__:hello"

The three Sentinels are chattering away in there.
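
Each payload on that channel is a comma-separated record. The Python sketch below splits one of the messages captured above into named fields. The field names reflect my reading of the Sentinel documentation (Sentinel address, run ID, current epoch, then the master's name, address and config epoch); treat them as assumptions, and note that parse_hello and SentinelHello are hypothetical names, not part of any Redis client library.

```python
from typing import NamedTuple


class SentinelHello(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_run_id: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> SentinelHello:
    """Split one __sentinel__:hello payload into its eight fields."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError(f"expected 8 comma-separated fields, got {len(parts)}")
    ip, port, run_id, epoch, name, m_ip, m_port, m_epoch = parts
    return SentinelHello(ip, int(port), run_id, int(epoch),
                         name, m_ip, int(m_port), int(m_epoch))


if __name__ == "__main__":
    # Payload copied from the PSUBSCRIBE capture above.
    msg = ("127.0.0.1,26379,1b457bdcc457c091e803c2be3ead404f55dc2356,"
           "1,mymaster,127.0.0.1,6381,1")
    hello = parse_hello(msg)
    print(hello.sentinel_port, hello.master_name, hello.master_port)
```

Read this way, every message announces who the sending Sentinel is and which address it currently believes to be the master, which is all the other Sentinels need to find one another and to agree on the configuration.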

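Everything is ready.

Automatic failover in practice

As before, kill the 6379 master by hand and see what happens.
Hm? The other two instances still seem to be frantically looking for 6379, just like last time: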

1913:S 26 Feb 2020 22:36:33.134 # Error condition on socket for SYNC: Connection refused
1913:S 26 Feb 2020 22:36:34.140 * Connecting to MASTER 127.0.0.1:6379
1913:S 26 Feb 2020 22:36:34.140 * MASTER <-> REPLICA sync started
1913:S 26 Feb 2020 22:36:34.140 # Error condition on socket for SYNC: Connection refused
1913:S 26 Feb 2020 22:36:35.147 * Connecting to MASTER 127.0.0.1:6379

Wait a little while, and things sort themselves out.
The Sentinel at that moment:

1905:X 26 Feb 2020 22:36:36.338 # +switch-master mymaster 127.0.0.1 6379 127.0.0.1 6381
1905:X 26 Feb 2020 22:36:36.338 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
1905:X 26 Feb 2020 22:36:36.338 * +slave slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381
1905:X 26 Feb 2020 22:37:06.389 # +sdown slave 127.0.0.1:6379 127.0.0.1 6379 @ mymaster 127.0.0.1 6381

6380:

1913:S 26 Feb 2020 22:36:35.286 * REPLICAOF 127.0.0.1:6381 enabled (user request from 'id=6 addr=127.0.0.1:57811 fd=10 name=sentinel-39ecf05b-cmd age=433 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=281 qbuf-free=32487 obl=36 oll=0 omem=0 events=r cmd=exec')
1913:S 26 Feb 2020 22:36:35.288 # CONFIG REWRITE executed with success.
1913:S 26 Feb 2020 22:36:36.155 * Connecting to MASTER 127.0.0.1:6381
1913:S 26 Feb 2020 22:36:36.156 * MASTER <-> REPLICA sync started
1913:S 26 Feb 2020 22:36:36.156 * Non blocking connect for SYNC fired the event.
1913:S 26 Feb 2020 22:36:36.156 * Master replied to PING, replication can continue...
1913:S 26 Feb 2020 22:36:36.157 * Trying a partial resynchronization (request 2ee5aa053e9810d2b64e3d6b93c09cd26456e810:80008).
1913:S 26 Feb 2020 22:36:36.157 * Successful partial resynchronization with master.
1913:S 26 Feb 2020 22:36:36.157 # Master replication ID changed to 5dbfabed33620b074f1e8cc5813afd10c4cf6083
1913:S 26 Feb 2020 22:36:36.157 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.

6381:

1919:M 26 Feb 2020 22:36:34.941 * MASTER MODE enabled (user request from 'id=6 addr=127.0.0.1:44213 fd=10 name=sentinel-39ecf05b-cmd age=432 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=140 qbuf-free=32628 obl=36 oll=0 omem=0 events=r cmd=exec')
1919:M 26 Feb 2020 22:36:34.943 # CONFIG REWRITE executed with success.
1919:M 26 Feb 2020 22:36:36.157 * Replica 127.0.0.1:6380 asks for synchronization
1919:M 26 Feb 2020 22:36:36.157 * Partial resynchronization request from 127.0.0.1:6380 accepted. Sending 422 bytes of backlog starting from offset 80008.

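6381 has been promoted to master automatically.

Why the short delay? A replica or a Sentinel can fail to reach the master because of a network hiccup while the master is in fact still alive.
So each Sentinel waits until the master has been unreachable for down-after-milliseconds (30 seconds by default) before marking it as down, and a failover starts only once at least quorum Sentinels agree (2 in this setup). Only when they are sure the master is really dead does the failover begin.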
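
The two-stage check can be sketched in a few lines of Python. This is a simplified model of the rule just described, not Sentinel's implementation: the function names are hypothetical, the 30,000 ms default mirrors down-after-milliseconds, and the real failover additionally involves electing a leader among the Sentinels, which is left out here.

```python
from typing import Iterable


def is_subjectively_down(ms_since_last_valid_reply: int,
                         down_after_ms: int = 30_000) -> bool:
    """One Sentinel's own opinion: no valid reply for longer than the limit."""
    return ms_since_last_valid_reply > down_after_ms


def is_objectively_down(sentinel_opinions: Iterable[bool], quorum: int) -> bool:
    """The master counts as really down once at least quorum Sentinels agree."""
    return sum(1 for down in sentinel_opinions if down) >= quorum


if __name__ == "__main__":
    # Milliseconds since each of the three Sentinels last heard from the master.
    silences = [31_000, 30_500, 12_000]
    opinions = [is_subjectively_down(ms) for ms in silences]
    print(opinions, is_objectively_down(opinions, quorum=2))
```

With a quorum of 2, a single Sentinel that loses its own network link cannot trigger a failover on its own; at least one other Sentinel has to have lost contact with the master as well.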

A second look at the Sentinel configuration

Look at our Sentinel configuration again:

port 26379
sentinel myid 1b457bdcc457c091e803c2be3ead404f55dc2356
# Generated by CONFIG REWRITE
dir "/root"
protected-mode no
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 127.0.0.1 6381 2
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
sentinel known-replica mymaster 127.0.0.1 6380
sentinel known-replica mymaster 127.0.0.1 6379
sentinel known-sentinel mymaster 127.0.0.1 26381 39ecf05bd86f90db7e037bbf91ca32b1daa73927
sentinel known-sentinel mymaster 127.0.0.1 26380 b0a46b73b519cc2db7195cfd18bc808775032dba
sentinel current-epoch 1

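What we wrote earlier has been rewritten; only the port is as we left it.
The monitored address has been updated automatically to 6381, the Redis instance that is now the master.

That wraps up Redis master-replica replication with Sentinel.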
