Redis Running Notes 7: Master-Replica Replication

大漠孤烟BLOG

Article link: https://blog.csdn.net/u012842247/article/details/103338768

1. How the official site explains replication

At the base of Redis replication (excluding the high availability features provided as an additional layer by Redis Cluster or Redis Sentinel) there is a very simple to use and configure leader follower (master-slave) replication: it allows replica Redis instances to be exact copies of master instances. The replica will automatically reconnect to the master every time the link breaks, and will attempt to be an exact copy of it regardless of what happens to the master.

This system works using three main mechanisms:

When a master and a replica instances are well-connected, the master keeps the replica updated by sending a stream of commands to the replica, in order to replicate the effects on the dataset happening in the master side due to: client writes, keys expired or evicted, any other action changing the master dataset.
When the link between the master and the replica breaks, for network issues or because a timeout is sensed in the master or the replica, the replica reconnects and attempts to proceed with a partial resynchronization: it means that it will try to just obtain the part of the stream of commands it missed during the disconnection.
When a partial resynchronization is not possible, the replica will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the replica, and then continue sending the stream of commands as the dataset changes.

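From this passage on the official site we can conclude that Redis master-replica replication is the foundation of Redis high availability, and that the passage describes how a replica (slave) stays in sync with its master.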

Redis uses by default asynchronous replication, which being low latency and high performance, is the natural replication mode for the vast majority of Redis use cases. However Redis replicas asynchronously acknowledge the amount of data they received periodically with the master. So the master does not wait every time for a command to be processed by the replicas

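Redis replication is asynchronous. What are the benefits of asynchronous replication?

1. Low latency and high performance.

2. Large amounts of data can be sent straight away; the master does not have to wait for the replica to finish processing a command before sending the next batch.

Note that asynchronous means there is no check: the master does not wait for confirmation from the replica, so when network delays or other failures occur, the chance of losing data is relatively high.

Where there is asynchronous replication, there is also synchronous replication.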

Synchronous replication of certain data can be requested by the clients using the WAIT command. However WAIT is only able to ensure that there are the specified number of acknowledged copies in the other Redis instances, it does not turn a set of Redis instances into a CP system with strong consistency: acknowledged writes can still be lost during a failover, depending on the exact configuration of the Redis persistence. However with WAIT the probability of losing a write after a failure event is greatly reduced to certain hard to trigger failure modes.

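Synchronous replication requires the client to send the WAIT command to confirm that the specified number of replicas have acknowledged the writes. This does not turn Redis into a CP system: it lowers the probability of losing data compared with plain asynchronous replication, but data can still be lost, depending on how Redis persistence is configured.

In other words, the synchronous strategy protects against data loss as far as possible, but it does not guarantee it.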
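To make the WAIT semantics concrete, here is a toy model in Python. It is not Redis code: `ToyMaster`, `ToyReplica`, and their methods are hypothetical names invented for this sketch. It only models the idea that the master accepts a write immediately and that a WAIT-style call reports how many replicas have acknowledged everything written so far.

```python
class ToyReplica:
    """Hypothetical stand-in for a replica: remembers the last offset it acknowledged."""

    def __init__(self):
        self.acked_offset = 0

    def ack(self, offset):
        # Models the replica reporting its replication offset back to the master.
        self.acked_offset = offset


class ToyMaster:
    """Hypothetical stand-in for a master using asynchronous replication."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.offset = 0

    def write(self, nbytes):
        # The write is accepted right away; replicas catch up later.
        self.offset += nbytes
        return self.offset

    def wait(self):
        # Models the WAIT reply: the number of replicas that have
        # acknowledged every write issued so far.
        return sum(1 for r in self.replicas if r.acked_offset >= self.offset)


r1, r2 = ToyReplica(), ToyReplica()
master = ToyMaster([r1, r2])
target = master.write(10)  # accepted before any replica confirms it
r1.ack(target)             # only one replica has caught up
print(master.wait())       # a client that wanted 2 acknowledgements sees fewer
```

A client that asked for two acknowledged copies and got a smaller number back knows the write is not yet as safe as it wanted, which is exactly the information plain asynchronous replication does not give it.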

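How do we configure master-replica replication? We have three machines and set them up as one master and two replicas.

Assume hadoop03 is the master and the other two machines are replicas.

In the configuration files on hadoop04 and hadoop05, set the directive

slaveof host port

(Redis 5 also accepts the newer name replicaof for the same directive.)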

127.0.0.1:6379> info replication
# Replication
role:slave
master_host:hadoop03.fandong.com
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1
master_link_down_since_seconds:1575205872
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:feeef2e7694384567c95e7ca438dc247eaf3fff2
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

master_link_status is down: from the replica's point of view, the link to the master is down.
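Checking these fields by eye gets tedious across several machines. The following is a minimal sketch of a helper that parses the text of `INFO replication` and reports whether a replica's link is up. The function names `parse_info` and `replica_link_ok` are made up for illustration, and the sample text is a shortened copy of the output above.

```python
def parse_info(text):
    """Parse INFO-style output into a dict mapping field name to string value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def replica_link_ok(fields):
    """True only for a replica whose link to its master is up."""
    return fields.get("role") == "slave" and fields.get("master_link_status") == "up"


sample = """# Replication
role:slave
master_host:hadoop03.fandong.com
master_port:6379
master_link_status:down
"""
info = parse_info(sample)
print(info["master_host"], replica_link_ok(info))
```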

Checking on the master shows the following:

# Replication
role:master
connected_slaves:0
master_replid:e94dff2aded37852932095f61b5e1be56ff34ec3
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
127.0.0.1:6379> shutdown

connected_slaves is 0. What is causing this?

The first thing to check is whether the network is OK. Here is one test result:

[root@hadoop03 bin]# ping hadoop04.fandong.com
PING hadoop04.fandong.com (192.168.1.122) 56(84) bytes of data.
64 bytes from hadoop04.fandong.com (192.168.1.122): icmp_seq=1 ttl=64 time=0.874 ms
64 bytes from hadoop04.fandong.com (192.168.1.122): icmp_seq=2 ttl=64 time=0.433 ms
^C
--- hadoop04.fandong.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.433/0.653/0.874/0.221 ms

So the network is not the cause.

Next, check the firewall.

For convenience, the firewall had already been turned off as well.

[root@hadoop03 bin]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

So what is actually causing it?

Let's look at the configuration file.

#
# By default protected mode is enabled. You should disable it only if
# you are sure you want clients from other hosts to connect to Redis
# even if no authentication is configured, nor a specific set of interfaces
# are explicitly listed using the "bind" directive.
protected-mode yes

The configuration file shows that Redis has protected mode enabled, which is why other hosts cannot connect.

Solution:

protected-mode no

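This turns protected mode off. Security still deserves a mention, though: if only specific machines should be able to connect, it is better to restrict access explicitly. The setup here used the bind line below. Keep in mind that bind lists the local interfaces Redis listens on, not the client addresses that are allowed to connect; limiting which hosts can reach the port is a job for a firewall rule or a password (requirepass).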

bind  127.0.0.1 hadoop04.fandong.com hadoop05.fandong.com hadoop03.fandong.com

Checking again, replication now works:

# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.122,port=6379,state=online,offset=154,lag=0
slave1:ip=192.168.1.121,port=6379,state=online,offset=154,lag=0
master_replid:d9e3def869b7dc3b01ce89ca3d810fadcfe55347
master_replid2:0000000000000000000000000000000000000000

With that, the problem is solved.

Now a quick test run.

# master: write data
hadoop03.fandong.com:6379> set key1 val1
OK
hadoop03.fandong.com:6379> 

#slave
hadoop04.fandong.com:6379> get key1
"val1"
hadoop04.fandong.com:6379> 
#slave
hadoop05.fandong.com:6379> get key1
"val1"
hadoop05.fandong.com:6379>

Nothing is ever plain sailing; failures will come.

1. The master is shut down. How do the replicas behave?

hadoop03.fandong.com:6379> set key1 val1
OK
hadoop03.fandong.com:6379> shutdown
not connected> exit
[root@hadoop03 bin]# 

#  slave 
hadoop04.fandong.com:6379> info replication
# Replication
role:slave
master_host:hadoop03.fandong.com
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:714
master_link_down_since_seconds:9
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:d9e3def869b7dc3b01ce89ca3d810fadcfe55347
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:714
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:714
hadoop04.fandong.com:6379>

The replica has indeed detected that the master is gone (master_link_status:down). Wait a moment and check the replica again.

hadoop04.fandong.com:6379> info replication
# Replication
role:slave
master_host:hadoop03.fandong.com
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:714
master_link_down_since_seconds:98

hadoop04 is still in the replica role; nothing has changed.

This means the master has to be started again manually.

After starting the master by hand, the replica sees it again.

hadoop04.fandong.com:6379> info replication
# Replication
role:slave
master_host:hadoop03.fandong.com
master_port:6379
master_link_status:up
master_last_io_seconds_ago:5
master_sync_in_progress:0
slave_repl_offset:0

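---This is a painful design. On a working day a manual restart is manageable, but what if it happens out of hours? Oh no!

We could configure the master to restart automatically; that is one solution. But the official site gives an example worth considering. One reason we use replication is to take read pressure off a single server; another is that the replicas act as a backup copy of the data.

So, once replication is in place, do we still need persistence (backups) on the master?

1. If the master does not persist its data and it crashes, we have to restart it, and everything that was in its memory is gone. The replicas then sync from the empty master, so their data is lost too!

So the master needs persistence as well.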

---

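Right now both replicas, hadoop04 and hadoop05, sync from the master. If many more replicas all copied from the same machine, that would put pressure on the master.

Another replication topology: chained replication

That is, a replica can itself act as the master of another replica (at the cost of some extra delay).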

hadoop03 -> hadoop04 -> hadoop05

How is this set up?

Again with the slaveof host port command: just point hadoop05's master at hadoop04.

hadoop05.fandong.com:6379> slaveof hadoop04.fandong.com 6379
OK
hadoop05.fandong.com:6379> info replication
# Replication
role:slave
master_host:hadoop04.fandong.com
master_port:6379
master_link_status:up
# the original master
hadoop03.fandong.com:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.1.122,port=6379,state=online,offset=1092,lag=0
master_replid:c429f5b1957f4aa5e1e486e575b8c94213c51e13
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1092

The original master now has only one replica, so the replication chain is configured correctly.

hadoop03.fandong.com:6379> set key3 val3
OK
hadoop03.fandong.com:6379> 
# hadoop05
hadoop05.fandong.com:6379> get key3
"val3"

The same question as before: what happens if the master goes down? Will hadoop04 become the master?

hadoop04.fandong.com:6379> info replication
# Replication
role:slave
master_host:hadoop03.fandong.com
master_port:6379
master_link_status:down

hadoop04 is still faithfully waiting for the master to return.

Question: with hadoop03 down, can we write to hadoop04? After all, hadoop05 is still behind it.

hadoop04.fandong.com:6379> set key5 val5
(error) READONLY You can't write against a read only replica.

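A read-only replica does not accept writes.

--Recovering the master simply means starting it again.

If the master keeps going down, something is probably wrong with it, and at that point we need a new master.

Use the slaveof no one command.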

hadoop04.fandong.com:6379> slaveof no one
OK
hadoop04.fandong.com:6379> info replication
# Replication
role:master
connected_slaves:1

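hadoop04's role has changed from slave to master, so it is the new leader.

---As we saw earlier, a failed master is still a master after it restarts. But now there is a new master, so what happens when hadoop03 is restarted?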

hadoop03.fandong.com:6379> info replication
# Replication
role:master
connected_slaves:0

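It no longer has any followers.

Having covered master-replica replication, how is the data actually copied?

The first synchronization is a full copy, triggered when the replica first connects to the master. After that, changes are replicated incrementally, and a replica that reconnects after a short disconnection tries a partial resynchronization before falling back to a full one.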
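The choice between a partial and a full resynchronization can be sketched in Python. This is a simplified model of how I understand the master's decision, not the actual Redis implementation: the class `MasterState`, the function `can_partial_resync`, and all the values are hypothetical. The field names mirror the ones printed by `INFO replication` (master_replid, master_replid2, second_repl_offset, repl_backlog_first_byte_offset, repl_backlog_histlen).

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    """Hypothetical subset of the master-side replication fields."""
    replid: str
    replid2: str
    second_repl_offset: int
    backlog_first_byte_offset: int
    backlog_histlen: int


def can_partial_resync(master, replica_replid, replica_offset):
    """Return True for a partial resync, False when a full resync is needed.

    replica_offset is the next offset the replica asks for.
    """
    same_history = replica_replid == master.replid
    # A promoted master also accepts its previous replication ID,
    # but only up to the offset at which the switch happened.
    old_history = (replica_replid == master.replid2
                   and replica_offset <= master.second_repl_offset)
    if not (same_history or old_history):
        return False
    if master.backlog_histlen == 0:
        return False  # nothing buffered, so nothing to resume from
    first = master.backlog_first_byte_offset
    # The requested offset must still be inside the backlog window.
    return first <= replica_offset <= first + master.backlog_histlen


state = MasterState(replid="new-id", replid2="old-id", second_repl_offset=1200,
                    backlog_first_byte_offset=1000, backlog_histlen=500)
print(can_partial_resync(state, "new-id", 1400))  # inside the backlog window
print(can_partial_resync(state, "new-id", 900))   # already dropped from the backlog
```

Under this model, a replica that was offline for too long (its offset has fallen out of the backlog) or that followed a different replication history has to take the full snapshot again.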

From the above, we saw that when the master is shut down, recovery has to be triggered manually.

Solution:

Sentinel mode:

Redis Sentinel provides high availability for Redis. In practical terms this means that using Sentinel you can create a Redis deployment that resists without human intervention certain kinds of failures.

Redis Sentinel also provides other collateral tasks such as monitoring, notifications and acts as a configuration provider for clients.

This is the full list of Sentinel capabilities at a macroscopical level (i.e. the big picture):

Monitoring. Sentinel constantly checks if your master and replica instances are working as expected.
Notification. Sentinel can notify the system administrator, or other computer programs, via an API, that something is wrong with one of the monitored Redis instances.
Automatic failover. If a master is not working as expected, Sentinel can start a failover process where a replica is promoted to master, the other additional replicas are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting.
Configuration provider. Sentinel acts as a source of authority for clients service discovery: clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address.

---Let's configure it step by step, following the official documentation.

A minimal configuration for a quick start:

# Syntax: sentinel monitor <master-name> <ip> <redis-port> <quorum>
sentinel monitor hadoop03.fandong.com 6379  1
sentinel down-after-milliseconds hadoop03.fandong.com 6000
sentinel failover-timeout hadoop03.fandong.com 6000
sentinel parallel-syncs hadoop03.fandong.com 1

[root@hadoop03 bin]# ./redis-sentinel /etc/sentinel.conf 

*** FATAL CONFIG FILE ERROR ***
Reading the configuration file, at line 1
>>> 'sentinel monitor hadoop03.fandong.com 6379  1'
Unrecognized sentinel configuration statement.

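Startup failed. Checking the configuration again shows that the master name is missing: sentinel monitor expects a name for the master, followed by its host or IP, its port, and the quorum.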

sentinel monitor hadoop03 hadoop03.fandong.com 6379  1
sentinel down-after-milliseconds hadoop03 6000
sentinel failover-timeout hadoop03 6000
sentinel parallel-syncs hadoop03 1
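A mistake like the one above can be caught before starting Sentinel. The following is a small sketch of a checker for `sentinel monitor` lines. It is not Sentinel's own parser, and `check_monitor_line` is a hypothetical helper; it only checks the argument count and that port and quorum are plausible numbers, so Sentinel itself may still reject a line this function accepts.

```python
def check_monitor_line(line):
    """Return a list of problems found in a `sentinel monitor` line (empty if none)."""
    parts = line.split()
    if parts[:2] != ["sentinel", "monitor"]:
        return ["not a 'sentinel monitor' statement"]
    args = parts[2:]
    if len(args) != 4:
        return [f"expected 4 arguments (name, host, port, quorum), got {len(args)}"]
    name, host, port, quorum = args
    problems = []
    if not port.isdigit() or not 0 < int(port) < 65536:
        problems.append(f"invalid port: {port}")
    if not quorum.isdigit() or int(quorum) < 1:
        problems.append(f"invalid quorum: {quorum}")
    return problems


# The first attempt from this post, and the corrected line.
print(check_monitor_line("sentinel monitor hadoop03.fandong.com 6379  1"))
print(check_monitor_line("sentinel monitor hadoop03 hadoop03.fandong.com 6379  1"))
```

With the first line the master name is missing, so only three arguments remain and the checker reports the count mismatch; the corrected line passes.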

Start it again:

[root@hadoop03 bin]# ./redis-sentinel /etc/sentinel.conf 
7524:X 01 Dec 2019 22:47:07.628 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
7524:X 01 Dec 2019 22:47:07.628 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=7524, just started
7524:X 01 Dec 2019 22:47:07.628 # Configuration loaded
7524:X 01 Dec 2019 22:47:07.632 * Increased maximum number of open files to 10032 (it was originally set to 1024).
                _._                                                  
           _.-``__ ''-._                                             
      _.-``    `.  `_.  ''-._           Redis 5.0.7 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._                                   
 (    '      ,       .-`  | `,    )     Running in sentinel mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 26379
 |    `-._   `._    /     _.-'    |     PID: 7524
  `-._    `-._  `-./  _.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |           http://redis.io        
  `-._    `-._`-.__.-'_.-'    _.-'                                   
 |`-._`-._    `-.__.-'    _.-'_.-'|                                  
 |    `-._`-._        _.-'_.-'    |                                  
  `-._    `-._`-.__.-'_.-'    _.-'                                   
      `-._    `-.__.-'    _.-'                                       
          `-._        _.-'                                           
              `-.__.-'                                               

7524:X 01 Dec 2019 22:47:07.634 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
7524:X 01 Dec 2019 22:47:07.634 # Sentinel ID is b0f22f7eef032a7061f8c13185e48672b9c616c4
7524:X 01 Dec 2019 22:47:07.634 # +monitor master hadoop03 192.168.1.123 6379 quorum 1
7524:X 01 Dec 2019 22:48:18.002 * +fix-slave-config slave 192.168.1.122:6379 192.168.1.122 6379 @ hadoop03 192.168.1.123 6379

# Replication
role:master
connected_slaves:1
slave0:ip=192.168.1.122,port=6379,state=online,offset=21426,lag=0
master_replid:531f9c7886850f616c81905b7ab4ed98ab21095f
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:21426
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:21426

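This is strange: I have two replica servers, so why is only one of them connected to the master?

It turned out that one replica had a poor network connection, and reconnecting it fixed the problem. Also, redis-server should be started first and Sentinel after it.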

# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.122,port=6379,state=online,offset=45361,lag=1
slave1:ip=192.168.1.121,port=6379,state=online,offset=45361,lag=1
master_replid:531f9c7886850f616c81905b7ab4ed98ab21095f
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:45361
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:45361
hadoop03.fandong.com:6379>

Now the question: what happens when the master is shut down?

hadoop03.fandong.com:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=192.168.1.121,port=6379,state=online,offset=56138,lag=1
slave1:ip=192.168.1.122,port=6379,state=online,offset=56138,lag=0
master_replid:531f9c7886850f616c81905b7ab4ed98ab21095f
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:56138
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:56138
hadoop03.fandong.com:6379> shutdown
not connected> exit

---hadoop04 has become the master, but it has no replicas????

hadoop04.fandong.com:6379> info replication
# Replication
role:master
connected_slaves:0
master_replid:152a9c62dc9aa5e983d5e0a48c4b753d6fcfc685
master_replid2:531f9c7886850f616c81905b7ab4ed98ab21095f
master_repl_offset:62321
second_repl_offset:56437
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:14178
repl_backlog_histlen:48144

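Sentinel did promote hadoop04 to master, and when the old master hadoop03 is started again it does become a replica of hadoop04.

The missing replica is a problem that still needs to be solved.

What happens if hadoop04 goes down at this point? There is no Sentinel running on hadoop04.

We are back where we started, so Sentinel needs to be running on hadoop04 as well.

--Solve the problem tomorrow?

From looking at the logs:

The problem was probably caused by the hadoop05 machine itself: when I removed hadoop05 and used hadoop02 for the failover test instead, the problem did not appear.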

# Replication
role:slave
master_host:192.168.1.124
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:17347
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:933828913a993fbcd8f18786cc0e85903e7939a4
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:17347
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:16747
repl_backlog_histlen:601
hadoop04.fandong.com:6379>

--Now shut down hadoop02.

8271:X 02 Dec 2019 21:53:08.604 # +new-epoch 3
8271:X 02 Dec 2019 21:53:08.604 # +try-failover master hadoop03 192.168.1.124 6379
8271:X 02 Dec 2019 21:53:08.674 # +vote-for-leader 202b09cb8e8ce6fc7bbcf0cff93155a7d0fbeed4 3
8271:X 02 Dec 2019 21:53:08.755 # 760729de011902c15cdcd2c21e039b08e2df62d8 voted for 202b09cb8e8ce6fc7bbcf0cff93155a7d0fbeed4 3
8271:X 02 Dec 2019 21:53:08.793 # +elected-leader master hadoop03 192.168.1.124 6379
8271:X 02 Dec 2019 21:53:08.793 # +failover-state-select-slave master hadoop03 192.168.1.124 6379
8271:X 02 Dec 2019 21:53:08.848 # +selected-slave slave 192.168.1.122:6379 192.168.1.122 6379 @ hadoop03 192.168.1.124 6379
8271:X 02 Dec 2019 21:53:08.848 * +failover-state-send-slaveof-noone slave 192.168.1.122:6379 192.168.1.122 6379 @ hadoop03 192.168.1.124 6379
8271:X 02 Dec 2019 21:53:08.907 * +failover-state-wait-promotion slave 192.168.1.122:6379 192.168.1.122 6379 @ hadoop03 192.168.1.124 6379
8271:X 02 Dec 2019 21:53:09.819 # +promoted-slave slave 192.168.1.122:6379 192.168.1.122 6379 @ hadoop03 192.168.1.124 6379
8271:X 02 Dec 2019 21:53:09.819 # +failover-state-reconf-slaves master hadoop03 192.168.1.124 6379
8271:X 02 Dec 2019 21:53:09.865 * +slave-reconf-sent slave 192.168.1.123:6379 192.168.1.123 6379 @ hadoop03 192.168.1.124 6379
8271:X 02 Dec 2019 21:53:10.291 * +slave-reconf-inprog slave 192.168.1.123:6379 192.168.1.123 6379 @ hadoop03 192.168.1.124 6379
8271:X 02 Dec 2019 21:53:10.291 * +slave-reconf-done slave 192.168.1.123:6379 192.168.1.123 6379 @ hadoop03 192.168.1.124 6379
8271:X 02 Dec 2019 21:53:10.374 # +failover-end master hadoop03 192.168.1.124 6379
8271:X 02 Dec 2019 21:53:10.374 # +switch-master hadoop03 192.168.1.124 6379 192.168.1.122 6379
8271:X 02 Dec 2019 21:53:10.374 * +slave slave 192.168.1.123:6379 192.168.1.123 6379 @ hadoop03 192.168.1.122 6379
8271:X 02 Dec 2019 21:53:10.374 * +slave slave 192.168.1.124:6379 192.168.1.124 6379 @ hadoop03 192.168.1.122 6379

Two Sentinels are now watching, one on hadoop03 and one on hadoop04. The election log shows that the failover promoted 192.168.1.122 (hadoop04) to be the new master.
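The line that matters most in this log is the `+switch-master` event, which carries the master name followed by the old and new address. As a minimal sketch, a script watching the Sentinel log could extract it like this; `parse_switch_master` is a hypothetical helper name, and the sample line is copied from the log above.

```python
def parse_switch_master(line):
    """Return (name, old_addr, new_addr) for a +switch-master log line, else None."""
    marker = "+switch-master "
    pos = line.find(marker)
    if pos == -1:
        return None
    fields = line[pos + len(marker):].split()
    if len(fields) != 5:
        return None
    name, old_ip, old_port, new_ip, new_port = fields
    return name, (old_ip, int(old_port)), (new_ip, int(new_port))


log_line = ("8271:X 02 Dec 2019 21:53:10.374 # +switch-master "
            "hadoop03 192.168.1.124 6379 192.168.1.122 6379")
print(parse_switch_master(log_line))
```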

hadoop04.fandong.com:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=192.168.1.123,port=6379,state=online,offset=46992,lag=0
master_replid:dd91b5dda256e30b4ed6947b8ddcfc0b1c2aa1d2
master_replid2:933828913a993fbcd8f18786cc0e85903e7939a4
master_repl_offset:46992
second_repl_offset:35578
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:16747
repl_backlog_histlen:30246
hadoop04.fandong.com:6379>

As shown, the only replica attached now is hadoop03 (192.168.1.123).

Now restart hadoop02.

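Summary:

1. Run Sentinel on its own host wherever possible.

2. Even with master-replica failover in place, the master must still have persistence (backups) enabled, as far as performance allows.

3. If persistence is not enabled, do not configure automatic restart: the master may come back before Sentinel notices it was down, restart with an empty dataset, and the replicas will then replicate that empty state, so everything is lost.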
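Point 1 is easier to see with a little arithmetic. As far as I understand Sentinel, a failover needs enough Sentinels to agree the master is down (the quorum from `sentinel monitor`) and also a majority of all known Sentinels to authorize it. The sketch below encodes only that rule; `failover_possible` is a hypothetical function, and real Sentinel behaviour involves timing and voting details that are left out.

```python
def failover_possible(total_sentinels, reachable_sentinels, quorum):
    """Simplified rule: enough Sentinels to agree on the failure and to authorize a failover."""
    majority = total_sentinels // 2 + 1
    return reachable_sentinels >= quorum and reachable_sentinels >= majority


# Two Sentinels, one of them on the master's host: losing that host
# also loses a Sentinel, and the survivor is not a majority.
print(failover_possible(total_sentinels=2, reachable_sentinels=1, quorum=1))

# Three Sentinels on separate hosts, one host lost.
print(failover_possible(total_sentinels=3, reachable_sentinels=2, quorum=2))
```

Under these assumptions, a Sentinel that shares a host with the master disappears exactly when it is needed, which is why running Sentinels on separate machines is the safer layout.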
