Redis master-replica replication error: I/O error trying to sync with MASTER: connection lost

my_tiantian

Original article: https://blog.csdn.net/my_tiantian/article/details/89402507

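We had a Redis cluster with 4 masters and 4 replicas. One of its servers was being repurposed, so the two nodes on it (one master and one replica) had to be migrated to a new server. However, when the new replica on the new server joined the cluster and started a full resynchronization, it failed.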

1. Migrating the Redis cluster nodes

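Perform a manual failover for the master node on the old server (see the manual failover section of the Redis cluster tutorial): running CLUSTER FAILOVER on one of its replicas promotes that replica and demotes the old master to a replica.
Set up two Redis nodes on the new server and add them to the existing cluster as replicas: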

redis-cli --cluster add-node new_host:new_port existing_host:existing_port --cluster-slave --cluster-master-id <arg>

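Remove the two nodes on the old server from the cluster. Since both of them are now replicas and own no hash slots, they can be deleted directly: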
redis-cli --cluster del-node host:port node_id
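Before running del-node it can help to confirm that the old nodes really are replicas that own no slots. The sketch below parses text in the documented CLUSTER NODES line format (id, address, flags, master id, ping-sent, pong-recv, config-epoch, link-state, then slot ranges for masters). The helper name removable_nodes and the sample lines are hypothetical; in practice the text would come from `redis-cli -h host -p port cluster nodes`.

```python
def removable_nodes(cluster_nodes_output: str) -> list:
    """Return (node_id, address) pairs for replicas that own no slots.

    Hypothetical helper. Assumes the CLUSTER NODES line format:
    <id> <ip:port@cport> <flags> <master> <ping-sent> <pong-recv> <config-epoch> <link-state> [slot ...]
    """
    result = []
    for line in cluster_nodes_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        node_id, address, flags = fields[0], fields[1], fields[2].split(",")
        slots = fields[8:]  # slot ranges are only listed for masters
        if "slave" in flags and not slots:
            # drop the "@cluster-bus-port" suffix to get host:port for del-node
            result.append((node_id, address.split("@")[0]))
    return result


# Hypothetical sample output with shortened node IDs.
sample = """
aaa1 10.0.0.1:7000@17000 myself,master - 0 0 1 connected 0-5460
bbb2 10.0.0.2:7001@17001 slave aaa1 0 1426238317239 1 connected
"""
print(removable_nodes(sample))  # [('bbb2', '10.0.0.2:7001')]
```

A node that still appears with the master flag or with slot ranges should be failed over or resharded first.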

2. Full resynchronization failure on the replica

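After joining the cluster, the new replica connects to its master to perform a full resynchronization, but the connection to the master failed.
Replica log: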

31201:S 19 Apr 2019 11:14:55.338 * Connecting to MASTER 10.20.2.17:7010
31201:S 19 Apr 2019 11:14:55.338 * MASTER <-> REPLICA sync started
31201:S 19 Apr 2019 11:14:55.345 * Non blocking connect for SYNC fired the event.
31201:S 19 Apr 2019 11:14:55.369 * Master replied to PING, replication can continue...
31201:S 19 Apr 2019 11:14:55.382 * Partial resynchronization not possible (no cached master)
31201:S 19 Apr 2019 11:14:55.469 * Full resync from master: c585002345618b2cc92fe14ce4a1faa35afbce0c:14991800664323
31201:S 19 Apr 2019 11:15:46.229 * MASTER <-> REPLICA sync: receiving 1748244779 bytes from master
31201:S 19 Apr 2019 11:16:47.556 # I/O error trying to sync with MASTER: connection lost

or:

# I/O error reading bulk count from MASTER: Resource temporarily unavailable

Master log:

22985:M 19 Apr 2019 11:18:37.426 * Replica 10.20.2.10:7010 asks for synchronization
22985:M 19 Apr 2019 11:18:37.426 * Full resync requested by replica 10.20.2.10:7010
22985:M 19 Apr 2019 11:18:37.426 * Starting BGSAVE for SYNC with target: disk
22985:M 19 Apr 2019 11:18:37.507 * Background saving started by pid 32045
32045:C 19 Apr 2019 11:19:40.360 * DB saved on disk
32045:C 19 Apr 2019 11:19:40.442 * RDB: 915 MB of memory used by copy-on-write
22985:M 19 Apr 2019 11:19:40.683 * Background saving terminated with success
22985:M 19 Apr 2019 11:19:47.250 # Client id=128297 addr=10.20.2.10:35039 fd=26 name= age=70 idle=70 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=16202 oll=13092 omem=268438368 events=rw cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
22985:M 19 Apr 2019 11:19:47.283 # Connection with replica 10.20.2.10:7010 lost.

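The logs show what happened. When the replica asked to resynchronize (its last command was PSYNC, which is why the client line shows cmd=psync), a partial resync was not possible, so the master ran BGSAVE to write an RDB file for a full resync. While the RDB was being generated and sent, new write commands kept accumulating in the replica client's output buffer. That buffer reached omem=268438368 bytes, just above the 256 MB (268435456-byte) hard limit for replica clients, so the master closed the connection and the replica reported "connection lost". The client-output-buffer-limit value was simply too small, so the fix is to increase client-output-buffer-limit for replica clients on the master.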
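To make the comparison concrete, the sketch below pulls the key=value fields out of the master's "scheduled to be closed ASAP" line and compares omem with the default 256 MB replica hard limit. The helper name parse_client_fields is illustrative only; the field names (omem, flags, cmd) are the ones printed in the log.

```python
import re

# The master-side log line from the failed sync (fields copied from the log).
LOG_LINE = (
    "Client id=128297 addr=10.20.2.10:35039 fd=26 name= age=70 idle=70 flags=S "
    "db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=16202 oll=13092 "
    "omem=268438368 events=rw cmd=psync"
)

REPLICA_HARD_LIMIT = 256 * 1024 * 1024  # default "replica 256mb ..." hard limit, in bytes


def parse_client_fields(line: str) -> dict:
    """Extract key=value pairs; values may be empty (e.g. 'name=')."""
    return dict(re.findall(r"(\w[\w-]*)=(\S*)", line))


fields = parse_client_fields(LOG_LINE)
omem = int(fields["omem"])  # output buffer memory used by this client
print(fields["flags"], fields["cmd"])  # S psync  (S = replica client)
print(omem - REPLICA_HARD_LIMIT)       # 2912 bytes over the hard limit
```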

3. Output buffer limits: client-output-buffer-limit

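The size of a Redis server's output (that is, command replies) is generally not under its control. A single simple command can produce a very large reply. Alternatively, so many commands may be executed that replies are produced faster than they can be sent to the client; the server then accumulates a large backlog of pending data. Either way the output buffer keeps growing, uses too much memory, and can even crash the system.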

Redis has protection mechanisms against this, with separate limit parameters for each client class. There are two kinds of limits:

(1) Hard limit: as soon as a client's buffer exceeds a given size, its connection is closed immediately;

(2) Soft limit: if a client's buffer stays above a given size continuously for a given period, its connection is closed.
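The two rules can be modeled as a small state machine per client. The sketch below is a simplified model written for illustration (the class names BufferLimit and OutputBufferMonitor are hypothetical, not Redis internals): the hard limit closes the connection at once, while the soft limit starts a timer that is reset whenever usage drops back below it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BufferLimit:
    hard_bytes: int    # 0 disables the hard limit
    soft_bytes: int    # 0 disables the soft limit
    soft_seconds: int  # how long usage may stay above the soft limit


class OutputBufferMonitor:
    """Simplified model of hard/soft output buffer limits for one client."""

    def __init__(self, limit: BufferLimit):
        self.limit = limit
        self.soft_since: Optional[float] = None  # when the soft limit was first exceeded

    def should_close(self, used_bytes: int, now: float) -> bool:
        lim = self.limit
        if lim.hard_bytes and used_bytes >= lim.hard_bytes:
            return True  # hard limit reached: close immediately
        if lim.soft_bytes and used_bytes >= lim.soft_bytes:
            if self.soft_since is None:
                self.soft_since = now  # start the soft-limit timer
            return now - self.soft_since > lim.soft_seconds
        self.soft_since = None  # back under the soft limit: reset the timer
        return False


MB = 1024 * 1024
replica = OutputBufferMonitor(BufferLimit(256 * MB, 64 * MB, 60))
print(replica.should_close(100 * MB, now=0))   # False: timer starts
print(replica.should_close(100 * MB, now=61))  # True: above 64 MB for more than 60 s
```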

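For details, see the "Output buffer limits" section of the Redis documentation on how client connections are handled.
Default client-output-buffer-limit settings in redis.conf (each line reads <class> <hard limit> <soft limit> <soft seconds>):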

client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
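The sizes in these lines use the memory units documented at the top of redis.conf, where "mb" means 1024*1024 bytes while "m" means 1,000,000 bytes. A minimal sketch (the helper names to_bytes and parse_limit_line are hypothetical) converts a line into bytes, which is the form CONFIG GET reports:

```python
import re

# Unit multipliers as documented in the redis.conf header (case-insensitive).
UNITS = {
    "": 1,
    "k": 1000, "kb": 1024,
    "m": 1000 ** 2, "mb": 1024 ** 2,
    "g": 1000 ** 3, "gb": 1024 ** 3,
}


def to_bytes(value: str) -> int:
    """Convert a size such as '256mb' or '1g' to bytes."""
    m = re.fullmatch(r"(\d+)([a-z]*)", value.strip().lower())
    if not m or m.group(2) not in UNITS:
        raise ValueError(f"unsupported size: {value!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def parse_limit_line(line: str) -> tuple:
    """Parse 'client-output-buffer-limit <class> <hard> <soft> <seconds>'."""
    _, cls, hard, soft, seconds = line.split()
    return cls, to_bytes(hard), to_bytes(soft), int(seconds)


print(parse_limit_line("client-output-buffer-limit replica 256mb 64mb 60"))
# ('replica', 268435456, 67108864, 60)
```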

Each client class has its own policy:

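Normal clients: the limit is 0, i.e. unlimited. Normal clients usually follow a blocking request/reply pattern: send a request, wait for the reply, send the next request, wait again. In this pattern the server's output buffer normally does not pile up;
Pub/Sub clients: the hard limit is 32 MB, so the connection is closed as soon as the output buffer exceeds 32 MB. The soft limit closes the connection if the buffer stays above 8 MB for 60 seconds;
Replica (slave) clients: the hard limit is 256 MB, and the soft limit closes the connection if the buffer stays above 64 MB for 60 seconds.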
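You can also check the current setting on a node (check the master, since it is the master that applies this limit to its replica connections):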

    127.0.0.1:6379> config get client-output-buffer-limit 
    1) "client-output-buffer-limit"
    2) "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60"
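CONFIG GET returns all classes as one flat string of values in bytes (here the replica class still appears under its older name, slave). A minimal sketch, with the hypothetical helper name parse_obuf_config, splits it into groups of four:

```python
def parse_obuf_config(value: str) -> dict:
    """Split the flat CONFIG GET value into {class: (hard, soft, seconds)}."""
    tokens = value.split()
    if len(tokens) % 4:
        raise ValueError("expected groups of <class> <hard> <soft> <seconds>")
    return {
        tokens[i]: (int(tokens[i + 1]), int(tokens[i + 2]), int(tokens[i + 3]))
        for i in range(0, len(tokens), 4)
    }


current = parse_obuf_config(
    "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60"
)
hard, soft, seconds = current["slave"]
print(hard // 1024 ** 2, soft // 1024 ** 2, seconds)  # 256 64 60
```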

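So the current limits are a 256 MB hard limit and a soft limit of 64 MB sustained for 60 seconds. The master log above shows that the replica's output buffer (omem=268438368 bytes) exceeded the 256 MB hard limit.
We can raise the replica output buffer limit, for example to a 1 GB hard limit and a 256 MB soft limit: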
config set client-output-buffer-limit 'slave 1073741824 268435456 60'

Or disable the replica output buffer limit entirely (at the cost of letting the buffer grow without bound):
config set client-output-buffer-limit 'slave 0 0 0'
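How large is "suitable"? One rough rule of thumb, offered here as an assumption rather than the author's method, is that the replica buffer must hold every write the master produces from the start of BGSAVE until the RDB has been delivered. The sketch below (hypothetical helper suggest_replica_limit; the write rate and headroom are placeholders to replace with your own measurements) turns that estimate into a CONFIG SET value:

```python
import math

MiB = 1024 ** 2


def suggest_replica_limit(write_bytes_per_sec: float, sync_seconds: float,
                          headroom: float = 2.0, soft_seconds: int = 60) -> str:
    """Build a CONFIG SET value for replica clients (hypothetical sizing rule).

    Assumes the buffer must absorb all writes produced during RDB generation
    and transfer, times a safety headroom, rounded up to whole MiB.
    """
    hard = math.ceil(write_bytes_per_sec * sync_seconds * headroom / MiB) * MiB
    soft = hard // 4  # arbitrary choice mirroring the default 256mb/64mb ratio
    return f"slave {hard} {soft} {soft_seconds}"


# Placeholder inputs: 2 MiB/s of writes and a sync lasting about 157 s.
print(suggest_replica_limit(2 * MiB, 157))  # slave 658505728 164626432 60
```

The resulting string is what would be passed to `config set client-output-buffer-limit '...'`.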

After setting the limit to a suitable size, the full resynchronization succeeded.
Replica log:

34979:S 19 Apr 2019 11:34:38.594 * Connecting to MASTER 10.20.2.17:7010
34979:S 19 Apr 2019 11:34:38.594 * MASTER <-> REPLICA sync started
34979:S 19 Apr 2019 11:34:38.594 * Non blocking connect for SYNC fired the event.
34979:S 19 Apr 2019 11:34:38.595 * Master replied to PING, replication can continue...
34979:S 19 Apr 2019 11:34:38.595 * Partial resynchronization not possible (no cached master)
34979:S 19 Apr 2019 11:34:38.705 * Full resync from master: c585002345618b2cc92fe14ce4a1faa35afbce0c:14995139415394
34979:S 19 Apr 2019 11:36:40.515 * MASTER <-> REPLICA sync: receiving 3871428383 bytes from master
34979:S 19 Apr 2019 11:37:15.582 * MASTER <-> REPLICA sync: Flushing old data
34979:S 19 Apr 2019 11:37:15.582 * MASTER <-> REPLICA sync: Loading DB in memory
34979:S 19 Apr 2019 11:37:59.763 * MASTER <-> REPLICA sync: Finished with success

Master log:

22985:M 19 Apr 2019 11:34:38.336 * Replica 10.20.2.18:7010 asks for synchronization
22985:M 19 Apr 2019 11:34:38.336 * Full resync requested by replica 10.20.2.18:7010
22985:M 19 Apr 2019 11:34:38.336 * Starting BGSAVE for SYNC with target: disk
22985:M 19 Apr 2019 11:34:38.446 * Background saving started by pid 40539
40539:C 19 Apr 2019 11:36:39.765 * DB saved on disk
40539:C 19 Apr 2019 11:36:39.851 * RDB: 2355 MB of memory used by copy-on-write
22985:M 19 Apr 2019 11:36:40.256 * Background saving terminated with success
22985:M 19 Apr 2019 11:37:15.302 * Synchronization with replica 10.20.2.18:7010 succeeded
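The master's timestamps also show how long the replica output buffer had to absorb new writes during this successful sync. A minimal sketch using only the master-side times (so clock differences between hosts do not matter) and the RDB size from the replica log; the "transfer" figure is approximate, measured from the end of BGSAVE to "Synchronization ... succeeded":

```python
from datetime import datetime

FMT = "%d %b %Y %H:%M:%S.%f"


def ts(s: str) -> datetime:
    return datetime.strptime(s, FMT)


# Timestamps copied from the master log of the successful sync.
requested = ts("19 Apr 2019 11:34:38.336")       # Full resync requested
bgsave_started = ts("19 Apr 2019 11:34:38.446")  # Background saving started
bgsave_done = ts("19 Apr 2019 11:36:40.256")     # Background saving terminated
sync_ok = ts("19 Apr 2019 11:37:15.302")         # Synchronization succeeded

rdb_bytes = 3871428383  # replica log: "receiving 3871428383 bytes from master"

bgsave_s = (bgsave_done - bgsave_started).total_seconds()
transfer_s = (sync_ok - bgsave_done).total_seconds()
total_s = (sync_ok - requested).total_seconds()

print(f"BGSAVE: {bgsave_s:.1f} s, transfer: {transfer_s:.1f} s, total: {total_s:.1f} s")
print(f"approx. transfer rate: {rdb_bytes / transfer_s / MiB_PER:.0f} MiB/s"
      if (MiB_PER := 1024 ** 2) else "")
```

Roughly 157 seconds elapsed between the resync request and its completion, and every write in that window had to fit in the replica's output buffer.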

