redisson 重连失败问题分析

1 问题背景
本司在技术框架使用的spring cloud,redis client为redisson,部署在腾讯云上,通过专线连接的本地redis。
问题原因:因为未知原因,腾讯云专线断了不少时间,期间使用VPN进行连接,发现了有发送redis失败的情况,在中午的时候专线好了,但是redisson没有进行重连;理论上,redisson是有重连机制的,并且在日常开发测试中,是遇到过这种情况的。
redisson的心跳机制和连接监控是分开发的,心跳机制用的是发送cluster nodes命令获取集群的信息,ClusterConnectionManager类中实现的,跟本问题没有关系,不在详细分析。
redisson的连接监控,是在ConnectionWatchdog类中实现的,而且是分析本问题的重点,下边会详细分析。

2 问题的复现与解决
2.1 问题的复现,焦虑
case 1 :开发环境,本地起服务,连接远程的redis,通过防火墙将本地出站的redis端口关闭,模拟专线断开的情况;jmeter进行多线程请求,模拟访问。
结论:不论断开多久,redisson会一直尝试重连,最长隔8s,定时重连,这个时候一直都是重连失败的,redisson操作也是失败的;一旦端口恢复,redisson会立即会重新连接,redis操作也会立即成功。
关闭掉端口的日志:

2019-12-06 14:12:33.745 DEBUG [purchase,,,] 27168 --- [isson-timer-4-1] o.r.client.handler.ConnectionWatchdog    : reconnecting RedisConnection@1819010245 [redisClient=[addr=redis://xxxxxxxxxx:7002], channel=[id: 0xc410d918, L:/xxxxxxxxxx:9988 ! R:xxxxxxxxxx/xxxxxxxxxx:7002], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@5f8e3be2(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://xxxxxxxxxx:7002])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]] to xxxxxxxxxx/xxxxxxxxxx:7002 
2019-12-06 14:12:33.745 DEBUG [purchase,,,] 27168 --- [isson-timer-4-1] o.r.client.handler.ConnectionWatchdog    : reconnecting RedisConnection@283731205 [redisClient=[addr=redis://xxxxxxxxxx:7003], channel=[id: 0xfaa1deac, L:/xxxxxxxxxx:9982 ! R:xxxxxxxxxx/xxxxxxxxxx:7003], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@49f8ba52(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://xxxxxxxxxx:7003])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]] to xxxxxxxxxx/xxxxxxxxxx:7003 
2019-12-06 14:12:33.745 DEBUG [purchase,,,] 27168 --- [isson-timer-4-1] o.r.client.handler.ConnectionWatchdog    : reconnecting RedisConnection@2015605067 [redisClient=[addr=redis://xxxxxxxxxx:7002], channel=[id: 0xbcc837b0, L:/xxxxxxxxxx:9972 ! R:xxxxxxxxxx/xxxxxxxxxx:7002], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@3bbeb7fc(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://xxxxxxxxxx:7002])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]] to xxxxxxxxxx/xxxxxxxxxx:7002 
2019-12-06 14:12:33.745 DEBUG [purchase,,,] 27168 --- [isson-timer-4-1] o.r.client.handler.ConnectionWatchdog    : reconnecting RedisConnection@915955785 [redisClient=[addr=redis://xxxxxxxxxx:7003], channel=[id: 0xce8b4081, L:/xxxxxxxxxx:9985 ! R:xxxxxxxxxx/xxxxxxxxxx:7003], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@5c02d3d6(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://xxxxxxxxxx:7003])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]] to xxxxxxxxxx/xxxxxxxxxx:7003 
2019-12-06 14:12:33.745 DEBUG [purchase,,,] 27168 --- [isson-timer-4-1] o.r.client.handler.ConnectionWatchdog    : reconnecting RedisConnection@2066783684 [redisClient=[addr=redis://xxxxxxxxxx:7001], channel=[id: 0x61be598f, L:/xxxxxxxxxx:9990 ! R:xxxxxxxxxx/xxxxxxxxxx:7001], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@15c1df8d(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://xxxxxxxxxx:7001])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]] to xxxxxxxxxx/xxxxxxxxxx:7001 
2019-12-06 14:12:33.745 DEBUG [purchase,,,] 27168 --- [isson-timer-4-1] o.r.client.handler.ConnectionWatchdog    : reconnecting RedisConnection@1739843965 [redisClient=[addr=redis://xxxxxxxxxx:7002], channel=[id: 0xfeec7583, L:/xxxxxxxxxx:9996 ! R:xxxxxxxxxx/xxxxxxxxxx:7002], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@12afb4(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://xxxxxxxxxx:7002])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]] to xxxxxxxxxx/xxxxxxxxxx:7002 
2019-12-06 14:12:33.746 DEBUG [purchase,,,] 27168 --- [isson-timer-4-1] o.r.client.handler.ConnectionWatchdog    : reconnecting RedisPubSubConnection@1144371954 [redisClient=[addr=redis://xxxxxxxxxx:7003], channel=[id: 0xd3df3700, L:/xxxxxxxxxx:9974 ! R:xxxxxxxxxx/xxxxxxxxxx:7003], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@3718a0f3(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://xxxxxxxxxx:7003])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]] to xxxxxxxxxx/xxxxxxxxxx:7003 
2019-12-06 14:12:34.003 ERROR [purchase,,,] 27168 --- [sson-netty-2-32] o.r.cluster.ClusterConnectionManager     : Can't update cluster state

io.netty.channel.AbstractChannel$AnnotatedSocketException: Permission denied: no further information: xxxxxxxxxx/xxxxxxxxxx:7002
Caused by: java.net.SocketException: Permission denied: no further information
   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
   at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
   at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:688)
   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:635)
   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:552)
   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:514)
   at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
   at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
   at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   at java.lang.Thread.run(Thread.java:748)

恢复端口的日志,立即重连,且心跳正常;本块日志有彩蛋:

2019-12-06 14:16:25.755 DEBUG [purchase,,,] 27168 --- [isson-netty-2-3] o.r.client.handler.ConnectionWatchdog    : RedisConnection@305992631 [redisClient=[addr=redis://ip:7001], channel=[id: 0x9b9a991d, L:/{ip}:13512 - R:ip/ip:7001], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@1b1b8a27(success: PONG)], command=(PING), params=[], codec=null]] connected to ip/ip:7001, command: CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@1b1b8a27(success: PONG)], command=(PING), params=[], codec=null]
2019-12-06 14:16:25.759 DEBUG [purchase,,,] 27168 --- [isson-netty-2-2] o.r.client.handler.ConnectionWatchdog    : RedisConnection@1521128039 [redisClient=[addr=redis://ip:7002], channel=[id: 0xa976eb0b, L:/{ip}:13511 - R:ip/ip:7002], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@47ff9179(success: PONG)], command=(PING), params=[], codec=null]] connected to ip/ip:7002, command: CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@47ff9179(success: PONG)], command=(PING), params=[], codec=null]
2019-12-06 14:16:25.759 DEBUG [purchase,,,] 27168 --- [sson-netty-2-26] o.r.client.handler.ConnectionWatchdog    : RedisConnection@2069687678 [redisClient=[addr=redis://ip:7001], channel=[id: 0xf18e877a, L:/{ip}:13514 - R:ip/ip:7001], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@2bf05b3e(success: PONG)], command=(PING), params=[], codec=null]] connected to ip/ip:7001, command: CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@2bf05b3e(success: PONG)], command=(PING), params=[], codec=null]
2019-12-06 14:16:25.759 DEBUG [purchase,,,] 27168 --- [isson-netty-2-4] o.r.client.handler.ConnectionWatchdog    : RedisConnection@1259875329 [redisClient=[addr=redis://ip:7003], channel=[id: 0x60e1fbd7, L:/{ip}:13515 - R:ip/ip:7003], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@27b6c723(success: PONG)], command=(PING), params=[], codec=null]] connected to ip/ip:7003, command: CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@27b6c723(success: PONG)], command=(PING), params=[], codec=null]
2019-12-06 14:16:25.761 DEBUG [purchase,,,] 27168 --- [sson-netty-2-25] o.r.client.handler.ConnectionWatchdog    : RedisConnection@211705632 [redisClient=[addr=redis://ip:7003], channel=[id: 0x07310fb8, L:/{ip}:13513 - R:ip/ip:7003], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@eab1f09(success: PONG)], command=(PING), params=[], codec=null]] connected to ip/ip:7003, command: CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@eab1f09(success: PONG)], command=(PING), params=[], codec=null]
2019-12-06 14:16:25.846 DEBUG [purchase,,,] 27168 --- [isson-timer-4-1] o.r.client.handler.ConnectionWatchdog    : reconnecting RedisConnection@1387010270 [redisClient=[addr=redis://ip:7003], channel=[id: 0xe0f4f9b0, L:/{ip}:9994 ! R:ip/ip:7003], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@9035980(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://ip:7003])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]] to ip/ip:7003 
2019-12-06 14:16:25.846 DEBUG [purchase,,,] 27168 --- [isson-timer-4-1] o.r.client.handler.ConnectionWatchdog    : reconnecting RedisPubSubConnection@746972028 [redisClient=[addr=redis://ip:7002], channel=[id: 0x06a7c3a7, L:/{ip}:9971 ! R:ip/ip:7002], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@3e0a6fe2(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://ip:7002])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]] to ip/ip:7002 
2019-12-06 14:16:25.846 DEBUG [purchase,,,] 27168 --- [isson-timer-4-1] o.r.client.handler.ConnectionWatchdog    : reconnecting RedisConnection@1110446254 [redisClient=[addr=redis://ip:7003], channel=[id: 0x0a9d6658, L:/{ip}:9963 ! R:ip/ip:7003], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@2ac9d9e4(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://ip:7003])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]] to ip/ip:7003 
2019-12-06 14:16:25.856 DEBUG [purchase,,,] 27168 --- [isson-netty-2-5] o.r.client.handler.ConnectionWatchdog    : RedisConnection@1387010270 [redisClient=[addr=redis://ip:7003], channel=[id: 0xfedc76c2, L:/{ip}:13516 - R:ip/ip:7003], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@7f2d1d8e(success: PONG)], command=(PING), params=[], codec=null]] connected to ip/ip:7003, command: CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@7f2d1d8e(success: PONG)], command=(PING), params=[], codec=null]
2019-12-06 14:16:25.856 DEBUG [purchase,,,] 27168 --- [isson-netty-2-6] o.r.client.handler.ConnectionWatchdog    : RedisPubSubConnection@746972028 [redisClient=[addr=redis://ip:7002], channel=[id: 0xf669e17e, L:/{ip}:13517 - R:ip/ip:7002], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@27700b56(success: PONG)], command=(PING), params=[], codec=null]] connected to ip/ip:7002, command: CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@27700b56(success: PONG)], command=(PING), params=[], codec=null]
2019-12-06 14:16:25.858 DEBUG [purchase,,,] 27168 --- [sson-netty-2-27] o.r.client.handler.ConnectionWatchdog    : RedisConnection@1110446254 [redisClient=[addr=redis://ip:7003], channel=[id: 0xd19e4add, L:/{ip}:13518 - R:ip/ip:7003], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@4e7b0c7d(success: PONG)], command=(PING), params=[], codec=null]] connected to ip/ip:7003, command: CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@4e7b0c7d(success: PONG)], command=(PING), params=[], codec=null]
2019-12-06 14:16:25.947 DEBUG [purchase,,,] 27168 --- [isson-timer-4-1] o.r.client.handler.ConnectionWatchdog    : reconnecting RedisConnection@625245493 [redisClient=[addr=redis://ip:7001], channel=[id: 0xaf910bb0, L:/{ip}:9986 ! R:ip/ip:7001], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@1134c15b(failure: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (PING), params: [], Redis client: [addr=redis://ip:7001])], command=(PING), params=[], codec=org.redisson.client.codec.StringCodec]] to ip/ip:7001 
2019-12-06 14:16:25.956 DEBUG [purchase,,,] 27168 --- [isson-netty-2-7] o.r.client.handler.ConnectionWatchdog    : RedisConnection@625245493 [redisClient=[addr=redis://ip:7001], channel=[id: 0x326768c2, L:/{ip}:13520 - R:ip/ip:7001], currentCommand=CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@479241e2(success: PONG)], command=(PING), params=[], codec=null]] connected to ip/ip:7001, command: CommandData [promise=RedissonPromise [promise=ImmediateEventExecutor$ImmediatePromise@479241e2(success: PONG)], command=(PING), params=[], codec=null]
2019-12-06 14:16:32.589  WARN [purchase,,,] 27168 --- [nfoReplicator-0] c.c.c.ConfigServicePropertySourceLocator : Could not locate PropertySource: I/O error on GET request for "http://{ip}:8761/config/purchase/dev/master": Connection refused: connect; nested exception is java.net.ConnectException: Connection refused: connect
2019-12-06 14:16:34.203 DEBUG [purchase,,,] 27168 --- [sson-netty-2-29] o.r.cluster.ClusterConnectionManager     : cluster nodes state got from ip/ip:7001:
d17e9a73e2040a9ecf74566806e3b871f4043d8d ip:7001@17001 myself,master - 0 1575612989000 9 connected 0-5460
4c7e244a4f42c7fabaeda9f9ad59d082bc9c0b2d ip:7002@17002 master - 0 1575612993908 10 connected 10923-16383
a4c6b0dd5be624a72400be616c9debd766215e9b ip:7003@17003 slave d17e9a73e2040a9ecf74566806e3b871f4043d8d 0 1575612992907 9 connected
423dfe3fb3f320739a5d5de85fabe3df01b86046 ip:7001@17001 slave 9e573b72fe699f9370a676844a73697bf702192c 0 1575612992000 8 connected
4921cac2344377458d9f8ec49df23f84e55e25ac ip:7002@17002 slave 4c7e244a4f42c7fabaeda9f9ad59d082bc9c0b2d 0 1575612991906 10 connected
9e573b72fe699f9370a676844a73697bf702192c ip:7003@17003 master - 0 1575612991000 8 connected 5461-10922

case 2 : 测试环境, 服务在腾讯云,redis在本地idc机房(重新部署的测试redis),通过防火墙将测试redis端口关闭,模拟专线断开的情况;jmeter进行多线程请求,模拟访问。
结论: 同case1,日志也一样。

case 3 : 预发环境(预发环境与线上环境是非常相似的), 服务在腾讯云,redis在本地idc机房(重新部署的测试redis),通过防火墙将测试redis端口关闭,模拟专线断开的情况;jmeter进行多线程请求,模拟访问。
结论: 通case1,日志也一样。在此时,通过jmeter设置不同的线程数、不同的操作都结果都没有影响,这个时候已经没有任何想法了。不论断开多久,在恢复之后都会有理解重连。

这个时候因为一直没有复现问题,时间比较长,运维的同学开始模拟线上真实的环境(预发环境只是使用了测试redis,没有真正的切换专线,会切断整个所有的专线),会将服务拉倒一个特定的测试环境中,重新通了另外的专线。

case 4: 预发环境(测试服务环境),服务在腾讯云,redis在本地idc机房(预发的redis),断开专线,jmeter进行多线程请求,模拟访问。
结论:通过jmeter设置不同的线程数、不同的操作都结果都没有影响,不论断开多久,在恢复之后都会有理解重连。虽有零星的超时问题,都是偶然的,么有规律可言。

case 5:预发环境(测试服务环境),服务在腾讯云,redis在本地idc机房(预发的redis),断开专线,使用vpn(断开一晚上),第二天再启用专线,jmeter进行多线程请求,模拟访问。
结论: 在vpn的环境中,报错的明显增多,但是在切换专线之后,错误就没有了。这个时候开始怀疑在vpn上,但是没有证据,有些问题没有对上。

case 6:预发环境(测试服务环境),服务在腾讯云,redis在本地idc机房(预发的redis),断开专线(断开5-7分钟),使用vpn,然后再启用专线,jmeter进行多线程请求,模拟访问。
结论:这个时候问题复现了,在从断开5分钟到vpn的时候复现了,之前一直猜测vpn到专线的切换;

总结:断开的状态到vpn的环境导致了重连的失败(具体的原因在下变),case5么有出现猜测是因为且vpn比较平滑,vpn与专线的切换在网络底层做的,应用层感知不到。

case 7 :问题复现了就要解决啊,呵呵,升级就好了,从3.7.5到3.11.5就好了,fix代码下节介绍。

3 代码分析
理论上,复现了问题,解决了问题,应该就可以了,但是老板认为,没有找到代码层级的问题,不能叫修复了。好吧,找原因。
在百度、google好久之后也没有找到问题的原因,不得以开始阅读源码,代码里充斥着各种futrue、promise,从应用到netty,看到netty就头大了。
ConnectionWatchdog这个类是监控conection的主要类,redisson使用的是netty来连接server端;

在根据代码看:
在这里插入图片描述

在网络断开之后,会执行channelInactive方法,进行重连,进而执行tryconnection方法;

在这里插入图片描述
如果重连失败会一直执行这个方案,如果一旦网络恢复(专线或者vpn)会执行1处的listener,连接成功之后会执行netty的初始化的handler,
其中BaseConnectionHandler是初始换链接的channelActive方法,会发送出初始命令到server,执行结果会执行2处的listener,如果成功会更新connecton和channle,如果失败则打印日志,则connection会一直是这个状态失败,发送的命令也会失败,即使恢复专线也不行了。

在新的版本中,修复了该bug:
在这里插入图片描述
对出现之前的情况,关闭了channel,然后进行重连。

  • 4
    点赞
  • 6
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值