MongoDB 经常报 “Closed connection“ 的问题排查修复

现象

通过消息队列(RabbitMQ)将队列内的消息消费存入MongoDB,因为本来内容就不多所以基本应该是实时的,但是却不知道为什么会偶发消息消费推迟20~30分钟的情况,排查了RabbitMQ和MongoDB,网上没有找到相关问题的帖子,之后排查推测是 MongoDB 连接的问题

报错日志

2023-12-20 15:03:38.051 [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#3-1] WARN  org.mongodb.driver.connection - Got socket exception on connection [connectionId{localValue:109, serverValue:1993}] to test-mongodb03:17017. All connections to test-mongodb03:17017 will be closed.
2023-12-20 15:03:38.051 [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#3-1] INFO  org.mongodb.driver.connection - Closed connection [connectionId{localValue:109, serverValue:1993}] to test-mongodb03:17017 because there was a socket exception raised by this connection.
2023-12-20 15:03:38.052 [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#3-1] INFO  org.mongodb.driver.cluster - No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, serverDescriptions=[ServerDescription{address=test-mongodb02:17017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[6, 0, 10]}, minWireVersion=0, maxWireVersion=17, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=39535116, setName='test', canonicalAddress=test-mongodb02:17017, hosts=[test-mongodb03:17017, test-mongodb02:17017, test-mongodb01:17017], passives=[], arbiters=[], primary='test-mongodb03:17017', tagSet=TagSet{[]}, electionId=null, setVersion=1, lastWriteDate=Wed Dec 20 15:03:29 GMT+08:00 2023, lastUpdateTimeNanos=5350705627747041}, ServerDescription{address=test-mongodb03:17017, type=UNKNOWN, state=CONNECTING}, ServerDescription{address=test-mongodb01:17017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[6, 0, 10]}, minWireVersion=0, maxWireVersion=17, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=39518862, setName='test', canonicalAddress=test-mongodb01:17017, hosts=[test-mongodb03:17017, test-mongodb02:17017, test-mongodb01:17017], passives=[], arbiters=[], primary='test-mongodb03:17017', tagSet=TagSet{[]}, electionId=null, setVersion=1, lastWriteDate=Wed Dec 20 15:03:29 GMT+08:00 2023, lastUpdateTimeNanos=5350705837909970}]}. Waiting for 30000 ms before timing out
2023-12-20 15:03:38.091 [cluster-ClusterId{value='65797632227a450fe86e23d5', description='null'}-test-mongodb03:17017] INFO  org.mongodb.driver.cluster - Discovered replica set primary test-mongodb03:17017
2023-12-20 15:03:38.091 [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#3-1] INFO  org.mongodb.driver.connection - Closed connection [connectionId{localValue:110, serverValue:1992}] to test-mongodb03:17017 because there was a socket exception raised on another connection from this pool.
2023-12-20 15:03:38.091 [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#3-1] INFO  org.mongodb.driver.connection - Closed connection [connectionId{localValue:111, serverValue:1994}] to test-mongodb03:17017 because there was a socket exception raised on another connection from this pool.
2023-12-20 15:03:38.362 [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#3-1] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:112, serverValue:2004}] to test-mongodb03:17017
2023-12-20 15:03:38.683 [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#3-1] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:113, serverValue:2005}] to test-mongodb03:17017
2023-12-20 15:03:40.218 [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#3-1] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:114, serverValue:2008}] to test-mongodb03:17017

解决方式

根据MongoDB官方文档

FAQ: MongoDB Diagnostics — MongoDB Manual

sysctl net.ipv4.tcp_keepalive_time

获得的值小于等于120(秒)(2分钟)会比较好,但是目前设置的是1200(秒)(20分钟)

也基本符合测试那边的问题,就是数据差不多延迟了二三十分钟(虽然官方文档中说大于5分钟的都无效,会被设置为5分钟)

修改语句

sudo sysctl -w net.ipv4.tcp_keepalive_time=120

要防止重启之后配置变回原来的还得改配置文件,这边没有权限后续就没有操作了,不过步骤写的很清楚,交给运维就行

另外根据官方文档

FAQ: MongoDB Diagnostics — MongoDB Manual

还需要修改 net.ipv4.tcp_retries2 配置,通过以下语句查询,小于等于5会比较好

sysctl net.ipv4.tcp_retries2

可以看到系统原来设置的是15,这边通过

sudo sysctl -w net.ipv4.tcp_retries2=5

进行修改,修改后可以看到值已经变成5了

要防止重启之后配置变回原来的还得改配置文件,跟上面一样是同一个文件,交给运维就行

经过修复之后,该异常已经一周并未出现,推测问题已经解决

  • 10
    点赞
  • 12
    收藏
    觉得还不错? 一键收藏
  • 4
    评论
评论 4
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值