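We run a number of services that use RabbitMQ as messaging middleware. While the services are running, we see a service stop listening for requests, and we have to restart it to get it going again. We run multiple services in the same JVM (Oracle WebLogic application server), but not all of them stop; usually it is only one, and not always the same one.
It does not appear to be load related, because the services have very different load profiles.
We set up the heartbeat protocol, but that did not solve the problem. Since the problem also occurs in our test environment, we set the log level to debug, hoping this would reveal the cause and hint at a solution.
Unfortunately it did not, or we lack the understanding of what is going on (we are not very familiar with the spring-amqp and amqp-client libraries, although we have read the Spring documentation).
We see the following exception in the log, after which nothing happens until the service is restarted: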
;2015-05-08 00:35:54,578; DEBUG; servicename=; clusterid=; username=; msguuid=; org.springframework.amqp.rabbit.core.RabbitAdmin - Declarations finished
;2015-05-08 00:37:12,015; ERROR; servicename=; clusterid=; username=; msguuid=; nl.pharmapartners.amqp.connectors.AmqpMessageListenerContainer - Consumer received fatal exception on startup
;org.springframework.amqp.rabbit.listener.FatalListenerStartupException: Authentication failure
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:367)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:963)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.springframework.amqp.AmqpAuthenticationException: com.rabbitmq.client.PossibleAuthenticationFailureException: Possibly caused by authentication failure
at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:57)
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:195)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:359)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createBareChannel(CachingConnectionFactory.java:309)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getCachedChannelProxy(CachingConnectionFactory.java:283)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getChannel(CachingConnectionFactory.java:276)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.access$600(CachingConnectionFactory.java:69)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory$ChannelCachingConnectionProxy.createChannel(CachingConnectionFactory.java:614)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils$1.createChannel(ConnectionFactoryUtils.java:85)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.doGetTransactionalResourceHolder(ConnectionFactoryUtils.java:134)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.getTransactionalResourceHolder(ConnectionFactoryUtils.java:67)
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:363)
... 2 more
Caused by: com.rabbitmq.client.PossibleAuthenticationFailureException: Possibly caused by authentication failure
at com.rabbitmq.client.impl.AMQConnection.start(AMQConnection.java:373)
at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:516)
at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:545)
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:191)
... 12 more
Caused by: com.rabbitmq.client.AlreadyClosedException: clean connection shutdown; reason: Attempt to use closed channel
at com.rabbitmq.client.impl.AMQChannel.ensureIsOpen(AMQChannel.java:190)
at com.rabbitmq.client.impl.AMQChannel.rpc(AMQChannel.java:223)
at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:209)
at com.rabbitmq.client.impl.AMQChannel.rpc(AMQChannel.java:202)
at com.rabbitmq.client.impl.AMQConnection.start(AMQConnection.java:355)
... 15 more
2015-05-08 00:37:12,032; ERROR; servicename=; clusterid=; username=; msguuid=; nl.pharmapartners.amqp.connectors.AmqpMessageListenerContainer - Stopping container from aborted consumer
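We have no real clue what is going on. Before this, we see the exchanges, queues and bindings being re-declared every minute. The error is logged roughly 1.5 minutes after a re-declaration finished (77 seconds in the excerpt above). Nothing in the logs indicates a problem with RabbitMQ, so we do not understand the reported attempt to use an already closed connection. These messages all come from the amqp-client or spring-amqp libraries.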
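As a sanity check on the timing, the gap between two log timestamps can be computed with a small sketch like the one below. `LogGap` is a hypothetical helper name of ours, it uses only the JDK, and the pattern assumes the `yyyy-MM-dd HH:mm:ss,SSS` layout seen in the log lines above. For the "Declarations finished" and "Consumer received fatal exception on startup" entries it yields 77437 ms, i.e. about 77 seconds.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of spring-amqp or amqp-client).
final class LogGap {

    // Assumed timestamp layout, taken from the log lines quoted above.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Returns the number of milliseconds from 'earlier' to 'later'.
    static long millisBetween(String earlier, String later) {
        LocalDateTime start = LocalDateTime.parse(earlier, LOG_TIME);
        LocalDateTime end = LocalDateTime.parse(later, LOG_TIME);
        return Duration.between(start, end).toMillis();
    }

    public static void main(String[] args) {
        long gap = millisBetween("2015-05-08 00:35:54,578",
                                 "2015-05-08 00:37:12,015");
        System.out.println("gap in ms: " + gap);
    }
}
```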
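I could not find any related error logging on the Rabbit cluster, but I only looked in the /var/log/rabbitmq directory.
The log seems to indicate an authentication problem, but other services running on the same node with the same authentication parameters have no problems (and the service itself had been running for a long time, up to several days, without any problem). The root exception indicates that the library obtained a new channel, but that the channel was already closed before it was used.
Can someone explain what is going on and how this might be resolved?
We run RabbitMQ 3.3.3 / Erlang 17, Spring AMQP version 1.3.2, amqp-client 3.2.4.
Additional information: we use the default configuration for the various components, except for the following: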
#The waiting time (ms) for a response in amqp (for rpc calls)
amqp.timeout=30000
prefetch.count=1
max.concurrent.consumers=5
start.consumer.min.interval=500
stop.consumer.min.interval=5000
consecutive.active.trigger=3
consecutive.idle.trigger=3
amqp.message.ttl=900000
amqp.heartbeat=5
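A RabbitMQ policy removes queues that have had no consumers for more than 120 seconds. All queues are mirrored to all nodes in the Rabbit cluster (3 nodes).
We have already found that we use a CachingConnectionFactory with its default cache size of 1. We are going to change this, since it does not seem appropriate for 5 concurrent consumers; we think it should be at least equal to the maximum number of consumers.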
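To make the sizing idea concrete, here is a minimal sketch of the check we have in mind. It uses only the JDK; `ChannelCacheCheck` and its method are hypothetical names of ours, not part of spring-amqp. The rule "cache size at least equal to `max.concurrent.consumers`" is our own assumption, not something we found confirmed in the documentation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper (not part of spring-amqp or amqp-client).
final class ChannelCacheCheck {

    // Default channel cache size of CachingConnectionFactory, as noted above.
    static final int DEFAULT_CHANNEL_CACHE_SIZE = 1;

    // Reads settings in java.util.Properties syntax and returns a cache size
    // that is never smaller than max.concurrent.consumers (our assumption).
    static int suggestedChannelCacheSize(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read properties", e);
        }
        String raw = props.getProperty("max.concurrent.consumers", "1");
        int maxConsumers = Integer.parseInt(raw.trim());
        return Math.max(DEFAULT_CHANNEL_CACHE_SIZE, maxConsumers);
    }

    public static void main(String[] args) {
        String settings = "prefetch.count=1\nmax.concurrent.consumers=5\n";
        System.out.println("suggested channel cache size: "
                + suggestedChannelCacheSize(settings));
    }
}
```

The resulting value would then be passed to `CachingConnectionFactory.setChannelCacheSize(int)` when the connection factory is configured. Whether this has anything to do with the stopped consumers is still an open question for us.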
Thanks in advance,
Wim Veldhuis.