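1. Problem Description and Solution
- Problem: The application logs showed that both producers and consumers were "stuck", yet no errors were logged. The RabbitMQ management UI showed every connection in the blocked state, together with a disk space warning.
- Solution: Lowered disk_free_limit.absolute in the configuration file from 20 GB to 10 GB and restarted RabbitMQ, which resolved the problem.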
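2. Root Cause Analysis
While running, a RabbitMQ node consumes varying amounts of memory and disk space depending on the workload. When usage spikes, both memory and free disk space can reach potentially dangerous levels. For example, steadily growing memory consumption can get the node killed by the operating system's out-of-memory (OOM) process termination mechanism.
To reduce the likelihood of these scenarios, RabbitMQ has two configurable resource watermarks. When a watermark is reached, RabbitMQ blocks connections that publish messages. This happens:
- when memory usage rises above the configured watermark;
- when free disk space drops below the configured limit.
Put simply, the memory check watches for the level rising, and the disk check watches for the level falling.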
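The two comparisons above can be sketched in a few lines of Java. This only illustrates the direction of each check and is not how RabbitMQ implements it; the class name `ResourceAlarms`, its methods, and the sample byte values (15 GB free disk) are hypothetical.

```java
/** Illustrative only: models the two watermark comparisons, not RabbitMQ internals. */
final class ResourceAlarms {
    private ResourceAlarms() {}

    /** Memory alarm: usage has risen ABOVE the configured watermark. */
    static boolean memoryAlarm(long usedBytes, long highWatermarkBytes) {
        return usedBytes > highWatermarkBytes;
    }

    /** Disk alarm: free space has fallen BELOW the configured limit. */
    static boolean diskAlarm(long freeBytes, long freeLimitBytes) {
        return freeBytes < freeLimitBytes;
    }

    /** Publishing connections are blocked while either alarm is in effect. */
    static boolean publishersBlocked(long usedBytes, long highWatermarkBytes,
                                     long freeBytes, long freeLimitBytes) {
        return memoryAlarm(usedBytes, highWatermarkBytes)
                || diskAlarm(freeBytes, freeLimitBytes);
    }

    public static void main(String[] args) {
        long gb = 1_000_000_000L;
        // Assumed numbers: 1 GB memory used, 2 GB watermark, 15 GB disk free.
        // With a 20 GB disk limit the disk alarm fires.
        System.out.println(publishersBlocked(1 * gb, 2 * gb, 15 * gb, 20 * gb));
        // With the limit lowered to 10 GB the same host is no longer in alarm.
        System.out.println(publishersBlocked(1 * gb, 2 * gb, 15 * gb, 10 * gb));
    }
}
```

Under these assumed numbers, lowering the disk limit clears the alarm without freeing any space, which matches the fix in section 1. The safety margin shrinks accordingly.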
2.1 Client Notifications
Modern client libraries support connection.blocked notification (a protocol extension), so
applications can monitor when they are blocked.
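One way an application might react to these notifications is to keep a small piece of shared state that its publishing code can consult. The sketch below is dependency-free and the `BlockedState` class is hypothetical. With the RabbitMQ Java client, the two callbacks would typically be registered through `Connection.addBlockedListener`; that wiring is left out here.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder for the blocked/unblocked state of one connection. */
final class BlockedState {
    // null means "not blocked"; otherwise the reason reported with the notification.
    private final AtomicReference<String> reason = new AtomicReference<>(null);

    /** Call from the connection.blocked callback. */
    void onBlocked(String why) {
        reason.set(why == null ? "unknown" : why);
    }

    /** Call from the connection.unblocked callback. */
    void onUnblocked() {
        reason.set(null);
    }

    boolean isBlocked() {
        return reason.get() != null;
    }

    /** Returns the last reported reason, or null when not blocked. */
    String reason() {
        return reason.get();
    }

    public static void main(String[] args) {
        BlockedState state = new BlockedState();
        state.onBlocked("low on disk"); // sample reason text, assumed for illustration
        if (state.isBlocked()) {
            // A real publisher might log this, pause, or fail fast instead of hanging.
            System.out.println("publishing paused: " + state.reason());
        }
        state.onUnblocked();
        System.out.println("blocked = " + state.isBlocked());
    }
}
```

Logging the blocked state would have made the silent "stuck" behavior from section 1 visible in the application logs.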
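2.2 Alarms in a Cluster
In a RabbitMQ cluster, memory and disk alarms are cluster-wide: if one node raises an alarm, every node in the cluster blocks connections. The intent is to stop producers only, so that consumers can keep consuming unaffected. However, the protocol allows a producer and a consumer to operate on the same channel, or on different channels, of a single connection, so this logic is necessarily imperfect. In practice, it is recommended to use any single connection only for publishing or only for consuming.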
2.3 Effects on Data Safety
When an alarm is in effect, publishing connections will be blocked by TCP back pressure. In practice this means that publish operations will
eventually time out or fail outright. Application developers must be prepared to handle such failures and use publisher confirms to keep
track of what messages have been successfully handled and processed by RabbitMQ.
** It is strongly recommended that OS swap or page files are enabled. **
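To make the publisher-confirm bookkeeping mentioned above concrete, here is a minimal, dependency-free sketch of tracking which messages are still unconfirmed. The `OutstandingConfirms` class is hypothetical. In the RabbitMQ Java client, the sequence numbers and the ack/nack events would come from the channel's confirm support (for example `Channel.getNextPublishSeqNo()` and `Channel.addConfirmListener(...)`); that wiring is left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical tracker for messages published but not yet confirmed by the broker. */
final class OutstandingConfirms {
    // Sorted by publish sequence number so "multiple" confirms can clear a prefix.
    private final ConcurrentNavigableMap<Long, String> pending = new ConcurrentSkipListMap<>();

    /** Record a message right before (or as) it is published. */
    void published(long seqNo, String body) {
        pending.put(seqNo, body);
    }

    /** Broker acked seqNo; with multiple=true, everything up to and including seqNo. */
    void ack(long seqNo, boolean multiple) {
        if (multiple) {
            pending.headMap(seqNo, true).clear();
        } else {
            pending.remove(seqNo);
        }
    }

    /** Broker nacked; returns the affected message bodies so the caller can retry them. */
    List<String> nack(long seqNo, boolean multiple) {
        List<String> toRetry = new ArrayList<>();
        if (multiple) {
            ConcurrentNavigableMap<Long, String> head = pending.headMap(seqNo, true);
            toRetry.addAll(head.values());
            head.clear();
        } else {
            String body = pending.remove(seqNo);
            if (body != null) {
                toRetry.add(body);
            }
        }
        return toRetry;
    }

    /** Messages whose fate is still unknown, e.g. after a timeout while blocked. */
    int pendingCount() {
        return pending.size();
    }
}
```

Anything still in the pending map when a publish times out has not been confirmed, so the application can decide whether to retry it or report it.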
2.4 Configuring the Memory Threshold
- Relative (percentage) setting
0.4 means 40% of memory.
# new style config format, recommended
vm_memory_high_watermark.relative = 0.4
- Absolute value setting
vm_memory_high_watermark.absolute = 2GB
2.5 Configuring the Free Disk Space Limit
- Relative to memory size (memory size * 1.0)
disk_free_limit.relative = 1.0
- Absolute value setting
disk_free_limit.absolute = 1GB
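As a worked example of the relative settings in 2.4 and 2.5, the sketch below turns a fraction into a byte threshold. The host size (16 GiB of RAM) and the `ThresholdMath` class are assumptions for illustration only; the real values are computed by RabbitMQ itself.

```java
/** Hypothetical helper: turns relative settings into byte thresholds. */
final class ThresholdMath {
    private ThresholdMath() {}

    /** vm_memory_high_watermark.relative: fraction of total RAM. */
    static long memoryHighWatermarkBytes(long totalRamBytes, double relative) {
        return (long) (totalRamBytes * relative);
    }

    /** disk_free_limit.relative: multiple of total RAM that must remain free on disk. */
    static long diskFreeLimitBytes(long totalRamBytes, double relative) {
        return (long) (totalRamBytes * relative);
    }

    public static void main(String[] args) {
        long totalRam = 16L * 1024 * 1024 * 1024; // assumed host: 16 GiB of RAM

        // relative = 0.4 -> publishers are blocked above roughly 6.4 GiB of memory use
        System.out.println("memory watermark bytes: " + memoryHighWatermarkBytes(totalRam, 0.4));

        // relative = 1.0 -> publishers are blocked once free disk drops below 16 GiB
        System.out.println("disk free limit bytes:  " + diskFreeLimitBytes(totalRam, 1.0));
    }
}
```

With a relative disk limit, the threshold grows with the machine's RAM. An absolute value keeps it fixed regardless of host size.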
3. Summary
When creating RabbitMQ connections, keep consumers and producers on separate connections. If you use Spring AMQP, it already provides this feature:
// To avoid deadlocked connections, it is generally recommended to use a separate connection for publishers
// and consumers (except when a publisher is participating in a consumer transaction).
rabbitTemplate.setUsePublisherConnection(true);
If you are on an SSM stack, you need to configure two connection factories manually:
// Shared builder: every call returns a new, identically configured factory.
private CachingConnectionFactory getCachingConnectionFactory() {
    CachingConnectionFactory cachingConnectionFactory = new CachingConnectionFactory();
    cachingConnectionFactory.setAddresses(rabbitProperties.getAddresses());
    cachingConnectionFactory.setUsername(rabbitProperties.getUsername());
    cachingConnectionFactory.setPassword(rabbitProperties.getPassword());
    cachingConnectionFactory.setVirtualHost(rabbitProperties.getVirtualHost());
    cachingConnectionFactory.setPublisherConfirms(rabbitProperties.isPublisherConfirms());
    cachingConnectionFactory.setPublisherReturns(rabbitProperties.isPublisherReturns());
    cachingConnectionFactory.addChannelListener(rabbitChannelListener);
    cachingConnectionFactory.addConnectionListener(rabbitConnectionListener);
    cachingConnectionFactory.setRecoveryListener(rabbitRecoveryListener);
    return cachingConnectionFactory;
}

// Dedicated factory for consumers.
@Bean("test-consumer-connection-factory")
public CachingConnectionFactory consumerCachingConnectionFactory() {
    return getCachingConnectionFactory();
}

// Default (primary) factory, intended here for everything else, i.e. publishers.
@Bean
@Primary
public CachingConnectionFactory cachingConnectionFactory() {
    return getCachingConnectionFactory();
}