RabbitMQ cluster failure: Application rabbit exited with reason: {{incompatible_feature_flags

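Symptoms:
In a three-node cluster, one machine is stuck in the stop_app state: the rabbit application did not start normally, and the log ends with the following: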

2023-12-13 16:37:12.696566+08:00 [notice] <0.44.0> Application rabbit exited with reason: {{incompatible_feature_flags,{badrpc,{'EXIT',{{badmap,undefined},[{maps,get,[depends_on,undefined,[]],[{file,"maps.erl"},{line,188}]},{rabbit_feature_flags,enable_dependencies,2,[{file,"rabbit_feature_flags.erl"},{line,1564}]},{rabbit_feature_flags,do_enable_locally,1,[{file,"rabbit_feature_flags.erl"},{line,1544}]},{rabbit_feature_flags,do_sync_feature_flags_with_node,1,[{file,"rabbit_feature_flags.erl"},{line,2174}]}]}}}},{rabbit,start,[normal,[]]}}
2023-12-13 16:37:12.700546+08:00 [notice] <0.44.0> Application osiris exited with reason: stopped
2023-12-13 16:37:12.703750+08:00 [notice] <0.44.0> Application sysmon_handler exited with reason: stopped
2023-12-13 16:37:12.710128+08:00 [notice] <0.44.0> Application ra exited with reason: stopped
2023-12-13 16:37:12.713783+08:00 [notice] <0.44.0> Application os_mon exited with reason: stopped
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0> rabbit_outside_app_process:
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0> {error,
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>     {rabbit,
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>         {{incompatible_feature_flags,
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>              {badrpc,
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                  {'EXIT',
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                      {{badmap,undefined},
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                       [{maps,get,
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                            [depends_on,undefined,[]],
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                            [{file,"maps.erl"},{line,188}]},
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                        {rabbit_feature_flags,enable_dependencies,2,
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                            [{file,"rabbit_feature_flags.erl"},{line,1564}]},
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                        {rabbit_feature_flags,do_enable_locally,1,
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                            [{file,"rabbit_feature_flags.erl"},{line,1544}]},
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                        {rabbit_feature_flags,do_sync_feature_flags_with_node,
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                            1,
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                            [{file,"rabbit_feature_flags.erl"},
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                             {line,2174}]}]}}}},
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>          {rabbit,start,[normal,[]]}}}}
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0> [{rabbit,start_it,1,[{file,"rabbit.erl"},{line,421}]},
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>  {rabbit_node_monitor,do_run_outside_app_fun,1,
2023-12-13 16:37:12.713994+08:00 [error] <0.24895.0>                       [{file,"rabbit_node_monitor.erl"},{line,752}]}]

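Root cause analysis:
When RabbitMQ starts, it synchronizes feature flags from the other online nodes, for example drop_unroutable_metric and empty_basic_get_metric. After starting and before synchronizing, it first enables them locally.
If another node happens to be starting at the moment of synchronization, the following can occur:
Take nodes A and B as an example. A has already enabled the flags locally and tries to synchronize with B. B has only just started and its flags are still disabled, so the two sides disagree, the incompatible_feature_flags error is raised, and A ultimately fails to start.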
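The stack trace above ends in maps:get(depends_on, undefined, []), which means the properties looked up for a flag were undefined rather than a map. The following is a toy Python model of that failure shape, not RabbitMQ's actual code: the NodeRegistry class, its initialized switch, and the empty depends_on lists are all hypothetical, and the idea that the remote node's registry is simply not populated yet is an assumption made for illustration.

```python
# Toy model only: names and data below are hypothetical, not RabbitMQ internals.
FLAGS = ["drop_unroutable_metric", "empty_basic_get_metric"]


class NodeRegistry:
    """Stand-in for one node's feature flag registry."""

    def __init__(self, initialized):
        # Assumption: a node that has only just started has no flag data yet.
        self.flags = {name: {"depends_on": []} for name in FLAGS} if initialized else None

    def properties(self, name):
        # Returns None (the analogue of Erlang's `undefined`) when nothing is known.
        if self.flags is None:
            return None
        return self.flags.get(name)


def enable_dependencies(registry, name):
    props = registry.properties(name)
    # Analogue of maps:get(depends_on, Props, []): fails when props is not a map.
    return props.get("depends_on", [])


def sync_feature_flags(remote, names):
    """Try to enable each flag on the remote node; report a mismatch on failure."""
    try:
        for name in names:
            enable_dependencies(remote, name)
    except AttributeError:
        # Analogue of {badmap, undefined} surfacing as incompatible_feature_flags.
        return "incompatible_feature_flags"
    return "ok"


if __name__ == "__main__":
    print(sync_feature_flags(NodeRegistry(initialized=True), FLAGS))
    print(sync_feature_flags(NodeRegistry(initialized=False), FLAGS))
```

In this sketch, synchronizing against a fully started node succeeds, while synchronizing against a node that is still starting fails in the same place the trace points to.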

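Trigger scenario analysis:
1. When the server is started normally through rabbitmq-server, this error makes the process exit outright, so the failure can be detected and fixed by starting the service a second time.
2. In autoheal mode, in most scenarios RabbitMQ does not remain in the stop_app state because of a network outage or because other machines stopped, so it is unlikely that B happens to be starting at the moment A starts.
3. In a three-node cluster in pause_minority mode, when the machines forming the majority reboot, the remaining machine also runs stop_app. For example, with nodes A, B, and C, stopping the majority B and C makes A stop its rabbit app. When the reboot finishes, the start_app issued by pause_minority mode may fail because the feature flags are inconsistent. If the service is supervised by keepalived, it cannot be restarted and recovered automatically.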

Solution:
Use a script to detect the characteristic error log shown above and restart RabbitMQ.
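A minimal sketch of such a script, under stated assumptions: the log path is supplied by the caller (its location depends on the installation), the function names are hypothetical, and recovery is done with rabbitmqctl start_app, which only helps if the Erlang VM is still running as in the stop_app state described above. The matched signature is copied from the log excerpt in this post.

```python
import re
import subprocess

# Signature copied from the [notice] line in the log excerpt of this post.
FAILURE_PATTERN = re.compile(
    r"Application rabbit exited with reason: \{\{incompatible_feature_flags"
)


def has_feature_flag_failure(log_text):
    """Return True if the text contains the characteristic startup failure."""
    return FAILURE_PATTERN.search(log_text) is not None


def read_new_log(path, offset=0):
    """Read the log from a byte offset; return the text and the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8", errors="replace"), offset + len(data)


def restart_rabbit_app():
    """Start the rabbit application again on the local node."""
    subprocess.run(["rabbitmqctl", "start_app"], check=True)


def check_and_recover(path, offset=0, restart=restart_rabbit_app):
    """Scan only the new part of the log and restart once if the failure appears."""
    text, new_offset = read_new_log(path, offset)
    if has_feature_flag_failure(text):
        restart()
        return True, new_offset
    return False, new_offset
```

A caller (cron job, keepalived check script, or a simple loop) would keep the returned offset between runs so that an old failure line does not trigger repeated restarts. The restart callback is injectable so the detection logic can be exercised without touching a real broker.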
