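1) Symptom
All operations on the cloud platform become slow.
2) Cause
When a component disconnects abruptly and later rejoins the cluster, it does not reuse its original queue but creates a new one. The original queue remains bound to the exchange, so messages routed from the exchange are still delivered to it. Because the old queue no longer has a consumer, messages pile up in it.
3) Resolution
Each cinder-scheduler process corresponds to one cinder-scheduler_fanout_{uuid} queue. We currently have two API nodes running two cinder-scheduler processes in total, so there should be two cinder-scheduler_fanout_{uuid} queues. The listing below shows three, which means one of them is a zombie queue (usually the one with accumulated messages).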
[root@VM1 ~ ]$ rabbitmqctl list_queues | grep cinder-scheduler
cinder-scheduler 0
cinder-scheduler.cinder 0
cinder-scheduler_fanout_5720c0511f654740bb639de7282a3ed0 43
cinder-scheduler_fanout_89ec88c1f9ce404089d17e68250505bb 0
cinder-scheduler_fanout_ee07d2cb126c4378b99bf11007aa879b 0
List the consumers attached to each queue, paying attention to the UUIDs.
[root@VM1 ~ ]$ rabbitmqctl list_consumers |grep cinder-scheduler_fanout_
cinder-scheduler_fanout_89ec88c1f9ce404089d17e68250505bb <rabbit@server-44.3.8145.28> 3 true 0 []
cinder-scheduler_fanout_ee07d2cb126c4378b99bf11007aa879b <rabbit@server-45.2.16539.60> 3 true 0 []
The queue cinder-scheduler_fanout_5720c0511f654740bb639de7282a3ed0 has no consumer, so it is confirmed to be the zombie queue.
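The manual comparison above can also be scripted. The following is a minimal sketch, not part of the original procedure. It assumes the queue listing is produced with `rabbitmqctl list_queues name messages consumers`, which prints three columns per queue, and `find_zombie_queues` is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads "name messages consumers" rows on stdin and
# prints fanout queues that hold messages but have no consumer attached.
find_zombie_queues() {
    awk '$1 ~ /_fanout_/ && $2 + 0 > 0 && $3 + 0 == 0 { print $1 }'
}

# Typical use on a broker node (requires rabbitmqctl):
#   rabbitmqctl list_queues name messages consumers | find_zombie_queues
```

This only reports queues that already have a backlog. A freshly orphaned queue with zero messages would not be listed, so the result should still be cross-checked against `rabbitmqctl list_consumers` before anything is deleted.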
The fix is simply to delete that queue:
rabbitmqadmin delete queue name='cinder-scheduler_fanout_5720c0511f654740bb639de7282a3ed0'
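Because deleting a queue that is still in use would disrupt a running service, it may be worth re-checking the consumer count immediately before the delete. The sketch below is one possible way to do that, under the assumption that both `rabbitmqctl` and `rabbitmqadmin` are available on the node. `delete_if_unconsumed` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: delete the named queue only if it currently has
# zero consumers; otherwise print a message and return non-zero.
delete_if_unconsumed() {
    queue="$1"
    # Look up the consumer count for exactly this queue name.
    consumers=$(rabbitmqctl list_queues name consumers |
        awk -v q="$queue" '$1 == q { print $2 }')
    if [ "$consumers" = "0" ]; then
        rabbitmqadmin delete queue name="$queue"
    else
        echo "skip $queue: consumers=${consumers:-unknown}" >&2
        return 1
    fi
}

# Example:
#   delete_if_unconsumed cinder-scheduler_fanout_5720c0511f654740bb639de7282a3ed0
```

If the queue name is not found at all, the helper also refuses to delete, which guards against typos in the UUID.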