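Fixing RabbitMQ that won't start because of corrupted queue persistence data
Environment
- OS: CentOS 6.5
- RabbitMQ: 3.6.5
Problem description
- The disk filled up, which crashed RabbitMQ and left it unable to serve requests
- After disk space was freed, RabbitMQ did not recover on its own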
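The failure started with a full disk, so it is worth confirming that enough space is actually free again before restarting. Below is a minimal shell sketch that assumes a POSIX `df`. The function name `has_free_space` and the roughly 1 GB threshold in the example are illustrative only. Neither is defined by RabbitMQ.

```shell
#!/bin/sh
# Check that the filesystem holding DIR has at least MIN_KB kilobytes
# available. Prints the available KB; returns 0 if it is enough.
has_free_space() {
    dir=$1
    min_kb=$2
    # -P: POSIX output (one line per filesystem, no wrapping)
    # -k: sizes in 1024-byte blocks
    # Column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    echo "$avail_kb"
    [ "$avail_kb" -ge "$min_kb" ]
}

# Example (hypothetical threshold of about 1 GB):
# has_free_space /var/lib/rabbitmq 1048576 || echo "still low on disk space"
```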
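Troubleshooting steps
- service rabbitmq-server stop could not shut RabbitMQ down cleanly
- Find the leftover processes with ps -AF | grep rabbitmq and kill every RabbitMQ-related one
- Restarting with service rabbitmq-server start reported a startup error
- Check the service status: service rabbitmq-server status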
Status of node rabbit@gitlab ...
Error: unable to connect to node rabbit@gitlab: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit@gitlab]
rabbit@gitlab:
* connected to epmd (port 4369) on gitlab
* epmd reports: node 'rabbit' not running at all
other nodes on gitlab: ['rabbitmq-cli-11']
* suggestion: start the node
current node details:
- node name: 'rabbitmq-cli-11@localhost'
- home dir: /var/lib/rabbitmq
- cookie hash: qib2bCkQ8XIJmRtJP4qxFg==
- Check the error log: cat /var/log/rabbitmq/rabbit@gitlab-sasl.log
=ERROR REPORT==== 24-Aug-2017::17:33:47 ===
** Generic server <0.215.0> terminating
** Last message in was {'$gen_cast',
{submit_async,
#Fun<rabbit_queue_index.32.103862237>}}
** When Server state == undefined
** Reason for termination ==
** {{case_clause,undefined},
[{rabbit_queue_index,add_segment_relseq_entry,3,
[{file,"src/rabbit_queue_index.erl"},{line,1091}]},
{rabbit_queue_index,parse_segment_entries,3,
[{file,"src/rabbit_queue_index.erl"},{line,1075}]},
{rabbit_queue_index,'-recover_journal/1-fun-0-',1,
[{file,"src/rabbit_queue_index.erl"},{line,863}]},
{lists,map,2,[{file,"lists.erl"},{line,1239}]},
{rabbit_queue_index,segment_map,2,
[{file,"src/rabbit_queue_index.erl"},{line,989}]},
{rabbit_queue_index,recover_journal,1,
[{file,"src/rabbit_queue_index.erl"},{line,856}]},
{rabbit_queue_index,scan_segments,3,
[{file,"src/rabbit_queue_index.erl"},{line,676}]},
{rabbit_queue_index,queue_index_walker_reader,2,
[{file,"src/rabbit_queue_index.erl"},{line,664}]}]}
=INFO REPORT==== 24-Aug-2017::17:44:51 ===
Error description:
{could_not_start,rabbit,
{{badmatch,
{error,
{{{{case_clause,undefined},
[{rabbit_queue_index,add_segment_relseq_entry,3,
[{file,"src/rabbit_queue_index.erl"},{line,1091}]},
{rabbit_queue_index,parse_segment_entries,3,
[{file,"src/rabbit_queue_index.erl"},{line,1075}]},
{rabbit_queue_index,'-recover_journal/1-fun-0-',1,
[{file,"src/rabbit_queue_index.erl"},{line,863}]},
{lists,map,2,[{file,"lists.erl"},{line,1239}]},
{rabbit_queue_index,segment_map,2,
[{file,"src/rabbit_queue_index.erl"},{line,989}]},
{rabbit_queue_index,recover_journal,1,
[{file,"src/rabbit_queue_index.erl"},{line,856}]},
{rabbit_queue_index,scan_segments,3,
[{file,"src/rabbit_queue_index.erl"},{line,676}]},
{rabbit_queue_index,queue_index_walker_reader,2,
[{file,"src/rabbit_queue_index.erl"},{line,664}]}]},
{gen_server2,call,[<0.266.0>,out,infinity]}},
{child,undefined,msg_store_persistent,
{rabbit_msg_store,start_link,
[msg_store_persistent,
"/var/lib/rabbitmq/mnesia/rabbit@gitlab",[],
{#Fun<rabbit_queue_index.2.103862237>,
{start,
[{resource,<<"yun">>,queue,
<<"com.yun.app.api.internal.AppUserEvents:1.0.1:app-server">>},
{resource,<<"/">>,queue,
<<"com.yun.kcbp.finance.api.TicketService:1.0.1">>},
{resource,<<"yun">>,queue,
<<"com.yun.park.api.internal.ParkRecordEvents:1.0.1:park-server">>},
{resource,<<"yun">>,queue,
<<"com.yun.park.api.internal.ParkBusinessEvents:1.0.1:app-server">>},
{resource,<<"yun">>,queue,
<<"com.yun.park.api.internal.ParkRecordEvents:1.0.1:monitor-server">>},
{resource,<<"yun">>,queue,
<<"com.yun.sys.api.internal.SysLogEvents:1.0.1:sys-server">>},
{resource,<<"yun">>,queue,
<<"com.yun.park.api.internal.ParkRecordEvents:1.0.1:app-server">>}]}}]},
transient,30000,worker,
[rabbit_msg_store]}}}},
[{rabbit_variable_queue,start_msg_store,2,
[{file,"src/rabbit_variable_queue.erl"},{line,454}]},
{rabbit_variable_queue,start,1,
[{file,"src/rabbit_variable_queue.erl"},{line,436}]},
{rabbit_priority_queue,start,1,
[{file,"src/rabbit_priority_queue.erl"},{line,92}]},
{rabbit_amqqueue,recover,0,
[{file,"src/rabbit_amqqueue.erl"},{line,239}]},
{rabbit,recover,0,[{file,"src/rabbit.erl"},{line,652}]},
{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,
[{file,"src/rabbit_boot_steps.erl"},{line,49}]},
{rabbit_boot_steps,run_step,2,
[{file,"src/rabbit_boot_steps.erl"},{line,49}]},
{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,
[{file,"src/rabbit_boot_steps.erl"},{line,26}]}]}}
Log files (may contain more information):
/var/log/rabbitmq/rabbit@gitlab.log
/var/log/rabbitmq/rabbit@gitlab-sasl.log
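The key signature in the log above is a case_clause crash inside rabbit_queue_index while the journal is replayed (recover_journal) during boot. A small shell helper can check a log file for that pattern. The name `has_queue_index_crash` is hypothetical. Matching on these two strings is a heuristic taken from this particular trace, not an official diagnostic.

```shell
#!/bin/sh
# Return 0 if LOG contains the queue-index recovery crash seen above:
# a {case_clause,undefined} error raised while
# rabbit_queue_index:recover_journal runs at boot.
has_queue_index_crash() {
    log=$1
    grep -q 'rabbit_queue_index,recover_journal' "$log" &&
        grep -q 'case_clause,undefined' "$log"
}

# Example, using the log path from this article:
# has_queue_index_crash /var/log/rabbitmq/rabbit@gitlab-sasl.log && echo "queue index looks corrupt"
```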
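Final solution
- Searching Baidu and Google turned up nothing
- Reading the log carefully, the error happens at startup, while RabbitMQ is recovering the contents of its queues (the stack trace goes through rabbit_queue_index:recover_journal)
- Probably, because the disk was full, an error occurred while queue data was being written to disk, which left the data files in an invalid format
- Go to the directory: cd /var/lib/rabbitmq/mnesia/rabbit@gitlab/queues
- This directory holds the queues' data files
- Find the data files in its subdirectories and delete the ones that may be corrupted (or delete them all, though the messages in those queues will then be lost)
- Then restart, and RabbitMQ comes back up normally: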
- service rabbitmq-server start
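Deleting files under queues/ cannot be undone. A somewhat safer variant of the same fix moves the queue directories into a backup location, so they can be inspected or restored later. The sketch below assumes two things: RabbitMQ, including any leftover beam processes, is fully stopped, and the 3.6.x layout shown above applies. `quarantine_queue_dirs` and the backup path are hypothetical.

```shell
#!/bin/sh
# Move every per-queue subdirectory of QUEUES_DIR into BACKUP_DIR
# instead of deleting it. Assumes RabbitMQ is not running.
quarantine_queue_dirs() {
    queues_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for d in "$queues_dir"/*; do
        [ -d "$d" ] || continue     # skip non-directories / unmatched glob
        mv "$d" "$backup_dir"/ || return 1
    done
    return 0
}

# Example (hypothetical backup path):
# quarantine_queue_dirs /var/lib/rabbitmq/mnesia/rabbit@gitlab/queues \
#     /root/rabbitmq-queues-backup-$(date +%Y%m%d%H%M%S)
# service rabbitmq-server start
```

As with deleting them, the messages held in those queues are not recovered. The difference is that the original files are kept. If you later move a directory back, make sure it is still owned by the rabbitmq user.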
PS: Luckily this was only a test server, so losing some data was not a big deal.