Ceph monitor failure recovery
Check the Ceph health status
[root@bgw-os-node151 ~]# ceph health
HEALTH_OK
[root@bgw-os-node151 ~]# ceph health detail
HEALTH_OK
[root@bgw-os-node151 ~]# ceph mon stat
e2: 3 mons at {bgw-os-node151=10.240.216.151:6789/0,bgw-os-node152=10.240.216.152:6789/0,bgw-os-node153=10.240.216.153:6789/0}, election epoch 12, quorum 0,1,2 bgw-os-node151,bgw-os-node152,bgw-os-node153
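Besides ceph mon stat, the quorum membership can be inspected in more detail with ceph quorum_status (a minimal example; the --format json-pretty option only changes how the output is printed):
[root@bgw-os-node151 ~]# ceph quorum_status --format json-pretty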
Failure 1: a Ceph mon process exits abnormally while the node itself keeps running
Error messages
[root@bgw-os-node151 ~]# ceph health detail
HEALTH_WARN 1 mons down, quorum 0,1 bgw-os-node151,bgw-os-node152
mon.bgw-os-node153 (rank 2) addr 10.240.216.153:6789/0 is down (out of quorum)
Solution
For this type of failure, restarting the corresponding mon process is enough to recover:
[root@bgw-os-node153 ceph]# service ceph -c /etc/ceph/ceph.conf start mon.bgw-os-node153
=== mon.bgw-os-node153 ===
Starting Ceph mon.bgw-os-node153 on bgw-os-node153...
Starting ceph-create-keys on bgw-os-node153...
[root@bgw-os-node153 ceph]# ps aux | grep mon
dbus 2215 0.0 0.0 21588 2448 ? Ss May08 0:00 dbus-daemon --system
root 18516 0.1 0.0 151508 15612 pts/0 Sl 14:57 0:00 /usr/bin/ceph-mon -i bgw-os-node153 --pid-file /var/run/ceph/mon.bgw-os-node153.pid -c /etc/ceph/ceph.conf --cluster ceph
root 18544 0.0 0.0 103308 2092 pts/0 S+ 14:57 0:00 grep mon
[root@bgw-os-node153 ceph]# ceph health detail
HEALTH_OK
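If the mon process exits again shortly after the restart, it is worth checking the monitor log for the cause before anything else. The path below assumes the default cluster name ceph and the default log location:
[root@bgw-os-node153 ceph]# tail -n 50 /var/log/ceph/ceph-mon.bgw-os-node153.log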
Failure 2: more than half of the mon processes in the Ceph cluster are down
In general, a production Ceph cluster runs 2n+1 (n >= 0) monitors, and the Paxos-based quorum can only be maintained as long as a majority of them (at least n+1) are alive.
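As a concrete example of the quorum arithmetic: with 3 monitors a quorum needs at least 2 of them, so if 2 of the 3 mon processes are down the surviving monitor cannot form a quorum and ordinary ceph commands (which require a quorum) will hang. In that situation the surviving monitor can still be queried directly through its admin socket; the socket path below assumes the default location:
[root@bgw-os-node151 ~]# ceph --admin-daemon /var/run/ceph/ceph-mon.bgw-os-node151.asok mon_status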