Quick recovery after a mon goes down in a multi-mon Ceph cluster

kong62

Category: Ceph

Article link: https://blog.csdn.net/kong62/article/details/76998969

# ceph -s
cluster 0fbf2746-8132-4944-af64-e29e24e871bb
health HEALTH_WARN
1 mons down, quorum 1,2 ceph01,ceph03
monmap e3: 3 mons at {ceph01=172.28.13.58:6789/0,ceph02=172.28.13.59:6789/0,ceph03=172.28.13.60:6789/0}
election epoch 616, quorum 1,2 ceph01,ceph03
…….

The output above shows that the ceph02 mon is down.
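Reading the down mon off the status by eye works for three mons; a small helper can do the same comparison. This is only a sketch: the function name mons_out_of_quorum is made up for illustration, and it assumes the older plain-text `ceph -s` layout shown above (a "monmap" line and an "election epoch" line). Newer releases print the status differently.

```shell
# List monitors that appear in the monmap line but not in the quorum list.
# Reads plain-text `ceph -s` output (old layout, as in this post) on stdin.
# mons_out_of_quorum is a hypothetical helper name, not a Ceph command.
mons_out_of_quorum() {
    awk '
        /monmap/ {
            # Keep only the text between the braces, then collect mon names.
            s = $0
            sub(/^[^{]*[{]/, "", s)
            sub(/[}].*$/, "", s)
            n = split(s, pairs, ",")
            for (i = 1; i <= n; i++) {
                split(pairs[i], kv, "=")
                all[kv[1]] = 1
            }
        }
        /election epoch/ {
            # The last field is the comma-separated list of names in quorum.
            m = split($NF, q, ",")
            for (i = 1; i <= m; i++) inq[q[i]] = 1
        }
        END {
            for (name in all) if (!(name in inq)) print name
        }
    ' | sort
}
```

Usage would be `ceph -s | mons_out_of_quorum`, which prints one mon name per line.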

Remove the mon from the cluster:
# ceph mon remove ceph02

Clean up the mon's data directory (run this on ceph02):
# rm -rf /var/lib/ceph/mon/ceph-ceph02

Initialize the store.db database for the ceph02 mon:
# ceph-mon --mkfs -i ceph02 --keyring /etc/ceph/ceph.mon.keyring

Create the empty done marker file:
# touch /var/lib/ceph/mon/ceph-ceph02/done

Create the keyring for the ceph02 mon:
# ceph auth get-or-create mon.ceph02 mon 'allow rwx' osd 'allow *' -o /var/lib/ceph/mon/ceph-ceph02/keyring

Create the empty marker file for the service manager.
If the daemon is managed by sysvinit, use:
# touch /var/lib/ceph/mon/ceph-ceph02/sysvinit
If it is managed by systemd, use:
# touch /var/lib/ceph/mon/ceph-ceph02/systemd
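If it is unclear which of the two applies, the choice can be made by checking for /run/systemd/system, which exists when the host was booted with systemd. A minimal sketch; mon_init_marker and its optional root-prefix argument are hypothetical names for illustration, and other init systems such as upstart are ignored here.

```shell
# Print the marker file name that matches the running init system.
# The optional first argument is a root prefix, used only so the check
# can be tried against a scratch directory; leave it empty on a real host.
mon_init_marker() {
    root=${1:-}
    if [ -d "$root/run/systemd/system" ]; then
        echo systemd
    else
        echo sysvinit
    fi
}
```

It could then be used as `touch "/var/lib/ceph/mon/ceph-ceph02/$(mon_init_marker)"`.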

Fix the ownership of the directory:
# chown -R ceph:ceph /var/lib/ceph/mon/ceph-ceph02

Add the mon back to the cluster:
# ceph mon add ceph02 172.28.13.59:6789

Restart the service:
# systemctl restart ceph-mon@ceph02.service

The restart failed:
Apr 26 17:14:48 ceph02 systemd[1]: ceph-mon@ceph02.service failed.
Apr 26 17:14:48 ceph02 polkitd[741]: Unregistered Authentication Agent for unix-process:29956:253026270 (system bus name :1.10026, object path /org/freedesktop/PolicyKit1/AuthenticationAg
Apr 26 17:14:54 ceph02 polkitd[741]: Registered Authentication Agent for unix-process:29988:253026886 (system bus name :1.10027 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path
Apr 26 17:14:54 ceph02 systemd[1]: start request repeated too quickly for ceph-mon@ceph02.service
Apr 26 17:14:54 ceph02 systemd[1]: Failed to start Ceph cluster monitor daemon.
-- Subject: Unit ceph-mon@ceph02.service has failed

After a daemon-reload, restarting again works:
# systemctl daemon-reload
# systemctl restart ceph-mon@ceph02.service
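The "start request repeated too quickly" line in the log means systemd's start rate limit for the unit was hit. The log does not show why the first start failed, so the following is only one possible way to handle it: `systemctl reset-failed` clears the unit's failed state before retrying. Both function names are hypothetical helpers, and restart_mon_unit is meant to be run as root on the affected node.

```shell
# Succeed if the journal text on stdin shows systemd's start rate limit.
# Case-insensitive because the wording is capitalized in some versions.
hit_start_limit() {
    grep -qi 'start request repeated too quickly'
}

# Hypothetical helper: clear the failed state if the rate limit was hit,
# then reload unit files and restart the mon, as in the steps above.
restart_mon_unit() {
    unit="ceph-mon@$1.service"
    if journalctl -u "$unit" -n 50 --no-pager | hit_start_limit; then
        systemctl reset-failed "$unit"
    fi
    systemctl daemon-reload
    systemctl restart "$unit"
}
```

For the case in this post the call would be `restart_mon_unit ceph02`.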

# ceph -s
cluster 0fbf2746-8132-4944-af64-e29e24e871bb
health HEALTH_OK
monmap e7: 3 mons at {ceph01=172.28.13.58:6789/0,ceph02=172.28.13.59:6789/0,ceph03=172.28.13.60:6789/0}
election epoch 626, quorum 0,1,2 ceph01,ceph02,ceph03
fsmap e98: 1/1/1 up {0=ceph01=up:active}
osdmap e280: 6 osds: 6 up, 6 in
flags sortbitwise,require_jewel_osds
pgmap v41214: 594 pgs, 16 pools, 2197 MB data, 854 objects
7669 MB used, 592 GB / 599 GB avail
594 active+clean
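For reuse, the steps above can be collected into one function. This is a sketch that follows the order used in this post, not an official procedure. The names run and rebuild_mon and the DRY_RUN variable are made up for illustration; the keyring path and the systemd marker are taken from the commands above. By default it only prints the commands (without their original quoting), so they can be reviewed before anything is changed.

```shell
# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# rebuild_mon <mon-id> <ip:port>
# Must run as root on the node that hosts the failed mon.
rebuild_mon() {
    mon_id=$1
    mon_addr=$2
    # Refuse to continue with empty arguments, since rm -rf follows.
    if [ -z "$mon_id" ] || [ -z "$mon_addr" ]; then
        echo "usage: rebuild_mon <mon-id> <ip:port>" >&2
        return 2
    fi
    mon_dir=/var/lib/ceph/mon/ceph-$mon_id

    run ceph mon remove "$mon_id"
    run rm -rf "$mon_dir"
    run ceph-mon --mkfs -i "$mon_id" --keyring /etc/ceph/ceph.mon.keyring
    run touch "$mon_dir/done"
    run ceph auth get-or-create "mon.$mon_id" mon 'allow rwx' osd 'allow *' -o "$mon_dir/keyring"
    run touch "$mon_dir/systemd"   # assumes a systemd-managed host
    run chown -R ceph:ceph "$mon_dir"
    run ceph mon add "$mon_id" "$mon_addr"
    run systemctl daemon-reload
    run systemctl restart "ceph-mon@$mon_id.service"
}
```

`rebuild_mon ceph02 172.28.13.59:6789` prints the ten commands; `DRY_RUN=0 rebuild_mon ceph02 172.28.13.59:6789` executes them.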

Verify that no data was lost:
# ceph osd pool ls
# rados -p rbd ls
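The same check can be repeated for every pool instead of only rbd. A rough sketch: pool_object_counts is a hypothetical helper that prints each pool name with its object count, using only the two commands above. A count is not proof that nothing was lost, but comparing it with the object total in the pgmap line of `ceph -s` gives a quick sanity check.

```shell
# Print "<pool> <object count>" for every pool in the cluster.
pool_object_counts() {
    ceph osd pool ls | while IFS= read -r pool; do
        # </dev/null keeps rados from consuming the pool list on stdin.
        count=$(rados -p "$pool" ls </dev/null | wc -l | tr -d ' ')
        printf '%s %s\n' "$pool" "$count"
    done
}
```

On large pools, listing every object is slow; `rados df` reports per-pool object counts without listing them.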
