Ceph cluster: nodes offline and health check errors after a restart

1: Error messages after the reboot

After the reboot, systemctl lists several Ceph units in the failed state:

[root@ct ~(keystone_admin)]# systemctl list-units --type=service|grep ceph
  ceph-crash.service                        loaded active running Ceph crash dump collector
● ceph-mgr@comp2.service                    loaded failed failed  Ceph cluster manager daemon
  ceph-mgr@ct.service                       loaded active running Ceph cluster manager daemon
● ceph-mon@comp1.service                    loaded failed failed  Ceph cluster monitor daemon
● ceph-mon@comp2.service                    loaded failed failed  Ceph cluster monitor daemon
  ceph-mon@ct.service                       loaded active running Ceph cluster monitor daemon
  ceph-osd@0.service                        loaded active running Ceph object storage daemon osd.0
● ceph-osd@comp2.service                    loaded failed failed  Ceph object storage daemon osd.comp2
● ceph-osd@ct.service                       loaded failed failed  Ceph object storage daemon osd.ct
[root@ct ~(keystone_admin)]# systemctl reset-failed ceph-mgr@comp2.service
[root@ct ~(keystone_admin)]# systemctl reset-failed ceph-mon@comp1.service
[root@ct ~(keystone_admin)]# systemctl reset-failed ceph-mon@comp2.service
[root@ct ~(keystone_admin)]# systemctl reset-failed ceph-osd@comp2.service
[root@ct ~(keystone_admin)]# systemctl reset-failed ceph-osd@ct.service
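
Note that reset-failed only clears systemd's failed flag so the units stop showing up in list-units; it does not restart anything. When several ceph units have failed, a glob pattern can clear them all at once (a minimal sketch; the pattern is quoted so the shell does not expand it):

systemctl reset-failed 'ceph-*'                    # clear the failed flag on every ceph unit
systemctl list-units --type=service | grep ceph    # confirm the failed entries are gone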

Then check the status again:

[root@ct ~(keystone_admin)]# ceph osd status
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| id | host |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| 0  |      |    0  |    0  |    0   |     0   |    0   |     0   | exists,up |
| 1  |      |    0  |    0  |    0   |     0   |    0   |     0   | exists,up |
| 2  |      |    0  |    0  |    0   |     0   |    0   |     0   |   exists  |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
[root@ct ~(keystone_admin)]# ceph -s
  cluster:
    id:     15200f4f-1a57-46c5-848f-9b8af9747e54
    health: HEALTH_WARN
            Reduced data availability: 192 pgs inactive, 192 pgs peering
            1 slow ops, oldest one blocked for 584 sec, mon.ct has slow ops
 
  services:
    mon: 3 daemons, quorum ct,comp1,comp2
    mgr: ct(active), standbys: comp1
    osd: 3 osds: 2 up, 2 in
 
  data:
    pools:   3 pools, 192 pgs
    objects: 406  objects, 1.8 GiB
    usage:   8.3 GiB used, 3.0 TiB / 3.0 TiB avail
    pgs:     100.000% pgs not active
             192 peering
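
Here all 192 PGs are stuck peering and only 2 of the 3 OSDs are up, which matches the failed units listed above (note that osd 2 shows "exists" but not "up" in the status table). Before restarting anything, it helps to pin down exactly which daemons and PGs are affected; the following standard commands do that (output varies by cluster):

ceph health detail            # expand HEALTH_WARN into per-item messages
ceph osd tree                 # show which OSD is down and on which host
ceph pg dump_stuck inactive   # list the PGs that are not active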
 

Restart the services: whichever node's services went down, log in to that node and restart the corresponding targets there (mon, mgr, and osd, in the blocks below):

systemctl stop ceph-mon.target
systemctl restart ceph-mon.target
systemctl status ceph-mon.target
systemctl enable ceph-mon.target
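
After restarting the monitors, confirm that all three are back in quorum, for example:

ceph mon stat                             # one-line quorum summary
ceph quorum_status --format json-pretty   # detailed quorum membership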

systemctl stop ceph-mgr.target
systemctl restart ceph-mgr.target
systemctl status ceph-mgr.target
systemctl enable ceph-mgr.target
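
Likewise, verify that the manager pair is healthy again; the services section of ceph -s should show one active mgr plus a standby:

ceph -s | grep mgr    # expect something like "mgr: ct(active), standbys: comp1"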

systemctl restart ceph-osd.target
systemctl status ceph-osd.target
systemctl enable ceph-osd.target
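
Once the OSDs are back, check that all of them report up/in and that the PGs leave the peering state:

ceph osd stat     # expect "3 osds: 3 up, 3 in"
ceph osd status   # every row should now show exists,up
ceph -s           # health should return to HEALTH_OK once peering completes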

ceph osd pool application enable vms mon
ceph osd pool application enable images mon
ceph osd pool application enable volumes mon
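
One caveat on the last three commands: the application name is just a tag that silences the "application not enabled on pool" health warning, so "mon" works, but for OpenStack pools accessed over RBD the conventional tag is "rbd" (a hedged alternative, assuming vms/images/volumes are RBD-backed pools; if a different tag is already set, Ceph requires --yes-i-really-mean-it to add another):

ceph osd pool application enable vms rbd   # conventional tag for an RBD pool
ceph osd pool application get vms          # show the tags currently set on the pool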