1 Abstract
This article describes how to remove one or more OSDs (disks) from a node in a Ceph cluster.
2 Environment
2.1 OS version
[root@proceph05 ~]# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)
[root@proceph05 ~]#
2.2 Ceph version
[cephadmin@proceph05 ~]$ ceph -v
ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)
[cephadmin@proceph05 ~]$
2.3 Ceph cluster overview
The cluster currently has 5 Ceph nodes with 6 disks (OSDs) each.
3 Procedure
This procedure follows the official documentation: https://docs.ceph.com/en/nautilus/rados/operations/add-or-rm-osds/
3.1 Check OSD status
ceph osd tree
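When draining a node, you usually want the ids of just that node's OSDs. A minimal sketch that filters the tree output for one host (osds_on_host is a hypothetical helper name; it assumes the default root -> host -> osd tree layout and needs the ceph CLI on an admin node):

```shell
# osds_on_host: print the osd.N names listed under one host in "ceph osd tree".
# Sketch only; assumes the default tree layout (root -> host -> osd rows).
osds_on_host() {
    host="$1"
    ceph osd tree | awk -v h="$host" '
        $3 == "host" && $4 == h { in_host = 1; next }  # entered the target host
        $3 == "host" || $3 == "root" { in_host = 0 }   # any other bucket ends it
        in_host && $4 ~ /^osd\./ { print $4 }          # osd rows under the host
    '
}

# Example (on an admin node): osds_on_host proceph05
```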
3.2 Take the OSD out of the Cluster
ceph osd out osd.29
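When several disks on one node are being removed, the out step can be looped. A sketch (drain_osds is a hypothetical helper name; the ids shown are the ones drained later in this article):

```shell
# drain_osds: mark each given OSD "out" so Ceph starts migrating its PGs
# to the remaining OSDs. Run as a user with client.admin privileges.
drain_osds() {
    for id in "$@"; do
        ceph osd out "osd.${id}"
    done
}

# Example: drain_osds 27 28 29
```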
3.3 Observe the Data Migration
[cephadmin@proceph05 ~]$ ceph -w
  cluster:
    id:     9cdee1f8-f168-4151-82cd-f6591855ccbe
    health: HEALTH_WARN
            10 nearfull osd(s)
            1 pool(s) nearfull
            Low space hindering backfill (add storage if this doesn't resolve itself): 19 pgs backfill_toofull
            4 pgs not deep-scrubbed in time
            8 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum proceph01,proceph02,proceph03,proceph04,proceph05 (age 5w)
    mgr: proceph01(active, since 16M), standbys: proceph03, proceph02, proceph04, proceph05
    osd: 30 osds: 30 up (since 9M), 27 in (since 41h); 20 remapped pgs

  data:
    pools:   1 pools, 512 pgs
    objects: 13.89M objects, 53 TiB
    usage:   158 TiB used, 61 TiB / 218 TiB avail
    pgs:     541129/41679906 objects misplaced (1.298%)
             486 active+clean
             19  active+remapped+backfill_toofull
             6   active+clean+scrubbing+deep
             1   active+remapped+backfilling

  io:
    client:   1018 KiB/s rd, 41 MiB/s wr, 65 op/s rd, 2.83k op/s wr
    recovery: 13 MiB/s, 3 objects/s

  progress:
    Rebalancing after osd.28 marked out
      [=========================.....]
    Rebalancing after osd.29 marked out
      [==========================....]
    Rebalancing after osd.27 marked out
      [=========================.....]
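Instead of watching ceph -w interactively, a script can poll until no objects remain misplaced. A sketch (wait_for_rebalance is a hypothetical helper; it greps the plain-text status, so the "misplaced" marker is an assumption — parsing "ceph -s --format json" would be more robust):

```shell
# wait_for_rebalance: poll "ceph -s" until it stops reporting misplaced
# objects, sleeping <interval> seconds between polls (default 30).
wait_for_rebalance() {
    interval="${1:-30}"
    while :; do
        status=$(ceph -s)
        case "$status" in
            *misplaced*) sleep "$interval" ;;  # still rebalancing
            *) break ;;                        # no misplaced objects left
        esac
    done
}
```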
3.4 Stopping the OSD
On the server hosting the OSDs to be removed, stop the OSD service as the root user:
[root@proceph05 ~]# systemctl stop ceph-osd@28
[root@proceph05 ~]# systemctl status ceph-osd@28
● ceph-osd@28.service - Ceph object storage daemon osd.28
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
   Active: inactive (dead) since Thu 2023-04-20 09:52:55 CST; 1min 18s ago
  Process: 32987 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=0/SUCCESS)
 Main PID: 32987 (code=exited, status=0/SUCCESS)
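The stop-and-verify step above can be wrapped in one helper. A sketch (stop_osd is a hypothetical name; run it as root on the OSD's host):

```shell
# stop_osd: stop one ceph-osd systemd unit and confirm it is no longer active.
stop_osd() {
    id="$1"
    systemctl stop "ceph-osd@${id}" || return 1
    if systemctl is-active --quiet "ceph-osd@${id}"; then
        echo "ceph-osd@${id} is still running" >&2
        return 1
    fi
    return 0
}

# Example (as root): stop_osd 28
```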
Check that the number of up OSDs has dropped; note the cluster still reports 30 OSDs in total:
[cephadmin@proceph05 ~]$ ceph -s
  cluster:
    id:     9cdee1f8-f168-4151-82cd-f6591855ccbe
    health: HEALTH_WARN
            10 nearfull osd(s)
            1 pool(s) nearfull
            Low space hindering backfill (add storage if this doesn't resolve itself): 14 pgs backfill_toofull
            Degraded data redundancy: 377301/41316627 objects degraded (0.913%), 14 pgs degraded, 14 pgs undersized
            12 pgs not deep-scrubbed in time
            9 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum proceph01,proceph02,proceph03,proceph04,proceph05 (age 5w)
    mgr: proceph01(active, since 16M), standbys: proceph03, proceph02, proceph04, proceph05
    osd: 30 osds: 27 up (since 5m), 27 in (since 2d); 14 remapped pgs
3.5 Removing the OSD
Run as the cephadmin user:
[cephadmin@proceph05 ~]$ ceph osd purge 29 --yes-i-really-mean-it
purged osd.29
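Before purging, Nautilus can confirm that no data would be lost. A sketch combining that check with the purge (purge_osd is a hypothetical helper name; "ceph osd safe-to-destroy" succeeds only once all of the OSD's PGs are recoverable elsewhere):

```shell
# purge_osd: purge an OSD only after "ceph osd safe-to-destroy" confirms
# its PGs have been fully recovered on other OSDs.
purge_osd() {
    id="$1"
    if ceph osd safe-to-destroy "osd.${id}"; then
        ceph osd purge "$id" --yes-i-really-mean-it
    else
        echo "osd.${id} is not yet safe to destroy; wait for recovery" >&2
        return 1
    fi
}

# Example: purge_osd 29
```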
Verify:
[cephadmin@proceph05 ~]$ ceph -s
  cluster:
    id:     9cdee1f8-f168-4151-82cd-f6591855ccbe
    health: HEALTH_WARN
            10 nearfull osd(s)
            1 pool(s) nearfull
            Low space hindering backfill (add storage if this doesn't resolve itself): 13 pgs backfill_toofull
            Degraded data redundancy: 350543/41316651 objects degraded (0.848%), 13 pgs degraded, 13 pgs undersized
            12 pgs not deep-scrubbed in time
            9 pgs not scrubbed in time

  services:
    mon: 5 daemons, quorum proceph01,proceph02,proceph03,proceph04,proceph05 (age 5w)
    mgr: proceph01(active, since 16M), standbys: proceph03, proceph02, proceph04, proceph05
    osd: 29 osds: 27 up (since 7m), 27 in (since 2d); 93 remapped pgs
The total OSD count has dropped from 30 to 29. This can also be verified with ceph osd tree.
3.6 Pulling the removed OSD's disk from the server
I take the disk offline through the server's BMC, then check in Ceph whether any OSD has gone down. If none has, the right disk was offlined; if one has, bring that disk back online immediately and restart the affected OSD:
[cephadmin@proceph05 ~]$ sudo systemctl reset-failed ceph-osd@25
[cephadmin@proceph05 ~]$ sudo systemctl start ceph-osd@25
[cephadmin@proceph05 ~]$ sudo systemctl status ceph-osd@25
● ceph-osd@25.service - Ceph object storage daemon osd.25
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: disabled)
   Active: active (running) since Thu 2023-04-20 10:27:59 CST; 33s ago
  Process: 276984 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 276990 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@25.service
           └─276990 /usr/bin/ceph-osd -f --cluster ceph --id 25 --setuser ceph --setgroup ceph
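Before physically pulling a disk, it is also worth double-checking which block device backs the OSD; "ceph osd metadata" records this, and on the host itself "sudo ceph-volume lvm list" shows the same mapping. A sketch extracting the "devices" field (osd_device is a hypothetical helper; a JSON parser such as jq would be more robust than sed):

```shell
# osd_device: print the backing block device recorded in an OSD's metadata
# (the "devices" field of "ceph osd metadata <id>"). Sketch only.
osd_device() {
    ceph osd metadata "$1" | sed -n 's/.*"devices": *"\([^"]*\)".*/\1/p'
}

# Example: osd_device 29   -> a device name such as "sdc"
```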