1. Scenario
Current cluster state:
# ceph -s
cluster e6ccdfaa-a729-4638-bcde-e539b1e7a28d
health HEALTH_OK
monmap e1: 3 mons at {bdc2=172.16.251.2:6789/0,bdc3=172.16.251.3:6789/0,bdc4=172.16.251.4:6789/0}
election epoch 82, quorum 0,1,2 bdc2,bdc3,bdc4
osdmap e3132: 27 osds: 26 up, 26 in
flags sortbitwise
pgmap v13259021: 4096 pgs, 4 pools, 2558 GB data, 638 kobjects
7631 GB used, 89048 GB / 96680 GB avail
4096 active+clean
client io 34720 kB/s wr, 0 op/s rd, 69 op/s wr
The cluster health shows OK, but of the 27 OSDs only 26 are up and 26 in: one OSD is down and out.
Background: OSD states
up: the daemon is running and can serve IO;
down: the daemon is not running and cannot serve IO;
in: the OSD holds data (participates in the data distribution);
out: the OSD holds no data (its data has been migrated elsewhere)
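The next command finds down OSDs with a simple grep; as a hedged sketch, the same check can also catch OSDs that were reweighted out. The awk filter below is my own, and the heredoc stands in for real `ceph osd tree` output so the snippet runs standalone; on a live cluster you would pipe `ceph osd tree` into the same awk instead:

```shell
# Flag OSDs that are down, or whose REWEIGHT has dropped to 0 (out).
# The heredoc is sample `ceph osd tree` output; replace it with the
# real command's output on a live cluster.
down_osds=$(awk '$3 ~ /^osd\./ && ($4 == "down" || $5 == 0) {
    printf "%s is %s (reweight %s)\n", $3, $4, $5
}' <<'EOF'
ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 98.04483 root default
 0  3.63129      osd.0     down          0  1.00000
 1  3.63129      osd.1       up    1.00000  1.00000
EOF
)
echo "$down_osds"
```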
# ceph osd tree |grep down
0 3.63129 osd.0 down 0 1.00000
This means the osd.0 daemon is not running and the OSD no longer holds any data, where "data" refers to objects stored by the Ceph cluster. Both points can be verified.
- First, the daemon:
# systemctl status ceph-osd@0
● ceph-osd@0.service - Ceph object storage daemon
Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since 四 2017-04-06 09:26:04 CST; 1h 2min ago
Process: 480723 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Process: 480669 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
Main PID: 480723 (code=exited, status=1/FAILURE)
4月 06 09:26:04 bdc2 systemd[1]: Unit ceph-osd@0.service entered failed state.
4月 06 09:26:04 bdc2 systemd[1]: ceph-osd@0.service failed.
4月 06 09:26:04 bdc2 systemd[1]: ceph-osd@0.service holdoff time over, scheduling restart.
4月 06 09:26:04 bdc2 systemd[1]: start request repeated too quickly for ceph-osd@0.service
4月 06 09:26:04 bdc2 systemd[1]: Failed to start Ceph object storage daemon.
4月 06 09:26:04 bdc2 systemd[1]: Unit ceph-osd@0.service entered failed state.
4月 06 09:26:04 bdc2 systemd[1]: ceph-osd@0.service failed.
Check osd.0's log:
# tail -f /var/log/ceph/ceph-osd.0.log
2017-04-06 09:26:04.531004 7f75f33d5800 0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
2017-04-06 09:26:04.531520 7f75f33d5800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2017-04-06 09:26:04.531528 7f75f33d5800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2017-04-06 09:26:04.531548 7f75f33d5800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice is supported
2017-04-06 09:26:04.532318 7f75f33d5800 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2017-04-06 09:26:04.532384 7f75f33d5800 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf
2017-04-06 09:26:04.730841 7f75f33d5800 -1 filestore(/var/lib/ceph/osd/ceph-0) Error initializing leveldb : IO error: /var/lib/ceph/osd/ceph-0/current/omap/MANIFEST-004467: Input/output error
2017-04-06 09:26:04.730870 7f75f33d5800 -1 osd.0 0 OSD:init: unable to mount object store
2017-04-06 09:26:04.730879 7f75f33d5800 -1 ** ERROR: osd init failed: (1) Operation not permitted
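When an OSD refuses to start, a quick triage step is to pull only the error-level lines (those logged with level -1) out of its log. A minimal sketch, with a trimmed copy of the lines above embedded so it runs standalone; on the real node you would grep /var/log/ceph/ceph-osd.0.log instead:

```shell
# Extract the error-level (" -1 ") lines from an OSD log excerpt.
errors=$(grep ' -1 ' <<'EOF'
2017-04-06 09:26:04.531548 7f75f33d5800  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice is supported
2017-04-06 09:26:04.730841 7f75f33d5800 -1 filestore(/var/lib/ceph/osd/ceph-0) Error initializing leveldb : IO error
2017-04-06 09:26:04.730870 7f75f33d5800 -1 osd.0 0 OSD:init: unable to mount object store
EOF
)
echo "$errors"
```

Here the leveldb Input/output error on the omap MANIFEST is the root cause that prevents the object store from mounting.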
- Next, the data:
# cd /var/lib/ceph/osd/ceph-0/current
# ls -lrt |tail -10
drwxr-xr-x 2 ceph ceph 58 4月 2 00:45 4.2e9_TEMP
drwxr-xr-x 2 ceph ceph 58 4月 2 00:45 4.355_TEMP
drwxr-xr-x 2 ceph ceph 58 4月 2 00:45 4.36c_TEMP
drwxr-xr-x 2 ceph ceph 58 4月 2 00:45 4.3ae_TEMP
drwxr-xr-x 2 ceph ceph 58 4月 2 00:46 4.3b2_TEMP
drwxr-xr-x 2 ceph ceph 58 4月 2 00:46 4.3e8_TEMP
drwxr-xr-x 2 ceph ceph 58 4月 2 00:46 4.3ea_TEMP
-rw-r--r--. 1 ceph ceph 10 4月 2 08:53 commit_op_seq
drwxr-xr-x. 2 ceph ceph 349 4月 5 10:01 omap
-rw-r--r--. 1 ceph ceph 0 4月 6 09:26 nosnap
Pick any two of the PGs and inspect them:
# ceph pg dump|grep 4.3ea
dumped all in format plain
4.3ea 2 0 0 0 0 8388608 254 254 active+clean 2017-04-06 01:55:04.754593 1322'254 3132:122 [26,2,12] 26 [26,2,12] 26 1322'254 2017-04-06 01:55:04.754546 1322'254 2017-04-02 00:46:12.611726
# ceph pg dump|grep 4.3e8
dumped all in format plain
4.3e8 1 0 0 0 0 4194304 1226 1226 active+clean 2017-04-06 01:26:43.827061 1323'1226 3132:127 [2,15,5] 2 [2,15,5] 2 1323'1226 2017-04-06 01:26:43.827005 1323'1226 2017-04-06 01:26:43.827005
As the acting sets show, the three replicas of 4.3ea and 4.3e8 sit on OSDs [26,2,12] and [2,15,5] respectively; neither set includes osd.0.
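This membership check can be scripted: match a given OSD id inside the [a,b,c] acting sets of `ceph pg dump` output. A sketch with my own field trimming; the heredoc holds a shortened copy of the two rows above, and on a live cluster you would pipe `ceph pg dump` instead:

```shell
# Report any PG whose acting set still contains the target OSD id.
# The regex anchors the id between [ or , and , or ] so that e.g.
# osd 2 matches [2,15,5] but osd 0 does not match [26,2,12].
target=0
hits=$(awk -v osd="$target" '
    $0 ~ ("\\[([0-9]+,)*" osd "(,[0-9]+)*\\]")' <<'EOF'
4.3ea 2 0 0 0 0 8388608 254 254 active+clean [26,2,12] 26 [26,2,12] 26
4.3e8 1 0 0 0 0 4194304 1226 1226 active+clean [2,15,5] 2 [2,15,5] 2
EOF
)
if [ -z "$hits" ]; then msg="no PGs on osd.$target"; else msg="$hits"; fi
echo "$msg"
```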
- Summary:
The picture is clear: the cluster is currently healthy and has already kicked osd.0 out. Its daemon cannot start (restarting the service reproduces the error in the log above; the fix is unknown), and the data it held is no longer counted in the cluster. In other words this OSD is dead weight, so the plan is to remove it from the cluster completely, zap the disk, and then add it back.
2. Removing the OSD
- Take the OSD out of the cluster (run on the admin node)
# ceph osd out 0 (in ceph osd tree, its REWEIGHT value becomes 0)
- Stop the service (run on the target node)
# systemctl stop ceph-osd@0 (in ceph osd tree, its state becomes DOWN)
Since this OSD was already out and down, those first two steps are effectively no-ops here.
- Remove it from the CRUSH map
# ceph osd crush remove osd.0
- Delete its authentication key
# ceph auth del osd.0
- Remove the OSD
# ceph osd rm 0
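For reference, the ceph-side steps so far can be collected into one helper. A sketch, not the original procedure verbatim: the CEPH variable and the dry-run convention are my own additions so the commands can be previewed before running them for real:

```shell
# Remove an OSD from the cluster in the order used above.
# CEPH defaults to the real binary; set CEPH=echo first for a dry run.
CEPH=${CEPH:-ceph}

remove_osd() {
    osd_id=$1
    $CEPH osd out "$osd_id"               # REWEIGHT drops to 0
    # systemctl stop ceph-osd@"$osd_id"   # this one runs on the OSD's node
    $CEPH osd crush remove "osd.$osd_id"  # remove from the CRUSH map
    $CEPH auth del "osd.$osd_id"          # delete its cephx key
    $CEPH osd rm "$osd_id"                # drop the OSD record itself
}

# Dry run, printing the commands instead of executing them:
CEPH=echo
remove_osd 0
```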
- Unmount the data directory
# df -h |grep ceph-0
/dev/sdc1 3.7T 265G 3.4T 8% /var/lib/ceph/osd/ceph-0
# umount /var/lib/ceph/osd/ceph-0
umount: /var/lib/ceph/osd/ceph-0: target is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
The unmount fails; use fuser to see what is holding the mount:
# fuser -m -v /var/lib/ceph/osd/ceph-0
USER        PID ACCESS COMMAND
/var/lib/ceph/osd/ceph-0:
root kernel mount /var/lib/ceph/osd/ceph-0
root 212444 ..c.. bash
Kill the bash process that is holding it:
# kill -9 212444
or let fuser do the killing:
# fuser -m -v -i -k /var/lib/ceph/osd/ceph-0
Unmount again:
# umount /var/lib/ceph/osd/ceph-0
This time it succeeds.
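The busy-unmount dance (try umount, kill the holders with fuser, retry) can be wrapped in a small helper. A sketch only; the UMOUNT/FUSER variables are my own convention so the happy path can be exercised without a real mount, and on a real node you may prefer fuser's -i flag to confirm each kill interactively:

```shell
# Unmount a path, killing any processes that hold it open if needed.
# UMOUNT/FUSER default to the real tools; override them to dry-run.
UMOUNT=${UMOUNT:-umount}
FUSER=${FUSER:-fuser}

free_and_umount() {
    mnt=$1
    if $UMOUNT "$mnt" 2>/dev/null; then
        return 0                      # unmounted cleanly
    fi
    echo "$mnt is busy, killing the processes using it"
    $FUSER -m -k "$mnt"               # add -i to confirm each kill
    $UMOUNT "$mnt"
}
```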
- Zap the disk. The df -h output above already showed that ceph-0 corresponds to /dev/sdc (this can also be checked with ceph-disk list):
# ceph-disk zap /dev/sdc
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
Check the cluster status at this point:
# ceph -s
cluster e6ccdfaa-a729-4638-bcde-e539b1e7a28d
health HEALTH_WARN
170 pgs backfill_wait
10 pgs backfilling
362 pgs degraded
362 pgs recovery_wait
436 pgs stuck unclean
recovery 5774/2136302 objects degraded (0.270%)
recovery 342126/2136302 objects misplaced (16.015%)
monmap e1: 3 mons at {bdc2=172.16.251.2:6789/0,bdc3=172.16.251.3:6789/0,bdc4=172.16.251.4:6789/0}
election epoch 82, quorum 0,1,2 bdc2,bdc3,bdc4
osdmap e3142: 26 osds: 26 up, 26 in; 180 remapped pgs
flags sortbitwise
pgmap v13264634: 4096 pgs, 4 pools, 2558 GB data, 639 kobjects
7651 GB used, 89029 GB / 96680 GB avail
5774/2136302 objects degraded (0.270%)
342126/2136302 objects misplaced (16.015%)
3554 active+clean
362 active+recovery_wait+degraded
170 active+remapped+wait_backfill
10 active+remapped+backfilling
recovery io 354 MB/s, 89 objects/s
client io 1970 kB/s wr, 0 op/s rd, 88 op/s wr
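The degraded and misplaced percentages in the status above are simply ratios over the total number of object instances; recomputing them from the raw counts:

```shell
# Recompute the recovery percentages from the object counts in the
# `ceph -s` output above (5774 degraded, 342126 misplaced, 2136302 total).
deg=$(awk 'BEGIN { printf "%.3f", 5774 / 2136302 * 100 }')
mis=$(awk 'BEGIN { printf "%.3f", 342126 / 2136302 * 100 }')
echo "degraded ${deg}%, misplaced ${mis}%"
# prints: degraded 0.270%, misplaced 16.015%
```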
Wait for recovery to finish and the cluster to return to HEALTH_OK before adding a new OSD. (As for why one should wait until OK before adding, I don't know either.)
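Waiting for HEALTH_OK can be scripted as a simple polling loop. A sketch under my own conventions: the CEPH variable is a name I introduced so the loop can be exercised with a stub instead of a live cluster, and on real Jewel `ceph health` prints exactly "HEALTH_OK" when the cluster is clean:

```shell
# Poll cluster health until it reports HEALTH_OK.
# CEPH defaults to the real binary; point it at a stub for testing.
CEPH=${CEPH:-ceph}

wait_for_health_ok() {
    interval=${1:-30}    # seconds between polls
    until [ "$($CEPH health 2>/dev/null)" = "HEALTH_OK" ]; do
        echo "cluster still recovering, sleeping ${interval}s"
        sleep "$interval"
    done
    echo "cluster is HEALTH_OK"
}

# On the admin node: wait_for_health_ok 30
```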
3. Adding the OSD back
The disk behind the removed OSD has already been zapped clean, so re-adding it is straightforward; just use the ceph-deploy tool:
# ceph-deploy --overwrite-conf osd create bdc2:/dev/sdc
When the command finishes, the OSD has been re-added, and it even got id 0 again:
# df -h |grep ceph-0
/dev/sdc1 3.7T 74M 3.7T 1% /var/lib/ceph/osd/ceph-0
Freshly created, with barely any space used.
# ceph-disk list |grep osd
/dev/sdc1 ceph data, active, cluster ceph, osd.0, journal /dev/sdc2
/dev/sdd1 ceph data, active, cluster ceph, osd.1, journal /dev/sdd2
/dev/sde1 ceph data, active, cluster ceph, osd.2, journal /dev/sde2
/dev/sdf1 ceph data, active, cluster ceph, osd.3, journal /dev/sdf2
Check the cluster status again: it is back to 27 OSDs; now just wait for it to return to OK.
# ceph -s
cluster e6ccdfaa-a729-4638-bcde-e539b1e7a28d
health HEALTH_WARN
184 pgs backfill_wait
6 pgs backfilling
374 pgs degraded
374 pgs recovery_wait
83 pgs stuck unclean
recovery 4605/2114056 objects degraded (0.218%)
recovery 298454/2114056 objects misplaced (14.118%)
monmap e1: 3 mons at {bdc2=172.16.251.2:6789/0,bdc3=172.16.251.3:6789/0,bdc4=172.16.251.4:6789/0}
election epoch 82, quorum 0,1,2 bdc2,bdc3,bdc4
osdmap e3501: 27 osds: 27 up, 27 in; 190 remapped pgs
flags sortbitwise
pgmap v13275552: 4096 pgs, 4 pools, 2558 GB data, 639 kobjects
7647 GB used, 92751 GB / 100398 GB avail
4605/2114056 objects degraded (0.218%)
298454/2114056 objects misplaced (14.118%)
3532 active+clean
374 active+recovery_wait+degraded
184 active+remapped+wait_backfill
6 active+remapped+backfilling
recovery io 264 MB/s, 67 objects/s
client io 1737 kB/s rd, 63113 kB/s wr, 60 op/s rd, 161 op/s wr
Reference: [http://www.cnblogs.com/sammyliu/p/5555218.html](http://www.cnblogs.com/sammyliu/p/5555218.html)