Handling a failed OSD disk in Ceph

Environment: CentOS 7, Ceph Luminous; the server is an HP ProLiant DL380 Gen9 with HP iLO 4 installed.

(1) Remove the OSD from the Ceph cluster

1. Log in to a Ceph mon node and check which OSD has failed (see the sketch below).
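
A minimal sketch for spotting the failed OSD on the mon node (assuming the bad OSD is reported as "down" in the OSD tree):

# List OSDs and filter for the ones marked down
ceph osd tree | grep -i down
# Cluster health detail also names the down/out OSDs
ceph health detail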

2. On the mon node, mark osd.x out:

ceph osd out osd.x
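
Marking the OSD out starts rebalancing; if you want to watch recovery progress before continuing, an optional sketch:

# Stream cluster status and log while PGs recover; Ctrl-C to stop
ceph -w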

3. Remove osd.x from the CRUSH map so that it no longer receives data:

ceph osd crush remove osd.x
ceph auth del osd.x
ceph osd rm osd.x
[root@bakmtr01 ~]# ceph -s
  cluster:
    id:     0e38e7c6-a704-4132-b0e3-76b87f18d8fa
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum bakmtr01,bakmtr02,bakmtr03
    mgr: bakmtr03(active), standbys: bakmtr01, bakmtr02
    osd: 99 osds: 99 up, 99 in
    rgw: 3 daemons active
...

Confirm that the OSD has been removed, then mark it as destroyed:

ceph osd destroy osd.x --yes-i-really-mean-it

The steps above are roughly equivalent to a single command:

ceph osd purge osd.x --yes-i-really-mean-it
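
Before cleaning up on the OSD node, make sure the failed daemon is stopped (a sketch; replace x with the failed OSD id):

# On the OSD node: stop and disable the dead OSD service
systemctl stop ceph-osd@x
systemctl disable ceph-osd@x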

4. On the OSD node, unmount /var/lib/ceph/osd/ceph-x:

umount /var/lib/ceph/osd/ceph-x

5. Find the device, LV, PV, and VG that back osd.x:

[root@bakcmp31 ~]# ceph-volume inventory /dev/sdt

====== Device report /dev/sdt ======

     available                 False
     rejected reasons          locked
     path                      /dev/sdt
     scheduler mode            deadline
     rotational                1
     vendor                    HP
     human readable size       1.64 TB
     sas address               
     removable                 0
     model                     LOGICAL VOLUME
     ro                        0
    --- Logical Volume ---
     cluster name              ceph
     name                      osd-data-3f2e912c-f327-4221-b350-a4b3de4376b6
     osd id                    1
     cluster fsid              0e38e7c6-a704-4132-b0e3-76b87f18d8fa
     type                      block
     block uuid                V8RGFc-omqm-B1E2-mKz1-TXfl-2lK3-CF2d0L
     osd fsid                  2f1aaa8a-f50d-4335-a812-5dd86e8042a3

You can also list the osd_id for every disk:

ceph-volume inventory --format json-pretty
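
To pull just the device-to-OSD mapping out of that report (a sketch; assumes the JSON report uses "path" and "osd_id" keys, matching the fields in the plain-text output above):

# Filter the JSON inventory down to device paths and their osd ids
ceph-volume inventory --format json-pretty | grep -E '"(path|osd_id)"'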

Another option is ceph-volume lvm list:

[root@bakcmp31 ~]# ceph-volume lvm list | grep -A 16 "osd.1 "
====== osd.1 =======

  [block]    /dev/ceph-757f4a80-60e2-425b-a8fd-629a735a5acd/osd-data-3f2e912c-f327-4221-b350-a4b3de4376b6

      type                      block
      osd id                    1
      cluster fsid              0e38e7c6-a704-4132-b0e3-76b87f18d8fa
      cluster name              ceph
      osd fsid                  2f1aaa8a-f50d-4335-a812-5dd86e8042a3
      encrypted                 0
      cephx lockbox secret      
      block uuid                V8RGFc-omqm-B1E2-mKz1-TXfl-2lK3-CF2d0L
      block device              /dev/ceph-757f4a80-60e2-425b-a8fd-629a735a5acd/osd-data-3f2e912c-f327-4221-b350-a4b3de4376b6
      vdo                       0
      crush device class        None
      devices                   /dev/sdt

6. Inspect the LV and VG behind osd.1:

[root@bakcmp31 ~]# ceph-volume lvm list /dev/ceph-757f4a80-60e2-425b-a8fd-629a735a5acd/osd-data-3f2e912c-f327-4221-b350-a4b3de4376b6
====== osd.1 =======
  [block]    /dev/ceph-757f4a80-60e2-425b-a8fd-629a735a5acd/osd-data-3f2e912c-f327-4221-b350-a4b3de4376b6
...
      block device              /dev/ceph-757f4a80-60e2-425b-a8fd-629a735a5acd/osd-data-3f2e912c-f327-4221-b350-a4b3de4376b6
...
      devices                   /dev/sdt
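
Since ceph-volume records its metadata as LVM tags, you can also cross-check with plain LVM tools (a sketch; ceph.osd_id is the tag ceph-volume puts on its LVs):

# List LVs with their VG, backing device and ceph tags, filtered to osd.1
lvs -o lv_name,vg_name,devices,lv_tags | grep ceph.osd_id=1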

7. Remove the LV and VG:

[root@bakcmp31 ~]# lvremove ceph-757f4a80-60e2-425b-a8fd-629a735a5acd/osd-data-3f2e912c-f327-4221-b350-a4b3de4376b6
Do you really want to remove active logical volume ceph-757f4a80-60e2-425b-a8fd-629a735a5acd/osd-data-3f2e912c-f327-4221-b350-a4b3de4376b6? [y/n]: y
  Logical volume "osd-data-3f2e912c-f327-4221-b350-a4b3de4376b6" successfully removed
[root@bakcmp31 ~]# vgremove ceph-757f4a80-60e2-425b-a8fd-629a735a5acd
  Volume group "ceph-757f4a80-60e2-425b-a8fd-629a735a5acd" successfully removed

8. Remove the PV

Find the LVs belonging to the OSD and remove them; if there are no errors, remove the corresponding VG and PV as in step 7. In this case, however, the removal failed because the underlying device was gone:

[root@cmp17 ~]# lvremove /dev/ceph-a090a75a-bd1c-4c41-9505-55e9919c54c7/osd-data-c9e93977-654c-48ff-9c94-f92ffd1def69
  WARNING: Device for PV Eeuf0S-XkKi-UwwB-35C8-Eozs-YNFR-0CUSw8 not found or rejected by a filter.
  Couldn't find device with uuid Eeuf0S-XkKi-UwwB-35C8-Eozs-YNFR-0CUSw8.
Do you really want to remove active logical volume ceph-a090a75a-bd1c-4c41-9505-55e9919c54c7/osd-data-c9e93977-654c-48ff-9c94-f92ffd1def69? [y/n]: y
  Aborting vg_write: No metadata areas to write to!

When this error occurs, refresh the PV cache:

pvscan --cache

Removing the PV by hand does not work here; you need to run pvscan --cache to refresh the LVM cache, after which the PV, VG, and LV are all cleaned up (even though the command itself shows no visible change):

[root@bakcmp31 ~]# pvscan --cache
[root@bakcmp31 ~]# pvs
  PV         VG                                        Fmt  Attr PSize  PFree 
  /dev/sdb   ceph-7db7008f-5eea-40b6-b289-6ae7d8a8ed91 lvm2 a--  <1.64t     0 
...
  /dev/sdt                                             lvm2 ---  <1.64t <1.64t
  /dev/sdu   ceph-02a02c1e-018b-4ea0-8c08-a4fb58547818 lvm2 a--  <1.64t     0

9. You may still be left with entries that were not cleaned up completely. How do you handle that?

List the device-mapper entries:

[root@cmp39 ~]# dmsetup ls

Remove the stale entry:

[root@cmp39 ~]# dmsetup remove ***
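
The leftover mapping is named after the VG and LV, with each internal hyphen doubled (a sketch using the VG/LV from step 6; take the exact name from the dmsetup ls output):

# Find the stale mapping that belongs to the removed VG
dmsetup ls | grep ceph--757f4a80
# Remove it by the exact name printed above
dmsetup remove ceph--757f4a80--60e2--425b--a8fd--629a735a5acd-osd--data--3f2e912c--f327--4221--b350--a4b3de4376b6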

(2) After replacing the disk, rebuild the RAID 0 logical drive

Use iLO to identify the corresponding physical drive and record its location, 1I:1:20.

Or use hpssacli to see which logical drive (LD) each physical drive (PD) belongs to:

[root@cmp17 ~]# hpssacli ctrl slot=0 show config detail

Install hpssacli; it can be downloaded from the HP support site: https://support.hpe.com/hpsc/swd/public/detail?swItemId=MTX_04bffb688a73438598fef81ddd

rpm -ivh hpssacli-2.40-13.0.x86_64.rpm

Common hpssacli commands:

hpssacli ctrl slot=0 pd all show
hpssacli ctrl slot=0 pd all show status
hpssacli ctrl slot=0 ld all show
hpssacli ctrl slot=0 ld all show status

All physical drives report healthy:

[root@cmp17 ~]# hpssacli ctrl slot=0 pd all show status

Logical drive 19 shows Failed:

[root@cmp17 ~]# hpssacli ctrl slot=0 ld all show status
...
   logicaldrive 19 (1.6 TB, 0): Failed
   logicaldrive 20 (1.6 TB, 0): OK
   logicaldrive 21 (1.6 TB, 0): OK

Check the device name behind logical drive 19; none is shown, which means the RAID has not been rebuilt yet:

[root@cmp17 ~]# hpssacli ctrl slot=0 ld xx show

Delete logical drive 19:

[root@cmp17 ~]# hpssacli ctrl slot=0 ld xx delete

Recreate logical drive 19 as RAID 0 on the replacement physical drive:

[root@cmp17 ~]# hpssacli ctrl slot=0 create type=ld drives=1I:1:xx raid=0
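
A quick check that the rebuild worked and which block device the OS now exposes (a sketch; the lsblk columns are just a readable subset):

# Confirm the rebuilt logical drive is healthy
hpssacli ctrl slot=0 ld all show status
# Find the new block device (e.g. /dev/sdt) behind the rebuilt drive
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT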

(3) Add the OSD back on the node

Check the new disk with ceph-volume inventory (the same command can also show the osd_id of every disk):

ceph-volume inventory /dev/sdx --format json-pretty

Create the OSD with ceph-volume lvm batch; from its help text:

batch Creates OSDs from a list of devices using a filestore or bluestore (default) setup

[root@bakcmp31 ~]# ceph-volume lvm batch --bluestore /dev/sdx
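
For a single replacement disk, ceph-volume lvm create is an equivalent way to set up one bluestore OSD (a sketch; /dev/sdx stands for the new device):

# Create and activate one bluestore OSD on the new device
ceph-volume lvm create --bluestore --data /dev/sdx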

Then activate with ceph-volume lvm activate:

[root@bakcmp31 ~]# ceph-volume lvm activate --all

Verify:

On the OSD node:

[root@bakcmp31 ~]# systemctl status ceph-osd@x

On the mon node:

[root@bakmtr01 ~]# ceph -s
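
Finally, confirm the rebuilt OSD shows up and in under its host (a quick sketch; x is the new OSD id):

# The rebuilt OSD should be listed as "up" in the CRUSH tree
ceph osd tree | grep osd.x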