OpenStack: VMs stored in a Ceph cluster fail to boot after an unexpected power loss

Original link: https://www.cpweb.top/1768

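I. Problem recap

After an unexpected power loss hit every machine in the OpenStack cluster, the virtual machines stored in the Ceph cluster would not boot. The Ceph cluster keeps two replicas.
VMs that had been shut down before the outage were fine, and newly created VMs were fine too. Only the VMs that were running at the moment of the outage failed to boot.
Ceph cluster status: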

[root@controller ~]# ceph -s
  cluster:
    id:     6abf44d1-8ad2-4155-88db-8df0e79d576b
    health: HEALTH_OK
 
  services:
    mon: 2 daemons, quorum controller,compute (age 18m)
    mgr: controller(active, since 17m), standbys: compute
    osd: 12 osds: 12 up (since 17m), 12 in (since 3h)
 
  data:
    pools:   3 pools, 768 pgs
    objects: 3.17k objects, 12 GiB
    usage:   36 GiB used, 87 TiB / 87 TiB avail
    pgs:     768 active+clean

CentOS VM:
[Screenshot not available: https://www.cpweb.top/wp-content/uploads/2021/07/centos202177.png]

Ubuntu VM:
[Screenshot not available: https://www.cpweb.top/wp-content/uploads/2021/07/ubunut202177.png]

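II. Solution

The disks need to be repaired. CentOS and Ubuntu use different filesystems (XFS and ext4 here), so the repair procedure differs.
Before repairing, confirm that the Ceph cluster is healthy, then shut down the VM that needs repair.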
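The precondition above (healthy cluster, VM stopped) can be sketched as a small guard. This is only an illustration: the function names `is_health_ok` and `prepare_repair` are hypothetical, and the wrapper assumes the `ceph` and `openstack` CLIs are available on the node where it runs.

```shell
#!/usr/bin/env bash
# Sketch: refuse to start a repair unless Ceph reports HEALTH_OK.

# Succeed only when the given `ceph health` output starts with HEALTH_OK.
is_health_ok() {
    case "$1" in
        HEALTH_OK*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Hypothetical wrapper: check the cluster, then stop the instance to repair.
# It is only defined here, not called; it needs the ceph and openstack CLIs.
prepare_repair() {
    local server_id="$1"
    local health
    health="$(ceph health)" || return 1
    if ! is_health_ok "$health"; then
        echo "cluster is not healthy: $health" >&2
        return 1
    fi
    openstack server stop "$server_id"
}
```

Calling `prepare_repair b8ee1532-cf05-41e5-93cc-3ea3de0c96c9` would stop test1 only if the cluster is healthy.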

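1. CentOS VM

First map the VM's block device on the host. The command for mapping a block device into the operating system is: rbd map {image-name} --pool {pool-name}

# List the instances and their disk images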
[root@controller ~]# openstack server list
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
| ID                                   | Name  | Status  | Networks           | Image           | Flavor |
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
| 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820 | test2 | SHUTOFF | int-net=10.0.0.33  | ubuntu-20.04    | 2核4G  |
| b8ee1532-cf05-41e5-93cc-3ea3de0c96c9 | test1 | SHUTOFF | int-net=10.0.0.114 | centos-7.6.1810 | 2核4G  |
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
[root@controller ~]# rbd ls vms
6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk
b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk

# Disable the image features that the host kernel does not support
[root@controller ~]# rbd feature disable  b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk exclusive-lock, object-map, fast-diff, deep-flatten --pool vms

# Map the RBD image
[root@controller ~]# rbd map b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk --pool vms
/dev/rbd0

# Show the mapped devices
[root@controller ~]# rbd showmapped
id pool namespace image                                     snap device    
0  vms            b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk -    /dev/rbd0 
[root@controller ~]# lsblk
......
rbd0            252:0    0   200G  0 disk 
└─rbd0p1        252:1    0   200G  0 part 

# Repair the XFS filesystem (-L zeroes the log, so metadata updates still in the log are lost)
[root@controller ~]# xfs_repair -L /dev/rbd0p1

# Unmap once the repair is done
[root@controller ~]# rbd unmap b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk --pool vms

With the repair complete, start the VM again; it boots normally.
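The CentOS steps above can be collected into one script. The following is a minimal sketch, not the author's tooling: the helper names (`run`, `disk_image`, `repair_xfs_instance`) are hypothetical, and it assumes, as in the transcript, that the root filesystem is the first partition of the mapped device. It defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: disable unsupported features, map the disk, repair, and unmap.
# DRY_RUN=1 (the default here) only prints the commands.

POOL="vms"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# In the listing above, the disk image of an instance is "<instance-id>_disk".
disk_image() {
    printf '%s_disk\n' "$1"
}

repair_xfs_instance() {
    local image dev rc
    image="$(disk_image "$1")"
    run rbd feature disable "$POOL/$image" exclusive-lock object-map fast-diff deep-flatten
    if [ "$DRY_RUN" = "1" ]; then
        dev="/dev/rbd0"    # placeholder device name for the dry run
        run rbd map "$image" --pool "$POOL"
    else
        dev="$(rbd map "$image" --pool "$POOL")" || return 1
    fi
    # Assumption: the filesystem to repair is on partition 1, as in the transcript.
    run xfs_repair -L "${dev}p1"
    rc=$?
    # Unmap even when the repair failed, then report the repair's exit status.
    run rbd unmap "$dev"
    return "$rc"
}
```

Running `repair_xfs_instance b8ee1532-cf05-41e5-93cc-3ea3de0c96c9` in dry-run mode prints the four commands; setting `DRY_RUN=0` executes them.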

2. Ubuntu VM

As before, first map the VM's block device on the host.

# List the instances and their disk images
[root@controller ~]# openstack server list
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
| ID                                   | Name  | Status  | Networks           | Image           | Flavor |
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
| 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820 | test2 | SHUTOFF | int-net=10.0.0.33  | ubuntu-20.04    | 2核4G  |
| b8ee1532-cf05-41e5-93cc-3ea3de0c96c9 | test1 | SHUTOFF | int-net=10.0.0.114 | centos-7.6.1810 | 2核4G  |
+--------------------------------------+-------+---------+--------------------+-----------------+--------+
[root@controller ~]# rbd ls vms
6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk
b8ee1532-cf05-41e5-93cc-3ea3de0c96c9_disk

# Disable the image features that the host kernel does not support
[root@controller ~]# rbd feature disable 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk exclusive-lock, object-map, fast-diff, deep-flatten --pool vms

# Map the RBD image
[root@controller ~]# rbd map 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk --pool vms
/dev/rbd0

# Show the mapped devices
[root@controller ~]# rbd showmapped
id pool namespace image                                     snap device    
0  vms            6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk -    /dev/rbd0 
[root@controller ~]# lsblk
......
rbd0                 252:0    0   200G  0 disk 
├─rbd0p14            252:14   0     4M  0 part 
├─rbd0p15            252:15   0   106M  0 part 
└─rbd0p1             252:1    0 199.9G  0 part 

# Repair the ext4 filesystem (the e2fsck shipped with the system is too old, so build a newer e2fsck before repairing)
[root@controller ~]# cd e2fsprogs-1.46.2/e2fsck/
[root@controller e2fsck]# ./e2fsck -f -c -v /dev/rbd0p1

# Unmap once the repair is done
[root@controller ~]# rbd unmap 6fa6b48a-d5df-49b7-98e2-f0d6aadb9820_disk --pool vms
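The two procedures differ only in the repair tool, so the choice can be driven by the filesystem type of the mapped partition. The sketch below is illustrative: `repair_command_for` and `detect_fstype` are hypothetical names, and the commands they return are simply the ones used in this article.

```shell
#!/usr/bin/env bash
# Sketch: choose the repair command from the filesystem type of a partition.

# Map a filesystem type to the repair command used in this article.
repair_command_for() {
    local fstype="$1" part="$2"
    case "$fstype" in
        xfs)  echo "xfs_repair -L $part" ;;
        ext4) echo "e2fsck -f -c -v $part" ;;
        *)    echo "unsupported filesystem: $fstype" >&2; return 1 ;;
    esac
}

# Hypothetical helper: ask blkid for the filesystem type of a partition.
# It is only defined here, not called, because it needs a mapped RBD device.
detect_fstype() {
    blkid -o value -s TYPE "$1"
}
```

On a host with the image mapped, `repair_command_for "$(detect_fstype /dev/rbd0p1)" /dev/rbd0p1` would print the command to run for that guest.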

Building a newer e2fsck. The latest e2fsprogs release can be downloaded from: https://sourceforge.net/projects/e2fsprogs/

[root@controller ~]# wget https://nchc.dl.sourceforge.net/project/e2fsprogs/e2fsprogs/v1.46.2/e2fsprogs-1.46.2.tar.gz
[root@controller ~]# tar zxvf e2fsprogs-1.46.2.tar.gz
[root@controller ~]# cd e2fsprogs-1.46.2
[root@controller e2fsprogs-1.46.2]# ./configure
[root@controller e2fsprogs-1.46.2]# make -j 30
[root@controller e2fsprogs-1.46.2]# cd e2fsck/
[root@controller e2fsck]# ./e2fsck
Usage: ./e2fsck [-panyrcdfktvDFV] [-b superblock] [-B blocksize]
                [-l|-L bad_blocks_file] [-C fd] [-j external_journal]
                [-E extended-options] [-z undo_file] device

Emergency help:
 -p                   Automatic repair (no questions)
 -n                   Make no changes to the filesystem
 -y                   Assume "yes" to all questions
 -c                   Check for bad blocks and add them to the badblock list
 -f                   Force checking even if filesystem is marked clean
 -v                   Be verbose
 -b superblock        Use alternative superblock
 -B blocksize         Force blocksize when looking for superblock
 -j external_journal  Set location of the external journal
 -l bad_blocks_file   Add to badblocks list
 -L bad_blocks_file   Set badblocks list
 -z undo_file         Create an undo file
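To decide whether the system e2fsck is recent enough before building a new one, the versions can be compared in a script. This is a sketch under assumptions: the article does not state a minimum version, so the threshold is a parameter (1.46.2, the version built above, is used only as an example), the helper names are hypothetical, and `sort -V` requires GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch: compare e2fsck versions before deciding to build a newer one.

# Succeed when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read `e2fsck -V` output on stdin and print the version number,
# assuming the first line has the form "e2fsck <version> (<date>)".
e2fsck_version() {
    awk 'NR == 1 { print $2 }'
}
```

For example, `version_ge "$(e2fsck -V 2>&1 | e2fsck_version)" 1.46.2 || echo "build a newer e2fsck"`; the `2>&1` is there because e2fsck prints its version banner on standard error.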

References:
https://azhegit.gitee.io/2019/08/16-openstack停电故障修复/
https://blog.frognew.com/2017/02/ceph-rbd.html
