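Disk hot-swap experiment:
The purpose of this experiment is to prepare for disk failures once a Ceph cluster is installed. If hot-swap is supported, a failed disk can be replaced while the system is running and then added back to Ceph storage.
If hot-swap is not supported, the system does not recognize the replacement disk after a failed disk is swapped, and a reboot is needed before the disk is detected again.
Hardware:
Machine model: Dell 630; OS: CentOS 7.4;
RAID controller: PERC H730P Mini, with four 1T disks set to non-RAID;
Result: disk hot-swap is supported. After a disk is replaced, the system automatically detects the newly inserted disk.
Procedure:
Pull out one disk (sdd) while the machine is powered on: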
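When a replacement disk does not show up, a manual SCSI bus rescan through sysfs is sometimes worth trying before a reboot. This was not part of the experiment; the following is only a minimal sketch. The function name rescan_scsi_hosts and its optional sysfs-root argument are hypothetical, added so the function can be dry-run against a scratch directory.

```shell
# Ask every SCSI host under a sysfs root to rescan its bus.
# $1 is a hypothetical override for dry runs; it defaults to /sys.
rescan_scsi_hosts() {
    local sysfs_root="${1:-/sys}"
    local scan_file
    for scan_file in "$sysfs_root"/class/scsi_host/host*/scan; do
        # Skip the unexpanded glob when no SCSI host exists.
        [ -e "$scan_file" ] || continue
        # "- - -" is the wildcard for channel, target and LUN.
        echo "- - -" > "$scan_file"
        echo "rescanned ${scan_file%/scan}"
    done
}
```

Run as root with no argument (rescan_scsi_hosts) to act on the real /sys tree. Whether the controller then exposes the disk depends on the hardware and driver.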
[root@localhost ~]# tail /var/log/messages -f
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#7 CDB: Write(10) 2a 00 20 81 07 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545326848
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#6 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#6 CDB: Write(10) 2a 00 20 81 05 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545326336
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#5 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#5 CDB: Write(10) 2a 00 20 81 03 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545325824
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#4 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#4 CDB: Write(10) 2a 00 20 81 01 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545325312
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#3 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#3 CDB: Write(10) 2a 00 20 80 ff 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545324800
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#2 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#2 CDB: Write(10) 2a 00 20 80 fd 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545324288
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#1 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#1 CDB: Write(10) 2a 00 20 80 fb 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545323776
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Nov 21 00:43:44 localhost kernel: sd 0:0:3:0: [sdd] tag#0 CDB: Write(10) 2a 00 20 80 f9 00 00 02 00 00
Nov 21 00:43:44 localhost kernel: blk_update_request: I/O error, dev sdd, sector 545323264
Nov 21 00:43:44 localhost kernel: Aborting journal on device sdd1-8.
Nov 21 00:43:44 localhost kernel: JBD2: Error -5 detected when updating journal superblock for sdd1-8.
Nov 21 00:43:44 localhost systemd: Unmounting /data-d...
Nov 21 00:43:44 localhost kernel: EXT4-fs error (device sdd1): ext4_put_super:791: Couldn't clean up the journal
Nov 21 00:43:44 localhost kernel: EXT4-fs (sdd1): Remounting filesystem read-only
Nov 21 00:43:44 localhost systemd: Unmounted /data-d.
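The log above repeats the same "I/O error, dev sdd" message many times. A small filter can summarize which devices are affected and how often. This is only a sketch; the function name count_io_errors is hypothetical, and it assumes the kernel message format shown in the log above.

```shell
# Count "I/O error, dev <name>" kernel messages per device.
# Reads syslog-style lines on stdin, prints "<device> <count>" lines.
count_io_errors() {
    sed -n 's/.*I\/O error, dev \([a-z0-9]*\),.*/\1/p' \
        | sort | uniq -c | awk '{print $2, $1}'
}
```

For example: count_io_errors < /var/log/messages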
Check the block devices; sdd no longer exists:
[root@localhost /]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0  1.1T  0 disk
├─sda1            8:1    0  500M  0 part /boot
└─sda2            8:2    0  1.1T  0 part
  ├─centos-root 253:0    0   50G  0 lvm  /
  ├─centos-swap 253:1    0 31.4G  0 lvm  [SWAP]
  └─centos-home 253:2    0    1T  0 lvm  /home
sdb               8:16   0  1.1T  0 disk
└─sdb1            8:17   0  1.1T  0 part /data-b
sdc               8:32   0  1.1T  0 disk
└─sdc1            8:33   0  1.1T  0 part /data-c
sr0              11:0    1 1024M  0 rom
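Comparing the lsblk output by eye works for four disks. On a node with many disks, the same check could be scripted. This is a sketch under assumptions: the function name missing_disks is hypothetical, and the list of expected disks has to be supplied by the operator.

```shell
# Print each expected disk that is absent from the list on stdin.
# stdin: one device name per line, e.g. from `lsblk -dn -o NAME`.
# args:  the disk names expected to be present.
missing_disks() {
    local present disk
    present=$(cat)
    for disk in "$@"; do
        # -x matches whole lines, -F treats the name as a fixed string.
        printf '%s\n' "$present" | grep -qxF -- "$disk" || echo "$disk"
    done
}
```

For example: lsblk -dn -o NAME | missing_disks sda sdb sdc sdd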
Reinsert the disk (sdd) and check the log:
[root@localhost ~]# tail /var/log/messages -f
Nov 21 00:46:37 localhost systemd: Started Network Manager Script Dispatcher Service.
Nov 21 00:46:37 localhost nm-dispatcher: Dispatching action 'dhcp4-change' for em1
Nov 21 00:48:01 localhost kernel: scsi 0:0:3:0: Direct-Access HGST HUC101812CSS204 FJ23 PQ: 0 ANSI: 6
Nov 21 00:48:01 localhost kernel: sd 0:0:3:0: [sdd] Disabling DIF Type 2 protection
Nov 21 00:48:01 localhost kernel: sd 0:0:3:0: [sdd] 2344225968 512-byte logical blocks: (1.20 TB/1.09 TiB)
Nov 21 00:48:01 localhost kernel: sd 0:0:3:0: Attached scsi generic sg3 type 0
Nov 21 00:48:01 localhost kernel: sd 0:0:3:0: [sdd] Write Protect is off
Nov 21 00:48:01 localhost kernel: sd 0:0:3:0: [sdd] Write cache: disabled, read cache: enabled, supports DPO and FUA
Nov 21 00:48:01 localhost kernel: sdd: sdd1
Nov 21 00:48:01 localhost kernel: sd 0:0:3:0: [sdd] Attached SCSI disk
Check the block devices again; the system has detected sdd:
[root@localhost /]# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0  1.1T  0 disk
├─sda1            8:1    0  500M  0 part /boot
└─sda2            8:2    0  1.1T  0 part
  ├─centos-root 253:0    0   50G  0 lvm  /
  ├─centos-swap 253:1    0 31.4G  0 lvm  [SWAP]
  └─centos-home 253:2    0    1T  0 lvm  /home
sdb               8:16   0  1.1T  0 disk
└─sdb1            8:17   0  1.1T  0 part /data-b
sdc               8:32   0  1.1T  0 disk
└─sdc1            8:33   0  1.1T  0 part /data-c
sdd               8:48   0  1.1T  0 disk
└─sdd1            8:49   0  1.1T  0 part
sr0              11:0    1 1024M  0 rom
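In this run the disk reappeared about as soon as it was inserted, but sdd1 has no mount point in the output above: it was unmounted when the disk was pulled and is not remounted automatically. A script that waits for the device node before remounting might look like this. It is only a sketch; the function name wait_for_path and the default 30-second timeout are assumptions.

```shell
# Poll until a path exists or the timeout (in seconds) expires.
# Returns 0 if the path appeared, 1 otherwise.
wait_for_path() {
    local path="$1"
    local timeout="${2:-30}"
    local waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example: wait_for_path /dev/sdd1 60 && mount /dev/sdd1 /data-d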
About disk partition UUIDs
References:
http://blog.csdn.net/blaider/article/details/48264473
http://blog.csdn.net/smstong/article/details/46417213
Experiment:
[root@localhost /]# blkid /dev/sdd1
/dev/sdd1: UUID="aa9766c9-08e0-4f47-bfe0-4d5d9fda9b6b" TYPE="ext4"
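After the partition was deleted and recreated, the UUID changed (the value blkid reports here is the filesystem UUID, which is generated anew when the filesystem is created):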
[root@localhost /]# blkid /dev/sdd1
/dev/sdd1: UUID="5e673343-791a-4f60-a9ec-fdd52c8c394c" TYPE="ext4"
[root@localhost /]# ls /dev/disk/by-uuid/ -l
lrwxrwxrwx. 1 root root 10 11月 21 01:05 5e673343-791a-4f60-a9ec-fdd52c8c394c -> ../../sdd1
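Because the UUID changes after the partition is recreated, any /etc/fstab entry that refers to the old UUID has to be updated. The sketch below extracts the UUID from a blkid output line and builds a candidate fstab line. The function names uuid_from_blkid and fstab_line are hypothetical, and the "defaults 0 0" mount options are an assumption, not taken from the original setup.

```shell
# Extract the UUID value from one line of blkid output read on stdin.
# The leading whitespace in the pattern avoids matching PARTUUID="...".
uuid_from_blkid() {
    sed -n 's/.*[[:space:]]UUID="\([^"]*\)".*/\1/p'
}

# Build an fstab line: fstab_line <uuid> <mountpoint> <fstype>
# The "defaults 0 0" fields are assumed; adjust them as needed.
fstab_line() {
    printf 'UUID=%s %s %s defaults 0 0\n' "$1" "$2" "$3"
}
```

For example: fstab_line "$(blkid /dev/sdd1 | uuid_from_blkid)" /data-d ext4
blkid can also print the bare value directly with: blkid -s UUID -o value /dev/sdd1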