Recovery steps after pulling one disk from a RAID1 array

The procedure is as follows.

1. Format the disk that was pulled, reinsert it into its original slot, and list the physical drives

[root@node-1 ~]# /opt/MegaRAID/MegaCli/MegaCli64  -PDlist -aALL

Enclosure Device ID: 252
Slot Number: 6
Enclosure position: N/A
Device Id: 14
WWN: 55cd2e4150226499
Sequence Number: 10
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 447.130 GB [0x37e436b0 Sectors]
Non Coerced Size: 446.630 GB [0x37d436b0 Sectors]
Coerced Size: 446.625 GB [0x37d40000 Sectors]
Sector Size:  512
Logical Sector Size:  512
Physical Sector Size:  4096
Firmware state: Unconfigured(bad)
Device Firmware Level: 0100
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x4433221106000000
Connected Port Number: 5(path0) 
Inquiry Data: ATA     INTEL SSDSC2KG480100BTYG84910CZF480BGN  
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Solid State Device
Drive:  Not Certified
Drive Temperature :30C (86.00 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Drive's NCQ setting : Enabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Drive has flagged a S.M.A.R.T alert : No

2. Note that the output above reports "Firmware state: Unconfigured(bad)" for the reinserted disk
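When a controller has many drives, scanning the `-PDlist` output by eye is tedious. The following is a minimal sketch, assuming the output has been captured as text, that picks out drives whose firmware state is `Unconfigured(bad)`. The function names and the second drive in `SAMPLE` (slot 7, `Online, Spun Up`) are illustrative assumptions and not taken from the node above.

```python
def parse_pdlist(text):
    """Split MegaCli -PDlist text into one dict per drive (a few fields only)."""
    drives = []
    current = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and lines without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            # Each drive record starts with this field.
            current = {"enclosure": value}
            drives.append(current)
        elif key == "Slot Number":
            current["slot"] = value
        elif key == "Firmware state":
            current["state"] = value
    return drives


def unconfigured_bad(drives):
    """Return only the drives reported as Unconfigured(bad)."""
    return [d for d in drives if d.get("state") == "Unconfigured(bad)"]


# Abbreviated sample; the slot 7 record is hypothetical.
SAMPLE = """\
Enclosure Device ID: 252
Slot Number: 6
Firmware state: Unconfigured(bad)

Enclosure Device ID: 252
Slot Number: 7
Firmware state: Online, Spun Up
"""

for d in unconfigured_bad(parse_pdlist(SAMPLE)):
    print(f"[{d['enclosure']}:{d['slot']}] {d['state']}")
```

The `[enclosure:slot]` pair it prints is the same form that the `-PhysDrv` argument expects in the later steps.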

3. Root cause analysis

$ storcli /c0 show termlog |grep EVT#
07/22/19  8:15:36: C0:EVT#08346-07/22/19  8:15:36: 268=PD 0d(e0xfc/s5) Path 4433221105000000  reset (Type 03)
07/22/19  8:15:36: C0:EVT#08347-07/22/19  8:15:36: 112=Removed: PD 0d(e0xfc/s5)
07/22/19  8:15:36: C0:EVT#08348-07/22/19  8:15:36: 248=Removed: PD 0d(e0xfc/s5) Info: enclPd=fc, scsiType=0, portMap=02, sasAddr=4433221105000000,0000000000000000
07/22/19  8:15:36: C0:EVT#08349-07/22/19  8:15:36: 114=State change on PD 0d(e0xfc/s5) from ONLINE(18) to FAILED(11)
07/22/19  8:15:36: C0:EVT#08350-07/22/19  8:15:36:  81=State change on VD 00/0 from OPTIMAL(3) to DEGRADED(2)
07/22/19  8:15:36: C0:EVT#08351-07/22/19  8:15:36: 251=VD 00/0 is now DEGRADED
07/22/19  8:15:36: C0:EVT#08352-07/22/19  8:15:36: 114=State change on PD 0d(e0xfc/s5) from FAILED(11) to UNCONFIGURED_BAD(1)
07/22/19  8:34:31: C0:EVT#08353-07/22/19  8:34:31:  91=Inserted: PD 0d(e0xfc/s5)
07/22/19  8:34:31: C0:EVT#08354-07/22/19  8:34:31: 247=Inserted: PD 0d(e0xfc/s5) Info: enclPd=fc, scsiType=0, portMap=02, sasAddr=4433221105000000,0000000000000000
07/22/19  8:34:31: C0:EVT#08355-07/22/19  8:34:31: 547=PD 0d(e0xfc/s5) Inquiry info: Info- ATA  INTEL SSDSC2KG48 0HBN480BGN 447 GB
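The interesting part of this log is the sequence of state changes: the PD goes from ONLINE to FAILED to UNCONFIGURED_BAD, and the VD goes from OPTIMAL to DEGRADED. As a rough sketch, assuming the termlog lines keep the format shown above, the transitions can be extracted with a regular expression. The pattern and function name are my own and have only been checked against the lines in this article.

```python
import re

# Matches e.g. "...EVT#08349-...: 114=State change on PD 0d(e0xfc/s5) from ONLINE(18) to FAILED(11)"
STATE_CHANGE = re.compile(
    r"EVT#(?P<seq>\d+)-.*?State change on (?P<kind>PD|VD) (?P<target>\S+) "
    r"from (?P<old>\w+)\(\w+\) to (?P<new>\w+)\(\w+\)"
)


def state_changes(log_text):
    """Return (event seq, PD/VD, target, old state, new state) tuples in log order."""
    changes = []
    for line in log_text.splitlines():
        m = STATE_CHANGE.search(line)
        if m:
            changes.append(
                (m["seq"], m["kind"], m["target"], m["old"], m["new"])
            )
    return changes


# Lines copied from the termlog excerpt above.
LOG = """\
07/22/19  8:15:36: C0:EVT#08347-07/22/19  8:15:36: 112=Removed: PD 0d(e0xfc/s5)
07/22/19  8:15:36: C0:EVT#08349-07/22/19  8:15:36: 114=State change on PD 0d(e0xfc/s5) from ONLINE(18) to FAILED(11)
07/22/19  8:15:36: C0:EVT#08350-07/22/19  8:15:36:  81=State change on VD 00/0 from OPTIMAL(3) to DEGRADED(2)
07/22/19  8:15:36: C0:EVT#08352-07/22/19  8:15:36: 114=State change on PD 0d(e0xfc/s5) from FAILED(11) to UNCONFIGURED_BAD(1)
"""

for seq, kind, target, old, new in state_changes(LOG):
    print(f"EVT#{seq}: {kind} {target}: {old} -> {new}")
```

Note that no state change follows the "Inserted" events in the log above, which matches the observation that the drive stays in UNCONFIGURED_BAD after it is plugged back in.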

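    3.1 Possible reasons why the reinserted cache disk still shows UNCONFIGURED_BAD

    (1) The cache disk itself is faulty <= bad-disk messages would have been logged in that case; ruled out

    (2) The format was not clean and configuration data remains on the disk <= checked, there is no foreign config; ruled out

    (3) Maintain PD Fail History is enabled by default <= possible; needs an experiment

    3.2 Check whether Maintain PD Fail History is enabled on this node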

$ storcli /c0 show all | grep -i maintain
Maintain PD Fail History = On
Maintain PD Fail History = Yes
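If this check has to be repeated on several nodes, the grep output can be interpreted programmatically. Below is a small sketch, assuming the output looks like the two lines above. Treating any value other than `On` or `Yes` as disabled is my assumption, and the function name is hypothetical.

```python
def maintain_pd_fail_history_enabled(show_all_output):
    """Return True or False if the setting is reported, or None if it is absent."""
    for line in show_all_output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().lower() == "maintain pd fail history":
            # Assumption: "On" and "Yes" mean enabled, anything else disabled.
            return value.strip().lower() in ("on", "yes")
    return None


# Output copied from the storcli command above.
OUTPUT = """\
Maintain PD Fail History = On
Maintain PD Fail History = Yes
"""

print(maintain_pd_fail_history_enabled(OUTPUT))
```

Returning `None` when the line is missing keeps "setting not reported" distinct from "setting disabled".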

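    3.3 This option records PD failure state. If the RAID controller detects a problem with a drive (for example a disconnect), it marks the drive as Unconfigured bad, and the drive does not return to Unconfigured good even after it is reinserted. The purpose is to make the administrator aware that the disk had a problem and to leave the next action to the administrator. The vendor documentation describes the option as follows: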

Enable this option to maintain the history of all drive failures. This option is used to keep track of drives that the RAID controller believes have failed. With this feature enabled, the RAID controller will track bad drives and mark them as Unconfigured bad if they return from disconnect or failure. Drives can be marked Unconfigured bad if they are failing or if the RAID controller loses communication with the drive while it is part of a configuration (a virtual drive member or a hot spare). The HBA will lose communication with drives if they are removed while the system is turned on or if SIMs are removed while the system is turned on. The default is Enabled.

 

Either of the following two methods allows the RAID1 rebuild to succeed.

Method 1: Change the drive state to good with the following command

[root@node-1 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDMakeGood -Physdrv "[252:6]" -a0
                                     
Adapter: 0: EnclId-252 SlotId-6 state changed to Unconfigured-Good.

Exit Code: 0x00
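When this step is scripted, the brackets in `[252:6]` are easy to get wrong because the shell may try to glob them. One way around that, sketched here under the assumption that MegaCli64 lives at the path used in this article, is to build the argument list in Python and pass it to `subprocess` without a shell. The helper names and the `dry_run` parameter are hypothetical; the flags are the ones from the command above.

```python
import subprocess

# Path as used in this article; adjust if MegaCli is installed elsewhere.
MEGACLI = "/opt/MegaRAID/MegaCli/MegaCli64"


def make_good_argv(enclosure, slot, adapter=0):
    """Build the argument list for the -PDMakeGood command shown above."""
    return [
        MEGACLI,
        "-PDMakeGood",
        "-Physdrv",
        f"[{enclosure}:{slot}]",  # no quoting needed: no shell is involved
        f"-a{adapter}",
    ]


def make_good(enclosure, slot, adapter=0, dry_run=True):
    """Return the argv when dry_run is True, otherwise run it and return stdout."""
    argv = make_good_argv(enclosure, slot, adapter)
    if dry_run:
        return argv
    done = subprocess.run(argv, check=True, capture_output=True, text=True)
    return done.stdout


print(" ".join(make_good(252, 6)))
```

The default is a dry run so that the command line can be inspected before it changes the state of a drive.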

Method 2: Disable the Maintain PD Fail History setting

(1) Using MegaCli

$ /opt/MegaRAID/MegaCli/MegaCli64 -AdpSetProp MaintainPdFailHistoryEnbl 0 -a0

(2) Using StorCLI

Check the current value:

$ storcli /c0 show maintainpdfailhistory

Change the value:

$ storcli /c0 set maintainpdfailhistory=off

4. Check the rebuild progress

[root@node-1 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv [252:6] -a0 
                                     
Rebuild Progress on Device at Enclosure 252, Slot 6 Completed 14% in 0 Minutes.

Exit Code: 0x00
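To watch the rebuild from a script, the progress line can be parsed into numbers. This is a minimal sketch that assumes the line has exactly the format printed above. What MegaCli prints once the rebuild has finished is not shown in this article, so the function simply returns `None` for anything it does not recognise. The names are my own.

```python
import re

PROGRESS = re.compile(
    r"Rebuild Progress on Device at Enclosure (\d+), Slot (\d+) "
    r"Completed (\d+)% in (\d+) Minutes"
)


def parse_rebuild_progress(output):
    """Return enclosure, slot, percent and minutes as ints, or None if not found."""
    m = PROGRESS.search(output)
    if m is None:
        return None
    enclosure, slot, percent, minutes = map(int, m.groups())
    return {
        "enclosure": enclosure,
        "slot": slot,
        "percent": percent,
        "minutes": minutes,
    }


# Output copied from the command above.
OUTPUT = """\
Rebuild Progress on Device at Enclosure 252, Slot 6 Completed 14% in 0 Minutes.

Exit Code: 0x00
"""

progress = parse_rebuild_progress(OUTPUT)
print(progress["percent"])
```

A polling loop could call the `-PDRbld -ShowProg` command at intervals and stop when this function returns `None` or the percentage stops increasing.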

 

Command to display all configuration information of the RAID controller:

/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL
