ServeRAID disk drive error recovery

The demand for very large drive capacities has created the need to group many physical disks together in the form of arrays and logical drives. To understand what synchronization does, it helps to first understand certain aspects of hard disk technology.

While a drive is in manufacturing, a set of low-level tests is run against it to establish two internal sector lists: the "known good sectors" list and the "known bad sectors" list. The installed firmware then locks the drive to its rated capacity and defines which sectors are actively available. For example, a 36GB drive may actually accommodate 38GB of usable space. This extra space is recorded in the NVRAM of the drive in another list called the "known good reserved sectors".

Sector sparing
While a drive is in operation, the head may come across a sector with a weakened magnetic reading. The data is still readable but may fall below the preferred threshold for a qualified good sector reading. The disk drive considers this a failing sector and "sector spares" the data to a new location taken from the "known good reserved sectors" list. Once the data is moved, the old sector address is added to the "Grown Defects" list, never to be used again. This process is a "recoverable" media error. The drive raises a Predictive Failure Analysis (PFA) alert once it has "sector spared" the majority of its "known good reserved sectors". Hard drives do this routinely, and PFAs are part of the Mean-Time-Between-Failures (MTBF) calculations for the drives.

Using this same example, a drive only knows to sector spare when it does a read, read-modify-write, or write-verify to a sector. This is important because if a drive never reads or writes a failing sector, it never learns that it needs to correct the problem, and the sector can decay into an "unrecoverable media error" on a future read or write before the disk has saved the data. When an "unrecoverable media error" occurs, sector sparing still takes place, but no data can be moved.
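The sparing behavior described above can be sketched as a toy model. This is only an illustration, not real drive firmware: the class name ToyDrive, the two threshold values, and the use of a logical sector number as a stand-in for the retired physical address are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrive:
    """Hypothetical model of a single drive (illustration only)."""
    data: dict      # logical sector number -> stored payload
    signal: dict    # logical sector number -> signal strength, 0.0 to 1.0
    reserve: int    # sectors left in the "known good reserved sectors" list
    grown_defects: list = field(default_factory=list)

    WEAK = 0.5        # assumed: below this, the sector counts as failing
    UNREADABLE = 0.1  # assumed: below this, the data can no longer be read

    def read(self, sector):
        """Read a sector; sparing only ever happens as a side effect of access."""
        strength = self.signal[sector]
        if strength >= self.WEAK:
            return self.data[sector]
        # Failing sector: retire the old location and consume one spare.
        self.grown_defects.append(sector)
        self.reserve -= 1
        self.signal[sector] = 1.0  # the sector now lives on a fresh spare
        if strength < self.UNREADABLE:
            # Unrecoverable media error: sparing happens, but no data moves.
            self.data[sector] = None
        return self.data[sector]


drive = ToyDrive(
    data={0: "a", 1: "b", 2: "c"},
    signal={0: 0.9, 1: 0.3, 2: 0.05},
    reserve=4,
)
for sector in (0, 1, 2):
    print(sector, drive.read(sector), drive.grown_defects, drive.reserve)
```

Sector 1 is weak but readable, so its data survives the move. Sector 2 has decayed past the assumed unreadable threshold, so it is spared with nothing to copy. Neither sector would have been touched at all if it had never been read.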

Knowing this, simple math shows that the risk of a media error roughly doubles when you go from one drive to two drives in an array. With ten (10) to sixteen (16) drives in an array, media errors become correspondingly more common.
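The arithmetic can be made explicit with a short sketch. It assumes that each drive independently has the same chance of holding at least one latent media error. The 1% figure is purely illustrative and is not a measured failure rate for any drive.

```python
def p_any_media_error(p_single, drives):
    """Chance that at least one of `drives` independent drives has a media error.

    The complement is easier to compute: all drives are clean with
    probability (1 - p_single) ** drives.
    """
    return 1.0 - (1.0 - p_single) ** drives


P_SINGLE = 0.01  # assumed per-drive probability, for illustration only

for n in (1, 2, 10, 16):
    print(f"{n:2d} drives: {p_any_media_error(P_SINGLE, n):.4f}")
```

For small per-drive probabilities the result is close to the drive count multiplied by the per-drive probability, which is why two drives carry about twice the risk of one.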

Synchronization
Relating this to IBM's ServeRAID technology, synchronization is designed to force all the physical hard disks in a logical drive or array to read every sector. This causes the drives to sector spare "recoverable" media errors, ideally before they become unrecoverable errors. If an "unrecoverable" media error does occur on a redundant logical drive (RAID-1, RAID-1E, RAID-5, RAID-5E, RAID-5EE, RAID-10, RAID-1E0, or RAID-50), the ServeRAID controller's synchronization operation corrects it by rewriting the missing data.
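The repair step can be sketched for the simplest redundant case, a two-drive mirror. This is a toy model and not the ServeRAID controller's actual algorithm: the function name synchronize_mirror is hypothetical, each drive is a plain list of sector payloads, and None is assumed to mark a sector lost to an unrecoverable media error.

```python
def synchronize_mirror(drive_a, drive_b):
    """Read every sector on both copies; rewrite lost data from the good copy.

    Returns a list of (drive_label, sector_number) pairs that were repaired.
    """
    repaired = []
    for sector, (a, b) in enumerate(zip(drive_a, drive_b)):
        if a is None and b is not None:
            drive_a[sector] = b  # rebuild A's sector from the mirror
            repaired.append(("A", sector))
        elif b is None and a is not None:
            drive_b[sector] = a  # rebuild B's sector from the mirror
            repaired.append(("B", sector))
        # If both copies are lost, redundancy cannot help; the sector stays None.
    return repaired


drive_a = ["x", None, "z"]
drive_b = ["x", "y", None]
print(synchronize_mirror(drive_a, drive_b))
print(drive_a, drive_b)
```

The sketch also shows why regular synchronization matters: a sector is only repairable while the other copy is still good, so errors need to be found before the same sector is lost on both drives.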

Foreground syncs can be initiated manually in two ways: by using the ServeRAID Manager GUI or by using the IPSSEND command. The IPSSEND command can be used in a BAT or CMD file and then automated with almost any scheduling utility.

Source: https://www-304.ibm.com/systems/support/supportsite.wss/docdisplay?lndocid=MIGR-57154&brandind=5000008

From the ITPUB blog, link: http://blog.itpub.net/743764/viewspace-1004010/. If you repost this, please credit the source; otherwise legal action may be pursued.

Reposted from: http://blog.itpub.net/743764/viewspace-1004010/
