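The causes of ASM disk header failures are complex. They usually trace back to installation or day-to-day operations, and at root the cause is Oracle itself. On AIX, for example, if the PVID has not been cleared, device names can get mixed up at the next server reboot, and if you then use chdev to modify the device, the ASM disk header may be lost.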
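Losing an ASM disk header used to be a serious problem. Ten years ago, repairing a disk header could cost tens of thousands of yuan, sometimes more than a hundred thousand, because only a handful of engineers who knew the ASM disk header well could repair it by hand with the kfed tool. Even today, repairing a disk header with kfed by conventional means is still very difficult. Starting with Oracle 10.2.0.5, however, Oracle recognized this weakness and began keeping a backup block in the ASM metadata, which lets a hidden kfed feature repair the disk header in one step. The kfed repair function does this even if you know nothing about the disk header; you only need to know how to run the command. Let's verify this with an experiment.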
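Before the experiment, a side note on where that backup block lives. As far as I know, the backup copy of the disk header is kept in the second-to-last metadata block of allocation unit 1. The sketch below computes that block number and its byte offset. It assumes 4096-byte metadata blocks, and the function names `backup_blkn` and `backup_offset` are made up for illustration; they are not part of any Oracle tool.

```shell
#!/bin/sh
# Hypothetical helpers, not part of kfed or any Oracle utility.
# Assumption: the header backup sits in the second-to-last metadata
# block of allocation unit 1, with 4096-byte metadata blocks by default.

# backup_blkn AU_SIZE [BLOCK_SIZE]: block number of the backup inside AU 1
backup_blkn() {
    echo $(( $1 / ${2:-4096} - 2 ))
}

# backup_offset AU_SIZE [BLOCK_SIZE]: byte offset of the backup from the start of the disk
backup_offset() {
    echo $(( $1 + ($1 / ${2:-4096} - 2) * ${2:-4096} ))
}

backup_blkn 1048576      # 1 MB allocation unit
backup_offset 1048576
backup_blkn 4194304      # 4 MB allocation unit
```

For a 1 MB allocation unit this gives block 254, so something like `kfed read /dev/oracleasm/disks/ASMDISK2 aun=1 blkn=254` should display the backup copy, provided the assumptions above hold for your version.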
SQL> select name,path,state,header_status from v$asm_disk
SQL> /
NAME                 PATH                           STATE    HEADER_STATU
-------------------- ------------------------------ -------- ------------
DATA2_0000           /dev/oracleasm/disks/ASMDISK7  NORMAL   MEMBER
FRA_DG_0000          /dev/oracleasm/disks/ASMDISK6  NORMAL   MEMBER
DATA_0003            /dev/oracleasm/disks/ASMDISK5  NORMAL   MEMBER
DATA_0002            /dev/oracleasm/disks/ASMDISK4  NORMAL   MEMBER
AVM_DG_0000          /dev/oracleasm/disks/ASMDISK3  NORMAL   MEMBER
DATA_0001            /dev/oracleasm/disks/ASMDISK2  NORMAL   MEMBER
DATA_0000            /dev/oracleasm/disks/ASMDISK1  NORMAL   MEMBER
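From this list we pick disk 0001 of the DATA group, ASMDISK2, for the experiment. We first save its first 4 KB block with dd, then overwrite that block with zeros to simulate a lost disk header.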
[root@localhost bin]# dd if=/dev/oracleasm/disks/ASMDISK2 of=ASMDISK2.dd bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00168042 seconds, 2.4 MB/s
[root@localhost bin]# ls *.dd
ASMDISK2.dd
[root@localhost bin]# dd if=/dev/zero of=/dev/oracleasm/disks/ASMDISK2 bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.00219438 seconds, 1.9 MB/s
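The saved ASMDISK2.dd copy is not used again in this walkthrough, but it could be written back with dd in the same way. The following is a minimal sketch of that save, wipe, and restore cycle, run against a scratch file rather than a real ASM disk. The helper names are hypothetical, and `conv=notrunc` is added so that the scratch file is not truncated when only its first block is rewritten.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical. The 4096-byte block size
# matches the dd commands used in the experiment.

# Copy the first 4 KB block of $1 into the file $2
save_header()    { dd if="$1" of="$2" bs=4096 count=1 2>/dev/null; }
# Overwrite the first 4 KB block of $1 with zeros, leaving the rest intact
wipe_header()    { dd if=/dev/zero of="$1" bs=4096 count=1 conv=notrunc 2>/dev/null; }
# Write the saved block $1 back over the first 4 KB of $2
restore_header() { dd if="$1" of="$2" bs=4096 count=1 conv=notrunc 2>/dev/null; }
# Succeed if the first 4 KB block of $1 contains only zero bytes
header_is_zero() {
    n=$(dd if="$1" bs=4096 count=1 2>/dev/null | tr -d '\0' | wc -c)
    [ "$n" -eq 0 ]
}

# Demo on a scratch file standing in for the disk
disk=$(mktemp); copy=$(mktemp)
dd if=/dev/urandom of="$disk" bs=4096 count=4 2>/dev/null
save_header "$disk" "$copy"
wipe_header "$disk"
header_is_zero "$disk" && echo "header wiped"
restore_header "$copy" "$disk"
header_is_zero "$disk" || echo "header restored"
rm -f "$disk" "$copy"
```

Restoring from a dd copy only works if the copy was taken while the header was still good, which is exactly the case the kfed repair function is meant to cover when no such copy exists.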
Then we start ASM:
[grid@localhost bin]$ sqlplus '/as sysasm'
SQL*Plus: Release 11.2.0.2.0 Production on Fri Mar 27 09:27:07 2015
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ASM instance started
Total System Global Area 284565504 bytes
Fixed Size 1343692 bytes
Variable Size 258055988 bytes
ASM Cache 25165824 bytes
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "3"
SQL> alter diskgroup data mount;
alter diskgroup data mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "1" is missing from group number "3"
Now the one-step repair shows what it can do:
[root@localhost bin]# ./kfed repair /dev/oracleasm/disks/ASMDISK2
[root@localhost bin]#
Then we check the result:
SQL> alter diskgroup data mount;
Diskgroup altered.
The disk header has been repaired.
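Besides mounting the disk group, the header can also be inspected directly with `kfed read`. The sketch below pulls the block type out of that output. It assumes the output contains a line of the form `kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD`; `header_type` and `check_header` are hypothetical helper names, and `check_header` needs kfed on the PATH.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# header_type: read `kfed read` output on stdin and print the block type name.
# Assumes a line such as "kfbh.type:   1 ; 0x002: KFBTYP_DISKHEAD".
header_type() {
    awk '$1 == "kfbh.type:" { print $NF; exit }'
}

# check_header DEVICE: report whether block 0 of DEVICE looks like a disk header.
# Requires kfed on the PATH; it is defined here but not called.
check_header() {
    t=$(kfed read "$1" | header_type)
    if [ "$t" = "KFBTYP_DISKHEAD" ]; then
        echo "$1: header looks valid ($t)"
    else
        echo "$1: header not recognized (${t:-no type found})"
        return 1
    fi
}
```

A call such as `check_header /dev/oracleasm/disks/ASMDISK2`, run before and after the repair, would show whether the type changed back to a disk header.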