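The environment is AIX 5.2.0.5,
HACMP 5.1.0.7,
and Oracle 9.2.0.8 RAC.
On node 1, I ran extendlv against a data file LV, growing it from 4000M to 16000M.
I also used mklv to create some new LVs for the database. The whole procedure went fine on node 1,
and the instance on node 1 reported no errors. However, the alert log of the instance on node 2 showed the following errors: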
Errors in file /oracle/admin/orcl/bdump/orcl2_dbw0_221266.trc:
ORA-01186: file 136 failed verification tests
ORA-01122: database file 136 failed verification check
ORA-01110: data file 136: '/dev/rorcl_data126'
ORA-01200: actual file size of 524286 is smaller than correct size of 2048000 blocks
Thu Jan 21 12:04:46 2010
File 136 not verified due to error ORA-01122
Thu Jan 21 12:04:46 2010
Errors in file /oracle/admin/orcl/bdump/orcl2_dbw0_221266.trc:
ORA-01186: file 136 failed verification tests
ORA-01122: database file 136 failed verification check
ORA-01110: data file 136: '/dev/rorcl_data126'
ORA-01200: actual file size of 524286 is smaller than correct size of 2048000 blocks
For the newly created LVs, the ckpt process trace file contained similar errors:
*** 2010-01-21 13:55:22.053
*** SESSION ID:(11.1) 2010-01-21 13:55:22.040
ORA-01110: data file 147: '/dev/rorcl_data'
ORA-01115: IO error reading block from file 147 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: -1
Additional information: 8192
ORA-01110: data file 148: '/dev/rorcl_dwcom'
ORA-01115: IO error reading block from file 148 (block # 1)
ORA-27063: skgfospo: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: -1
Additional information: 8192
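At first glance this looks rather frightening:
for the extended data file, the size recorded in the control file does not match the actual size of the file;
for the newly created data files, it looks as though the data file headers are corrupt.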
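
To make the size mismatch for the extended file concrete, the block counts from the ORA-01200 message can be converted to megabytes. This is only a rough sketch: the 8192-byte block size is an assumption, taken from the "Additional information: 8192" line in the ckpt trace, and the helper name blocks_to_mb is made up for illustration.

```python
# Assumed database block size in bytes (inferred from the trace, not confirmed).
BLOCK_SIZE = 8192


def blocks_to_mb(blocks, block_size=BLOCK_SIZE):
    """Convert a count of Oracle blocks to megabytes (1 MB = 1024 * 1024 bytes)."""
    return blocks * block_size / (1024 * 1024)


# Block counts copied from the ORA-01200 message in the node 2 alert log.
actual_mb = blocks_to_mb(524286)     # size node 2 actually sees
expected_mb = blocks_to_mb(2048000)  # size recorded for the data file

print(f"actual:   {actual_mb:.2f} MB")
print(f"expected: {expected_mb:.2f} MB")
```

Under that assumption, the expected size works out to exactly 16000 MB, which matches the size the LV was extended to, while the size seen on node 2 is a little under 4096 MB, in the neighborhood of the pre-extension size. That would be consistent with node 2 still working from the old LV definition rather than with real corruption.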
Later, checking errpt on node 2 also turned up LVM-related errors:
613E5F38 0203142310 P H LVDD I/O ERROR DETECTED BY LVM
00B984B3 0203142310 U H hdisk6 UNDETERMINED ERROR
613E5F38 0203142310 P H LVDD I/O ERROR DETECTED BY LVM
00B984B3 0203142310 U H hdisk6 UNDETERMINED ERROR
613E5F38 0203142310 P H LVDD I/O ERROR DETECTED BY LVM
00B984B3 0203142310 U H hdisk6 UNDETERMINED ERROR
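
When errpt produces many repeated entries like these, it can help to group them by resource. The following is a minimal sketch, assuming the usual errpt summary layout (identifier, MMDDhhmmyy timestamp, type, class, resource name, description); the function names are hypothetical and the sample lines are the ones from the output above.

```python
from collections import Counter
from datetime import datetime

# Summary lines copied from the node 2 errpt output.
ERRPT_LINES = [
    "613E5F38 0203142310 P H LVDD I/O ERROR DETECTED BY LVM",
    "00B984B3 0203142310 U H hdisk6 UNDETERMINED ERROR",
    "613E5F38 0203142310 P H LVDD I/O ERROR DETECTED BY LVM",
    "00B984B3 0203142310 U H hdisk6 UNDETERMINED ERROR",
    "613E5F38 0203142310 P H LVDD I/O ERROR DETECTED BY LVM",
    "00B984B3 0203142310 U H hdisk6 UNDETERMINED ERROR",
]


def parse_errpt_line(line):
    """Split one errpt summary line into its six fields."""
    ident, stamp, etype, eclass, resource, description = line.split(None, 5)
    return {
        "id": ident,
        # Assumed timestamp layout: MMDDhhmmyy.
        "when": datetime.strptime(stamp, "%m%d%H%M%y"),
        "type": etype,
        "class": eclass,
        "resource": resource,
        "description": description,
    }


def summarize(lines):
    """Count entries per (resource, description) pair."""
    entries = (parse_errpt_line(line) for line in lines)
    return Counter((e["resource"], e["description"]) for e in entries)


for (resource, description), count in summarize(ERRPT_LINES).items():
    print(f"{count} x {resource}: {description}")
```

Here the entries fall into just two groups, one for LVDD and one for hdisk6, which points to a single underlying problem on one disk rather than several unrelated faults.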
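My preliminary diagnosis is that extending a concurrent logical volume with extendlv left the ODM information for these LVs inconsistent between node 1 and node 2.
Rebooting node 2 completely resolved the problem.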