Resolving ORA-00600 [kcrfr_update_nab_2]
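Environment: database testdb1 on AIX, Oracle 10.2.0.1, with ASM. A session hung while dropping a partition, for example:
ALTER TABLE DXUSER.HISTWEBCDMA1X DROP PARTITION P20130728;
The session was waiting on the event "DFS lock handle". This is a CI (cross-instance call) wait, which is managed by the DLM. Because the database is single-instance, the only other instance that can be involved is the ASM instance.
The ASM alert log showed that the ASM instance had written an error trace file, +asm_ora_762040.trc: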
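For the "DFS lock handle" event, the P1 parameter packs the lock type into its two high bytes and the requested mode into its low 16 bits, which is how a wait can be identified as a CI lock. The following is a minimal sketch of that decoding. The function name decode_lock_p1 and the sample value 1128857605 are illustrative assumptions; the original session's P1 value was not recorded.

```shell
# Hypothetical helper: decode the P1 value of a "DFS lock handle" wait.
# Prints the two-character lock type followed by the lock mode.
decode_lock_p1() {
    p1=$1
    c1=$(( (p1 >> 24) & 255 ))   # first character of the lock type
    c2=$(( (p1 >> 16) & 255 ))   # second character of the lock type
    mode=$(( p1 & 65535 ))       # requested lock mode
    # Convert the two byte values to characters through octal escapes.
    # Assumes both bytes are letters, as they are for Oracle lock types.
    printf "$(printf '\\%03o\\%03o' "$c1" "$c2") %d\n" "$mode"
}

# Illustrative value only (0x43490005): type "CI", mode 5.
decode_lock_p1 1128857605
```

The value to pass in would come from the P1 column of v$session_wait for the hung session.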
*** 2014-03-03 12:15:18.846
*** SERVICE NAME:() 2014-03-03 12:15:18.825
*** SESSION ID:(36.7347) 2014-03-03 12:15:18.825
Waited for detached process: RBAL for 300 seconds:
At the same time, errpt reported errors:
testdb1#errpt |tail
825849BF 0303104614 T H fcs0 ADAPTER ERROR
C62E1EB7 0303104614 P H hdisk63 DISK OPERATION ERROR
C62E1EB7 0303104614 P H hdisk12 DISK OPERATION ERROR
C62E1EB7 0303104614 P H hdisk124 DISK OPERATION ERROR
C62E1EB7 0303104614 P H hdisk74 DISK OPERATION ERROR
B8FBD189 0303104614 T S fscsi0 SOFTWARE PROGRAM ERROR
B8FBD189 0303104614 T S fscsi0 SOFTWARE PROGRAM ERROR
825849BF 0303104614 T H fcs0 ADAPTER ERROR
825849BF 0303104614 T H fcs0 ADAPTER ERROR
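When errpt returns many entries, it can help to count them per resource before deciding which component to report. This is a minimal sketch that assumes the standard errpt summary columns (IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION); the function name summarize_errpt is hypothetical.

```shell
# Hypothetical helper: count errpt summary lines per resource and description.
# Reads errpt summary output on stdin and prints "COUNT RESOURCE DESCRIPTION",
# most frequent first.
summarize_errpt() {
    awk 'NF >= 6 && $1 != "IDENTIFIER" {
        # The description may contain spaces, so rebuild it from field 6 on.
        desc = $6
        for (i = 7; i <= NF; i++) desc = desc " " $i
        count[$5 " " desc]++
    }
    END { for (k in count) print count[k], k }' |
    LC_ALL=C sort -k1,1nr -k2
}
```

Usage on the server would be: errpt | summarize_errpt. For the entries shown above it groups the three fcs0 adapter errors, the two fscsi0 errors and the four hdisk errors.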
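The system errors point to a fault in a disk or the storage controller, so I reported the fault. After it was confirmed and handled, the storage component was replaced and the database server was then hard-rebooted. When I checked the database server afterwards, the database would not open: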
SQL> startup open
ORACLE instance started.
Total System Global Area 1.6744E+10 bytes
Fixed Size 2050200 bytes
Variable Size 1694500712 bytes
Database Buffers 1.5032E+10 bytes
Redo Buffers 14725120 bytes
Database mounted.
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2],[0x7000003EF9D93F0], [2], [], [], [], [], []
The alert log shows:
Beginning crash recovery of 1 threads
parallel recovery started with 15 processes
Tue Mar 4 07:47:39 2014
Started redo scan
Tue Mar 4 07:47:40 2014
Errors in file /u01/app/oracle/admin/testdb/udump/testdb_ora_135988.trc:
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x7000003EF9D
993F0], [2], [], [], [], [], []
Tue Mar 4 07:47:42 2014
Aborting crash recovery due to error 600
Next, look at the trace file named in the alert log:
testdb1$more /u01/app/oracle/admin/testdb/udump/testdb_ora_135988.trc
/u01/app/oracle/admin/testdb/udump/testdb_ora_135988.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name: AIX
Node name: testdb1
Release: 3
Version: 5
Machine: 00C051B24C00
Instance name: testdb
Redo thread mounted by this instance: 1
Oracle process number: 16
Unix process pid: 135988, image: oracle@testdb1 (TNS V1-V3)
*** 2014-03-04 07:47:34.099
*** SERVICE NAME:() 2014-03-04 07:47:34.088
*** SESSION ID:(1643.3) 2014-03-04 07:47:34.088
Successfully allocated 15 recovery slaves
Using 20 overflow buffers per recovery slave
Thread 1 checkpoint: logseq 21269, block 2, scn 109974607248
cache-low rba: logseq 21269, block 569541
on-disk rba: logseq 21269, block 584155, scn 109974738191
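According to the logs above, the error is raised right after "Started redo scan", and the log sequence involved is 21269. Now check which log group holds logseq 21269: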
SQL> select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS
---------- ---------- ---------- ---------- ---------- --- ----------------
1 1 21268 52428800 2 NO INACTIVE
2 1 21266 52428800 2 NO INACTIVE
6 1 21265 524288000 2 NO INACTIVE
4 1 21269 524288000 2 NO CURRENT
5 1 21264 524288000 2 NO INACTIVE
3 1 21267 52428800 2 NO INACTIVE
The sequence is in log group 4, whose members are:
SQL> select member from v$logfile where group# = 4;
+SYSDG/testdb/onlinelog/group_4.267.676633559
+DATADG1/testdb/onlinelog/group_4.363.676633561
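The lookup from trace file to log group can also be done on saved text output. This is a minimal sketch, assuming the trace and the v$log listing have been saved or piped in as plain text in the layout shown above; the function names trace_logseq and log_group_for_seq are hypothetical.

```shell
# Hypothetical helper: print the log sequence number from the "on-disk rba"
# line of a recovery trace read on stdin.
trace_logseq() {
    sed -n 's/^[[:space:]]*on-disk rba: logseq \([0-9][0-9]*\),.*/\1/p' | head -n 1
}

# Hypothetical helper: read "select * from v$log" output on stdin and print
# the GROUP# of the row whose SEQUENCE# (third column) equals $1.
log_group_for_seq() {
    awk -v seq="$1" '$1 ~ /^[0-9]+$/ && $3 == seq { print $1 }'
}
```

For example, trace_logseq < testdb_ora_135988.trc prints the sequence, and feeding that value to log_group_for_seq together with the v$log listing prints the group number to investigate.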
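Searching showed that ORA-00600 [kcrfr_update_nab_2] is a rare error, with little information about it on MOS or elsewhere on the web. MOS mostly treats it as a bug and offers no workaround or fix. I had to fall back on Google, which turned up an article titled "kcrfr_update_nab_2" in which the author records how he resolved it (http://www.oraclehome.com.br/2011/10/20/kcrfr_update_nab_2/). In outline: delete the second member file of the failing log group, run recover database, open the database, and then rebuild the failing log group once the database is open.
The actual steps: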
SQL> startup open
ORACLE instance started.
Total System Global Area 1.6744E+10 bytes
Fixed Size 2050200 bytes
Variable Size 1694500712 bytes
Database Buffers 1.5032E+10 bytes
Redo Buffers 14725120 bytes
Database mounted.
ORA-00600: internal error code, arguments: [kcrfr_update_nab_2],
[0x7000003EF9D93F0], [2], [], [], [], [], []
Find the redo files of the failing log group and delete the group's second member file (the copy in +DATADG1):
$asmcmd
ASMCMD> cd +datadg1/testdb/ONLINELOG/
ASMCMD> ls
group_1.360.676633379
group_2.361.676633469
group_3.362.676633477
group_4.363.676633561
group_5.364.676633571
group_6.365.676633579
ASMCMD> rm group_4.363.676633561
SQL> recover database;
Media recovery complete.
SQL> alter database open;
Once the database is open, rebuild the failing redo group, group 4:
SQL>alter database drop logfile group 4;
SQL>alter database add logfile thread 1 group 4 ('+SYSDG','+DATADG1') size 512M ;
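Oracle refuses to drop a log group that is still CURRENT, or ACTIVE and needed for crash recovery, so a log switch and checkpoint may be needed before the drop succeeds. Whether group 4 was still current at this point is an assumption; the original steps do not say. The sketch below only prints the statements for review and does not connect to a database. The function name gen_rebuild_sql is hypothetical.

```shell
# Hypothetical helper: print the SQL*Plus statements for rebuilding one redo
# log group. It only prints text.
# Usage: gen_rebuild_sql GROUP SIZE DISKGROUP [DISKGROUP...]
gen_rebuild_sql() {
    group=$1
    size=$2
    shift 2
    members=""
    for dg in "$@"; do
        # Build the quoted, comma-separated member list: '+DG1','+DG2'
        members="${members:+$members,}'$dg'"
    done
    # Assumption: the group may still be CURRENT or ACTIVE, so switch and
    # checkpoint first; repeat the switch until v$log shows it INACTIVE.
    echo "alter system switch logfile;"
    echo "alter system checkpoint;"
    echo "alter database drop logfile group $group;"
    echo "alter database add logfile thread 1 group $group ($members) size $size;"
}

gen_rebuild_sql 4 512M +SYSDG +DATADG1
```

Checking the STATUS column in v$log before running the drop statement avoids a failed attempt.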
From the ITPUB blog, link: http://blog.itpub.net/16976507/viewspace-1266952/. If you repost this, please credit the source; otherwise legal action may be taken.