ORA-00600 [kcrfr_update_nab_2]: Resolution Process

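The database on testdb1 (Oracle 10.2.0.1 on AIX, using ASM) hung while dropping a partition, for example:

ALTER TABLE DXUSER.HISTWEBCDMA1X DROP PARTITION P20130728;

The session was waiting on the "DFS lock handle" event. Here the wait was on a CI (cross-instance call) enqueue, which is managed by the DLM. Since this is a single-instance database, the only other instance that could be involved is the ASM instance.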

The ASM alert log pointed to errors in the ASM instance, recorded in the trace file +asm_ora_762040.trc:

*** 2014-03-03 12:15:18.846

*** SERVICE NAME:() 2014-03-03 12:15:18.825

*** SESSION ID:(36.7347) 2014-03-03 12:15:18.825

Waited for detached process: RBAL for 300 seconds:

At the same time, errpt reported errors:

testdb1#errpt |tail

825849BF   0303104614 T H fcs0           ADAPTER ERROR

C62E1EB7   0303104614 P H hdisk63        DISK OPERATION ERROR

C62E1EB7   0303104614 P H hdisk12        DISK OPERATION ERROR

C62E1EB7   0303104614 P H hdisk124       DISK OPERATION ERROR

C62E1EB7   0303104614 P H hdisk74        DISK OPERATION ERROR

B8FBD189   0303104614 T S fscsi0         SOFTWARE PROGRAM ERROR

B8FBD189   0303104614 T S fscsi0         SOFTWARE PROGRAM ERROR

825849BF   0303104614 T H fcs0           ADAPTER ERROR

825849BF   0303104614 T H fcs0           ADAPTER ERROR
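To see at a glance which devices are affected, the errpt summary lines can be tallied by resource. The following is a minimal sketch, not part of the original troubleshooting: the function name `summarize_errpt` is hypothetical, and it assumes the default six-column errpt summary layout (identifier, timestamp, type, class, resource name, description) as pasted above.

```python
from collections import Counter

# The errpt summary lines from this incident, copied from the output above.
SAMPLE = """\
825849BF   0303104614 T H fcs0           ADAPTER ERROR
C62E1EB7   0303104614 P H hdisk63        DISK OPERATION ERROR
C62E1EB7   0303104614 P H hdisk12        DISK OPERATION ERROR
C62E1EB7   0303104614 P H hdisk124       DISK OPERATION ERROR
C62E1EB7   0303104614 P H hdisk74        DISK OPERATION ERROR
B8FBD189   0303104614 T S fscsi0         SOFTWARE PROGRAM ERROR
B8FBD189   0303104614 T S fscsi0         SOFTWARE PROGRAM ERROR
825849BF   0303104614 T H fcs0           ADAPTER ERROR
825849BF   0303104614 T H fcs0           ADAPTER ERROR
"""


def summarize_errpt(text):
    """Count errpt summary entries per (resource, description) pair."""
    counts = Counter()
    for line in text.splitlines():
        # Split into at most six fields so the description keeps its spaces.
        parts = line.split(None, 5)
        if len(parts) < 6 or parts[0] == "IDENTIFIER":
            continue  # skip blank lines and the header row
        _ident, _stamp, _type, _cls, resource, desc = parts
        counts[(resource, desc.strip())] += 1
    return counts


if __name__ == "__main__":
    for (resource, desc), n in summarize_errpt(SAMPLE).most_common():
        print(f"{resource:10} {desc:25} {n}")
```

In this sample the fibre channel adapter fcs0 and its protocol driver fscsi0 appear alongside several hdisk devices, which is consistent with a fault on the storage path rather than on a single disk.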

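The system errors point to a failure in a disk or storage controller, so I reported the fault. After it was confirmed, the faulty storage component was replaced and the database server was hard-rebooted. When I then checked the server, the database would not open: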

SQL> startup open

ORACLE instance started.

 

Total System Global Area 1.6744E+10 bytes

Fixed Size                  2050200 bytes

Variable Size            1694500712 bytes

Database Buffers         1.5032E+10 bytes

Redo Buffers               14725120 bytes

Database mounted.

ORA-00600: internal error code, arguments: [kcrfr_update_nab_2],[0x7000003EF9D93F0], [2], [], [], [], [], []

 

The alert log shows:

Beginning crash recovery of 1 threads

  parallel recovery started with 15 processes

 Tue Mar  4 07:47:39 2014

 Started redo scan

 Tue Mar  4 07:47:40 2014

 Errors in file /u01/app/oracle/admin/testdb/udump/testdb_ora_135988.trc:

 ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x7000003EF9D993F0], [2], [], [], [], [], []

 Tue Mar  4 07:47:42 2014

 Aborting crash recovery due to error 600

Next, look at the trace file:

testdb1$more /u01/app/oracle/admin/testdb/udump/testdb_ora_135988.trc

 /u01/app/oracle/admin/testdb/udump/testdb_ora_135988.trc

 Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - 64bit Production

 With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options

 ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1

 System name:    AIX

 Node name:      testdb1

 Release:        3

 Version:        5

 Machine:        00C051B24C00

 Instance name: testdb

 Redo thread mounted by this instance: 1

 Oracle process number: 16

 Unix process pid: 135988, image: oracle@testdb1 (TNS V1-V3)

 

 *** 2014-03-04 07:47:34.099

 *** SERVICE NAME:() 2014-03-04 07:47:34.088

 *** SESSION ID:(1643.3) 2014-03-04 07:47:34.088

 Successfully allocated 15 recovery slaves

 Using 20 overflow buffers per recovery slave

 Thread 1 checkpoint: logseq 21269, block 2, scn 109974607248

   cache-low rba: logseq 21269, block 569541

     on-disk rba: logseq 21269, block 584155, scn 109974738191
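The three RBA lines in the trace header describe the redo range that crash recovery has to cover. The following is a rough sketch, not taken from the original post, that parses them and reports how many redo blocks lie between the cache-low RBA and the on-disk RBA. The helper names `parse_rbas` and `redo_blocks_to_scan` are hypothetical, and the sketch assumes the line layout shown in this trace.

```python
import re

# The checkpoint and RBA lines from the trace file above.
TRACE = """\
Thread 1 checkpoint: logseq 21269, block 2, scn 109974607248
  cache-low rba: logseq 21269, block 569541
    on-disk rba: logseq 21269, block 584155, scn 109974738191
"""

RBA_RE = re.compile(
    r"(checkpoint|cache-low rba|on-disk rba): logseq (\d+), block (\d+)"
)


def parse_rbas(text):
    """Return {label: (logseq, block)} for each RBA line found."""
    return {
        label: (int(seq), int(block))
        for label, seq, block in RBA_RE.findall(text)
    }


def redo_blocks_to_scan(rbas):
    """Blocks between cache-low and on-disk RBA, or None if they span logs."""
    low_seq, low_block = rbas["cache-low rba"]
    end_seq, end_block = rbas["on-disk rba"]
    if low_seq != end_seq:
        return None  # range crosses a log switch; a block difference is meaningless
    return end_block - low_block


if __name__ == "__main__":
    rbas = parse_rbas(TRACE)
    print("log sequence:", rbas["on-disk rba"][0])
    print("redo blocks between cache-low and on-disk rba:", redo_blocks_to_scan(rbas))
```

For this trace all three positions fall in log sequence 21269, so the whole recovery range sits inside a single online redo log.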

The log shows that the error is raised right after "Started redo scan", and the redo log sequence involved is 21269. Now find which log group holds logseq 21269:

SQL> select * from v$log;

 

     GROUP#    THREAD#  SEQUENCE#      BYTES    MEMBERS ARC STATUS

 ---------- ---------- ---------- ---------- ---------- --- ----------------

          1          1      21268   52428800          2 NO  INACTIVE

          2          1      21266   52428800          2 NO  INACTIVE

          6          1      21265  524288000          2 NO  INACTIVE

          4          1      21269  524288000          2 NO  CURRENT

          5          1      21264  524288000          2 NO  INACTIVE

          3          1      21267   52428800          2 NO  INACTIVE

Log group 4 has the following members:

SQL> select member from v$logfile where group# = 4;

+SYSDG/testdb/onlinelog/group_4.267.676633559

+DATADG1/testdb/onlinelog/group_4.363.676633561
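The lookup from log sequence to log group can also be done mechanically on a saved copy of the v$log listing. This is only an illustrative sketch: the function name `find_group` is hypothetical, and it assumes the seven-column output shown above (GROUP#, THREAD#, SEQUENCE#, BYTES, MEMBERS, ARC, STATUS).

```python
# Data rows of the v$log output shown earlier.
V_LOG = """\
         1          1      21268   52428800          2 NO  INACTIVE
         2          1      21266   52428800          2 NO  INACTIVE
         6          1      21265  524288000          2 NO  INACTIVE
         4          1      21269  524288000          2 NO  CURRENT
         5          1      21264  524288000          2 NO  INACTIVE
         3          1      21267   52428800          2 NO  INACTIVE
"""


def find_group(text, sequence):
    """Return the v$log row whose SEQUENCE# equals `sequence`, or None."""
    for line in text.splitlines():
        cols = line.split()
        if len(cols) != 7 or not cols[0].isdigit():
            continue  # skip headers, separators and blank lines
        group, thread, seq, size, members, archived, status = cols
        if int(seq) == sequence:
            return {
                "group": int(group),
                "thread": int(thread),
                "bytes": int(size),
                "members": int(members),
                "archived": archived,
                "status": status,
            }
    return None


if __name__ == "__main__":
    print(find_group(V_LOG, 21269))
```

Here sequence 21269 maps to group 4, which is the CURRENT group, so the failing redo is in the log that was being written when the server went down.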

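Searching online showed that ORA-00600 [kcrfr_update_nab_2] is a rare error, with little information on MOS or elsewhere. MOS mostly treats it as a bug and offers no workaround or fix. A Google search turned up an article titled "kcrfr_update_nab_2" in which the author records how they resolved it (http://www.oraclehome.com.br/2011/10/20/kcrfr_update_nab_2/). In outline: delete the second member file of the failing log group, run recover database, open the database, and then rebuild the failing log group.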

Steps taken:

SQL> startup open

ORACLE instance started.

 

Total System Global Area 1.6744E+10 bytes

Fixed Size                  2050200 bytes

Variable Size            1694500712 bytes

Database Buffers         1.5032E+10 bytes

Redo Buffers               14725120 bytes

Database mounted.

ORA-00600: internal error code, arguments: [kcrfr_update_nab_2], [0x7000003EF9D93F0], [2], [], [], [], [], []

 

Locate the redo files of the failing log group and delete its second member, which here is the copy of group 4 in +DATADG1:

$asmcmd

ASMCMD> cd +datadg1/testdb/ONLINELOG/

ASMCMD> ls

 group_1.360.676633379

 group_2.361.676633469

 group_3.362.676633477

 group_4.363.676633561

 group_5.364.676633571

 group_6.365.676633579

ASMCMD> rm group_4.363.676633561

 

SQL> recover database;

Media recovery complete.

 

SQL> shutdown immediate

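SQL> startup open;

Once the database is open, rebuild the failing redo group, group 4. If group 4 is still CURRENT or ACTIVE, the drop is rejected (ORA-01623 or ORA-01624), so switch log files and checkpoint first.

SQL> alter database drop logfile group 4;

SQL> alter database add logfile thread 1 group 4 ('+SYSDG','+DATADG1') size 512M;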

 

 

Source: ITPUB blog, http://blog.itpub.net/16976507/viewspace-1266952/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
