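I recently went through a rather painful disaster recovery. During a maintenance window the whole database had to be shut down, but the shutdown immediate command never finished, so in the end I had to force the database down with shutdown abort. The next open then failed with the following ORA-00600 error: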

 

ORA-00600: internal error code, arguments: [kclchkblk_4], [0], [1158738710], [0], [1128825042], [], [], []

Wed Apr 20 21:36:40 2011

Errors in file /u01/app/oracle/admin/orcl/udump/orcl1_ora_5622.trc:

ORA-00600: internal error code, arguments: [kclchkblk_4], [0], [1158738710], [0], [1128825042], [], [], []

Wed Apr 20 21:36:40 2011

Error 600 happened during db open, shutting down database

USER: terminating instance due to error 600

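  So shutdown abort really is not something to use casually; a hard lesson!! Now I had no choice but to grit my teeth and recover the database. At first Oracle reported that system01 needed media recovery, but a query showed that the SCN in the control file did not match the datafile headers. I tried recover database until cancel, which reported that recovery was complete, but alter database open resetlogs still failed with the same error!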

   Searching Metalink for [kclchkblk_4] turned up note [ID 275902.1], which describes this situation:

1) Error, ORA-600[KCLCHKBLK_4], is signaled because the SCN in a tempfile block
is too high. The same reason caused the ORA-600[2662]s in the alert logs.

2) This issue is because the tempfiles may not get reinitialized during open
resetlogs.

 

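In other words, the SCN in the temporary tablespace is inconsistent with the system SCN after the resetlogs. The fix is to delete all of the physical tempfiles while the database is in mount state, and then add temporary files again once the database is open.

After doing this, the open failed with a new error: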

 

Wed Apr 20 22:34:54 2011

SMON: enabling cache recovery

Wed Apr 20 22:34:54 2011

Errors in file /u01/app/oracle/admin/orcl/udump/orcl1_ora_30165.trc:

ORA-00600: internal error code, arguments: [2662], [0], [1128985090], [0], [1158738710], [8388617], [], []

Wed Apr 20 22:34:58 2011

Errors in file /u01/app/oracle/admin/orcl/udump/orcl1_ora_30165.trc:

ORA-00600: internal error code, arguments: [2662], [0], [1128985090], [0], [1158738710], [8388617], [], []

Wed Apr 20 22:34:58 2011

Error 600 happened during db open, shutting down database

USER: terminating instance due to error 600

Wed Apr 20 22:34:58 2011

Errors in file /u01/app/oracle/admin/orcl/bdump/orcl1_lmon_30042.trc:

ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []

Instance terminated by USER, pid = 30165

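ORA-600 [2662] shows up frequently after an incomplete recovery that relied on parameters such as _allow_resetlogs_corruption. The main cause is that the SCN of a data block is ahead of the database's current SCN: the current SCN is compared with the dependent SCN stored in a UGA variable, and if the current SCN is lower, the database raises ORA-600 [2662].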
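To make the numbers in the 2662 error easier to read, here is a small Python sketch that splits the argument list into named fields. The argument layout (current SCN wrap and base, dependent SCN wrap and base, then a data block address) is the one commonly described for this error and is an assumption here, as is the DBA split into a 10-bit relative file number and a 22-bit block number. The function name `parse_ora600_2662` is made up for illustration.

```python
import re

def parse_ora600_2662(line):
    """Split an ORA-600 [2662] line into named fields.

    Assumed argument order: [2662], current SCN wrap, current SCN base,
    dependent SCN wrap, dependent SCN base, data block address (DBA).
    """
    args = re.findall(r"\[([^\]]*)\]", line)
    if not args or args[0] != "2662":
        raise ValueError("not an ORA-600 [2662] line")
    cur_wrap, cur_base, dep_wrap, dep_base, dba = (int(a) for a in args[1:6])
    return {
        # Full SCN assumed to be wrap * 2**32 + base
        "current_scn": (cur_wrap << 32) | cur_base,
        "dependent_scn": (dep_wrap << 32) | dep_base,
        # Assumed DBA layout: upper 10 bits file number, lower 22 bits block
        "file_no": dba >> 22,
        "block_no": dba & 0x3FFFFF,
    }

line = ("ORA-00600: internal error code, arguments: [2662], [0], "
        "[1128985090], [0], [1158738710], [8388617], [], []")
info = parse_ora600_2662(line)
gap = info["dependent_scn"] - info["current_scn"]
print(info)
print("dependent SCN is ahead by", gap)
```

Under those assumptions, the error from this incident reads as: the current SCN 1128985090 is behind the dependent SCN 1158738710, and the block that carried the higher SCN is block 9 of file 2.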

 

 

The usual fix for the 2662 error is to adjust the SCN with event 10015:

alter session set events '10015 trace name adjust_scn level x';

 

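Here x is the level. Level 1 advances the SCN by 1 billion (1024*1024*1024), which is usually enough, but the level can be adjusted to suit the situation. In our case the error shows 1128985090 being lower than 1158738710; with level 1 the adjusted SCN would be 1073741824, which is still lower than the current SCN. If the adjustment is too small, a different error, 2256, is raised instead, so I used level 2.

Based on past experience with 8i/9i, the database should have opened at this point, yet the open still failed with the same error, and a query against V$database showed that the SCN had not changed either. Apparently the SCN adjustment had not taken effect, which made things rather tricky.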
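The level arithmetic can be checked with a short Python sketch. It assumes, as in the worked example here, that level N yields an SCN of N * 1024*1024*1024 and that the SCN wrap is 0; the real behaviour of adjust_scn may differ between versions. The helper name `min_adjust_level` is hypothetical.

```python
SCN_PER_LEVEL = 1024 ** 3  # 1073741824, the step attributed to one level

def min_adjust_level(required_scn):
    """Smallest level whose target SCN is strictly above required_scn.

    Assumes level N corresponds to an SCN of N * SCN_PER_LEVEL.
    """
    return required_scn // SCN_PER_LEVEL + 1

dependent_scn = 1158738710  # the higher SCN reported in the 2662 error
level = min_adjust_level(dependent_scn)
print("level 1 target:", 1 * SCN_PER_LEVEL)
print("minimum level:", level, "target:", level * SCN_PER_LEVEL)
```

For the SCN in this incident, the level 1 target of 1073741824 falls short of 1158738710, and the smallest level that clears it is 2.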

  A close look at the generated trace file showed that an ORA-01031 insufficient-privileges error had been raised before the 2662 error:

 

Clearing ORA-1031 thrown by trace 'ADJUST_SCN'

----- Dump for trace 'ADJUST_SCN': -----

*** 2011-04-20 23:54:19.034

ksedmp: internal or fatal error

ORA-01031: insufficient privileges

Current SQL statement for this session:

 alter database open

----- Call Stack Trace -----

calling              call     entry                argument values in hex     

location             type     point                (? means dubious value)    

-------------------- -------- -------------------- ----------------------------

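So some kind of privilege problem really was making the SCN adjustment fail. Yet this method was used all the time on 8i/9i and should not have been a problem, so I could only guess that Oracle 10g had changed event 10015 in some way. After asking around, including several friends and QQ groups, I finally learned from one friend about a parameter, _allow_error_simulation: event 10015 can adjust the SCN only when this parameter is set to true. Asking others for help is a good habit, but I am firmly against asking for help in the middle of the night!!!

   After setting this parameter in init.ora and using event 10015 again, the database finally opened. All that remained was to rebuild it with exp/imp, and the job was done.

   The lesson from this episode: use shutdown abort with great care, and then think twice again!!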