Today a user reported that their database was restarting at irregular intervals. The alert log showed the following error:
Current log# 2 seq# 119 mem# 0: Y:\ORADATA\ORCL\REDO02.LOG
Sat Jan 18 08:10:47 2014
********************* ATTENTION: ********************
The controlfile header block returned by the OS
has a sequence number that is too old.
The controlfile might be corrupted.
PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE
without following the steps below.
RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE
TO THE DATABASE, if the controlfile is truly corrupted.
In order to re-start the instance safely,
please do the following:
(1) Save all copies of the controlfile for later
analysis and contact your OS vendor and Oracle support.
(2) Mount the instance and issue:
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
(3) Unmount the instance.
(4) Use the script in the trace file to
RE-CREATE THE CONTROLFILE and open the database.
*****************************************************
CKPT (ospid: 3924): terminating the instance
Sat Jan 18 08:10:47 2014
opiodr aborting process unknown ospid (324) as a result of ORA-1092
Sat Jan 18 08:10:47 2014
opiodr aborting process unknown ospid (2684) as a result of ORA-1092
Sat Jan 18 08:10:47 2014
ORA-1092 : opitsk aborting process
Sat Jan 18 08:10:48 2014
opiodr aborting process unknown ospid (2472) as a result of ORA-1092
Sat Jan 18 08:10:48 2014
ORA-1092 : opitsk aborting process
Sat Jan 18 08:10:48 2014
System state dump requested by (instance=1, osid=3924 (CKPT)), summary=[abnormal instance termination].
System State dumped to trace file D:\APP\ADMINISTRATOR\diag\rdbms\orcl\orcl\trace\orcl_diag_3880_20140118081048.trc
Sat Jan 18 08:10:48 2014
opiodr aborting process unknown ospid (2500) as a result of ORA-1092
Sat Jan 18 08:10:48 2014
ORA-1092 : opitsk aborting process
Sat Jan 18 08:10:48 2014
opiodr aborting process unknown ospid (1624) as a result of ORA-1092
Sat Jan 18 08:10:48 2014
ORA-1092 : opitsk aborting process
Sat Jan 18 08:10:48 2014
opiodr aborting process unknown ospid (920) as a result of ORA-1092
Sat Jan 18 08:10:48 2014
ORA-1092 : opitsk aborting process
Sat Jan 18 08:10:48 2014
opiodr aborting process unknown ospid (3588) as a result of ORA-1092
Sat Jan 18 08:10:48 2014
ORA-1092 : opitsk aborting process
Sat Jan 18 08:10:47 2014
ORA-1092 : opitsk aborting process
Sat Jan 18 08:10:48 2014
opiodr aborting process unknown ospid (992) as a result of ORA-1092
Sat Jan 18 08:10:48 2014
ORA-1092 : opitsk aborting process
Sat Jan 18 08:10:49 2014
opiodr aborting process unknown ospid (2928) as a result of ORA-1092
Sat Jan 18 08:10:49 2014
ORA-1092 : opitsk aborting process
Sat Jan 18 08:10:49 2014
opiodr aborting process unknown ospid (964) as a result of ORA-1092
Sat Jan 18 08:10:49 2014
ORA-1092 : opitsk aborting process
Sat Jan 18 08:10:49 2014
opiodr aborting process unknown ospid (3508) as a result of ORA-1092
Sat Jan 18 08:10:49 2014
ORA-1092 : opitsk aborting process
Sat Jan 18 08:10:49 2014
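As the log shows, a warning (ATTENTION) was written when the problem occurred. The gist is that the control file header information is wrong, yet the database works normally again after a restart. A search on MetaLink turned up the following note: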
APPLIES TO:
Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.
SYMPTOMS
Database instance went down with following error message in alert log:
---
Wed Sep 11 23:26:39 2013
********************* ATTENTION: ********************
The controlfile header block returned by the OS
has a sequence number that is too old.
The controlfile might be corrupted.
PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE
without following the steps below.
RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE
TO THE DATABASE, if the controlfile is truly corrupted.
In order to re-start the instance safely,
please do the following:
(1) Save all copies of the controlfile for later
analysis and contact your OS vendor and Oracle support.
(2) Mount the instance and issue:
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
(3) Unmount the instance.
(4) Use the script in the trace file to
RE-CREATE THE CONTROLFILE and open the database.
*****************************************************
USER (ospid: 24051722): terminating the instance
---
Impact:
- This brought down our major business-critical database and impacted our business.
CAUSE
BUG 14281768 - CONTROL FILE GETS CORRUPTED
Same symptoms apply
SOLUTION
The error is typically raised when the controlfile is overwritten by an older copy of the controlfile. Most likely this happened due to a storage or I/O error.
All copies of the control file must have the same internal sequence number for Oracle to start up the database or shut it down in normal or immediate mode.
To add a sanity check in the future, please set the following parameter:
SQL> alter system set "_controlfile_update_check"='HIGH' scope=spfile; -- then bounce the database.
Please check with your OS System/Storage admin regarding the issue.
The precaution is to relocate the control file to a fast, direct-I/O-enabled disk; the main goal is to keep the OS from writing an old (cached) copy of the controlfile over it.
To reverse the parameter setting:
SQL> alter system set "_controlfile_update_check"='OFF' scope=spfile; -- then bounce the database.
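The underlying cause is poor I/O performance, which lets control file writes fall out of sync. The database detects that the control file information is inconsistent, concludes that the control file may be corrupted, and therefore terminates the instance and forcibly disconnects client sessions. The official remedy in the note is to adjust the hidden parameter _controlfile_update_check, which governs the checking of control file updates (HIGH adds the sanity check, OFF reverts it). That does not address the root of the problem, however. The real fix is to sort out the I/O; otherwise genuine control file corruption cannot be ruled out.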
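Because the restarts are intermittent, it can help to see how often the warning shows up in the alert log and which process terminated the instance each time. The following is only a rough sketch in Python, not an Oracle tool: it assumes the alert log uses the plain-text layout quoted in this post (a timestamp line such as "Sat Jan 18 08:10:47 2014" before each entry) and that the script runs under an English locale so the day and month names parse. The function names are made up for illustration.

```python
import re
from datetime import datetime

# Timestamp lines in the quoted alert log look like "Sat Jan 18 08:10:47 2014".
# Assumption: English day/month names (strptime's %a and %b are locale-dependent).
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

# First line of the warning body, as it appears in the quoted log.
WARNING_TEXT = "The controlfile header block returned by the OS"

# e.g. "CKPT (ospid: 3924): terminating the instance"
TERMINATE_RE = re.compile(r"^(\w+) \(ospid: (\d+)\): terminating the instance")


def parse_timestamp(line):
    """Return a datetime if the line is an alert-log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def find_controlfile_warnings(lines):
    """Return one dict per stale-header warning found in the given lines.

    Each dict holds the most recent timestamp seen before the warning and
    the name of the process that terminated the instance afterwards
    (None if no "terminating the instance" line follows).
    """
    events = []
    last_ts = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            last_ts = ts
            continue
        if WARNING_TEXT in line:
            events.append({"time": last_ts, "process": None})
            continue
        match = TERMINATE_RE.match(line.strip())
        # Attach the terminating process to the latest warning still lacking one.
        if match and events and events[-1]["process"] is None:
            events[-1]["process"] = match.group(1)
    return events
```

To use it, open the alert log (the path depends on the installation) with something like open(path, errors="replace") and pass the file object to find_controlfile_warnings. Comparing the returned timestamps with storage or OS event logs may help the system administrator narrow down when the I/O problem occurs.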