问题描述
Oracle数据库无法启动,只能启动到nomount状态,启动mount状态报错。在alert告警日志,可见如下报错:
Thu May 31 15:23:20 2018
Starting ORACLE instance (normal)
****************** Large Pages Information *****************
...... ......
Thu May 31 15:23:37 2018
ARC3 started with pid=41, OS id=15059
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
System state dump requested by (instance=1, osid=15002), summary=[abnormal instance termination].
System State dumped to trace file //trace/_diag_14925.trc
USER (ospid: 15002): terminating the instance
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 15002 terminate the instance... Dumping diagnostic data in directory=[cdmp_20180531152337], requested by (instance=1, osid=15002), summary= [abnormal instance termination].
Instance terminated by USER, pid = 15002
找到异常进程相关的trace 文件,该文件在数据库安装的trace目录下,名称为:_diag_ospid.trc,其中sid是当前实例的实例名,假设为orcl,ospid是出问题的用户进程,在本案例中数据库因为USER (ospid: 15002)而出现异常,所以用于诊断的trace文件名称为:orcl_diag_15002.trc ,打开trace文件,可见如下错误。
Error: kccpb_sanity_check_2
Control file sequence number mismatch! <<<<<<<<< Control file sequence number mismatch!
fhcsq: 25038565 bhcsq: 3481437667 cfn 8
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace ----- ksedsts()+461
----- End of Abridged Call Stack Trace -----
*** 2018-05-31 15:23:37.560
USER (ospid: 15002): terminating the instance <<<<<<<<<<<<<<< Instance was terminated by ospid:15002 ksuitm: waiting up to [5] seconds before killing DIAG(14925)
告警信息
详细见问题描述。
处理过程
详细见解决方案。
根因
磁盘io异常,导致oracle控制文件损坏。
解决方案
1、通过alert日志文件,判断Oracle无法启动的原因。但在本案例中alert日志中没有明显的故障原因,只是提示用户进程终止了数据库启动,详细要查看trace文件;
2、通过对应的trace文件,得到Oracle异常的具体原因;Error: kccpb_sanity_check_2Control file sequence number mismatch! <<<<<<<<< Control file sequence number mismatch!
3、通过Oracle MOS可知,该错误是因为控制文件损坏导致Oracle无法mount;
4、恢复控制文件,然后启动数据库到mount状态,再打开到open状态 ;有如下几种恢复方式:
a、如果控制文件有多个副本,首先将所有副本做好备份,修改静态参数文件的control_file参数。首先尝试control01.ctl文件启动数据库,如果启动失败可尝试control02.ctl文件启动数据库,如果某个控制文件可以启动数据库,则说明该文件未损坏,可把该文件复制替换另一个损坏的控制文件,再启动数据库即可。如果多个控制文件都无法启动数据库,则说明多个副本均失效,只能考虑后续的方法b和方法c来恢复。
b、如果已经使用了rman备份了控制文件,可考虑做控制文件recovery。通过rman备份的日志可知都有哪些backuppiece包含控制文件的备份,然后使用如下命令恢复控制文件即可;Rman> restore controlfile from '' ;
Rman >recover database ;复制代码
c、如果a方法恢复失败,b方法因缺失备份没条件实施,则可以考虑重建控制文件的方法来解决,方法如下:- Set #1. NORESETLOGS case
--
-- The following commands will create a new control file and use it
-- to open the database.
-- Data used by Recovery Manager will be lost.
-- Additional logs may be required for media recovery of offline
-- Use this only if the current versions of all online logs are
-- available.
-- After mounting the created controlfile, the following SQL
-- statement will place the database in the appropriate
-- protection mode:
-- ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "RACDB" NORESETLOGS ARCHIVELOG
MAXLOGFILES 192
MAXLOGMEMBERS 3
MAXDATAFILES 1024
MAXINSTANCES 32
MAXLOGHISTORY 292
LOGFILE
GROUP 1 '+ARCH/racdb/onlinelog/group_1.257.1016591199' SIZE 52M BLOCKSIZE 512,
GROUP 2 '+ARCH/racdb/onlinelog/group_2.258.1016591199' SIZE 52M BLOCKSIZE 512,
GROUP 3 '+ARCH/racdb/onlinelog/group_3.259.1016602253' SIZE 52M BLOCKSIZE 512,
GROUP 4 '+ARCH/racdb/onlinelog/group_4.260.1016602255' SIZE 52M BLOCKSIZE 512
-- STANDBY LOGFILE
DATAFILE
'+DATA/racdb/datafile/system.256.1016591203',
'+DATA/racdb/datafile/sysaux.257.1016591219',
'+DATA/racdb/datafile/undotbs1.258.1016591231',
'+DATA/racdb/datafile/undotbs2.260.1016591267',
'+DATA/racdb/datafile/users.261.1016591271',
'+DATA/racdb/datafile/soe.263.1018573635',
'+DATA/racdb/datafile/hn_data.264.1019130887'
CHARACTER SET ZHS16GBK
;
-- Commands to re-create incarnation table
-- Below log names MUST be changed to existing filenames on
-- disk. Any one log file from each branch can be used to
-- re-create incarnation records.
-- ALTER DATABASE REGISTER LOGFILE '+ARCH';
-- Recovery is required if any of the datafiles are restored backups,
-- or if the last shutdown was not normal or immediate.
RECOVER DATABASE
-- All logs need archiving and a log switch is needed.
ALTER SYSTEM ARCHIVE LOG ALL;
-- Database can now be opened normally.
ALTER DATABASE OPEN;
-- Commands to add tempfiles to temporary tablespaces.
-- Online tempfiles have complete space information.
-- Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP ADD TEMPFILE '+DATA/racdb/tempfile/temp.259.1016591235'
SIZE 134217728 REUSE AUTOEXTEND ON NEXT 655360 MAXSIZE 32767M;
-- End of tempfile additions.
建议与总结
1、Oracle数据库建议创建多副本的控制文件,且最好分布在不同磁盘;
2、Oracle数据库要做好备份,包括对控制文件的备份;