My test database is 11g, and its control files are laid out as follows:
/u01/oradata/denver/control01.ctl
/u01/flash_recovery_area/denver/control02.ctl
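With the database open, deleting /u01/oradata/denver/control01.ctl does not crash the instance, contrary to what the OCP material describes (an immediate instance crash).
However, the alert log periodically records errors because the control file cannot be found: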
Sun Oct 04 21:15:39 2015
Errors in file /u01/diag/rdbms/denver/denver/trace/denver_m000_6504.trc:
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/u01/oradata/denver/control01.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
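Since these errors recur on a schedule, it can help to tally them rather than read the alert log by eye. The following is only a minimal sketch: the function name count_ora_codes is made up for illustration, and the alert log path in the usage note is an assumption based on the trace directory shown above, so adjust it to your own environment.

```shell
# Count how often each five-digit ORA- error code appears in a log file.
# Prints one "CODE COUNT" pair per line, sorted by code.
count_ora_codes() {
    # grep -o emits only the matched code; sort + uniq -c tallies each one;
    # awk swaps the columns so the code comes first.
    grep -oE 'ORA-[0-9]{5}' "$1" | sort | uniq -c | awk '{print $2, $1}'
}
```

It could be called as count_ora_codes /u01/diag/rdbms/denver/denver/trace/alert_denver.log (assumed path). A steadily growing count for ORA-00210 and ORA-00202 would indicate that a control file copy is still missing.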
The trace file reads as follows:
[oracle@db1 denver]$ more /u01/diag/rdbms/denver/denver/trace/denver_m000_6504.trc
Trace file /u01/diag/rdbms/denver/denver/trace/denver_m000_6504.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle
System name: Linux
Node name: db1
Release: 2.6.32-220.el6.x86_64
Version: #1 SMP Wed Nov 9 08:03:13 EST 2011
Machine: x86_64
Instance name: denver
Redo thread mounted by this instance: 1
Oracle process number: 36
Unix process pid: 6504, image: oracle@db1 (M000)
*** 2015-10-04 21:44:29.660
*** SESSION ID:(11.32) 2015-10-04 21:44:29.660
*** CLIENT ID:() 2015-10-04 21:44:29.660
*** SERVICE NAME:(SYS$BACKGROUND) 2015-10-04 21:44:29.660
*** MODULE NAME:(MMON_SLAVE) 2015-10-04 21:44:29.660
*** ACTION NAME:(DDE async action) 2015-10-04 21:44:29.660
========= Dump for error ORA 202 (no incident) ========
----- DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
dbkh_reactive_run_check: BEGIN
dbkh_reactive_run_check:; incident_id=0
dbkh_run_check_internal: BEGIN; check_namep=DB Structure Integrity Check, run_namep=<null>
dbkh_run_check_internal: BEGIN; timeout=0
dbkh_run_check_internal: AFTER RUN CREATE; run_id=461
DDE: Problem Key 'ORA 202' was flood controlled (0x1) (no incident)
ORA-00202: control file: '/u01/oradata/denver/control01.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
kcidr_cross_check - error:
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/u01/oradata/denver/control01.ctl'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
dbkh_run_check_internal: END ERR CATCH BLOCK; e=210
dbkh_post_process_run: BEGIN
dbkh_post_process_run: NEW FAILURE COUNT: 0; DBKH_NUM_NEW_FAILURES_CTX(ctxp)=dbkh_post_process_run: END
dbkh_reactive_run_check: ERR received; e=210
dbkh_reactive_run_check: END
-------------------------------------------------------
[oracle@db1 denver]$
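The cause of this error is clear: the periodically scheduled DB Structure Integrity Check cannot find the control file. Reportedly, the instance also keeps syncing heartbeat data to the control file in the meantime.
So losing one control file does not crash the instance, at least not on 11g.
The database also keeps serving requests normally; only operations such as adding a data file or adding a tablespace fail.
That is the outcome of losing one control file: the instance keeps running, and the database keeps serving "with reservations".
So what happens when all the control files are lost?
Run the following: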
mv /u01/flash_recovery_area/denver/control02.ctl /u01/flash_recovery_area/denver/control02.ctl.kkk
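Now every control file the instance needs is gone. The result: the instance still does not crash, and the database likewise keeps serving "with reservations".
Conclusion:
In this experiment I moved away both of the database's only two control files with mv, and the database still kept serving. Only operations that update the control file (such as adding or dropping a tablespace, or adding or dropping a data file) fail outright, and the alert log reports that the first control file cannot be found. So from the application's point of view, losing the control files has almost no effect. The database is nevertheless in an extremely dangerous state at this point.
Likewise, as long as the instance has not crashed, moving the control files back to their original locations returns the database to normal.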
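One plausible explanation for this behaviour, which the experiment itself does not prove, is ordinary Linux file semantics: a process that already has a file open keeps its descriptor when the file is renamed on the same filesystem, so it goes on reading and writing the same data under the new name. Anything that opens the file by its old path afterwards would fail, which would match the ORA-27041 errors from the integrity check. The sketch below shows this with a throwaway temp file. The function and file names are invented for illustration, and whether Oracle's background processes hold their control files open in exactly this way is an assumption.

```shell
# Show that a writer holding an open descriptor keeps writing to the same
# file after it is renamed. Works on a throwaway temp directory only.
demo_rename_under_open_fd() {
    dir=$(mktemp -d) || return 1
    printf 'before\n' > "$dir/control.ctl"

    exec 3>>"$dir/control.ctl"                     # hold the file open on fd 3
    mv "$dir/control.ctl" "$dir/control.ctl.kkk"   # rename it underneath fd 3
    printf 'after\n' >&3                           # the write still succeeds
    exec 3>&-                                      # close the descriptor

    # The old name no longer exists, yet the renamed file has both writes.
    if [ -e "$dir/control.ctl" ]; then
        echo "original path: present"
    else
        echo "original path: missing"
    fi
    cat "$dir/control.ctl.kkk"
    rm -rf "$dir"
}
```

Calling demo_rename_under_open_fd reports that the original path is missing and then prints both lines from the renamed file. If the same mechanism applies here, it would also be consistent with why moving the files back restores normal operation.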
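Going further:
Consider the following experiment, which I was also doing for the first time:
First, shut the database down cleanly: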
SQL> conn /as sysdba
Connected.
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
Then back up the control file manually with cp:
[oracle@db1 denver]$ cp /u01/oradata/denver/control01.ctl /tmp/control01.ctl
Then start the database again and add a data file (which forces the control file to be updated):
SQL> startup;
ORACLE instance started.
Total System Global Area 835104768 bytes
Fixed Size 2217952 bytes
Variable Size 595593248 bytes
Database Buffers 230686720 bytes
Redo Buffers 6606848 bytes
Database mounted.
Database opened.
SQL> ALTER TABLESPACE "EXAMPLE" ADD DATAFILE '/u01/oradata/denver/example02.dbf' SIZE 100M REUSE AUTOEXTEND ON NEXT 5M MAXSIZE 2G;
Tablespace altered.
SQL>
Now remove all the control files:
SQL> select name from v$controlfile;
NAME
--------------------------------------------------------------------------------
/u01/oradata/denver/control01.ctl
/u01/flash_recovery_area/denver/control02.ctl
SQL> ! mv /u01/oradata/denver/control01.ctl /u01/oradata/denver/control01.ctl.kkk
SQL> ! mv /u01/flash_recovery_area/denver/control02.ctl /u01/flash_recovery_area/denver/control02.ctl.kkk
SQL>
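Based on the earlier experiment, the instance clearly will not crash. So what happens if we restore from the control file we just backed up with cp?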
SQL> ! cp /tmp/control01.ctl /u01/oradata/denver/control01.ctl
SQL> ! cp /tmp/control01.ctl /u01/flash_recovery_area/denver/control02.ctl
SQL>
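Before relying on a copy like this, one simple sanity check is to compare it byte for byte with the file it is meant to replace, here the control file that was moved aside. This is only a sketch: compare_copy is a hypothetical helper, and a byte comparison can say that two files differ, not which one is newer.

```shell
# Report whether a backup copy still matches another file.
# Prints "identical" or "differs"; returns 2 if either file is unreadable.
compare_copy() {
    backup="$1"
    current="$2"
    if [ ! -r "$backup" ] || [ ! -r "$current" ]; then
        echo "unreadable" >&2
        return 2
    fi
    # cmp -s is silent and signals the result through its exit status.
    if cmp -s "$backup" "$current"; then
        echo "identical"
    else
        echo "differs"
    fi
}
```

For example, compare_copy /tmp/control01.ctl /u01/oradata/denver/control01.ctl.kkk would be expected to print "differs" in this experiment, because the database was started and a data file was added after the copy was taken.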
As shown above, we copy the backup control file into place and then add a data file to see how the database treats the old control file:
SQL> ALTER TABLESPACE "EXAMPLE" ADD DATAFILE '/u01/oradata/denver/example03.dbf' SIZE 100M REUSE AUTOEXTEND ON NEXT 5M MAXSIZE 2048M;
Tablespace altered.
SQL>
Wait, what? It succeeded? Shouldn't that have failed? I really expected an error here. Let's try again:
SQL> ALTER TABLESPACE "EXAMPLE" ADD DATAFILE '/u01/oradata/denver/example04.dbf' SIZE 100M REUSE AUTOEXTEND ON NEXT 5M MAXSIZE 2048M;
Tablespace altered.
SQL>
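Wow, it really did succeed. What does that mean? Once the old control file was put back, did the instance automatically bring it up to date? Surely it isn't that clever?!
Let me try to shut the instance down cleanly: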
SQL> shutdown immediate;
Database closed.
ORA-03113: end-of-file on communication channel
Process ID: 6275
Session ID: 191 Serial number: 3
SQL>
Heh, something seems off. The alert log shows:
Shutting down instance (immediate)
Stopping background process SMCO
Shutting down instance: further logons disabled
Stopping background process QMNC
Sun Oct 04 21:55:35 2015
Stopping background process CJQ0
Stopping background process MMNL
Stopping background process MMON
License high water mark = 14
Stopping Job queue slave processes, flags = 7
Job queue slave processes stopped
All dispatchers and shared servers shutdown
ALTER DATABASE CLOSE NORMAL
Sun Oct 04 21:55:40 2015
SMON: disabling tx recovery
SMON: disabling cache recovery
Sun Oct 04 21:55:41 2015
Shutting down archive processes
Archiving is disabled
Archive process shutdown avoided: 0 active
Thread 1 closed at log sequence 29
Successful close of redo thread 1
Completed: ALTER DATABASE CLOSE NORMAL
ALTER DATABASE DISMOUNT
USER (ospid: 6275): terminating the instance
Instance terminated by USER, pid = 6275
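Clearly the instance was terminated in the end rather than shut down normally. But there is no detailed trace file, so there is nothing further to go on.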
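The difference between a terminated instance and a normal shutdown can also be checked mechanically from the alert log. The sketch below looks for the "Instance terminated by" line seen above. The "Instance shutdown complete" marker for a clean shutdown is an assumption based on what 11g alert logs usually contain, and last_shutdown_outcome is a made-up name.

```shell
# Classify how the most recent shutdown recorded in an alert log ended.
# Prints "terminated", "clean", or "unknown".
last_shutdown_outcome() {
    # Keep only the last line that mentions either outcome.
    last=$(grep -E 'Instance terminated by|Instance shutdown complete' "$1" | tail -n 1)
    case "$last" in
        *"Instance terminated by"*)     echo "terminated" ;;
        *"Instance shutdown complete"*) echo "clean" ;;
        *)                              echo "unknown" ;;
    esac
}
```

Run against the alert log excerpt above, it would print "terminated".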
Let's try startup again:
SQL> startup
ORACLE instance started.
Total System Global Area 835104768 bytes
Fixed Size 2217952 bytes
Variable Size 595593248 bytes
Database Buffers 230686720 bytes
Redo Buffers 6606848 bytes
Database mounted.
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/u01/oradata/denver/system01.dbf'
ORA-01207: file is more recent than control file - old control file
SQL>
Sure enough, it fails. The error says the control file is too old. The alert log shows:
ALTER DATABASE OPEN
Errors in file /u01/diag/rdbms/denver/denver/trace/denver_ora_7560.trc:
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/u01/oradata/denver/system01.dbf'
ORA-01207: file is more recent than control file - old control file
ORA-1122 signalled during: ALTER DATABASE OPEN...
Sun Oct 04 21:59:56 2015
Errors in file /u01/diag/rdbms/denver/denver/trace/denver_m000_7562.trc:
ORA-00338: log 1 of thread 1 is more recent than control file
ORA-00312: online log 1 thread 1: '/u01/oradata/denver/redo01.rdo'
Errors in file /u01/diag/rdbms/denver/denver/trace/denver_m000_7562.trc:
ORA-00338: log 1 of thread 1 is more recent than control file
ORA-00312: online log 1 thread 1: '/u01/oradata/denver/redo01.rdo'
Errors in file /u01/diag/rdbms/denver/denver/trace/denver_m000_7562.trc:
ORA-00338: log 2 of thread 1 is more recent than control file
ORA-00312: online log 2 thread 1: '/u01/oradata/denver/redo02.rdo'
Errors in file /u01/diag/rdbms/denver/denver/trace/denver_m000_7562.trc:
ORA-00338: log 2 of thread 1 is more recent than control file
ORA-00312: online log 2 thread 1: '/u01/oradata/denver/redo02.rdo'
Errors in file /u01/diag/rdbms/denver/denver/trace/denver_m000_7562.trc:
ORA-00338: log 3 of thread 1 is more recent than control file
ORA-00312: online log 3 thread 1: '/u01/oradata/denver/redo03.rdo'
Errors in file /u01/diag/rdbms/denver/denver/trace/denver_m000_7562.trc:
ORA-00338: log 3 of thread 1 is more recent than control file
ORA-00312: online log 3 thread 1: '/u01/oradata/denver/redo03.rdo'
Checker run found 1 new persistent data failures
The database cannot be opened either:
SQL> select status from v$instance;
STATUS
------------
MOUNTED
SQL>
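With only the stale control files in place, the database has to be brought back by recreating the control file manually and opening it with RESETLOGS. If RMAN backup information was kept in the control file, the RMAN backup files must also be scanned (re-cataloged) by hand to recover the backup metadata.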