今天在测试Oracle环境上做热备,发现RFS进程不存在,做了如下操作后,还是不行:
1、拷贝主库的密码文件到备库
2、重启备库,也重启了主库,并alter system switch logfile;切换日志
3、未发现主库与备库的gap,监听也正常
备库似乎一切正常,但是就是RFS进程不存在,主库的归档日志没有传输到备库
倒是在备库里有个错误提示:
Sun May 25 21:24:09 2014
Errors in file /oracle/admin/dbcc/bdump/dbcc_mrp0_2596.trc:
ORA-16037: user requested cancel of managed recovery operation
Recovery interrupted!
查看跟踪文件
[oracle@sdk32 dbs]$ more /oracle/admin/dbcc/bdump/dbcc_mrp0_2596.trc
/oracle/admin/dbcc/bdump/dbcc_mrp0_2596.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/product/10.2.0/db_1
System name: Linux
Node name: sdk32
Release: 2.6.32-279.el6.i686
Version: #1 SMP Fri Jun 22 10:59:55 UTC 2012
Machine: i686
Instance name: dbcc
Redo thread mounted by this instance: 1
Oracle process number: 18
Unix process pid: 2596, image: oracle@sdk32 (MRP0)
*** SERVICE NAME:() 2014-05-25 21:13:39.499
*** SESSION ID:(153.1) 2014-05-25 21:13:39.499
ARCH: Connecting to console port...
*** 2014-05-25 21:13:39.500 62692 kcrr.c
MRP0: Background Managed Standby Recovery process started
*** 2014-05-25 21:13:44.500 1118 krsm.c
Managed Recovery: Initialization posted.
*** 2014-05-25 21:13:44.500 62692 kcrr.c
Managed Standby Recovery not using Real Time Apply
Recovery target incarnation = 2, activation ID = -1741977668
Influx buffer limit = 11227 (50% x 22455)
Successfully allocated 2 recovery slaves
Using 543 overflow buffers per recovery slave
Start recovery at thread 1 ckpt scn 15793487 logseq 872 block 2
*** 2014-05-25 21:13:44.579
Media Recovery add redo thread 1
*** 2014-05-25 21:13:44.579 1118 krsm.c
Managed Recovery: Active posted.
*** 2014-05-25 21:13:44.587 62692 kcrr.c
Media Recovery Waiting for thread 1 sequence 872
*** 2014-05-25 21:24:09.713
*** 2014-05-25 21:24:09.713 62692 kcrr.c
MRP0: Background Media Recovery cancelled with status 16037
ORA-16037: user requested cancel of managed recovery operation
*** 2014-05-25 21:24:09.713
Media Recovery drop redo thread 1
*** 2014-05-25 21:24:11.179 1118 krsm.c
Managed Recovery: Not Active posted.
ORA-16037: user requested cancel of managed recovery operation
ARCH: Connecting to console port...
*** 2014-05-25 21:24:11.182 62692 kcrr.c
MRP0: Background Media Recovery process shutdown
*** 2014-05-25 21:24:11.182 1118 krsm.c
查了下对应的ORA错误资料,也未见特别的说明。
于是检查下主库参数:
SQL> show parameter log_archive_dest_state_2;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_2 string defer
为什么这个参数被设置为了defer,原来是之前做了测试忘改回来了
SQL> ALTER system SET log_archive_dest_state_2 = 'enable';
系统已更改。
SQL> show parameter log_archive_dest_state_2;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
log_archive_dest_state_2 string enable
改回来后,备库alert日志立马开始响应,RFS进程起来了:
Sun May 25 21:54:32 2014
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[1]: Assigned to RFS process 2816
RFS[1]: Identified database type as 'physical standby'
Sun May 25 21:54:32 2014
RFS LogMiner: Client disabled from further notification
RFS[1]: Archived Log: '/archive/1_876_847917566.dbf'
RFS[1]: Archived Log: '/archive/1_877_847917566.dbf'
RFS[1]: Archived Log: '/archive/1_878_847917566.dbf'
Sun May 25 21:54:35 2014
Fetching gap sequence in thread 1, gap sequence 875-875
Sun May 25 21:54:45 2014
RFS[1]: Archived Log: '/archive/1_879_847917566.dbf'
Sun May 25 21:54:58 2014
RFS[1]: Archived Log: '/archive/1_880_847917566.dbf'
Sun May 25 21:55:11 2014
RFS[1]: Archived Log: '/archive/1_881_847917566.dbf'
Sun May 25 21:55:24 2014
RFS[1]: Archived Log: '/archive/1_882_847917566.dbf'
Sun May 25 21:55:36 2014
RFS[1]: Archived Log: '/archive/1_883_847917566.dbf'
Sun May 25 21:55:49 2014
RFS[1]: Archived Log: '/archive/1_884_847917566.dbf'
Sun May 25 21:56:04 2014
RFS[1]: Archived Log: '/archive/1_885_847917566.dbf'
Sun May 25 21:56:18 2014
RFS[1]: Archived Log: '/archive/1_886_847917566.dbf'
Sun May 25 21:56:31 2014
RFS[1]: Archived Log: '/archive/1_887_847917566.dbf'
最后再次检查主库与备库的同步情况
主库查询归档日志切换情况:
SQL> select group#,sequence#,status,first_time,to_char(first_change#) as begin_scn
from v$log order by group# asc;
GROUP# SEQUENCE# STATUS FIRST_TIME BEGIN_SCN
---------- ---------- ---------------- -------------- ----------------------------------------
1 898 CURRENT 25-5月 -14 16021321
2 896 INACTIVE 25-5月 -14 15926616
3 897 ACTIVE 25-5月 -14 15973241
备库查询归档日志切换情况:
SQL> select process,status,thread#,sequence#,block#,blocks from V$managed_standby order by 1,4;
PROCESS STATUS THREAD# SEQUENCE# BLOCK# BLOCKS
--------- ------------ ---------- ---------- ---------- ----------
ARCH CONNECTED 0 0 0 0
ARCH CONNECTED 0 0 0 0
MRP0 WAIT_FOR_LOG 1 898 0 0
RFS IDLE 0 0 0 0
OK,由于部署了一些测试应用,归档日志已经从刚开始时候的878切换到了898
总结:遇到问题要冷静思考可能的原因,Oracle10g的Dataguard还算是比较稳定的,注意检查核心的参数是否有变化。