生产库的DG需要和主库一样,也需要监控.主要从归档日志传输状态,备库恢复性能状态来进行监控.
归档日志传输状态
监控内容:主要是监控视图v$archive_gap,v$history.
v$archive_gap,当某些日志没有成功传输到备库时,归档就会出现裂隙,默认状态下Data Guard可以自动检测和解决这个问题.
v$log_history,控制文件中记录的归档日志的时间和先后顺序.
Then copy these log files to physical standby database and register them using the ALTER DATABASE REGISTER LOGFILE statement on standby database. After registering these log files on the standby database, we can restart Redo Apply(SQL> RECOVER STANDBY DATABASE;).
(2),如果主库的归档日志连备份也没有了,则需要重新配置,从主库的全备份恢复备库.
拷贝主库的rman备份,通过standby复制的方式进行同步,即相当重新配置了data guard.
归档日志传输状态
监控内容:主要是监控视图v$archive_gap,v$history.
v$archive_gap,当某些日志没有成功传输到备库时,归档就会出现裂隙,默认状态下Data Guard可以自动检测和解决这个问题.
v$log_history,控制文件中记录的归档日志的时间和先后顺序.
例子-1
check_standby_status.sh发现备库出现归档日志gap.
select thread#,sequence#,first_change#,to_char(first_time,'yyyy-mm-dd hh24:mi:ss') FirstTime,next_change#
from v$log_history
where first_time=(select max(first_time) from v$log_history);
select THREAD#,LOW_SEQUENCE#,HIGH_SEQUENCE# from v\$archive_gap;
SQL>select THREAD#,LOW_SEQUENCE#,HIGH_SEQUENCE# from v$archive_gap;
THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
1 200535 201001
备库alert日志
...
Thu Aug 16 01:44:16 2012
Media Recovery Log +BACKUP/pdhcpcs/archivelog/2012_08_11/thread_1_seq_200533.9131.791017793
Media Recovery Log +BACKUP/pdhcpcs/archivelog/2012_08_11/thread_1_seq_200534.9132.791019447
Thu Aug 16 01:44:27 2012
Media Recovery Waiting for thread 1 sequence 200535
Fetching gap sequence in thread 1, gap sequence 200535-200634
Thu Aug 16 01:44:38 2012
FAL[client]: Failed to request gap sequence
GAP - thread 1 sequence 200535-200634
DBID 4161893172 branch 727959542
FAL[client]: All defined FAL servers have been attempted.
-------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that is sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
-------------------------------------------------------------
Thu Aug 16 02:01:52 2012
RFS[2]: Opened log for thread 1 sequence 201181 dbid -133074124 branch 727959542
Archived Log entry 143848 added for thread 1 sequence 201181 rlc 727959542 ID 0xf8110c31 dest 2:
...
...
Thu Aug 16 23:45:41 2012
RFS[2]: Opened log for thread 1 sequence 201309 dbid -133074124 branch 727959542
Archived Log entry 143976 added for thread 1 sequence 201309 rlc 727959542 ID 0xf8110c31 dest 2:
Fri Aug 17 00:03:55 2012
Fetching gap sequence in thread 1, gap sequence 200535-200634
Fri Aug 17 00:04:06 2012
FAL[client]: Failed to request gap sequence
GAP - thread 1 sequence 200535-200634
DBID 4161893172 branch 727959542
FAL[client]: All defined FAL servers have been attempted.
-------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that is sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
-------------------------------------------------------------
Fri Aug 17 00:08:59 2012
RFS[2]: Opened log for thread 1 sequence 201310 dbid -133074124 branch 727959542
Archived Log entry 143977 added for thread 1 sequence 201310 rlc 727959542 ID 0xf8110c31 dest 2:
...
查看备库v$archived_log
,200535到201001系列日志根本都没有.
select thread#,sequence#,substr(name,39,34) a_name,
to_char(FIRST_TIME,'yyyy-mm-dd hh24:mi:ss') a_first_time,
first_CHANGE#,
to_char(NEXT_TIME,'yyyy-mm-dd hh24:mi:ss') a_next_time,
applied,status
from v$archived_log
where sequence# between 200530 and 201005
order by FIRST_TIME
THREAD# SEQUENCE# A_NAME A_FIRST_TIME FIRST_CHANGE# A_NEXT_TIME APPLIED S
---------- ---------- ---------------------------------------- ------------------- ---------------- ------------------- --------- -
1 200530 2012-08-11 06:10:56 44,590,329,558 2012-08-11 06:38:31 YES D
1 200531 2012-08-11 06:38:31 44,590,728,722 2012-08-11 06:39:55 YES D
1 200532 2012-08-11 06:39:55 44,591,233,458 2012-08-11 07:08:23 YES D
1 200533 2012-08-11 07:08:23 44,591,747,460 2012-08-11 07:09:41 YES D
1 200534 2012-08-11 07:09:41 44,592,235,819 2012-08-11 07:37:12 YES D
1 201002 thread_1_seq_201002.6470.791318245 2012-08-14 18:13:30 44,800,747,857 2012-08-14 18:37:02 NO A
1 201003 thread_1_seq_201003.4428.791318493 2012-08-14 18:37:02 44,801,095,874 2012-08-14 18:40:52 NO A
1 201004 thread_1_seq_201004.4498.791318705 2012-08-14 18:40:52 44,801,617,583 2012-08-14 18:44:45 NO A
1 201005 thread_1_seq_201005.815.791320277 2012-08-14 18:44:45 44,802,107,303 2012-08-14 19:10:24 NO A
查看主库v$archived_log,200525到200928已经被删除.
THREAD# SEQUENCE# A_NAME A_FIRST_TIME FIRST_CHANGE# A_NEXT_TIME APPLIED S
---------- ---------- ---------------------------------------- ------------------- ------------- ------------------- --------- -
1 200920 2012-08-14 04:08:38 4.4764E+10 2012-08-14 04:10:05 NO D
1 200921 2012-08-14 04:10:05 4.4765E+10 2012-08-14 04:37:55 NO D
1 200922 2012-08-14 04:37:55 4.4765E+10 2012-08-14 04:39:29 NO D
1 200923 2012-08-14 04:39:29 4.4766E+10 2012-08-14 04:41:02 NO D
1 200924 2012-08-14 04:41:02 4.4766E+10 2012-08-14 05:08:57 NO D
1 200925 2012-08-14 05:08:57 4.4767E+10 2012-08-14 05:10:24 NO D
1 200926 2012-08-14 05:10:24 4.4767E+10 2012-08-14 05:37:49 NO D
1 200927 2012-08-14 05:37:49 4.4768E+10 2012-08-14 05:39:11 NO D
1 200928 2012-08-14 05:39:11 4.4768E+10 2012-08-14 05:40:57 NO D
1 200929 thread_1_seq_200929.2778.791273315 2012-08-14 05:40:57 4.4769E+10 2012-08-14 06:08:34 NO A
1 200930 thread_1_seq_200930.499.791273369 2012-08-14 06:08:34 4.4769E+10 2012-08-14 06:09:28 NO A
THREAD# SEQUENCE# A_NAME A_FIRST_TIME FIRST_CHANGE# A_NEXT_TIME APPLIED S
---------- ---------- ---------------------------------------- ------------------- ------------- ------------------- --------- -
1 200931 thread_1_seq_200931.3047.791274471 2012-08-14 06:09:28 4.4769E+10 2012-08-14 06:27:51 NO A
1 200932 thread_1_seq_200932.1060.791275123 2012-08-14 06:27:51 4.4770E+10 2012-08-14 06:38:42 NO A
1 200933 thread_1_seq_200933.2791.791275207 2012-08-14 06:38:42 4.4770E+10 2012-08-14 06:40:07 NO A
1 200934 thread_1_seq_200934.1812.791276903 2012-08-14 06:40:07 4.4771E+10 2012-08-14 07:08:23 NO A
1 200935 thread_1_seq_200935.1053.791276997 2012-08-14 07:08:23 4.4771E+10 2012-08-14 07:09:56 NO A
1 200936 thread_1_seq_200936.1054.791278045 2012-08-14 07:09:56 4.4772E+10 2012-08-14 07:27:24 NO A
1 200937 thread_1_seq_200937.1055.791278723 2012-08-14 07:27:24 4.4772E+10 2012-08-14 07:38:42 NO A
1 200938 thread_1_seq_200938.1192.791278793 2012-08-14 07:38:42 4.4773E+10 2012-08-14 07:39:51 NO A
1 200939 thread_1_seq_200939.1207.791280441 2012-08-14 07:39:51 4.4773E+10 2012-08-14 08:07:20 NO A
1 200940 thread_1_seq_200940.1212.791280531 2012-08-14 08:07:20 4.4774E+10 2012-08-14 08:08:51 NO A
1 200941 thread_1_seq_200941.482.791280607 2012-08-14 08:08:51 4.4774E+10 2012-08-14 08:10:06 NO A
现状分析
备库上200535到201001序列的日志没有,而其他的日志正常接收,而主库上200535到200928已经删除了,所以这段日志根本没有办法传输到备库上面.
解决方法
(1),查看主库上是否有这些归档日志或者其备份文件,如果有的话,将这些归档日志拷贝到备库,并注册到备库的控制文件.Then copy these log files to physical standby database and register them using the ALTER DATABASE REGISTER LOGFILE statement on standby database. After registering these log files on the standby database, we can restart Redo Apply(SQL> RECOVER STANDBY DATABASE;).
(2),如果主库的归档日志连备份也没有了,则需要重新配置,从主库的全备份恢复备库.
拷贝主库的rman备份,通过standby复制的方式进行同步,即相当重新配置了data guard.
原因分析
检查主备库的数据和归档备份策略,发现主库的归档日志主库还未同步到备库,就已经被备份删除策略删除掉了,所以需要调整备份和删除策略.
原来的备份脚本如下:这是按照冗余策略进行了删除,这种删除不会检查是否已经被备库使用.
rman target / nocatalog <<EOF
spool log to /export/home/oracle/scripts/log/PDHCPC_rman.log append;
run {
BACKUP DATABASE TAG Full_bk;
BACKUP CURRENT ControlFile FORMAT '/export/home/oracle/rman_backup/controlfile/ctl_%U';
sql 'alter system archive log current';
RESTORE DATABASE VALIDATE;
CROSSCHECK BACKUP OF DATABASE;
REPORT OBSOLETE;
DELETE NOPROMPT OBSOLETE;
}
LIST BACKUP SUMMARY;
quit;
EOF
备库恢复性能状态
监控standby的恢复进度,最重要的方法是查看v$archived_log两边的日志差异.
v$recovery_progress查看恢复的细节,注意这个视图在mount恢复状态下是无法查询的,只有在read only状态下可以.
select to_char(start_time,'yyyy-mm-dd hh24:mi:ss') Start_Time,
TYPE,ITEM,UNITS,Total,
to_char(timestamp,'yyyy-mm-dd hh24:mi:ss') timestamp
from v$recovery_progress;
START_TIME TYPE ITEM UNITS TOTAL TIMESTAMP
------------------- -------------------- -------------------------------- -------------------- ---------- -------------------
2012-08-16 20:11:06 Media Recovery Log Files Files 200
2012-08-16 20:11:06 Media Recovery Active Apply Rate KB/sec 5655
2012-08-16 20:11:06 Media Recovery Average Apply Rate KB/sec 64
2012-08-16 20:11:06 Media Recovery Redo Applied Megabytes 2441
2012-08-16 20:11:06 Media Recovery Last Applied Redo SCN+Time 1117581256 2012-08-17 06:27:01
2012-08-16 20:11:06 Media Recovery Active Time Seconds 370
2012-08-16 20:11:06 Media Recovery Apply Time per Log Seconds 1
2012-08-16 20:11:06 Media Recovery Elapsed Time Seconds 37931
2012-08-15 20:11:12 Media Recovery Active Time Seconds 411
2012-08-15 20:11:12 Media Recovery Apply Time per Log Seconds 1
2012-08-15 20:11:12 Media Recovery Elapsed Time Seconds 31742
2012-08-07 20:11:06 Media Recovery Log Files Files 208
2012-08-07 20:11:06 Media Recovery Last Applied Redo SCN+Time 672050995 2012-08-08 06:27:38
2012-08-07 20:11:06 Media Recovery Active Time Seconds 349
2012-08-07 20:11:06 Media Recovery Apply Time per Log Seconds 1
2012-08-07 20:11:06 Media Recovery Elapsed Time Seconds 37059
2012-07-13 20:11:09 Media Recovery Last Applied Redo SCN+Time 1.8447E+19 2012-07-14 06:20:35
2012-07-13 20:11:09 Media Recovery Active Time Seconds 1531
2012-07-13 20:11:09 Media Recovery Elapsed Time Seconds 36684
v$dataguard_stats查看redo传输和redo应用的延迟情况.
select name,value,unit,time_computed from v$dataguard_stats
NAME VALUE UNIT TIME_COMPUTED
-------------------------------- ------------------------------ ------------------------------ ------------------------------
apply finish time +00 00:05:58.9 day(2) to second(1) interval 17-AUG-2012 20:10:12
apply lag +00 22:30:30 day(2) to second(0) interval 17-AUG-2012 20:10:12
estimated startup time 491 second 17-AUG-2012 20:10:12
standby has been open Y 17-AUG-2012 20:10:12
transport lag +00 09:07:39 day(2) to second(0) interval 17-AUG-2012 20:10:12
v$dataguard_status查看主库,备库的进程日志
col lines 200 pages 100
col message for a120
select facility,to_char(timestamp,'dd-mon-yyyy hh24:mi') time,message from v$dataguard_status;
v$managed_standby查看归档,MRP,RFS的进程状态
select process,pid,status,thread#,group#,sequence#,delay_mins from v$managed_standby;
PROCESS PID STATUS THREAD# GROUP# SEQUENCE# DELAY_MINS
--------- ---------- ------------ ---------- ---------------------------------------- ---------- ----------
ARCH 27188 CONNECTED 0 N/A 0 0
ARCH 27190 CONNECTED 0 N/A 0 0
MRP0 6790 WAIT_FOR_LOG 3 N/A 238828 0
RFS 27227 IDLE 0 N/A 0 0
RFS 27229 IDLE 0 N/A 0 0
RFS 27231 IDLE 0 N/A 0 0
完结