Oracle data guard monitor

生产库的DG需要和主库一样,也需要监控.主要从归档日志传输状态,备库恢复性能状态来进行监控.
归档日志传输状态

监控内容:主要是监控视图v$archive_gap,v$history.
v$archive_gap,当某些日志没有成功传输到备库时,归档就会出现裂隙,默认状态下Data Guard可以自动检测和解决这个问题.

v$log_history,控制文件中记录的归档日志的时间和先后顺序.

例子-1
check_standby_status.sh发现备库出现归档日志gap.

select thread#,sequence#,first_change#,to_char(first_time,'yyyy-mm-dd hh24:mi:ss') FirstTime,next_change# 
from v$log_history 
where first_time=(select max(first_time) from v$log_history);
select THREAD#,LOW_SEQUENCE#,HIGH_SEQUENCE#  from v\$archive_gap;
SQL>select THREAD#,LOW_SEQUENCE#,HIGH_SEQUENCE#  from v$archive_gap;

   THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
         1        200535         201001
备库alert日志
...
Thu Aug 16 01:44:16 2012
Media Recovery Log +BACKUP/pdhcpcs/archivelog/2012_08_11/thread_1_seq_200533.9131.791017793
Media Recovery Log +BACKUP/pdhcpcs/archivelog/2012_08_11/thread_1_seq_200534.9132.791019447
Thu Aug 16 01:44:27 2012
Media Recovery Waiting for thread 1 sequence 200535
Fetching gap sequence in thread 1, gap sequence 200535-200634
Thu Aug 16 01:44:38 2012
FAL[client]: Failed to request gap sequence
 GAP - thread 1 sequence 200535-200634
 DBID 4161893172 branch 727959542
FAL[client]: All defined FAL servers have been attempted.
-------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that is sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
-------------------------------------------------------------
Thu Aug 16 02:01:52 2012
RFS[2]: Opened log for thread 1 sequence 201181 dbid -133074124 branch 727959542
Archived Log entry 143848 added for thread 1 sequence 201181 rlc 727959542 ID 0xf8110c31 dest 2:
...
...
Thu Aug 16 23:45:41 2012
RFS[2]: Opened log for thread 1 sequence 201309 dbid -133074124 branch 727959542
Archived Log entry 143976 added for thread 1 sequence 201309 rlc 727959542 ID 0xf8110c31 dest 2:
Fri Aug 17 00:03:55 2012
Fetching gap sequence in thread 1, gap sequence 200535-200634
Fri Aug 17 00:04:06 2012
FAL[client]: Failed to request gap sequence
 GAP - thread 1 sequence 200535-200634
 DBID 4161893172 branch 727959542
FAL[client]: All defined FAL servers have been attempted.
-------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that is sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
-------------------------------------------------------------
Fri Aug 17 00:08:59 2012
RFS[2]: Opened log for thread 1 sequence 201310 dbid -133074124 branch 727959542
Archived Log entry 143977 added for thread 1 sequence 201310 rlc 727959542 ID 0xf8110c31 dest 2:
...
查看备库v$archived_log ,200535到201001系列日志根本都没有.
select thread#,sequence#,substr(name,39,34) a_name,
       to_char(FIRST_TIME,'yyyy-mm-dd hh24:mi:ss') a_first_time,
       first_CHANGE#,
       to_char(NEXT_TIME,'yyyy-mm-dd hh24:mi:ss') a_next_time,
       applied,status
from v$archived_log
where sequence# between 200530 and 201005
order by FIRST_TIME
   THREAD#  SEQUENCE# A_NAME                                   A_FIRST_TIME           FIRST_CHANGE# A_NEXT_TIME         APPLIED   S
---------- ---------- ---------------------------------------- ------------------- ---------------- ------------------- --------- -
         1     200530                                          2012-08-11 06:10:56   44,590,329,558 2012-08-11 06:38:31 YES       D
         1     200531                                          2012-08-11 06:38:31   44,590,728,722 2012-08-11 06:39:55 YES       D
         1     200532                                          2012-08-11 06:39:55   44,591,233,458 2012-08-11 07:08:23 YES       D
         1     200533                                          2012-08-11 07:08:23   44,591,747,460 2012-08-11 07:09:41 YES       D
         1     200534                                          2012-08-11 07:09:41   44,592,235,819 2012-08-11 07:37:12 YES       D
         1     201002 thread_1_seq_201002.6470.791318245       2012-08-14 18:13:30   44,800,747,857 2012-08-14 18:37:02 NO        A
         1     201003 thread_1_seq_201003.4428.791318493       2012-08-14 18:37:02   44,801,095,874 2012-08-14 18:40:52 NO        A
         1     201004 thread_1_seq_201004.4498.791318705       2012-08-14 18:40:52   44,801,617,583 2012-08-14 18:44:45 NO        A
         1     201005 thread_1_seq_201005.815.791320277        2012-08-14 18:44:45   44,802,107,303 2012-08-14 19:10:24 NO        A
查看主库v$archived_log,200525到200928已经被删除.
   THREAD#  SEQUENCE# A_NAME                                    A_FIRST_TIME        FIRST_CHANGE# A_NEXT_TIME         APPLIED   S
---------- ---------- ---------------------------------------- ------------------- ------------- ------------------- --------- -
         1     200920                                           2012-08-14 04:08:38    4.4764E+10 2012-08-14 04:10:05 NO        D
         1     200921                                           2012-08-14 04:10:05    4.4765E+10 2012-08-14 04:37:55 NO        D
         1     200922                                           2012-08-14 04:37:55    4.4765E+10 2012-08-14 04:39:29 NO        D
         1     200923                                           2012-08-14 04:39:29    4.4766E+10 2012-08-14 04:41:02 NO        D
         1     200924                                           2012-08-14 04:41:02    4.4766E+10 2012-08-14 05:08:57 NO        D
         1     200925                                           2012-08-14 05:08:57    4.4767E+10 2012-08-14 05:10:24 NO        D
         1     200926                                           2012-08-14 05:10:24    4.4767E+10 2012-08-14 05:37:49 NO        D
         1     200927                                           2012-08-14 05:37:49    4.4768E+10 2012-08-14 05:39:11 NO        D
         1     200928                                           2012-08-14 05:39:11    4.4768E+10 2012-08-14 05:40:57 NO        D
         1     200929 thread_1_seq_200929.2778.791273315        2012-08-14 05:40:57    4.4769E+10 2012-08-14 06:08:34 NO        A
         1     200930 thread_1_seq_200930.499.791273369         2012-08-14 06:08:34    4.4769E+10 2012-08-14 06:09:28 NO        A

   THREAD#  SEQUENCE# A_NAME                                   A_FIRST_TIME        FIRST_CHANGE# A_NEXT_TIME         APPLIED   S
---------- ---------- ---------------------------------------- ------------------- ------------- ------------------- --------- -
         1     200931 thread_1_seq_200931.3047.791274471        2012-08-14 06:09:28    4.4769E+10 2012-08-14 06:27:51 NO        A
         1     200932 thread_1_seq_200932.1060.791275123        2012-08-14 06:27:51    4.4770E+10 2012-08-14 06:38:42 NO        A
         1     200933 thread_1_seq_200933.2791.791275207        2012-08-14 06:38:42    4.4770E+10 2012-08-14 06:40:07 NO        A
         1     200934 thread_1_seq_200934.1812.791276903        2012-08-14 06:40:07    4.4771E+10 2012-08-14 07:08:23 NO        A
         1     200935 thread_1_seq_200935.1053.791276997        2012-08-14 07:08:23    4.4771E+10 2012-08-14 07:09:56 NO        A
         1     200936 thread_1_seq_200936.1054.791278045        2012-08-14 07:09:56    4.4772E+10 2012-08-14 07:27:24 NO        A
         1     200937 thread_1_seq_200937.1055.791278723        2012-08-14 07:27:24    4.4772E+10 2012-08-14 07:38:42 NO        A
         1     200938 thread_1_seq_200938.1192.791278793        2012-08-14 07:38:42    4.4773E+10 2012-08-14 07:39:51 NO        A
         1     200939 thread_1_seq_200939.1207.791280441        2012-08-14 07:39:51    4.4773E+10 2012-08-14 08:07:20 NO        A
         1     200940 thread_1_seq_200940.1212.791280531        2012-08-14 08:07:20    4.4774E+10 2012-08-14 08:08:51 NO        A
         1     200941 thread_1_seq_200941.482.791280607         2012-08-14 08:08:51    4.4774E+10 2012-08-14 08:10:06 NO        A
现状分析

备库上200535到201001序列的日志没有,而其他的日志正常接收,而主库上200535到200928已经删除了,所以这段日志根本没有办法传输到备库上面.

解决方法

(1),查看主库上是否有这些归档日志或者其备份文件,如果有的话,将这些归档日志拷贝到备库,并注册到备库的控制文件.
Then copy these log files to physical standby database and register them using the ALTER DATABASE REGISTER LOGFILE statement on standby database. After registering these log files on the standby database, we can restart Redo Apply(SQL> RECOVER STANDBY DATABASE;).
(2),如果主库的归档日志连备份也没有了,则需要重新配置,从主库的全备份恢复备库.

拷贝主库的rman备份,通过standby复制的方式进行同步,即相当重新配置了data guard.

原因分析

检查主备库的数据和归档备份策略,发现主库的归档日志主库还未同步到备库,就已经被备份删除策略删除掉了,所以需要调整备份和删除策略.
原来的备份脚本如下:这是按照冗余策略进行了删除,这种删除不会检查是否已经被备库使用.

rman target / nocatalog  <<EOF
spool log to /export/home/oracle/scripts/log/PDHCPC_rman.log append;
run {
BACKUP DATABASE TAG Full_bk;
BACKUP CURRENT ControlFile FORMAT '/export/home/oracle/rman_backup/controlfile/ctl_%U';
sql 'alter system archive log current';
RESTORE DATABASE VALIDATE;
CROSSCHECK BACKUP OF DATABASE;
REPORT OBSOLETE;
DELETE NOPROMPT OBSOLETE;
}
LIST BACKUP SUMMARY;
quit;
EOF

备库恢复性能状态
监控standby的恢复进度,最重要的方法是查看v$archived_log两边的日志差异.
v$recovery_progress查看恢复的细节,注意这个视图在mount恢复状态下是无法查询的,只有在read only状态下可以.

select to_char(start_time,'yyyy-mm-dd hh24:mi:ss') Start_Time,
       TYPE,ITEM,UNITS,Total,
       to_char(timestamp,'yyyy-mm-dd hh24:mi:ss') timestamp 
from   v$recovery_progress;
START_TIME          TYPE                 ITEM                             UNITS                     TOTAL TIMESTAMP
------------------- -------------------- -------------------------------- -------------------- ---------- -------------------
2012-08-16 20:11:06 Media Recovery       Log Files                        Files                       200
2012-08-16 20:11:06 Media Recovery       Active Apply Rate                KB/sec                     5655
2012-08-16 20:11:06 Media Recovery       Average Apply Rate               KB/sec                       64
2012-08-16 20:11:06 Media Recovery       Redo Applied                     Megabytes                  2441
2012-08-16 20:11:06 Media Recovery       Last Applied Redo                SCN+Time             1117581256 2012-08-17 06:27:01
2012-08-16 20:11:06 Media Recovery       Active Time                      Seconds                     370
2012-08-16 20:11:06 Media Recovery       Apply Time per Log               Seconds                       1
2012-08-16 20:11:06 Media Recovery       Elapsed Time                     Seconds                   37931
2012-08-15 20:11:12 Media Recovery       Active Time                      Seconds                     411
2012-08-15 20:11:12 Media Recovery       Apply Time per Log               Seconds                       1
2012-08-15 20:11:12 Media Recovery       Elapsed Time                     Seconds                   31742
2012-08-07 20:11:06 Media Recovery       Log Files                        Files                       208
2012-08-07 20:11:06 Media Recovery       Last Applied Redo                SCN+Time              672050995 2012-08-08 06:27:38
2012-08-07 20:11:06 Media Recovery       Active Time                      Seconds                     349
2012-08-07 20:11:06 Media Recovery       Apply Time per Log               Seconds                       1
2012-08-07 20:11:06 Media Recovery       Elapsed Time                     Seconds                   37059
2012-07-13 20:11:09 Media Recovery       Last Applied Redo                SCN+Time             1.8447E+19 2012-07-14 06:20:35
2012-07-13 20:11:09 Media Recovery       Active Time                      Seconds                    1531
2012-07-13 20:11:09 Media Recovery       Elapsed Time                     Seconds                   36684
v$dataguard_stats查看redo传输和redo应用的延迟情况.
select name,value,unit,time_computed from v$dataguard_stats
NAME                             VALUE                          UNIT                           TIME_COMPUTED
-------------------------------- ------------------------------ ------------------------------ ------------------------------
apply finish time                +00 00:05:58.9                 day(2) to second(1) interval   17-AUG-2012 20:10:12
apply lag                        +00 22:30:30                   day(2) to second(0) interval   17-AUG-2012 20:10:12
estimated startup time           491                            second                         17-AUG-2012 20:10:12
standby has been open            Y                                                             17-AUG-2012 20:10:12
transport lag                    +00 09:07:39                   day(2) to second(0) interval   17-AUG-2012 20:10:12

v$dataguard_status查看主库,备库的进程日志

col lines 200 pages 100
col message for a120
select facility,to_char(timestamp,'dd-mon-yyyy hh24:mi') time,message from v$dataguard_status;
v$managed_standby查看归档,MRP,RFS的进程状态

select process,pid,status,thread#,group#,sequence#,delay_mins from v$managed_standby;
PROCESS          PID STATUS          THREAD# GROUP#                                    SEQUENCE# DELAY_MINS
--------- ---------- ------------ ---------- ---------------------------------------- ---------- ----------
ARCH           27188 CONNECTED             0 N/A                                               0          0
ARCH           27190 CONNECTED             0 N/A                                               0          0
MRP0            6790 WAIT_FOR_LOG          3 N/A                                          238828          0
RFS            27227 IDLE                  0 N/A                                               0          0
RFS            27229 IDLE                  0 N/A                                               0          0
RFS            27231 IDLE                  0 N/A                                               0          0

完结

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值