背景:
有时由于dr机房故障或机房整改等,需要将dataguard同步dr临时关掉,此时为了pr能够正常备份并删除归档,需要将pr的log_archive_dest_2置空,即:
alter system set log_archive_dest_2='' scope=both;
恢复dr可以采用duplicate重建dataguard,但是如果pr和dr相差的归档数量少于10000个,可以采用增量恢复dr,见情形1
查看pr和dr归档相差:
select a.thread#,(log_archived - log_applied) log_gap from (select max(sequence#) log_archived,thread# from gv$archived_log where dest_id = 1 and archived = 'YES' group by thread#) a ,(select max(sequence#) log_applied,thread# from gv$archived_log where dest_id = 2 and APPLIED = 'YES' group by thread#) b where a.thread#=b.thread#;
duplicate或者增量恢复完成后,需要将log_archive_dest_2恢复成原来的配置,如:
alter system set LOG_ARCHIVE_DEST_2='SERVICE=dr_cc LGWR ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) db_unique_name=dr_cc' scope=both;
情形1:dr端归档未应用丢失,pr端归档已被删,dr恢复
dr端还未应用就被归档删除脚本清除,pr端的归档也被删除,通常我们是利用全备恢复dr,经测试可以利用基于主库scn的备份方式恢复dr
1. dr端查询状态为UNRESOLVABLE GAP:
Select switchover_status from v$database;
UNRESOLVABLE GAP
2. dr取消归档应用MRP进程
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
3. 确定备份的scn号
(1)查询数据文件头的最小scn号scn1
select to_char(min(checkpoint_change#)) from v$datafile_header;
(2)查询dr的current_scn,scn2
select to_char(current_scn) from v$database;
取scn1和scn2的较小值为备份的起始scn
4. 确定pr没有新文件创建
SELECT FILE#, NAME FROM V$DATAFILE WHERE CREATION_CHANGE# > min(scn1,scn2);
5. 基于scn备份pr
run{
allocate channel d1 type disk;
allocate channel d2 type disk;
allocate channel d3 type disk;
allocate channel d4 type disk;
backup incremental from SCN 34855116 database format '/oradata/archivelog/ForStandby_%U' tag 'FORSTANDBY';
backup current controlfile for standby format '/oradata/archivelog/ForStandbyCTRL.bck';
release channel d1;
release channel d2;
release channel d3;
release channel d4;
}
6. 将备份集从新pr传到新dr,并恢复
restore standby controlfile from '/oradata/archivelog/ForStandbyCTRL.bck';
alter database mount;
CATALOG START WITH '/oradata/archivelog';
RECOVER DATABASE NOREDO; --noredo的意思是rman recover备份集而不恢复本地的归档
7.clear standby redo
set pages 9999
select 'ALTER DATABASE CLEAR LOGFILE GROUP '||GROUP#||';' from v$logfile where TYPE='STANDBY' group by GROUP# order by GROUP# ;
8. 新dr打开并测试dataguard能否同步
alter database open readonly;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
情形2:pr端异常掉电,无法open,dr需要failover成主库,原pr恢复
有时pr异常掉电,只能mount无法open,且一时半会无法恢复,需要手动将dr failover成新pr,供业务使用。我们通常的做法是将原pr根据新pr的全备重建,这样比较耗时。
1. 原dr failover
(1)取消归档应用MRP进程
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
(2)dr failover
alter database recover managed standby database finish force;
alter database commit to switchover to primary;
或者直接用命令
ALTER DATABASE ACTIVATE STANDBY DATABASE;
记录dr failover成主库的scn1:
select to_char(standby_became_primary_scn) from v$database;
(3)open新pr
shutdown immediate
startup
注意此时新pr的归档从1开始重新计算
2. 原pr查询需要恢复的起始scn
startup mount
由于导致数据库shutdown abort,没写检查点,导致归档没写入数据文件,导致数据文件的scn号较小,找出数据文件头最小的checkpoint#,记为scn2
select to_char(min(checkpoint_change#)) from v$datafile_header;
检查下原pr当前的归档日志情况,并查出V$ARCHIVED_LOG中相应的NEXT_CHANGE#(归档的最后scn)值,记为scn3
如上图,原pr当前的redo序列号是6(redo的序列号可以在v$log中得出,且自建库open开始递增),即应该看已归档的5的NEXT_CHANGE#
select applied,SEQUENCE#,FIRST_CHANGE# ,NEXT_CHANGE# from V$ARCHIVED_LOG order by SEQUENCE#;
上面scn1,scn2,scn3,取三者的最小值当作备份的其实scn,否则原pr recover成功后无法open成功(报ORA-01152或者ORA-01190错误),当前取34855116
3. 新pr备份
保证scn2之后没有新文件创建:
SELECT FILE#, NAME FROM V$DATAFILE WHERE CREATION_CHANGE# > 34855116;
run{
allocate channel d1 type disk;
allocate channel d2 type disk;
allocate channel d3 type disk;
allocate channel d4 type disk;
backup incremental from SCN 34855116 database format '/oradata/archivelog/ForStandby_%U' tag 'FORSTANDBY';
backup current controlfile for standby format '/oradata/archivelog/ForStandbyCTRL.bck';
release channel d1;
release channel d2;
release channel d3;
release channel d4;
}
4. 新dr恢复
将备份集从新pr传到新dr,并恢复
restore standby controlfile from '/oradata/archivelog/ForStandbyCTRL.bck';
alter database mount;
CATALOG START WITH '/oradata/archivelog';
RECOVER DATABASE NOREDO; --noredo的意思是rman recover备份集而不恢复本地的归档
5.clear standby redo
set pages 9999
select 'ALTER DATABASE CLEAR LOGFILE GROUP '||GROUP#||';' from v$logfile where TYPE='STANDBY' group by GROUP# order by GROUP# ;
6. 新dr打开并测试dataguard能否同步
alter database open readonly;
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
注意:
这两种情形能操作的前提:
- (新)pr端没有新的数据文件创建
- 针对pr和dr相差的归档不是很多的情况,因为基于scn的备份恢复会扫描整个库,判断数据块需不需要进行备份,有时如果pr和dr相差的归档很大,反而不如全备快(恢复比全备快,因为全备恢复需要先restore再recover,而此种方式则不需要restore)
某环境相差400多个redo,每个redo 512M,备份了1个半小时左右,恢复了10分钟左右(但是当时旧磁阵io慢、且机型为solaris,不可当作基准参考)
某相差4000多个redo,每个redo2G,备份了68分钟,备份集为300G左右,恢复了35分钟(可以作为基准参考)
- 保证pr和dr的文件路径和文件名完全一样,如果两端由于omf的原因或者配置了db_file_name_convert参数,两端的文件目录或文件名不同,可以按照以下步骤恢复:
比如pr的文件:
dr的数据文件位置:
明显两端有不同的文件名或者文件位置,参考mos文档【Doc ID 836986.1】【5)---12)步骤】
(1)传备份集到dr后,recover数据库
CATALOG START WITH '/expdb/incre_cc';
RECOVER DATABASE NOREDO;
(2)恢复备份的控制文件
SHUTDOWN IMMEDIATE ;
STARTUP NOMOUNT;
restore standby controlfile from '/expdb/incre_cc/ForStandbyCTRL.bck';
(3)catalog dr的数据文件
list incarnation;
CATALOG START WITH '+CCSYS81DG/ccdb/datafile/'
switch datafile <number> to copy;
(4)dr clear standby redo
set pages 9999
select 'ALTER DATABASE CLEAR LOGFILE GROUP '||GROUP#||';' from v$logfile where TYPE='STANDBY' group by GROUP# order by GROUP# ;
(5)dr起mrp进程
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;
- 保证新pr端的归档备份或者删除脚本任务被disable
参考文档:
11g Steps to perform for Rolling Forward a Physical Standby Database using RMAN Incremental Backup. (Doc ID 836986.1)