Oracle Data Guard Gap日志间隙SCN增量备份恢复案例分享

        本期将为大家分享”Oracle Data Guard Gap日志间隙SCN增量备份恢复”案例。

        欢迎关注“数据库运维之道”公众号,一起学习数据库技术!

        关键字:Data Guard 日志间隙、Redo Gap、Archive Gap、Data Guard Gap、ORA-00308、ORA-27037、Failed to request gap sequence

        由于Oracle Primary数据库产生的归档日志传输到Standby数据库之前被删除,从而导致Standby数据库出现GAP。为了解决这个问题,本文采用增量备份前滚的方式进行修复。

数据库日志信息如下:
FAL[client]: Failed to request gap sequence
FAL[client]:All defined FAL servers have been attempted.

或者
Errors in file /u01/app/oracle/diag/rdbms/ywzdh1/ywzd2/trace/ywzd2_arc2_2340.trc:
ORA-00308: cannot open archived log '/u01/app/archivelog/1_69_1107964745.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

        登录备库查看Archive Gap情况,归档日志从编号5955开始就无法往下应用到备库。

SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE# FROM V$ARCHIVE_GAP;
   THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
     1        5955       5971     
     2        6043       6058

        检查主备日志差异数量及日志延迟情况。

备库检查主从同步日志差异
SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last in Sequence", APPL.SEQUENCE# "Last Applied Sequence", (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"
FROM
(SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH,
(SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL
WHERE
ARCH.THREAD# = APPL.THREAD#
ORDER BY 1;

    Thread Last in Sequence Last Applied Sequence Difference
---------- ---------------- --------------------- ----------
     1           5971             5955      16
     2           6058             6043      15


登录备库,待应用的归档日志是否已传输到备库(备库)
set linesize 10000
col name for a80
select name,BACKUP_COUNT,DELETED from gv$archived_log 
where thread#=1 and SEQUENCE#>=5955 and SEQUENCE#<=5969 
and name <>'standby_ywzddg' order by SEQUENCE#;

        备库查询最小SCN,以最小SCN在主库进行增量备份。

SELECT to_char(CURRENT_SCN) CURRENT_SCN FROM V$DATABASE;

CURRENT_SCN
----------------------------------------
38356313018 --最小scn值

select min(fhscn) from x$kcvfh;

MIN(FHSCN)
----------------
38356313019

select min(kc.fhscn) from x$kcvfh kc, v$datafile dd where kc.hxfil =dd.file# and dd.enabled != 'READ ONLY';

MIN(KC.FHSCN)
----------------


select min(kc.fhscn) from x$kcvfh kc, v$datafile dd where kc.hxfil =dd.file#;

MIN(KC.FHSCN)
----------------
38356313019

select distinct to_char(checkpoint_change#) from v$datafile;

TO_CHAR(CHECKPOINT_CHANGE#)
----------------------------------------
38356313019

select distinct to_char(checkpoint_change#) from v$datafile_header;

TO_CHAR(CHECKPOINT_CHANGE#)
----------------------------------------
38356313019

        备库停止日志应用,并关闭数据库。

alter database recover managed standby database cancel;
srvctl stop database -d ywzddg

        确认主备GAP期间是否新增数据文件。

select file# from v$datafile where creation_change# > =38356313018;

        基于异常的scn时间点,对主库进行一次手动增量备份,然后将增量备份恢复至备库。

主库增量备份并传输到备库上,主库进行增量备份
run 
{
allocate channel ch00 type disk;
allocate channel ch01 type disk;
backup incremental from scn 38356313018 database format '/home/oracle/ywzd_%U';
release channel ch00;
release channel ch01;
}

scp ywzd* 192.168.220.2:/home/oracle/ywzddg

        主库上创建standby controlfile文件并传输到备库。

SQL> alter database create standby controlfile as '/home/oracle/control_ywzd.ctl';
scp control_ywzd.ctl 192.168.220.2:/home/oracle/ywzddg

        登录asm磁盘组,通过cp命令将spfile和controlfile拷贝到/tmp目录下进行备份。执行select file_id,file_name from dba_data_files语句,备份数据文件名称。

        备库恢复控制文件

RMAN> STARTUP NOMOUNT;
RMAN> RESTORE STANDBY CONTROLFILE FROM '/home/oracle/ywzddg/control_ywzd.ctl';
输出日志如下:
Starting restore at 19-SEP-2023 12:53:18
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=71 instance=ywzddg2 device type=DISK
channel ORA_DISK_1: copied control file copy
output file name=+ywzd/ywzddg/CONTROLFILE/current.256.1066861809
Finished restore at 19-SEP-2023 12:53:19

        恢复控制文件之后,将数据库启动到mount状态。

RMAN> alter database mount standby database;

RMAN> report schema;
输出日志如下:
List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    0        SYSTEM               ***     +ywzd/ywzddg/datafile/system.261.1045757579
2    0        SYSAUX               ***     +ywzd/ywzddg/datafile/sysaux.262.1045757581
3    0        UNDOTBS1             ***     +ywzd/ywzddg/datafile/undotbs1.263.1045757583
4    0        UNDOTBS2             ***     +ywzd/ywzddg/datafile/undotbs2.265.1045757589
5    0        USERS                ***     +ywzd/ywzddg/datafile/users.266.1045757591

        备库上进行恢复,先将主库的增量备份文件注册到控制文件。

RMAN> CATALOG START WITH '/home/oracle/ywzddg';
输出日志:
searching for all files that match the pattern /home/oracle/ywzddg
List of Files Unknown to the Database
=====================================
File Name: /home/oracle/ywzddg/ywzd_nn26or7j_1_1
File Name: /home/oracle/ywzddg/ywzd_np26or7n_1_1
File Name: /home/oracle/ywzddg/ywzd_no26or7j_1_1
File Name: /home/oracle/ywzddg/control_ywzd.ctl
Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done
List of Cataloged Files
=======================
File Name: /home/oracle/ywzddg/ywzd_nn26or7j_1_1
File Name: /home/oracle/ywzddg/ywzd_np26or7n_1_1
File Name: /home/oracle/ywzddg/ywzd_no26or7j_1_1
File Name: /home/oracle/ywzddg/control_ywzd.ctl

        由于主备数据文件名称不一样,调整参数以便后面调整数据文件名。

alter system set standby_file_management=MANUAL;

        查看数据文件与控制文件头SCN信息,发现数据文件头的信息无法识别出来。

SQL> select file#,to_char(checkpoint_change#),to_char(CHECKPOINT_TIME,'yyyy-mm-dd hh24:mi:ss') from v$datafile;
     FILE# TO_CHAR(CHECKPOINT_CHANGE#)              TO_CHAR(CHECKPOINT_
---------- ---------------------------------------- -------------------
         1 38359206434                              2023-09-19 12:47:48
         2 38359206434                              2023-09-19 12:47:48
         3 38359206429                              2023-09-19 12:47:47
         4 38359206429                              2023-09-19 12:47:47
         5 38359206434                              2023-09-19 12:47:48
         6 38359206434                              2023-09-19 12:47:48
         7 38359206429                              2023-09-19 12:47:47
         8 38359206434                              2023-09-19 12:47:48
         9 38359206429                              2023-09-19 12:47:47
        10 38359206434                              2023-09-19 12:47:48
        11 38359206429                              2023-09-19 12:47:47
        12 38359206429                              2023-09-19 12:47:47
        13 38359206434                              2023-09-19 12:47:48
        
SQL> select file#,to_char(checkpoint_change#),to_char(CHECKPOINT_TIME,'yyyy-mm-dd hh24:mi:ss') from v$datafile_header;
     FILE# TO_CHAR(CHECKPOINT_CHANGE#)              TO_CHAR(CHECKPOINT_
---------- ---------------------------------------- -------------------
         1 0                                                           
         2 0                                                           
         3 0                                                           
         4 0                                                           
         5 0                                                           
         6 0                                                           
         7 0                                                           
         8 0                                                           
         9 0                                                           
        10 0                                                           
        11 38356313019                              2023-09-16 10:02:24
        12 0                                                           
        13 0 
可以看到数据文件头的信息与控制文件不匹配。通过rename将控制文件里的数据文件名调整为实际的文件名。
ALTER DATABASE RENAME FILE '+ywzd/ywzddg/datafile/system.261.1045757579' to '+ywzd/ywzddg/DATAFILE/system.264.1066862341';
ALTER DATABASE RENAME FILE '+ywzd/ywzddg/datafile/sysaux.262.1045757581' to '+ywzd/ywzddg/DATAFILE/sysaux.262.1066862315';
ALTER DATABASE RENAME FILE '+ywzd/ywzddg/datafile/undotbs1.263.1045757583' to '+ywzd/ywzddg/DATAFILE/undotbs1.265.1066862341';
ALTER DATABASE RENAME FILE '+ywzd/ywzddg/datafile/users.266.1045757591' to '+ywzd/ywzddg/DATAFILE/users.269.1066862371';


        进行增量恢复,备库重新接收并应用日志。

run 
{
allocate channel ch00 type disk;
RECOVER DATABASE;
release channel ch00;
}

SQL> alter database recover managed standby database using current logfile disconnect from session;

        同步情况检查,调整相关参数并开启数据库同步。

1、同步情况检查
select t.NAME,t.SWITCHOVER_STATUS,t.OPEN_MODE,t.DATABASE_ROLE from v$database t;
select thread#, max(sequence#) arc, max(decode(registrar||','||applied,'RFS,YES',sequence#,0)) app 
from v$archived_log where  resetlogs_id = (SELECT resetlogs_id FROM v$database_incarnation 
WHERE status = 'CURRENT' AND rownum = 1) group by thread#;
select * from v$archive_gap;

2、调整相关参数
调整相关参数:alter system set standby_file_management=auto;
alter system set standby_file_management=auto;


3、关库并开库
shutdown immediate
srvctl start database -d ywzddg

4、备库重新接收并应用日志
SQL> alter database recover managed standby database using current logfile disconnect from session;

5、同步情况检查
select t.NAME,t.SWITCHOVER_STATUS,t.OPEN_MODE,t.DATABASE_ROLE from v$database t;
NAME      SWITCHOVER_STATUS    OPEN_MODE        DATABASE_ROLE
--------- -------------------- -------------------- ----------------
ywzd   NOT ALLOWED           READ ONLY WITH APPLY PHYSICAL STANDBY

select thread#, max(sequence#) arc, max(decode(registrar||','||applied,'RFS,YES',sequence#,0)) app 
from v$archived_log where  resetlogs_id = (SELECT resetlogs_id 
FROM v$database_incarnation WHERE status = 'CURRENT' AND rownum = 1) 
group by thread#;

SYS@ywzddg2> 
   THREAD#      ARC         APP
---------- ---------- ----------
     1     5977        5976
     2     6064        6064
    

select * from v$archive_gap;
no rows selected   -->GAP已消失

        什么是归档日志间隙(Archive Gap)?归档日志间隙是在 Standby 端一系列丢失的重做日志,导致日志应用服务无法继续运行。这通常发生在 Standby 端无法从 Primary Database 接收重做日志或重做日志在 Standby Database 上不可用时。

        常见原因有:(1)网络连接断开或者日志传输服务停止;(2)Standby Database 不可用;(3)日志传输服务的配置错误;(4)Standby 端的 IO 问题;(5)归档日志在应用到 Standby 前被手工删除;(6)Primary 和 Standby 之间的网络带宽不足。一旦在 Standby Database 上存在归档间隙,Log Apply Services 就会卡住,直到日志间隙(Gap)被解决,例如。丢失的 Redo 被重新获取并且在 Standby 端可用。然后,日志应用服务可以选中它并继续处理。

        场景1:由于Oracle Primary数据库产生的归档日志已被备份,但是归档日志未传输到Standby数据库之前被删除,从而导致Standby数据库出现GAP。这种情况可以通过恢复归档日志解决。

1、若主库的归档日志已被备份且已被删除(主库),需要从nbu备份系统将归档恢复出来。
RUN {
allocate channel ch01 type 'sbt_tape';
send 'nb_ora_serv=NBU_MASTER服务器主机名'; 
send 'nb_ora_client=NBU客户端主机名';
set archivelog destination to '/home/oracle/arch';

restore archivelog from sequence 1152 thread 2;   -->从某个序列开始恢复归档
restore archivelog from logseq 15143 until logseq 15146 thread 2;  --> 恢复某个序列段的归档
restore archivelog time between "to_date('20220103 00','yyyymmdd hh24')" and "to_date('20220106 23','yyyymmdd hh24')";  -->恢复某个时间段的归档
restore archivelog all;  -->恢复所有的归档


RELEASE CHANNEL ch01;
}

2、待应用的归档日志是否已传输到备库(备库)
set linesize 10000
col name for a80
select name,BACKUP_COUNT,DELETED from gv$archived_log 
where thread#=1 and SEQUENCE#>=5955 
and SEQUENCE#<=5969 and name  <>'standby_orcldg' order by SEQUENCE#;

3、如果还有差异,重启下同步进程。
alter database recover managed standby database cancel;
alter database recover managed standby database using current logfile  disconnect from session;

4、检查主从同步日志差异(备库)
select thread#, max(sequence#) arc, max(decode(registrar||','||applied,'RFS,YES',sequence#,0))  app 
from v$archived_log where   resetlogs_id = (SELECT resetlogs_id FROM v$database_incarnation 
WHERE  status = 'CURRENT' AND rownum = 1) group by thread#;

        场景2:由于特殊情况,主库归档日志没有正常传输到备库,并且还存在。可以通过手动拷贝方式传输到备库,并注册到备库。

1、使用RMAN从ASM设备中拷贝出来所需归档文件
run {
copy archivelog '+YWZD/YWZD/1_29_856078807.arc' to '/home/oracle/1_29_856078807.arc';
copy archivelog '+YWZD/YWZD/1_30_856078807.arc' to '/home/oracle/1_30_856078807.arc';
copy archivelog '+YWZD/YWZD/2_14_856078807.arc' to '/home/oracle/2_14_856078807.arc';
copy archivelog '+YWZD/YWZD/2_15_856078807.arc' to '/home/oracle/2_15_856078807.arc';
copy archivelog '+YWZD/YWZD/2_16_856078807.arc' to '/home/oracle/2_16_856078807.arc';
}

2、在备库查看归档路径
SQL> archive log list;

3、在主库将未同步的归档日志拷贝到备库
scp *arc 备库:/u01/app/oracle/arch

4、在备库取消日志应用
SQL> alter database recover managed standby database cancel;

5、注册日志文件
SQL> alter database register logfile '/u01/app/oracle/arch/1_29_856078807.arc';
……
SQL> alter database register logfile '/u01/app/oracle/arch/2_16_856078807.arc';

6:开始日志应用
SQL> alter database recover managed standby database disconnect from session;

        场景3:遇到极端情况,需要对备库进行重建。

1、通过RMAN脚本进行数据文件备份
run
{
allocate channel c1 type disk;
backup incremental level 0 database format '/home/oracle/db_level0_%d_%T_%U.bak' 
plus archivelog format '/home/oracle/arch_%d_%T_%U.bak';
release channel c1;
}

2、控制文件备份
run
{
allocate channel c1 type disk;
backup current controlfile for standby format='/home/oracle/control_ywzdg.ctl';
release channel c1;
}

3、拷贝主库备份至备库
scp -r /home/oracle/control_ywzdg.ctl oracle@192.168.xx.xx:/home/oracle/rman

4、使用rman恢复备库控制文件:
export ORACLE_SID=ywzdg2
rman target /
startup nomount
restore standby controlfile from '/home/oracle/rman/control_ywzdg.ctl';
alter database mount standby database;
REPORT SCHEMA ; 

5、使用rman命令恢复备库数据文件
run 
{
allocate channel ch00 type disk;


restore database;
release channel ch00;

}
6、使用rman命令还原备库数据文件
run 
{
allocate channel ch00 type disk;
recover database;
release channel ch00;
}


run 
{
allocate channel ch00 type disk;

set until logseq 2189 thread 2;
recover database;
release channel ch00;
}

7、开启日志同步应用:
alter database recover managed standby database using current logfile disconnect from session;


8、验证同步情况
select t.NAME,t.SWITCHOVER_STATUS,t.OPEN_MODE,t.DATABASE_ROLE from v$database t;
select thread#, max(sequence#) arc, max(decode(registrar||','||applied,'RFS,YES',sequence#,0)) app 
from v$archived_log where  resetlogs_id = (SELECT resetlogs_id 
FROM v$database_incarnation WHERE status = 'CURRENT' AND rownum = 1) group by thread#;

  • Data Guard Gap Detection and Resolution Possibilities (Doc ID 1537316.1)
  • Data Guard 日志间隙检测及解决方案 (Doc ID 2403964.1)

        以上就是本期关于“Oracle Data Guard Gap日志间隙SCN增量备份恢复”案例分享。希望能给大家带来帮助。

        欢迎关注“数据库运维之道”公众号,一起学习数据库技术!

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值