本期将为大家分享”Oracle Data Guard Gap日志间隙SCN增量备份恢复”案例。
欢迎关注“数据库运维之道”公众号,一起学习数据库技术!
关键字:Data Guard 日志间隙、Redo Gap、Archive Gap、Data Guard Gap、ORA-00308、ORA-27037、Failed to request gap sequence
由于Oracle Primary数据库产生的归档日志传输到Standby数据库之前被删除,从而导致Standby数据库出现GAP。为了解决这个问题,本文采用增量备份前滚的方式进行修复。
数据库日志信息如下:
FAL[client]: Failed to request gap sequence
FAL[client]:All defined FAL servers have been attempted.
或者
Errors in file /u01/app/oracle/diag/rdbms/ywzdh1/ywzd2/trace/ywzd2_arc2_2340.trc:
ORA-00308: cannot open archived log '/u01/app/archivelog/1_69_1107964745.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
登录备库查看Archive Gap情况,归档日志从编号5955开始就无法往下应用到备库。
SELECT THREAD#, LOW_SEQUENCE#, HIGH_SEQUENCE# FROM V$ARCHIVE_GAP;
THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
1 5955 5971
2 6043 6058
检查主备日志差异数量及日志延迟情况。
备库检查主从同步日志差异
SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last in Sequence", APPL.SEQUENCE# "Last Applied Sequence", (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"
FROM
(SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH,
(SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL
WHERE
ARCH.THREAD# = APPL.THREAD#
ORDER BY 1;
Thread Last in Sequence Last Applied Sequence Difference
---------- ---------------- --------------------- ----------
1 5971 5955 16
2 6058 6043 15
登录备库,待应用的归档日志是否已传输到备库(备库)
set linesize 10000
col name for a80
select name,BACKUP_COUNT,DELETED from gv$archived_log
where thread#=1 and SEQUENCE#>=5955 and SEQUENCE#<=5969
and name <>'standby_ywzddg' order by SEQUENCE#;
备库查询最小SCN,以最小SCN在主库进行增量备份。
SELECT to_char(CURRENT_SCN) CURRENT_SCN FROM V$DATABASE;
CURRENT_SCN
----------------------------------------
38356313018 --最小scn值
select min(fhscn) from x$kcvfh;
MIN(FHSCN)
----------------
38356313019
select min(kc.fhscn) from x$kcvfh kc, v$datafile dd where kc.hxfil =dd.file# and dd.enabled != 'READ ONLY';
MIN(KC.FHSCN)
----------------
select min(kc.fhscn) from x$kcvfh kc, v$datafile dd where kc.hxfil =dd.file#;
MIN(KC.FHSCN)
----------------
38356313019
select distinct to_char(checkpoint_change#) from v$datafile;
TO_CHAR(CHECKPOINT_CHANGE#)
----------------------------------------
38356313019
select distinct to_char(checkpoint_change#) from v$datafile_header;
TO_CHAR(CHECKPOINT_CHANGE#)
----------------------------------------
38356313019
备库停止日志应用,并关闭数据库。
alter database recover managed standby database cancel;
srvctl stop database -d ywzddg
确认主备GAP期间是否新增数据文件。
select file# from v$datafile where creation_change# > =38356313018;
基于异常的scn时间点,对主库进行一次手动增量备份,然后将增量备份恢复至备库。
主库增量备份并传输到备库上,主库进行增量备份
run
{
allocate channel ch00 type disk;
allocate channel ch01 type disk;
backup incremental from scn 38356313018 database format '/home/oracle/ywzd_%U';
release channel ch00;
release channel ch01;
}
scp ywzd* 192.168.220.2:/home/oracle/ywzddg
主库上创建standby controlfile文件并传输到备库。
SQL> alter database create standby controlfile as '/home/oracle/control_ywzd.ctl';
scp control_ywzd.ctl 192.168.220.2:/home/oracle/ywzddg
登录asm磁盘组,通过cp命令将spfile和controlfile拷贝到/tmp目录下进行备份。执行select file_id,file_name from dba_data_files语句,备份数据文件名称。
备库恢复控制文件
RMAN> STARTUP NOMOUNT;
RMAN> RESTORE STANDBY CONTROLFILE FROM '/home/oracle/ywzddg/control_ywzd.ctl';
输出日志如下:
Starting restore at 19-SEP-2023 12:53:18
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=71 instance=ywzddg2 device type=DISK
channel ORA_DISK_1: copied control file copy
output file name=+ywzd/ywzddg/CONTROLFILE/current.256.1066861809
Finished restore at 19-SEP-2023 12:53:19
恢复控制文件之后,将数据库启动到mount状态。
RMAN> alter database mount standby database;
RMAN> report schema;
输出日志如下:
List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 0 SYSTEM *** +ywzd/ywzddg/datafile/system.261.1045757579
2 0 SYSAUX *** +ywzd/ywzddg/datafile/sysaux.262.1045757581
3 0 UNDOTBS1 *** +ywzd/ywzddg/datafile/undotbs1.263.1045757583
4 0 UNDOTBS2 *** +ywzd/ywzddg/datafile/undotbs2.265.1045757589
5 0 USERS *** +ywzd/ywzddg/datafile/users.266.1045757591
备库上进行恢复,先将主库的增量备份文件注册到控制文件。
RMAN> CATALOG START WITH '/home/oracle/ywzddg';
输出日志:
searching for all files that match the pattern /home/oracle/ywzddg
List of Files Unknown to the Database
=====================================
File Name: /home/oracle/ywzddg/ywzd_nn26or7j_1_1
File Name: /home/oracle/ywzddg/ywzd_np26or7n_1_1
File Name: /home/oracle/ywzddg/ywzd_no26or7j_1_1
File Name: /home/oracle/ywzddg/control_ywzd.ctl
Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done
List of Cataloged Files
=======================
File Name: /home/oracle/ywzddg/ywzd_nn26or7j_1_1
File Name: /home/oracle/ywzddg/ywzd_np26or7n_1_1
File Name: /home/oracle/ywzddg/ywzd_no26or7j_1_1
File Name: /home/oracle/ywzddg/control_ywzd.ctl
由于主备数据文件名称不一样,调整参数以便后面调整数据文件名。
alter system set standby_file_management=MANUAL;
查看数据文件与控制文件头SCN信息,发现数据文件头的信息无法识别出来。
SQL> select file#,to_char(checkpoint_change#),to_char(CHECKPOINT_TIME,'yyyy-mm-dd hh24:mi:ss') from v$datafile;
FILE# TO_CHAR(CHECKPOINT_CHANGE#) TO_CHAR(CHECKPOINT_
---------- ---------------------------------------- -------------------
1 38359206434 2023-09-19 12:47:48
2 38359206434 2023-09-19 12:47:48
3 38359206429 2023-09-19 12:47:47
4 38359206429 2023-09-19 12:47:47
5 38359206434 2023-09-19 12:47:48
6 38359206434 2023-09-19 12:47:48
7 38359206429 2023-09-19 12:47:47
8 38359206434 2023-09-19 12:47:48
9 38359206429 2023-09-19 12:47:47
10 38359206434 2023-09-19 12:47:48
11 38359206429 2023-09-19 12:47:47
12 38359206429 2023-09-19 12:47:47
13 38359206434 2023-09-19 12:47:48
SQL> select file#,to_char(checkpoint_change#),to_char(CHECKPOINT_TIME,'yyyy-mm-dd hh24:mi:ss') from v$datafile_header;
FILE# TO_CHAR(CHECKPOINT_CHANGE#) TO_CHAR(CHECKPOINT_
---------- ---------------------------------------- -------------------
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 38356313019 2023-09-16 10:02:24
12 0
13 0
可以看到数据文件头的信息与控制文件不匹配。通过rename将控制文件里的数据文件名调整为实际的文件名。
ALTER DATABASE RENAME FILE '+ywzd/ywzddg/datafile/system.261.1045757579' to '+ywzd/ywzddg/DATAFILE/system.264.1066862341';
ALTER DATABASE RENAME FILE '+ywzd/ywzddg/datafile/sysaux.262.1045757581' to '+ywzd/ywzddg/DATAFILE/sysaux.262.1066862315';
ALTER DATABASE RENAME FILE '+ywzd/ywzddg/datafile/undotbs1.263.1045757583' to '+ywzd/ywzddg/DATAFILE/undotbs1.265.1066862341';
ALTER DATABASE RENAME FILE '+ywzd/ywzddg/datafile/users.266.1045757591' to '+ywzd/ywzddg/DATAFILE/users.269.1066862371';
进行增量恢复,备库重新接收并应用日志。
run
{
allocate channel ch00 type disk;
RECOVER DATABASE;
release channel ch00;
}
SQL> alter database recover managed standby database using current logfile disconnect from session;
同步情况检查,调整相关参数并开启数据库同步。
1、同步情况检查
select t.NAME,t.SWITCHOVER_STATUS,t.OPEN_MODE,t.DATABASE_ROLE from v$database t;
select thread#, max(sequence#) arc, max(decode(registrar||','||applied,'RFS,YES',sequence#,0)) app
from v$archived_log where resetlogs_id = (SELECT resetlogs_id FROM v$database_incarnation
WHERE status = 'CURRENT' AND rownum = 1) group by thread#;
select * from v$archive_gap;
2、调整相关参数
调整相关参数:alter system set standby_file_management=auto;
alter system set standby_file_management=auto;
3、关库并开库
shutdown immediate
srvctl start database -d ywzddg
4、备库重新接收并应用日志
SQL> alter database recover managed standby database using current logfile disconnect from session;
5、同步情况检查
select t.NAME,t.SWITCHOVER_STATUS,t.OPEN_MODE,t.DATABASE_ROLE from v$database t;
NAME SWITCHOVER_STATUS OPEN_MODE DATABASE_ROLE
--------- -------------------- -------------------- ----------------
ywzd NOT ALLOWED READ ONLY WITH APPLY PHYSICAL STANDBY
select thread#, max(sequence#) arc, max(decode(registrar||','||applied,'RFS,YES',sequence#,0)) app
from v$archived_log where resetlogs_id = (SELECT resetlogs_id
FROM v$database_incarnation WHERE status = 'CURRENT' AND rownum = 1)
group by thread#;
SYS@ywzddg2>
THREAD# ARC APP
---------- ---------- ----------
1 5977 5976
2 6064 6064
select * from v$archive_gap;
no rows selected -->GAP已消失
什么是归档日志间隙(Archive Gap)?归档日志间隙是在 Standby 端一系列丢失的重做日志,导致日志应用服务无法继续运行。这通常发生在 Standby 端无法从 Primary Database 接收重做日志或重做日志在 Standby Database 上不可用时。
常见原因有:(1)网络连接断开或者日志传输服务停止;(2)Standby Database 不可用;(3)日志传输服务的配置错误;(4)Standby 端的 IO 问题;(5)归档日志在应用到 Standby 前被手工删除;(6)Primary 和 Standby 之间的网络带宽不足。一旦在 Standby Database 上存在归档间隙,Log Apply Services 就会卡住,直到日志间隙(Gap)被解决,例如。丢失的 Redo 被重新获取并且在 Standby 端可用。然后,日志应用服务可以选中它并继续处理。
场景1:由于Oracle Primary数据库产生的归档日志已被备份,但是归档日志未传输到Standby数据库之前被删除,从而导致Standby数据库出现GAP。这种情况可以通过恢复归档日志解决。
1、若主库的归档日志已被备份且已被删除(主库),需要从nbu备份系统将归档恢复出来。
RUN {
allocate channel ch01 type 'sbt_tape';
send 'nb_ora_serv=NBU_MASTER服务器主机名';
send 'nb_ora_client=NBU客户端主机名';
set archivelog destination to '/home/oracle/arch';
restore archivelog from sequence 1152 thread 2; -->从某个序列开始恢复归档
restore archivelog from logseq 15143 until logseq 15146 thread 2; --> 恢复某个序列段的归档
restore archivelog time between "to_date('20220103 00','yyyymmdd hh24')" and "to_date('20220106 23','yyyymmdd hh24')"; -->恢复某个时间段的归档
restore archivelog all; -->恢复所有的归档
RELEASE CHANNEL ch01;
}
2、待应用的归档日志是否已传输到备库(备库)
set linesize 10000
col name for a80
select name,BACKUP_COUNT,DELETED from gv$archived_log
where thread#=1 and SEQUENCE#>=5955
and SEQUENCE#<=5969 and name <>'standby_orcldg' order by SEQUENCE#;
3、如果还有差异,重启下同步进程。
alter database recover managed standby database cancel;
alter database recover managed standby database using current logfile disconnect from session;
4、检查主从同步日志差异(备库)
select thread#, max(sequence#) arc, max(decode(registrar||','||applied,'RFS,YES',sequence#,0)) app
from v$archived_log where resetlogs_id = (SELECT resetlogs_id FROM v$database_incarnation
WHERE status = 'CURRENT' AND rownum = 1) group by thread#;
场景2:由于特殊情况,主库归档日志没有正常传输到备库,并且还存在。可以通过手动拷贝方式传输到备库,并注册到备库。
1、使用RMAN从ASM设备中拷贝出来所需归档文件
run {
copy archivelog '+YWZD/YWZD/1_29_856078807.arc' to '/home/oracle/1_29_856078807.arc';
copy archivelog '+YWZD/YWZD/1_30_856078807.arc' to '/home/oracle/1_30_856078807.arc';
copy archivelog '+YWZD/YWZD/2_14_856078807.arc' to '/home/oracle/2_14_856078807.arc';
copy archivelog '+YWZD/YWZD/2_15_856078807.arc' to '/home/oracle/2_15_856078807.arc';
copy archivelog '+YWZD/YWZD/2_16_856078807.arc' to '/home/oracle/2_16_856078807.arc';
}
2、在备库查看归档路径
SQL> archive log list;
3、在主库将未同步的归档日志拷贝到备库
scp *arc 备库:/u01/app/oracle/arch
4、在备库取消日志应用
SQL> alter database recover managed standby database cancel;
5、注册日志文件
SQL> alter database register logfile '/u01/app/oracle/arch/1_29_856078807.arc';
……
SQL> alter database register logfile '/u01/app/oracle/arch/2_16_856078807.arc';
6:开始日志应用
SQL> alter database recover managed standby database disconnect from session;
场景3:遇到极端情况,需要对备库进行重建。
1、通过RMAN脚本进行数据文件备份
run
{
allocate channel c1 type disk;
backup incremental level 0 database format '/home/oracle/db_level0_%d_%T_%U.bak'
plus archivelog format '/home/oracle/arch_%d_%T_%U.bak';
release channel c1;
}
2、控制文件备份
run
{
allocate channel c1 type disk;
backup current controlfile for standby format='/home/oracle/control_ywzdg.ctl';
release channel c1;
}
3、拷贝主库备份至备库
scp -r /home/oracle/control_ywzdg.ctl oracle@192.168.xx.xx:/home/oracle/rman
4、使用rman恢复备库控制文件:
export ORACLE_SID=ywzdg2
rman target /
startup nomount
restore standby controlfile from '/home/oracle/rman/control_ywzdg.ctl';
alter database mount standby database;
REPORT SCHEMA ;
5、使用rman命令恢复备库数据文件
run
{
allocate channel ch00 type disk;
restore database;
release channel ch00;
}
6、使用rman命令还原备库数据文件
run
{
allocate channel ch00 type disk;
recover database;
release channel ch00;
}
run
{
allocate channel ch00 type disk;
set until logseq 2189 thread 2;
recover database;
release channel ch00;
}
7、开启日志同步应用:
alter database recover managed standby database using current logfile disconnect from session;
8、验证同步情况
select t.NAME,t.SWITCHOVER_STATUS,t.OPEN_MODE,t.DATABASE_ROLE from v$database t;
select thread#, max(sequence#) arc, max(decode(registrar||','||applied,'RFS,YES',sequence#,0)) app
from v$archived_log where resetlogs_id = (SELECT resetlogs_id
FROM v$database_incarnation WHERE status = 'CURRENT' AND rownum = 1) group by thread#;
- Data Guard Gap Detection and Resolution Possibilities (Doc ID 1537316.1)
- Data Guard 日志间隙检测及解决方案 (Doc ID 2403964.1)
以上就是本期关于“Oracle Data Guard Gap日志间隙SCN增量备份恢复”案例分享。希望能给大家带来帮助。
欢迎关注“数据库运维之道”公众号,一起学习数据库技术!