问题描述:DG备库硬件故障,主库defer掉日志后,备库做硬件更换,更换完成后启动备库,报错。
处理:
启动备库
02:17:49 > startup
ORACLE instance started.
Total System Global Area 3.4206E+11 bytes
Fixed Size 2275496 bytes
Variable Size 2.6307E+10 bytes
Database Buffers 3.1514E+11 bytes
Redo Buffers 611278848 bytes
Database mounted.
ORA-10458: standby database requires recovery
ORA-01196: file 1 is inconsistent due to a failed media recovery session
ORA-01110: data file 1: '+DATA/sesadg/datafile/system.261.992298667'
查看alert日志,报错如下:
Completed: ALTER DATABASE MOUNT
Fri Apr 30 02:18:09 2021
ALTER DATABASE OPEN
Beginning Standby Crash Recovery.
Serial Media Recovery started
Managed Standby Recovery starting Real Time Apply
Media Recovery of Online Log [Thread=1, Seq=52565]
Recovery of Online Redo Log: Thread 1 Group 13 Seq 52565 Reading mem 0
Mem# 0: +DATA/sesadg/onlinelog/redo13a.log
Mem# 1: +DATA/sesadg/onlinelog/redo13b.log
Media Recovery Log +ARCH/sesadg/archivelog/2021_04_29/thread_2_seq_117672.708.1071175805
SUCCESS: diskgroup ARCH was mounted
Fri Apr 30 02:18:13 2021
ERROR: failed to establish dependency between database sesadg and diskgroup resource ora.ARCH.dg
Fri Apr 30 02:19:15 2021
Standby crash recovery failed to bring standby database to a consistent
point because needed redo hasn't arrived yet.
MRP: Wait timeout: thread 1 sequence# 0
Errors with log +ARCH/sesadg/archivelog/2021_04_29/thread_2_seq_117672.708.1071175805
Standby Crash Recovery aborted due to error 16016.
Errors in file /u01/app/oracle/diag/rdbms/sesadg/sesadg/trace/sesadg_ora_18915.trc:
ORA-16016: archived log for thread 1 sequence# 52565 unavailable
Recovery interrupted!
Some recovered datafiles maybe left media fuzzy
Media recovery may continue but open resetlogs may fail
Completed Standby Crash Recovery.
Errors in file /u01/app/oracle/diag/rdbms/sesadg/sesadg/trace/sesadg_ora_18915.trc:
ORA-10458: standby database requires recovery
ORA-01196: file 1 is inconsistent due to a failed media recovery session
ORA-01110: data file 1: '+DATA/sesadg/datafile/system.261.992298667'
ORA-10458 signalled during: ALTER DATABASE OPEN...
分析alert日志可知,缺失 thread 1 sequence# 52565 归档
到主库查找,发现没有该归档
找备份集,看备份集中是否有该归档日志的备份
RMAN> list backup; --此处只是截图了一部分
[03:39:36]BS Key Size Device Type Elapsed Time Completion Time
[03:39:36]------- ---------- ----------- ------------ ---------------
[03:39:36]34535 36.41G DISK 00:10:05 30-APR-21
[03:39:36] BP Key: 34535 Status: AVAILABLE Compressed: NO Tag: TAG20210430T005833
[03:39:36] Piece Name: /home/oracle/nfs/arch_SESDB_20210430_34825_1
[03:39:36]
[03:39:36] List of Archived Logs in backup set 34535
[03:39:36] Thrd Seq Low SCN Low Time Next SCN Next Time
[03:39:36] ---- ------- ---------- --------- ---------- ---------
[03:39:36] 1 52554 274424598451 29-APR-21 274425396300 29-APR-21
[03:39:36] 1 52555 274425396300 29-APR-21 274426206741 29-APR-21
[03:39:36] 1 52556 274426206741 29-APR-21 274427035053 29-APR-21
[03:39:36] 1 52557 274427035053 29-APR-21 274427919003 29-APR-21
[03:39:36] 1 52558 274427919003 29-APR-21 274428758776 29-APR-21
[03:39:36] 1 52559 274428758776 29-APR-21 274429636533 29-APR-21
[03:39:36] 1 52560 274429636533 29-APR-21 274430537334 29-APR-21
[03:39:36] 1 52561 274430537334 29-APR-21 274431472524 29-APR-21
[03:39:36] 1 52562 274431472524 29-APR-21 274432360894 29-APR-21
[03:39:36] 1 52563 274432360894 29-APR-21 274433301054 29-APR-21
[03:39:36] 1 52564 274433301054 29-APR-21 274434280703 29-APR-21
[03:39:36] 1 52565 274434280703 29-APR-21 274435243252 29-APR-21
[03:39:36] 1 52566 274435243252 29-APR-21 274435795760 29-APR-21
[03:39:36] 1 52567 274435795760 29-APR-21 274436939754 29-APR-21
[03:39:36] 1 52568 274436939754 29-APR-21 274438234760 29-APR-21
[03:39:36] 1 52569 274438234760 29-APR-21 274439273218 29-APR-21
[03:39:36] 1 52570 274439273218 29-APR-21 274440635024 29-APR-21
[03:39:36] 1 52571 274440635024 29-APR-21 274441489210 30-APR-21
[03:39:36] 1 52572 274441489210 30-APR-21 274441541590 30-APR-21
[03:39:36] 2 117654 274424415534 29-APR-21 274424934411 29-APR-21
[03:39:36] 2 117655 274424934411 29-APR-21 274425510477 29-APR-21
[03:39:36] 2 117656 274425510477 29-APR-21 274426087214 29-APR-21
[03:39:36] 2 117657 274426087214 29-APR-21 274426647791 29-APR-21
[03:39:36] 2 117658 274426647791 29-APR-21 274427162456 29-APR-21
[03:39:36] 2 117659 274427162456 29-APR-21 274427707412 29-APR-21
[03:39:36] 2 117660 274427707412 29-APR-21 274428085546 29-APR-21
[03:39:36] 2 117661 274428085546 29-APR-21 274428644562 29-APR-21
[03:39:36] 2 117662 274428644562 29-APR-21 274429179636 29-APR-21
[03:39:36] 2 117663 274429179636 29-APR-21 274429738419 29-APR-21
[03:39:36] 2 117664 274429738419 29-APR-21 274430303149 29-APR-21
[03:39:36] 2 117665 274430303149 29-APR-21 274430820743 29-APR-21
[03:39:36] 2 117666 274430820743 29-APR-21 274431412189 29-APR-21
[03:39:36] 2 117667 274431412189 29-APR-21 274431657992 29-APR-21
[03:39:36] 2 117668 274431657992 29-APR-21 274432209955 29-APR-21
[03:39:36] 2 117669 274432209955 29-APR-21 274432716309 29-APR-21
[03:39:36] 2 117670 274432716309 29-APR-21 274433266510 29-APR-21
[03:39:36] 2 117671 274433266510 29-APR-21 274433819363 29-APR-21
[03:39:36] 2 117672 274433819363 29-APR-21 274434331885 29-APR-21
[03:39:36] 2 117673 274434331885 29-APR-21 274434887430 29-APR-21
[03:39:36] 2 117674 274434887430 29-APR-21 274435244645 29-APR-21
[03:39:36] 2 117675 274435244645 29-APR-21 274435247812 29-APR-21
[03:39:36] 2 117676 274435247812 29-APR-21 274435794757 29-APR-21
[03:39:36] 2 117677 274435794757 29-APR-21 274436301323 29-APR-21
[03:39:36] 2 117678 274436301323 29-APR-21 274436848204 29-APR-21
[03:39:36] 2 117679 274436848204 29-APR-21 274437402879 29-APR-21
[03:39:36] 2 117680 274437402879 29-APR-21 274437907572 29-APR-21
[03:39:36] 2 117681 274437907572 29-APR-21 274438477176 29-APR-21
[03:39:36] 2 117682 274438477176 29-APR-21 274438773392 29-APR-21
[03:39:36] 2 117683 274438773392 29-APR-21 274439272364 29-APR-21
[03:39:36] 2 117684 274439272364 29-APR-21 274439806032 29-APR-21
[03:39:36] 2 117685 274439806032 29-APR-21 274440313042 29-APR-21
[03:39:36] 2 117686 274440313042 29-APR-21 274440634811 29-APR-21
[03:39:36] 2 117687 274440634811 29-APR-21 274441171272 30-APR-21
[03:39:36] 2 117688 274441171272 30-APR-21 274441433393 30-APR-21
[03:39:36] 2 117689 274441433393 30-APR-21 274441488883 30-APR-21
[03:39:36] 2 117690 274441488883 30-APR-21 274441506037 30-APR-21
[03:39:36] 2 117691 274441506037 30-APR-21 274441523254 30-APR-21
[03:39:36] 2 117692 274441523254 30-APR-21 274441541299 30-APR-21
[03:39:36] 2 117693 274441541299 30-APR-21 274441560239 30-APR-21
[03:39:36]
[03:39:36]BS Key Type LV Size Device Type Elapsed Time Completion Time
[03:39:36]------- ---- -- ---------- ----------- ------------ ---------------
[03:39:36]34536 Full 36.83M DISK 00:00:02 30-APR-21
[03:39:36] BP Key: 34536 Status: AVAILABLE Compressed: NO Tag: TAG20210430T012322
[03:39:36] Piece Name: /home/oracle/nfs/ctl_SESDB_20210430_34827_1
[03:39:36] Control File Included: Ckp SCN: 274441877850 Ckp time: 30-APR-21
[03:39:36]
发现有缺失的归档,将备份集scp到备库,执行基于scn的恢复
cd /home/oracle/nfs
scp arch_SESDB_20210430_34825_1 10.1.11.160:/u01/backup
RMAN> catalog backuppiece '/u01/backup/arch_SESDB_20210430_34825_1';
由于缺失的是52565号归档文件,故将数据库恢复到该归档文件的最后一个scn,也就是 274435243252
RMAN> run{
set until scn 274435243252;
recover database;
}
恢复完成后,启动数据库
SQL> alter database open;
应用归档
SQL> alter database recover managed standby database using current logfile disconnect from session;
查询DG延时状态,发现延时一直增加
查看alert日志,发现报错,还是缺失归档,关闭备库后再启动到mount状态,重新执行基于SCN的恢复
alert日志中的报错如下
Fri Apr 30 04:03:19 2021
Managed Standby Recovery starting Real Time Apply
Only allocated 127 recovery slaves (requested 128)
Parallel Media Recovery started with 127 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Completed: alter database recover managed standby database using current logfile disconnect from session
Media Recovery Log +ARCH/sesadg/archivelog/2021_04_30/thread_1_seq_52566.2091.1071201073
Media Recovery Log +ARCH/sesadg/archivelog/2021_04_30/thread_2_seq_117674.1789.1071201703
Media Recovery Waiting for thread 2 sequence 117675
Fetching gap sequence in thread 2, gap sequence 117675-117675
Fri Apr 30 04:05:14 2021
FAL[client]: Failed to request gap sequence
GAP - thread 2 sequence 117675-117675
DBID 1476960055 branch 992196795
FAL[client]: All defined FAL servers have been attempted.
------------------------------------------------------------
Check that the CONTROL_FILE_RECORD_KEEP_TIME initialization
parameter is defined to a value that's sufficiently large
enough to maintain adequate log switch information to resolve
archivelog gaps.
------------------------------------------------------------
Fri Apr 30 04:09:57 2021
由于缺失的是 117675 归档日志,上面的备份集中有该归档,因此直接恢复即可
117675 对应的scn是 274435247812
RMAN> run{
set until scn 274435247812;
recover database;
}
恢复完成后,启动数据库
SQL> alter database open;
应用归档
SQL> alter database recover managed standby database using current logfile disconnect from session;
查询DG延时状态,发现延时在减少,持续监控,直至延时为0,然后在主库建表做测试,确保无误
04:13:38 SYS@sesadg(sesadg)> select name,value from v$dataguard_stats;
NAME VALUE
-------------------------------- ----------------------------------------------------------------
transport lag +00 00:00:00
apply lag +00 06:11:45
apply finish time
estimated startup time 19
04:14:46 SYS@sesadg(sesadg)> select name,value from v$dataguard_stats;
NAME VALUE
-------------------------------- ----------------------------------------------------------------
transport lag +00 00:00:00
apply lag +00 03:47:20
apply finish time
estimated startup time 19
04:14:47 SYS@sesadg(sesadg)> select name,value from v$dataguard_stats;
NAME VALUE
-------------------------------- ----------------------------------------------------------------
transport lag +00 00:00:00
apply lag +00 00:00:02
apply finish time
estimated startup time 19
04:31:50 SYS@sesadg(sesadg)> select name,value from v$dataguard_stats;
NAME VALUE
-------------------------------- ----------------------------------------------------------------
transport lag +00 00:00:00
apply lag +00 00:00:00
apply finish time
estimated startup time 19
总结:DG断档处理,根据不同的情况,有不同的处理方法。
1、如果是设置了fal_client何fal_server,并且缺失的日志在主库都还存在,则备库启动应用进程后,主库会自动将缺失的日志传输到备库应用
2、如果备库缺失的日志在主库已经删除,则在主库的备份中查找是否有对应日志的备份,有备份的话可以按照本文描述的方式做恢复
3、如果备库缺失的日志在主库已经删除,并且没有做备份,则需要根据缺失日志去情况,做基于SCN的备份恢复,后面另写一篇博客测试吧。
4、如果无法做基于SCN的恢复,则只能重新搭建DG了。