
In this Document
APPLIES TO:Oracle Database - Enterprise Edition - Version 10.2.0.1 to 11.2.0.4 [Release 10.2 to 11.2]Information in this document applies to any platform. SYMPTOMSStandby Redo Apply can terminate due to a failure of redo-data consistency checks, a problem called stuck recovery. Stuck recovery can occur when an underlying operating system or storage system loses a write issued by the Primary or Standby database during normal operation. Because there is an inconsistency between the information stored in the redo and the information stored in a database block being recovered, the database signals an internal error when applying the redo.
ORA-00600: internal error code, arguments: [3020], [2885689059], [1], [419819],[26750], [808], [], []
ORA-10567: Redo is inconsistent with data block (file# 1, block# 419819) ORA-10564: tablespace USER1 ORA-01110: data file ’/oracle/datafiles/user1.dbf’ CAUSEThe ORA-600 [3020] stuck recovery error could occur on the Standby database for several reasons including: a lost write on the Primary, a lost write on the Standby, missing redo, or logical corruption on the primary resulting in an incomplete redo chain.
CAUTION: If you have a RAC primary and you don’t have the Oracle RDBMS patch that has a fix for bug 11674485, the “lost write on the primary” detected at the Standby maybe a “false positive”. Please follow the steps described in Section “Steps to recover from bug 11674485” first to make this determination before you proceed any further.
ORA-752: recovery detected a lost write of a data block
This ORA-752 error indicates a lost write occurred on the Primary database. Oracle strongly recommends enabling DB_LOST_WRITE_PROTECT (and DB_BLOCK_CHECKSUM=FULL) for greater detection and protection from lost writes. Studies have shown the impact on the primary database is negligible. SOLUTIONIn the majority of cases, Standby stuck recovery errors indicate a corruption of the Primary database. No errors may have been reported on the Primary.
WARNING: Do not repair the Standby by restoring a backup taken on the Primary, as that will ensure that the Standby is also corrupt! The only exception is when the Standby is known to have a lost write, but this determination should be made by Oracle Support.
An ORA-752 error definitively identifies a lost write on the Primary. Consider failing over to the Standby immediately if data integrity is critical and some data loss is acceptable. Oracle Support should also be engaged immediately when an ORA-600 [3020] error occurs by opening a Service Request via My Oracle Support. When media recovery encounters a problem, the alert log may indicate that recovery can continue if it is allowed to corrupt the data block causing the problem. The alert log contains information about the block: its block type, block address, the tablespace it belongs to, and so forth. For blocks containing user data, the alert log may also report the data object number. Determining the Extent of CorruptionTo determine if the corruption is isolated run a diagnostic trial recovery, this scans the redo for problems but does not actually make any changes to the recovered database. Trial recovery reports any additional corruptions in the alert_<SID>.log. You can use the RECOVER ... TEST statement to invoke trial recovery. Refer to Document 283262.1 for additional details on trial recovery. Determining Root CauseInformation that needs to be gathered and sent to Oracle Support immediately:
What actions can be taken when an ORA-752 lost write error is signalled?Option 1: Determine if affected objects can be recreated and recovery allowed to continue:
Option 2: Activate the standby database If the affected objects cannot be recreated then activate the standby database. By activating the standby database you will be experiencing data loss but data integrity will be ensured.
If the standby database is managed by the Data Guard broker, issue the following Data Guard broker command to perform an immediate failover to the standby database:
DGMGRL> FAILOVER TO database-name IMMEDIATE;
Note: Under certain situations, "alter database activate standby database" command can fail on a physical standby database because of inability to archive the final partial standby redo logs. To work around the failure to archive the partial archived redo logs, please perform the following steps: 1) Shutdown & Mount one instance of the physical standby database 2. Back up the new primary. Performing a backup immediately is a necessary safety measure, because you cannot recover changes made after the failover without a complete backup copy of the database. As a result of the failover, the original primary database can no longer participate in the Data Guard configuration, and all other standby databases will now receive and apply redo data from the new primary database. 3. Open the new primary database. Be aware that a physical standby created using the backup taken from the new primary will have the same datafiles as the old standby. Therefore, any undetected lost writes that the old standby had before it was activated will not be detected by the new standby, since the new standby will be comparing the same blocks. Any new lost writes that happen on either the primary or the standby will be detected. What action can be taken when an ORA-600 [3020] is signalled?Engage Oracle Support immediately when an ORA-600 [3020] error occurs. Be prepared to supply the information listed under the “Determining the Root Cause” section when opening a Service Request via My Oracle Support. Protecting Against Lost WritesProtect against lost write by setting DB_LOST_WRITE_PROTECT to TYPICAL on primary and standby databases. By doing so the physical standby Redo Apply process will compare the block SCN on the standby to the block SCN stored in the primary redo stream (when the block was read) to decide whether there is a lost write on the primary. Redo Apply is able to do this using the additional information logged at the primary when DB_LOST_WRITE_PROTECT is enabled. If the block SCN on the primary database is lower than on the standby database, then it detects a lost write on the primary database and throws an external error (ORA-752). Note that DB_LOST_WRITE_PROTECT is only available in Oracle 11g and later. Steps to recover from bug 11674485These steps in this section will help you determine if you have hit the bug 11674485 and explain the subsequent workaround necessary to repair the standby database. If you detect you have hit bug 11674485 (see below for details), the primary has not lost a write nor does it have a corruption. The only workaround necessary is to recover the datafile at the standby that has been marked with a corrupt block by managed standby recovery when the false positive was triggered. Detection – how do we know “Lost write at the Primary” detected at the standby is a false positive?When standby recovery detects that a lost write that happened on the primary database, standby recovery signals ORA-752 and stops. Below is an example output from the standby alert log when this happens:
Hex dump of (file 786, block 657290) in trace file /diag/rdbms/gmfcdwp_lvt/gmfcdwp8/trace/gmfcdwp8_pr0m_28759.trc
STANDBY REDO APPLICATION HAS DETECTED THAT THE PRIMARY DATABASE LOST A DISK WRITE OF BLOCK 657290, FILE 786 NO REDO AT OR AFTER SCN 10753662227200 CAN BE USED FOR RECOVERY. Tue Apr 26 22:21:08 2011 Slave exiting with ORA-752 exception Errors in file /diag/rdbms/gmfcdwp_lvt/gmfcdwp8/trace/gmfcdwp8_pr0m_28759.trc: ORA-00752: recovery detected a lost write of a data block ORA-10567: Redo is inconsistent with data block (file# 786, block# 657290, file offset is 1089552384 bytes) <ORA-10564: tablespace ODS_DATA ORA-01110: data file 786: '+DATA/gmfcdwp/datafile/ods_data_306.dbf' ORA-10561: block type 'TRANSACTION MANAGED INDEX BLOCK', data object#4718509 Tue Apr 26 22:21:09 2011 Recovery Slave PR0M previously exited with exception 752
If the primary is RAC, then it is possible that this incident is caused by bug 11674485. When bug 11674485 is encountered, standby recovery declares that it found a primary lost write even though there was no lost write or any other corruption at the primary.
If primary is not RAC, then bug 11674485 cannot occur.
In order to determine if a particular incident of ORA-752 was caused by bug 11674485, you need to look into the trace file of the recovery process that detected the lost write. In the case of example above, we need to look into trace file gmfcdwp8_pr0m_28759.trc.
The trace file should contain the following message:
STANDBY REDO APPLICATION HAS DETECTED THAT THE PRIMARY DATABASE
Immediately following the above message is information of the redo change vector header that triggered standby recovery to declare primary lost write. For example:
STANDBY REDO APPLICATION HAS DETECTED THAT THE PRIMARY DATABASE
LOST A DISK WRITE OF BLOCK 657290, FILE 786 The block read on the primary had SCN 10753662227193 (0x09c7.c83792f9) seq 1 (0x01) while expected to have SCN 10753662227199 (0x09c7.c83792ff) seq 1 (0x01) The block was read at SCN 10753662227200 (0x09c7.c8379300), BRR: CHANGE #1 TYP:0 CLS:32772 AFN:786 DBA:0x95ca078a OBJ:4718509 SCN:0x09c7.c83792f9 SEQ:1 OP:23.2 ENC:0 RBL:1
The lines starting with “The block was read at SCN…” and the immediate next line starting with “CHANGE #1 TYP:0 CLS:…” are important for the analysis below, so please make a note of these lines.
The same trace file will contain the redo dump of all relevant redo for the particular block (in the case of the example above, that block is file# 786, block# 657290). The particular redo record can be found as follows:
Below is an example of redo dump that matches the two criteria described above:
REDO RECORD - Thread:8 RBA: 0x000058.000a60b9.0038 LEN: 0x0034 VLD: 0x10
SCN: 0x09c7.c8379300 SUBSCN: 1 04/26/2011 21:59:30 CHANGE #1 TYP:0 CLS:32772 AFN:786 DBA:0x95ca078a OBJ:4718509 SCN:0x09c7.c83792f9 SEQ:1 OP:23.2 ENC:0 RBL:1 Block Read - afn: 786 rdba: 0x95ca078a BFT:(1024,2513045386) non-BFT: 599,657290) scn: 0x09c7.c83792f9 seq: 0x01 flags: 0x00008004 ( ckval ping )
The lost write detection is caused by bug 11674485 if both the following two conditions are met:
Workaround for Bug 11674485
When bug 11674485 has been encountered managed standby recovery will marked one of the blocks in a datafile as corrupt. The name and number of the of the datafile is easily determined from the standby alert log line below:
Reading datafile '+DATA/gmfcdwp/datafile/ods_data_306.dbf' for corruption at rdba: 0x95ca078a (file 786, block 657290)
How to Copy ASM datafile From Primary Database to Standby Database on ASM using RMAN (Doc ID 605234.1) REFERENCESNOTE:283262.1 - Trial Recovery NOTE:314422.1 - Remote Diagnostic Agent (RDA) - Getting Started BUG:11689702 - ORA-600 [3020] DURING RECOVERY AFTER DATAFILE RESIZE NOTE:1302614.1 - Rman or User Managed Restore/Recovery Fails With Ora-600 [3020] if Datafile resize Operation was Performed. | ![]() |
![]()
|