Block Corrupt

彦祖的小号

于 2021-08-04 09:46:16 发布

阅读量506

点赞数

分类专栏： Oracle

本文链接：https://blog.csdn.net/linsuhangoracle/article/details/119377121

版权

Oracle 专栏收录该内容

193 篇文章 0 订阅

订阅专栏

Oracle有不同方法来监测及修复data block corruption，常用的修复方法

重建对象
对于表或索引它们自身包含数据，重建不能恢复数据，可以使用DBMS_REPAIR修复

DBMS_REPAIR只是把table及index的corruption blocks标识后忽略，在下次查询或操作时不会对corruption blocks操作，不会对corruption block进行数据恢复，所以一般用于对块间的逻辑坏块进行恢复

RMAN的RECOVER BLOCK来恢复

Oracle Database provides different methods for detecting and correcting data block corruption. One method of correction is to drop and re-create an object after the corruption is detected. However, this is not always possible or desirable. If data block corruption is limited to a subset of rows, then another option is to rebuild the table by selecting all data except for the corrupt rows.

Another way to manage data block corruption is to use the DBMS_REPAIR package. You can use DBMS_REPAIR to detect and repair corrupt blocks in tables and indexes. You can continue to use objects while you attempt to rebuild or repair them.

You can also use the Recovery Manager (RMAN) command RECOVER BLOCK to recover a corrupt data block or set of data blocks.

PARAMETERS OF CHECKING CORRUPTS

Note: 不论DB_BLOCK_CHECKING与DB_BLOCK_CHECKSUM如何设置，Oracle均为SYSTEM表空间进行block checking与checksum计算

DB_BLOCK_CHECKING

DB_BLOCK_CHECKING指在块内容更改时是否进行basic block header check与semantic checks，主要用于保证内存中数据的逻辑一致

Oracle checks a block by going through the data in the block, making sure it is logically self-consistent. Block checking can often prevent memory and data corruption.

Block checking typically causes 1% to 10% overhead in most applications, depending on workload and the parameter value. Specific DML overhead may be higher. The more updates or inserts in a workload, the more expensive it is to turn on block checking.

You should set DB_BLOCK_CHECKING to FULL if the performance overhead is acceptable.

DB_BLOCK_CHECKING = { FALSE(default) | OFF | LOW | MEDIUM | TRUE | FULL }

OFF or FALSE: No block checking is performed for blocks in user tablespaces. However, semantic block checking for SYSTEM tablespace blocks is always turned on.

LOW: Basic block header checks are performed after block contents change in memory (for example, after UPDATE, INSERT or DELETE statements, or after inter-instance block transfers in Oracle RAC).

MEDIUM: All LOW checks and full semantic checks are performed for all objects except indexes (whose contents can be reconstructed by a drop+rebuild on encountering a corruption).

FULL or TRUE: All LOW and MEDIUM checks and full semantic checks are performed for all objects.

For backward compatibility, the use of FALSE (implying OFF) and TRUE (implying FULL) is preserved

Note: If block checking is enabled for a CDB, then you cannot subsequently disable block checking in any of its PDBs. That is, if the value of DB_BLOCK_CHECKING in a CDB is LOW, MEDIUM, TRUE, or FULL, and you then attempt to set the value of DB_BLOCK_CHECKING in one of its PDBs to FALSE or OFF, an error will occur.

DB_BLOCK_CHECKSUM

DB_BLOCK_CHECKSUM 用于写入时是否计算块的checksum，在读入块时会根据块内数据重新计算checksum进行对比，用于能监测出存储块是否有corruption。如果设置为FULL还会对内存块进行checksum计算，防止内存block corruption写入磁盘

DB_BLOCK_CHECKSUM determines whether DBWn and the direct loader will calculate a checksum (a number calculated from all the bytes stored in the block) and store it in the cache header of every data block when writing it to disk.

Checksums allow Oracle to detect corruption caused by underlying disks, storage systems, or I/O systems. If set to FULL, DB_BLOCK_CHECKSUM also catches in-memory corruptions and stops them from making it to the disk.

Turning on this feature in TYPICAL mode causes only an additional 1% to 2% overhead. In the FULL mode it causes 4% to 5% overhead.

Oracle recommends that you set DB_BLOCK_CHECKSUM to TYPICAL.

DB_BLOCK_CHECKSUM = { OFF | FALSE | TYPICAL(default) | TRUE | FULL }

Checksums are verified when a block is read - only if this parameter is TYPICAL or FULL and the last write of the block stored a checksum.

In FULL mode, Oracle also verifies the checksum before a change application from update/delete statements and recomputes it after the change is applied. In addition, Oracle gives every log block a checksum before writing it to the current log.

Most of the log block checksum is done by the generating foreground processes, while the LGWR or the LGWR slave processes (LGnn processes) perform the rest of the work, for better CPU and cache efficiency.

If this parameter is set to OFF, DBWn calculates checksums only for the SYSTEM tablespace, but not for user tablespaces. In addition, no log checksum is performed when this parameter is set to OFF.

For backward compatibility the use of TRUE (implying TYPICAL) and FALSE (implying OFF) values is preserved.

DB_LOST_WRITE_PROTECT

DB_LOST_WRITE_PROTECT通过在redo log中记录表空间缓冲区缓存的读取，来保证数据未写入存储时可追回

DB_LOST_WRITE_PROTECT enables or disables lost write detection.

A data block lost write occurs when an I/O subsystem acknowledges the completion of the block write, while in fact the write did not occur in the persistent storage.

DB_LOST_WRITE_PROTECT = { NONE | TYPICAL | FULL }

When the parameter is set to TYPICAL on the primary database, the instance logs buffer cache reads for read/write tablespaces in the redo log, which is necessary for detection of lost writes.

When the parameter is set to FULL on the primary database, the instance logs reads for read-only tablespaces and read/write tablespaces.

When the parameter is set to TYPICAL or FULL on the standby database or on the primary database during media recovery, the instance performs lost write detection.

When the parameter is set to NONE on either the primary database or the standby database, no lost write detection functionality is enabled.

DB_ULTRA_SAFE

DB_ULTRA_SAFE sets the default values for other parameters that control protection levels.

DB_ULTRA_SAFE = { OFF(default) | DATA_ONLY | DATA_AND_INDEX }

OFF: 不启用，此为默认值，不会影响DB_BLOCK_CHECKING, DB_BLOCK_CHECKSUM及 DB_LOST_WRITE_PROTECT的设置

DATA_ONLY：会将DB_BLOCK_CHECKING设置为MEDIUM；DB_LOST_WRITE_PROTECT 设置为 TYPICAL；DB_BLOCK_CHECKSUM 设置为FULL.

DATA_AND_INDEX：会将DB_BLOCK_CHECKING设置为FULL；DB_LOST_WRITE_PROTECT设置为TYPICAL；DB_BLOCK_CHECKSUM设置为FULL.

Basic Concepts of RMAN Validation

Checksums and Corrupt Blocks

Block corruptions can be caused by several different failures including, but not limited to the following:

Faulty disks and disk controllers

Faulty memory

Oracle Database software defects

Oracle在把redo log及data file的blocks写入磁盘时会先计算出checksum并写入block header.下次从磁盘读取blocks时会重新计算checksum，如果两次结果不同则block corrupt

DB_BLOCK_CHECKSUM is a database initialization parameter that controls the writing of checksums for the blocks in data files and online redo log files in the database (not backups). If DB_BLOCK_CHECKSUM is typical, then the database computes a checksum for each block during normal operations and stores it in the header of the block before writing it to disk.

When the database reads the block from disk later, it recomputes the checksum and compares it to the stored value. If the values do not match, then the block is corrupt.

BACKUP默认会为产出备份blocks计算checksum，它不受DB_BLOCK_CHECKSUM参数影响，但注意RESTORE备份的写入操作是否记录checksum受DB_BLOCK_CHECKSUM影响

如果指定NOCHECKSUM选项则在备份时不计算checksum

By default, the BACKUP command computes a checksum for each block and stores it in the backup. The BACKUP command ignores the values of DB_BLOCK_CHECKSUM because this initialization parameter applies to data files in the database, not backups.

When restoring a backup datafile, RMAN honors the DB_BLOCK_CHECKSUM initialization parameter setting. RMAN clears the checksum if DB_BLOCK_CHECKSUM is set to false. If set to typical, then RMAN verifies the checksum when restoring from the backup and writing to the datafile.

If you specify the NOCHECKSUM option, then RMAN does not perform a checksum of the blocks when creating the backup.

Physical and Logical Block Corruption

物理坏块指block完全认不出来了，逻辑坏块指块中某记录或索引记录不一致

In a physical corruption, which is also called a media corruption, the database does not recognize the block at all: the checksum is invalid, the block contains all zeros, or the header and footer of the block do not match.

In a logical corruption, the contents of the block are logically inconsistent. Examples of logical corruption include corruption of a row piece or index entry. If RMAN detects logical corruption, then it logs the block in the alert log and server session trace file.

RMAN默认不检查逻辑坏块，除非使用CHECK LOGICAL选项

By default, RMAN does not check for logical corruption. If you specify CHECK LOGICAL on the BACKUP command, however, then RMAN tests data and index blocks for logical corruption, such as corruption of a row piece or index entry, and log them in the alert log located in the Automatic Diagnostic Repository (ADR).

If you use RMAN with the following configuration when backing up or restoring files, then it detects all types of block corruption that are possible to detect:

In the initialization parameter file of a database, set DB_BLOCK_CHECKSUM=typical so that the database calculates data file checksums automatically (not for backups, but for data files in use by the database)
Do not precede the BACKUP command with SET MAXCORRUPT so that RMAN does not tolerate any unmarked block corruptions.

关于BACKUP使用SET MAXCORRUPT选项：

指在备份时可允许出现的坏块数，默认为0即出现坏块中止BACKUP

You can use the SET MAXCORRUPT command to set the total number of unmarked corruptions permitted in a file for RMAN backups. The default is zero, meaning that RMAN does not tolerate unmarked corrupt blocks of any kind.

If the total number of physical and logical corruptions detected in a file is less than its SET MAXCORRUPT setting, then the RMAN command completes and the database populates the V$DATABASE_BLOCK_CORRUPTION view with corrupt block ranges. If MAXCORRUPT is exceeded, then the command terminates without populating the views.

RMAN writes the newly detected corrupt block to the backup with a special header indicating that the block is marked corrupt. Because RMAN allows marked corrupt blocks in a backup, it is possible to restore a data file that has several blocks marked as corrupt. If you backup this restored data file (assuming no new corruptions have happened), even without MAXCORRUPT setting, the backup succeeds. This is because the previously marked corruptions do not stop RMAN from completing the backup.

In a BACKUP command, do not specify the NOCHECKSUM option so that RMAN calculates a checksum when writing backups. Physical corruption checking is enabled by default.
In BACKUP and RESTORE commands, specify the CHECK LOGICAL option so that RMAN checks for logical and physical corruption.Logical corruption checking is disabled by default.

Intrablock Corruption and Interblock Corruption

Block corruption分为块间与块内corruption，它们会用到不同的检查、修复及监控技术

块内corruption可能为物理或逻辑坏块，而块间corruption只能是逻辑坏块

Oracle Database supports different techniques for detecting, repairing, and monitoring block corruption. The technique depends on whether the corruption is interblock corruption or intrablock corruption.

In intrablock corruption, the corruption occurs within the block itself. This corruption can be either physical or logical. In an interblock corruption, the corruption occurs between blocks and can only be logical.

Intrablock Corruption: 所有Oracle用于检测坏块的方法都能检测到，会被记录于V$DATABASE_BLOCK_CORRUPTION视图以及ADR，通过block media recovery、常规还原恢复或block newing来修复，这些方法均不能用于修复块间corruption

Interblock Corruption: 只能通过DBVERIFY及ANALYZE语句检测，会记录于ADR，只能通过DROP对象或重建索引修复

Table 16-1 Detection, Repair, and Monitoring of Block Corruption

Response	Intrablock Corruption	Interblock Corruption
Detection	All database utilities detect intrablock corruption, including RMAN (for example, the BACKUP command) and the DBVERIFY utility. If a database process can encounter the ORA-1578 error, then it can detect the corruption and monitor it.	Only DBVERIFY and the ANALYZE statement detect interblock corruption.
Tracking	The V$DATABASE_BLOCK_CORRUPTION view displays blocks marked corrupt by Oracle Database components such as RMAN commands, ANALYZE, SQL queries, and so on. Any process that encounters an intrablock corruption records the block corruption in this view and in ADR.	The database monitors this type of block corruption in ADR.
Repair	Repair techniques include block media recovery, restoring data files, recovering with incremental backups, and block newing. Block media recovery can repair physical corruptions, but not logical corruptions. Any RMAN command that fixes or detects that a block is repaired updates V$DATABASE_BLOCK_CORRUPTION. For example, RMAN updates the repository at end of successful block media recovery. If a BACKUP, RESTORE, or VALIDATE command detects that a block is no longer corrupted, then it removes the repaired block from the view.	You must fix interblock corruption using manual techniques such as dropping an object, rebuilding an index, and so on.

Checking for Block Corruption Using RMAN Command

以下三个命令均只检测物理块坏，如果检测逻辑需加CHECK LOGCIAL

Checking for Block Corruption with the VALIDATE Command

Validate可以认为是backup validate与restore validate的综合，可以检测更多类型文件。它们均可检测坏块以及整个文件是否缺失

To validate all data files/control files/spfile, but not validate archivelog and backups:

RMAN> VALIDATE DATABASE;

Others:

VALIDATE DATAFILE 1 BLOCK 10;

VALIDATE BACKUPSET 22;

VALIDATE TABLESPACE USERS;

VALIDATE DATAFILE 1;

VALIDATE CURRENT CONTROLFILE;

VALIDATE SPFILE;

VALIDATE ARCHIVELOG ALL;

VALIDATE ARCHIELOG SEQUENCE 1;

To validate a large Data File in Parallel(与backup的section使用相同):

RUN

{ ALLOCATE CHANNEL c1 DEVICE TYPE DISK;

ALLOCATE CHANNEL c2 DEVICE TYPE DISK;

VALIDATE DATAFILE 1 SECTION SIZE 1200M;}

Validating Database Files with BACKUP VALIDATE

BACKUP VALIDATE不进行实际备份不会产生备份集或镜像备份，它的原理是是模拟备份过程以检查文件是否可备份，它并不能检测BACKUPSET坏块

When you run BACKUP VALIDATE, RMAN reads the files to be backed up in their entirety, as it does during a real backup. RMAN does not, however, actually produce any backup sets or image copies.

You cannot use the BACKUPSET, MAXCORRUPT, or PROXY parameters with BACKUP VALIDATE. To validate specific backup sets, run the VALIDATE command.

To validate that all database files and archived logs can be backed up by running a command as shown in the following example. This command checks for physical corruptions only.

BACKUP VALIDATE DATABASE ARCHIVELOG ALL;

OTHERS:

BACKUP VALIDATE TABLESPACE USERS;

BACKUP VALIDATE DATAFILE 1;

BACKUP VALIDATE CURRENT CONTROLFILE;

BACKUP VALIDATE SPFILE;

BACKUP VALIDATE ARCHIVELOG SEQUENCE 1;

To check for logical corruptions in addition to physical corruptions, run the following variation of the preceding command:

BACKUP VALIDATE CHECK LOGICAL DATABASE ARCHIVELOG ALL;

Note that corruptions are not tolerated in the control file and archived redo log backups.

Validating Backups Before Restoring Them

RESTORE VALIDATE原理不同于BACKUP VALIDATE，它是模拟还原过程看备份是否可还原。只能在数据库为OPEN或MOUNT执行

validate与backup validate对数据库文件校验需要设置初始参数db_block_checksum，但validate与restore validate对备份校验不需要设置此参数

You can run RESTORE...VALIDATE to test whether RMAN can restore a specific file or set of files from a backup. RMAN chooses which backups to use.

The database must be mounted or open for this command. You do not have to take data files offline when validating the restore of data files, because validation of backups of the data files only reads the backups and does not affect the production data files.

When validating files on disk or tape, RMAN reads all blocks in the backup piece or image copy. RMAN also validates offsite backups. The validation is identical to a real restore operation except that RMAN does not write output files.

RMAN also allows to specify a copy number for the backup pieces being validated.

Note: As an additional test measure, you can perform a trial recovery with the RECOVER ... TEST command. A trial recovery applies redo in a way similar to normal recovery, but it is in memory only and it rolls back its changes after the trial.

Examples:

RESTORE DATABASE VALIDATE;

RESTORE ARCHIVELOG ALL VALIDATE;

RESTORE VALIDATE ARCHIVELOG SEQUENCE 1;

RESTORE VALIDATE ARCHIVELOG FROM TIME 'SYSDATE-1';

RESTORE VALIDATE TABLESPACE USERS;

RESTORE VALIDATE DATAFILE 1;

RESTORE VALIDATE CONTROLFILE;

RESTORE VALIDATE SPFILE;

注：RESTORE VALIDATE/VALIDATE检测出corruption blocks不会记录到V$BACKUP_CORRUPTION中，此视图中只是在备份时检测出坏块时才记录(如备份时设置了MAXCORRUPT选项)，所以这个视图不常用，另外控制文件与归档备份时不允许有坏块

v$backup_corruption displays information about corrupt block ranges in datafile backups from the control file. Note that corruptions are not tolerated in the control file and archived redo log backups.

Validating CDBs and PDBs

RMAN enables you to validate multitenant container databases (CDBs) and pluggable databases (PDBs) using the VALIDATE command.

Validating a Whole CDB

The steps to validate a CDB are similar to the ones used to validate a non-CDB.

The only difference is that you must connect to the root as a common user with the common SYSBACKUP or SYSDBA privilege. Then, use the VALIDATE DATABASE and RESTORE DATABASE VALIDATE commands.

The following command, when connected to the root, validates the whole CDB:

VALIDATE DATABASE;

The following command validates the root:

VALIDATE DATABASE ROOT;

Validating PDBs

Use one of the following techniques to validate PDBs:

Connect to the root and use the VALIDATE PLUGGABLE DATABASE or RESTORE PLUGGABLE DATABASE VALIDATE command. This enables you to validate one or more PDBs.

VALIDATE PLUGGABLE DATABASE hr_pdb, sales_pdb;

Connect to the PDB and use the VALIDATE DATABASE and RESTORE DATABASE VALIDATE commands to validate only one PDB. The commands used here are the same commands that you would use for a non-CDB.

RESTORE DATABASE VALIDATE;

Performing Block Media Recovery

Block media recovery provides the following advantages over data file media recovery:

Lowers the mean time to recover (MTTR) because only blocks needing recovery are restored and recovered
Enables affected data files to remain online during recovery

Block Media Recovery Of Data Guard

DG如果主库出现坏块会优先在备库找相应block自动修复，其次是从flashback log中找，最后是从full或0级增量备份中找blocks

If the database on which the corruption occurs is associated with a real-time query physical standby database, then the database automatically attempts to perform block media recovery. The primary database searches for good copies of blocks on the standby database and, if they are found, repairs the blocks with no impact to the query that encountered the corrupt block. The Oracle Database physical block corruption message (ORA-1578) is displayed only if the database cannot repair the corruption.

Whenever block corruption has been automatically detected, you can perform block media recovery manually with the RECOVER ... BLOCK command. By default, RMAN first searches for good blocks in the real-time query physical standby database, then flashback logs and then blocks in full or level 0 incremental backups.

Note: For block media recovery to work automatically, the physical standby database must be in real-time query mode. An Oracle Active Data Guard license is required.

如果是从备库中发现坏块，会从主库找相应blocks修复

If a corrupt data block is discovered on a real-time query physical standby database, the server attempts to repair the corruption by obtaining a copy of the block from the primary database. The repair is performed in the background, enabling subsequent queries to succeed if the repair is successful.

如果备库中配置了以下参数会尝试自动修复坏块

Automatic block repair is attempted if the following database initialization parameters are configured on the standby database as described:

The LOG_ARCHIVE_CONFIG parameter is configured with a DG_CONFIG list and a LOG_ARCHIVE_DEST_n parameter is configured for the primary database with the DB_UNIQUE_NAME attribute

The FAL_SERVER parameter is configured and its value contains an Oracle Net service name for the primary database

Note: If a corrupt block is detected during validation, such as by the RMAN VALIDATE command, then recovery is not initiated automatically.

About Missing Redo During Block Recovery

恢复使用的redo log有gaps，则仍可能完成块恢复，但肯定无法进行数据文件恢复

与数据文件恢复相同，block media recovery在恢复时会自动restore所需归档的备份，如果有丢失或不可访问的归档文件，则它们均无法进行恢复

Like data file media recovery, block media recovery cannot generally survive a missing or inaccessible archived log, although it attempts restore failover when looking for usable copies of archived redo log files.

如果归档文件是因为物理坏块导致checksum失败则block media recovery将无法恢复

但如果归档文件只是部分redo record丢失或逻辑corrupt，则可能不会影响block media recovery，但此时无法进行数据文件恢复

Also, block media recovery cannot survive physical redo corruptions that result in checksum failure. However, block media recovery can survive gaps in the redo stream if the missing or corrupt redo records do not affect the blocks being recovered. Whereas data file recovery requires an unbroken series of redo changes from the beginning of recovery to the end, block media recovery only requires an unbroken set of redo changes for the blocks being recovered.

Note: Each block is recovered independently during block media recovery, so recovery may be successful for a subset of blocks.

之所以可能不影响的原因在于在损坏的redo record后可能block被重建

When RMAN first detects missing or corrupt redo records during block media recovery, it does not immediately signal an error because the block undergoing recovery may create one later in the redo stream.

When a block is re-created, all previous redo for that block becomes irrelevant because the redo applies to an old incarnation of the block. For example, the database creates a new a block when users drop or truncate a table and then use the block for other data.

Assume that media recovery is performed on block 13 as depicted in Figure 19-1.

Figure 19-1 Performing RMAN Media Recovery

After block recovery begins, RMAN discovers that change 120 is missing from the redo stream, either because the log block is corrupt or because the log cannot be found. RMAN continues recovery if block 13 is re-created later in the redo stream. Assume that in change 140 a user drops the table employees stored in block 13, allocates a new table in this block, and inserts data into the new table. At this point, the database formats block 13 as a new block. Recovery can now proceed with this block even though some redo preceding the recreation operation was missing.

Prerequisites for Block Media Recovery

The following prerequisites apply to the RECOVER ... BLOCK command:

The target database must run in ARCHIVELOG mode and be open or mounted with a current control file.
If the target database is a standby database, then it must be in a consistent state, recovery cannot be in session, and the backup must be older than the corrupted file.
The backups of the data files containing the corrupt blocks must be full or level 0 backups. They cannot be proxy copies or incremental backups.
If only proxy copy backups exist, then you can restore them to a nondefault location on disk, in which case RMAN considers them data file copies and searches them for blocks during block media recovery.
RMAN can use only archived redo logs for the recovery.
RMAN cannot use level 1 incremental backups. Block media recovery cannot survive a missing or inaccessible archived redo log, although it can sometimes survive missing redo records.
Flashback Database must be enabled on the target database for RMAN to search the flashback logs for good copies of corrupt blocks.
If flashback logging is enabled and contains older, uncorrupted versions of the corrupt blocks, then RMAN can use these blocks, possibly speeding up the recovery.
The target database must be associated with a real-time query physical standby database for RMAN to search the database for good copies of corrupt blocks.

Recovering Individual Blocks

Recovering Individual Blocks Using the RECOVER...BLOCK Command

To recover specific data blocks using the RECOVER...BLOCK command:

Obtain the data file numbers and block numbers of the corrupted blocks.
Start RMAN and connect to the target database, which must be mounted or open.
Run the SHOW ALL command to confirm that the appropriate channels are preconfigured.
Run the RECOVER ... BLOCK command at the RMAN prompt, specifying the file and block numbers for the corrupted blocks.

RECOVER DATAFILE 8 BLOCK 13 DATAFILE 2 BLOCK 19;

You can also specify various options to control RMAN behavior. The following example indicates that only backups with the tag mondayam are used when searching for blocks. You could use the FROM BACKUPSET option to restrict the type of backup that RMAN searches, or the EXCLUDE FLASHBACK LOG option to restrict RMAN from searching the flashback logs.

RECOVER DATAFILE 8 BLOCK 13 DATAFILE 2 BLOCK 199 FROM TAG mondayam;

Recovering Individual Blocks Using the Data Recovery Advisor

To generate automated repair options and repair the failure using the Data Recovery Advisor:

Start RMAN and connect to the target database.
List the failures recorded by the Data Recovery Advisor using the following command:

LIST FAILURE;

Generate repair options for the failure listed Step 2.

ADVISE FAILURE;

Perform the automated repairs recommended by Data Recovery Advisor.

REPAIR FAILURE;

REPAIR FAILURE USING ADVISE OPTION 2;

Recovering All Blocks in V$DATABASE_BLOCK_CORRUPTION

To recover all blocks logged in V$DATABASE_BLOCK_CORRUPTION:

Start SQL*Plus and connect to the target database.
Query V$DATABASE_BLOCK_CORRUPTION to determine whether corrupt blocks exist.

SQL> SELECT * FROM V$DATABASE_BLOCK_CORRUPTION;

Start RMAN and connect to the target database.
Recover all blocks marked corrupt in V$DATABASE_BLOCK_CORRUPTION.

RMAN> RECOVER CORRUPTION LIST;

After the blocks are recovered, the database removes them from V$DATABASE_BLOCK_CORRUPTION.

DBMS_REPAIR Package

DBMS_REPAIR只是把table及index的corruption blocks标识后忽略，在下次查询或操作时不会对corruption blocks操作，不会对corruption block进行数据恢复

Limitations and Restrictions for DBMS_REPAIR Procedures

DBMS_REPAIR procedures have the following limitations:

Tables with LOB data types, nested tables, and varrays are supported, but the out-of-line columns are ignored.
Clusters are supported in the SKIP_CORRUPT_BLOCKS and REBUILD_FREELISTS procedures, but not in the CHECK_OBJECT procedure.
Index-organized tables and LOB indexes are not supported.
Global temporary tables are not supported.
The DUMP_ORPHAN_KEYS procedure does not operate on bitmap indexes or function-based indexes.
The DUMP_ORPHAN_KEYS procedure processes keys that are no more than 3,950 bytes long.

Using the DBMS_REPAIR Package

使用DBMS_REPAIR前要先创建repair table与orphan keys table

The DBA would create the repair and orphan keys tables once. Subsequent executions of the CHECK_OBJECT Procedure would add rows into the appropriate table indicating the types of errors found.

Repair Table

A repair table provides information about the corruptions that were found by the CHECK_OBJECT procedure and how these will be addressed if the FIX_CORRUPT_BLOCKS procedure is run. Further, it is used to drive the execution of the FIX_CORRUPT_BLOCKS procedure.

Orphan Key Table

Orphan Key Table provides information about index entries that point to corrupt rows. An orphan key table is used when the DUMP_ORPHAN_KEYS procedure is executed and it discovers index entries that point to corrupt rows. The DUMP_ORPHAN_KEYS procedure populates the orphan key table by logging its activity and providing the index information in a usable manner.

通过ADMIN_TABLES过程管理repair table与orphan key table

ADMIN_TABLES procedure is used to create, purge, or drop a repair table or an orphan key table.

2.1 Detect and Report Corruptions

There are several ways to detect corruptions.

Detection Method	Description
DBMS_REPAIR PL/SQL package	Performs block checking for a specified table, partition, or index. It populates a repair table with results.
DB_VERIFY utility	Performs block checking on an offline database
ANALYZE TABLE SQL statement	Used with the VALIDATE STRUCTURE option, the ANALYZE TABLE statement verifies the integrity of the structure of an index, table, or cluster; checks or verifies that tables and indexes are synchronized.
DB_BLOCK_CHECKING initialization parameter	When DB_BLOCK_CHECKING=TRUE, corrupt blocks are identified before they are marked corrupt. Checks are performed when changes are made to a block.

DBMS_REPAIR: Using the CHECK_OBJECT and ADMIN_TABLES Procedures

The CHECK_OBJECT procedure checks and reports block corruptions for a specified object.

The CHECK_OBJECT procedure is similar to the ANALYZE...VALIDATE STRUCTURE statement for indexes and tables, block checking is performed for index and data blocks.

Not only does CHECK_OBJECT report corruptions, but it also identifies any fixes that would occur if FIX_CORRUPT_BLOCKS is subsequently run on the object. This information is made available by populating a repair table, which must first be created by the ADMIN_TABLES procedure.

After you run the CHECK_OBJECT procedure, a simple query on the repair table shows the corruptions and repair directives for the object. With this information, you can assess how best to address the reported problems.

2.2 Evaluate the Costs and Benefits of Using DBMS_REPAIR

Before using DBMS_REPAIR you must weigh the benefits of its use in relation to the liabilities. You should also examine other options available for addressing corrupt objects.

Begin by answering the following questions:

What is the extent of the corruption?

To determine if there are corruptions and repair actions, execute the CHECK_OBJECT procedure and query the repair table.

What other options are available for addressing block corruptions? Consider the following:

If the data is available from another source, then drop, re-create, and repopulate the object.
Issue the CREATE TABLE...AS SELECT statement from the corrupt table to create a new one.
Ignore the corruption by excluding corrupt rows from SELECT statements.
Perform media recovery.

What logical corruptions or side effects副作用 are introduced when you use DBMS_REPAIR to make an object usable? Can these be addressed? What is the effort required to do so?

You might not have access to rows in blocks marked corrupt. However, a block can be marked corrupt even if there are rows that you can validly access.

It is also possible that referential integrity constraints are broken when blocks are marked corrupt. If this occurs, then disable and reenable the constraint; any inconsistencies are reported. After fixing all problems, you should be able to reenable the constraint.

Logical corruption can occur when there are triggers defined on the table. For example, if rows are reinserted, should insert triggers be fired or not? You can address these issues only if you understand triggers and their use in your installation.

If indexes and tables are not synchronized, then execute the DUMP_ORPHAN_KEYS procedure to obtain information from the keys that might be useful in rebuilding corrupted data. Then issue the ALTER INDEX...REBUILD ONLINE statement to synchronize the table with its indexes.

If repair involves loss of data, can this data be retrieved?

You can retrieve data from the index when a data block is marked corrupt. The DUMP_ORPHAN_KEYS procedure can help you retrieve this information.

2.3 Make Objects Usable

2.3.1 Corruption Repair: Using the FIX_CORRUPT_BLOCKS and SKIP_CORRUPT_BLOCKS Procedures

先标识corrupt blocks，再skip corrupt blocks

If corruptions involve a loss of data, such as a bad row in a data block, then all such blocks are marked corrupt by the FIX_CORRUPT_BLOCKS procedure. Then you can run the SKIP_CORRUPT_BLOCKS procedure, which skips blocks that are marked as corrupt. When the SKIP_FLAG parameter in the procedure is set, table and index scans skip all blocks marked corrupt. This applies to both media and software corrupt blocks.

2.3.2 Implications When Skipping Corrupt Blocks

Skip corrupt blocks后可能在下面情况查询返回结果不同，解决方法是不要在以下情况进行skip corrupt blocks query

When skipping corrupt blocks, a query can return different results in some situations.

If an index and table are not synchronized, then a SET TRANSACTION READ ONLY transaction can be inconsistent in situations where one query probes only the index, and a subsequent query probes both the index and the table. If the table block is marked corrupt, then the two queries return different results, thereby breaking the rules of a read-only transaction. One way to approach this is not to skip corruptions in a SET TRANSACTION READ ONLY transaction.

A similar issue occurs when selecting rows that are chained. A query of the same row may or may not access the corruption, producing different results.

2.4 Repair Corruptions and Rebuild Lost Data

2.4.1 Recover Data Using the DUMP_ORPHAN_KEYS Procedures

The DUMP_ORPHAN_KEYS procedure reports on index entries that point to rows in corrupt data blocks. All such index entries are inserted into an orphan key table that stores the key and rowid of the corruption.

After the index entry information has been retrieved, you can rebuild the index using the ALTER INDEX...REBUILD ONLINE statement.

2.4.2 Fix Segment Bitmaps Using the SEGMENT_FIX_STATUS Procedure

Use the SEGMENT_FIX_STATUS procedure if free space in segments is being managed by using bitmaps (SEGMENT SPACE MANAGEMENT AUTO).

This procedure recalculates the state of a bitmap entry based on the current contents of the corresponding block. Alternatively, you can specify that a bitmap entry be set to a specific value. Usually the state is recalculated correctly and there is no need to force a setting.

DBMS_REPAIR Examples

3.1 Examples:Building a Repair Table or Orphan Key Table

The following example creates a repair table for the users tablespace.

BEGIN

DBMS_REPAIR.ADMIN_TABLES (

TABLE_NAME => 'REPAIR_TABLE',

TABLE_TYPE => dbms_repair.repair_table,

ACTION => dbms_repair.create_action,

TABLESPACE => 'USERS');

END;

For each repair or orphan key table, a view is also created that eliminates any rows that pertain to objects that no longer exist. The name of the view corresponds to the name of the repair or orphan key table and is prefixed by DBA_ (for example, DBA_REPAIR_TABLE or DBA_ORPHAN_KEY_TABLE).

This example illustrates the creation of an orphan key table for the users tablespace.

BEGIN

DBMS_REPAIR.ADMIN_TABLES (

TABLE_NAME => 'ORPHAN_KEY_TABLE',

TABLE_TYPE => dbms_repair.orphan_table,

ACTION => dbms_repair.create_action,

TABLESPACE => 'USERS');

END;

查看表结构

DESC REPAIR_TABLE

DESC ORPHAN_KEY_TABLE

3.2 Examples:Detecting Corruption

An example illustrates detecting corruption with the CHECK_OBJECT procedure.You can optionally specify a range, partition name, or subpartition name when you want to check a portion of an object.

For each block, the transaction and data layer portions are checked for self consistency. During CHECK_OBJECT, if a block is encountered that has a corrupt buffer cache header, then that block is skipped.

The following is an example of executing the CHECK_OBJECT procedure for the scott.dept table.

SET SERVEROUTPUT ON

DECLARE num_corrupt INT;

BEGIN

num_corrupt := 0;

DBMS_REPAIR.CHECK_OBJECT (

SCHEMA_NAME => 'SCOTT',

OBJECT_NAME => 'DEPT',

REPAIR_TABLE_NAME => 'REPAIR_TABLE',

CORRUPT_COUNT => num_corrupt);

DBMS_OUTPUT.PUT_LINE('number corrupt: ' || TO_CHAR (num_corrupt));

END;

SQL*Plus outputs the following line, indicating one corruption:

number corrupt: 1

Querying the repair table produces information describing the corruption and suggesting a repair action.

SELECT OBJECT_NAME, BLOCK_ID, CORRUPT_TYPE, MARKED_CORRUPT,

CORRUPT_DESCRIPTION, REPAIR_DESCRIPTION

FROM REPAIR_TABLE;

OBJECT_NAME BLOCK_ID CORRUPT_TYPE MARKED_COR CORRUPT_DESCRIPTION REPAIR_DESCRIPTION

------------------------------------------------------------------------------

DEPT 3 1 FALSE kdbchk: row locked by non-existent transaction table=0 slot=0 lockid=32 ktbbhitc=1

mark block software corrupt

3.3 Example: Fixing Corrupt Blocks

此步只是标识出corruption blocks

Use the FIX_CORRUPT_BLOCKS procedure to fix the corrupt blocks in specified objects based on information in the repair table that was generated by the CHECK_OBJECT procedure.

Before changing a block, the block is checked to ensure that the block is still corrupt.

Corrupt blocks are repaired by marking the block software corrupt. When a repair is performed, the associated row in the repair table is updated with a timestamp.

This example fixes the corrupt block in table scott.dept that was reported by the CHECK_OBJECT procedure.

SET SERVEROUTPUT ON

DECLARE num_fix INT;

BEGIN

num_fix := 0;

DBMS_REPAIR.FIX_CORRUPT_BLOCKS (

SCHEMA_NAME => 'SCOTT',

OBJECT_NAME=> 'DEPT',

OBJECT_TYPE => dbms_repair.table_object,

REPAIR_TABLE_NAME => 'REPAIR_TABLE',

FIX_COUNT=> num_fix);

DBMS_OUTPUT.PUT_LINE('num fix: ' || TO_CHAR(num_fix));

END;

SQL*Plus outputs the following line:

num fix: 1

The following query confirms that the repair was done.

SELECT OBJECT_NAME, BLOCK_ID, MARKED_CORRUPT FROM REPAIR_TABLE;

OBJECT_NAME BLOCK_ID MARKED_COR

------------------------------ ---------- ----------

DEPT 3 TRUE

3.4 Example: Finding Index Entries Pointing to Corrupt Data Blocks

An example illustrates finding index entries pointing to corrupt data blocks using the DUMP_ORPHAN_KEYS procedure.

The DUMP_ORPHAN_KEYS procedure reports on index entries that point to rows in corrupt data blocks. For each index entry, a row is inserted into the specified orphan key table. The orphan key table must have been previously created.

This information can be useful for rebuilding lost rows in the table and for diagnostic purposes.

Note: This should be run for every index associated with a table identified in the repair table.

In this example, pk_dept is an index on the scott.dept table. It is scanned to determine if there are any index entries pointing to rows in the corrupt data block.

SET SERVEROUTPUT ON

DECLARE num_orphans INT;

BEGIN

num_orphans := 0;

DBMS_REPAIR.DUMP_ORPHAN_KEYS (

SCHEMA_NAME => 'SCOTT',

OBJECT_NAME => 'PK_DEPT',

OBJECT_TYPE => dbms_repair.index_object,

REPAIR_TABLE_NAME => 'REPAIR_TABLE',

ORPHAN_TABLE_NAME=> 'ORPHAN_KEY_TABLE',

KEY_COUNT => num_orphans);

DBMS_OUTPUT.PUT_LINE('orphan key count: ' || TO_CHAR(num_orphans));

END;

The following output indicates that there are three orphan keys:

orphan key count: 3

Index entries in the orphan key table implies that the index should be rebuilt. This guarantees that a table probe and an index probe return the same result set.

3.5 Example: Skipping Corrupt Blocks

An example illustrates skipping corrupt blocks using the SKIP_CORRUPT_BLOCKS procedure.

The SKIP_CORRUPT_BLOCKS procedure enables or disables the skipping of corrupt blocks during index and table scans of the specified object. When the object is a table, skipping applies to the table and its indexes. When the object is a cluster, it applies to all of the tables in the cluster, and their respective indexes.

The following example enables the skipping of software corrupt blocks for the scott.dept table:

BEGIN

DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (

SCHEMA_NAME => 'SCOTT',

OBJECT_NAME => 'DEPT',

OBJECT_TYPE => dbms_repair.table_object,

FLAGS => dbms_repair.skip_flag);

END;

Querying scott's tables using the DBA_TABLES view shows that SKIP_CORRUPT is enabled for table scott.dept.

SELECT OWNER, TABLE_NAME, SKIP_CORRUPT FROM DBA_TABLES WHERE OWNER = 'SCOTT';

OWNER TABLE_NAME SKIP_COR

------------------------------ ------------------------------ --------

SCOTT ACCOUNT DISABLED

SCOTT BONUS DISABLED

SCOTT DEPT ENABLED

SCOTT DOCINDEX DISABLED

SCOTT EMP DISABLED

SCOTT RECEIPT DISABLED

SCOTT SALGRADE DISABLED

SCOTT SCOTT_EMP DISABLED

SCOTT SYS_IOT_OVER_12255 DISABLED

SCOTT WORK_AREA DISABLED

10 rows selected.

Manual Repair(NOT TEST)

SQL> CREATE TABLE BLOCK_BAK AS

SELECT /*+ ROWID(A)*/*

FROM BLOCK A

WHERE ROWID < 'AAATfAAQAAAAfQAAA'

UNION ALL

SELECT /*+ ROWID(A)*/*

FROM BLOCK A

WHERE ROWID >= 'AAATfAAQAAAAfRAAA'

THE 'LOW_RID' is the lowest rowid insert the corrupt block:

SELECT DBMS_ROWID.ROWID_CREATE(1,<DATA_OBJECT_ID>,<RFN>,<BL>,0) LOW_RID

FROM DUAL;

THE 'HI_RID' is the first rowid after the corrupt block:

SELECT DBMS_ROWID.ROWID_CREATE(1,<DATA_OBJECT_ID>,<RFN>,<BL>+1, 0) HI_RID

FROM DUAL;

其中:RFN表示数据文件编号，BL表示block编号, Data_object_id 就是表的object id

彦祖的小号

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Block Corrupt

Oracle有不同方法来监测及修复data block corruption，常用的修复方法重建对象对于表或索引它们自身包含数据，重建不能恢复数据，可以使用DBMS_REPAIR修复DBMS_REPAIR只是把table及index的corruption blocks标识后忽略，在下次查询或操作时不会对corruption blocks操作，不会对corruption block进行数据恢复，所以一般用于对块间的逻辑坏块进行恢复RMAN的RECOVER BLOCK来恢复Oracle Da..
复制链接

扫一扫

专栏目录