Data Recovery Advisor

Data Recovery Advisor is an Oracle Database tool that automatically diagnoses data failures, determines and presents appropriate repair options, and executes repairs at the user's request.

Data Recovery Advisor has both a command-line interface and a GUI. The GUI is available in Oracle Enterprise Manager Cloud Control.

Basic Concepts of Data Recovery Advisor

1. Data Integrity Checks

The health assessment that detects failures is known as a data integrity check. It can be invoked reactively or proactively.

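With reactive invocation, an error raised by an operation on corrupted data automatically triggers a data integrity check on the related data. If the problem can be diagnosed, it is recorded in the ADR (Automatic Diagnostic Repository). Only failures recorded in the ADR can be used by Data Recovery Advisor (DRA) to generate repair advice and repair failures.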

A database operation involving corrupted data results in an error, which automatically invokes a data integrity check that searches the database for failures related to the error.

If failures are diagnosed, then they are recorded in the Automatic Diagnostic Repository (ADR), which is a directory structure stored outside of the database. You can use Data Recovery Advisor to generate repair advice and repair failures only after failures have been detected by the database and stored in ADR.

With proactive invocation, you run the check yourself through the Health Monitor, or check for block corruption with the VALIDATE or BACKUP VALIDATE command.

You can also invoke a data integrity check proactively. You can execute the check through the Health Monitor, which detects and stores failures in the same way as when the checks are invoked reactively.

You can also check for block corruption with the VALIDATE and BACKUP VALIDATE commands, as explained in "Checking for Block Corruptions by Validating the Database".

  2. Failures

A failure is a problem that a data integrity check has detected and that can be diagnosed.

A failure is a persistent data corruption that is detected by a data integrity check.

A failure can manifest itself as observable symptoms such as error messages and alerts, but a failure is different from a symptom because it represents a diagnosed problem. After a problem is diagnosed by the database as a failure, you can obtain information about the failure and potentially repair it with Data Recovery Advisor.

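Because failures are stored in the ADR rather than in the database, they can be accessed even when the database is not open or mounted.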

Because failure information is not stored in the database itself, the database does not need to be open or mounted for you to access it. You can view failures when the database is started in NOMOUNT mode. Thus, the availability of the control file and recovery catalog does not affect the ability to view detected failures, although it may affect the feasibility of some repairs.

Data Recovery Advisor can diagnose failures such as the following:

  1. Components such as data files and control files that are not accessible because they do not exist, do not have the correct access permissions, have been taken offline, and so on
  2. Physical corruptions such as block checksum failures and invalid block header field values
  3. Inconsistencies such as a data file that is older than other database files
  4. I/O failures such as hardware errors, operating system driver failures, and exceeding operating system resource limits (for example, the number of open files)

The Data Recovery Advisor may detect or handle some logical corruptions. In general, corruptions of this type require help from Oracle Support Services.

  3. Supported Database Configurations for Data Recovery Advisor

    1. About Data Recovery Advisor and Oracle Real Application Clusters

Data Recovery Advisor only supports single-instance databases. Oracle Real Application Clusters (Oracle RAC) databases are not supported.

If a data failure occurs that brings down all Oracle RAC instances, then you can mount the database in single-instance mode and use Data Recovery Advisor to detect and repair control file, SYSTEM data file, and data dictionary failures. You can also invoke data recovery checks proactively to test other database components for data failures. This approach does not detect data failures that are local to other cluster instances, for example, an inaccessible data file.

    2. About Data Recovery Advisor and Oracle Data Guard

There are some limitations on Data Recovery Advisor in an Oracle Data Guard environment.

In a Data Guard environment, Data Recovery Advisor cannot do the following:

  1. Use files transferred from a physical standby database to repair failures on a primary database
  2. Diagnose and repair failures on a standby database

However, if the primary database is unavailable, then Data Recovery Advisor may recommend a failover to a standby database. After the failover you can repair the old primary database.

    3. About Data Recovery Advisor and CDBs

Data Recovery Advisor can only be used to diagnose and repair data corruptions in non-CDBs and the root of a multitenant container database (CDB). Data Recovery Advisor is not supported for pluggable databases (PDBs).

Basic Steps of Diagnosing and Repairing Failures

The Data Recovery Advisor workflow begins when you either suspect or discover a failure. You can discover failures in many ways, including error messages, alerts, trace files, and failed data integrity checks.

The three DRA steps must be run as LIST, ADVISE, REPAIR: no step may be skipped and the order may not be changed.

To respond to failures, start an RMAN session and perform all of the following steps in the same session and in the order they are listed:

  1. List failures by running the LIST FAILURE command.
  2. If you suspect that failures exist that have not been automatically diagnosed by the database, then run VALIDATE DATABASE to check for corrupt blocks and missing files.

     If VALIDATE detects a problem, then RMAN triggers execution of a failure assessment. If a failure is detected, then RMAN logs it into the Automatic Diagnostic Repository, where it can be accessed by Data Recovery Advisor.

  3. Determine repair options by running the ADVISE FAILURE command.
  4. Choose a repair option. You can repair the failures manually or run the REPAIR FAILURE command to fix them automatically.
  5. Return to the first step to confirm that all failures were repaired or determine which failures remain.
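
Because the steps have to run in order inside one RMAN session, one way to keep them together is to generate them as a single command file. The following is a minimal Python sketch under that assumption. The function name `build_dra_script` and its parameters are hypothetical, and the code only produces text; it does not connect to a database. It places the optional VALIDATE DATABASE ahead of LIST FAILURE so that anything newly recorded shows up in the listing, which is a choice made for this sketch rather than a requirement stated above.

```python
def build_dra_script(validate_first: bool = False, preview_only: bool = True) -> str:
    """Return the RMAN commands for one Data Recovery Advisor pass, in order.

    Hypothetical helper: it only assembles text for a single RMAN session.
    """
    commands = []
    if validate_first:
        # Optional: look for corrupt blocks and missing files that the
        # database has not diagnosed on its own yet.
        commands.append("VALIDATE DATABASE;")
    commands.append("LIST FAILURE;")      # step 1: list diagnosed failures
    commands.append("ADVISE FAILURE;")    # step 3: determine repair options
    # Step 4: PREVIEW shows the repair script without executing it.
    commands.append("REPAIR FAILURE PREVIEW;" if preview_only else "REPAIR FAILURE;")
    commands.append("LIST FAILURE;")      # step 5: confirm what remains
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(build_dra_script(validate_first=True), end="")
```

The returned text could be saved to a file and run as one RMAN command file, so that every command shares the same session. Since REPAIR FAILURE asks for confirmation by default, the preview form is the safer default for anything unattended.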

  1. Listing Failures

Failures are uniquely identified by failure numbers. These numbers are not consecutive, so gaps between failure numbers have no significance.

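LIST FAILURE only lists failures; it does not detect them. After a failure is repaired manually or automatically, it is removed from the LIST FAILURE output, so you can rerun the command to check whether diagnosed failures have been repaired.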

The LIST FAILURE command does not execute data integrity checks to diagnose new failures; rather, it lists the results of previously executed assessments. Thus, repeatedly executing LIST FAILURE reveals new failures only if the database automatically diagnosed them in response to errors that occurred in between command executions.

If a user fixed failures manually, or if transient failures disappeared, then Data Recovery Advisor removes these failures from the LIST FAILURE output.

If a failure cannot currently be revalidated, LIST FAILURE shows it with status OPEN.

If a failure cannot be revalidated at this moment (for example, because of another failure), LIST FAILURE shows the failure as open.

RMAN> LIST FAILURE;

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
142        HIGH     OPEN      23-APR-13     One or more non-system datafiles are missing
101        HIGH     OPEN      23-APR-13     Datafile 1: '/disk1/oradata/prod/system01.dbf' contains one or more corrupt blocks

LIST FAILURE consolidates related subfailures into a single failure; use the DETAIL option to see the individual subfailures in the group.

For clarity, Data Recovery Advisor groups related failures together. By default, Data Recovery Advisor lists information about the group of failures, although you can specify the DETAIL option to list information about the individual subfailures. A subfailure has the same format as a failure. You can get advice on a subfailure and repair it separately or in combination with any other failure.

The following example lists detailed information about failure 101.

RMAN> LIST FAILURE 101 DETAIL;

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
101        HIGH     OPEN      23-APR-13     Datafile 1: '/disk1/oradata/prod/system01.dbf' contains one or more corrupt blocks
  List of child failures for parent failure ID 101
  Failure ID Priority Status    Time Detected Summary
  ---------- -------- --------- ------------- -------
  104        HIGH     OPEN      23-APR-13     Block 56416 in datafile 1: '/disk1/oradata/prod/system01.dbf' is media corrupt
    Impact: Object BLKTEST owned by SYS might be unavailable
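
If the listing needs to be post-processed (for example, to collect failure IDs for a report), the tabular rows can be picked out with a regular expression. The sketch below is only an illustration: `FailureRow` and `parse_list_failure` are hypothetical names, and it assumes the column layout shown in the sample output above, including a Time Detected value that contains no spaces. It treats parent and child rows alike and ignores header, separator, and Impact lines.

```python
import re
from typing import List, NamedTuple


class FailureRow(NamedTuple):
    failure_id: int
    priority: str
    status: str
    time_detected: str
    summary: str


# One data row: ID, priority, status, a date token without spaces, free text.
# Leading whitespace is allowed so that indented child failures also match.
ROW = re.compile(
    r"^\s*(\d+)\s+(CRITICAL|HIGH|LOW)\s+(OPEN|CLOSED)\s+(\S+)\s+(.+?)\s*$"
)


def parse_list_failure(output: str) -> List[FailureRow]:
    """Extract failure rows from LIST FAILURE text (assumed layout)."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            fid, priority, status, when, summary = match.groups()
            rows.append(FailureRow(int(fid), priority, status, when, summary))
    return rows


SAMPLE = """\
Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
101        HIGH     OPEN      23-APR-13     Datafile 1: '/disk1/oradata/prod/system01.dbf' contains one or more corrupt blocks
  104        HIGH     OPEN      23-APR-13     Block 56416 in datafile 1: '/disk1/oradata/prod/system01.dbf' is media corrupt
    Impact: Object BLKTEST owned by SYS might be unavailable
"""

if __name__ == "__main__":
    for row in parse_list_failure(SAMPLE):
        print(row.failure_id, row.priority, row.status)
```

The parent/child relationship is lost in this flat list; keeping it would require tracking the "List of child failures for parent failure ID" line as well.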

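Failure status is either OPEN or CLOSED: OPEN means the failure has not been repaired, CLOSED means it has.

Failure priority is CRITICAL, HIGH, or LOW. CRITICAL failures make the database unavailable and must be repaired immediately; HIGH failures make part of the database unavailable and should be repaired soon; LOW failures can be ignored for now. DRA itself assigns only the first two levels.

By default, LIST FAILURE shows only OPEN failures with CRITICAL or HIGH priority. To show failures of every priority, use LIST FAILURE ALL.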

Failures with CRITICAL priority require immediate attention because they make the whole database unavailable. Failures with HIGH priority make a database partly unavailable or unrecoverable and usually have to be repaired quickly.

Every failure has a failure priority: CRITICAL, HIGH, or LOW. Data Recovery Advisor only assigns CRITICAL or HIGH priority to diagnosed failures.

A LOW priority indicates that a failure can be ignored until more important failures are fixed.

By default LIST FAILURE displays only failures with CRITICAL and HIGH priority.

LIST FAILURE can filter failures by status or priority:

RMAN> LIST FAILURE ALL;

RMAN> LIST FAILURE LOW;

RMAN> LIST FAILURE CLOSED;

RMAN> LIST FAILURE EXCLUDE FAILURE 234234;
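
To make the selection rules concrete, here is a small Python model of how the filters narrow a set of failures. It is a simplified sketch, not RMAN's implementation: `Failure` and `list_failure` are hypothetical names, and treating ALL and the priority selectors as limited to OPEN failures is this sketch's reading, which should be checked against the RMAN reference.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Optional


@dataclass(frozen=True)
class Failure:
    failure_id: int
    priority: str  # "CRITICAL", "HIGH", or "LOW"
    status: str    # "OPEN" or "CLOSED"


def list_failure(
    failures: Iterable[Failure],
    selector: Optional[str] = None,
    exclude: AbstractSet[int] = frozenset(),
) -> List[Failure]:
    """Simplified model of LIST FAILURE selection (assumed semantics)."""
    if selector is None:
        # Default view: OPEN failures with CRITICAL or HIGH priority.
        keep = lambda f: f.status == "OPEN" and f.priority in ("CRITICAL", "HIGH")
    elif selector == "ALL":
        keep = lambda f: f.status == "OPEN"
    elif selector == "CLOSED":
        keep = lambda f: f.status == "CLOSED"
    elif selector in ("CRITICAL", "HIGH", "LOW"):
        keep = lambda f: f.status == "OPEN" and f.priority == selector
    else:
        raise ValueError(f"unknown selector: {selector}")
    # EXCLUDE FAILURE removes the named failure IDs from whatever was selected.
    return [f for f in failures if keep(f) and f.failure_id not in exclude]


if __name__ == "__main__":
    known = [
        Failure(142, "HIGH", "OPEN"),
        Failure(101, "HIGH", "OPEN"),
        Failure(7, "LOW", "OPEN"),
    ]
    print([f.failure_id for f in list_failure(known)])
```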

  2. Checking for Block Corruptions

VALIDATE and BACKUP VALIDATE can check for both physical and logical block corruption.

The VALIDATE and BACKUP VALIDATE commands can check data files and control files for physical and logical corruption.

If RMAN discovers block corruptions, then it logs them into the Automatic Diagnostic Repository and creates one or more failures. You can then use Data Recovery Advisor to list information about the failures and repair them.

RMAN> VALIDATE CHECK LOGICAL SKIP INACCESSIBLE DATABASE;

  3. Determining Repair Options

Use the ADVISE FAILURE command to display repair options after running LIST FAILURE in an RMAN session. This command prints a summary of the failures and implicitly closes all open failures that are repaired.

The repair options listed by ADVISE FAILURE are either manual or automated; manual options are further divided into mandatory and optional.

Where appropriate, the ADVISE FAILURE command presents a list of manual and automated repair options. Manual options, which are categorized as either mandatory or optional, appear first. In some cases, an optional manual fix can avoid more extreme actions such as restoring and recovering data files. As a rule, use the repair technique that has the least effect on the database and the least possibility for error.

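Before generating an automated repair recommendation, DRA runs feasibility checks to make sure the repair can actually be executed.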

Data Recovery Advisor performs feasibility checks before recommending an automated repair. For example, Data Recovery Advisor checks that all backups and archived redo logs needed for media recovery are present and consistent. Data Recovery Advisor may need specific backups and archived redo logs. If the files needed for recovery are not available, then recovery is not possible.

For performance reasons, Data Recovery Advisor does not exhaustively check every byte in every file. Thus, a feasible repair may still fail because of a corrupted backup or archived redo log file.
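
The idea of a presence-only feasibility check can be illustrated with a short Python sketch. This is not how Data Recovery Advisor implements its checks. It is a simplified model that assumes a single redo thread with integer log sequence numbers, and the function names are hypothetical. Like the real check, it confirms only that the needed files are there, not that their contents are intact.

```python
from typing import Iterable, List


def missing_log_sequences(
    first_needed: int, last_needed: int, available: Iterable[int]
) -> List[int]:
    """Return the needed archived-log sequence numbers that are not available."""
    have = set(available)
    return [seq for seq in range(first_needed, last_needed + 1) if seq not in have]


def repair_is_feasible(
    backup_present: bool,
    first_needed: int,
    last_needed: int,
    available: Iterable[int],
) -> bool:
    """A repair is treated as feasible only if a backup exists and every
    archived log between it and the recovery target is present."""
    if not backup_present:
        return False
    return not missing_log_sequences(first_needed, last_needed, available)


if __name__ == "__main__":
    # Sequence 12 is missing, so media recovery across 10..14 cannot complete.
    print(missing_log_sequences(10, 14, [10, 11, 13, 14]))
    print(repair_is_feasible(True, 10, 14, [10, 11, 13, 14]))
```

A repair that passes this kind of check can still fail later if one of the files that is present turns out to be corrupted.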

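Note that ADVISE FAILURE sometimes produces no repair script while the database is open (apparently because Oracle has no suitable repair to offer in that state). In that case, bring the database to MOUNT mode and run ADVISE FAILURE again.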

Determining Repair Options for All Failures

RMAN> ADVISE FAILURE;

Determining Repair Options for a Subset of Failures

RMAN> ADVISE FAILURE 101;

RMAN> ADVISE FAILURE LOW;

  4. Repairing Failures

Try the manual repairs first; if failures remain, use the automated repairs.

If ADVISE FAILURE suggests manual repairs, then try these first. If manual repairs are not possible, or if they do not repair all failures, then you can use REPAIR FAILURE to automatically fix failures suggested in the most recent ADVISE FAILURE command in your current RMAN session.

If you do not specify a particular repair option, then RMAN uses the first repair option of the most recent ADVISE FAILURE command in the current session. By default the repair script is displayed to standard output. You can use the SPOOL command to write the script to an editable file.

By default, REPAIR FAILURE prompts for confirmation before it begins executing. You can suppress the confirmation prompt by specifying the NOPROMPT option.

After it starts executing, the command indicates the current phase of repair. Depending on the circumstances, RMAN may prompt for a response. After executing a repair, RMAN reevaluates all existing failures on the chance that they may have been fixed during this repair.

While repairing a failure, wherever possible, RMAN takes a file offline, restores and recovers it, and then brings it back online again. You can repair failures for a selected database, tablespace, or data file.

Before performing a repair, it is typically advisable to preview it by specifying the PREVIEW option. With PREVIEW, RMAN does not make any repairs; instead, it generates a script with all repair actions and comments.

RMAN> REPAIR FAILURE PREVIEW;

RMAN> REPAIR FAILURE;
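
When the command is assembled by a script, the options described above can be combined in a small builder. The following Python sketch uses the hypothetical name `repair_failure_command` and only returns a command string. It assumes that a specific repair option is selected with a USING ADVISE OPTION clause, which should be confirmed against the RMAN reference for your release. Refusing PREVIEW together with NOPROMPT is a conservative choice in this sketch, not a documented restriction.

```python
from typing import Optional


def repair_failure_command(
    advise_option: Optional[int] = None,
    preview: bool = False,
    noprompt: bool = False,
) -> str:
    """Build a REPAIR FAILURE command string (hypothetical helper)."""
    if preview and noprompt:
        # Sketch-level rule: a preview executes nothing, so there is
        # no confirmation prompt to suppress.
        raise ValueError("choose either preview or noprompt, not both")
    if advise_option is not None and advise_option < 1:
        raise ValueError("advise_option must be a positive option number")

    parts = ["REPAIR FAILURE"]
    if advise_option is not None:
        # Assumed clause for picking one option from the latest ADVISE FAILURE.
        parts.append(f"USING ADVISE OPTION {advise_option}")
    if preview:
        parts.append("PREVIEW")
    if noprompt:
        parts.append("NOPROMPT")
    return " ".join(parts) + ";"


if __name__ == "__main__":
    print(repair_failure_command(preview=True))
    print(repair_failure_command(noprompt=True))
```

Whatever string is produced still has to run in the same RMAN session as the preceding ADVISE FAILURE, because the repair options belong to that session.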

  5. Changing Failure Status and Priority

Use CHANGE FAILURE to change the status or priority of a failure.

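CHANGE FAILURE cannot change a CRITICAL failure to another priority. Because LIST FAILURE shows only CRITICAL and HIGH priority failures by default, CHANGE is mainly used to set failures to LOW so that they are not displayed.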

If a failure was assigned a HIGH priority, but the failure has little impact on database availability and recoverability, then you can downgrade the priority to LOW.

You can use the CHANGE command to change the priority of LOW and HIGH failures, but you cannot change the priority of CRITICAL failures. The main reason for changing a priority to LOW is to reduce the LIST FAILURE output.

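Use CHANGE FAILURE to change the status only when a failure has been repaired but was not closed automatically.

If you manually close a failure that has not actually been resolved, DRA simply re-creates it at the next check.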

You can use CHANGE FAILURE to change the status of an open failure to CLOSED if you have fixed it manually. However, it makes sense to use CHANGE FAILURE ... CLOSED only if for some reason the failure was not closed automatically. If a failure still exists when you use CHANGE to close it manually, then Data Recovery Advisor re-creates it with a different failure ID when the appropriate data integrity check is executed.

Typically, you specify the failures to change by failure number. You can also change failures in bulk by specifying ALL, CRITICAL, HIGH, or LOW. You can change a failure to CLOSED or to PRIORITY HIGH or PRIORITY LOW.

RMAN> CHANGE FAILURE 101 PRIORITY LOW;

RMAN> CHANGE FAILURE LOW PRIORITY HIGH;

RMAN> CHANGE FAILURE EXCLUDE FAILURE 101 PRIORITY LOW;

RMAN> CHANGE FAILURE 101 CLOSED;
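
The rules for CHANGE FAILURE can be summarized in a few lines of Python. This is a minimal model for illustration, with hypothetical names (`Failure`, `change_priority`, `close_failure`); it encodes only what is described above: priorities can be switched between HIGH and LOW, CRITICAL failures keep their priority, and an open failure can be marked CLOSED.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Failure:
    failure_id: int
    priority: str          # "CRITICAL", "HIGH", or "LOW"
    status: str = "OPEN"   # "OPEN" or "CLOSED"


def change_priority(failure: Failure, new_priority: str) -> Failure:
    """Model of CHANGE FAILURE ... PRIORITY HIGH|LOW."""
    if new_priority not in ("HIGH", "LOW"):
        raise ValueError("priority can only be changed to HIGH or LOW")
    if failure.priority == "CRITICAL":
        raise ValueError("the priority of a CRITICAL failure cannot be changed")
    return replace(failure, priority=new_priority)


def close_failure(failure: Failure) -> Failure:
    """Model of CHANGE FAILURE ... CLOSED.

    Closing does not fix anything: if the problem still exists, the next
    data integrity check records it again under a new failure ID.
    """
    return replace(failure, status="CLOSED")


if __name__ == "__main__":
    downgraded = change_priority(Failure(101, "HIGH"), "LOW")
    print(downgraded.priority, close_failure(downgraded).status)
```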
