[OCP study & practice]2.Deal with database corruption

1.块损坏故障现象

ORA-01578“ORACLE data block corrupted (file # %s, block # %s)”

---发现损坏的数据块时生成此信息

---始终返回绝对文件号和绝对块号

---返回到发布查询的会话(该查询在发现损坏时执行)

--- alert.log 文件中显示

2.块损坏的原因

对块进行读写时会进行一致性检查,检查内容包括:1.块版本、2.高速缓存(Cache)中数据块地址(DataBlock Address)及块缓冲区(Block Buffer)中数据块地址比较、3.块校验和(Checksum),如果检查不通过,或者块不被oracle识别,说明发现了块损坏,块损坏的原因包括:

---介质损坏(Media corrupt)

---软件(逻辑)损坏(Logical/Software corrupt)

3.如何处理块损坏

---检查预警日志和操作系统日志文件、解决系统、硬件层面问题。

---使用可用的诊断工具,找出损坏的类型。

---多次运行检查功能,确定错误是否持续存在。(使用Analyse /Dbverify确认)

---根据需要,从损坏的对象中恢复数据。

4.处理块损坏常用工具(重点)

11751028_201112082239251.jpg 

 

 

1.建立实验环境、检查相关参数

SQL>create tablespace blktesttbs datafile 'F:\Oracle\product\10.2.0\oradata\oratest\blktesttbs.dbf'  size 20M;

SQL>alter user jesse default tablespace blktesttbs quota unlimited on blktesttbs;

SQL>select file_name,file_id,tablespace_name from dba_data_files;

SQL>create table t as select * from dba_objects;

SQL>select count(*) from jesse.t;

SQL>sho parameter db_block_checking;

SQL>sho parameter db_block_checksum;

2.确定实验数据

SQL>select header_file,header_block from dba_segments where segment_name='T' and wner='JESSE' ;

11751028_201112082240291.jpg

 

SQL>select rowid,

dbms_rowid.rowid_object(rowid) object_id,

dbms_rowid.rowid_relative_fno(rowid)  file_id,

dbms_rowid.rowid_block_number(rowid)  block_id ,

dbms_rowid.rowid_row_number(rowid)   num ,

rowidtochar(rowid) from jesse.t where rownum<100;

 

查询结果如下(关于rowid解释请参见Tips

11751028_201112082241381.jpg

 

可以看到6号文件,14block储存了86条记录,尝试破坏此数据块。

3.模拟数据块损坏

3.1 dump出未损坏的数据块备用

alter system dump datafile 6 block 14;

3.2

关闭数据库,并使用UltraEdit打开6号数据文件,即'F:\Oracle\product\10.2.0\oradata\oratest\blktesttbs.dbf'

Ctrl+G 查找指定数据块(block num * block size),本例为14*8192=114688

改变前

11751028_201112082242301.jpg

改变后

11751028_201112082243031.jpg

重新启动数据库验证

 11751028_201112082244021.jpg

 

4.开始坏数据块修复

4.1使用DBVERYFY检验坏块

C:\>dbv file=F:\Oracle\product\10.2.0\oradata\oratest\blktesttbs.dbf \ blocksize

=8192

 

DBVERIFY: Release 10.2.0.1.0 - Production on 星期三 12 7 17:14:40 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

DBVERIFY - 开始验证: FILE = F:\Oracle\product\10.2.0\oradata\oratest\blktesttbs.

dbf

5 标记为损坏

Corrupt block relative dba: 0x01800005 (file 6, block 5)

Bad header found during dbv:

Data in bad block:

 type: 14 format: 1 rdba: 0x01800007

 last change scn: 0x0000.001b6a17 seq: 0x1 flg: 0x04

 spare1: 0x2 spare2: 0x30 spare3: 0x0

 consistency value in tail: 0x6a171e01

 check value in block header: 0xc180

 computed block checksum: 0x3310

 

14 流入 - 很可能是介质损坏

Corrupt block relative dba: 0x0180000e (file 6, block 14)

Fractured block found during dbv:

Data in bad block:

 type: 22 format: 2 rdba: 0x0180000e

 last change scn: 0x0000.001b8c11 seq: 0x1 flg: 0x04

 spare1: 0x0 spare2: 0x0 spare3: 0x0

 consistency value in tail: 0x8c110601

 check value in block header: 0xdbc1

 computed block checksum: 0x10

 

DBVERIFY - 验证完成

检查的页总数: 2560

处理的页总数 (数据): 1439

失败的页总数 (数据): 0

处理的页总数 (索引): 0

失败的页总数 (索引): 0

处理的页总数 (其它): 39

处理的总页数 ()  : 0

失败的总页数 ()  : 0

空的页总数: 1080

标记为损坏的总页数: 2

流入的页总数: 1

最高块 SCN            : 1805388 (0.1805388)

 

4.2 analyze分析表

11751028_201112082244321.jpg 

4.3 使用rman

RMAN>backup validate database;(检查数据库,结果写入V$DATABASE_BLOCK_CORRUPTION)

SQL>select * from V$DATABASE_BLOCK_CORRUPTION;

11751028_201112082245441.jpg

 

 

5.修复错误

5.1 使用flashback

5.2 使用DBMS_REPAIR,包括如下过程

---CHECK_OBJECT

---FIX_CORRUPT_BLOCKS(常用,尝试修复坏块)

---DUMP_ORPHAN_KEYS

---REBUILD_FREELISTS

---SEGMENT_FIX_STATUS

---SKIP_CORRUPT_BLOCKS(常用,如尝试修复坏块失败,则跳过检查后导出并重建表) 

---ADMIN_TABLES

5.3 使用RMAN BMR(Block Media Recover)

5.4 使用BBED(Oracle Block Browser and EDit Tool)

---参考http://www.itpub.net/thread-1475582-1-1.html

 

6.OCP题目解析

11. You work as a database administrator at Certkiller .com. Your dat

abase is open and running in ARCHIVELOG mode. You take RMAN full bac

kups every Sunday night. On Monday morning, while querying the user1.

employees table, you receive the following error message:

01578: ORACLE data block corrupted (file # 5, block # 51)

ORA-01110: data file 5: '/u01/app/oracle/oradata/orcl/example01.dbf'

You need to rectify the corruption while ensuring the following:

The data file should remain online. The mean time to recover (MTTR) s

hould be minimal. You are not using a backup control file and all the

archived logs are accessible. Which option would you choose?

A. flash back the corrupted blocks

B. use the DBMS_REPAIR package

C. use the RMAN TSPITR command

D. use the RMAN BLOCKRECOVER command

E. use the RESTORE DATABASE and RECOVER DATABASE commands

F. investigate the time at which the corruption occurred and perf

orm a point-in-time recovery

答案:D

周日有一份最新的full backup,要求MTTR/DATAFILE ONLINE,所以rman BMR

 

53. There was media failure and you need to check the data files for

any block corruption. Which option would you use to create a report o

n any corruptions found within the database?

A. the DBNEWID utility

B. the DBVERIFY utility

C. the ANALYZE command

D. the RMAN REPORT command

E. the RMAN CROSSCHECK command

F. the CHECK_OBJECT procedure of the DBMS_REPAIR package

答案:B

DBVERIFY:是一种外部命令行实用程序,可以对脱机或联机的数据库执行

物理数据结构完整性检查。可以对备份文件与联机文件(或文件片段)运行此实

用程序。只能检查数据文件;不能检查重做日志文件

ANALYZE:使用 ANALYZE 命令可以验证表或表分区的结构,以及索引或索引

分区的结构。要分析的对象必须位于本地计算机,并且必须是在您自己的方案中,

或者必须拥有ANALYZE ANY 系统权限。CASCADE 选项可以验证对象,包括该对象

的所有相关对象。不将块标记为软损坏;只报告软损坏情况

RMAN CROSSCHECK:当手工删除了归档日志以后,Rman 备份会检测到日志缺

失,从而无法进一步继续执行。所以此时需要手工执行crosscheck 过程,之后

Rman 备份可以恢复正常。

 

85. A data file become corrupted in your database due to bad sectors

on the disk. Because of corruption, you lost all the important tables

in that data file. Which method would you use for recovery?

A. Flash back all the tables in the data file, one by one.

B. Restore the data file to a new location and perform. a media recove

ry.

C.Flash back the database, there is no need to restore the data file

D. Restore the data file from the most recent backup and flash the da

tabase.

Answer: B

因为磁盘坏道,数据文件损坏,导致许多重要的表丢失,你可以先restore

数据文件到一个新的磁盘位置,然后再recovery

Flashback Database 不能解决Media Failure,这种错误RMAN 恢复仍是唯一选择

 

88. You execute the following RMAN command in the order shown below:

BACKUP VALIDATE DATABASE;

BLOCKRECOVER CORRUPTION LIST;

What will these commands do?

A. Create a backup of the database and recover all corrupted blocks f

ound in the backup.

B. Run a backup validation and list all the logically corrupt blocks

as well as physically corrupt blocks in the database.

C. Run a backup validation to populate V$COPY_CORRUPTION view, and th

en list any corrupt blocks recorded in the view.

D. Run a backup validation to populate V$DATABASE_BLOCK_CORRUPTION vi

ew, and then repair any corrupt blocks recorded in the view.

E. Run a backup validation, repair any corrupt blocks found during th

e validation process, and then update V$DATABASE_BLOCK_CORRUPTION vie

w to indicate which corrupt blocks have been repaired.

Answer: D

backup validate database;

--检查所有数据文件是否存在坏块,并不执行实际备份, 这个时候,访问

v$database_block_corruption 可以看到详细的坏块的信息.

执行BLOCKRECOVER CORRUPTION LIST,会自动按V$DATABASE_BLOCK_CORRUPTION 进行修

复。V$DATABASE_BLOCK_CORRUPTION 视图显示当前损坏的数据库块的列表。

通过使用 CORRUPTION LIST 子句,可以恢复 V$DATABASE_BLOCK_CORRUPTION 中列出的

块。

BACKUP VALIDATE DATABASE

BLOCKRECOVER CORRUPTION LIST RESTORE UNTIL TIME 'sysdate 10';

因为损坏块数量超过可接受的数量,所以要使用BACKUP VALIDATE DATABASE 这个命令

备份,在备份的同时会在备份操作期间填写V$DATABASE_BLOCK_CORRUPTION,然后再使用

CORRUPTION LIST 子句,可以恢复 V$DATABASE_BLOCK_CORRUPTION 中列出的块。

 

96. You need to check the EMP_EAST partition in the EMPLOYEES table f

or physical corruptions. You also need to verify that the rows belong

to the correct partition. Which option could you use?

A. LogMiner

B. The DBNEWID utility

C. The DBVERIFY utility

D. The ANALYZE command

E. The RMAN REPORT command

F. The RMAN CROSSCHECK command.

G. The RMAN BLOCKRECOVER command.

答案:D

 

120. What are the two advantages of RMAN Block Media Recovery (BMR) o

ver file-level recovery? (Choose two)

A. BMR lowers the mean time to recover (MTTR).

B. BMR supports point-in-time recovery of individual data blocks.

C. BMR enables you to use incremental backups for block recovery.

D. BMR enables recovery even when the database is not mounted or open

E. BMR enables you to use proxy backups to perform. block media recove

ry.

F. BMR enables increased availability of data during recovery because

the data file requires a recovery can remain online.

Answer: A, F

Block oracle 数据库最小恢复单位。鉴于此,BMR 可以大大缩短恢复的时

间,另外,在进行BMR 时不需要停掉数据库,可以直接在数据库open 状态对数

据文件进行恢复,只是需要恢复的块被锁定无法访问,可以说对应用的影响非常

使用RMAN Block Media Recovery (BMR)的好处

a、减少MTTR 的时间

b、由于正在恢复的数据文件继续保持联机状态,所以可以提高数据在介质恢复

期间的可用性。

对于选项c 的结论:

You must have a full backup of the file containing the corrupt blocks:

block media recovery cannot use incremental backups.

Level 0,或者在无Level 0 可用的备份下做的Level 1 的增量备份都是一个完整的包含

了损坏的块的备份,所以应该也是可以用来执行Block Media Recovery .只是说当使用过

Level 0,或者在无Level 0 可用的备份下做的Level 1 的增量备份后不会使用更高Level

的增量备份进行恢复,而只会采用应用Archived log files 的方式进行恢复.

 

169. The EMPLOYEES table is stored in the USERS tablespace. You need

to check if the EMPLOYEES table is affected by the block corruption f

ound in the USERS tablespace. Which option would you use?

A. the RMAN LIST command

B. the DBNEWID utility

C. the RMAN REPORT command

D. the RMAN BLOCKRECOVER command

E. the RMAN CROSSCHECK command

F. the ANALYZE command

答案:F

 

5. You are performing a block media recovery on the tools01.dbf data file
in the SALES database using RMAN. Which two statements are correct in this
scenario? (Choose two.)
A. You must ensure that the SALES database is mounted or open.
B. You must restore a backup control file to perform. a block media
recovery.
C. You must take the tools01.dbf data file offline before you start a block
media recovery.
D. You must put the database in NOARCHIVELOG mode to perform. a block media
recovery.
E. You can perform. only a complete media recovery of individual blocks,
point-in-time recovery of individual data blocks is not supported.

答案:A E

数据库必须在mounted or open 状态下才可以使用RMAN BLOCKRECOVER
RMAN BLOCKRECOVER 始终执行完全恢复。使用BLOCKRECOVER 命令时,不能执行时间点恢复
point-in-time recovery=TSPITR

 

参考资料

1.       Oracle Database 10g-Administration Workshop II Chapter 7

2.       http://www.laoxiong.net/how_to_mark_corruption_block_and_recovery.html (老熊博客)

3.       http://www.51action.cn/?p=182 oracle中的rowid

4.       http://blog.csdn.net/tianlesoftware/article/category/714767 tianlesoftware OCP题解)

 

fj.png0.jpg

fj.png1.jpg

fj.png2.jpg

fj.png3.jpg

fj.png4.jpg

fj.png5.jpg

fj.png6.jpg

fj.png7.jpg

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/11751028/viewspace-712994/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/11751028/viewspace-712994/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值