一:故障描述
ebs应用时候报i/o错,查看发现数据库处于mount状态,想open时又报错
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01115: IO error reading block from file 8 (block # 135129)
ORA-01110: data file 8: '/export/zones/zone02/root/dev_data/DEV/db/apps_st/data/system08.dbf'
ORA-27063: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
Additional information: -1
Additional information: 8192
通过rman检查是否有坏块
RMAN> BACKUP VALIDATE DATAFILE 8;
Starting backup at 17-OCT-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=651 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00008 name=/export/zones/zone02/root/dev_data/DEV/db/apps_st/data/system08.dbf
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 10/17/2012 09:57:43
ORA-19501: read error on file "/export/zones/zone02/root/dev_data/DEV/db/apps_st/data/system08.dbf", block number 135040 (block size=8192)
ORA-27063: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
Additional information: -1
Additional information: 1048576
RMAN> blockrecover datafile 8 block 135040;
Starting recover at 17-OCT-12
using channel ORA_DISK_1
starting media recovery
media recovery complete, elapsed time: 00:00:00
Finished recover at 17-OCT-12
RMAN> list backup;
specification does not match any backup in the repository
RMAN>
RMAN> BACKUP VALIDATE DATAFILE 9;
Starting backup at 17-OCT-12
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00009 name=/export/zones/zone02/root/dev_data/DEV/db/apps_st/data/system09.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
9 OK 0 92701 209152 114648244
File Name: /export/zones/zone02/root/dev_data/DEV/db/apps_st/data/system09.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 77493
Index 0 36139
Other 0 2819
Finished backup at 17-OCT-12
RMAN> BACKUP VALIDATE DATAFILE 7;
Starting backup at 17-OCT-12
using channel ORA_DISK_1
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00007 name=/export/zones/zone02/root/dev_data/DEV/db/apps_st/data/system07.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:15
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7 OK 0 34 106112 114647296
File Name: /export/zones/zone02/root/dev_data/DEV/db/apps_st/data/system07.dbf
Block Type Blocks Failing Blocks Processed
---------- -------------- ----------------
Data 0 77742
Index 0 28228
Other 0 108
Finished backup at 17-OCT-12
RMAN>
通过对比,发现7、9号数据文件完好,而8号数据文件有问题,怀疑是8号文件所在的扇区出现了问题,又做了如下验证
-bash-3.2$ cp system08.dbf system08.dbf.new
cp: system08.dbf: I/O error
-bash-3.2$ cp system07.dbf system07.dbf.new
-bash-3.2$
通过cp确实验证了是操作系统扇区的问题,一般情况下,如果出现system.dbf文件损坏
可以将原来报错的文件拷贝到另一目录,名字不变,然后进行恢复。
二:数据库原始备份情况
数据库已经开启了归档模式,但是并没有做数据库的rman备份
SQL> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /export/zones/zone02/root/dev_data/DEV/db/apps_st/archlog_dev
Oldest online log sequence 8
Next log sequence to archive 10
Current log sequence 10
SQL>
后来经过核查,发现保存了一份9月24日的DEV.tar包 ,归档日志文件从9月22日到10月16日都完好
三:数据库恢复步骤
1:备份原来DEV 目录下的归档日志文件
# mkdir archlog-bak
# su – oracle
$ cd /export/zones/zone02/root/dev_data/DEVold/db/apps_st/archlog_dev
$ cp ./* /archlog-bak
2:解压DEV.tar包
解压DEV.tar包前,先将DEV目录更改为DEV.OLD
# cd dev_data
# tar -xvf /EBS_DEV_20120924.tar
解压完成后,数据只能先启动到mount状态;因为tar出来的数据库是一个冷备,数据库三种文件的scn都一致,如果执行一个startup命令,数据库就可以直接open到9月24备份前的那个状态。本次恢复是采用最新的控制文件、利用旧的数据文件备份做不完全恢复。首先等tar解压完毕以后,将备份的归档日志拷贝到新解压的归档日志目录
3:将数据库启动到mount状态,找到控制文件创建语法
SQL> startup mount;
SQL> alter database backup controlfile to trace;
然后切换到trace文件所在的目录下,/dev_data/DEV/db/tech_st/11.2.0/admin/DEV_zone02/diag/rdbms/dev/DEV/trace
执行如下命令找到最新的以trc后缀结尾的文件
bash-3.2$ ls -lt |more
然后用more命令查看相关trc文件: DEV_ora_19797.trc
找到控制文件的创建语法,截取 Set #2. RESETLOGS case 中的语句,粘贴到ue工具中,去掉文字前的所有空格
(./符号前的空格必须去掉,临时文件可以不用创建),然后粘贴到/home/oracle/下的 ct.sh ,并赋予执行权限)
4:关闭数据库,备份原来的控制文件到 /home/oracle目录下,然后删除原来的控制文件,并将db启动到 nomount状态下,重建控制文件
SQL> shutdown immediate;
bash-3.2$ cd /export/zones/zone02/root/dev_data/DEV/db/apps_st/data/
bash-3.2$ cp cnt* /home/oracle/
bash-3.2$ ls /home/oracle/cnt*
/home/oracle/cntrl01.dbf /home/oracle/cntrl02.dbf /home/oracle/cntrl03.dbf
SQL> startup nomount;
SQL> @/home/oracle/ct.sh;
MicrosoftInternetExplorer402DocumentNotSpecified7.8Normal0
5:关闭数据库,拷贝备份的归档日志到新的归档路径下,并启动db到mount状态,执行数据库的恢复
SQL> shutdown immediate;
SQL> startup mount;
SQL> recover database using backup controlfile until to cancel;
恢复到最后,出现了错误:原因是归档日志最多只有116号,而控制文件是最新的,因此它试着去找117号归档日志,但是没找到,就报错。
MicrosoftInternetExplorer402DocumentNotSpecified7.8Normal0
然后继续执行 recover database using backup controlfile 命令,按cancel命令取消,最后以resetlogs的方式打开db,却报错。
会不会是116号归档日志恢复完,会去找之前的在线日志呢?因此又做了 recover database using backup controlfile命令,
贴上了原来备份的在线日志文件的路径,但是同样报错
MicrosoftInternetExplorer402DocumentNotSpecified7.8Normal0
6:然后备份spfile,关闭数据库,加入隐含参数
SQL> create pfile from spfile;
SQL> shutdown immediate;
手动修改initDEV.ora 文件,增加隐含参数 *._allow_resetlogs_corruption=true
SQL> startup mount;
然后再次执行 recover database using backup controlfile until cancel;
再以resetlogs的方式打开db
SQL> alter database open resetlogs;
以下网址是eygle关于该隐含参数的一次实验
http://www.eygle.com/archives/2005/10/oracle_hidden_allow_resetlogs_corruption.html
MicrosoftInternetExplorer402DocumentNotSpecified7.8Normal0
7:以resetlogs方式打开db后,手动删除resetlogs之前的归档日志,并立即做个数据库的全备份
RMAN> crosscheck archivelog all;
以下可以参考eygle大师的一些恢复方法
http://www.eygle.com/internal/How.to.Resolve.Ora-600.2662.error.htm
http://www.eygle.com/archives/2005/12/oracle_diagnostics_howto_deal_2662_error.html
http://www.eygle.com/archives/2010/12/fractured_controlfile_recovery.html
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/24862808/viewspace-747083/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/24862808/viewspace-747083/