[Notes] Filesystem corruption caused partial file loss and brought Oracle down: incident record

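Before I even reached the office, the phone rang: the database was down. The rest of the day went into handling the outage, and I never got to eat lunch.

It was evening before things were sorted out, which once again proves the first law of the DBA: backups come before everything. No backup, and you are done for!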

---------------------------------------------------------------------------------------------------------------------------------------------------------------

1. Initial alert messages
ALTER DATABASE RECOVER  datafile 2 
Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 2 Seq 5216 Reading mem 0
  Mem# 0: /orashare/oradata/xxxx/redo02.log
  Mem# 1: /orashare/oradata/xxxx/redo12.rdo
Errors in file /opt/oracle/diag/rdbms/xxxx/xxxx/trace/xxxx_ora_26462.trc  (incident=78398):
ORA-00600: internal error code, arguments: [3020], [2], [9231], [8397839], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 9231, file offset is 75620352 bytes)
ORA-10564: tablespace SYSAUX
ORA-01110: data file 2: '/orashare/oradata/xxxx/sysaux01.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 6213
Incident details in: /opt/oracle/diag/rdbms/xxxx/xxxx/incident/incdir_78398/xxxx_ora_26462_i78398.trc
Media Recovery failed with error 600
ORA-283 signalled during: ALTER DATABASE RECOVER  datafile 2  ...

The database could only be mounted, not opened.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01172: recovery of thread 1 stuck at block 9231 of file 2
ORA-01151: use media recovery to recover block, restore backup if needed
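
A side note on reading ORA-10567: the message gives both a block number (9231) and a byte offset (75620352), and the two agree if the datafile uses 8 KB blocks. The post never states db_block_size, so the 8192 default below is an assumption inferred from those two numbers. A minimal sketch of the arithmetic:

```shell
# Byte offset of an Oracle block inside its datafile: block number x block size.
# The 8192-byte default is an assumption inferred from the ORA-10567 line above;
# check db_block_size on the real system.
block_offset() {
    block=$1
    block_size=${2:-8192}
    echo $(( block * block_size ))
}
```

`block_offset 9231` prints 75620352, matching the alert. With the same assumption, something like `dd if=<datafile> bs=8192 skip=9231 count=1` would copy that single block out for inspection without touching the file.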

 

2. Troubleshooting approach and record
STEP1: The alert points to a corrupt block. Fix attempted: block media recovery.
RMAN> blockrecover datafile 2 block 9231;

Starting recover at 25-APR-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=233 device type=DISK

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished recover at 25-APR-13

The block recovery completed, but opening the database still failed.
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01172: recovery of thread 1 stuck at block 9231 of file 2
ORA-01151: use media recovery to recover block, restore backup if needed

 

STEP2: Next fix attempted: recover the whole of datafile 2.

RMAN> recover datafile 2;

Starting recover at 25-APR-13
using channel ORA_DISK_1

starting media recovery
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 04/25/xxxx 10:19:08
ORA-00283: recovery session canceled due to errors
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed
 datafile 2
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [3020], [2], [9231], [8397839], [], [], [], [], [], [], [], []
ORA-10567: Redo is inconsistent with data block (file# 2, block# 9231, file offset is 75620352 bytes)
ORA-10564: tablespace SYSAUX
ORA-01110: data file 2: '/orashare/oradata/xxxx/sysaux01.dbf'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 6213

STEP3: Restore datafile 2 first, then recover it.

RMAN> restore datafile 2;

Starting restore at 25-APR-13
using channel ORA_DISK_1

channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00002 to /orashare/oradata/xxxx/sysaux01.dbf
channel ORA_DISK_1: reading from backup piece /orashare/rman/dbf/full__xxxx_516_1_g4o7jf9e.bak
channel ORA_DISK_1: piece handle=/orashare/rman/dbf/full__xxxx_516_1_g4o7jf9e.bak tag=TAGxxxx0421T002414
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:25
Finished restore at 25-APR-13

RMAN> recover datafile 2;

Starting recover at 25-APR-13
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 5189 is already on disk as file /orashare/arch/1_5189_729236380.dbf
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 04/25/xxxx 10:21:19
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 5188 and starting SCN of 12048143615401 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 5187 and starting SCN of 12048143582799 found to restore

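STEP4: Still failing, this time complaining about missing archived logs. A look at the archived log directory confirmed that archived logs really were missing.
-rw-r----- 1 oracle oinstall 55921664 xxxx-04-22 19:52 1_5190_729236380.dbf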
-rw-r----- 1 oracle oinstall 63299072 xxxx-04-22 00:05 1_5189_729236380.dbf
-rw-r----- 1 oracle oinstall 56599040 2011-03-17 22:00 1_159_729236380.dbf
-rw-r----- 1 oracle oinstall 64499200 2011-03-17 02:21 1_158_729236380.dbf
-rw-r----- 1 oracle oinstall 56789504 2011-03-16 20:00 1_157_729236380.dbf
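
Spotting the hole by eye in a long listing is error-prone. The following is a minimal sketch that lists the gaps in archived log sequence numbers for one thread. It assumes the file names follow the 1_<seq>_<resetlogs id>.dbf pattern seen in the listing above; the function name is made up for illustration.

```shell
# Print the ranges of sequence numbers that are absent from an archive directory.
# Assumes names like 1_5189_729236380.dbf (thread_sequence_resetlogsid.dbf).
find_archivelog_gaps() {
    dir=$1
    thread=${2:-1}
    ls "$dir" | awk -F_ -v t="$thread" '
        # keep only files of the requested thread with a numeric sequence field
        $1 == t && $2 ~ /^[0-9]+$/ { print $2 }
    ' | sort -n | awk '
        # a jump of more than one between neighbours is a gap
        NR > 1 && $1 > prev + 1 { printf "%d-%d\n", prev + 1, $1 - 1 }
        { prev = $1 }
    '
}
```

For the five files listed above, `find_archivelog_gaps /orashare/arch` would report the single range 160-5188. A range such as 7-7 means exactly one log is missing.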


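STEP5: The situation was urgent and production had to come back quickly, so I decided to do an incomplete recovery.
The RMAN backups showed a complete full database backup from the 21st, so the plan was to recover the database to the 21st.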

run {
allocate channel d1 device type disk;
allocate channel d2 device type disk;
allocate channel d3 device type disk;
allocate channel d4 device type disk;
set until time="to_date('xxxx0421 00:30:00','yyyymmdd hh24:mi:ss')";
restore database;
recover database;
release channel d1;
release channel d2;
release channel d3;
release channel d4;
}

============================== Error output: start =============================
RMAN>  run {
2> allocate channel d1 device type disk;
3> allocate channel d2 device type disk;
4> allocate channel d3 device type disk;
5> allocate channel d4 device type disk;
6> set until time="to_date('xxxx0421 00:30:00','yyyymmdd hh24:mi:ss')";
7> restore database;
8> recover database;
9> release channel d1;
10> release channel d2;
11> release channel d3;
12> release channel d4;
13> }

using target database control file instead of recovery catalog
allocated channel: d1
channel d1: SID=105 device type=DISK

allocated channel: d2
channel d2: SID=109 device type=DISK

allocated channel: d3
channel d3: SID=113 device type=DISK

allocated channel: d4
channel d4: SID=117 device type=DISK

executing command: SET until clause

Starting restore at 25-APR-13

creating datafile file number=5 name=/orashare/oradata/xxxx/xxxx_dat.dbf
skipping datafile 2; already restored to file /orashare/oradata/xxxx/sysaux01.dbf
channel d1: starting datafile backup set restore
channel d1: specifying datafile(s) to restore from backup set
channel d1: restoring datafile 00003 to /orashare/oradata/xxxx/undotbs01.dbf
channel d1: restoring datafile 00004 to /orashare/oradata/xxxx/users01.dbf
channel d1: restoring datafile 00006 to /orashare/oradata/xxxx/xxxx_idx.dbf
channel d1: reading from backup piece /orashare/rman/dbf/full__xxxx_517_1_g5o7jf9e.bak
channel d2: starting datafile backup set restore
channel d2: specifying datafile(s) to restore from backup set
channel d2: restoring datafile 00001 to /orashare/oradata/xxxx/system01.dbf
channel d2: reading from backup piece /orashare/rman/dbf/full__xxxx_516_1_g4o7jf9e.bak
channel d2: piece handle=/orashare/rman/dbf/full__xxxx_516_1_g4o7jf9e.bak tag=TAGxxxx0421T002414
channel d2: restored backup piece 1
channel d2: restore complete, elapsed time: 00:00:35
channel d1: piece handle=/orashare/rman/dbf/full__xxxx_517_1_g5o7jf9e.bak tag=TAGxxxx0421T002414
channel d1: restored backup piece 1
channel d1: restore complete, elapsed time: 00:03:05
Finished restore at 25-APR-13

Starting recover at 25-APR-13

starting media recovery

released channel: d1
released channel: d2
released channel: d3
released channel: d4
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 04/25/xxxx 10:55:04
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of archived log for thread 1 with sequence 5187 and starting SCN of 12048143582799 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3801 and starting SCN of 12048065258900 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3800 and starting SCN of 12048065251078 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3799 and starting SCN of 12048065182200 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3798 and starting SCN of 12048065112938 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3797 and starting SCN of 12048065112929 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3796 and starting SCN of 12048065107549 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3795 and starting SCN of 12048065043386 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3794 and starting SCN of 12048065009336 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3793 and starting SCN of 12048065008704 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3792 and starting SCN of 12048065008268 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3791 and starting SCN of 12048064994959 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3790 and starting SCN of 12048064994950 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3789 and starting SCN of 12048064964288 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3788 and starting SCN of 12048064894734 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3787 and starting SCN of 12048064824419 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3786 and starting SCN of 12048064755396 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3785 and starting SCN of 12048064755387 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3784 and starting SCN of 12048064718440 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3783 and starting SCN of 12048064662704 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3782 and starting SCN of 12048064642409 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3781 and starting SCN of 12048064642392 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3780 and starting SCN of 12048064635162 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3779 and starting SCN of 12048064590496 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3778 and starting SCN of 12048064526766 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3777 and starting SCN of 12048064458053 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3776 and starting SCN of 12048064387801 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3775 and starting SCN of 12048064387792 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3774 and starting SCN of 12048064362441 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3773 and starting SCN of 12048064298524 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3772 and starting SCN of 12048064297967 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3771 and starting SCN of 12048064297030 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3770 and starting SCN of 12048064285555 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3769 and starting SCN of 12048064285544 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3768 and starting SCN of 12048064268533 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3767 and starting SCN of 12048064222461 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3766 and starting SCN of 12048064153047 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3765 and starting SCN of 12048064082797 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3764 and starting SCN of 12048064082788 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3763 and starting SCN of 12048064049255 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3762 and starting SCN of 12048063989114 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3761 and starting SCN of 12048063978847 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3760 and starting SCN of 12048063978838 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3759 and starting SCN of 12048063965920 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3758 and starting SCN of 12048063895135 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3757 and starting SCN of 12048063826614 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3756 and starting SCN of 12048063756001 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3755 and starting SCN of 12048063755916 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3754 and starting SCN of 12048063728012 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3753 and starting SCN of 12048063665635 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3752 and starting SCN of 12048063665027 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3751 and starting SCN of 12048063653576 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3750 and starting SCN of 12048063653561 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3749 and starting SCN of 12048063627543 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3748 and starting SCN of 12048063558054 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3747 and starting SCN of 12048063486582 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3746 and starting SCN of 12048063486573 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3745 and starting SCN of 12048063464494 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3744 and starting SCN of 12048063400169 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3743 and starting SCN of 12048063388626 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3742 and starting SCN of 12048063388617 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3741 and starting SCN of 12048063382424 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3740 and starting SCN of 12048063312205 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3739 and starting SCN of 12048063240678 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3738 and starting SCN of 12048063240669 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3737 and starting SCN of 12048063237926 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3736 and starting SCN of 12048063171353 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3735 and starting SCN of 12048063138267 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3734 and starting SCN of 12048063137825 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3733 and starting SCN of 12048063136910 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3732 and starting SCN of 12048063125614 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3731 and starting SCN of 12048063125605 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3730 and starting SCN of 12048063105094 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3729 and starting SCN of 12048063033270 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3728 and starting SCN of 12048062965311 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3727 and starting SCN of 12048062965302 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3726 and starting SCN of 12048062946092 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3725 and starting SCN of 12048062890907 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3724 and starting SCN of 12048062839415 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3723 and starting SCN of 12048062839052 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3722 and starting SCN of 12048062838184 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3721 and starting SCN of 12048062811457 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3720 and starting SCN of 12048062811448 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3719 and starting SCN of 12048062766437 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3718 and starting SCN of 12048062722173 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3717 and starting SCN of 12048062649185 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3716 and starting SCN of 12048062576546 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3715 and starting SCN of 12048062504973 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3714 and starting SCN of 12048062504931 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3713 and starting SCN of 12048062447176 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3712 and starting SCN of 12048062381328 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3711 and starting SCN of 12048062367266 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3710 and starting SCN of 12048062367257 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3709 and starting SCN of 12048062353116 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3708 and starting SCN of 12048062298510 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3707 and starting SCN of 12048062226266 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3706 and starting SCN of 12048062154595 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3705 and starting SCN of 12048062154586 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3704 and starting SCN of 12048062102798 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3703 and starting SCN of 12048062039400 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3702 and starting SCN of 12048062039064 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3701 and starting SCN of 12048062038485 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3700 and starting SCN of 12048062026498 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3699 and starting SCN of 12048062026489 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3698 and starting SCN of 12048062016323 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3697 and starting SCN of 12048061968083 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3696 and starting SCN of 12048061897496 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3695 and starting SCN of 12048061825744 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3694 and starting SCN of 12048061825735 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3693 and starting SCN of 12048061825173 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3692 and starting SCN of 12048061755178 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3691 and starting SCN of 12048061682443 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3690 and starting SCN of 12048061667006 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3689 and starting SCN of 12048061666997 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3688 and starting SCN of 12048061650007 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3687 and starting SCN of 12048061612123 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3686 and starting SCN of 12048061541433 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3685 and starting SCN of 12048061471285 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3684 and starting SCN of 12048061471276 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3683 and starting SCN of 12048061467256 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3682 and starting SCN of 12048061398513 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3681 and starting SCN of 12048061334395 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3680 and starting SCN of 12048061334086 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3679 and starting SCN of 12048061331927 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3678 and starting SCN of 12048061300722 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3677 and starting SCN of 12048061300713 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3676 and starting SCN of 12048061290258 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3675 and starting SCN of 12048061248774 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3674 and starting SCN of 12048061174673 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3673 and starting SCN of 12048061105024 found to restore
RMAN-00567: Recovery Manager could not print some error messages
============================== Error output: end   =============================
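
A wall of RMAN-06025 lines like this is easier to reason about once it is collapsed into ranges. The sketch below reads RMAN output on standard input and prints the missing sequence numbers as ranges. It relies only on the message wording shown above; the function name is hypothetical.

```shell
# Collapse "RMAN-06025: no backup of archived log ... sequence N ..." lines
# read from stdin into sorted ranges of missing sequence numbers.
summarize_missing_logs() {
    sed -n 's/^RMAN-06025: no backup of archived log for thread [0-9]* with sequence \([0-9][0-9]*\) .*/\1/p' |
    sort -n -u |
    awk '
        NR == 1        { start = prev = $1; next }
        $1 == prev + 1 { prev = $1; next }           # still inside the same run
                       { print start "-" prev; start = prev = $1 }
        END            { if (NR > 0) print start "-" prev }
    '
}
```

Fed the output above (for example `summarize_missing_logs < rman.log`, where rman.log is a saved copy of the session), it would reduce the list to two ranges, 3673-3801 and 5187-5187. Because of RMAN-00567 the printed list is itself truncated, so the real gap is probably wider than what the log shows.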

Second attempt:

RMAN> run {
2> allocate channel d1 device type disk;
3> set until time="to_date('xxxx0421 00:30:00','yyyymmdd hh24:mi:ss')";
4> recover database;
5> release channel d1;
6> }

allocated channel: d1
channel d1: SID=105 device type=DISK

executing command: SET until clause

Starting recover at 25-APR-13

starting media recovery
media recovery failed
released channel: d1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 04/25/xxxx 11:01:22
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed
 start until time 'APR 21 xxxx 00:30:00'
ORA-00275: media recovery has already been started


Continuing with incomplete recovery:

SQL> recover database until cancel;
ORA-00279: change 420560 generated at 03/02/2011 11:58:59 needed for thread 1
ORA-00289: suggestion : /orashare/arch/1_16_729236380.dbf
ORA-00280: change 420560 for thread 1 is in sequence #16


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-10879: error signaled in parallel recovery slave
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: '/orashare/oradata/xxxx/system01.dbf'
--------------------------------------------------------------------------------------
SQL> recover database until cancel;
ORA-00279: change 420560 generated at 03/02/2011 11:58:59 needed for thread 1
ORA-00289: suggestion : /orashare/arch/1_16_729236380.dbf
ORA-00280: change 420560 for thread 1 is in sequence #16


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
auto

ORA-00279: change 12032046604724 generated at 03/09/2011 10:43:51 needed for
thread 1
ORA-00289: suggestion : /orashare/arch/1_28_729236380.dbf
ORA-00280: change 12032046604724 for thread 1 is in sequence #28
ORA-00278: log file '/orashare/arch/1_27_729236380.dbf' no longer needed for
this recovery


ORA-00279: change 12032046645630 generated at 03/09/2011 10:44:06 needed for
thread 1
ORA-00289: suggestion : /orashare/arch/1_29_729236380.dbf
ORA-00280: change 12032046645630 for thread 1 is in sequence #29
ORA-00278: log file '/orashare/arch/1_28_729236380.dbf' no longer needed for
this recovery

ORA-00308: cannot open archived log '/orashare/arch/1_160_729236380.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

ORA-10879: error signaled in parallel recovery slave
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: '/orashare/oradata/xxxx/system01.dbf'

Tried to force the database open. It still failed.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01152: file 1 was not restored from a sufficiently old backup
ORA-01110: data file 1: '/orashare/oradata/xxxx/system01.dbf'
SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

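STEP6: Incomplete recovery failed as well. Analysis: the missing archived logs are the cause. Next step: restore the archived logs from the RMAN backups.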

RMAN> restore archivelog until sequence 5190;

Starting restore at 25-APR-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=105 device type=DISK

archived log for thread 1 with sequence 5189 is already on disk as file /orashare/arch/1_5189_729236380.dbf
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 04/25/xxxx 12:17:46
RMAN-06026: some targets not found - aborting restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3801 and starting SCN of 12048065258900 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3800 and starting SCN of 12048065251078 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3799 and starting SCN of 12048065182200 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3798 and starting SCN of 12048065112938 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3797 and starting SCN of 12048065112929 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3796 and starting SCN of 12048065107549 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3795 and starting SCN of 12048065043386 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3794 and starting SCN of 12048065009336 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3793 and starting SCN of 12048065008704 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3792 and starting SCN of 12048065008268 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3791 and starting SCN of 12048064994959 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3790 and starting SCN of 12048064994950 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3789 and starting SCN of 12048064964288 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3788 and starting SCN of 12048064894734 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 3787 and starting SCN of 12048064824419 found to restore

=======================================================================================
restore archivelog from sequence 5180 until sequence 5190;
channel ORA_DISK_1: reading from backup piece /orashare/rman/archivelog/archive_813327124_521
channel ORA_DISK_1: ORA-19870: error while restoring backup piece /orashare/rman/archivelog/archive_813327124_521
ORA-19501: read error on file "/orashare/rman/archivelog/archive_813327124_521", block number 1 (block size=512)
ORA-27072: File I/O error
Linux-x86_64 Error: 25: Inappropriate ioctl for device
Additional information: 4
Additional information: 1
Additional information: 60928

failover to previous backup

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 04/25/xxxx 12:22:17
RMAN-06026: some targets not found - aborting restore
RMAN-06025: no backup of archived log for thread 1 with sequence 5190 and starting SCN of 12048143585323 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 5188 and starting SCN of 12048143579197 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 5187 and starting SCN of 12048143561230 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 5186 and starting SCN of 12048143530508 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 5185 and starting SCN of 12048143463907 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 5184 and starting SCN of 12048143402088 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 5183 and starting SCN of 12048143367408 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 5181 and starting SCN of 12048143308022 found to restore
RMAN-06025: no backup of archived log for thread 1 with sequence 5180 and starting SCN of 12048143239962 found to restore
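This is bad! The archived logs in the backups are gone as well.

Next attempt: incomplete recovery based on the checkpoint SCN (Checkpoint_change#).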

SQL> SELECT Checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
------------------
        1.2048E+13

SQL>  SELECT MAX(Checkpoint_change#) FROM v$datafile_header;

MAX(CHECKPOINT_CHANGE#)
-----------------------
             1.2048E+13
RMAN>  recover database until scn 12048000000000;

Starting recover at 25-APR-13
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 04/25/xxxx 11:36:23
RMAN-06556: datafile 1 must be restored from backup older than SCN 12048000000000
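
Part of the trouble here is that SQL*Plus showed the SCN as 1.2048E+13, which drops the low digits. The rounded value 12048000000000 is lower than the SCNs that appear in the earlier RMAN messages (for example 12048143615401), which would explain why RMAN asked for an older backup. A minimal sketch of a query script that prints the SCNs in full follows. The function only prints SQL text, and piping it into `sqlplus -s "/ as sysdba"` is an assumption about the environment, not something shown in the post.

```shell
# Emit a SQL*Plus script that shows checkpoint SCNs without scientific notation.
# Intended use (an assumption about the environment, not from the post):
#   print_scn_queries | sqlplus -s "/ as sysdba"
print_scn_queries() {
    cat <<'EOF'
set numwidth 20
select checkpoint_change# from v$database;
select file#, status, fuzzy, checkpoint_change#
  from v$datafile_header
 order by file#;
EOF
}
```

Listing the header SCN per datafile, rather than only MAX(), also shows which files are behind the others and whether any is still marked fuzzy.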

 

STEP7: Finally found the real cause: something was wrong with the storage.
Symptom 1:
channel ORA_DISK_1: reading from backup piece /orashare/rman/archivelog/archive_813327124_521
channel ORA_DISK_1: ORA-19870: error while restoring backup piece /orashare/rman/archivelog/archive_813327124_521
ORA-19501: read error on file "/orashare/rman/archivelog/archive_813327124_521", block number 1 (block size=512)
ORA-27072: File I/O error
Linux-x86_64 Error: 25: Inappropriate ioctl for device
Additional information: 4
Additional information: 1
Additional information: 60928
Symptom 2: running ls in the archived log directory reported errors.


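STEP8: The on-site staff found no alarms on the storage array, so the inferred cause was a corrupted filesystem. All that was left was to run fsck on the filesystem and hope for the best.
The full log of the fsck run is in the attachment.
The result is pasted here.
Repair command: fsck.ext3 -y -f -v /dev/xxxxxxxx
Repair result: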
/dev/xxxxxxxx: ***** FILE SYSTEM WAS MODIFIED *****

     264 inodes used (0.00%)
     188 non-contiguous inodes (71.2%)
         # of inodes with ind/dind/tind blocks: 239/229/5
54959625 blocks used (9.80%)
       0 bad blocks
      30 large files

     243 regular files
      11 directories
       0 character device files
       0 block device files
       0 fifos
       0 links
       1 symbolic link (1 fast symbolic link)
       0 sockets
--------
     245 files

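The result was bad: after the repair many files had vanished. Most of the archived logs were gone, and so were the archived logs backed up by RMAN.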
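
In hindsight, a read-only pass before the real repair would at least have shown whether fsck intended to change anything. The sketch below wraps `fsck.ext3 -n -f`, which opens the filesystem read-only and answers "no" to every prompt. FSCK_BIN is a hypothetical override added only so the function can be tried without a real device.

```shell
# Dry run: report what a repair would face, without modifying the filesystem.
# FSCK_BIN is a made-up override for testing; the default is the real tool.
preview_fsck() {
    dev=$1
    rc=0
    "${FSCK_BIN:-fsck.ext3}" -n -f "$dev" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "clean"
    elif [ $(( rc & 4 )) -ne 0 ]; then
        # bit 4 of the e2fsck exit status: errors left uncorrected
        echo "errors found, left uncorrected"
    else
        echo "check did not complete (exit code $rc)"
    fi
    return "$rc"
}
```

If the dry run reports errors, taking an image of the device first (for example with dd to other storage, space permitting) leaves a way back should the -y repair discard files as it did here.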

STEP9: Last resort: set a hidden database parameter and force the database open.

SQL> show parameter spfile;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /opt/oracle/product/11gR1/db/dbs/spfilexxxx.ora

SQL> alter system set "_allow_resetlogs_corruption"=true scope=spfile;


SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01248: file 6 was created in the future of incomplete recovery
ORA-01110: data file 6: '/orashare/oradata/xxxx/xxxx_idx.dbf'


SQL> recover database using backup controlfile until cancel;
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
cancel
ORA-10879: error signaled in parallel recovery slave
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below
ORA-01194: file 5 needs more recovery to be consistent
ORA-01110: data file 5: '/orashare/oradata/xxxx/xxxx_dat.dbf'
SQL> alter database open resetlogs;

Database altered.

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

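Thank goodness, the database was finally forced open. Because a hidden parameter was used, the database has to be treated as very unstable from here on.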

STEP10: Back up the data with exp/imp.
Don't celebrate too soon. Export the data with exp right away and imp it into the test database.

exp system/password file='/orashare/xxxx.dmp' log='/orashare/xxxx.log' owner=xxxx
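
Since the database was opened with known inconsistencies, the export log deserves a check before the dump is trusted. A minimal sketch, assuming the log is the file passed to log= above and that exp writes its usual closing message; both function names are made up:

```shell
# Succeeds only if the exp log contains the clean-success closing message.
exp_log_ok() {
    logfile=$1
    grep -q 'Export terminated successfully without warnings' "$logfile"
}

# Print any EXP- or ORA- error lines recorded in the log.
exp_log_errors() {
    logfile=$1
    grep -E '^(EXP|ORA)-[0-9]+' "$logfile"
}
```

For example, `exp_log_ok /orashare/xxxx.log || exp_log_errors /orashare/xxxx.log` prints the errors only when the export did not finish cleanly. Tables that fail to export are likely the ones touched by the corruption.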

STEP11: Switch the application over to the test database.
STEP12: Rebuild the production database: recreate the tablespaces and imp the data.

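3. Lessons learned
(1) The culprit in this incident was filesystem corruption on the SUSE host, which left some files unreadable.
Lesson: be very careful with fsck; the outcome of a repair can be unexpected. Here the repair itself caused some files to be lost.
         An unclean shutdown can corrupt the filesystem.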

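(2) During the RMAN recovery it turned out that the RMAN backups were damaged too.
First, every daily RMAN backup must be checked; otherwise a backup can become unusable without anyone knowing.
Second, the RMAN backups sat on the same storage as the database. When the storage failed, both the database and the backups lost files, which made the backups worthless. RMAN backups must therefore be kept on a different machine.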
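
Both points can be partly automated. The sketch below compares every backup piece in a local directory with its copy on another host's mount and reports anything missing or different. The directory layout and the function name are assumptions; the post only tells us the pieces lived under /orashare/rman.

```shell
# Compare each file in a backup directory with its off-host copy.
# Prints one line per problem and returns 1 if any file is missing or differs.
verify_backup_copy() {
    src=$1
    dst=$2
    status=0
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        if [ ! -f "$dst/$name" ]; then
            echo "MISSING $name"
            status=1
        elif [ "$(cksum < "$f")" != "$(cksum < "$dst/$name")" ]; then
            # cksum reads the whole file, so an unreadable piece also shows up here
            echo "DIFFERS $name"
            status=1
        fi
    done
    return "$status"
}
```

Run daily after the backup, for instance as `verify_backup_copy /orashare/rman/dbf /mnt/offhost/rman/dbf` (the second path is hypothetical). This only proves the copies match byte for byte; whether RMAN can actually restore from them is a separate check that RMAN's own validation commands are meant for.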

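(3) The skybility HA software sometimes, surprisingly, let the primary and the standby mount /orashare at the same time.
The database's shared storage is a raw SAN device. The storage has no operating system of its own and does not lock the data written to it, so mounting /orashare on both nodes at once leads to conflicting reads and writes.
Personally, I do not find this HA software very reliable.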

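(4) The underlying storage for xxxx was carved up badly: one large 2 TB volume. When something goes wrong, the whole 2 TB becomes unusable, and running fsck on a 2 TB disk also takes a very long time.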

(5) When the underlying storage fails, we are helpless and do not know where to start.

 

Reposted from: https://www.cnblogs.com/tango-dg/archive/2013/04/26/3045047.html
