最近某客户的一套核心数据库由于存储问题导致清掉Cache之后无法启动。首先我们来看看数据库在启动的时候报什么错误:
Thu Sep 21 19:35:55 2017
WARNING: Write Failed. group:1 disk:3 AU:53436 offset:95744 size:1024
Errors in file /u01/app/oracle/diag/rdbms/ods/xxx2/trace/xxx2_lgwr_14636.trc:
ORA-15080: synchronous I/O operation to a disk failed
WARNING: failed to write mirror side 1 of virtual extent 43 logical extent 0 of file 265 in group 1 on disk 3 allocation unit 53436
Errors in file /u01/app/oracle/diag/rdbms/xxx/xxx2/trace/xxx2_lgwr_14636.trc:
ORA-00345: redo log write error block 88251 count 2
ORA-00312: online log 3 thread 2: '+DATA/xxx/onlinelog/group_3.265.816035881'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
Errors in file /u01/app/oracle/diag/rdbms/xxx/xxx2/trace/xxx2_lgwr_14636.trc:
ORA-00340: IO error processing online log 3 of thread 2
ORA-00345: redo log write error block 88251 count 2
ORA-00312: online log 3 thread 2: '+DATA/xxx/onlinelog/group_3.265.816035881'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
LGWR (ospid: 14636): terminating the instance due to error 340
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
ThuSep2119:35:552017
WARNING:WriteFailed.group:1disk:3AU:53436offset:95744size:1024
Errorsinfile/u01/app/oracle/diag/rdbms/ods/xxx2/trace/xxx2_lgwr_14636.trc:
ORA-15080:synchronousI/Ooperationtoadiskfailed
WARNING:failedtowritemirrorside1ofvirtualextent43logicalextent0offile265ingroup1ondisk3allocationunit53436
Errorsinfile/u01/app/oracle/diag/rdbms/xxx/xxx2/trace/xxx2_lgwr_14636.trc:
ORA-00345:redologwriteerrorblock88251count2
ORA-00312:onlinelog3thread2:'+DATA/xxx/onlinelog/group_3.265.816035881'
ORA-15081:failedtosubmitanI/Ooperationtoadisk
ORA-15081:failedtosubmitanI/Ooperationtoadisk
Errorsinfile/u01/app/oracle/diag/rdbms/xxx/xxx2/trace/xxx2_lgwr_14636.trc:
ORA-00340:IOerrorprocessingonlinelog3ofthread2
ORA-00345:redologwriteerrorblock88251count2
ORA-00312:onlinelog3thread2:'+DATA/xxx/onlinelog/group_3.265.816035881'
ORA-15081:failedtosubmitanI/Ooperationtoadisk
ORA-15081:failedtosubmitanI/Ooperationtoadisk
LGWR(ospid:14636):terminatingtheinstanceduetoerror340
错误并不复杂。可以看到Oracle这里已经无法正常写Redo logfile了。由于这套数据库是非归档,只有逻辑备份,因此即使恢复成功也面临数据丢失的可能性。首先我在尝试进行恢复时,发现居然无法mount数据库,在mount过程中实例被直接终止了,感觉非常奇怪。也没有报非常明显的错误。mount过程出错,那么无疑是controlfile存在异常;由于没有controlfile备份,因此这里先手工重建控制文件,如下是脚本:
CREATE CONTROLFILE REUSE DATABASE "XXX" RESETLOGS NOARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 584
LOGFILE
GROUP 1 '+data/ods/ONLINELOG/group_1.257.816033845' SIZE 500M BLOCKSIZE 512,
GROUP 2 '+data/xxx/ONLINELOG/group_2.258.816033845' SIZE 500M BLOCKSIZE 512,
GROUP 3 '+data/xxx/ONLINELOG/group_3.265.816035881' SIZE 500M BLOCKSIZE 512,
GROUP 4 '+data/xxx/ONLINELOG/group_4.266.816035883' SIZE 500M BLOCKSIZE 512,
GROUP 5 '+data/xxx/ONLINELOG/group_5.275.816036347' SIZE 500M BLOCKSIZE 512,
GROUP 6 '+data/xxx/ONLINELOG/group_6.277.816036359' SIZE 500M BLOCKSIZE 512
DATAFILE
'+DATA/xxx/datafile/system.259.816033847',
'+DATA/xxx/datafile/sysaux.260.816033849',
'+DATA/xxx/datafile/undotbs1.261.816033851',
'+DATA/xxx/datafile/undotbs2.263.816033859',
'+DATA/xxx/datafile/users.264.816033859',
'+DATA/xxx/datafile/tbs_tbdata.278.816036381',
'+DATA/xxx/datafile/tbs_omdata.283.816036779',
'+DATA/xxx/datafile/tbs_cmdata.284.816036813',
'+DATA/xxx/datafile/tbs_dmdata.285.816036857',
'+DATA/xxx/datafile/tbs_dbetl.286.816036893',
'+DATA/xxx/datafile/tbs_schedule.287.816036909',
'+DATA/xxx/datafile/tbs_meast.288.816036915',
'+DATA/xxx/datafile/tbs_m1104.289.816036939',
'+DATA/xxx/datafile/tbs_mpisa.293.842192725',
'+DATA/xxx/datafile/tbs_mpfsc',
'+DATA/xxx/datafile/tbs_msafe',
'+DATA/xxx/datafile/tbs_mecsp',
'+DATA/xxx/datafile/tbs_mpbss',
'+DATA/xxx/datafile/tbs_mpbfc',
'+DATA/xxx/datafile/idx_cmdata'
CHARACTER SET ZHS16GBK;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
CREATECONTROLFILEREUSEDATABASE"XXX"RESETLOGSNOARCHIVELOG
MAXLOGFILES16
MAXLOGMEMBERS3
MAXDATAFILES100
MAXINSTANCES8
MAXLOGHISTORY584
LOGFILE
GROUP1'+data/ods/ONLINELOG/group_1.257.816033845'SIZE500MBLOCKSIZE512,
GROUP2'+data/xxx/ONLINELOG/group_2.258.816033845'SIZE500MBLOCKSIZE512,
GROUP3'+data/xxx/ONLINELOG/group_3.265.816035881'SIZE500MBLOCKSIZE512,
GROUP4'+data/xxx/ONLINELOG/group_4.266.816035883'SIZE500MBLOCKSIZE512,
GROUP5'+data/xxx/ONLINELOG/group_5.275.816036347'SIZE500MBLOCKSIZE512,
GROUP6'+data/xxx/ONLINELOG/group_6.277.816036359'SIZE500MBLOCKSIZE512
DATAFILE
'+DATA/xxx/datafile/system.259.816033847',
'+DATA/xxx/datafile/sysaux.260.816033849',
'+DATA/xxx/datafile/undotbs1.261.816033851',
'+DATA/xxx/datafile/undotbs2.263.816033859',
'+DATA/xxx/datafile/users.264.816033859',
'+DATA/xxx/datafile/tbs_tbdata.278.816036381',
'+DATA/xxx/datafile/tbs_omdata.283.816036779',
'+DATA/xxx/datafile/tbs_cmdata.284.816036813',
'+DATA/xxx/datafile/tbs_dmdata.285.816036857',
'+DATA/xxx/datafile/tbs_dbetl.286.816036893',
'+DATA/xxx/datafile/tbs_schedule.287.816036909',
'+DATA/xxx/datafile/tbs_meast.288.816036915',
'+DATA/xxx/datafile/tbs_m1104.289.816036939',
'+DATA/xxx/datafile/tbs_mpisa.293.842192725',
'+DATA/xxx/datafile/tbs_mpfsc',
'+DATA/xxx/datafile/tbs_msafe',
'+DATA/xxx/datafile/tbs_mecsp',
'+DATA/xxx/datafile/tbs_mpbss',
'+DATA/xxx/datafile/tbs_mpbfc',
'+DATA/xxx/datafile/idx_cmdata'
CHARACTERSETZHS16GBK;
重建完毕后。其实这里我首先尝试了进行noresetlogs创建,但是发现报错:
ORA-00600: internal error code, arguments: [2762], [4294967295], [1024000], [+DATA/xxx/onlinelog/group_3.265.816035881], [], [], [], [], [], [], [],
[]
1
2
ORA-00600:internalerrorcode,arguments:[2762],[4294967295],[1024000],[+DATA/xxx/onlinelog/group_3.265.816035881],[],[],[],[],[],[],[],
[]
很明显,Redo logfile有问题;看来还是只能Resetlogs方式创建。创建完毕之后,尝试进行了recover database using backup controlfile until cancel恢复操作;然后通过隐含参数强制open发现还是有如下错误:
Fri Sep 22 13:00:10 2017
SMON: enabling cache recovery
Instance recovery: looking for dead threads
Instance recovery: lock domain invalid but no dead threads
Errors in file /u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_ora_1593.trc (incident=120288):
ORA-00600: internal error code, arguments: [2662], [3], [3158008565], [3], [3159337219], [12582976], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xxx/xxx1/incident/incdir_120288/xxx1_ora_1593_i120288.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_ora_1593.trc:
ORA-00600: internal error code, arguments: [2662], [3], [3158008565], [3], [3159337219], [12582976], [], [], [], [], [], []
Errors in file /u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_ora_1593.trc:
ORA-00600: internal error code, arguments: [2662], [3], [3158008565], [3], [3159337219], [12582976], [], [], [], [], [], []
Error 600 happened during db open, shutting down database
USER (ospid: 1593): terminating the instance due to error 600
Instance terminated by USER, pid = 1593
ORA-1092 signalled during: alter database open resetlogs...
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
FriSep2213:00:102017
SMON:enablingcacherecovery
Instancerecovery:lookingfordeadthreads
Instancerecovery:lockdomaininvalidbutnodeadthreads
Errorsinfile/u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_ora_1593.trc(incident=120288):
ORA-00600:internalerrorcode,arguments:[2662],[3],[3158008565],[3],[3159337219],[12582976],[],[],[],[],[],[]
Incidentdetailsin:/u01/app/oracle/diag/rdbms/xxx/xxx1/incident/incdir_120288/xxx1_ora_1593_i120288.trc
UseADRCIorSupportWorkbenchtopackagetheincident.
SeeNote411.1atMyOracleSupportforerrorandpackagingdetails.
Errorsinfile/u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_ora_1593.trc:
ORA-00600:internalerrorcode,arguments:[2662],[3],[3158008565],[3],[3159337219],[12582976],[],[],[],[],[],[]
Errorsinfile/u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_ora_1593.trc:
ORA-00600:internalerrorcode,arguments:[2662],[3],[3158008565],[3],[3159337219],[12582976],[],[],[],[],[],[]
Error600happenedduringdbopen,shuttingdowndatabase
USER(ospid:1593):terminatingtheinstanceduetoerror600
InstanceterminatedbyUSER,pid=1593
ORA-1092signalledduring:alterdatabaseopenresetlogs...
这是非常经典的错误了;由于这是scn的问题;而且数据库版本为11.2.0.3.0,未安装任何psu。因此这里是可以直接推进scn的。
直接通过10015 event来推进数据库的scn;另外由于是异常关机,那么这里Undo必然也无法进行正常恢复;因此同时设置undo_management参数为manual,并同时设置10015 event:
alter session set events ‘10015 trace name adjust_scn level 2’;
顺利打开了数据库。打开数据库之后立刻重建数据库Undo和temp,如下:
create undo tablespace undo1 datafile '+data' size 2048m;
create undo tablespace undo2 datafile '+data' size 2048m;
drop tablespace undotbs1 including contents and datafiles;
drop tablespace undotbs2 including contents and datafiles;
alter tablespace temp add tempfile '+data/rep/tempfile/TEMP.276.816036349' reuse;
alter tablespace temp add tempfile '+data/ods/tempfile/temp1607202' reuse;
alter tablespace temp add tempfile '+data/ods/tempfile/temp1607203' reuse;
1
2
3
4
5
6
7
createundotablespaceundo1datafile'+data'size2048m;
createundotablespaceundo2datafile'+data'size2048m;
droptablespaceundotbs1includingcontentsanddatafiles;
droptablespaceundotbs2includingcontentsanddatafiles;
altertablespacetempaddtempfile'+data/rep/tempfile/TEMP.276.816036349'reuse;
altertablespacetempaddtempfile'+data/ods/tempfile/temp1607202'reuse;
altertablespacetempaddtempfile'+data/ods/tempfile/temp1607203'reuse;
再次重启数据库之后,发现alert log仍然有一些错误;这是在所难免的。如下所示:
ORA-00604: error occurred at recursive SQL level 1
ORA-08102: index key not found, obj# 290, file 1, block 1029 (2)
ORA-12012: error on auto execute of job 4001
ORA-08102: index key not found, obj# 290, file 1, block 1029 (2)
Errors in file /u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_j003_8160.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-08102: index key not found, obj# 290, file 1, block 1029 (2)
ORA-12012: error on auto execute of job 4002
ORA-08102: index key not found, obj# 290, file 1, block 1029 (2)
Errors in file /u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_ora_8043.trc:
Fri Sep 22 13:30:40 2017
Errors in file /u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_ora_8043.trc:
Fri Sep 22 13:30:42 2017
Dumping diagnostic data in directory=[cdmp_20170922133042], requested by (instance=1, osid=8043), summary=[abnormal process termination].
Fri Sep 22 13:31:59 2017
Starting background process SMCO
Fri Sep 22 13:31:59 2017
SMCO started with pid=35, OS id=9375
Fri Sep 22 13:37:54 2017
Errors in file /u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_m000_10623.trc (incident=144379):
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xxx/xxx1/incident/incdir_144379/xxx1_m000_10623_i144379.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Sep 22 13:37:55 2017
Dumping diagnostic data in directory=[cdmp_20170922133755], requested by (instance=1, osid=10623 (M000)), summary=[incident=144379].
Errors in file /u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_m000_10623.trc (incident=144380):
ORA-00600: internal error code, arguments: [kewrose_1], [600], [ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/xxx/xxx1/incident/incdir_144380/xxx1_m000_10623_i144380.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_m000_10623.trc:
ORA-00600: internal error code, arguments: [kewrose_1], [600], [ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], [], [], [], [], []
], [], [], [], [], [], [], [], [], []
Dumping diagnostic data in directory=[cdmp_20170922133757], requested by (instance=1, osid=10623 (M000)), summary=[incident=144380].
Fri Sep 22 13:37:58 2017
Sweep [inc][144380]: completed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
ORA-00604:erroroccurredatrecursiveSQLlevel1
ORA-08102:indexkeynotfound,obj# 290, file 1, block 1029 (2)
ORA-12012:erroronautoexecuteofjob4001
ORA-08102:indexkeynotfound,obj# 290, file 1, block 1029 (2)
Errorsinfile/u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_j003_8160.trc:
ORA-00604:erroroccurredatrecursiveSQLlevel1
ORA-08102:indexkeynotfound,obj# 290, file 1, block 1029 (2)
ORA-12012:erroronautoexecuteofjob4002
ORA-08102:indexkeynotfound,obj# 290, file 1, block 1029 (2)
Errorsinfile/u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_ora_8043.trc:
FriSep2213:30:402017
Errorsinfile/u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_ora_8043.trc:
FriSep2213:30:422017
Dumpingdiagnosticdataindirectory=[cdmp_20170922133042],requestedby(instance=1,osid=8043),summary=[abnormalprocesstermination].
FriSep2213:31:592017
StartingbackgroundprocessSMCO
FriSep2213:31:592017
SMCOstartedwithpid=35,OSid=9375
FriSep2213:37:542017
Errorsinfile/u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_m000_10623.trc(incident=144379):
ORA-00600:internalerrorcode,arguments:[kdsgrp1],[],[],[],[],[],[],[],[],[],[],[]
Incidentdetailsin:/u01/app/oracle/diag/rdbms/xxx/xxx1/incident/incdir_144379/xxx1_m000_10623_i144379.trc
UseADRCIorSupportWorkbenchtopackagetheincident.
SeeNote411.1atMyOracleSupportforerrorandpackagingdetails.
FriSep2213:37:552017
Dumpingdiagnosticdataindirectory=[cdmp_20170922133755],requestedby(instance=1,osid=10623(M000)),summary=[incident=144379].
Errorsinfile/u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_m000_10623.trc(incident=144380):
ORA-00600:internalerrorcode,arguments:[kewrose_1],[600],[ORA-00600:internalerrorcode,arguments:[kdsgrp1],[],[],[],[],[],[],[],[],[],[],[]
],[],[],[],[],[],[],[],[],[]
Incidentdetailsin:/u01/app/oracle/diag/rdbms/xxx/xxx1/incident/incdir_144380/xxx1_m000_10623_i144380.trc
UseADRCIorSupportWorkbenchtopackagetheincident.
SeeNote411.1atMyOracleSupportforerrorandpackagingdetails.
Errorsinfile/u01/app/oracle/diag/rdbms/xxx/xxx1/trace/xxx1_m000_10623.trc:
ORA-00600:internalerrorcode,arguments:[kewrose_1],[600],[ORA-00600:internalerrorcode,arguments:[kdsgrp1],[],[],[],[],[],[],[],[],[],[],[]
],[],[],[],[],[],[],[],[],[]
Dumpingdiagnosticdataindirectory=[cdmp_20170922133757],requestedby(instance=1,osid=10623(M000)),summary=[incident=144380].
FriSep2213:37:582017
Sweep[inc][144380]:completed
实际上当时在进行恢复时,我手工处理掉了obj# 290;但是进一步检查发现obj$,col_usage$ ,i_obj4# 都存在问题。而且不一致的记录还比较多:
select /*+ index(t i_obj4) */ DATAOBJ#,type#,owner# from obj$ t
minus
select /*+ full(t1) */ DATAOBJ#,type#,owner# from obj$ t1;
DATAOBJ# TYPE# OWNER#
---------- ---------- ----------
1451154 2 90
1589557 1 92
1589558 2 92
1589573 2 100
1589574 2 100
1589575 2 100
1589576 2 100
1589577 2 100
1589578 2 100
1589579 2 100
1589580 2 100
1589581 2 100
1589582 2 100
1589583 2 100
1589584 2 100
1589585 2 100
1589586 2 100
1589587 2 100
1589588 2 100
1589589 2 100
1589590 2 100
1589591 2 100
1589592 2 100
1589593 2 100
1589594 2 100
1589595 2 100
1589596 2 100
1589597 2 100
1589598 2 100
1589599 2 100
1589600 2 100
1589601 2 100
1589602 2 100
1589603 2 100
1589604 2 100
1589607 2 100
1589612 0 0
select /*+ full(t1) */ DATAOBJ#,type#,owner# from obj$ t1
minus
select /*+ index(t i_obj4) */ DATAOBJ#,type#,owner# from obj$ t ;
DATAOBJ# TYPE# OWNER#
---------- ---------- ----------
1587659 2 100
1587660 2 100
1587661 2 100
1587662 2 100
1587663 2 100
1587664 2 100
1587665 2 100
1587666 2 100
1587667 2 100
1587668 2 100
1587669 2 100
1587670 2 100
1587671 2 100
1587672 2 100
1587673 2 100
1587674 2 100
1587675 2 100
1587676 2 100
1587677 2 100
1587678 2 100
1587679 2 100
1587680 2 100
1587681 2 100
1587682 2 100
1587683 2 100
1587684 2 100
1587685 2 100
1587687 2 100
1587688 2 100
1587689 2 100
1587690 2 100
1587691 2 100
1587692 2 100
1587695 1 92
1587696 2 92
1587716 2 90
1587717 2 90
1587718 2 90
1587719 2 90
1587720 2 90
1587721 2 90
1587722 2 90
1587723 2 90
1587724 2 90
1587725 2 90
1587726 2 90
1587727 2 90
1587728 2 90
1587729 2 90
1587730 2 90
1587732 2 90
1587733 2 90
1589527 0 0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
select/*+ index(t i_obj4) */DATAOBJ#,type#,owner# from obj$ t
minus
select/*+ full(t1) */DATAOBJ#,type#,owner# from obj$ t1;
DATAOBJ# TYPE# OWNER#
------------------------------
1451154290
1589557192
1589558292
15895732100
15895742100
15895752100
15895762100
15895772100
15895782100
15895792100
15895802100
15895812100
15895822100
15895832100
15895842100
15895852100
15895862100
15895872100
15895882100
15895892100
15895902100
15895912100
15895922100
15895932100
15895942100
15895952100
15895962100
15895972100
15895982100
15895992100
15896002100
15896012100
15896022100
15896032100
15896042100
15896072100
158961200
select/*+ full(t1) */DATAOBJ#,type#,owner# from obj$ t1
minus
select/*+ index(t i_obj4) */DATAOBJ#,type#,owner# from obj$ t ;
DATAOBJ# TYPE# OWNER#
------------------------------
15876592100
15876602100
15876612100
15876622100
15876632100
15876642100
15876652100
15876662100
15876672100
15876682100
15876692100
15876702100
15876712100
15876722100
15876732100
15876742100
15876752100
15876762100
15876772100
15876782100
15876792100
15876802100
15876812100
15876822100
15876832100
15876842100
15876852100
15876872100
15876882100
15876892100
15876902100
15876912100
15876922100
1587695192
1587696292
1587716290
1587717290
1587718290
1587719290
1587720290
1587721290
1587722290
1587723290
1587724290
1587725290
1587726290
1587727290
1587728290
1587729290
1587730290
1587732290
1587733290
158952700
最开始我还尝试通过bbed修复了2个Block;最后发现依然难以处理这个ora-08102错误;后续通过上述sql比较发现居然有如此多的记录不一致。修改起来太过麻烦了。
这里其实本来想尝试通过重建obj$,i_obj4$,col_usage$ 来解决的。但是担心有较大的风险,因此这里建议可以进行了数据库重建。由于obj$这里有问题,expdp操作都报错,无法执行任何ddl操作。因此最好通过exp拆分脚本来进行重建处理。整个数据库恢复➕重建过程将近20小时左右(2tb左右的库).
由于客户存储环境io较差,因此导致整个重建过程比较复杂,比较耗时。我们在开玩笑讲到,如果可能的数据库运行在我们的Zdata环境上,那么数据库重建过程在2小时内即可完成;而且也不会出现类似故障;因此Zdata的io操作上直接落盘或者写到Pcie上;不存在数据丢失的风险。
最后补充一点:
1) 由于数据库很多事务无法正常恢复,导致SMON在不断尝试进行事务恢复时报错,达到一定次数之后会crash实例;进而影响数据库的重建工作。可通过设置_smon_internal_errlimit 参数来避免该问题。
2) 为了加快exp和imp速度,这里我们利用了管道技术,脚本如下:
mknod /backup/omdata10
exp \'/ as sysdba\' parfile=omdata.par file=/backup/omdata10 rows=y indexes=n compress=n direct=y recordlength=65536 buffer=52488000 feedback=100000 volsize=0 log=omdata_other.log;
imp \'/ as sysdba\' file=/backup/omdata10 fromuser=omdata touser=omdata buffer=52488000
1
2
3
mknod/backup/omdata10
exp \'/assysdba\'parfile=omdata.parfile=/backup/omdata10rows=y indexes=n compress=n direct=yrecordlength=65536buffer=52488000 feedback=100000 volsize=0 log=omdata_other.log;
imp \'/assysdba\'file=/backup/omdata10fromuser=omdatatouser=omdatabuffer=52488000