2015年04月15日 某系统由于内存故障导致RAC的B节点发生宕机,在宕机其间tjs1b库由于宕机出现3个坏块,经分析3个坏块为逻辑坏块,共涉及到两个索引的两个分区。对索引分区进行重建完成修复;
现象:
ORACLE Instance ojs1a (pid = 26) - Error 1578 encountered while recovering transaction (173, 11) on object 4054036.
Errors in file /oracle/app/oracle/diag/rdbms/orajs1/ojs1a/trace/ojs1a_smon_176306.trc:
ORA-01578: ORACLE data block corrupted (file # 1004, block # 488211)
ORA-01110: data file 1004: '/dev/rvg06lv015'
Sat Apr 18 13:33:54 2015
Thread 1 advanced to log sequence 1547143 (LGWR switch)
Current log# 10 seq# 1547143 mem# 0: /dev/rd8_redolv1_1
ORACLE Instance ojs1a (pid = 26) - Error 1578 encountered while recovering transaction (176, 12) on object 4043711.
Errors in file /oracle/app/oracle/diag/rdbms/orajs1/ojs1a/trace/ojs1a_smon_176306.trc:
ORA-01578: ORACLE data block corrupted (file # 1133, block # 865723)
ORA-01110: data file 1133: '/dev/rvg06lv073'
Sat Apr 18 13:33:57 2015
Archived Log entry 2782056 added for thread 1 sequence 1547142 ID 0x41a4b0c0 dest 1:
ORACLE Instance ojs1a (pid = 26) - Error 1578 encountered while recovering transaction (86, 20) on object 4043711.
Errors in file /oracle/app/oracle/diag/rdbms/orajs1/ojs1a/trace/ojs1a_smon_176306.trc:
ORA-01578: ORACLE data block corrupted (file # 1133, block # 865723)
ORA-01110: data file 1133: '/dev/rvg06lv073'
ORACLE Instance ojs1a (pid = 26) - Error 1578 encountered while recovering transaction (89, 32) on object 4054036.
Errors in file /oracle/app/oracle/diag/rdbms/orajs1/ojs1a/trace/ojs1a_smon_176306.trc:
ORA-01578: ORACLE data block corrupted (file # 1087, block # 498755)
ORA-01110: data file 1087: '/dev/rvg06lv049'
Sat Apr 18 13:34:05 2015
ORACLE Instance ojs1a (pid = 26) - Error 1578 encountered while recovering transaction (102, 12) on object 4054036.
Errors in file /oracle/app/oracle/diag/rdbms/orajs1/ojs1a/trace/ojs1a_smon_176306.trc:
ORA-01578: ORACLE data block corrupted (file # 1087, block # 498755)
ORA-01110: data file 1087: '/dev/rvg06lv049'
ORACLE Instance ojs1a (pid = 26) - Error 1578 encountered while recovering transaction (105, 12) on object 4054036.
Errors in file /oracle/app/oracle/diag/rdbms/orajs1/ojs1a/trace/ojs1a_smon_176306.trc:
ORA-01578: ORACLE data block corrupted (file # 1087, block # 498755)
问题处理步聚:
1.查看操盘系统日志未发现操作系统报错因此可以说明该坏块是逻辑坏块;
2.用DBV或RMAN检查是否存在坏块(BDV可以检查物理和逻辑坏块);
--dbv校验
[oracle@tjs1a] /oracle> dbv file=/dev/rvg06lv049 blocksize=16384
DBVERIFY: Release 11.2.0.2.0 - Production on Sat Apr 18 13:44:09 2015
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
DBVERIFY - Verification starting : FILE = /dev/rvg06lv049
DBV-00200: Block, DBA 268934211, already marked corrupt
csc(0x0bd1.2eb80a06) higher than block scn(0x0000.00000000)
Page 498755 failed with check code 6054
DBVERIFY - Verification complete
Total Pages Examined : 1048512
Total Pages Processed (Data) : 823753
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 221725
Total Pages Failing (Index): 1
Total Pages Processed (Other): 2974
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 60
Total Pages Marked Corrupt : 1
Total Pages Influx : 0
Total Pages Encrypted : 0
Highest block SCN : 849573888 (3025.849573888)
[oracle@tjs1a] /oracle>
You have mail in /usr/spool/mail/oracle
--RMAN 校验
rman target /
backup check logical validate database;
通过下面这条命令,并不是做一个备份 ,只是对文件做一次校验,当然也可以发现坏块
RMAN> backup validate datafile 1004;
RMAN> exit
Recovery Manager complete.
检查完查看
SQL> select * from v$database_block_corruption;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
4 131 1 2.7119E+14 FRACTURED
3.查看坏块对应的对象
--通过object_id查看
select owner,object_name,object_type from dba_objects where object_id='4043711';
SQL> select owner,index_name,table_name from dba_indexes where index_name in ('UN_PK_STLGPRSCNGKEY','UN_PK_STLGPRSLTECNGKEY');
OWNER INDEX_NAME TABLE_NAME
------------------------------ ------------------------------ ------------------------------
SDSETTLE UN_PK_STLGPRSCNGKEY STLGPRSCNGKEY
SDSETTLE UN_PK_STLGPRSLTECNGKEY STLGPRSLTECNGKEY
---通过file_id 和block来查找(此方案相对慢一些)
Select owner,segment_name,segment_type from dba_extents where file_id=1087 and 498755 between block_id and block_id+blocks-1;
4.如果是索引可以进行重建;
5.如果是表且没有备份的情况下可以通过dbms_repair.skip_corrupt_blocks跳过坏块,然后重建(注:使用skip_corrupt_blocks只能使oracle跳过Oracle能够读出的块,而如果在操作系统层read调用就失败的,则不能跳过该过。甚至于该会话也可能会被中断。遇到这样的情况,使用dd命令或操作系统的copy(cp)命令都不能复制该文件,rman也不能备份该文件)
例:SQL> analyze table test.t1 VALIDATE STRUCTURE CASCADE;
analyze table test.t1
validate structure
*
ERROR 位于第 1 行:
ORA-01578: ORACLE
数据块损坏(文件号10,块号1447)
ORA-01110: 数据文件 10:
‘D:\ORACLE\ORADATA\XJ\TEST01.DBF’
SQL>> begin
2 dbms_repair.admin_tables (
3 table_name => ‘REPAIR_TABLE’,
4 table_type => dbms_repair.repair_table,
5 action => dbms_repair.create_action,
6 tablespace => ‘SYSTEM’);
7 end;
8 /
PL/SQL 过程已成功完成。
SQL> set serveroutput
on
SQL> declare
2 rpr_count int;
3 begin
4 rpr_count := 0;
5 dbms_repair.check_object (
6 schema_name => ‘TEST’,
7 object_name => ‘T1′,
8 repair_table_name => ‘REPAIR_TABLE’,
9 corrupt_count => rpr_count);
10 dbms_output.put_line(‘repair count: ‘ || to_char(rpr_count));
11 end;
12 /
repair
count: 1
PL/SQL 过程已成功完成。
SQL> select object_name, block_id, corrupt_type,
marked_corrupt,corrupt_description,
2 repair_description from
repair_table;
OBJECT_NAME BLOCK_ID CORRUPT_TYPE MARKED_COR CORRUPT_DESCRIPTION REPAIR_DESCRIPTION
————- ———- ———— ———- ——————– ——————–
T1 1447 6148 TRUE mark block software
corrupt
T1 1447 6148 TRUE mark block software
corrupt
SQL> declare
2 fix_count int;
3 begin
4 fix_count := 0;
5 dbms_repair.fix_corrupt_blocks (
6 schema_name => ‘TEST’,
7 object_name => ‘T1′,
8 object_type => dbms_repair.table_object,
9 repair_table_name => ‘REPAIR_TABLE’,
10 fix_count => fix_count);
11 dbms_output.put_line(‘fix count: ‘ || to_char(fix_count));
12 end;
13 /
fix count: 0
PL/SQL 过程已成功完成。
SQL> begin
2 dbms_repair.skip_corrupt_blocks (
3 schema_name => ‘TEST’,
4 object_name => ‘T1′,
5 object_type => dbms_repair.table_object,
6 flags => dbms_repair.skip_flag);
7 end;
8 /
PL/SQL 过程已成功完成。
SQL> select table_name, skip_corrupt from dba_tables where table_name = ‘T1′ and owner=’TEST’;
TABLE_NAME SKIP_COR
—————————— ——–
T1 ENABLED
SQL> select count(*) from test.t1;
COUNT(*)
———-
28762
6.如果是表且有备份,可以通过不完全恢复来解决;
7.如果是表且自从出现坏块还没有进行备份的话,可以通过rman直接对坏块进行恢复
blockrecover datafile 1087 block 498755;
本案例是分析索引出现坏块,且是逻辑坏块,因此只需要重建分区索引即可:
解决方法
alter index SDSETTLE.UN_PK_STLGPRSCNGKEY rebuild partition KEY20150418_0 online parallel 16;
alter index SDSETTLE.UN_PK_STLGPRSLTECNGKEY rebuild partition KEY20150418_0 online parallel 16;