上篇写了,如果段头块出现坏块,我们抽取数据的方法,这都是比较极端的情况了。如果坏块出现在普通的data block上面,而我们使用跳过坏块的方法不生效。坏块以外的数据是不是就要丢失哪?那我们有什么方法处理哪?很多人想到的是使用bbed,dul等工具来抢救,其实在很多时候我们使用pl/sql也可以完美的抢救数据.
在这里我们通过模拟错误,然后使用plsql来找回数据. 这中处理方法相对于bbed风险小,但是缺点是如果数据量大处理时间可能比较长,可能比dul有的一比,但是dul的工具不是任何人都有的.所以整体来说,在大部分情况下,这种方法处理某个数据块错误,抢救某个对象数据,还是很好的方法.
表上存在非空索引-- 使用索引获取ROWID
–创建测试表,并创建非空唯一索引
SQL> create table zhuo.zhuo
as
select * from dba_objects where object_id is not null;
Table created.
SQL> alter table zhuo.zhuo add constraint PK_zhuo primary key (object_id) ;
Table altered.
–表总记录
SQL> select count(*) from zhuo.zhuo;
COUNT(*)
----------
80066
–extent的分布情况
SQL> set pages 100
SQL> select file_id,block_id,block_id+blocks-1
2 from dba_extents
3 where segment_name ='ZHUO' AND owner='ZHUO';
FILE_ID BLOCK_ID BLOCK_ID+BLOCKS-1
---------- ---------- -----------------
5 1408 1415
5 1416 1423
5 1424 1431
5 1432 1439
5 1440 1447
5 1448 1455
5 1456 1463
5 1464 1471
5 1472 1479
5 1480 1487
5 1488 1495
5 1496 1503
5 1504 1511
5 1512 1519
5 1520 1527
5 1528 1535
5 1536 1663
5 1664 1791
5 1792 1919
5 1920 2047
5 2048 2175
5 2176 2303
5 2304 2431
5 2432 2559
5 2560 2687
25 rows selected.
–随便选择一个block,2300数据块包含记录数:
SQL> select count(*) from zhuo.zhuo where dbms_rowid.rowid_block_number(rowid)=2300;
COUNT(*)
----------
66
–关闭数据库
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
–破坏数据块
[root@oracle11g ~]# dd if=/dev/zero of=/u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf bs=8192 count=1 seek=2300 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.00312173 s, 2.6 MB/s
–启动数据库
–查询结果
SQL> select /*+full(t) */ count(*) from zhuo.zhuo t;
select /*+full(t) */ count(*) from zhuo.zhuo t
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 2300)
ORA-01110: data file 5:
'/u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf'
–创建备份表
SQL> create table zhuo.zhuo_new
2 as
3 select * from zhuo.zhuo where 1=2;
Table created.
–创建坏块相关rowid记录表
create table zhuo.bad_rows(table_name varchar2(60),row_id rowid, oracle_error_code number);
–执行plsql脚本
set serveroutput on;
set timing on;
DECLARE
TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
CURSOR c1 IS select /*+index(t pk_zhuo)*/ rowid from zhuo.zhuo t;
r RowIDTab;
rows NATURAL := 20000;
bad_rows number := 0 ;
errors number;
error_code number;
myrowid rowid;
BEGIN
OPEN c1;
LOOP
FETCH c1 BULK COLLECT INTO r LIMIT rows;
EXIT WHEN r.count=0;
BEGIN
FORALL i IN r.FIRST..r.LAST SAVE EXCEPTIONS
insert into zhuo.zhuo_new
select /*+ ROWID(A) */ *
from zhuo.zhuo A where rowid = r(i);
EXCEPTION
when OTHERS then
BEGIN
errors := SQL%BULK_EXCEPTIONS.COUNT;
FOR err1 IN 1..errors LOOP
error_code := SQL%BULK_EXCEPTIONS(err1).ERROR_CODE;
myrowid := r(SQL%BULK_EXCEPTIONS(err1).ERROR_INDEX);
bad_rows := bad_rows + 1;
insert into zhuo.bad_rows values('zhuo.zhuo',myrowid, error_code);
END LOOP;
END;
END;
commit;
END LOOP;
commit;
CLOSE c1;
dbms_output.put_line('Total Bad Rows: '||bad_rows);
END;
/
–查询错误记录
SQL> select count(*) from zhuo.bad_rows;
COUNT(*)
----------
66
SQL> set lines 1000
SQL> col table_name for a10
TABLE_NAME ROW_ID ORACLE_ERROR_CODE
---------- ------------------ -----------------
zhuo.zhuo AAATruAAFAAAAj8AAA 1578
zhuo.zhuo AAATruAAFAAAAj8AAB 1578
zhuo.zhuo AAATruAAFAAAAj8AAC 1578
zhuo.zhuo AAATruAAFAAAAj8AAD 1578
zhuo.zhuo AAATruAAFAAAAj8AAE 1578
zhuo.zhuo AAATruAAFAAAAj8AAF 1578
zhuo.zhuo AAATruAAFAAAAj8AAG 1578
zhuo.zhuo AAATruAAFAAAAj8AAH 1578
zhuo.zhuo AAATruAAFAAAAj8AAI 1578
9 rows selected.
–查询备份表记录
SQL> select count(*) from zhuo.zhuo_new;
COUNT(*)
----------
80000
80066-80000=66和被破坏块中记录一致,证明所有好块中记录全部被找回来。
说明:
SQL> exec dbms_repair.skip_corrupt_blocks('ZHUO','ZHUO');
PL/SQL procedure successfully completed.
SQL> select skip_corrupt from dba_tables where table_name='ZHUO' AND OWNER='ZHUO';
SKIP_COR
--------
ENABLED
SQL> select /*+full(t) */ count(*) from zhuo.zhuo t;
COUNT(*)
----------
80000
此处跳过坏块和pl/sql抽取数据的作用是一样的。
不存在非空索引–dbms_rowid.ROWID_CREATE来构造ROWID
–创建表并收集表统计信息
SQL> create table zhuo.zhuo as
select * from dba_objects;
Table created.
SQL> exec dbms_stats.gather_table_stats('ZHUO','ZHUO');
PL/SQL procedure successfully completed.
–查询表记录
SQL> select count(*) from zhuo.zhuo;
COUNT(*)
----------
80066
–查询extent信息
SQL> set pages 100
SQL> select file_id,block_id,block_id+blocks-1
2 from dba_extents
3 where segment_name ='ZHUO' AND owner='ZHUO';
FILE_ID BLOCK_ID BLOCK_ID+BLOCKS-1
---------- ---------- -----------------
5 1408 1415
5 1416 1423
5 1424 1431
5 1432 1439
5 1440 1447
5 1448 1455
5 1456 1463
5 1464 1471
5 1472 1479
5 1480 1487
5 1488 1495
5 1496 1503
5 1504 1511
5 1512 1519
5 1520 1527
5 1528 1535
5 1536 1663
5 1664 1791
5 1792 1919
5 1920 2047
5 2048 2175
5 2176 2303
5 2304 2431
5 2432 2559
5 2560 2687
25 rows selected.
–block 2500中的记录数
SQL> select count(*) from zhuo.zhuo where dbms_rowid.rowid_block_number(rowid)=2500;
COUNT(*)
----------
69
–关闭数据库
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
–破坏block 2500的数据块
[root@oracle11g ~]# dd if=/dev/zero of=/u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf bs=8192 count=1 seek=2500 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.00135099 s, 6.1 MB/s
–启动数据库
SQL> startup
ORACLE instance started.
Total System Global Area 521936896 bytes
Fixed Size 2254824 bytes
Variable Size 352323608 bytes
Database Buffers 163577856 bytes
Redo Buffers 3780608 bytes
Database mounted.
Database opened.
–查询报错
SQL> select count(*) from zhuo.zhuo;
select count(*) from zhuo.zhuo
*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 2500)
ORA-01110: data file 5:
'/u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf'
–创建备份表
SQL> create table zhuo.zhuo_new
2 as
3 select * from zhuo.zhuo where 1=2;
Table created.
–找回记录–参考mos:Doc ID 422547.1
set serveroutput on
set concat off
DECLARE
nrows number;
rid rowid;
dobj number;
ROWSPERBLOCK number;
BEGIN
ROWSPERBLOCK:=300; --估算最大的一个块中记录条数
nrows:=0;
select data_object_id into dobj
from dba_objects
where owner = 'ZHUO'
and object_name = 'ZHUO'
-- and subobject_name = '<table partition>' Add this condition if table is partitioned
;
for i in (select relative_fno, block_id, block_id+blocks-1 totblocks
from dba_extents
where owner = 'ZHUO'
and segment_name = 'ZHUO'
-- and partition_name = '<table partition>' Add this condition if table is partitioned
-- and file_id != <OFFLINED DATAFILE> This condition is only used if a datafile needs to be skipped due to ORA-376 (A)
order by extent_id)
loop
for br in i.block_id..i.totblocks loop
for j in 1..ROWSPERBLOCK loop
begin
rid := dbms_rowid.ROWID_CREATE(1,dobj,i.relative_fno, br , j-1);
insert into ZHUO.ZHUO_NEW
select /*+ ROWID(A) */ *
from ZHUO.ZHUO A
where rowid = rid;
if sql%rowcount = 1 then nrows:=nrows+1; end if;
if (mod(nrows,10000)=0) then commit; end if;
exception when others then null;
end;
end loop;
end loop;
end loop;
COMMIT;
dbms_output.put_line('Total rows: '||to_char(nrows));
END;
/
Total rows: 80000
PL/SQL procedure successfully completed.
Elapsed: 00:00:36.46
说明:针对这种方式,mos 文档建议我们先执行如下类似sql:
select AVG_ROW_LEN from dba_tables where table_name=upper(‘t_seg_header’);
然后估算一个block 中容纳的最大行数,以免在后面执行脚本的过程中,导致部分block数据无法抽取。
oracle的建议是,如果你上面查询的avg_row_len 是50,那么设置最大block rows就设置为100. 建议最低设置其2倍。 当然,我们可以的更大一些。
SQL> select AVG_ROW_LEN from dba_tables where table_name=upper('zhuo');
AVG_ROW_LEN
-----------
98
所以上面 ROWSPERBLOCK:=300; 设置了300,稍微大了一点。
–记录查询
SQL> select count(*) from zhuo.zhuo_new;
COUNT(*)
----------
79997
SQL> select 80066-79997 from dual;
80066-79997
-----------
69
证明非坏块中的数据都被完全寻找回来。
参考maclean的脚本:
创建错误表:
create table bad_rows (row_id rowid,oracle_error_code varchar2(50));
创建备份表:
SQL> create table zhuo_new as select * from zhuo where 1=2;
抽取数据:
SQL> set serveroutput on;
SQL> set timing on;
SQL>
SQL> DECLARE
2 TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
3 CURSOR Crowid_info IS
4 select Do.DATA_OBJECT_ID dataid,
5 DE.FILE_ID fid,
6 DE.BLOCK_ID blkid,
7 DE.BLOCKS blkcnt
8 from dba_objects DO, dba_extents DE
9 where DO.OBJECT_NAME = 'ZHUO'
10 and nvl(DO.SUBOBJECT_NAME,'-1') = nvl(DE.PARTITION_NAME,'-1')
11 and DO.OBJECT_NAME = DE.SEGMENT_NAME
12 and DO.owner = 'ZHUO'
13 order by 1, 2, 3 asc;
14 bad_rows number := 0;
15 errors varchar2(500);
16 error_code varchar2(500);
17 myrowid rowid;
18 BEGIN
19 execute immediate 'alter session set commit_write=''batch,nowait'' ';
20 for i in Crowid_info loop
21 for j in 0 .. i.blkcnt - 1 loop
22 for z in 0 .. 300 loop
23 begin
24 myrowid := dbms_rowid.ROWID_CREATE(1,i.dataid,i.fid,i.blkid + j,z);
25 insert into zhuo_new select /*+ ROWID(A) */ * from zhuo.zhuo A where rowid = myrowid;
26 EXCEPTION
27 when OTHERS then
28 BEGIN
29 errors := SQLERRM;
30 error_code := SQLCODE;
31 if (error_code like '%1410%' or error_code like '%8103%' or error_code like '%1578%') then
32 bad_rows := bad_rows + 1;
33 insert into bad_rows values (myrowid, error_code);
34 commit;
35 else
36 raise;
37 end if;
38 END;
39 commit;
40 end;
41 end loop;
42 end loop;
43 end loop;
44 dbms_output.put_line('Total Bad Rows: ' || bad_rows);
45 commit;
46 END;
47 /
Total Bad Rows: 40635
PL/SQL procedure successfully completed.
Elapsed: 00:00:39.25
时间差不读,原理是一样的
至此,测试完成。
下面再补充一个分区表有块损坏,没有索引的情况:
创建表:
SQL> conn zhuo/zhuo;
create table zhuo (t1 int,t2 date default sysdate)
partition by range(t1)
(partition p1 values less than (10000),
partition p2 values less than (20000),
partition p3 values less than (30000),
partition p4 values less than (40000),
partition p5 values less than (50000),
partition p6 values less than (60000),
partition p7 values less than (70000),
partition p8 values less than (80000),
partition p9 values less than (90000),
partition p10 values less than (100000),
partition p11 values less than (110000),
partition p12 values less than (120000),
partition p13 values less than (130000),
partition p14 values less than (140000),
partition p15 values less than (150000),
partition p16 values less than (160000))
;
Table created.
SQL> insert into zhuo(t1) select rownum from dual connect by level<160000;
159999 rows created.
SQL> commit;
Commit complete.
SQL> conn / as sysdba
Connected.
SQL> exec dbms_stats.gather_table_stats('ZHUO','ZHUO');
PL/SQL procedure successfully completed.
SQL> alter system flush buffer_cache;
System altered.
SQL> /
System altered.
随即采样一些块来做 坏块 约涉及到5个块的数据
set linesize 200 pagesize 1400
select dbms_rowid.rowid_block_number(rowid) blkid,
dbms_rowid.rowid_relative_fno(rowid) rfile
from zhuo.zhuo sample(0.01)
where rownum <= 200
group by dbms_rowid.rowid_block_number(rowid),
dbms_rowid.rowid_relative_fno(rowid)
order by 1;
SQL> set linesize 200 pagesize 1400
SQL>
SQL> select dbms_rowid.rowid_block_number(rowid) blkid,
2 dbms_rowid.rowid_relative_fno(rowid) rfile
3 from zhuo.zhuo sample(0.01)
4 where rownum <= 200
5 group by dbms_rowid.rowid_block_number(rowid),
6 dbms_rowid.rowid_relative_fno(rowid)
7 order by 1;
BLKID RFILE
---------- ----------
4520 5
4535 5
5556 5
7578 5
7602 5
7610 5
9646 5
10685 5
11691 5
12707 5
12734 5
13745 5
13758 5
16829 5
14 rows selected.
SQL> select count(*) from zhuo.zhuo where dbms_rowid.rowid_block_number(rowid) in (4520,4535,5556,7578,7602);
COUNT(*)
----------
2015
这5个块里面共有2015条数据。
制造坏块
RMAN> blockrecover datafile 4 block 4520,4535,5556,7578,7602 clear;
Starting recover at 08-DEC-20
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=33 device type=DISK
Finished recover at 08-DEC-20
创建备份表
drop table zhuo.zhuo_bak;
create table zhuo.zhuo_bak as select * from zhuo.zhuo where 1=0;
drop table zhuo.bad_rows;
create table zhuo.bad_rows (row_id rowid,oracle_error_code varchar2(50));
根据mos,确认每个块存储的最大数据行数
SQL> select AVG_ROW_LEN from dba_tables where table_name=upper('zhuo');
AVG_ROW_LEN
-----------
13
抽取数据:
set serveroutput on;
set timing on;
DECLARE
TYPE RowIDTab IS TABLE OF ROWID INDEX BY BINARY_INTEGER;
CURSOR Crowid_info IS
select Do.DATA_OBJECT_ID dataid,
DE.FILE_ID fid,
DE.BLOCK_ID blkid,
DE.BLOCKS blkcnt
from dba_objects DO, dba_extents DE
where DO.OBJECT_NAME = 'ZHUO'
and nvl(DO.SUBOBJECT_NAME,'-1') = nvl(DE.PARTITION_NAME,'-1')
and DO.OBJECT_NAME = DE.SEGMENT_NAME
and DO.owner = 'ZHUO'
order by 1, 2, 3 asc;
bad_rows number := 0;
errors varchar2(500);
error_code varchar2(500);
myrowid rowid;
BEGIN
execute immediate 'alter session set commit_write=''batch,nowait'' ';
for i in Crowid_info loop
for j in 0 .. i.blkcnt - 1 loop
for z in 0 .. 1000 loop
begin
myrowid := dbms_rowid.ROWID_CREATE(1,i.dataid,i.fid,i.blkid + j,z);
insert into zhuo.zhuo_bak select /*+ ROWID(A) */ * from zhuo.zhuo A where rowid = myrowid;
EXCEPTION
when OTHERS then
BEGIN
errors := SQLERRM;
error_code := SQLCODE;
if (error_code like '%1410%' or error_code like '%8103%' or error_code like '%1578%') then
bad_rows := bad_rows + 1;
insert into zhuo.bad_rows values (myrowid, error_code);
commit;
else
raise;
end if;
END;
commit;
end;
end loop;
end loop;
end loop;
dbms_output.put_line('Total Bad Rows: ' || bad_rows);
commit;
END;
/
Total Bad Rows: 15663648
PL/SQL procedure successfully completed.
Elapsed: 02:35:29.33
SQL> select count(*) from zhuo.zhuo_bak;
COUNT(*)
----------
157984
耗时很长。损失了5个块的数据。
经测试,当z取值100的时候,缺少数据,才抽取了10分之一的数据,按mos说法,取2倍即可,但是在此处并不适用,但是耗时只有15min,所以此处使用1000行来循环。所以必须严格确定z即每个块数据行数的大小。
对前面的文章做个梳理:
参考:
Extract rows from a CORRUPT table creating ROWID from DBA_EXTENTS (Doc ID 422547.1)
https://www.xifenfei.com/2012/09/%e4%bd%bf%e7%94%a8plsql%e6%8a%a2%e6%95%91%e6%95%b0%e6%8d%ae.html