一、 简介
在Oracle11.2版本之后,若搭建实时应用日志的物理备库,那么在主库数据文件少量坏块的情况下,可以利用ABCR技术快速修复坏块。
Starting in Oracle Database 11g Release 2 (11.2), the primary database automatically attempts to repair the corrupted block in real time by fetching a good version of the same block from a physical standby database. This capability is referred to as automatic block repair, and it allows corrupt data blocks to be automatically repaired as soon as the corruption is detected. Automatic block repair reduces the amount of time that data is inaccessible due to block corruption. It also reduces block recovery time by using up-to-date good blocks in real-time, as opposed to retrieving blocks from disk or tape backups, or from Flashback logs.
ABCR 功能是否开启由隐含参数 _auto_bmr 控制,默认为 ENABLE
col NAME for a30
col VALUE for a30
col describ for a40
SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ FROM x$ksppi x,x$ksppcv y
WHERE x.inst_id = USERENV ('Instance')
AND y.inst_id = USERENV ('Instance')
AND x.indx = y.indx
AND x.ksppinm LIKE '%_auto_bmr%';
NAME VALUE DESCRIB
------------------------------ ------------------------------ ----------------------------------------
_auto_bmr ENABLE enable/disable Auto BMR
_auto_bmr_req_timeout 60 Auto BMR Requester Timeout
_auto_bmr_sess_threshold 30 Auto BMR Request Session Threshold
_auto_bmr_pub_timeout 10 Auto BMR Publish Timeout
_auto_bmr_fc_time 60 Auto BMR Flood Control Time
_auto_bmr_bg_time 3600 Auto BMR Process Run Time
_auto_bmr_sys_threshold 100 Auto BMR Request System Threshold
_auto_bmr_max_rowno 1024 x$krbabrstat Max number of rows
二、 DB运行时的坏块修复
创建测试表
create table T_OBJS as select * from all_objects ;
create index baiyang.idx_name on baiyang.T_OBJS(OBJECT_NAME);
select segment_name,header_file , header_block,blocks from dba_segments where OWNER ='BAIYANG' and segment_name ='T_OBJS';
OWNER SEGMENT_NAME HEADER_FILE HEADER_BLOCK BLOCKS
------------------------------ ----------------- ----------- ------------ ----------
SYS T_OBJS 4 300 1280
首先测试下在不开启ABCR功能的情况下,坏块是否可以自动修复
alter system set "_auto_bmr" = disabled;
搞破坏
dd if=/dev/zero of=/u01/app/oracle/oradata/orcl/users01.dbf bs=8192 seek=300 count=2 conv=notrunc
查询报错
alter system flush buffer_cache;
select count(*) from baiyang.T_OBJS;
> select count(*) from baiyang.T_OBJS
> *
> ERROR at line 1:
> ORA-01578: ORACLE data block corrupted (file # 4, block # 300)
> ORA-01110: data file 4: '/u01/app/oracle/oradata/orcl/users01.dbf'
确认坏块的对象,并人为修复
select * from v$database_block_corruption;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
4 300 2 0 ALL ZERO
RMAN> blockrecover datafile 4 block 301;
RMAN> blockrecover datafile 4 block 301;
开启ABMR时又怎么样呢?
alter system set "_auto_bmr" = enabled;
搞破坏
dd if=/dev/zero of=/u01/app/oracle/oradata/orcl/users01.dbf bs=8192 seek=400 count=2 conv=notrunc
sys@ORCL> select * from v$database_block_corruption;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
4 400 2 0 ALL ZERO
alter system flush buffer_cache;
select count(*) from baiyang.T_OBJS;
COUNT(*)
----------
84428
#查询正常
查看告警日志,自动修复坏块
Wed Dec 05 15:55:59 2018
Automatic block media recovery successful for (file# 4, block# 401)
Automatic block media recovery successful for (file# 4, block# 402)
三、 用户表有坏块且强制关闭实例
现在测试一下,当用户表 T_OBJS 有块损坏并强制关闭实例的情况下,会不会做自动修复。
select segment_name,header_file , header_block,blocks from dba_segments where OWNER ='BAIYANG' and segment_name ='T_OBJS';
SEGMENT_NAME HEADER_FILE HEADER_BLOCK BLOCKS
--------------------------------------------------------------------------------- ----------- ------------ ----------
T_OBJS 4 298 48
sys@ORCL> select name ,file# from v$datafile;
NAME FILE#
---------------------------------------------------------------------------------------------------- ----------
/u01/app/oracle/oradata/standby/datafile/system01.dbf 1
/u01/app/oracle/oradata/standby/datafile/sysaux01.dbf 2
/u01/app/oracle/oradata/standby/datafile/undotbs01.dbf 3
/u01/app/oracle/oradata/standby/datafile/users01.dbf 4
/u01/app/oracle/product/11.2.0/db_1/dbs/lxx2.dbf 5
搞破坏并强制关闭主库(建议提前备份)
dd if=/dev/zero of=/u01/app/oracle/oradata/standby/datafile/users01.dbf bs=8192 seek=298 count=3 conv=notrunc
2+0 records in
2+0 records out
16384 bytes (16 kB) copied, 0.000394806 s, 41.5 MB/s
dd if=/dev/zero of=/u01/app/oracle/oradata/standby/datafile/users01.dbf bs=8192 seek=300 count=2 conv=notrunc
2+0 records in
2+0 records out
16384 bytes (16 kB) copied, 0.000394806 s, 41.5 MB/s
# 强制关闭数据库
sys@ORCL> shutdown abort
ORACLE instance shut down.
# 打开数据库,数据库可以正常打开
sys@ORCL> startup
ORACLE instance started.
Total System Global Area 2455228416 bytes
Fixed Size 2255712 bytes
Variable Size 620758176 bytes
Database Buffers 1811939328 bytes
Redo Buffers 20275200 bytes
Database mounted.
Database opened.
先使用RMAN验证下数据,并确认有坏块
RMAN> backup validate database datafile 4;
sys@ORCL> select * from v$database_block_corruption;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
4 304 2 0 ALL ZERO
4 299 3 0 ALL ZERO
如果直接查询表,那么坏块会被自动修复
sys@ORCL> select count(*) from BAIYANG.T_OBJS;
COUNT(*)
----------
7513
# 以下全表查询时会检索有所的数据块,此时会修复坏块
sys@ORCL> select * from BAIYANG.T_OBJS;
…………
DBA_STREAMS_STMT_HANDLERS
_DBA_STREAMS_STMTS
DBA_STREAMS_STMTS
DBA_STREAMS_STMTS
_DBA_APPLY_CHANGE_HANDLERS
DBA_APPLY_CHANGE_HANDLERS
DBA_APPLY_CHANGE_HANDLERS
8001 rows selected.
alert日志输出如下
Corrupt Block Found
TSN = 4, TSNAME = USERS
RFN = 4, BLK = 299, RDBA = 16777515
OBJN = 87356, OBJD = 87356, OBJECT = T_OBJS, SUBOBJECT =
SEGMENT OWNER = BAIYANG, SEGMENT TYPE = Table Segment
Corrupt Block Found
TSN = 4, TSNAME = USERS
RFN = 4, BLK = 300, RDBA = 16777516
OBJN = 87356, OBJD = 87356, OBJECT = T_OBJS, SUBOBJECT =
SEGMENT OWNER = BAIYANG, SEGMENT TYPE = Table Segment
Sat Dec 08 13:50:31 2018
Automatic block media recovery successful for (file# 4, block# 301)
Sat Dec 08 13:50:31 2018
Automatic block media recovery successful for (file# 4, block# 299)
Automatic block media recovery successful for (file# 4, block# 299)
………………
Corrupt Block Found
TSN = 4, TSNAME = USERS
RFN = 4, BLK = 304, RDBA = 16777520
OBJN = 87356, OBJD = 87356, OBJECT = T_OBJS, SUBOBJECT =
SEGMENT OWNER = BAIYANG, SEGMENT TYPE = Table Segment
Automatic block media recovery successful for (file# 4, block# 305)
Automatic block media recovery successful for (file# 4, block# 304)
Automatic block media recovery successful for (file# 4, block# 304)
Automatic block media recovery successful for (file# 4, block# 305)
再次检查坏块情况
select * from v$database_block_corruption;
no rows selected
说明用户表块损坏不影响实例启动,启动时不验证用户表是否正常。启动后若有ADG,访问数据表时会自动修复坏块。
四、 系统表有坏块且强制关闭实例
下面测试系统表OBJ$ 有坏块并强制关闭实例的情况下,会不会做自动修复。
--查看对象OBJ$的segment状态
select segment_name,header_file , header_block,blocks from dba_segments where OWNER ='SYS' and segment_name ='OBJ$';
SEGMENT_NAME HEADER_FILE HEADER_BLOCK BLOCKS
--------------------------------------------------------------------------------- ----------- ------------ ----------
OBJ$ 1 240 1152
搞破坏,并强制重启数据库
dd if=/dev/zero of=/u01/app/oracle/oradata/standby/datafile/system01.dbf bs=8192 seek=240 count=2 conv=notrunc
2+0 records in
2+0 records out
16384 bytes (16 kB) copied, 0.000247249 s, 66.3 MB/s
sys@ORCL> shutdown abort
sys@ORCL> startup
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01578: ORACLE data block corrupted (file # 1, block # 241)
ORA-01110: data file 1: '/u01/app/oracle/oradata/standby/datafile/system01.dbf'
Process ID: 533
Session ID: 5 Serial number: 3
会发现已经无法open了,尝试启动到mount状态,使用RMAN验证数据文件。
# 启动到mount状态
idle> startup mount
ORACLE instance started.
Total System Global Area 2455228416 bytes
Fixed Size 2255712 bytes
Variable Size 620758176 bytes
Database Buffers 1811939328 bytes
Redo Buffers 20275200 bytes
Database mounted.
# 使用RMAN验证文件状态
RMAN> backup validate database datafile 1;
idle> select * from v$database_block_corruption;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
1 240 2 0 ALL ZERO
修复并打开数据库
# 修复数据块
RMAN> blockrecover datafile 1 block 241;
Starting recover at 08-DEC-18
using channel ORA_DISK_1
searching flashback logs for block images until SCN 3125428
finished flashback log search, restored 1 blocks
starting media recovery
media recovery complete, elapsed time: 00:00:01
Finished recover at 08-DEC-18
# 打开数据库
idle> alter database open;
Database altered.
# 查询坏块已经被修复
idle> select * from v$database_block_corruption;
no rows selected
通过测试可以看到ABCR无法修复系统表坏块,系统表坏块会导致数据库无法open,但可以利用RMAN进行修复。若坏块在索引,可以考虑删除或者重建。在这个测试场景中,还可以直接闪回数据库到正常状态。
数据库块的修复方式还有多种,如RMAN blockrecover、RMAN datafilerecover、DatabaseRecoveryAdvisor、DGswitchover等,ABCR无疑是最省心的一种,建议搭建ADG的主备架构,避免故障触发。
参考
https://developer.aliyun.com/article/675344