优化实例恢复性能:快速启动故障恢复

最新推荐文章于 2021-08-19 15:10:51 发布

Rhys-Oracle

最新推荐文章于 2021-08-19 15:10:51 发布

阅读量1.6k

点赞数

分类专栏： ORACLE 之备份与恢复

本文链接：https://blog.csdn.net/rhys_oracle/article/details/17011771

版权

ORACLE 之备份与恢复专栏收录该内容

21 篇文章 0 订阅

订阅专栏

优化实例恢复性能:快速启动故障恢复

今天是2013-11-28，今天我一个同事问了我一个实例恢复的问题，暂且不谈这个问题，我整理了一下之前学习的笔记，现在发布出来。
1）about instance recovery
当数据库突然宕机（如shutdown abort）或系统突然停电宕机，那么数据库实例就会自动的进行应用redo log进行crash recovery。当数据库正常关闭如（shutdown immediate，shutdown，shutdown transactional）那么系统dbwr就会把内存的数据在进行checkpoint之前写入到数据文件磁盘。
然而，如果一个单实例或是多实例（如rac）突然crash了。那么数据库将在下一次启动的世界进行数据库实例恢复，这个回复是通过pmon进行的。当在rac环境下存在多个instance，当一个或是多个实例crash了，那么存活的实例将会进行自动的实例恢复操作。实例恢复分为两个步骤，一个是cache recovery 然后是transaction recovery。
注意只要完成了个了cache recovery 那么数据库就会被打开了，这个地方需要注意，所以提高cache recovery的效率对高可用性非常的重要。
明白几个数语。
1、cache recovery
cache recovery 或者叫做前滚，当数据库在进行恢复的步骤时，它会应用所有已经提交的或是没有提交的redo log 文件去改变数据块。这个时间受检查点到宕机的时间和执行的事务大小有关系。
2、transaction recovery
transaction recovery 又叫回滚，它是为了让数据库达到一致性，没有进行提交的事务认为是无效的需要undone，在这个过程中oracle应用回滚段去重做没有提交的数据改变。
（这里就先不介绍redo和undo的知识了，后续这块是有的。）
2）检查点和cache recovery
定期的，oracle会进行检查点，检查点事最高的scn，例如所有的数据块的scn都要小于或者等于这个scn，如果等于这个scn那么说明所有的数据改变就会写入到了数据文件中了。如果故障发生了，然后只有这个日志记录包含了scn高于检查点需要应用日志进行recovery，恢复的持续时间有两个因素决定，一个是所有改变的数据块高于检查点的scn的数目，另一个是日志块数目需要进行读取和应用这些改变。
3）how checkpoints affect performance
Frequent checkpointing writes dirty buffers to the data files more often than otherwise, and so reduces cache recovery time in the event of an instance failure. If checkpointing is frequent, then applying the redo records in the redo log between the current checkpoint position and the end of the log involves processing relatively few data blocks. This means that the cache recovery phase of recovery is fairly short.

However, in a high-update system, frequent checkpointing can reduce run-time performance, because checkpointing causes DBWn processes to perform writes.
总结一下：可以看出，减少cache recovery的时间，需要经常的让数据库进行检查点，然后可以应用少量的日志去进行故障恢复。但是在一个高的update系统中，频繁的checkpoint 势必会加剧数据库的性能开销。
如果系统运作效率更加重要，那么就需要减少checkpoint给系统带啦的压力了，那么这也会增大了系统在故障后进行的实例恢复时间了。
那么这个时候呢，我们在分析一下：fast_start_mttr_target参数的影响。
开始进入我今天要学习的fast_start_mttr_target 参数的设置与优化影响。

先看一下官方文档对该参数的简单描述：
FAST_START_MTTR_TARGET

Property	Description
Parameter type	Integer
Default value	`0`
Modifiable	`ALTER SYSTEM`
Range of values	0 to 3600 seconds
Basic	No
Oracle RAC	Multiple instances can have different values, and you can change the values at runtime.

FAST_START_MTTR_TARGET enables you to specify the number of seconds the database takes to perform crash recovery of a single instance. When specified, FAST_START_MTTR_TARGET is overridden by LOG_CHECKPOINT_INTERVAL.

从这点我们了解到，value最大设置为1小时，这个参数需要用alter system进行设置等等。
在看一下官网是怎么解释的，呵呵：

The Fast-Start Fault Recovery feature reduces the time required for cache recovery, and makes the recovery bounded and predictable by limiting the number of dirty buffers and the number of redo records generated between the most recent redo record and the last checkpoint.

The foundation of Fast-Start Fault Recovery is the Fast-Start checkpointing architecture. Instead of conventional event-driven (that is, log switching) checkpointing, which does bulk writes, fast-start checkpointing occurs incrementally. Each DBWn process periodically writes buffers to disk to advance the checkpoint position. The oldest modified blocks are written first to ensure that every write lets the checkpoint advance. Fast-Start checkpointing eliminates bulk writes and the resultant I/O spikes that occur with conventional checkpointing.

With the Fast-Start Fault Recovery feature, the FAST_START_MTTR_TARGET initialization parameter simplifies the configuration of recovery time from instance or system failure. FAST_START_MTTR_TARGET specifies a target for the expected mean time to recover (MTTR), that is, the time (in seconds) that it should take to start up the instance and perform cache recovery. After FAST_START_MTTR_TARGET is set, the database manages incremental checkpoint writes in an attempt to meet that target. If you have chosen a practical value for FAST_START_MTTR_TARGET, you can expect your database to recover, on average, in approximately the number of seconds you have chosen.

Note:

You must disable or remove the FAST_START_IO_TARGET, LOG_CHECKPOINT_INTERVAL, and LOG_CHECKPOINT_TIMEOUT initialization parameters when using FAST_START_MTTR_TARGET. Setting these parameters interferes with the mechanisms used to manage cache recovery time to meet FAST_START_MTTR_TARGET.

总结一下：你可以给这个参数设置一个value，然后呢数据库会根据这个value进行limit checkpoint之前的diry buffer number和redo number。然后呢这个参数可以优化了以前常规检查点（完全检查点）的一些弊端，而是采用了增量检查点机制，当然了这个参数也有一些注意的地方。（简略了，文档谁都能看懂）。
然后设置这个值有需要看看下面这个说法：

In principle, the minimum value for FAST_START_MTTR_TARGET is one second. However, the fact that you can set FAST_START_MTTR_TARGET this low does not mean that this target can be achieved. There are practical limits to the minimum achievable MTTR target, due to such factors as database startup time.

修改这个参数使用如下命令：
alter system set fast_start_mttr_target=value;

The MTTR target that your database can achieve given the current value of FAST_START_MTTR_TARGET is called the effective MTTR target. You can view your current effective MTTR by viewing the TARGET_MTTR column of the V$INSTANCE_RECOVERY view.

好了，然后看一下log_checkpoint_interval:

官方介绍如下：

LOG_CHECKPOINT_INTERVAL

Property	Description
Parameter type	Integer
Default value	`0`
Modifiable	`ALTER SYSTEM`
Range of values	`0` to 2³¹ - 1
Basic	No
Oracle RAC	Multiple instances can have different values.

LOG_CHECKPOINT_INTERVAL specifies the frequency of checkpoints in terms of the number of redo log file blocks that can exist between an incremental checkpoint and the last block written to the redo log. This number refers to physical operating system blocks, not database blocks.

Regardless of this value, a checkpoint always occurs when switching from one online redo log file to another. Therefore, if the value exceeds the actual redo log file size, checkpoints occur only when switching logs. Checkpoint frequency is one of the factors that influence the time required for the database to recover from an unexpected failure.

Notes:

Specifying a value of 0 (zero) for LOG_CHECKPOINT_INTERVAL has the same effect as setting the parameter to infinity and causes the parameter to be ignored. Only nonzero values of this parameter are considered meaningful.
Recovery I/O can also be limited by setting the LOG_CHECKPOINT_TIMEOUT parameter or by the size specified for the smallest redo log. For information on which mechanism is controlling checkpointing behavior, query theV$INSTANCE_RECOVERY view.

总结：log_checkpoint_intervla ；指定从增量检查点到下一次写入redo log 的block number，这个数据块是以操纵系统的数据块为准的。

在看一下，log_checkpoint_timeout:

LOG_CHECKPOINT_TIMEOUT

Property	Description
Parameter type	Integer
Default value	`1800`
Modifiable	`ALTER SYSTEM`
Range of values	`0` to 2³¹ - 1
Basic	No
Oracle RAC	Multiple instances can have different values.

LOG_CHECKPOINT_TIMEOUT specifies (in seconds) the amount of time that has passed since the incremental checkpoint at the position where the last write to the redo log (sometimes called the tail of the log) occurred. This parameter also signifies that no buffer will remain dirty (in the cache) for more than integerseconds.

Specifying a value of 0 for the timeout disables time-based checkpoints. Hence, setting the value to 0 is not recommended unless FAST_START_MTTR_TARGET is set.

Notes:

A checkpoint scheduled to occur because of this parameter is delayed until the completion of the previous checkpoint if the previous checkpoint has not yet completed.
Recovery I/O can also be limited by setting the LOG_CHECKPOINT_INTERVAL parameter or by the size specified for the smallest redo log. For information on which mechanism is controlling checkpointing behavior, query theV$INSTANCE_RECOVERY view.

在来看这么一个参数fast_start_parallel_rollback ：

FAST_START_PARALLEL_ROLLBACK

Property	Description
Parameter type	String
Syntax	`FAST_START_PARALLEL_ROLLBACK = { HIGH \| LOW \| FALSE }`
Default value	`LOW`
Modifiable	`ALTER SYSTEM`
Basic	No

FAST_START_PARALLEL_ROLLBACK specifies the degree of parallelism used when recovering terminated transactions. Terminated transactions are transactions that are active before a system failure. If a system fails when there are uncommitted parallel DML or DDL transactions, then you can speed up transaction recovery during startup by using this parameter.

Values:

FALSE

Parallel rollback is disabled
LOW

Limits the maximum degree of parallelism to 2 * CPU_COUNT
HIGH

Limits the maximum degree of parallelism to 4 * CPU_COUNT

If you change the value of this parameter, then transaction recovery will be stopped and restarted with the new implied degree of parallelism.

可知，假如恢复缓慢，可以设置该参数加快恢复进度，orale 使用另个特点增加事务的恢复，一个是fast-start on_demand rollback 另一就是现在的fast-start parallel rollback,他们都属于fast-start fault recovery的一部分。

案例演示：

SQL> create table amy as select * from scott.emp;

Table created.

SQL> begin
2 for i in 1..20 loop
3 insert into amy select * from amy;
4 end loop;
5 end;
6 /

PL/SQL procedure successfully completed.

SQL> select count(*) from amy;

COUNT(*)
----------
14680064

SQL>

SQL> show parameter fast_start_

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_io_target                 integer     0
fast_start_mttr_target               integer     0
fast_start_parallel_rollback         string      LOW
SQL>

SQL> show parameter cpu_count

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cpu_count integer 1
SQL>

SQL> delete from amy;
另一个session：

SQL> conn sys/root as sysdba
Connected.
SQL>
SQL> shutdown abort;
ORACLE instance shut down.
SQL>

恶魔到临：

SQL> startup
ORACLE instance started.

Total System Global Area 471830528 bytes
Fixed Size                  2254344 bytes
Variable Size             197134840 bytes
Database Buffers          268435456 bytes
Redo Buffers                4005888 bytes
Database mounted.
Database opened.
SQL> select usn,extents,rssize,writes,xacts,status from v$rollstat where xacts>0;

USN EXTENTS RSSIZE WRITES XACTS STATUS
---------- ---------- ---------- ---------- ---------- ---------------
5 206 526245888 0 1 ONLINE

SQL> select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions;

       USN        SLT        SEQ STATE            UNDOBLOCKSDONE UNDOBLOCKSTOTAL
---------- ---------- ---------- ---------------- -------------- ---------------
   CPUTIME
----------
         5         27       1329 RECOVERING                19032           64072
        39

SQL> set linesize 200
SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 19582 64072 44

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 19620 64072 47

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 20383 64072 48

SQL>
SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 20416 64072 49

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 20535 64072 51

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 20766 64072 52

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 20961 64072 53

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 21160 64072 54

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 21550 64072 56

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 21937 64072 57

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 22225 64072 58

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 22245 64072 59

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 22491 64072 60

SQL> show parameter fast_start

System altered.

SQL> select usn,extents,rssize,writes,xacts,status from v$rollstat where xacts>0;

select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions;

USN EXTENTS RSSIZE WRITES XACTS STATUS
---------- ---------- ---------- ---------- ---------- ---------------
5 206 526245888 0 1 ONLINE

SQL> SQL>
USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 626 35288 6

SQL>
SQL> select * from v$fast_start_servers;

STATE       UNDOBLOCKSDONE        PID XID
----------- -------------- ---------- ----------------
RECOVERING             626         24 05001B0031050000
RECOVERING               0         29 05001B0031050000
RECOVERING               0         31 05001B0031050000
RECOVERING               0         32 05001B0031050000

SQL> select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions;

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 2288 35288 14

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 2530 35288 15

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 2883 35288 16

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 2948 35288 18

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 3091 35288 19

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 3232 35288 21

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 3397 35288 22

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 4257 35288 26

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 4374 35288 27

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 4582 35288 28

SQL> r
1* select usn,slt,seq,state,undoblocksdone,undoblockstotal,cputime from v$fast_start_transactions

USN SLT SEQ STATE UNDOBLOCKSDONE UNDOBLOCKSTOTAL CPUTIME
---------- ---------- ---------- ---------------- -------------- --------------- ----------
5 27 1329 RECOVERING 4811 35288 29

SQL>

可以看到设置high后，数据库的恢复显然要高出很多。恢复的时间也大大缩短了。

that's all!