checkpoint改语言设置_Metalink《CHECKPOINT TUNING AND ERROR HANDLING》翻译版本-CSDN博客

本文链接：https://blog.csdn.net/weixin_31164189/article/details/111946795

CHECKPOINT TUNING AND ERROR HANDLING

检查点优化和错误处理

1.What is a Checkpoint?

什么是检查点？

A Checkpoint is a database event which synchronizes the modified

data blocks in memory with the datafiles on

disk. 一个检查点是使在内存中被修改的数据和磁盘上的数据文件进行同步的一个数据库事件。

It offers Oracle the means for ensuring the consistency of data

modified by transactions. 它提供ORACLE作为确报被事务修改的数据的一致性的用途。

The mechanism of writing modified blocks on disk in Oracle is

not synchronized with the commit of the corresponding

transactions.

原理就是在ORACLE中写被修改的块到磁盘中不是和对应事务提交同步的。

A checkpoint has two purposes: (1) to establish data

consistency, and (2) enable faster database

recovery. 一个检查点有两个目的：(1)确保数据一致性，和(2)更快的数据库恢复。

How is recovery faster?

怎么使恢复更快？

Because all database changes up to the checkpoint have been

recorded in the datafiles, making it unnecessary to apply redo log

entries prior to the checkpoint.

原因是所有的数据库变化到检查点已经被记录在数据文件中，使它没有必要再应用检查点之前的日志。

The checkpoint must ensure that all the modified buffers in the

cache are really written to the corresponding datafiles to avoid

the loss of data which may occur with a crash (instance or disk

failure).

检查点必须保证内存中所有被修改的缓存是真正的被写到了对应的数据文件中从而避免因崩溃(实例或者磁盘故障)导致数据丢失。

Oracle writes the dirty buffers to disk only on certain

conditions:

Oracle 只有在某些条件下才把脏缓存写到磁盘：

- A shadow process must scan more than

one-quarter of the db_block_buffer parameter.

后台进程扫描超过1/4

db_block_buffer参数(有点变扭，自己理解为当寻找数据块在链表上超过1/4)

注：database buffer

cache的大小8i以前是db_block_buffers*db_block_size的值，从9i开始就只需要设置db_cache_size

- Every three seconds.

每三秒钟

- When a checkpoint is produced.

当检查点产生

A checkpoint is realized o的n five types of events:

一个检查点在5种事件情况下产生：

- At each switch of the redo log files.

日志文件的每次切换

- When the delay for LOG_CHECKPOINT_TIMEOUT is

reached.

当LOG_CHECKPOINT_TIMEOUT 参数达到

- When the size in bytes corresponding to

(LOG_CHECKPOINT_INTERVAL* size of IO OS blocks)

is written on the current redo log file.

当(LOG_CHECKPOINT_INTERVAL * size of OS blocks)字节被写到当前日志文件

- Directly by the ALTER SYSTEM SWITCH LOGFILE

command.

直接使用ALTER

SYSTEM SWITCH LOGFILE 命令

- Directly with the ALTER SYSTEM CHECKPOINT

command.

直接使用ALTER

SYSTEM CHECKPOINT 命令

During a checkpoint the following occurs:

当一个检查点发生：

- The database writer (DBWR)

writes all modified database blocks in the buffer cache back to

datafiles,

DBWR进程把内存中所有修改过的数据块写到数据文件中

- Log writer (LGWR) updates

both the controlfile and the datafiles to indicate when the last

checkpoint occurred (SCN)

LGWR更新控制文件和数据文件的SCN以表明上次检查点发生

2. Checkpoints and Performance

检查点和性能

Checkpoints present a tuning dilemma for the Database

Administrator. 检查点使DBA处于优化进退两难的地步。

Frequent checkpoints will enable faster recovery, but can cause

performance degradation.

频繁的检查点可以使恢复更快，但会导致性能下降。

How then should the DBA address this?

然后DBA怎么处理处理这个？

Depending on the number of datafiles in a database, a checkpoint

can be a highly resource intensive operation,

since all datafile headers are frozen during the

checkpoint. 依赖于数据库中的数据文件的数量，一个检查点可以被认为是高度资源集中的操作，因为在检查点发生时所有数据文件头部都被冻结。

There is a performance trade-off regarding frequency of

checkpoints. 关于频繁的检查点是一个性能权限。

More frequent checkpoints enable faster database recovery after

a crash. 更频繁的检查点使数据库在崩溃后很快恢复。

This is why some customer sites which have a very low tolerance

for unscheduled system downtime will often choose this

option.

这也是一些对于事先未安排的系统宕机有很差容度的服务站点会选择这种操作的原因。

However, the performance degradation of frequent checkpoints may

not justify this philosophy in many cases.

但是在很多情况下，频繁的检查点导致性能的下降

Let''s assume the database is up and running 95% of the time,

and unavailable 5% of the time from infrequent instance

crashes

or hardware failures requiring database

recovery. 让我们来假设数据库95%的时间在运行，和5%由于极少情况下的实例CRASH或者需要数据库恢复的硬件错误而不可用

For most customer sites, it makes more sense to tune for the 95%

case rather than the rare 5% downtime.

对于大多数服务站点，调正95%的情况比5%的停机时间更有意义。

This bulletin assumes that performance is your number one

priority and so recommendations are made accordingly.

无法翻译

Therefore, your goal is to minimize the frequency of checkpoints

through tuning.

因此，你的目标就是通过优化减少检查点的频率。

Tuning checkpoints involves four key initialization

parameters:

优化检查点包括四个核心初始化参数：

- FAST_START_MTTR_TARGET

- LOG_CHECKPOINT_INTERVAL

- LOG_CHECKPOINT_TIMEOUT

- LOG_CHECKPOINTS_TO_ALERT

These parameters are discussed in detail below.

这些参数在下面将详细讨论.

Recommendations are also given for handling "checkpoint not

complete" messages found in the alert log,

which indicate a need to tune redo logs and checkpoints.

处理在日志文件中表明需要调整日志和检查点的 "checkpoint not complete" 信息的方法也会被建议。

3. Parameters related to incremental checkpointing

和增量检查点相关的参数

Note: Log file switches will always override checkpoints caused by

following parameters.

注意:日志文件切换经常会覆盖有下面参数引起的检查点

3.1 FAST_START_MTTR_TARGET

Since Oracle 9i FAST_START_MTTR_TARGET parameter is the preferred

method of tuning incremental checkpoint target.

从9I以来，参数FAST_START_MTTR_TARGET 是作为优化增加检查点的首选方法。

FAST_START_MTTR_TARGET enables you to specify the number of

seconds the database takes to perform crash recovery of a single

instance.

FAST_START_MTTR_TARGET 使你指定单实例数据库CRASH恢复的秒数。

Based on internal statistics, incremental checkpoint

automatically adjusts the checkpoint target to meet the requirement

of FAST_START_MTTR_TARGET.

根据内部统计，增量检查点自动调整检查点目标从而满足FAST_START_MTTR_TARGET的需要。

V$INSTANCE_RECOVERY.ESTIMATED_MTTR shows the current estimated

mean time to recover (MTTR) in seconds.

V$INSTANCE_RECOVERY.ESTIMATED_MTTR 表明当前MTTR(平均恢复时间)的估算秒数。

This value is shown even if FAST_START_MTTR_TARGET is not

specified.

及时FAST_START_MTTR_TARGET没有被指定，这个值也会被显示。

V$INSTANCE_RECOVERY.TARGET_MTTR shows the effective MTTR target

in seconds enforced by the system.

V$INSTANCE_RECOVERY.TARGET_MTTR 表示被系统执行有效的MTTR目标(秒)

V$MTTR_TARGET_ADVICE shows the number of I/Os resulted by the

current workload under the current MTTR setting

and the estimated number of I/Os that would be resulted by the

current workload under other MTTR settings.

V$MTTR_TARGET_ADVICE

表示在当前MTTR设置下的当前系统负载所产生的I/O数量和在其他MTTR设置下当前负载的IO估算量。

This view helps the user to assess the trade-off between runtime

performance and setting FAST_START_MTTR_TARGET to achieve better

recovery time.

这个视图帮助用户评估运行时候的性能和设置FAST_START_MTTR_TARGET达到更好恢复时间的平衡。

3.2 LOG_CHECKPOINT_INTERVAL

LOG_CHECKPOINT_INTERVAL parameter specifies the maximum number of

redo blocks the incremental checkpoint target should lag the

current log tail.

理解，但无法用语言来表达^_^

If FAST_START_MTTR_TARGET is specified, LOG_CHECKPOINT_INTERVAL

should not be set or set to 0.

如果FAST_START_MTTR_TARGET被指定了，LOG_CHECKPOINT_INTERVAL不应该设置或者设置为0.

On most Unix systems the operating system block size is 512

bytes.

在去多UNIX操作系统上块大小为512字节。

This means that setting LOG_CHECKPOINT_INTERVAL to a value of

10,000 would mean the incremental checkpoint

target should not lag the current log tail by more than 5,120,000

(5M) bytes.

这意思是设置LOG_CHECKPOINT_INTERVAL为10,000 增量检查点不应该落后于当期日志末尾5M

(注意对当期日志末尾的理解，不是日志文件的末尾，是当前在日志文件LGWR写的位置)

If the size of your redo log is 20M, you are taking 4

checkpoints for each log.

如果日志文件的大小为20M，你将在每个日志文件发生4次检查点。

LOG_CHECKPOINT_INTERVAL influences when a checkpoint occurs,

which means careful attention should be given to the

setting of this parameter,keeping it updated as the size of the

redo log files is changed. 当LOG_CHECKPOINT_INTERVAL影响一个检查点的发生，应该更加小心当日志文件的大小被改变，必须更新这个参数的设置

The checkpoint frequency is one of the factors which impacts the

time required for the database to recover from an unexpected

failure. 检查点的频率是数据库从意外故障恢复所需时间的重要一个因素

Longer intervals between checkpoints mean that if the system

crashes, more time will be needed for the database to

recover. 检查点之间较长的间隔意味着如果系统CRASH，将花费更多的时间来恢复。

Shorter checkpoint intervals mean that the database will recover

more quickly,

at the expense of increased resource utilization during the

checkpoint operation.

较短的检查点间隔意味着数据库会更快恢复，以在检查点操作更多资源的代价

This parameter also impacts the time required to complete a

database recovery operation during the roll forward phase of

recovery. 这个参数也反映在完成实例前滚恢复操作所需的时间

The actual recovery time is dependent upon this time, and other

factors, such as the type of failure

(instance or system crash, media failure, etc.), and the number of

archived redo logs which need to be applied.

实际恢复的时间依赖于这个时间，和其他因素，比如故障的类型(实例或者系统CRASH，介质故障，等等)，还有有多少归档文件需要被恢复的数目。

3.3 LOG_CHECKPOINT_TIMEOUT

The LOG_CHECKPOINT_TIMEOUT parameter specifies the maximum number

of seconds the incremental checkpoint target should lag the current

log tail.

LOG_CHECKPOINT_TIMEOUT 参数指定最大的时间增量检查点落后于当前日志末尾

In another word, it specifies how long a dirty buffer in buffer

cache can remain dirty.

换句话说，它指定在缓存中赃块能保证多长时间。

Oracle recommends using LOG_CHECKPOINT_INTERVAL to control the

checkpoint interval rather than LOG_CHECKPOINT_TIMEOUT,

which will initiate a checkpoint every "n" seconds, regardless of

the transaction frequency. ORACLE

建议使用LOG_CHECKPOINT_INTERVAL控制检查点的间隔而不是用会每隔N秒发起一个检查点，而不管事务的频率LOG_CHECKPOINT_TIMEOUT。

This can cause unnecessary checkpoints in cases where

transaction volumes vary.

倘若事务这可能导致不必要的检查点。

Unnecessary checkpoints must be avoided whenever possible for

optimal performance.

为了最优性能应该避免不必要的检查点。

It is a misconception that setting LOG_CHECKPOINT_TIMEOUT to a

given value will

initiate a log switch at that interval, enabling a recovery window

used for a stand-by database configuration. 这是个错误的想法通过设置LOG_CHECKPOINT_TIMEOUT参数值来发起周期的日志切换，为STANDBY数据库能够有恢复窗口。(翻译的有点烂)

Log switches cause a checkpoint,but a checkpoint does not cause a

log switch. 日志切换导致一个检查点，但检查点不会导致日志切换。

The only way to cause a log switch is manually with ALTER SYSTEM

SWITCH LOGFILE

or resizing the redo logs to cause more frequent

switches. 唯一的使日志切换的办法就是手工执行ALTER SYSTEM SWITCH LOGFILE 或者

改变日志文件的大小从而触发更频繁的切换。

This is controlled by operating system blocks, not a timed

interval.

这是由操作系统块控制而不是时间间隔。

Sizing of the online redo logs is critical for performance and

recovery.

日志文件的大小对于性能和恢复很重要。

See additional sections below on redo logs and

checkpoints.

下面有关于日志文件和检查点额外的部分

3.4 LOG_CHECKPOINTS_TO_ALERT

LOG_CHECKPOINTS_TO_ALERT lets you log your checkpoints to the alert

file.

LOG_CHECKPOINTS_TO_ALERT 让你在日志文件中记录你的检查点。

Doing so is useful for determining whether checkpoints are

occurring at the desired frequency.

这样做对于决定检查点在否在周期的发生很有用。

Prior to Oracle9i this parameter was STATIC.

在9I之前这个参数是静态的。

Oracle generally advises this be set to TRUE as the overhead is

negligible but the information in the alert log may be

useful.

ORALCE 通常建议设置为TRUE 因为代价几乎微不足道，而已这个在日志中的信息很有用

See Note 76713.1 to have more detail on How those instance

parameters can influence the checkpoint.

在Note 76713.1 有关于这些实例参数如何影响检查点有更详细的解释

4. Redo logs and Checkpoint

日志文件和检查点

A checkpoint occurs at every log switch. 在每次日志切换都发生检查点。

If a previous checkpoint is already in progress, the checkpoint

forced by the log switch will override the current

checkpoint.

如果先前的检查点已经在进行中，由日志切换引起的检查点会覆盖当期的检查点。

This necessitates well-sized redo logs to avoid unnecessary

checkpoints as a result of frequent log switches.

这个需求精心设计日志文件大小从而不会由于频繁的日志切换避免不必要的检查点。

The lag between the incremental checkpoint target and the log tail

is also limited by 90% of the smallest online log file size.

增量检查点和日志末尾之间的间隔被最小在线日志文件大小的90%所限制。

This makes sure that in most cases log switch would not need to

wait for checkpoint.

这样保证在大多数情况下日志切换没必要等检查点。

Because of this, log file sizes should be configured large

enough.

因为这个，日志文件的大小应该被设置足够大。

A good rule of thumb is to switch logs at most every twenty

minutes.

最好的经验法则就是至多20分钟切换日志。

Having your log files too small can increase checkpoint activity

and reduce performance.

你的日志文件太小会增加检查点的活跃和降低性能。

Oracle recommends the user to set all online log files to be the

same size, and have at least two log groups per

thread. Oracle 建议用户设置所有在线日志相同的大小和每个线程至少两个日志组。

The alert log is a valuabletool for monitoring the rate that log

switches occur, and subsequently, checkpoints occur.

日志文件作为监控日志切换速率和检查点发生的可视化工具 The following is an example of quick log switches from the alert

log:

下面就是日志文件中快速切换的例子

Fri May 16 17:15:43 1997

Thread 1 advanced to log sequence 1272

Current log# 3 seq# 1272 mem# 0:

/prod1/oradata/logs/redologs03.log

Thread 1 advanced to log sequence 1273

Current log# 1 seq# 1273 mem# 0:

/prod1/oradata/logs/redologs01.log

Fri May 16 17:17:25 1997

Thread 1 advanced to log sequence 1274

Current log# 2 seq# 1274 mem# 0:

/prod1/oradata/logs/redologs02.log

Thread 1 advanced to log sequence 1275

Current log# 3 seq# 1275 mem# 0:

/prod1/oradata/logs/redologs03.log

Fri May 16 17:20:51 1997

Thread 1 advanced to log sequence 1276

Current log# 1 seq# 1276 mem# 0:

/prod1/oradata/logs/redologs01.log

If redo logs switch every 3 minutes, you will see performance

degradation.

如果日志每三分钟切换，你将看到性能下降。

This indicates the redo logs are not sized large enough to

efficiently handle

the transaction load. size of the redolog files.

这表明日志文件设置的不够大来有效率的处理事务压力。改变日志文件的大小。

5. Understanding Checkpoint Error messages (“Cannot allocate new log” and “Checkpoint not complete”)

理解检查点错误信息("Cannot allocate new log"和"Checkpoint not complete")

Sometimes, you can see in your alert.log file, the following

corresponding messages:

有时候，你可以在日志文件里看到类型下面的信息

Thread 1 advanced to log sequence 248

Current log#

2 seq# 248 mem# 0: /prod1/oradata/logs/redologs02.log

Thread 1 cannot allocate new log, sequence

249

Checkpoint not complete

This message indicates that Oracle wants to reuse a redo log

file, but

the current checkpoint position is still in that log.

这个信息表明ORACLE想重用一个日志文件，但当前的检查点还是在那个日志中。

In this case, Oracle must wait until the checkpoint position

passes that log.

这种情况下，Oracle必须等待直到检查点经过那个日志。

Because the incremental checkpoint target never lags the current

log tail by more than 90%

of the smallest log file size,

因为增量检查点从不延迟离当期的日志末尾超过最小日志文件的90%

this situation may be encountered if DBWR writes too slowly, or

if a log switch happens before the log is completely full,

or if log file sizes are too small.

这种情况可能在DBWR写比较慢的时候，或者日志在未完全满而发生切换或者日志文件太小

When the database waits on checkpoints,redo generation is

stopped until the log switch is done.

当数据库在检查点上等待，日志为停止生产直到日志切换完成。

6. Oracle Release Information

Oracle 版本信息

In Oracle8i initialization parameter FAST_START_IO_TARGET causes

incremental

checkpoint to automatically adjusting its target so that the number

of data

blocks needed by recovery would be no more than

FAST_START_IO_TARGET.

在8I初始化参数FAST_START_IO_TARGET使增量检查点子弟哦那个调整因此

恢复所需要的数据块数量不会超过FAST_START_IO_TARGET

This parameter has been deprecated since Oracle 9i in favor of

parameter FAST_START_MTTR_TARGET.

这个参数从9I开发被废弃，由FAST_START_MTTR_TARGET代替

7. Using Statspack to determine Checkpointing problems

用Statspack确定检查点问题

Statspack snapshots can be taken every 15 minutes or so, these

reports gather useful

information about number of checkpoints started and checkpoints

completed and number

of database buffers written during checkpointing for that window of

time .

Statspack 大概每15分钟被采集，这些报告收集有用的数据，关于检查点开始结束的数目和在这段时间

有多少数据缓存写入。

It also contains statistics about redo activity.

它也包括关于日志活跃的统计信息

Gathering and comparing these snapshot reports gives you a

complete idea about checkpointing performance at different periods

of time.

收集和比较这些snapshot报告可以给你一个完整的关于在不同时间内检查点性能的概念

Another important thing to watch in statspack report is the

following wait events, they could be a good indication

about problems with the redo log throughput and

checkpointing:

另外重要的事情就是在statspack中观察下面等待事件，它们可能是关于日志文件吞吐量和检查点很好的指示

log file switch (checkpoint incomplete)

log file switch (archiving needed)

log file switch/archive

log file switch (clearing log file)

log file switch completion

log switch/archive

log file sync

In the case when one or more of the above wait events is repeated

frequently

with considerable values then you need to take an action like

adding More

online redo log files or increasing their sizes and/or modifying

checkpointing parameters

既然如此，当一个或者多个上面等待事件频繁出现到值得考虑的值时，

你需要采取如增加更多的日志文件或者增加日志文件大小和/或修改检查点参数的行动。