alert日志中出现Private Strand Flush Not Complete的处理方法-CSDN博客

alert.log日志还报了如下的错误：

Fri Oct 17 19:59:51 2014

Thread 1 cannot allocate new log, sequence 4722

Private strand flush not complete

Current log# 1 seq# 4721 mem# 0: /oradata/sgomp5/redo01.log

Thread 1 advanced to log sequence 4722 (LGWR switch)

Current log# 2 seq# 4722 mem# 0: /oradata/sgomp5/redo02.log

在MOS社区中找到了一篇关于这个问题的文章：

Historically, Every user session wrote the changes to redo log buffer and changes from redo log buffer are flushed to redo logs on disk by lgwr. As number of users increased, the race and the need to get latch for redo allocation and redo copy on the public redo buffer increased.

So, starting from 10g, Oracle came up with concept ofprivate redo (x$kcrfstrand) and in-memory undo (x$ktifp). Every session has private redo where session writes to and then a (small) batch of changes is written to public redo and finally from public redo log buffer to redo log files on disk. This mechanismreduces the gets/sleeps on redo copy and redo allocation latches on the public redo buffer and hence makes the architecture more scalable.

It is also worth noting that oracle falls back to old redo mechanism in case transaction is too big (with lots of changes) and if changes done by that transaction can't fit into private redo buffers.

当数据库切换日志时，所有private strand都必须刷新到当前日志，然后才能继续。此信息表示我们在尝试切换时，还没有完全将所有 redo信息写入到日志中。这有点类似于“checkpoint not complete”，不同的是，它仅涉及到正在被写入日志的redo。在写入所有redo前，无法切换日志。

Private Strands是10gR2才有的，它用于处理redo的latch（redo allocation latch）。是一种允许进程利用多个allocation latch更高效地将redo写入redo buffer cache的机制，它与9i中出现的log_parallelism参数相关。提出Strand的概念是为了确保实例的redo生成率达到最佳，并能确保在出现某种redo争用时，可以动态调整strand的数量进行补偿。初始分配的strand数量取决于CPU的数量，最少两个strand，其中一个strand用于active的redo生成。

对于大型的oltp系统，redo生成量非常大，因此当前台进程遇到redo争用时，这些strand会被激活。shared strand总是与多个private strand共存。Oracle 10g的redo（和undo）机制有一些重大变化，目的是为了减少争用。此机制不再实时记录redo，而是先记录在一个private area，并在commit时flush到redo log buffer中去。在这种新机制引入后，一旦用户进程申请到private strand，redo不再保存到pga中，因此不再需要redo copy latch这个过程。

如果新事务申请不到private strand的redo allocation latch，则会继续遵循旧的redo buffer机制，申请写入shared strand中。对于这个新的机制，在进行redo被写出到logfile时，LGWR需要将shared strand与private strand的内容写出。当redo flush发生时，所有的public strands的redo allocation latch需要被获取，所有的public strands的redo copy latch需要被检查，所有包含活动事务的private strands需要被持有。

其实，对于这个现象也可以忽略，除非“cannot allocate new log”信息和“advanced to log sequence”信息之间有明显的时间差。

如果想要在alert.log中避免出现Private strand flush not complete事件，那么可以通过增加参数db_writer_processes的值来实现，因为DBWn会触发LGWR将redo写入到logfile，如果有多个DBWn进程一起写，可以加速redo buffer cache写入redo logfile。

可以使用以下命令修改：

SQL> alter system set db_writer_processes=4 scope=spfile; --该参数时静态参数，必需重启数据库后生效

注意，DBWR进程数应该与逻辑CPU数相当。另外地，当oracle发现一个DB_WRITER_PROCESS不能完成工作时，也会自动增加其数量，前提是已经在初始化参数中设定过最大允许的值。

关于DB_WRITER_PROCESSES和DBWR_IO_SLAVES参数的一些说明：

DB_WRITER_PROCESSES replaces the Oracle7 parameter DB_WRITERS and specifies the initial number of database writer processes for an instance. If you use DBWR_IO_SLAVES, only one database writer process will be used, regardless of the setting for DB_WRITER_PROCESSES

DB_WRITER_PROCESSES参数就是在Oracle 7中的DB_WRITERS参数，用来指定数据库实例的DBWR进程个数，当系统中还配置了DBWR_IO_SLAVES参数时（默认为0），则只能利用到一个DBWn进程，而忽略其他的。

DBWR_IO_SLAVES

If it is not practical to use multiple DBWR processes, then Oracle provides a facility whereby the I/O load can be distributed over multiple slave processes. The DBWR process is the only process that scans the buffer cache LRU list for blocks to be written out. However, the I/O for those blocks is performed by the I/O slaves. The number of I/O slaves is determined by the parameter DBWR_IO_SLAVES.

当使用单一DBWR进程时，Oralce提供了使用多个I/O slave进程来完成模拟异步IO，去完成全本应该由DBWR做的事情（写LRU上的数据块到磁盘文件），这个slave的数量是通过DBWR_IO_SLAVES参数来指定的

DBWR_IO_SLAVES is intended for scenarios where you cannot use multiple DB_WRITER_PROCESSES (for example, where you have a single CPU). I/O slaves are also useful when asynchronous I/O is not available, because the multiple I/O slaves simulate nonblocking, asynchronous requests by freeing DBWR to continue identifying blocks in the cache to be written. Asynchronous I/O at the operating system level, if you have it, is generally preferred.

DBWR_IO_SLAVES参数通常被用在单CPU的场景中，因为单CPU即使设置了多DBWR进程数也是没有效果的。无论操作系统是否支持异步IO，使用多个I/O slaves都是有效的，可以分担DBWR的任务。如果使用了异步IO，那就更加推荐设置了

DBWR I/O slaves are allocated immediately following database open when the first I/O request is made. The DBWR continues to perform all of the DBWR-related work, apart from performing I/O. I/O slaves simply perform the I/O on behalf of DBWR. The writing of the batch is parallelized between the I/O slaves.

DBWR的I/O slaves当数据库open时发生第一次I/O请求时被分配，DBWR进程继续完成与自身相关任务，而分离出部分I/O处理任务给I/O slaves，各个I/O slaves之间的I/O处理都是并行的

Choosing Between Multiple DBWR Processes and I/O Slaves

Configuring multiple DBWR processes benefits performance when a single DBWR process is unable to keep up with the required workload. However, before configuring multiple DBWR processes, check whether asynchronous I/O is available and configured on the system. If the system supports asynchronous I/O but it is not currently used, then enable asynchronous I/O to see if this alleviates the problem. If the system does not support asynchronous I/O, or if asynchronous I/O is already configured and there is still a DBWR bottleneck, then configure multiple DBWR processes.

关于如何选择多个DBWR进程和I/O slaves进程

当单一的DBWR进程无法胜任大量的写工作负载，配置多个DBWR进程是有效的。但是在配置多个DBWR进程前，需要先检查OS上是否支持异步I/O，如果支持但未开启，那么先开启；如果系统不支持或已经配置了异步IO后，仍然有DBWR瓶颈，那么就可以配置多个DBWR进程

Using multiple DBWRs parallelizes the gathering and writing of buffers. Therefore, multiple DBWn processes should deliver more throughput than one DBWR process with the same number of I/O slaves. For this reason, the use of I/O slaves has been deprecated in favor of multiple DBWR processes. I/O slaves should only be used if multiple DBWR processes cannot be configured.

开启多个DBWR进程就意味着可以并行写更多的脏缓存（dirty buffer）到数据文件，而多个DBWR的吞吐量，也要比1个DBWR+相当数量的I/O slaves的要高，因此，当开启了多个DBWR进程时，就不应该再配置DBWR_IO_SLAVES（如果原来是非零的话），可以把这个参数设置为0

总结：

DBWR_IO_SLAVES主要用于模拟异步环境，在不支持异步操作的OS上，可以提高IO的读写速度。

多个DBWR进程可以并行地从data buffer中获取dirty block并且并行地写入磁盘。但是，在单DBWR+多个I/O slaves的场景下，只能是一个DBWR负责从data buffer中获取，而多个I/O slaves并行写入。如果系统支持AIO（disk_async_io=true），一般不用设置多dbwr 或io slaves。

如果在有多个cpu的情况下建议使用DB_WRITER_PROCESSES，因为这样的情况下不用去模拟异步模式，但要注意进程数量不能大于cpu数量。而在只有一个cpu的情况下建议使用DBWR_IO_SLAVES来模拟异步模式，以便提高数据库性能。

---------------------

作者：aaron8219

来源：CSDN

原文：https://blog.csdn.net/aaron8219/article/details/40398797

--------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- --------------------- ---------------------

网友发来告警日志，原本是关于一个死锁的情形，而另外的一个问题则是从redo log buffer写出到redo log file出现了不能分配新的日志，Private strand flush not complete的等待事件。这是个和redo log相关的话题，从Meatlink也找到了对此的描述如下文。

1、错误消息

Tue Sep 24 14:27:48 2013

Thread 1 cannot allocate new log, sequence 22120

Private strand flush not complete

Current log# 4 seq# 22119 mem# 0: /u01/app/oracle/oradata/orcl/redo04.log

2、Meatlink 对此的描述(Doc ID 372557.1)

Oracle Database - Enterprise Edition - Version 10.2.0.1 to 11.2.0.3 [Release 10.2 to 11.2]

Information in this document applies to any platform.

Private strand flush not complete

Symptoms

"Private strand flush not complete" messages are being populated to the alert log, example:

Mon Jan 23 16:09:36 2012

Thread 1 cannot allocate new log, sequence 18358

Private strand flush not complete

Current log# 7 seq# 18357 mem# 0: /u03/oradata/bitst/redo07.log

Thread 1 advanced to log sequence 18358

Current log# 8 seq# 18358 mem# 0: /u03/oradata/bitst/redo08.log

Changes

When you switch logs all private strands have to be flushed to the current log before the switch is allowed to proceed.

--切换日值前，所有的private strands必须写入到当前的redo logfile

Cause

The message means that we haven't completed writing all the redo information to the log when we are trying to switch. It is similar in nature to a "checkpoint not complete" except that is only involves the redo being written to the log. The log switch can not occur until all of the redo has been written.

-->该消息意味着在日志切换前我们不能够完整的写出redo到日志文件。其本质类似于checkpoint not complete等待事件。所不同的是它仅仅涉及到正在被写入到日志的redo

A "strand" is new terminology for 10g and it deals with latches for redo . -->strand是一个用于处理redo latch的新术语

Strands are a mechanism to allow multiple allocation latches for processes to write redo more efficiently in the redo buffer and is related to the log_parallelism parameter present in 9i.

The concept of a strand is to ensure that the redo generation rate for an instance is optimal and that when there is some kind of redo contention then the number of strands is dynamically adjusted to compensate.

-->最大的作用是用于确保redo产生的速率达到最佳，并在出现相关redo竞争的时候动态调整strand的值进行补偿

The initial allocation for the number of strands depends on the number of CPU's and is started with 2 strands with one strand for active redo generation.

For large scale enterprise systems the amount of redo generation is large and hence these strands are *made active* as and when the foregrounds encounter this redo contention (allocated latch related contention) when this concept of dynamic strands comes into play.

There is always shared strands and a number of private strands .

Oracle 10g has some major changes in the mechanisms for redo (and undo), which seem to be aimed at reducing contention.

-->在10g中有很大的变化，最主要的目的还是为了减少竞争

Instead of redo being recorded in real time, it can be recorded 'privately' and pumped into the redo log buffer on commit.

Similarly the undo can be generated as 'in memory undo' and applied in bulk. This affect the memory used for redo management and the possibility to flush it in pieces. The message you get is related to internal Cache Redo File management.

...You can disregard these messages as normal messages. --->可以当作常规消息被忽略

Solution

These messages are not a cause for concern unless there is a significant time gap between the "cannot allocate new log" message and the "advanced to log sequence" message. --->如果"cannot allocate new log" 与"advanced to log sequence"有明显的时间间隔，应考虑增加db_writer_processes

Increasing the value for db_writer_processes can in some situations help to avoid the message from being generated. Why, because one of the DBWR main function is to keep the buffer cache clean by writing out dirty buffer blocks. So having multiple db_writer_processes should be able to produce a higher throughput.

Finally, these messages have also been seen when there are issues with the storage side or network for the archive log destination, as this leads to delay or hang in LGWR switch.

3、延伸思考

在高并发，多用户的数据库系统中，所有客户端进程都是通过向redo log buffer写入重做数据来确保数据的完整与一致性。对于redo log buffer的管理，则通过latch的机制来实现。和redo相关的latch主要有两个，一个是redo allocation latch，一个是redo copy latch。前者负责将为新的redo在redo log buffer中分配空间，后者则是pga中的redo复制到redo log buffer。下面是描述一下redo产生的流程。

用户进程产生redo(位于PGA中)====>服务器进程获取Redo Copy latch(存在多个取决于CPU_COUNT*2)====>服务进程获取redo allocation latch(仅1个)====>分配log buffer====>释放redo allocation latch====>将Redo Entry写入Log Buffer====>释放Redo Copy latch

如前文Doc ID 372557.1所述，Oracle 9.2之后引入了log_parallelism机制，当该参数的值大于1的时候，数据库会分配多个共享的redo log buffer，也就是说redo log buffer被再次细分，使得每个共享的buffer使用独立的redo allocation latch来进行保护以提高redo的并发性。这些个共享的redo log buffer就被称之为 shared strand。在10gR2以后了又多出了一个private strand，这个东东是从shared pool中分配而不是先前的log buffer。private strand为大量小的私有内存，通常每个大小在64kb-128kb左右，被独立的redo allocation latch所保护。每个特定的小事务会绑定到独立且空闲的private redolog strand，即绑定到一个活动事务。在这种新机制引入后，一旦用户进程申请到private strand，redo不再保存的pga中，因此不再需要redo copy latch这个过程。如果新事务申请不到private strand的redo allocation latch，则会继续遵循旧的redo buffer机制，申请写入shared strand中。由于新机制的引入，相应的redo的产生发生了一些变化，如下：

新事务开始====>申请private strand的redo allocation latch(申请失败则申请shared strand的redo allocation latch)====>在private strand中生产redo Entry====>flush/commit====>申请redo copy latch====>LGWR将redo entry批量写入log File====>释放redo copy latch====>释放Private strand的redo allocation latch

对于这个新的机制，在进行redo被写出到logfile时，LGWR需要将shared strand与private strand的内容写出。当redo flush发生时，所有的publicredo allocation latch需要被获取，所有的public strands的redo copy latch需要被检查，所有包含活动事务的private strands需要被持有。

由上可知，Private strand flush not complete事件的出现是通过增加参数DBWn的值来避免。因为DBWn会触发LGWR将redo写入到logfile。

---------------------

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/26736162/viewspace-2219167/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/26736162/viewspace-2219167/