Redo Strands, Redo Log Switches, and LGWR Workers

1. "Private strand flush not complete"

This message is only informational; it does not indicate poor disk performance or an insufficient number of redo logs.

When you switch logs, all private strands have to be flushed to the current log before the switch is allowed to proceed.

SOLUTION

This is expected behavior according to development.  Following is an extract from unpublished Bug 5241081 on why this can occur during manual log switch.

"Technically all the strands need to be flushed when the log switch is being initiated and hence the messages for checkpoint not complete and private strand flush not complete are synonymous. The crux is that the strand still have transactions active which need to be flushed before this redo can be overwritten, would recommend letting Oracle perform the checkpointing by default and tune fast_start_mttr_target to achieve what one is looking for."

This issue is very easy to duplicate on any database running in archivelog mode.  When the online logs are manually switched with no sessions connected, the problem messages are not displayed in the alert log.

SQL> alter system switch logfile;
System altered.

Alert log shows the following.

Thread 1 advanced to log sequence 206
Current log# 3 seq# 206 mem# 0: +GROUP01/oasm/onlinelog/group_3.258.609176403
Current log# 3 seq# 206 mem# 1: +GROUP01/oasm/onlinelog/group_3.263.609176613 

Now connect to another session (using the scott/tiger demo schema in the example) and leave an update transaction active.

SQL> update emp set sal = sal * 1;
14 rows updated.
... do not commit and keep session active

Then manually switch the logs in the original session, and now the alert log shows the following messages. 

Thread 1 cannot allocate new log, sequence 207
Checkpoint not complete
Current log# 3 seq# 206 mem# 0: +GROUP01/oasm/onlinelog/group_3.258.609176403
Current log# 3 seq# 206 mem# 1: +GROUP01/oasm/onlinelog/group_3.263.609176613

Or the following.

Thread 1 cannot allocate new log, sequence 85
Private strand flush not complete
Current log# 3 seq# 84 mem# 0: C:\ORACLE\ORA11\ORADATA\ORCL11\REDO03.LOG

This is not a problem, but expected behavior. So when you manually switch logs on a database with any activity at all, you will likely see these messages written to the alert log. This can also occur when archive_lag_target is set.
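As in the demo above, the effect appears whenever sessions have open transactions at switch time. A minimal, hedged check using the standard v$transaction view (a non-empty result only indicates open transactions, not how much redo sits in private strands):

-- Open transactions at the time of a manual switch; their redo may still
-- reside in private strands and must be flushed before the switch completes.
SELECT COUNT(*) AS open_transactions
FROM   v$transaction;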

The message is not a false alarm; it is simply expected behavior.

2. Frequent Log Switches

The most relevant factors are: a) the LOG_BUFFER size, b) system load, c) the log file size, and d) the logfile space reservation algorithm. Each is covered in detail below.

a) RDBMS parameter LOG_BUFFER

If this is not explicitly set by the DBA then a default is used: at instance
startup the RDBMS calculates the number of shared redo strands as ncpus/16,
and the size of each strand is 128Kb * ncpus (where ncpus is the number of
CPUs in the system). The log buffer size is the number of strands multiplied
by the strand size.
The calculated or specified size is rounded up to a multiple of the granule size
of a memory segment in the SGA. For 11.2 the granule size is:
  SGA size >= 128GB            : 512MB
  64GB <= SGA size < 128GB     : 256MB
  32GB <= SGA size < 64GB      : 128MB
  16GB <= SGA size < 32GB      : 64MB
  8GB  <= SGA size < 16GB      : 32MB
  1GB  <= SGA size < 8GB       : 16MB
  SGA size < 1GB               : 4MB
There are some minimums and maximums enforced.
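To see what a given instance actually allocated, the standard views below can be queried; a minimal sketch (values are instance-specific, and 'Redo Buffers' in v$sgainfo reflects the rounded-up allocation):

-- Configured parameters versus the actual allocation and granule size
SELECT name, value
FROM   v$parameter
WHERE  name IN ('log_buffer', 'cpu_count');

SELECT name, bytes
FROM   v$sgainfo
WHERE  name IN ('Redo Buffers', 'Granule Size');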


b) System load

Initially only one redo strand is used, i.e. the number of "active"
redo strands is 1, and all the processes copy their redo into
that one strand. When/if there is contention for that strand, the
number of active redo strands is raised to 2. As contention
for the active strands increases, the number of active strands
increases. The maximum possible number of active redo strands is
the number of strands initially allocated in the log buffer.
(This feature is called "dynamic strands", and there is a hidden
parameter to disable it, which then allows processes to use all
the strands from the outset.)
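A rough way to observe dynamic strands in action is via the 'redo allocation' child latches, since each redo strand is protected by one; a hedged sketch (the child-latch count can also include private strands on recent versions, so treat absolute numbers as approximate):

-- Child latches with non-zero gets correspond to strands that have been
-- activated; misses indicate contention that can trigger more active strands.
SELECT child#, gets, misses, immediate_gets, immediate_misses
FROM   v$latch_children
WHERE  name = 'redo allocation'
ORDER  BY child#;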
 


c) Log file size

This is the logfile size decided by the DBA when the logfiles are created. It is not always possible to give a one-size-fits-all recommendation, but online redo log files in the range of a hundred megabytes to a few gigabytes are considered reasonable. Size them according to the amount of redo the system generates; a rough guide is to switch logs at most once every twenty minutes.
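To judge whether the current size meets the twenty-minute guideline, the switch rate can be checked from v$log_history, and v$instance_recovery can suggest a size when fast_start_mttr_target is set; a minimal sketch using these standard views:

-- Log switches per hour over the last 7 days; more than ~3 per hour at peak
-- suggests the online logs are too small for the redo rate.
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*)                               AS log_switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 7
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY hour;

-- Suggested redo log size in MB (populated only when fast_start_mttr_target is set)
SELECT optimal_logfile_size
FROM   v$instance_recovery;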


d) The logfile space reservation algorithm

Redo strands are mapped onto the redo log file space, so a log switch can be triggered before the file is completely full, which is why archived logs may vary in size.

At switch time, Oracle must make sure that every redo strand's contents have been written into the redo file.

When the RDBMS switches into a new online redo logfile, all the
log buffer redo strand memory is "mapped" to the logfile space.
If the logfile is larger than the log buffer then each strand
will map/reserve its strand size worth of logfile space, and the
remaining logfile space (the "log residue") is still available.
If the logfile is smaller than the log buffer, then the whole
logfile space is divided/mapped/reserved equally among all the
strands, and there is no unreserved space (ie no log residue).
When any process fills a strand such that all the reserved
underlying logfile space for that strand is used, AND there is
no log residue, then a log switch is scheduled.

Example: a system with 128 CPUs, so the RDBMS allocates a
log_buffer of size 128Mb containing 8 shared strands of size 16Mb.
It may be a bit larger than 128Mb as it rounds up to an SGA granule boundary.
The logfiles are 100Mb, so when the RDBMS switches into a
new online redo logfile each strand reserves 100Mb/8 = 25600 blocks
and there is no log residue. If there is low system load, only one
of the redo strands will be active/used and when 25600 blocks of
that strand are filled then a log switch will be scheduled - the created
archive logs have a size around 25600 blocks.

With everything else staying the same (128 CPUs and low load),
using a larger logfile would not really reduce the amount of
unfilled space when the log switches are requested, but it would
make that unfilled space less significant as a percentage of the
total logfile space, eg

- with a 100Mb logfile, the log switch happens with 7 x 12.5Mb of
logfile space unfilled (ie the logfile is only about 12.5% full when
the log switch is requested)

- with a 1Gb logfile, the log switch would happen with 7 x 16Mb
logfile space unfilled (ie the logfile is 90% full when the
log switch is requested)

With a high CPU_COUNT, a low load and a redo log file size smaller than
the redo log buffer, you may see small archived log files because log switches
occur at about 1/8 of the defined log file size.
This is because CPU_COUNT defines the number of redo strands (ncpus/16).
With a low load only a single strand may be used. With a redo log file smaller
than the redo log buffer, the log file space is divided over the available strands.
When, for instance, only a single active strand is used, a log switch can already occur
when that strand is filled.
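A hedged way to confirm this pattern is to compare archived log sizes with the configured online log size using the standard v$archived_log and v$log views; archives consistently near 1/8 (more generally, 1/strand-count) of the online log size under low load point to strand space reservation rather than a sizing problem:

-- Archived log sizes (dest_id = 1 assumed for the local destination)
SELECT sequence#,
       ROUND(blocks * block_size / 1024 / 1024) AS archive_mb
FROM   v$archived_log
WHERE  dest_id = 1
ORDER  BY sequence#;

-- Configured online redo log size
SELECT group#, ROUND(bytes / 1024 / 1024) AS online_log_mb
FROM   v$log;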

3. More CPUs, slower redo writes

SYMPTOMS

  • The higher the value for CPU_COUNT, the longer log file parallel writes take and hence the higher the waits for log file sync.
  • In a test using the same workload on the same server, but with different settings for the cpu_count parameter, the following results were observed:

    cpu_count = 64
    log file parallel write = 4 ms
    log file sync = 9 ms

    cpu_count = 256
    log file parallel write = 10 ms
    log file sync = 20 ms

    cpu_count = 1024
    log file parallel write = 20 ms
    log file sync = 39 ms

  • Other tests using the same workload on the same server, but with different settings for the _log_parallelism_max and _log_simultaneous_copies, indicate that these also affect the log file parallel write times.

CAUSE

This is expected behavior as described in an unpublished bug.

The higher the value for CPU_COUNT, the more strands can be used for writing redo. Having an increased number of strands can lead to an increase in the average waits on Log File Parallel Write, and hence Log File Sync.

More CPUs mean more strands; writing them out together, whether synchronously or asynchronously, increases the average write time.

Filesystems that do not support CIO (Concurrent IO) are more likely to be affected by this, compared to ASM (Automatic Storage Management) or raw devices. However, it can also impact filesystems that support CIO.
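When comparing runs with different CPU_COUNT or strand settings, the average waits quoted above can be measured from v$system_event; a minimal sketch (values are cumulative since instance startup, so compare deltas between runs rather than absolute numbers):

-- Average wait time in microseconds for the two related events
SELECT event,
       total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0)) AS avg_wait_us
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write');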

SOLUTION

1) Reduce _log_parallelism_max to a value lower than the calculated default. It should not be reduced too far, since doing so can cause waits for the redo allocation latch.

With too few redo strands you fall back to the old-version situation: heavy DML leads to inefficient redo allocation latch usage.

The parameter _log_parallelism_max specifies the maximum number of redo strands in the system. The minimum value of this parameter is 1 and the maximum is 256. The default value is max(2, no_of_CPUs/16). In earlier versions, setting log parallelism greater than 8 was not recommended. That guidance was written when the number of CPUs on high-end servers typically did not exceed 128, so the calculated default value also did not exceed 8.

It was determined that the behaviour observed is expected due to the increased number of strands. Reducing the number of strands can reduce the log file parallel write times, but may lead to an increase in waits for the redo allocation latch, which could lead to overall performance degradation. There is therefore a trade-off between waits for log file sync/log file parallel write and waits for the redo allocation latch. Hence, the parameter _log_parallelism_max should not be changed by customers unless directed by Oracle Support. A setting of 8 may be suitable for some customers, but a different value may be better for others given this trade-off.
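The current values of these hidden parameters can be inspected with the usual x$ksppi/x$ksppcv query; a hedged sketch (must be run as SYS, and the x$ fixed tables are undocumented and version-dependent):

-- Current values of the redo parallelism hidden parameters
SELECT a.ksppinm  AS parameter,
       b.ksppstvl AS value
FROM   x$ksppi  a,
       x$ksppcv b
WHERE  a.indx = b.indx
AND    a.ksppinm IN ('_log_parallelism_max', '_log_simultaneous_copies');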

2) _log_simultaneous_copies

Adjusting this parameter may also influence the log file parallel write time. Since LGWR needs to get all the copy latches in turn when forcing the log, it is not necessarily a good thing to add lots of copy latches, especially for OLTP. Adding extra copy latches reduces the probability that foregrounds will miss on getting a copy latch and have to get another copy latch. When the number of misses exceeds the number of copy latch gets performed by LGWR, it is beneficial to add more copy latches. The number of copy latch gets performed by LGWR is equal to the number of redo writes times the number of copy latches. Changing this parameter may reduce log file parallel write times, but could cause higher latch waits. There is no correlation between this parameter and _log_parallelism_max.

The parameter _log_simultaneous_copies specifies the number of copy latches in the system. Its default value is 2*no_of_CPUs. Copy latches are used to allow no more concurrent redo generators than the number of CPUs (flow control for foregrounds). These latches also serve other purposes, such as validation of redo and dealing with process death during redo generation.

The current recommendation is _log_simultaneous_copies should not be changed from its default value.
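The rule of thumb above (add copy latches only when foreground misses exceed LGWR's gets) can be approximated from the standard latch and statistics views; a hedged sketch that follows the note's formula of redo writes times the number of copy latches:

-- Foreground no-wait misses on the 'redo copy' latch versus an estimate of
-- LGWR's copy latch gets (redo writes * number of copy latches).
SELECT l.fg_copy_latch_misses,
       s.redo_writes * l.copy_latch_count AS approx_lgwr_copy_gets
FROM   (SELECT SUM(immediate_misses) AS fg_copy_latch_misses,
               COUNT(*)              AS copy_latch_count
        FROM   v$latch_children
        WHERE  name = 'redo copy') l,
       (SELECT value AS redo_writes
        FROM   v$sysstat
        WHERE  name = 'redo writes') s;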

In 12c, LGnn worker processes were introduced to perform the actual redo writes; LGWR no longer writes to the redo log files itself and is only responsible for publishing messages related to redo reaching disk.

There is not much published material on LGWR workers. Based on what I have read and my understanding of LGWR, the workers should be related to Oracle redo strands. LGWR workers are assigned to groups; if the number of groups corresponds to the number of public redo strands, then the groups do not need to synchronize their writes through locking, and LGWR does not need to coordinate the workers beyond message broadcasting. That would be the most efficient design. If multiple workers still needed latches to serialize their writes to the redo files, efficiency would certainly suffer.

Redo strands have been used to accelerate redo performance since Oracle 11. The purpose of strands is to improve performance when concurrently writing to the redo log buffer and the redo log files, reducing redo performance problems caused by serialization latch waits. Oracle automatically adjusts the number of redo strands based on the value of CPU_COUNT.

Oracle sets the number of strands to CPU_COUNT/16 and divides the log buffer into that many sub-pools. Redo can then be written concurrently into different strands, which reduces contention on the log buffer under high concurrency. To make this mechanism effective, redo log file space is also allocated per strand, so that writes to the redo log files can also proceed concurrently at high speed. Redo strands laid a solid foundation for LGWR workers becoming a feature that is enabled by default in 12c.

In my environment CPU_COUNT is 16, and the minimum number of redo strands per instance is 2, so two LGWR workers were started along with the instance, which means the instance has two LGWR worker groups. When the system is idle and there is no redo to write, the LGnn processes wait on the idle wait event 'LGWR worker group idle', while the lgwr process waits on 'rdbms ipc message'.
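The workers and their current waits can be listed from the standard v$process and v$session views; a minimal sketch (LGWR and the LGnn workers are background processes and normally have sessions, so the paddr/addr join below applies):

-- LGWR and its LGnn workers, with their current wait events
SELECT p.pname, p.spid, s.event
FROM   v$process p, v$session s
WHERE  s.paddr = p.addr
AND    p.pname LIKE 'LG%'
ORDER  BY p.pname;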

Looking at lgwr with strace, it is only performing some semaphore operations. Now let's look at the LGnn processes when they are idle.

They are sleeping on the same semaphore.

We can see that LGWR's wait events have changed, while the LGnn wait events look very similar to those of the old LGWR. Judging from the wait events, when a worker finishes its work it enters an ordering wait, waiting to pick up another write task. In terms of the actual implementation, it is not free of scheduling as I had imagined. Let's trace LGWR again.

We can see that LGWR is still operating on that semaphore very frequently, which most likely means LGWR is actively involved in scheduling and coordinating the log writes.

The workers' behavior also shows interaction with LGWR. This indicates that Oracle's concurrent log writes still require synchronization between processes and are not fully autonomous and non-blocking. Therefore, in some scenarios, having too many workers may cause excessively long 'log file parallel write' waits, which in turn increases 'log file sync' waits and hurts database performance.

A similar problem appeared back when Oracle's redo strands became enabled by default, because the number of strands is tied to CPU_COUNT. More than ten years ago, people already noticed on Oracle 11g that with a large number of CPUs, log file sync could inexplicably get worse.

The recommendation at the time was to reduce the number of strands via the _log_parallelism_max parameter, solving the performance problems caused by having too many strands. For the LGWR worker mechanism, Oracle provides a similar control parameter: "_max_log_write_parallelism".

In Oracle 12c and later, performance problems caused by LGWR workers also appear from time to time. This can be adjusted through the "_use_single_log_writer" parameter. By default its value is ADAPTIVE, which means Oracle chooses the working mode itself based on the workload. If you run into performance problems in this area, you can set the parameter to TRUE to force a single LGWR, i.e. revert to the old mode of operation. If you find that log file sync has degraded after upgrading from 11g to 12c and is causing performance problems, consider adjusting this parameter.
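For reference, forcing the single-writer mode looks like the sketch below; this is hedged rather than a recommendation: hidden parameters should only be changed under Oracle Support guidance, and this one is read at instance startup, so it goes into the spfile followed by a restart.

-- Force the pre-12c single log writer behavior (requires an instance restart)
ALTER SYSTEM SET "_use_single_log_writer" = 'TRUE' SCOPE = SPFILE;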

 

In fact, I have seen some domestic databases also considering multiple WAL writers to improve high-concurrency WAL write performance and make fuller use of modern hardware such as SSDs. However, WAL writes are very latency-sensitive; if the algorithm is poorly designed, it can easily cause even more serious latch serialization problems. Oracle's combination of redo strands and LGWR workers is probably the approach most worth learning from today. Without partitioning the WAL buffer into strands, the concurrency-control cost of multiple WAL writers would be even higher.
