The placeholder events listed above are shown while the process is waiting to get the block image. Once the wait is over, you know what happened. WAIT_TIME becomes non-zero. The EVENT column of V$SESSION_WAIT then shows the actual event the process waited on instead of the placeholder, and WAIT_TIME holds the wait time for that actual event. Let's look at some of the relevant events.
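The placeholder-then-actual-event rule can be expressed as a tiny interpreter for one V$SESSION_WAIT row. This is a minimal Python sketch. It assumes the placeholder events are gc cr request and gc current request, and that you already have the EVENT and WAIT_TIME values. describe_wait is a hypothetical helper, not an Oracle API.

```python
# Assumed placeholder events for an in-flight global cache request.
PLACEHOLDER_EVENTS = {"gc cr request", "gc current request"}


def describe_wait(event: str, wait_time: int) -> str:
    """Interpret one (EVENT, WAIT_TIME) pair from V$SESSION_WAIT.

    Follows the rule described above: a WAIT_TIME of 0 means the session is
    still waiting; a non-zero value means the wait is over and EVENT names
    the actual event that replaced the placeholder.
    """
    if wait_time == 0:
        if event in PLACEHOLDER_EVENTS:
            return f"still waiting on placeholder '{event}'"
        return f"still waiting on '{event}'"
    return f"finished: actual event '{event}', wait_time={wait_time}"


print(describe_wait("gc cr request", 0))
print(describe_wait("gc cr block 2-way", 3))
```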
gc [current/cr] [2/3]-way – In a two-node cluster you cannot get 3-way. Only two RAC instances are available, so a request takes at most two hops and only 2-way is possible. With three or more RAC instances, both 2-way and 3-way are possible. The block is received immediately after 2 or 3 network hops. This event is not subject to any tuning other than increasing private interconnect bandwidth and decreasing private interconnect latency. Monitor whether the average wait exceeds 1 ms or approaches disk I/O latency, and if it does, look at reducing latency.
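The 1 ms rule of thumb above can be checked from cumulative counters. This is a minimal Python sketch. It assumes you have TOTAL_WAITS and TIME_WAITED_MICRO for the event, which are columns of V$SYSTEM_EVENT, and a disk read latency taken from elsewhere, for example db file sequential read. The `closeness` factor that defines "close to disk I/O latency" is a hypothetical choice, not an Oracle value.

```python
def avg_wait_ms(total_waits: int, time_waited_micro: int) -> float:
    """Average wait in milliseconds from cumulative counters."""
    if total_waits == 0:
        return 0.0
    return time_waited_micro / total_waits / 1000.0


def needs_attention(avg_ms: float, disk_io_ms: float,
                    limit_ms: float = 1.0, closeness: float = 0.5) -> bool:
    """Flag the event if its average exceeds limit_ms, or if it reaches
    `closeness` times the disk I/O latency (hypothetical threshold)."""
    return avg_ms > limit_ms or avg_ms >= closeness * disk_io_ms


avg = avg_wait_ms(total_waits=1000, time_waited_micro=500_000)
print(avg, needs_attention(avg, disk_io_ms=5.0))
```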
gc [current/cr] grant 2-way – This event occurs when a grant is received immediately. A grant is always local or 2-way. A grant occurs when a request is made for a current or CR block image and no instance has that image in its local buffer cache, so the requesting instance must do an I/O from the data file to get the block. The grant is simply permission from LMS for this to happen, that is, for the process to read the block from the data file. A grant can be either cr or current. gc current grant means go read the block from the data files, while gc cr grant means read the block from disk and build a read-consistent block once it is read.
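The difference between receiving a block image and receiving a grant can be sketched as a simple decision. This Python sketch is a simplified model under stated assumptions: `remote_holders` is a hypothetical list of other instances whose buffer cache holds an image of the block, and resolve_request is a hypothetical helper. It ignores mastering, hop counts and lock modes.

```python
from typing import List


def resolve_request(mode: str, remote_holders: List[str]) -> str:
    """Simplified model of the outcome of a gc request.

    mode: "current" or "cr"
    remote_holders: other instances whose buffer cache holds an image of the block
    """
    if mode not in ("current", "cr"):
        raise ValueError("mode must be 'current' or 'cr'")
    if remote_holders:
        # Some instance has the image, so it is shipped over the interconnect.
        return f"gc {mode} block: shipped from {remote_holders[0]}"
    if mode == "current":
        # Grant: permission from LMS to read the block from the data file.
        return "gc current grant: read the block from the data file"
    return "gc cr grant: read the block from disk, then build a read-consistent copy"


print(resolve_request("cr", []))
print(resolve_request("current", ["inst2"]))
```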
gc [current/cr][block/grant] congested – The block or grant was eventually received, but with a delay caused by intensive CPU consumption, lack of memory, LMS overload due to too much work in its queues, paging, or swapping. This is worth investigating because it indicates that LMS could not dequeue messages fast enough, which leaves room for improvement.
gc [current/cr] block busy – The block was received, but it was not sent immediately because of high concurrency or contention. This means the block was busy. For example, someone may have issued a block recover command from RMAN. There are many possible reasons for a block being busy. It simply means the block could not be sent immediately, not because of memory, LMS, or other system-level reasons but because of Oracle-level reasons. This is also worth investigating.
gc current grant busy – The grant is received, but with a delay due to many shared block images or to load. For example, this happens when you are extending the high-water mark and formatting block images, that is, blocks with block headers.
gc [current/cr][failure/retry] – The block is not received because of a failure, such as a checksum error. This usually happens in the private interconnect protocol and is caused by network errors or hardware problems. It is worth investigating. Failure means the block image cannot be received. Retry means the problem recovers and the block image is ultimately received, but only after a retry.
gc buffer busy – The time between block accesses is less than the buffer pin time. Buffers can be pinned in exclusive or shared mode, depending on whether they are to be modified or only read. If many processes contend for the same block, this event can appear with much greater magnitude. Here buffer busy waits are global cache events: a request is made from one instance, the block is available in another instance, and the block is busy due to contention.
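The statement that waits appear when accesses arrive faster than the pin time can be shown with a small simulation of one buffer. This is a simplified Python sketch. It assumes every access pins the buffer exclusively for a fixed `pin_time_ms` and ignores shared pins. count_busy_waits is a hypothetical helper.

```python
from typing import List


def count_busy_waits(access_times_ms: List[float], pin_time_ms: float) -> int:
    """Count accesses to one buffer that arrive while an earlier pin is still held.

    Simplified model: every access pins the buffer exclusively for pin_time_ms,
    and a session that arrives early queues until the previous pin is released.
    """
    waits = 0
    pin_released_at = float("-inf")
    for t in sorted(access_times_ms):
        if t < pin_released_at:
            waits += 1
            start = pin_released_at  # queued behind the current holder
        else:
            start = t
        pin_released_at = start + pin_time_ms
    return waits


print(count_busy_waits([0, 10, 20], pin_time_ms=5))  # accesses spaced wider than the pin time
print(count_busy_waits([0, 1, 2], pin_time_ms=5))    # accesses closer together than the pin time
```

Spacing the accesses further apart than the pin time produces no waits, while tightly packed accesses queue behind each other.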
The key point is that there is a separate wait event for the placeholder. When the wait is over, the placeholder is replaced in V$SESSION_WAIT by a different event, depending on how many hops there were, what kind of request it was, and what happened: congestion, busy, failure, or retry. When looking at (g)v$ views or AWR reports, check whether you observe congestion, busy, failure, or retry events, and investigate further if you do.
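The bucketing described above can be sketched as a small classifier over event names. This Python sketch has the following assumptions: the placeholder events are gc cr request and gc current request, and names are matched by suffix. classify_gc_event is a hypothetical helper, and the real event list in V$EVENT_NAME contains more gc events than it covers.

```python
import re


def classify_gc_event(event: str) -> str:
    """Bucket a gc wait event name by what happened (simplified)."""
    e = event.lower().strip()
    if e in ("gc cr request", "gc current request"):
        return "placeholder"
    if e.startswith("gc buffer busy"):
        return "buffer busy"
    # Outcomes worth investigating, matched on the end of the event name.
    for outcome in ("congested", "failure", "retry", "busy"):
        if e.endswith(outcome):
            return outcome
    # Received immediately after 2 or 3 hops (block or grant).
    if re.search(r"\b[23]-way$", e):
        return "immediate"
    return "other"


for name in ("gc cr request", "gc current block 3-way", "gc cr grant congested"):
    print(name, "->", classify_gc_event(name))
```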
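1. gc [current/cr] [2/3]-way: the block is being transferred from a remote instance in 2 or 3 hops; the time depends on network throughput.
2. gc [current/cr] grant 2-way: after consulting the GRD, no remote instance holds the requested block, so it must be read from the data file.
3. gc [current/cr][block/grant] congested: the remote instance cannot return the message in time because of a hardware bottleneck.
4. gc current grant busy: the remote instance delays returning the message for reasons related to the object itself.
5. gc [current/cr][failure/retry]: hardware or network problems on the remote instance cause errors in the transferred block.
6. gc buffer busy: handled much like the single-instance case.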
Summary:
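The event a DBA should pay the most attention to and tune is gc buffer busy. It is handled much like the single-instance case. First find the hot block, then deal with indexes and tables separately. For an index you can use a reverse key index, hash partitioning, and similar approaches. For a table you can increase PCTFREE or use hash partitioning.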
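The summary's "find the hot segment, then pick a remedy by segment type" step can be sketched as a lookup. This Python sketch assumes you already have (segment name, segment type, gc buffer busy waits) tuples, for example copied from an AWR segment section. The segment names used below are hypothetical, and the remedy list only repeats what the summary names.

```python
from typing import Dict, List, Tuple

# Remedies named in the summary above, keyed by segment type.
REMEDIES: Dict[str, List[str]] = {
    "INDEX": ["reverse key index", "hash partitioning"],
    "TABLE": ["increase PCTFREE", "hash partitioning"],
}


def plan_for_hot_segments(segments: List[Tuple[str, str, int]],
                          top_n: int = 3) -> List[Tuple[str, List[str]]]:
    """segments: (owner.name, segment_type, gc_buffer_busy_waits) tuples.
    Returns the top_n hottest segments with the remedies to consider."""
    hottest = sorted(segments, key=lambda s: s[2], reverse=True)[:top_n]
    return [(name, REMEDIES.get(seg_type.upper(), ["investigate further"]))
            for name, seg_type, _ in hottest]


sample = [("APP.ORDERS_PK", "INDEX", 900), ("APP.ORDERS", "TABLE", 300), ("APP.LOG", "TABLE", 50)]
for name, remedies in plan_for_hot_segments(sample, top_n=2):
    print(name, remedies)
```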
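Today the monitoring kept raising wait-event alerts. When I checked the database, nearly all of them were gc buffer busy acquire waits. I had never dealt with this wait event before, so today I took the time to learn about it.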
Reference: Oracle MOS
I. Brief definition
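This wait event applies only to RAC environments and is similar to the "buffer busy" waits in non-RAC environments.
It occurs when a session is waiting to access a block that another session is using and holding and that cannot be shared. Multiple sessions may queue up waiting for the same block.
In 11.1 and earlier versions, this type of wait was reported as "gc buffer busy".
Starting with Oracle 11.2, the "gc buffer busy" wait is split into two new wait events: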
- gc buffer busy acquire
- gc buffer busy release
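gc buffer busy acquire: session 1 tries to access a buffer on a remote instance, but session 2 on the same instance has already requested the same buffer and that request has not completed, so session 1 waits on gc buffer busy acquire.
gc buffer busy release: session 2 on a remote instance has already requested a buffer of the local instance and that request has not completed, so session 1 on the local instance, which wants the same buffer, waits on gc buffer busy release.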
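The two definitions differ only in where the earlier, unfinished request came from. This is a minimal Python sketch of that rule, assuming instances are identified by instance number. gc_buffer_busy_kind is a hypothetical helper that deliberately ignores everything except the location of the earlier requester.

```python
def gc_buffer_busy_kind(waiter_instance: int, earlier_requester_instance: int) -> str:
    """Which 11.2+ wait a session sees when an earlier, unfinished request
    for the same buffer already exists (simplified from the definitions above)."""
    if waiter_instance == earlier_requester_instance:
        # A session on the same instance already asked for the remote buffer.
        return "gc buffer busy acquire"
    # A session on a remote instance already asked for this instance's buffer.
    return "gc buffer busy release"


print(gc_buffer_busy_kind(1, 1))
print(gc_buffer_busy_kind(1, 2))
```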
II. Common causes
- High contention on particular hot blocks of the objects
- Other waits such as "gc block busy" and "enq: TX - row lock contention"
- High network latency or a problem with the network
- Busy server or active paging/swapping due to low free memory
Individual waits (as seen in GV$SESSION_WAIT)
System-wide waits (as seen in V$SYSTEM_EVENT)
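For individual waits sampled from GV$SESSION_WAIT, hot blocks show up as the same file and block numbers recurring. This is a minimal Python sketch under one assumption: as for buffer busy waits, P1 is file# and P2 is block# for the gc buffer busy events. The rows are taken as already-fetched (event, p1, p2) tuples, and hot_blocks is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def hot_blocks(rows: Iterable[Tuple[str, int, int]],
               top_n: int = 5) -> List[Tuple[Tuple[int, int], int]]:
    """rows: (event, p1, p2) samples from GV$SESSION_WAIT.
    Returns the most frequent (file#, block#) pairs among gc buffer busy waits,
    assuming P1 = file# and P2 = block# for these events."""
    counts = Counter(
        (p1, p2) for event, p1, p2 in rows
        if event.startswith("gc buffer busy")
    )
    return counts.most_common(top_n)


samples = [
    ("gc buffer busy acquire", 7, 1234),
    ("gc buffer busy release", 7, 1234),
    ("gc buffer busy acquire", 7, 99),
    ("db file sequential read", 7, 1234),
]
print(hot_blocks(samples, top_n=1))
```

Sampling repeatedly and aggregating matters here, because a single snapshot of GV$SESSION_WAIT only shows what sessions are waiting on at that instant.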
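III. Troubleshooting
1) High contention on particular hot blocks
This is caused by excessive index block splits due to many concurrent inserts, or by a right-growing index whose key is generated from a sequence.
It is frequently accompanied by buffer busy waits. If the problem persists, find the hot blocks as described under System-wide waits (as seen in V$SYSTEM_EVENT), or get the problem segment from the Segments by Global Cache Buffer Busy section of the AWR report for the problem period.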
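Why a sequence-keyed index becomes a hot spot, and why hash partitioning helps, can be shown with a toy model of leaf blocks. This Python sketch is an analogy under loudly stated assumptions. Leaves are modeled as fixed ranges of `keys_per_block` keys, and `key % partitions` stands in for Oracle's internal hash function. Both helper names are hypothetical. A reverse key index spreads keys by a different mechanism that is not modeled here.

```python
from typing import Iterable, Set, Tuple


def leaf_blocks_plain(keys: Iterable[int], keys_per_block: int = 100) -> Set[int]:
    """Leaf block each key falls into in one ordered (non-partitioned) index."""
    return {k // keys_per_block for k in keys}


def leaf_blocks_hash_partitioned(keys: Iterable[int], partitions: int = 4,
                                 keys_per_block: int = 100) -> Set[Tuple[int, int]]:
    """(partition, leaf block) for each key; modulo stands in for Oracle's hash function."""
    return {(k % partitions, (k // partitions) // keys_per_block) for k in keys}


batch = range(1001, 1005)  # four concurrent inserts of consecutive sequence values
print(len(leaf_blocks_plain(batch)))             # 1: one hot rightmost block
print(len(leaf_blocks_hash_partitioned(batch)))  # 4: spread over four partitions
```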
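2) gc block busy, enq: TX - row lock contention, and other waits can affect the blocking session or the LMS process.
If other waits might be slowing down the holder of the block, resolving them is the top priority, because gc buffer busy acquire/release may only be a side effect of those waits.
Check the Top 10 Foreground Events by Total Wait Time section of the AWR report to see whether other waits significantly affect database performance.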
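Whether "other waits significantly affect performance" can be judged by each event's share of DB time. This is a minimal Python sketch. It assumes you copy (event, total wait seconds) pairs and the DB time from the AWR report by hand. The 10% cutoff and the significant_events helper are hypothetical.

```python
from typing import List, Tuple


def significant_events(events: List[Tuple[str, float]], db_time_s: float,
                       min_pct: float = 10.0) -> List[Tuple[str, float]]:
    """events: (event name, total wait seconds) from Top 10 Foreground Events.
    Returns events whose wait time is at least min_pct percent of DB time."""
    result = []
    for name, wait_s in events:
        pct = 100.0 * wait_s / db_time_s
        if pct >= min_pct:
            result.append((name, round(pct, 1)))
    return sorted(result, key=lambda e: e[1], reverse=True)


top10 = [("gc buffer busy acquire", 500.0),
         ("enq: TX - row lock contention", 300.0),
         ("log file sync", 20.0)]
print(significant_events(top10, db_time_s=1000.0))
```

If a non-gc event such as row lock contention takes a large share, fix it first, since the gc buffer busy waits may only be a side effect.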
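3) High network latency or network problems
Run "ping -s 10000 <HAIP IP address or private IP address used by the database>" and perform the network checks described in How to Validate Network and Name Resolution Setup for the Clusterware and RAC (Doc ID 1054902.1).
For RCA of a problem that occurred in the past, check OSWatcher for ping latency.
The AWR report includes Interconnect Ping Latency Stats, which are also useful for checking network latency.
The netstat output in OSWatcher and the NIC & Protocol section of the CHM output can provide information about network health.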
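The ping check above produces per-reply round-trip times that are easier to judge as an average and a maximum. This is a minimal Python sketch. It assumes Linux-style `ping` output lines containing `time=0.412 ms`, and it also accepts the `time<1ms` form. The helper names are hypothetical.

```python
import re
from statistics import mean
from typing import Dict, List

# Matches "time=0.412 ms" (Linux) as well as "time<1ms".
PING_TIME = re.compile(r"time[=<]([\d.]+)\s*ms")


def ping_latencies_ms(output: str) -> List[float]:
    """Extract per-reply round-trip times in milliseconds from ping output."""
    return [float(m.group(1)) for m in PING_TIME.finditer(output)]


def summarize_ping(output: str) -> Dict[str, float]:
    """Reply count, average and maximum latency for one ping run."""
    times = ping_latencies_ms(output)
    if not times:
        return {"replies": 0}
    return {"replies": len(times), "avg_ms": mean(times), "max_ms": max(times)}
```

Feed it the captured output of the `ping -s 10000` run, or the ping sections archived by OSWatcher, and compare the average with the 1 ms guideline from the gc [current/cr] [2/3]-way discussion.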
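4) Busy server or active paging/swapping (due to low free memory)
Check the vmstat or CHM output to see whether the server is busy or is doing heavy paging/swapping.
For RCA of a problem that occurred in the past, check the CHM or OSWatcher output.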
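Swapping shows up in the si (swap-in) and so (swap-out) columns of vmstat. This is a minimal Python sketch. It assumes standard Linux `vmstat` text output, as captured live or by OSWatcher, and locates the columns by their header names. Note that the first data line of a vmstat run reports averages since boot. The helper names are hypothetical.

```python
from typing import List


def swap_activity(vmstat_output: str) -> List[int]:
    """Return si + so for each sample line of Linux vmstat output."""
    rows = [line.split() for line in vmstat_output.strip().splitlines()]
    header_idx = next((i for i, cols in enumerate(rows) if "si" in cols and "so" in cols), None)
    if header_idx is None:
        return []
    header = rows[header_idx]
    si, so = header.index("si"), header.index("so")
    activity = []
    for cols in rows[header_idx + 1:]:
        # Skip repeated headers or truncated lines.
        if len(cols) == len(header) and cols[si].isdigit() and cols[so].isdigit():
            activity.append(int(cols[si]) + int(cols[so]))
    return activity


def is_swapping(vmstat_output: str) -> bool:
    """True if any sample shows pages being swapped in or out."""
    return any(v > 0 for v in swap_activity(vmstat_output))
```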
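5) Inefficient SQL statements
Inefficient SQL statements request buffers unnecessarily, which increases the chance of buffer busy waits. You can find the top SQL in AWR. The fix is to tune those statements to reduce buffer accesses. This is similar to buffer busy waits in a single-instance database.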
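Ranking the top SQL by buffer gets per execution helps separate statements that are inefficient from statements that are merely executed often. This is a minimal Python sketch. It assumes (sql_id, buffer_gets, executions) tuples copied from the AWR "SQL ordered by Gets" section. The SQL ids below are made up, and top_sql_by_gets is a hypothetical helper.

```python
from typing import List, Tuple


def top_sql_by_gets(stats: List[Tuple[str, int, int]],
                    top_n: int = 3) -> List[Tuple[str, float]]:
    """stats: (sql_id, buffer_gets, executions).
    Returns SQL ids ranked by buffer gets per execution."""
    per_exec = [(sql_id, gets / max(execs, 1)) for sql_id, gets, execs in stats]
    return sorted(per_exec, key=lambda r: r[1], reverse=True)[:top_n]


stats = [("a1", 1_000_000, 10), ("b2", 500_000, 1000), ("c3", 90_000, 1)]
print(top_sql_by_gets(stats, top_n=2))
```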
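On whether a SELECT can cause gc buffer busy acquire:
- A query normally requests a buffer in shared mode. If the buffer is not in the buffer cache, however, an I/O is needed to read it into memory, and that requires exclusive mode. If many other sessions request the same buffer for queries at the same time (in shared mode), they will wait on that buffer. In this case the buffer cache may not be large enough.
- If the block requested by the query has already been modified, the query needs a CR block. To rebuild the CR block it must read the corresponding undo block, and if the undo block is not in the buffer cache, an I/O is needed to read it into memory. If many queries access this CR block, they will all wait on buffer busy.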
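Both cases rest on one rule: shared requests can proceed together, but anything involving exclusive mode makes others wait. This is a minimal Python sketch of that compatibility rule. It assumes only two modes, "S" and "X", with "" meaning no current holder, and must_wait is a hypothetical helper.

```python
def must_wait(held: str, requested: str) -> bool:
    """Buffer mode compatibility: only shared + shared can proceed together.

    held: mode currently held on the buffer ("S", "X", or "" for none)
    requested: mode being requested ("S" or "X")
    """
    if not held:
        return False
    return not (held == "S" and requested == "S")


# A session reading the block from disk holds it in X mode,
# so concurrent queries asking for S mode must wait.
print(must_wait("X", "S"))
print(must_wait("S", "S"))
```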
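6) Cross-instance access to the same data
In a RAC database, the same data is requested on different database instances.
If the application allows it, we recommend having different application functions/modules access their data on different database instances, so that the same data is not accessed across multiple instances. This reduces buffer contention and avoids gc waits.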
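Before separating modules by instance, it helps to know which objects are actually touched from more than one instance. This is a minimal Python sketch. It assumes you have collected (inst_id, object_name) pairs, for example from sampled GV$ views. The object names are made up, and cross_instance_objects is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def cross_instance_objects(accesses: Iterable[Tuple[int, str]]) -> Dict[str, Set[int]]:
    """accesses: (inst_id, object_name) pairs.
    Returns the objects accessed from more than one instance."""
    seen: Dict[str, Set[int]] = defaultdict(set)
    for inst_id, obj in accesses:
        seen[obj].add(inst_id)
    return {obj: insts for obj, insts in seen.items() if len(insts) > 1}


accesses = [(1, "ORDERS"), (2, "ORDERS"), (1, "BILLING"), (2, "HR")]
print(cross_instance_objects(accesses))
```

Objects in the result are candidates for routing their module to a single instance, for example through a dedicated service.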
7) Oracle bugs
IV. Possible solutions
For high contention and hot blocks:
For enq: TX - row lock contention:
How to Validate Network and Name Resolution Setup for the Clusterware and RAC (Doc ID 1054902.1)
