RAC Wait Events: gc Wait Events

While a process is waiting to get a block image, the session shows a placeholder event. Once the wait is over the outcome is known: wait_time becomes non-zero, and the event column of v$session_wait shows the actual event the process waited on instead of the placeholder, with the time for that event in the wait_time column. Let's look at the events that are relevant.

Gc [current/cr] [2/3]-way – The block was received immediately, after 2 or 3 network hops. On a 2-node cluster only 2-way is possible, because with two RAC instances there can be at most two hops; with three or more instances either 2-way or 3-way can occur. This event is not subject to tuning other than increasing the bandwidth and reducing the latency of the private interconnect. Investigate the interconnect if the average wait exceeds 1 ms or approaches disk I/O latency.

Gc [current/cr] grant 2-way – The grant was received immediately. A grant is always local or 2-way. It occurs when a request is made for a current or cr block image and no instance holds that image in its buffer cache, so the requesting instance has to read the block from the data file. The grant is simply permission from LMS for the process to perform that read. A grant can be either cr or current: gc current grant means "go and read the block from the data file", while gc cr grant means "read the block from disk and build a read-consistent block once it has been read".

Gc [current/cr][block/grant] congested – The block or grant was eventually received, but with a delay caused by heavy CPU consumption, memory shortage, paging or swapping, or an LMS process overloaded with queued work. It indicates that LMS could not dequeue messages fast enough, so it is worth investigating and leaves room for improvement.

Gc [current/cr] block busy – The block was received, but it was not sent immediately because of high concurrency or contention on the block, for example when somebody issues a block recover command from RMAN. There are various possible reasons; "busy" just means the delay comes from Oracle-level activity on the block rather than from memory, LMS or other system-level causes. It is also worth investigating.

Gc current grant busy – The grant was received, but with a delay due to many shared block images or load, for example while the high water mark is being extended and new blocks are being formatted with block headers.

Gc [current/cr][failure/retry] – The block was not received because of a failure, usually a checksum error in the private interconnect protocol caused by network errors or hardware problems. Failure means the block image could not be received; retry means the problem recovered and the block image was eventually received after retrying. This is worth investigating.

Gc buffer busy – The time between block accesses is less than the buffer pin time. Buffers are pinned in exclusive or shared mode depending on whether they are being modified or only read. The more processes contend for the same block, the more this event shows up. Buffer busy is a global cache event because the request is made from one instance while the block is held in another instance and is busy there due to contention.

The key point is that the placeholder and the final outcome are separate wait events: when the wait is over, the placeholder in v$session_wait is replaced by a specific event that depends on how many hops there were, what kind of request it was, and whether there was congestion, a busy block, a failure or a retry. When looking at the (g)v$ views or AWR reports, check for congested, busy, failure and retry events and investigate them further.
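As a rough way to run that check against the (g)v$ views, the sketch below lists each global cache wait event with its average latency. It assumes the standard TOTAL_WAITS and TIME_WAITED_MICRO columns of gv$system_event and that all global cache event names start with 'gc '. The NOTE column is only an illustrative flag, not an Oracle-provided classification.

```sql
-- Average latency of each global cache wait event, per instance.
-- Values are cumulative since instance startup, so use AWR for a
-- specific problem window.
SELECT inst_id,
       event,
       total_waits,
       ROUND(time_waited_micro / total_waits / 1000, 2) AS avg_ms,
       CASE
         WHEN event LIKE '%congested%'
           OR event LIKE '%busy%'
           OR event LIKE '%failure%'
           OR event LIKE '%retry%'
         THEN 'INVESTIGATE'   -- illustrative flag only
       END AS note
  FROM gv$system_event
 WHERE event LIKE 'gc %'
   AND total_waits > 0
 ORDER BY inst_id, time_waited_micro DESC;
```

For the plain 2-way and 3-way events, compare avg_ms with the 1 ms guideline given above.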

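1. Gc [current/cr] [2/3]-way: the block is being transferred from a remote instance in 2 or 3 hops; the time depends on network throughput

2. Gc [current/cr] grant 2-way: after querying the GRD, no remote instance was found to hold the requested block, so it has to be read from the data file

3. Gc [current/cr][block/grant] congested: the remote instance could not respond in time because of a hardware bottleneck

4. Gc current grant busy: the response was delayed for reasons related to the object itself on the remote instance

5. Gc [current/cr][failure/retry]: a hardware or network problem on the remote instance caused an error in the transferred block

6. Gc buffer busy: handled much like in a single instance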

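Summary:

The event DBAs should watch and tune most is gc buffer busy. It is handled much like in a single instance: first find the hot block, then deal with indexes and tables separately. For an index, options include a reverse key index or hash partitioning; for a table, options include increasing PCTFREE or hash partitioning.

Monitoring kept raising wait event alerts today, and a check of the database showed that almost all of them were gc buffer busy acquire waits. I had not come across this wait event before, so I took the opportunity to read up on it.

Reference: Oracle MOS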

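I. Brief definition

This wait event applies only to RAC environments and is similar to the "buffer busy" wait in non-RAC environments.

It occurs when a session is waiting to access a block that another session is using and holding and that cannot be shared. Several sessions may queue for the same block.

In 11.1 and earlier, waits of this type were reported as "gc buffer busy".

Starting with Oracle 11.2, the "gc buffer busy" wait is split into two new wait events: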

  • gc buffer busy acquire
  • gc buffer busy release

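gc buffer busy acquire: session 1 tries to access a buffer held by a remote instance, but another session 2 on the same instance as session 1 has already requested that buffer and the request has not yet completed. Session 1 then waits on gc buffer busy acquire.

gc buffer busy release: before session 1 on the local instance asks for a buffer, session 2 on a remote instance has already requested that same buffer from the local instance and the request has not yet completed. Session 1 on the local instance then waits on gc buffer busy release.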

II. Common causes

  • High contention on particular hot blocks of the objects
  • Other waits such as "gc block busy" and "enq: TX - row lock contention"
  • High network latency or a problem with the network
  • Busy server, or active paging/swapping due to low free memory

Individual waits (as seen in GV$SESSION_WAIT)

P1                 File #
P2                 Block #
P3                 Mode requested/mode held/block class
SECONDS_IN_WAIT    Amount of time waited for the current event
file#              The file# of the file that Oracle is trying to read from
block#             The starting block number in the file from which Oracle starts reading
blocks             The number of blocks that Oracle is trying to read from file#, starting at block#
INST_ID            Instance number

To determine the root blocker for sessions waiting on gc wait events, use one of the options below:

1. A system state dump at cluster level

2. oratop, which displays waiters and blockers

3. v$wait_chains, which can be used to find the root blocker for blocked sessions; see Troubleshooting Database Contention With V$Wait_Chains (Doc ID 1428210.1)

4. v$hang_info, v$hang_session_info, etc.

5. Oracle Hang Manager (Doc ID 1534591.1)

With this information we can find the sessions waiting on specific gc events, together with their final blockers, at instance level.
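For the v$wait_chains option, a minimal sketch of such a lookup might look like the query below. It uses only a handful of the view's columns and assumes 11.2 or later; the MOS note cited above has fuller scripts. A row with no blocker (BLOCKER_SID empty) is typically the head of its chain.

```sql
-- One row per session in a wait chain; rows of the same chain share CHAIN_ID.
SELECT chain_id,
       instance,
       sid,
       sess_serial#,
       blocker_instance,
       blocker_sid,
       wait_event_text,
       in_wait_secs,
       num_waiters
  FROM v$wait_chains
 ORDER BY chain_id, num_waiters DESC;
```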

System Wide wait (as seen in V$SYSTEM_EVENT)

If a long time is spent waiting for buffers, identify which segment is suffering the contention as follows:

SELECT inst_id,
       sid,
       event,
       wait_class,
       p1,              -- file#
       p2,              -- block#
       p3,              -- mode requested / mode held / block class
       seconds_in_wait
  FROM gv$session_wait
 WHERE event LIKE 'gc buffer%';

Using the P1 and P2 values from that output, the following query returns the object involved:

SELECT segment_name
  FROM dba_extents
 WHERE file_id = &file
   AND &block BETWEEN block_id AND block_id + blocks - 1
   AND ROWNUM = 1;
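Because gv$session_wait only shows sessions waiting right now, a hedged alternative is to aggregate recent samples from Active Session History. This is a sketch, not part of the MOS procedure: it assumes the Diagnostics Pack is licensed, that CURRENT_OBJ# can be joined to DBA_OBJECTS.OBJECT_ID, and that a 30-minute window (an arbitrary choice) covers the problem period.

```sql
-- Blocks and objects most often sampled in gc buffer busy waits
-- over the last 30 minutes, across all instances.
SELECT o.owner,
       o.object_name,
       o.object_type,
       h.current_file#,
       h.current_block#,
       COUNT(*) AS samples
  FROM gv$active_session_history h
  JOIN dba_objects o
    ON o.object_id = h.current_obj#
 WHERE h.event LIKE 'gc buffer busy%'
   AND h.sample_time > SYSTIMESTAMP - INTERVAL '30' MINUTE
 GROUP BY o.owner, o.object_name, o.object_type,
          h.current_file#, h.current_block#
 ORDER BY samples DESC;
```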

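III. Troubleshooting

1) High contention on specific hot blocks

This is caused by a large number of concurrent inserts leading to excessive index block splits, or by a right-growing index whose keys are generated from a sequence.

Buffer busy waits frequently accompany this. If the problem persists, find the hot block as described under "System Wide wait (as seen in V$SYSTEM_EVENT)" above, or take the problem segment from the "Segments by Global Cache Buffer Busy" section of the AWR report for the problem period.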
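When no AWR report is at hand, a similar per-segment ranking can be sketched from gv$segment_statistics. The query assumes the 'gc buffer busy' statistic name and counts waits since instance startup, so it shows long-run hot segments rather than those of a specific problem window. The limit of 10 rows is arbitrary.

```sql
-- Top 10 segments by global cache buffer busy waits since startup.
SELECT *
  FROM (SELECT inst_id,
               owner,
               object_name,
               object_type,
               value AS gc_buffer_busy
          FROM gv$segment_statistics
         WHERE statistic_name = 'gc buffer busy'
           AND value > 0
         ORDER BY value DESC)
 WHERE ROWNUM <= 10;
```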

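2) gc block busy, enq: TX - row lock contention and other waits may be affecting the blocking session or the LMS process.

If other waits are slowing down the holder of the block, resolving them is the first priority, because gc buffer busy acquire/release may be only a side effect of those waits.

Check the "Top 10 Foreground Events by Total Wait Time" section of the AWR report to see whether other waits are significantly affecting database performance.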

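3) High network latency or network problems

Run "ping -s 10000 <HAIP address or private IP address used by the database>" and perform the network checks described in How to Validate Network and Name Resolution Setup for the Clusterware and RAC (Doc ID 1054902.1).

For RCA of a problem that occurred in the past, check OSWatcher for the ping latency.

The AWR report contains "Interconnect Ping Latency Stats", which is also useful for checking network latency.

The netstat output in OSWatcher and the Nic&Protocol section of the CHM output provide information about network health.

4) Busy server or active paging/swapping (due to low free memory)

Check the vmstat or CHM output to see whether the server is busy or is paging/swapping heavily.

For RCA of a problem that occurred in the past, check the CHM or OSWatcher output.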

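5) Inefficient SQL statements

Inefficient SQL requests buffers unnecessarily, which increases the chance of buffer busy waits. The top SQL can be found in AWR. The fix is to tune the SQL so that it accesses fewer buffers. This is similar to buffer busy waits on a single-instance database.

On whether a SELECT can cause gc buffer busy acquire:

  • A query normally requests a buffer in shared mode. If the buffer is not in the buffer cache, however, it has to be read in by I/O, and that step requires exclusive mode. If many other sessions request the same buffer for queries (in shared mode) at the same time, they wait on the buffer; in this case the buffer cache may be too small.
  • If the block requested by the query has been modified, the query needs a CR block. Building the CR block requires reading the corresponding undo block, and if the undo block is not in the buffer cache it has to be read in by I/O. If many queries access this CR block, they all hit buffer busy waits.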

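6) Cross-node access to the same data

In a RAC database, the same data is requested on different database instances.

If the application allows it, we recommend having different application functions/modules access their data on different database instances, so that the same data is not accessed from multiple instances. This reduces buffer contention and avoids gc waits.

7) Oracle bug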

IV. Possible solutions

For high contention and hot blocks:

The solution is to reorganize the index so as to avoid the contention or hot spots, using one of the options below:

I. Hash-partition the index globally

CREATE INDEX hgidx ON tab (c1,c2,c3) GLOBAL
     PARTITION BY HASH (c1,c2)
     (PARTITION p1  TABLESPACE tbs_1,
      PARTITION p2  TABLESPACE tbs_2,
      PARTITION p3  TABLESPACE tbs_3,
      PARTITION p4  TABLESPACE tbs_4);

II. Recreate the index as a reverse key index (not suitable for large tables; may require a correspondingly larger buffer cache)

III. If the index key is generated from a sequence, increase the cache size of the sequence and make the sequence NOORDER if the application supports it.
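Options II and III could be sketched as follows. The index name tab_idx, the sequence name tab_seq and the cache size of 1000 are hypothetical placeholders and would need to be chosen for the actual workload.

```sql
-- Option II: rebuild an existing index as a reverse key index.
-- Note: a reverse key index cannot be used for index range scans.
ALTER INDEX tab_idx REBUILD REVERSE;

-- Option III: enlarge the sequence cache and drop the ordering guarantee.
-- With NOORDER on RAC, values are no longer issued in strict order
-- across instances, so the application must tolerate that.
ALTER SEQUENCE tab_seq CACHE 1000 NOORDER;
```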

Refer the doc link: 

For enq: TX - row lock contention:

Mode 4 - Related to ITL waits

Find the segments with high ITL waits from the AWR report or with the following SQL:

SELECT OWNER, OBJECT_NAME, OBJECT_TYPE
  FROM V$SEGMENT_STATISTICS
 WHERE STATISTIC_NAME = 'ITL waits'
   AND VALUE > 0
 ORDER BY VALUE;

Increase the INITRANS value of the segments with high ITL waits.
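A minimal sketch of that change is shown below. The object names app_owner.orders and app_owner.orders_pk and the value 10 are hypothetical. A plain ALTER TABLE ... INITRANS applies only to blocks formatted afterwards, so the existing blocks have to be rebuilt to pick up the new setting.

```sql
-- New setting for blocks formatted from now on.
ALTER TABLE app_owner.orders INITRANS 10;

-- Rebuild existing blocks with the new setting; this marks the
-- table's indexes UNUSABLE, so they must be rebuilt afterwards.
ALTER TABLE app_owner.orders MOVE INITRANS 10;
ALTER INDEX app_owner.orders_pk REBUILD INITRANS 10;
```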

Mode 6 - Primarily due to an application issue:

This is an application issue, and the application developers need to investigate the SQL statements involved. The following documents may help with further analysis:

Note:102925.1 - Tracing sessions: waiting on an enqueue

Note:179582.1 - How to Find TX Enqueue Contention in RAC or OPS

Note:1020008.6 - SCRIPT: FULLY DECODED LOCKING

Note:62354.1 - TX Transaction locks - Example wait scenarios

Note:224305.1 - Autonomous Transaction can cause locking

How to Validate Network and Name Resolution Setup for the Clusterware and RAC (Doc ID 1054902.1)
