GC Block Lost Wait Event
from: http://www.dba-oracle.com/t_rac_tuning_gc_block_lost_wait_event.htm
No network is perfect. Data transmitted from point A to point B may occasionally get lost. The same is true for global cache transfers along the Cluster Interconnect: global cache block transfers can get lost. If a requested block is not received by the instance within 0.5 seconds, the block is considered lost. Since most block transfers complete in milliseconds, too many lost global cache blocks can hamper application performance, because each lost block must be re-sent and the session must wait for the second transfer to complete.
Lost global cache block transfers surface in two different areas. The wait events gc cr block lost and gc current block lost are raised when a consistent read block transfer or a current block transfer is lost and the session must wait for the block to be resent. The other area is the Oracle statistic named gc blocks lost, which can be seen at the system or session level. Examples of these two metrics are shown below.
-- gc_blocks_lost.sql
select
   inst_id,
   event,
   total_waits,
   time_waited
from
   gv$system_event
where
   event in ('gc current block lost',
             'gc cr block lost')
order by
   event,
   inst_id;
INST_ID EVENT TOTAL_WAITS TIME_WAITED
---------- ------------------------------ ----------- -----------
1 gc cr block lost 50 3029
2 gc cr block lost 75 4516
1 gc current block lost 26 1467
2 gc current block lost 36 2060
select
   sn.inst_id,
   sn.name,
   ss.value
from
   gv$statname sn,
   gv$sysstat  ss
where
   sn.inst_id = ss.inst_id
and
   sn.statistic# = ss.statistic#
and
   sn.name = 'gc blocks lost'
order by
   sn.inst_id;
INST_ID NAME VALUE
---------- -------------------- ----------
1 gc blocks lost 90
2 gc blocks lost 164
The output above shows the metrics on a per-instance basis. One can certainly summarize the values across all instances if desired.
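For example, summing the statistic across all instances is a small change to the query above (a sketch; the gv$ views and statistic name are the same as before):

```sql
select
   sn.name,
   sum(ss.value) as total_blocks_lost
from
   gv$statname sn,
   gv$sysstat  ss
where
   sn.inst_id = ss.inst_id
and
   sn.statistic# = ss.statistic#
and
   sn.name = 'gc blocks lost'
group by
   sn.name;
```

With the sample values above (90 and 164), this query would report a cluster-wide total of 254.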
The presence of lost blocks in wait events or a system statistic is not, by itself, cause for great concern. As with any network, an occasional hiccup can lead to lost block transfers that then appear in the gv$sysstat view. And as with any wait event, the metric by itself is essentially meaningless because the output above provides no context. Is the wait event a "Top 5" wait event? Were the wait events generated over a 1-hour period or over a month? Since we do not know the answers to these questions, we cannot determine whether the metrics indicate a problem. More information is needed. An AWR report from a 1-hour snapshot window can better indicate whether a real problem exists.
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg
wait % DB
Event Waits Time(s) (ms) time Wait Class
-------------------------- ------------ ----------- ------ ------ ----------
DB CPU 6,975 32.1
db file sequential read 3,831,277 5,809 2 26.8 User I/O
gc current block lost 3,819 942 247 4.3 Cluster
db file parallel read 145,588 854 6 3.9 User I/O
gc cr multi block request 535,685 498 1 2.3 Cluster
Above, the gc current block lost wait event appears in the Top 5 list, which now provides context for the wait event in question. This event contributes the second-longest total wait time for the instance during the one-hour period. However, even if the wait event were eliminated entirely, only 4.3% of the total processing time would be recovered. From a performance tuning perspective, where the end goal is often to reduce processing time, it would be better to focus on the db file sequential read wait event, which contributes 26.8% of the total database time, or to determine whether CPU utilization can be reduced, since CPU accounts for 32.1% of the total time. That said, it is never a good sign when lost global cache blocks appear as a top wait event.
The most common reason for lost global cache blocks is a faulty private network, i.e. one that is dropping packets. If global cache lost blocks are seen as a problem, then work with the network administrator to ensure the switch is valid, cables are secure and seated properly, firmware levels are up to date, and that other network configuration issues are not a problem. The network administrator should be able to use network tools like netstat and anything else in their arsenal to check for dropped packets on the private network.
[root@host01 ~]# netstat -su
IcmpMsg:
InType0: 91
InType3: 723
InType8: 23
OutType0: 23
OutType3: 928
OutType8: 103
Udp:
664034038 packets received
983 packets to unknown port received.
20080 packet receive errors
654621700 packets sent
UdpLite:
IpExt:
InMcastPkts: 18041
OutMcastPkts: 8745
InBcastPkts: 102377
OutBcastPkts: 119
InOctets: 4678332299675
OutOctets: 2652878623355
InMcastOctets: 1401313
OutMcastOctets: 636504
InBcastOctets: 19312376
OutBcastOctets: 49090
The netstat utility is reporting UDP packet receive errors, which can account for lost global cache block transfers on this node of the cluster. In addition to verifying the hardware, the network administrator should investigate the following:
Private network is truly private
Oversaturated bandwidth due to too much traffic on the network
Quality of Service (QoS) settings that may be downgrading performance
Incorrect Jumbo Frames configuration
Multiple hops between the nodes and the private network switch
Mismatched MTU settings between devices
Mismatch in duplex mode settings between devices
Incorrect bonding/teaming configuration
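Several of the items above can be spot-checked directly from each node. The sketch below assumes a Linux host and that the private interconnect NIC is named eth1 (adjust IFACE for your cluster); it only reads kernel counters, so it is safe to run on a live system.

```shell
# Assumed interconnect NIC name -- replace eth1 with your private interface.
IFACE=${IFACE:-eth1}

# MTU: must be identical on every node and on the switch ports; a
# half-configured Jumbo Frames setup shows up here as mismatched values.
cat /sys/class/net/"$IFACE"/mtu 2>/dev/null

# Kernel receive drop/error counters for the interface.
cat /sys/class/net/"$IFACE"/statistics/rx_dropped 2>/dev/null
cat /sys/class/net/"$IFACE"/statistics/rx_errors  2>/dev/null

# Speed and duplex: a duplex mismatch with the switch port causes drops.
# ethtool may not be installed everywhere, hence the guard.
command -v ethtool >/dev/null &&
  ethtool "$IFACE" 2>/dev/null | grep -E 'Speed|Duplex'

true  # keep a clean exit even if the interface name does not exist here
```

Run the same checks on every node; the MTU and duplex values must agree across all nodes and with the switch configuration.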
If everything on the network side checks out, then consider increasing the UDP socket buffer sizes, as discussed in the previous section of this chapter. Global cache lost blocks are not always a network issue. After the network has been verified and the UDP socket sizes are correct, check whether CPU resources are in short supply.
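On Linux, for example, the UDP socket buffer limits are kernel parameters set through sysctl. The fragment below shows the general shape; the specific values are only illustrative of commonly published RAC recommendations and should be verified against the installation guide for your Oracle version and platform.

```
# /etc/sysctl.conf fragment -- illustrative values, verify for your release
net.core.rmem_default = 262144
net.core.rmem_max     = 4194304
net.core.wmem_default = 262144
net.core.wmem_max     = 1048576
```

Apply with sysctl -p and confirm with sysctl net.core.rmem_max; note that running instances generally need a restart before their sockets are allocated with the larger sizes.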