[ID 278316.1]
Modified: 16-OCT-2011   Type: TROUBLESHOOTING   Status: PUBLISHED
In this Document
Purpose
Last Review Date
Instructions for the Reader
Troubleshooting Details
What is a Row Cache Enqueue Lock?
What is the meaning of the warning WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK?
What information can be gathered to help identify the cause?
How to interpret the information obtained
Potential reasons for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!"
References
Applies to:
Oracle Server - Enterprise Edition - Version: 8.0.6.0 to 11.2.0.3 [Release: 8.0.6 to 11.2]
Oracle Server - Standard Edition - Version: 8.0.6.0 to 11.2.0.3 [Release: 8.0.6 to 11.2]
Information in this document applies to any platform.
Purpose
The purpose of this note is to help troubleshoot the reasons for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!".
Last Review Date
October 16, 2011
Instructions for the Reader
A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.
Troubleshooting Details
What is a Row Cache Enqueue Lock?
The Row Cache or Data Dictionary Cache is a memory area in the shared pool that holds data dictionary information. The row cache holds data as rows instead of buffers. A row cache enqueue lock is a lock on the data dictionary rows. The enqueue is taken on a specific data dictionary object; this is called the enqueue type and can be found in the v$rowcache view. For a list of row cache types per version see:
Note 468334.1 How to Match a Row Cache Object Child Latch to its Row Cache
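The row cache types present on a running instance, together with their activity, can be listed directly from v$rowcache. This is a minimal sketch; the columns of interest may vary slightly by version:

set pages 999 lines 132
column parameter format a30

-- Dictionary cache activity, busiest caches first
select cache#, parameter, gets, getmisses, modifications
  from v$rowcache
 order by gets desc;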
What is the meaning of the warning WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK?
This wait event is used when a process is trying to acquire a lock on a row cache entry. When row cache contention occurs, if the enqueue cannot be obtained within a certain predetermined time period, a trace file is generated in the user_dump_dest or background_dump_dest, depending on whether a user or a background process created the trace file. The alert log is usually updated with the warning and the location of the trace file.
The database detects that a key resource is being held for too long and flags this to the administrator so that the situation can be resolved. This may also be accompanied by a database hang or slowdown.
The alert.log and the generated trace file will tend to contain the message:
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<<
If the row cache entry lock cannot be granted immediately, the process enters a loop where it frees the row cache objects latch, waits (using the above event), re-acquires the latch and then tries again to acquire the row cache lock. In single-instance mode this is repeated up to 1000 times before the process reports the WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK message. Under RAC it repeats until the process fails to acquire the instance lock or is interrupted.
The systemstate dump can provide useful information for diagnosing the cause of the contention.
What information can be gathered to help identify the cause?
Systemstate dump
A systemstate dump is automatically generated by Oracle when the issue occurs and is referenced in the alert.log:
Wed Sep 21 13:39:19 2011
> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=37
System State dumped to trace file /oracle/diag/rdbms/v11202/v11202/v11202_ora_30569.trc
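If a further systemstate is needed while the problem is still in progress (for example on other RAC nodes, or if the automatic dump is incomplete), one can be taken manually with oradebug while connected as SYSDBA. This is a sketch; level 266 is commonly used on 10g and later because it includes short stacks:

-- connect / as sysdba
oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266
-- optionally also a hang analysis
oradebug hanganalyze 3
-- show the trace file that was written
oradebug tracefile_name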
AWR, ADDM and ASH Reports
Run AWR and ASH reports for the time when the problem is reported, as well as a report leading up to the problem, as these can sometimes help build a picture of when the problem actually started. The AWR, ADDM and ASH reports can complement each other in giving a complete picture. Depending on the interval used for generating AWR snapshots, get a report for the smallest time frame available. The default snapshot interval is one hour.
SQL>@$ORACLE_HOME/rdbms/admin/awrrpt.sql
SQL>@$ORACLE_HOME/rdbms/admin/addmrpt.sql
SQL>@$ORACLE_HOME/rdbms/admin/ashrpt.sql
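If the default hourly interval is too coarse, an extra snapshot can be forced just before and just after the problem window so that the report covers the smallest possible time frame. A minimal sketch using the standard DBMS_WORKLOAD_REPOSITORY package:

-- force an out-of-cycle AWR snapshot (repeat after the problem window)
exec dbms_workload_repository.create_snapshot;
-- then run awrrpt.sql and pick the two snapshots surrounding the problem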
As systemstate is complicated to analyze, it is recommended to create a Service Request and upload the alert.log, systemstate dump and AWR reports preceding and during the problem to support.
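In addition, while the contention is occurring, the sessions currently waiting on the 'row cache lock' wait event can be listed from v$session. For this event P1 is the cache id, which can be matched to v$rowcache to obtain the cache name; P2 and P3 carry the lock modes. This is a sketch assuming 10g or later (for the sql_id column):

-- sessions currently waiting on a row cache lock, with the cache name
select s.sid, s.serial#, s.sql_id, s.seconds_in_wait,
       s.p1 cache_id, rc.parameter cache_name, s.p2, s.p3
  from v$session s,
       (select distinct cache#, parameter from v$rowcache) rc
 where s.event = 'row cache lock'
   and s.p1 = rc.cache#;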
Internal Query
The holders and waiters for a given cache_id can be determined using the following query:

set pages 999 lines 132

column name format a30
column held_mode format 990
column request_mode format 990

select KQRFPCNM name, KQRFPSES session_addr, KQRFPMOD held_mode, KQRFPREQ request_mode
  from X$KQRFP
 where KQRFPCID = &Cache_id ;

On each node, this will show you the waiters and holders for the cache_id. Check the cache_id's using:

select distinct KQRFPCNM, KQRFPCID from X$KQRFP ;
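To translate the session addresses returned by the query above into actual sessions, X$KQRFP can be joined to v$session. This is a sketch under the assumption that KQRFPSES holds the session address matching V$SESSION.SADDR; note that X$ tables can only be queried when connected as SYS:

-- holders and waiters of row cache enqueues mapped to their sessions
select f.KQRFPCNM cache_name,
       s.sid, s.serial#, s.username, s.sql_id, s.event,
       f.KQRFPMOD held_mode, f.KQRFPREQ request_mode
  from X$KQRFP f, v$session s
 where f.KQRFPSES = s.saddr
   and (f.KQRFPMOD != 0 or f.KQRFPREQ != 0);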
How to interpret the information obtained
Systemstate dump
Often the row cache enqueue wait is the result of a chain of events, and the lock being held is a symptom of another issue where the process holding the requested row cache enqueue is itself blocked by other processes. The systemstate dump can help to find which row cache is being requested and may help find the blocking process.
Example 1:
Oracle process number: 77
Unix process pid: 10846, image: oracle@cpdb4532
*** 2011-05-13 08:08:58.775
*** SERVICE NAME:(ALFCMR_SERVICE) 2011-05-13 08:08:58.775
*** SESSION ID:(1076.796) 2011-05-13 08:08:58.775
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! <<<
row cache enqueue: session: 0x1df57ade8, mode: N, request: S
>>
The header of the trace shows the Oracle process number (pid) of the process waiting for the row cache enqueue lock as well as the mode needed for the row cache enqueue.
In the example above, Process 77 is waiting for a row cache lock needed in shared mode (request: S).
PROCESS 77
----------------------------------------
.
.
----------------------------------------
SO: 0x1cdf11958, type: 50, owner: 0x17d198288, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=0x1df57ade8 object=0x1dc9a5d30, request=S
savepoint=0x87b70d
row cache parent object: address=0x1dc9a5d30 cid=7(dc_users)
.
.
The systemstate contains process state information for every process in the database. Above we see that PROCESS 77 that dumped the systemstate is requesting row cache dc_users in shared mode.
The process state will also contain a call stack, useful to Oracle Support for determining which part of the Oracle code is being executed, as well as wait information and the SQL currently being run.
PROCESS 218:
----------------------------------------
.
.
SO: 0x1cdf118f8, type: 50, owner: 0x1ceb0f178, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=0x1da54cf68 object=0x1dc9a5d30, request=X
savepoint=0x11e
row cache parent object: address=0x1dc9a5d30 cid=7(dc_users)
Using the address of the row cache object (in this case 0x1dc9a5d30), we see that there is a request by PROCESS 218 in exclusive mode for this particular row cache. A request in exclusive mode will cause any requests in shared mode to wait behind it until the exclusive request has been granted and released.
PROCESS 164:
----------------------------------------
.
.
O/S info: user: u1m, term: , ospid: 1234, machine: cpc44711
program:
last wait for 'SQL*Net message from client' blocking sess=0x(nil) seq=36289 wait_time=6943 seconds since wait started=2539
driver id=54435000, #bytes=1, =0
.
.
SO: 0x1cdf11418, type: 50, owner: 0x1ccc26120, flag: INIT/-/-/0x00
row cache enqueue: count=2 session=0x1df578318 object=0x1dc9a5d30, mode=S
savepoint=0xb1bd8e
row cache parent object: address=0x1dc9a5d30 cid=7(dc_users)
hash=fc968070 typ=11 transaction=(nil) flags=00000002
own=0x1dc9a5e00[0x1cdf11448,0x1cdf11448] wat=0x1dc9a5e10[0x1cdf11928,0x17d5192e0] mode=S
Process 164 is holding the row cache enqueue in shared mode (mode=S) and is thus preventing process 218 from obtaining the enqueue. Furthermore, we see that process 164 is on CPU (the systemstate shows "last wait for 'SQL*Net message from client'" rather than "waiting for 'SQL*Net message from client'"). To diagnose further, Support would need to check the stack calls included in the dump.
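If the instance is still responsive, the Oracle process numbers seen in the systemstate (here 164 and 218) can be mapped to their sessions and current SQL by joining v$process and v$session. A minimal sketch; the pid values are simply the ones from this example:

-- map Oracle process numbers from the systemstate to sessions
select p.pid, p.spid, s.sid, s.serial#, s.username, s.sql_id, s.event
  from v$process p, v$session s
 where s.paddr = p.addr
   and p.pid in (164, 218);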
For each enqueue type, there are a limited number of operations that require each enqueue.
Example 2:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /opt/oracle10/product/10.2.0
System name: SunOS
Node name: saecopt51
Release: 5.10
Version: Generic_144488-04
Machine: sun4v
Instance name: PORT_V16
Redo thread mounted by this instance: 1
Oracle process number: 18
Unix process pid: 6196, image: oracle@sae (MMON)
.
.
PROCESS 18:
----------------------------------------
.
.
last wait for 'ksdxexeotherwait' wait_time=0.000013 sec, seconds since wait started=6
.
.
SO: 39bf1f0e8, type: 50, owner: 3980783a0, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=3be37ea80 object=39a79f090, request=S
savepoint=0x41f0ae
row cache parent object: address=39a79f090 cid=22(dc_awr_control)
hash=6f60197e typ=9 transaction=3bc39f560 flags=0000002a
own=39a79f160[39bf1f178,39bf1f178] wat=39a79f170[39bf1f118,39bf1f118] mode=X
.
.
PROCESS 269:
----------------------------------------
.
.
waiting for 'SGA: allocation forcing component growth' wait_time=0, seconds since wait started=3
.
.
SO: 39bf1f148, type: 50, owner: 3bc39f560, flag: INIT/-/-/0x00
row cache enqueue: count=1 session=3be1b7c98 object=39a79f090, mode=X
savepoint=0x41efe8
row cache parent object: address=39a79f090 cid=22(dc_awr_control)
hash=6f60197e typ=9 transaction=3bc39f560 flags=0000002a
own=39a79f160[39bf1f178,39bf1f178] wat=39a79f170[39bf1f118,39bf1f118] mode=X
.
.
In this example we see that PROCESS 18 (MMON) is waiting for row cache type dc_awr_control in SHARED mode.
The row cache lock is being held by PROCESS 269 in exclusive mode. That process is itself waiting for 'SGA: allocation forcing component growth'. Thus the root cause in this case is the resizing of the SGA, and the wait on the row cache is a secondary result of this.
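Where this pattern is suspected, recent automatic SGA resize activity can be confirmed from v$sga_resize_ops (available in 10g and later). A minimal sketch:

-- recent automatic SGA resize operations
select component, oper_type, oper_mode, status,
       initial_size, target_size, final_size, start_time, end_time
  from v$sga_resize_ops
 order by start_time desc;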
AWR Report
Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
SGA: allocation forcing compon   42,067,317      38,469      1    7.6 Other
CPU time                                          2,796           0.6
db file sequential read             132,906         929      7    0.2 User I/O
latch free                        4,282,858         704      0    0.1 Other
log file switch (checkpoint in          904         560    620    0.1 Configurat
          -------------------------------------------------------------
As can be seen clearly in the AWR report Top 5 Timed Events, the major issue is 'SGA: allocation forcing component growth'. This is the root cause: one of the processes, as seen in the systemstate dump, is holding the row cache enqueue while waiting on this event. The top 5 events do not even show waits on 'row cache lock', because these are a symptom and not the actual root issue.
Potential reasons for "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!"
DC_TABLESPACES
Probably the most likely cause is the allocation of new extents. If extent sizes are set low, the application may constantly be requesting new extents, causing contention. Do you have objects with small extent sizes that are growing rapidly? (You may be able to spot these by looking for objects with large numbers of extents.) Check the trace for insert/update activity and check the objects inserted into for their number of extents.
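Objects that are growing through a very large number of small extents can be spotted from dba_segments. A minimal sketch; the threshold of 1000 extents is purely illustrative:

-- segments with an unusually high number of extents
select owner, segment_name, segment_type, tablespace_name, extents
  from dba_segments
 where extents > 1000
 order by extents desc;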
DC_SEQUENCES
Check for appropriate caching of sequences for the application requirements (a query sketch follows the notes below).
Note 853652.1 RAC and Sequences
Note 395314.1 RAC Hangs due to small cache size on SYS.AUDSES$ - fixed in 10.2.0.3
Note 6027068.8 Bug 6027068 - Contention on ORA_TQ_BASE sequence -fixed in 10.2.0.5 and 11.2.0.1
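To review sequence caching as suggested above, the cache size of application sequences can be checked in dba_sequences and increased where appropriate. A minimal sketch; the threshold of 20 and the sequence name are illustrative only:

-- sequences with little or no caching
select sequence_owner, sequence_name, cache_size, order_flag
  from dba_sequences
 where cache_size < 20
 order by cache_size;

-- example of increasing the cache for a hypothetical busy sequence
-- alter sequence scott.order_seq cache 1000;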
DC_USERS
Deadlock and resulting "WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK!" can occur if a session issues a GRANT to a user, and that user is in the process of logging on to the database.
Note 4604972.8 Bug 4604972 - Deadlock on dc_users by Concurrent Grant/Revoke - fixed in 11.1.0.6
Note 6143420.8 Bug 6143420 - Deadlock involving "ROW CACHE LOCK" on dc_users AND "CURSOR: PIN S WAIT ON X" - fixed in 10.2.0.5 and 11.1.0.6
DC_OBJECTS
DC_OBJECT_IDS
Note 11693365.8 Bug 11693365 - Concurrent Drop table and Select on Reference constraint table hangs (deadlock) - fixed in 12.1
DC_SEGMENTS
This is likely to be down to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose.
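An errorstack of the holding process can be taken with oradebug while the hang is in progress. This is a sketch; 12345 is a placeholder for the OS process id (v$process.spid) of the holder identified from the systemstate:

-- connect / as sysdba
oradebug setospid 12345
oradebug unlimit
oradebug dump errorstack 3
oradebug tracefile_name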
DC_ROLLBACK_SEGMENTS
This is due to rollback segment allocation. Just like dc_segments, identify what is holding the enqueue and also generate errorstacks. Remember that on a multi-node (RAC) system the holder may be on another node, so systemstates from each node will be required.
DC_TABLE_SCNS
Note 5756769.8 Bug 5756769 - Deadlock between Create MVIEW and DML - fixed in 10.2.0.5, 11.1.0.7 and 11.2.0.1
RAC Specific Bugs
Note 6004916.8 Bug 6004916 - Hang involving row cache enqueues in RAC (ORA-4021) - fixed in 10.2.0.5 and 11.1.0.6
Note 8666117.8 Bug 8666117 - High row cache latch contention in RAC - fixed in 11.2.0.2 and 12.1
Note 9866045.8 Bug 9866045 - Long wait on 'wait for master scn' in LCK causing long row cache lock waits - fixed in 12.1
References
NOTE:468334.1 - How to Match a Row Cache Object Child Latch to its Row Cache