http://www.xifenfei.com/2012/05/library-cache-latch%e7%ad%89%e5%be%85%e4%ba%8b%e4%bb%b6.html
产生library cache latch原因
The library cache latches protect the cached SQL statements and objects' definitions held
in
the library cache within the shared pool. The library cache latch must be acquired
in
order to add a new statement to the library cache. During a parse, Oracle searches
the library cache
for
a matching statement. If one is not found,
then
Oracle will parse
the SQL statement, obtain the library cache latch and insert the new SQL.
|
综上可知: 在sql执行的过程中可以看出在出现High Versions Count和Hard Parse的情况下都有可能出现library cache latch等待.
http://www.dbafree.net/?p=810
library cache: mutex X替代了之前的library cache latch,主要作用是在hash bucket中定位handle时使用。(比如SQL硬解析时,需要往hash bucket中新增一个cursor时,需要library cache latch)。如下图所示:
文档上面的解释如下:
The library cache mutex is acquired for similar purposes that the library cache latches were acquired in prior versions of Oracle. In 10g, mutexes were introduced for certain operations in the library cache. Starting with 11g, the library cache latches were replaced by mutexes, hence this new wait event.
Mutexes are a lighter-weight and more granular concurrency mechanism than latches. Mutexes take advantage of CPU architectures that offer the compare and swap instructions (or similar). The reason for obtaining a mutex in the first place, is to ensure that certain operations are properly managed for concurrency. E.g., if one session is changing a data structure in memory, then another session must wait to acquire the mutex before it can make a similar change – this prevents unintended changes that would lead to corruptions or crashes if not serialized.
This wait event is present whenever a library cache mutex is held in exclusive mode by a session and other sessions need to wait for it to be released. There are many different operations in the library cache that will require a mutex, so its important to recognize which “location” (in Oracle’s code) is involved in the wait. “Location” is useful to Oracle Support engineers for diagnosing the cause for this wait event.
Individual Waits:
Parameters:
P1 = “idn” = Unique Mutex Identifier
P2 = “value”
P3 = “where” = location in code (internal identifier) where mutex is being waited for
@The meaning of the code for “where” can be found by looking in kgl0.h for entries with the prefix “”kglml_XXX”. For example, if P3=2, then it corresponds to “kglml_kglget2″. You can then search source code for this symbol to see where the mutex is acquired.
测试一个硬解析的场景如下:
conn scott/tiger create table testlib1 (id number) ; create table testlib2 (id number) ; create table testlib3 (id number) ; create table testlib4 (id number) ; create table testlib5 (id number) ; create table testlib6 (id number) ; create table testlib7 (id number) ; create table testlib8 (id number) ; create table testlib9 (id number) ; create table testlib10 (id number) ; create table testlib11 (id number) ; create table testlib12 (id number) ; create table testlib13 (id number) ; create table testlib14 (id number) ; create table testlib15 (id number) ; create table testlib16 (id number) ; create table testlib17 (id number) ; create table testlib18 (id number) ; create table testlib19 (id number) ; create table testlib20 (id number) ; vi test4.sh #!/bin/ksh i="$1" while true do echo $i sqlplus scott/tiger << EOF select * from testlib$i where id = $RANDOM; exit EOF done alter system flush shared_pool; exec DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT (); nohup sh test4.sh 1 & nohup sh test4.sh 2 & nohup sh test4.sh 3 & nohup sh test4.sh 4 & nohup sh test4.sh 5 & nohup sh test4.sh 6 & nohup sh test4.sh 7 & nohup sh test4.sh 8 & nohup sh test4.sh 9 & nohup sh test4.sh 10 & nohup sh test4.sh 11 & nohup sh test4.sh 12 & nohup sh test4.sh 13 & nohup sh test4.sh 14 & nohup sh test4.sh 15 & nohup sh test4.sh 16 & nohup sh test4.sh 17 & nohup sh test4.sh 18 & nohup sh test4.sh 19 & nohup sh test4.sh 20 & exec DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT ();
生成awr report,查看等待事件:
Top 5 Timed Foreground Events ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Avg wait % DB Event Waits Time(s) (ms) time Wait Class ------------------------------ ------------ ----------- ------ ------ ---------- DB CPU 104 111.9 library cache: mutex X 1,920 3 1 2.9 Concurrenc latch: shared pool 237 1 5 1.2 Concurrenc latch: row cache objects 146 1 5 .9 Concurrenc cursor: pin S wait on X 42 1 13 .6 Concurrenc ^LHost CPU (CPUs: 16 Cores: 8 Sockets: 2) ~~~~~~~~ Load Average Begin End %User %System %WIO %Idle --------- --------- --------- --------- --------- --------- 1.12 6.89 25.0 18.5 2.7 56.1
我们的hard parse
Load Profile Per Second Per Transaction Per Exec Per Call ~~~~~~~~~~~~ --------------- --------------- ---------- ---------- DB Time(s): 2.8 4.0 0.00 0.00 DB CPU(s): 3.1 4.5 0.00 0.00 Redo size: 31,740.9 46,051.8 Logical reads: 3,625.8 5,260.6 Block changes: 83.5 121.1 Physical reads: 0.2 0.4 Physical writes: 0.2 0.2 User calls: 2,719.4 3,945.5 Parses: 1,856.1 2,693.0 Hard parses: 137.4 199.4 W/A MB processed: 0.8 1.2 Logons: 129.7 188.1 Executes: 1,830.6 2,656.0 Rollbacks: 0.2 0.3 Transactions: 0.7
查了下metalink,与"library cache: mutex X"等待相关的BUG大约有30多个,会在11G各个版本之间发生,所以碰到这个等待事件,可以先确认下是不是碰上BUG了。以下两个BUG,在网上好象大家经常碰到,给出metalink相关的描述:
Bug 5928271 - Excessive waits on "library cache: mutex X" [ID 5928271.8]
Versions >= 11.1.0.6 but BELOW 11.2
This fix alleviates some waits on "library cache: mutex X" when looking
up a library cache object.
Bug 9530750 High waits for 'library cache: mutex X' for cursor Build lock
Versions BELOW 12.1
High waits may be seen for "library cache: mutex X" for
a build lock mutex with a call stack including kksGetBuildLock.
Rediscovery Notes:
The mutex is for an object with a name like "$BUILD$.xxxxxxxx"
http://blog.itpub.net/18474/viewspace-1060805/
library cache: mutex等待事件分析方法及案例
mutex是11g中采用的轻量级锁,替代latch(更轻量,更细粒度)。其中mutex X表示eXclusive模式,如果是mutex s则是共享模式。引起library cache mutex冲突的原因很多。
[@more@]
一、 Library cache: mutex等待事件产生常见原因:
l os资源不足(cpu、内存)
l sga设置不合理,shared_pool不足,或动态调整导致hard parse
l hot object contention
l hard parse
l sql high version count 由于子游标太多,扫描时会形成锁
l library cache object失效导致重编译
l bug
二、 诊断方法
1. 首先检查os资源如cpu内存是否存在不足
2. 其次,检查是否存在Hot object contention
方法一:通过ash报告
Top sql with top events最活跃的sql及其top wait events
Top sql with top row sources 最活跃的sql及其详细运行计划信息,可以凭此定位sql中
哪一部分最消耗时间。
Top event p1/p2/p3 values显示wait event详细参数
方法二:通过sql
通过sql发现hot objects:
select KGLNAOBJ, KGLNAOWN, KGLHDNSP, KGLOBTYP, KGLOBT23, KGLOBT24
from x$kglob where KGLOBT23 > 1000000 or KGLOBT24 > 1000000
order by KGLOBT24;
也可以用以下sql找到相应sql,hot objects,以及每句sql等待事件占比
select sw.p1,s.sql_id, count(*),(ratio_to_report(count(*)) over ())*100 pct
from v$session s,v$session_wait sw
where s.event like '%mutex%'and s.sid=sw.sid
group by sw.p1,s.sql_id
order by count(*) desc;
select sql_id,sql_text from v$sql where sql_id=…;
如果要查看历史信息,可以从历史表查询:
select * from (
select p1, sql_id,
count(*),
(ratio_to_report(count(*)) over ())*100 pct
from dba_hist_active_sess_history
where event='library cache: mutex X'
and snap_id between and
and dbid =
group by p1, sql_id
order by count(*) desc)
where rownum <= 10;
然后根据p1查热点对象
select KGLNAOBJ, KGLNAOWN, KGLHDNSP, KGLOBTYP
from x$kglob where KGLNAHSH= ;
3. 如果不存在hot object contention,则检查AWR报告(和baseline做比较)
1) Hard parse
检查awr的load profile中的hard parse再检查sql orderd by pase call)
检查是否发生了shared_pool size的调整
2) high version count高
检查sql orderd by version count是否正常
cursor_sharing参数是否设成了SIMILAR(建议用FORCE取代)
3) Library cache object失效
检查 Library Cache Activity的invalidations. 如果失效很多,检查是否执行了ddl操作
check ddl's performed during the time such as truncate, drop, grants, dbms_stats
4. mutex sleeps
检查AWR中'Mutex Sleep Summary',主要关注location和times,
如果Sleeps次数多但time少,则影响不大。
也可通过sql查询(注意是instance启动以来的,而不是某个时段内)
select to_char(sysdate, 'HH:MI:SS') time, KGLNAHSH hash, sum(sleeps) sleeps,location,MUTEX_TYPE
, substr(KGLNAOBJ,1,40) object
from x$kglob , v$mutex_sleep --也可查历史表v$mutex_sleep_history
where kglnahsh=mutex_identifier
group by KGLNAOBJ,KGLNAHSH,location,MUTEX_TYPE
order by sleeps