--ORA-04031是个经典错误,一般情况是由于shared pool被耗尽导致,例如:未绑定变量导致的hard parse或大并发的shared pool本身太小,一般情况ORA-04031发生时会同时又相关等待事件出现
cursor: pin S wait on X;library cache lock;latch: library cache;latch: shared pool等。
AWR里shared Pool Advisory的各项指标说明,此区域可以说明是否需要调节shared_pool
Est LC===> Estimate library cache
Shared Pool Size(M)==>用以评估的shared pool共享池大小,在AMM /ASMM环境下shared_pool大小都可能浮动
SP Size Factr==>共享池大小的比例因子,(Shared Pool Size for Estim/SHARED_POOL_SIZE),当此值为1时,shared pool size是实际的大小
Est LC Size(M)==>评估的library cache大小,单位为MB,因为是shared pool中包含library cache当然还有其他例如dictionary cache
Est LC Mem Obj==>评估指定大小共享池内的library cache memory object的数量
Est LC Time Saved(s)==>指定的共享池大小情况下可找到需要的library cache memory objects从而节约的解析时间,这些节约的解析时间也是花费在共享池内重复加载需要的对象(reload),
这些对象可能因为共享池没有足够的free memory而被刷出去
Est LC Time Saved Factr==>Est LC Time Saved(s)的比例因子(Est LC Time Saved(s)/Current LC Time Saved(s))
Est LC Load Time (s)==>在指定的共享池大小情况下解析的耗时
Est LC Load Time Factr==>Est LC Load Time(s)的比例因子,(Est LC Load Time(s)/Current LC Load Time(s))
Est LC Mem Obj Hits(K)==>在指定的共享池大小情况下需要的library cache memory object正好在共享池中被找到的次数
相关信息来自v$shared_pool_advice视图
---此语句就对应着awr上Library Cache Activity区域---此区域可以判定是否是shared pool小导致,一般shared Pool Advisory也可反映出来这个问题
SELECT l.NAMESPACE,--library cache里相关对象
l.GETS,--相关对象library cache lock被申请的次数
(1 - l.GETHITRATIO) * 100 " get misses pct",--丢失,即未申请到local
l.GETHITS,--相关对象正好在library cache handle找到的次数
l.PINS,--这个对象上的被申请pin锁的次数
(1 - l.PINHITRATIO) * 100 "pins misses pct",--丢失,即未申请到latch
l.PINHITS,--pin对象的heap metadata正好在shared pool上的次数
l.RELOADS,--从object handle被创建建开始不是第一次PIN该对象的PIN ,且该次PIN要求对象从磁盘上读取加载的次数 ;Reloads值较高的情况 建议增大shared_pool_size
l.INVALIDATIONS--对象被修改导致该命名空间所包含对象被标记为无效的次数
FROM V$LIBRARYCACHE l;
某节点直接因ORA-4031错误hang死实例,
Errors in file /u01/app/oracle/diag/rdbms/querydb/querydb2/trace/querydb2_m000_21164.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KKSSP^2372","kglseshtSegs")
Wed Oct 08 21:44:11 2014
Process m000 died, see its trace file
Wed Oct 08 21:44:15 2014
Errors in file /u01/app/oracle/diag/rdbms/querydb/querydb2/trace/querydb2_smon_7226.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^e949e218","kglHeapInitialize:temp")
Wed Oct 08 21:44:31 2014
Errors in file /u01/app/oracle/diag/rdbms/querydb/querydb2/trace/querydb2_smon_7226.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","lock table sys.mon_mods$ in ...","SQLA","tmp")
Wed Oct 08 21:44:47 2014
Errors in file /u01/app/oracle/diag/rdbms/querydb/querydb2/trace/querydb2_smon_7226.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^e949e218","kglHeapInitialize:temp")
---从上面的错误可以看到问题发生在shared pool上,而不是large pool 或java pool,调去awr报告来看
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 10031 04-Oct-14 20:00:18 457 8.7
End Snap: 10032 04-Oct-14 21:00:01 791 9.3
Elapsed: 59.72 (mins)
DB Time: 6,091.04 (mins)==>数据库相应时间超高
Cache Sizes Begin End
~~~~~~~~~~~ ---------- ----------
Buffer Cache: 40,832M 40,832M Std Block Size: 8K
Shared Pool Size: 4,224M 4,224M Log Buffer: 44,656K
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 102.0 12.7 0.14 0.18
DB CPU(s): 2.0 0.3 0.00 0.00
Redo size: 282,965.6 35,170.7
Logical reads: 31,776.1 3,949.6
Block changes: 1,168.5 145.2
Physical reads: 940.6 116.9
Physical writes: 186.2 23.2
User calls: 573.1 71.2
Parses: 213.4 26.5
Hard parses: 66.4 8.3==>同比和环比下,此时间段的hard parse太高
W/A MB processed: 4.4 0.6
Logons: 0.2 0.0
Executes: 717.9 89.2
Rollbacks: 0.0 0.0
Transactions: 8.1
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg
wait % DB
Event Waits Time(s) (ms) time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
cursor: pin S wait on X 34,869 204,419 5862 55.9 Concurrenc==>
SGA: allocation forcing compon 1,562,520 91,720 59 25.1 Other
library cache: mutex X 27,076 30,436 1124 8.3 Concurrenc
db file sequential read 1,821,755 8,849 5 2.4 User I/O
DB CPU 7,280 2.0
-> "Pct Misses" should be very low
Get Pct Pin Pct Invali-
Namespace Requests Miss Requests Miss Reloads dations
--------------- ------------ ------ -------------- ------ ---------- --------
ACCOUNT_STATUS 1,128 0.4 0 N/A 0 0
BODY 471 8.9 764 16.6 84 0
CLUSTER 6,040 0.2 6,112 0.2 0 0
DBLINK 1,195 0.0 0 N/A 0 0
EDITION 552 0.0 1,087 0.0 0 0
INDEX 216,927 24.3 216,480 24.7 789 0
LOCATION 84 17.9 84 17.9 0 0
OBJECT ID 136 100.0 0 N/A 0 0
QUEUE 6 33.3 1,009 0.5 3 0
RULESET 0 N/A 14 14.3 2 0
SCHEMA 5,826 0.0 0 N/A 0 0
SQL AREA 811,075 6.7 3,813,011 23.3 482,756 13,900==>shared pool 太小,从而pin对象reload很大,导致前面的硬解析很高,从而
SQL AREA BUILD 537,350 11.6 0 N/A 0 0
SQL AREA STATS 209,927 95.8 209,258 95.8 61 0
SUBSCRIPTION 0 N/A 84 26.2 22 0
TABLE/PROCEDURE 261,995 0.9 1,092,520 19.6 59,542 0
TRIGGER 3,526 42.7 3,526 42.8 4 0
-------------------------------------------------------------
-----通过查询发生ORA-04031时间段的cursor:pin S wait on X等待,发现有有2k多个
SELECT event, blocking_session, sql_id, COUNT(*)
FROM dba_hist_active_sess_history ash
WHERE sample_time >=
to_timestamp('2014-10-04 20:00:00', 'yyyy-mm-dd hh24:mi:ss')
AND sample_time <=
to_timestamp('2014-10-04 21:10:00', 'yyyy-mm-dd hh24:mi:ss')
AND event = 'cursor: pin S wait on X'
GROUP BY event, blocking_session, sql_id;
SELECT event, blocking_session, sql_id, COUNT(*)
FROM dba_hist_active_sess_history ash
WHERE sample_time >=
to_timestamp('2014-10-04 20:00:00', 'yyyy-mm-dd hh24:mi:ss')
AND sample_time <=
to_timestamp('2014-10-04 21:10:00', 'yyyy-mm-dd hh24:mi:ss')
AND ash.session_id = 4284
GROUP BY event, blocking_session, sql_id;
----原因就是:shared pool太小,正好此时间段是数据库的业务高峰期,从而当后面查询的对象信息在前面已经加载到了shared pool中,
由于太小,从而很快被刷出去,导致后面一次的访问,要重新做硬解析,这点从解析和reload次数看出;并发高,导致要以X模式pin住library cache object,
这就意味这个process acquires the mutex in exclusive mode即library cache latch以X模式,多个session是这样,从而产生cursor: pin S wait on X等待。
shared pool太小&&并发高===导致shared pool里的library cache共享游标被刷出去,产生大量硬解析==>很多session以library cache latch以X模式===>产生大量不同sql_id的cursor: pin S wait on X===>需要调整shared_pool_size
内存中shared pool结构及机制
Shared Pool当中主要包含了2部分:library cache和dictionary cache;
Library cache包含了共享SQL区(shared SQL areas),私有SQL区(private SQLareas,如果配置了共享服务器),PL/SQL存储过程以及包,还有一些控制信息,比如说locks以及library cache handles。
Dictionary cache包含了表,视图的依赖信息,比如表结构,它的用户,Oracle在解析SQL的时候就会频繁的访问dictionary cache。
library chache结构,是有多个hash bucket组成的hash table
多个hash bucket<====有很多的lch组成的lch list======>library cache handle<==lch里有大量对象指针,通过此指针,指到相关对象========>library cache handle<==========>....
lch指向的library cache object对象大概有
dependency table:指向本对象所依赖的对象。
child table:指向本对象的子对象,比如某个游标的子游标
authorization table:对象的授权信息。
type:shared cursor,index,table,cluster,view,synonym,sequence,procedure,function,package,table body,package body,trigger等等。
data blocks:data block是一个指针,指向了data heap,即存放真实数据的地方
相关锁发生场景:
library cache lock latch获得lock后,则就释放lock latch,必须先获得latch===>library cache lock发生在查找lch时====通过lch找lco时==>===先申请library cache pin latch,获得pin后,则就释放pin latch,必须先获得latch===>library cache pin发生在查找访问lco的heap上
Library Cache lock有3中模式:
Share(S):当读取一个library cache object的时候获得
Exclusive(X):当创建/修改一个library cache object的时候获得
Null(N):用来确保对象依赖性
Library Cache pin有2种模式:
Share(S):读取object heap
Exclusive(X):修改object heap
从Oracle10g开始,Oracle正在逐步用mutex取代library cache中的latch,
cursor:pin S和cursor:pin X相当于share和exclusive类型的library cache pin,
cursor:pin S wait on X则表示share方式正在等待exclusive锁定
详情相关信息,请参考oracle11g核心技术
cursor: pin S wait on X;library cache lock;latch: library cache;latch: shared pool等。
AWR里shared Pool Advisory的各项指标说明,此区域可以说明是否需要调节shared_pool
Est LC===> Estimate library cache
Shared Pool Size(M)==>用以评估的shared pool共享池大小,在AMM /ASMM环境下shared_pool大小都可能浮动
SP Size Factr==>共享池大小的比例因子,(Shared Pool Size for Estim/SHARED_POOL_SIZE),当此值为1时,shared pool size是实际的大小
Est LC Size(M)==>评估的library cache大小,单位为MB,因为是shared pool中包含library cache当然还有其他例如dictionary cache
Est LC Mem Obj==>评估指定大小共享池内的library cache memory object的数量
Est LC Time Saved(s)==>指定的共享池大小情况下可找到需要的library cache memory objects从而节约的解析时间,这些节约的解析时间也是花费在共享池内重复加载需要的对象(reload),
这些对象可能因为共享池没有足够的free memory而被刷出去
Est LC Time Saved Factr==>Est LC Time Saved(s)的比例因子(Est LC Time Saved(s)/Current LC Time Saved(s))
Est LC Load Time (s)==>在指定的共享池大小情况下解析的耗时
Est LC Load Time Factr==>Est LC Load Time(s)的比例因子,(Est LC Load Time(s)/Current LC Load Time(s))
Est LC Mem Obj Hits(K)==>在指定的共享池大小情况下需要的library cache memory object正好在共享池中被找到的次数
相关信息来自v$shared_pool_advice视图
---此语句就对应着awr上Library Cache Activity区域---此区域可以判定是否是shared pool小导致,一般shared Pool Advisory也可反映出来这个问题
SELECT l.NAMESPACE,--library cache里相关对象
l.GETS,--相关对象library cache lock被申请的次数
(1 - l.GETHITRATIO) * 100 " get misses pct",--丢失,即未申请到local
l.GETHITS,--相关对象正好在library cache handle找到的次数
l.PINS,--这个对象上的被申请pin锁的次数
(1 - l.PINHITRATIO) * 100 "pins misses pct",--丢失,即未申请到latch
l.PINHITS,--pin对象的heap metadata正好在shared pool上的次数
l.RELOADS,--从object handle被创建建开始不是第一次PIN该对象的PIN ,且该次PIN要求对象从磁盘上读取加载的次数 ;Reloads值较高的情况 建议增大shared_pool_size
l.INVALIDATIONS--对象被修改导致该命名空间所包含对象被标记为无效的次数
FROM V$LIBRARYCACHE l;
某节点直接因ORA-4031错误hang死实例,
Errors in file /u01/app/oracle/diag/rdbms/querydb/querydb2/trace/querydb2_m000_21164.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KKSSP^2372","kglseshtSegs")
Wed Oct 08 21:44:11 2014
Process m000 died, see its trace file
Wed Oct 08 21:44:15 2014
Errors in file /u01/app/oracle/diag/rdbms/querydb/querydb2/trace/querydb2_smon_7226.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^e949e218","kglHeapInitialize:temp")
Wed Oct 08 21:44:31 2014
Errors in file /u01/app/oracle/diag/rdbms/querydb/querydb2/trace/querydb2_smon_7226.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","lock table sys.mon_mods$ in ...","SQLA","tmp")
Wed Oct 08 21:44:47 2014
Errors in file /u01/app/oracle/diag/rdbms/querydb/querydb2/trace/querydb2_smon_7226.trc:
ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^e949e218","kglHeapInitialize:temp")
---从上面的错误可以看到问题发生在shared pool上,而不是large pool 或java pool,调去awr报告来看
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 10031 04-Oct-14 20:00:18 457 8.7
End Snap: 10032 04-Oct-14 21:00:01 791 9.3
Elapsed: 59.72 (mins)
DB Time: 6,091.04 (mins)==>数据库相应时间超高
Cache Sizes Begin End
~~~~~~~~~~~ ---------- ----------
Buffer Cache: 40,832M 40,832M Std Block Size: 8K
Shared Pool Size: 4,224M 4,224M Log Buffer: 44,656K
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 102.0 12.7 0.14 0.18
DB CPU(s): 2.0 0.3 0.00 0.00
Redo size: 282,965.6 35,170.7
Logical reads: 31,776.1 3,949.6
Block changes: 1,168.5 145.2
Physical reads: 940.6 116.9
Physical writes: 186.2 23.2
User calls: 573.1 71.2
Parses: 213.4 26.5
Hard parses: 66.4 8.3==>同比和环比下,此时间段的hard parse太高
W/A MB processed: 4.4 0.6
Logons: 0.2 0.0
Executes: 717.9 89.2
Rollbacks: 0.0 0.0
Transactions: 8.1
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg
wait % DB
Event Waits Time(s) (ms) time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
cursor: pin S wait on X 34,869 204,419 5862 55.9 Concurrenc==>
SGA: allocation forcing compon 1,562,520 91,720 59 25.1 Other
library cache: mutex X 27,076 30,436 1124 8.3 Concurrenc
db file sequential read 1,821,755 8,849 5 2.4 User I/O
DB CPU 7,280 2.0
-> "Pct Misses" should be very low
Get Pct Pin Pct Invali-
Namespace Requests Miss Requests Miss Reloads dations
--------------- ------------ ------ -------------- ------ ---------- --------
ACCOUNT_STATUS 1,128 0.4 0 N/A 0 0
BODY 471 8.9 764 16.6 84 0
CLUSTER 6,040 0.2 6,112 0.2 0 0
DBLINK 1,195 0.0 0 N/A 0 0
EDITION 552 0.0 1,087 0.0 0 0
INDEX 216,927 24.3 216,480 24.7 789 0
LOCATION 84 17.9 84 17.9 0 0
OBJECT ID 136 100.0 0 N/A 0 0
QUEUE 6 33.3 1,009 0.5 3 0
RULESET 0 N/A 14 14.3 2 0
SCHEMA 5,826 0.0 0 N/A 0 0
SQL AREA 811,075 6.7 3,813,011 23.3 482,756 13,900==>shared pool 太小,从而pin对象reload很大,导致前面的硬解析很高,从而
SQL AREA BUILD 537,350 11.6 0 N/A 0 0
SQL AREA STATS 209,927 95.8 209,258 95.8 61 0
SUBSCRIPTION 0 N/A 84 26.2 22 0
TABLE/PROCEDURE 261,995 0.9 1,092,520 19.6 59,542 0
TRIGGER 3,526 42.7 3,526 42.8 4 0
-------------------------------------------------------------
-----通过查询发生ORA-04031时间段的cursor:pin S wait on X等待,发现有有2k多个
SELECT event, blocking_session, sql_id, COUNT(*)
FROM dba_hist_active_sess_history ash
WHERE sample_time >=
to_timestamp('2014-10-04 20:00:00', 'yyyy-mm-dd hh24:mi:ss')
AND sample_time <=
to_timestamp('2014-10-04 21:10:00', 'yyyy-mm-dd hh24:mi:ss')
AND event = 'cursor: pin S wait on X'
GROUP BY event, blocking_session, sql_id;
SELECT event, blocking_session, sql_id, COUNT(*)
FROM dba_hist_active_sess_history ash
WHERE sample_time >=
to_timestamp('2014-10-04 20:00:00', 'yyyy-mm-dd hh24:mi:ss')
AND sample_time <=
to_timestamp('2014-10-04 21:10:00', 'yyyy-mm-dd hh24:mi:ss')
AND ash.session_id = 4284
GROUP BY event, blocking_session, sql_id;
----原因就是:shared pool太小,正好此时间段是数据库的业务高峰期,从而当后面查询的对象信息在前面已经加载到了shared pool中,
由于太小,从而很快被刷出去,导致后面一次的访问,要重新做硬解析,这点从解析和reload次数看出;并发高,导致要以X模式pin住library cache object,
这就意味这个process acquires the mutex in exclusive mode即library cache latch以X模式,多个session是这样,从而产生cursor: pin S wait on X等待。
shared pool太小&&并发高===导致shared pool里的library cache共享游标被刷出去,产生大量硬解析==>很多session以library cache latch以X模式===>产生大量不同sql_id的cursor: pin S wait on X===>需要调整shared_pool_size
内存中shared pool结构及机制
Shared Pool当中主要包含了2部分:library cache和dictionary cache;
Library cache包含了共享SQL区(shared SQL areas),私有SQL区(private SQLareas,如果配置了共享服务器),PL/SQL存储过程以及包,还有一些控制信息,比如说locks以及library cache handles。
Dictionary cache包含了表,视图的依赖信息,比如表结构,它的用户,Oracle在解析SQL的时候就会频繁的访问dictionary cache。
library chache结构,是有多个hash bucket组成的hash table
多个hash bucket<====有很多的lch组成的lch list======>library cache handle<==lch里有大量对象指针,通过此指针,指到相关对象========>library cache handle<==========>....
lch指向的library cache object对象大概有
dependency table:指向本对象所依赖的对象。
child table:指向本对象的子对象,比如某个游标的子游标
authorization table:对象的授权信息。
type:shared cursor,index,table,cluster,view,synonym,sequence,procedure,function,package,table body,package body,trigger等等。
data blocks:data block是一个指针,指向了data heap,即存放真实数据的地方
相关锁发生场景:
library cache lock latch获得lock后,则就释放lock latch,必须先获得latch===>library cache lock发生在查找lch时====通过lch找lco时==>===先申请library cache pin latch,获得pin后,则就释放pin latch,必须先获得latch===>library cache pin发生在查找访问lco的heap上
Library Cache lock有3中模式:
Share(S):当读取一个library cache object的时候获得
Exclusive(X):当创建/修改一个library cache object的时候获得
Null(N):用来确保对象依赖性
Library Cache pin有2种模式:
Share(S):读取object heap
Exclusive(X):修改object heap
从Oracle10g开始,Oracle正在逐步用mutex取代library cache中的latch,
cursor:pin S和cursor:pin X相当于share和exclusive类型的library cache pin,
cursor:pin S wait on X则表示share方式正在等待exclusive锁定
详情相关信息,请参考oracle11g核心技术