遭遇Cursor:pin s wait x and Ora-04031

从昨天晚上开始某核心库过十几分钟就会报出Cursor:pin s wait x的告警短信,而且很频繁,第二天到公司查了下原因:

首先收集了下相应时间ASH报告和AWR报告,显示的信息如下:

Top 5 Timed Events 
Event                  Waits Time(s)  Avg Wait(ms) %    Total Call Time    Wait Class 
cursor: pin S wait on X     1,091,         611 10,688 10      45.1           Concurrency 
row cache lock           95,543         6,924 72           29.2           Concurrency 
收集ASH信息,间隔为15分钟,看到如下信息:
Event Event               Class       % Activity Avg    Active Sessions 
cursor: pin S wait on X     Concurrency      55.94          85.03 
row cache lock           Concurrency      34.39          52.28
library cache load lock     Concurrency      2.73           4.15

可以看到Active Sessions的数量相当高,在当时堵塞的session达到85。

继续查看:

select sample_id,session_id,
       to_char(SAMPLE_TIME,'YY-MM-DD:HH:MI:SS') samp_time,
       sql_id,
       session_state,
       event,
       p1,
       p1text,
       p2,
       p2text,
       p3,
       p3text,
       wait_time,
       time_waited
from v$active_session_history  --DBA_HIST_ACTIVE_SESS_HISTORY
  where to_char(SAMPLE_TIME,'YY-MM-DD:HH:MI:SS') between
 '12-05-16:07:00:00' and '12-05-16:08:00:00' and event like'cursor%' and p1=2057858921
  order by sample_id desc;


查看blocking sid:

SELECT decode(trunc(24386824306688/4294967296),
           0,trunc(24386824306688/65536),
	     trunc(24386824306688/4294967296)) SID_HOLDING_MUTEX
 FROM dual;
SID_HOLDING_MUTEX
--------------------------------
5678

查找阻塞此session的sid为:5678,那么看看session 5678究竟在做些什么。

select sample_id,session_id,   
       sql_id,
       session_state,
       event,
       p1,       
       p2,
       p3,
       wait_time,
       time_waited
from DBA_HIST_ACTIVE_SESS_HISTORY
  where to_char(SAMPLE_TIME,'YY-MM-DD:HH:MI:SS') between
 '12-05-16:07:00:00' and '12-05-16:08:00:00'  and session_id=5678
  order by sample_id desc;
SESSION_ID  SQL_ID       STATE       EVENT        P1    P2   P3    WAIT_TIME  TIME_WAITED
---------- ------------- ------------- --------------- ----- ----- ----  ---------- ----
    5678   g3176qdxahvv9 WAITING       row cache lock   8     0    3      0          0
此session也在等待,只不过等待的事件是:row cache lock。
select cache#,parameter,type,count,gets,getmisses from v$rowcache where cache#=8;
 
  CACHE# PARAMETER             TYPE           COUNT    GETS     GETMISSES
-------- -------------------- ----------- ---------- ---------- ----------
     8   dc_objects            PARENT          1410  -76918095    1851821
     8   dc_object_grants      SUBORDINATE     91     840478      47918
Oracle文档对dc_objects的解释:DC_OBJECTS 
Look for any object compilation activity which might require an exclusive lock and thus block online activity. 

现在看来是大量的解析导致的,难道是有什么作业在定时的做某种DDL操作,导致相关对象上的游标失效,于是,抓了个session,查看相关sql的对象,发现并没有做ddl操作。那为什么会间隔的导致如此多的硬解析呢。

此时,当我执行此SQL:select s.sid, s.username,s.sql_id,s.module,s.osuser,s.event, s.blocking_session,

s.seconds_in_wait,s.wait_time,w.p1,w.p2,w.p3

from v$session s,v$session_wait w,v$sql q

where s.sid=w.sid

and s.statein ('WAITING')

and s.sql_id=q.sql_id

and s.wait_class !='Idle'; 

数据库抛出了

ORA-04031: unable to allocate 370048 bytes of sharedmemory ("shared pool","unknown object",""sgaheap(4,0),"hng: All sessions data for API.")

ORA-0431,这个很经典的错误,导致此错误的原因也有很多,我们就不多说,看看我的错误的参数部分,sga heap(4,0),说明在第四个subpool里找不到370048bytes,那继续来看看数据库的共享池配置及使用情况:

select  a.ksppinm  "Parameter",
        b.ksppstvl "Session Value",
        c.ksppstvl "Instance Value"
from x$ksppi a, x$ksppcv b, x$ksppsv c
where a.indx = b.indx and a.indx = c.indx
and a.ksppinm in('_kghdsidx_count','__shared_pool_size','_shared_pool_reserved_pct');
Parameter                     Session Value   Instance Value
---------------------------- --------------- -----------------
_kghdsidx_count                7                 7  
_shared_pool_size              6442450944        6442450944
_shared_pool_reserved_pct      5                 5
系统有6G的共享池,7个subpool,另外我们的sga_target=0.
SQL> select
  2    kghlurcr,
  3    kghlutrn,
  4    kghlufsh,
  5    kghluops,
  6    kghlunfu,
  7    kghlunfs
  8  from
  9    sys.x$kghlu
 10  where
 11    inst_id = userenv('Instance')
 12  /
 RECURRENT  TRANSIENT   FLUSHED   PINS AND    ORA-4031  LAST ERROR
    CHUNKS     CHUNKS  CHUNKS   RELEASES      ERRORS       SIZE
----------       ----------   ----------   ----------         ---------    ----------
       852        865    1367514  286598314       1280     370048
      1932      36615     597116  245240992        578     370024
      6663       7618    2208370   70999571       2037     370000
       907        960    1207376   64136373       1057     369976
       892       1141    1749793  138707112        701     369952
      1060       1297    1795660   52162243       1598     369928
      1593       2149    1295995  176694298       1386     369904

7 rows selected.
可以看到last error size为370048,正好和我们的报错吻合,那么,再来看看各个Subpool的空间使用情况:


再来看看reserved shared pool信息:

SQL> select used_space, requests, request_misses, last_miss_size
  2  from v$shared_pool_reserved;
REM Look at the breakdown of chunks for more detail on fragmentation.
                                          Reserved
        Reserved            Total             Area          Size of
            Used         Requests           Misses        Last Miss
---------------- ---------------- ---------------- ----------------
     232,613,024          623,742            9,324          370,048
select free_space,avg_free_size,used_space,avg_used_size,request_failures,last_failure_size
from v$shared_pool_reserved;
FREE_SPACE AVG_FREE_SIZE USED_SPACE AVG_USED_SIZE REQUEST_FAILURES LAST_FAILURE_SIZE
---------- ------------- ---------- ------------- ---------------- -----------------
  89753440    56734.1593  232613024     147037.31             9713            370048

参数:

_shared_pool_reserved_min_alloc     4400                      4400

可以看到request_failures>0并且last_failure_size > _shared_pool_reserved_min_alloc,name ORA-04031错误就可能是因为共享池保留空间缺少连续空间所致。要解决这个问题,可以考虑加大_shared_pool_reserved_min_allock来降低缓冲区共享池保留空间的对象数目,并增大shared_pool_reserved_size来加大共享池保留空间的可用内存。当抛出了ORA-04031,当抛出此错误的同时,此核心库的Cursor:pin s wait x再次出现,并且大量导致session堵塞,经过查看相关的监控作业,发现有个作业会定期执行一条sql(select count(*) from gv$session;).但是,为什么只有在查询v$session的时候才会引发呢?而且,这里只是一个查询信息,为什么会出现wait x?

从一开始这个问题就走入了误区,大量session的cursor:pin s wait x其实只是表象,并不是真正的问题原因。那么,有这么一种可能: Oracle会了帮助诊断4031错误,出现这个错误的时候可能会做一次SGA的dump,这个过程可能会长时间或频繁地持有一些latch,这样会导致SQL解析出现问题。但是,我发现4031生成的trace文件很小,只有0.2M,也不好确定。于是尝试修改参数,在测试一下:

Alter system set “_4031_dump_bitvec”=o scope=memory

再次执行v$session的相关查询,查看结果,还是有大量的cursor:Pin s wait x,row cache lock.正如文章开头的信息,导致cursor:pins wait x的事件是row cache lock,那具体的是dc_objects,dc_objects_grant,所以猜测查询v$session的时候,有row cache一直没有成功装入,引起rowcache lock,影响了其他SQL的解析,但是因为系统是核心库,我尝试dump systemstat,结果CPU迅速飙升到80%,我也只好终止。所以,这个问题最后还是修改session_cached_cursor为200,重启数据库。

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值