东南亚印尼,系统A已上线好几年,数据量较小,一直运行正常,4月22号同事反应
系统处在hang的状态,一启动,就有好多session . 系统基本不可用。
OS:suse10
DBMS:11.1.0.7
ps -ef|grep ora|wc -l 发现进程数一直在增长
运行sql
-- 查找当前会话信息
select t3.sql_id, t3.sql_text, t1.SID, t1.SERIAL#, t1.STATUS, t1.PROGRAM, t1.LOGON_TIME, t1.EVENT, t1.SECONDS_IN_WAIT, t2.SPID
from v$session t1,
v$process t2,
(select sql_id, sql_text
from v$sql
group by sql_id, sql_text) t3
where t1.paddr = t2.addr
and t1.status = 'ACTIVE'
and t1.sql_id = t3.sql_id(+)
order by sql_text;
select t3.sql_id, t3.sql_text, t1.SID, t1.SERIAL#, t1.STATUS, t1.PROGRAM, t1.LOGON_TIME, t1.EVENT, t1.SECONDS_IN_WAIT, t2.SPID
from v$session t1,
v$process t2,
(select sql_id, sql_text
from v$sql
group by sql_id, sql_text) t3
where t1.paddr = t2.addr
and t1.status = 'ACTIVE'
and t1.sql_id = t3.sql_id(+)
order by sql_text;
发现大量的cursor: pin S wait on X等待事件
74c1hdswksgyt | select id,name from V_NE_NETYPE | 502 | 2 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:16 | single-task message |
gts5km94cpd3f | select id,name from v_ne_netype t | 515 | 2 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:16 | cursor: pin S wait on X |
gts5km94cpd3f | select id,name from v_ne_netype t | 506 | 2 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:16 | single-task message |
gts5km94cpd3f | select id,name from v_ne_netype t | 523 | 2 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:15 | cursor: pin S wait on X |
6ua7g9w10tnrj | select id,name from v_ne_netype where id='20' | 520 | 2 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:16 | cursor: pin S wait on X |
6ua7g9w10tnrj | select id,name from v_ne_netype where id='20' | 531 | 2 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:15 | cursor: pin S wait on X |
6ua7g9w10tnrj | select id,name from v_ne_netype where id='20' | 530 | 2 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:15 | cursor: pin S wait on X |
6ua7g9w10tnrj | select id,name from v_ne_netype where id='20' | 526 | 4 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:15 | cursor: pin S wait on X |
6ua7g9w10tnrj | select id,name from v_ne_netype where id='20' | 525 | 2 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:15 | cursor: pin S wait on X |
6ua7g9w10tnrj | select id,name from v_ne_netype where id='20' | 18 | 16 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:16 | cursor: pin S wait on X |
6ua7g9w10tnrj | select id,name from v_ne_netype where id='20' | 508 | 2 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:16 | cursor: pin S wait on X |
6ua7g9w10tnrj | select id,name from v_ne_netype where id='20' | 505 | 2 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:16 | cursor: pin S wait on X |
6ua7g9w10tnrj | select id,name from v_ne_netype where id='20' | 495 | 2 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:16 | single-task message |
cb8d7tmyz0gau | select id,name from v_ne_netype where id='30' | 524 | 2 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:15 | single-task message |
a7k5va9c6x21n | select id,name from v_ne_netype where 1=1 | 513 | 7 | ACTIVE | JDBC Thin Client | 2013/4/22 17:15:16 | cursor: pin S wait on X |
cursor: pin S wait on X 这是这周来第二次遇到此事件,上次是因为系统bug
cursor: pin S A session waits on this event when it wants to update a shared mutex pin and another session
is currently
in
the process of updating a shared mutex pin
for
the same cursor object. This wait event should
rarely be seen because a shared mutex pin update is very fast.(Wait Time: Microseconds)
引起这个等待事件有几个原因
1. 硬解析太高
2. shared_pool 大小
3. bug
第一个原因,系统已经运好几年了,虽然很多没有绑定变量,但也不会这么严重
调大了shared_pool
alter system set shared_pool_size=2g scope=both;
依然没有好转
出了awr 报告
在
SQL ordered by Parse Calls
里发现
有一个这个sql
begin :id := sys.dbms_transaction.local_transaction_id; end;
被解析1400多次。
这个处理分布式事务的一个application ,咨询同事有无分布式数据库的使用,比如dblink, 被告知只有webservice 没有dblink.(其实是有的,这里误导了我)
再回到开始在等待的sql里找出一
select id,name from v_ne_netype where id='20' 直接hang住, 不执行。
再问同事,v_ne_netype 是由一个dblink 的视图。 此dblink 指向系统B
此时的B 通过plsql 已不能正常联接。
原因找到
登陆B 系统,查看其监听,命令响应很慢。不能关闭,也不能重启。
日志也没有。 kill 掉监听进程,重启监听。 B系统正常。 A系统也正常。
原因:
B系统因为一些原因,监听异常,不能正确解析sql ,引发的联锁反应。
附张印尼照片