某个后台批量作业发起数据抽取操作后,在数据库里引起了上百个session都在等待名为” latch: row cache objects”的Event,当时在11.2.0.3数据库里观察到的session等待情况如下:
SID | username | program | event | sql_id | sql_text | status |
10985 | OB60 | JefOrm@QZ-ZGWEB08 | latch: row cache objects | dzy6cxkhps9c3 | select t.* from CD.CO_PROD_6 t where t.OBJECT_ID=:1 and t.PRODUCT_ID=:2 order by t.EXPIRE_DATE DESC | ACTIVE |
20285 | OB60 | JefOrm@JQ-ZGWEB04 | latch: row cache objects | 1d88gc8q9nnk9 | select t.* from BD.RS_TYPE_DEF t where t.ITEM_ID=:1 | ACTIVE |
50711 | OB60 | JefOrm@QZ-ZGWEB08 | latch: row cache objects | a82ycaftxuh04 | select t.* from PD.PM_PRODUCT_OFFERING t where t.PRODUCT_OFFERING_ID=:1 | ACTIVE |
36973 | OB60 | JefOrm@JQ-ZGAPP03 | latch: row cache objects | a82ycaftxuh04 | select t.* from PD.PM_PRODUCT_OFFERING t where t.PRODUCT_OFFERING_ID=:1 | ACTIVE |
59797 | OB60 | JefOrm@JQ-ZGAPP03 | latch: row cache objects | 8a2429zxyrwy9 | select t.* from AD.CA_ACCOUNT_RES_5 t where t.RESOURCE_ID=:1 and t.RELATIONSHIP_TYPE=:2 and t.EXPIRE_DATE>:3 | ACTIVE |
493 | OB60 | JefOrm@JQ-ZGAPP03 | latch: row cache objects | a82ycaftxuh04 | select t.* from PD.PM_PRODUCT_OFFERING t where t.PRODUCT_OFFERING_ID=:1 | ACTIVE |
293 | OB60 | JefOrm@JQ-ZGAPP03 | latch: row cache objects | 3zfcg18fj1th2 | select t.* from CD.CM_RES_IDENTITY_7 t where t.IDENTITY=:1 and t.IDENTITY_TYPE=:2 and t.EXPIRE_DATE>:3 | ACTIVE |
24615 | OB60 | JefOrm@JQ-ZGAPP03 | latch: row cache objects | fmd3bbtuxuh2q | select t.* from pd.PM_QUERY_ITEM_GROUP_DTL t where t.ITEM_ID=:1 | ACTIVE |
41373 | OB60 | JefOrm@JQ-ZGAPP03 | latch: row cache objects | c5qv3qs8w2yy7 | select t.* from AD.CA_ACCOUNT_RES_11 t where t.RESOURCE_ID=:1 and t.RELATIONSHIP_TYPE=:2 and t.EXPIRE_DATE>:3 | ACTIVE |
50499 | OB60 | JefOrm@JQ-ZGWEB08 | latch: row cache objects | 6mv3w3qds1sjh | select t.* from CD.CM_RES_IDENTITY_2 t where t.RESOURCE_ID=:1 and t.IDENTITY_TYPE=:2 and t.IDENTITY_ATTR=:3 and t.EXPIRE_DATE>:4 | ACTIVE |
39551 | OB60 | JefOrm@JQ-ZGWEB04 | latch: row cache objects | 57ns85nach42x | select t.* from CD.CM_RESOURCE_12 t where t.RESOURCE_ID=:1 and t.EXPIRE_DATE>:2 | ACTIVE |
49653 | OB60 | JefOrm@JQ-ZGWEB08 | latch: row cache objects | 1d88gc8q9nnk9 | select t.* from BD.RS_TYPE_DEF t where t.ITEM_ID=:1 | ACTIVE |
61373 | OB60 | JefOrm@JQ-ZGAPP03 | latch: row cache objects | 2sndsvy46b5ay | select t.* from CD.CM_RES_IDENTITY_6 t where t.IDENTITY=:1 and t.IDENTITY_TYPE=:2 and t.EXPIRE_DATE>:3 | ACTIVE |
。。。。。。篇幅关系没有列全,总共有100多个Session都在wait “latch: row cache objects”
断断续续一共持续了将近1小时,latch: row cache objects等待才彻底消失。
Row cache object的争用主要出现在语句解析时需要到data dictionary cache里对操作对象的数据字典信息进行访问以获取执行权限,对象属性、直方图等信息。事后对这个问题的做了如下分析:
相应时段的AWR报告如下:
(1) 每秒解析次数达到了3154次,每秒执行次数达到了15673次,正常情况下这个库的两个值分别为1000和9000左右,因为发起了批量任务致使parses和executions飙升
硬解析时间只占了总解析时间的1%,只占了0.68%的DB_TIME
(2) 因为大部分时间都在wait “latch:row cache object”,所以真正用于parse的cpu资源只占了1.89%
(3) “latch:row cache object”当仁不让的成为了Top 1
(4) Breakdown “latch:row cache object” event,我们找到”Dictionary Cache Stats”部分,列出了所有的DC对象,其中dc_users的请求量最大
(5) 在”Latch Sleep Breakdown”里可以看到row cache object的sleeps,spin gets均很高,misses变成了负值,估计是太大的缘故
相应时段的ASH报告如下:
我们可以将P1值转换成16进制,验证了dc_users是Data dictionary里的主要争用对象
SQL> Select to_char(504403275245182512,'xxxxxxxxxxxxxxxxxxx') from dual;
TO_CHAR(504403275245
--------------------
700001b3c885e30
SELECT kqrsttxt PARAMETER, kqrstcid CACHE#, kqrstcln "Lchild#", kqrstgrq "DCGets", l.gets "LGets", l.misses "Misses" FROM X$KQRST, V$LATCH_CHILDREN l WHERE l.addr like '%700001B3C885E30%' and l.child#=KQRSTCLN ORDER BY 1,2;
PARAMETER Lchild# DCGets LGets Misses
-------------------------------- ---------- ---------- ---------- ----------
dc_users 8 3053974768 4290466375 3872920205
dc_users 8 0 4290466375 3872920205
dc_users 8 4067920485 4290466375 3872920205
dc_users 8 117165043 4290466375 3872920205
有了上述信息后我们逐一排除了以下可能
1、 是否硬解析引起
从AWR报告里可以看出hard parses只占了总解析次数的1%,且涉事的SQL语句都采用了绑定变量,并且使用select * from v$sql_shared_cursor where sql_id=XX查出的结果都是N,说明SQL解析已经最大程度上实现了共享
2、 是否Shared pool设置过小
select * from v$sga_resize_ops order by start_time desc并没有发现系统自动加大shared pool的记录
3、 是否与个别session或者语句有关
从v$active_session_history里获取到如下信息
SESSION_ID | PROGRAM | BLOCKING_SESSION | EVENT | TO_CHAR(SAMPLE_TIME,'YYYYMMDDH |
7444 | JDBC Thin Client |
| latch: row cache objects | 20150813 22:21:19 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 24236 | latch: row cache objects | 20150813 22:30:18 |
7444 | JDBC Thin Client |
| latch: row cache objects | 20150813 22:30:52 |
18648 | JefOrm@QZ-ZGWEB04 |
| latch: row cache objects | 20150813 22:34:34 |
2640 | JefOrm@QZ-ZGAPP03 | 21647 | latch: row cache objects | 20150813 22:36:38 |
3206 | JefOrm@QZ-ZGWEB08 |
| latch: row cache objects | 20150813 22:36:38 |
2029 | JefOrm@QZ-ZGWEB04 | 8615 | latch: row cache objects | 20150813 22:44:14 |
6651 | JefOrm@JQ-ZGWEB08 | 32 | latch: row cache objects | 20150813 22:45:00 |
2848 | JefOrm@JQ-ZGWEB08 | 21403 | latch: row cache objects | 20150813 22:45:19 |
4650 | JefOrm@JQ-ZGWEB04 | 13240 | latch: row cache objects | 20150813 22:47:24 |
21642 | JefOrm@QZ-ZWAPP04 | 13841 | latch: row cache objects | 20150813 22:47:54 |
19220 | JefOrm@JQ-ZGAPP03 | 16853 | latch: row cache objects | 20150813 22:48:00 |
2246 | JefOrm@QZ-ZGAPP03 | 12211 | latch: row cache objects | 20150813 22:48:12 |
1233 | JefOrm@QZ-ZGWEB08 | 4223 | latch: row cache objects | 20150813 22:49:01 |
16405 | JefOrm@JQ-ZGWEB04 |
| latch: row cache objects | 20150813 22:49:41 |
8615 | JefOrm@QZ-ZGWEB04 | 23025 | latch: row cache objects | 20150813 22:54:00 |
9624 | JefOrm@JQ-ZGWEB04 | 11602 | latch: row cache objects | 20150813 22:54:00 |
21233 | JefOrm@QZ-ZGWEB08 |
| latch: row cache objects | 20150813 22:54:07 |
21642 | JefOrm@QZ-ZWAPP04 | 19252 | latch: row cache objects | 20150813 22:54:07 |
21812 | JefOrm@QZ-ZGAPP03 | 23008 | latch: row cache objects | 20150813 22:54:07 |
22841 | JefOrm@QZ-ZGWEB08 | 16405 | latch: row cache objects | 20150813 22:54:07 |
23611 | JefOrm@QZ-ZGAPP03 | 19410 | latch: row cache objects | 20150813 22:54:07 |
16238 | JefOrm@QZ-ZGWEB08 |
| latch: row cache objects | 20150813 22:54:07 |
17033 | JefOrm@QZ-ZGWEB04 | 16238 | latch: row cache objects | 20150813 22:54:07 |
再针对其中某个session进行查询:
SESSION_ID | PROGRAM | BLOCKING_SESSION | EVENT | TO_CHAR(SAMPLE_TIME,'YYYYMMDDH |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 24236 | latch: row cache objects | 20150813 22:30:18 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 7055 | latch: row cache objects | 20150813 22:54:07 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 13656 | latch: row cache objects | 20150813 22:54:15 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) |
| latch: row cache objects | 20150813 22:54:21 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) |
| latch: row cache objects | 20150813 22:54:22 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 2029 | latch: row cache objects | 20150813 22:54:24 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 2229 | latch: row cache objects | 20150813 22:54:26 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 9624 | latch: row cache objects | 20150813 22:54:27 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 6806 | latch: row cache objects | 20150813 22:54:28 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) |
| latch: row cache objects | 20150813 22:54:29 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 16216 | latch: row cache objects | 20150813 22:54:30 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) |
| latch: row cache objects | 20150813 22:54:32 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 14032 | latch: row cache objects | 20150813 22:54:33 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 22841 | latch: row cache objects | 20150813 22:54:34 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 7031 | latch: row cache objects | 20150813 22:54:35 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 11026 | latch: row cache objects | 20150813 22:54:38 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) |
| latch: row cache objects | 20150813 22:54:39 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) |
| latch: row cache objects | 20150813 22:54:40 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) |
| latch: row cache objects | 20150813 22:54:41 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) |
| latch: row cache objects | 20150813 22:54:43 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 1003 | latch: row cache objects | 20150813 22:54:44 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) |
| latch: row cache objects | 20150813 22:54:46 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) |
| latch: row cache objects | 20150813 22:54:47 |
17844 | ftpclient@QZ-ZGAPP04 (TNS V1-V3) | 3225 | latch: row cache objects | 20150813 22:54:49 |
以上查询表眀” latch: row cache objects”不是由某个或某几个session所引起的,所以需要从系统层面去分析
从第一部分摘出的SQL Text发现应用程序使用ob60这个用户连接数据库,但是操作的却都是其它schema下的表,例如select t.* from BD.RS_TYPE_DEF t where t.ITEM_ID=:1这条语句能解析的前提是ob60用户对BD用户下的表具有select权限,这个权限可以通过对象级别的赋权(grant select on XXX to ob60),亦可以通过赋予ob60 用户select any table这一系统权限来实现。现实中ob60确实被赋予了SELECT ANY TABLE权限
SQL> select * from dba_sys_privs where grantee='OB60' and privilege='SELECT ANY TABLE';
GRANTEE PRIVILEGE ADM
------------------------------ ---------------------------------------- ---
OB60 SELECT ANY TABLE NO
MOS 1639293.1里对这个问题进行了较为详细的解析,使用select any table系统权限的情况下,每次执行sql时都要到dc里判断一下用户是否有权限执行,select any table权限无法缓冲到shared pool里实现共享和复用。建议的解决方法是采用对象级别的赋权,或者减少parses次数。前一种方法可行性较差,成千上万张表需要一张一张的赋权,如果遇到新增的表还要另外赋权。大多数情况我们只能避免第二种情况,在我的case里就是因为有批量作业的运行使解析次数大幅上升所以才能比较明显的观察到” latch: row cache objects”等待,正常负荷的情况下” ”latch: row cache objects”等待及其短暂一般很难被观察到。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/53956/viewspace-1783814/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/53956/viewspace-1783814/