http://www.dbsnake.net/hard-and-soft-parse-latch-analysis.html#comment-359
硬解析和软解析情况下latch持有情况分析
为了方便阐述,这篇文章里我用了如下一些术语:
1、用“library cache lock”代表名为library cache lock的enqueue;
2、用“library cache pin”代表名为library cache pin的enqueue;
3、用“Child library cache”代表名为library cache的子latch;
4、用“Child shared pool”代表名为shared pool的子latch;
5、用“Child library cache lock”代表名为library cache lock的子latch;
6、用“Child library cache pin”代表名为library cache pin的子latch;
你可能需要先阅读如下的三篇文章才能看懂我在说什么:
shared pool latch和library cache latch
library cache lock和library cache pin到底是什么
library cache lock和library cache pin到底是什么(续)
另外需要注意的是:从11gR1开始,Oracle用mutex替换了各种跟library cache相关的latch,所以这篇文章的结论并不适用于11g。
我们首先来看library cache的经典架构:
其中我们可以看到在每一个library cache object handle上都存在4个链表,它们分别是lock owners,lock waiters,Pin owners,Pin waiters。所以实际上library cache object handle你可以把它看成是library cache lock和library cache pin所保护的resource。而向lock owners和lock waiters里添加/修改一个library cache lock时是需要先持有Child library cache lock;向Pin owners和Pin waiters里添加/修改一个library cache pin时是需要先持有Child library cache pin。
无论是硬解析还是软解析,在扫描/修改library cache object handle链表的时候是需要先持有Child library cache;另外,硬解析还涉及到要在shared pool里分配内存,所以硬解析还会持有Child shared pool。
好了,我们来分6步进一步细化和扩展上述结论,这6步考察latch持有情况的思路均为:同时开3个session,session 1中先持有所有的相关子latch;session 2在session 1持有完所有的相关子latch后去执行硬解析/软解析,这个时候session 2一定会hang住;同时我们在session 3中去做session 2的process dump,从session 3的dump文件中即可看到session 2中的latch持有与等待的情况。
好了,我们开始。
第一步:先来证明硬解析会在持有Child library cache且并不释放的情况下转而去持有Child shared pool:
先分别到session 1、session 2、session 3里做一下准备工作:
Session 1:
SQL> select name,value from all_parameters where name=’_kghdsidx_count’;
NAME VALUE
——————————————– —————————————————————-
_kghdsidx_count 1
SQL> select name,level#,child#,addr,gets from v$latch_children where name like ‘library cache%’ or name=’shared pool’;
NAME LEVEL# CHILD# ADDR GETS
————————————————– ———- ———- ——– ———-
shared pool 7 1 04E5D308 783011
shared pool 7 2 04E5D370 26
shared pool 7 3 04E5D3D8 26
shared pool 7 4 04E5D440 26
shared pool 7 5 04E5D4A8 26
shared pool 7 6 04E5D510 26
shared pool 7 7 04E5D578 26
library cache 5 1 3213DE80 406403
library cache 5 2 3213DE18 322236
library cache 5 3 3213DDB0 96778
library cache lock 6 1 3213E0F0 107337
library cache lock 6 2 3213E088 86643
library cache lock 6 3 3213E020 28278
library cache pin 6 1 3213DFB8 204881
library cache pin 6 2 3213DF50 170659
library cache pin 6 3 3213DEE8 51722
library cache pin allocation 3 1 3213E228 561
library cache pin allocation 3 2 3213E1C0 38
library cache pin allocation 3 3 3213E158 206
library cache lock allocation 3 1 3213E360 722
library cache lock allocation 3 2 3213E2F8 46
library cache lock allocation 3 3 3213E290 229
library cache hash chains 9 1 3213E498 0
library cache hash chains 9 2 3213E430 0
library cache hash chains 9 3 3213E3C8 0
25 rows selected
从上述结果可以看到,Child library cache的level为5,Child library cache lock和Child library cache pin的level为6,Child shared pool的level为7,所以已经具备了在不释放Child library cache的情况下进而去持有Child shared pool的条件。
SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’shared pool’ and child#=1;
‘ORADEBUGPOKE0X’||ADDR||’40X00
————————————-
oradebug poke 0x04E5D308 4 0×00000001
因为要测硬解析,这里我flush一下shared pool:
SQL> alter system flush shared_pool;
系统已更改。
Session 2:
SQL> select sid from v$mystat where rownum=1;
SID
———-
137
SQL> select spid from v$process where addr in (select paddr from v$session where sid=138);
SPID
————
3952
Session 3:
SQL> oradebug setospid 3952
Oracle pid: 24, Windows thread id: 3952, image: ORACLE.EXE (SHAD)
准备工作已经做完,我们现在去session 1里手工持有上述Child shared pool:
Session 1:
SQL> oradebug setmypid
已处理的语句
SQL> oradebug poke 0x04E5D308 4 0×00000001
BEFORE: [4E5D308, 4E5D30C) = 00000000
AFTER: [4E5D308, 4E5D30C) = 00000001
然后到session 2中去随便执行一条sql,因为刚才在session 1里面flush了shared pool,所以这里一定是硬解析,并且这里一定会hang住:
Session 2:
SQL> select * from scott.emp;
……这里hang住了
接着我们到session 3中去对session 2做一下process dump:
Session 3:
SQL> oradebug dump processstate 10
已处理的语句
SQL> oradebug tracefile_name
d:\oracle\admin\cuihua\udump\cuihua_ora_3952.trc
d:\oracle\admin\cuihua\udump\cuihua_ora_3952.trc的内容:
Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_3952.trc
Sat May 05 01:13:57 2012
ORACLE V10.2.0.1.0 - Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows XP Version V5.1 Service Pack 2
CPU : 2 - type 586
Process Affinity : 0x00000000
Memory (Avail/Total): Ph:1005M/2030M, Ph+PgF:2333M/3921M, VA:1273M/2047M
Instance name: cuihua
Redo thread mounted by this instance: 1
Oracle process number: 24
Windows thread id: 3952, image: ORACLE.EXE (SHAD)
*** 2012-05-05 01:13:57.468
*** ACTION NAME:(Command Window - New) 2012-05-05 01:13:57.406
*** MODULE NAME:(PL/SQL Developer) 2012-05-05 01:13:57.406
*** SERVICE NAME:(cuihua) 2012-05-05 01:13:57.406
*** SESSION ID:(137.75) 2012-05-05 01:13:57.406
Received ORADEBUG command 'dump processstate 10' from process Windows thread id: 2924, image:
===================================================
PROCESS STATE
-------------
Process global information:
process: 33E4FB7C, call: 33F606A8, xact: 00000000, curses: 33F254F4, usrses: 33F254F4
----------------------------------------
SO: 33E4FB7C, type: 2, owner: 00000000, flag: INIT/-/-/0x00
(process) Oracle pid=24, calls cur/top: 33F606A8/33F606A8, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=20
Location from where call was made: kghupr1: Chunk Header
Context saved from call: 815704132
waiting for 4e5d308 Child shared pool level=7 child#=1
Location from where latch is held: kghupr1: Chunk Header
Context saved from call: 737738784
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
10 (33, 1336151634, 33)
20 (33, 1336151634, 33)
24 (15, 1336151634, 15)
waiter count=3
gotten 857154 times wait, failed first 316 sleeps 24
gotten 0 times nowait, failed: 0
on wait list for 4e5d308
holding (efd=5) 3213de18 Child library cache level=5 child#=2
Location from where latch is held: kglpndl: child: before processing: latch
Context saved from call: 1
state=busy, wlstate=free
Process Group: DEFAULT, pseudo proc: 33E7F368
O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3952
OSD pid info: Windows thread id: 3952, image: ORACLE.EXE (SHAD)
……省略显示其他内容
注意上述trace文件中有如下内容:
waiting for 4e5d308 Child shared pool level=7 child#=1
holding (efd=5) 3213de18 Child library cache level=5 child#=2
所以我们已经证明了硬解析会在持有Child library cache且并不释放的情况下转而去持有Child shared pool。
第二步:我们来证明硬解析会去持有Child library cache lock,但是是在释放了Child library cache的情况下,也就是说硬解析的时候不会同时持有Child library cache和Child library cache lock:
先在session 1里手工持有所有的Child library cache lock:
Session 1:
SQL> alter system flush shared_pool;
系统已更改。
SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’library cache lock’;
‘ORADEBUGPOKE0X’||ADDR||’40X00
————————————-
oradebug poke 0x3213E0F0 4 0×00000001
oradebug poke 0x3213E088 4 0×00000001
oradebug poke 0x3213E020 4 0×00000001
SQL> oradebug poke 0x3213E0F0 4 0×00000001
BEFORE: [3213E0F0, 3213E0F4) = 00000000
AFTER: [3213E0F0, 3213E0F4) = 00000001
SQL> oradebug poke 0x3213E088 4 0x00000001
BEFORE: [3213E088, 3213E08C) = 00000000
AFTER: [3213E088, 3213E08C) = 00000001
SQL> oradebug poke 0x3213E020 4 0x00000001
BEFORE: [3213E020, 3213E024) = 00000000
AFTER: [3213E020, 3213E024) = 00000001
然后到session 2中去随便执行一条sql,因为刚才在session 1里面flush了shared pool,所以这里一定是硬解析,并且这里一定会hang住:
Session 2:
SQL> select * from scott.emp;
……这里hang住了
接着我们到session 3中去对session 2做一下process dump:
Session 3:
SQL> oradebug dump processstate 10
已处理的语句
SQL> oradebug tracefile_name
d:\oracle\admin\cuihua\udump\cuihua_ora_3140.trc
d:\oracle\admin\cuihua\udump\cuihua_ora_3140.trc的内容:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows XP Version V5.1 Service Pack 2
CPU : 2 - type 586
Process Affinity : 0x00000000
Memory (Avail/Total): Ph:1142M/2030M, Ph+PgF:2355M/3921M, VA:1345M/2047M
Instance name: cuihua
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
Windows thread id: 3140, image: ORACLE.EXE (SHAD)
KCCDEBUG_LEVEL = 0
Dynamic strand is set to TRUE
Running with 2 shared and 18 private strand(s). Zero-copy redo is FALSE
Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_3140.trc
Sat May 05 01:26:29 2012
ORACLE V10.2.0.1.0 - Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows XP Version V5.1 Service Pack 2
CPU : 2 - type 586
Process Affinity : 0x00000000
Memory (Avail/Total): Ph:969M/2030M, Ph+PgF:2332M/3921M, VA:1267M/2047M
Instance name: cuihua
Redo thread mounted by this instance: 1
Oracle process number: 31
Windows thread id: 3140, image: ORACLE.EXE (SHAD)
*** 2012-05-05 01:26:29.609
*** ACTION NAME:(Command Window - New) 2012-05-05 01:26:29.562
*** MODULE NAME:(PL/SQL Developer) 2012-05-05 01:26:29.562
*** SERVICE NAME:(cuihua) 2012-05-05 01:26:29.562
*** SESSION ID:(153.27) 2012-05-05 01:26:29.562
Received ORADEBUG command 'dump processstate 10' from process Windows thread id: 2844, image:
===================================================
PROCESS STATE
-------------
Process global information:
process: 33E5250C, call: 31A8ACBC, xact: 00000000, curses: 33F38174, usrses: 33F38174
----------------------------------------
SO: 33E5250C, type: 2, owner: 00000000, flag: INIT/-/-/0x00
(process) Oracle pid=31, calls cur/top: 31A8ACBC/31A8ACBC, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kgllkdl: child: cleanup: latch
Context saved from call: 1
waiting for 3213e088 Child library cache lock level=6 child#=2
Location from where latch is held: kgllkdl: child: cleanup: latch
Context saved from call: 1
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
10 (15, 1336152387, 15)
31 (3, 1336152387, 3)
waiter count=2
gotten 135516 times wait, failed first 6 sleeps 5
gotten 0 times nowait, failed: 0
on wait list for 3213e088
Process Group: DEFAULT, pseudo proc: 33E7F368
O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3140
OSD pid info: Windows thread id: 3140, image: ORACLE.EXE (SHAD)
……省略显示其他内容
注意上述trace文件中有如下内容:
waiting for 3213e088 Child library cache lock level=6 child#=2
但注意这里没有holding Child library cache!所以我们已经证明了硬解析会持有Child library cache lock,但是是在释放了Child library cache的情况下,也就是说硬解析的时候不会同时持有Child library cache和Child library cache lock。
第三步:我们来证明硬解析会去持有Child library cache pin,但是是在释放了Child library cache的情况下,也就是说硬解析的时候不会同时持有Child library cache和Child library cache pin:
证明步骤和思路与第二步一模一样,这里不再赘述,我们直接来看相关的process dump的trace文件:
d:\oracle\admin\cuihua\udump\cuihua_ora_2444.trc的内容:
Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_2444.trc
Sat May 05 01:34:27 2012
ORACLE V10.2.0.1.0 – Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production
With the Partitioning, OLAP and Data Mining options
Windows XP Version V5.1 Service Pack 2
CPU : 2 – type 586
Process Affinity : 0×00000000
Memory (Avail/Total): Ph:963M/2030M, Ph+PgF:2334M/3921M, VA:1269M/2047M
Instance name: cuihua
Redo thread mounted by this instance: 1
Oracle process number: 30
Windows thread id: 2444, image: ORACLE.EXE (SHAD)
*** 2012-05-05 01:34:27.609
*** ACTION NAME:(Command Window – New) 2012-05-05 01:34:27.546
*** MODULE NAME:(PL/SQL Developer) 2012-05-05 01:34:27.546
*** SERVICE NAME:(cuihua) 2012-05-05 01:34:27.546
*** SESSION ID:(130.15) 2012-05-05 01:34:27.546
Received ORADEBUG command ‘dump processstate 10′ from process Windows thread id: 800, image:
===================================================
PROCESS STATE
————-
Process global information:
process: 33E51F1C, call: 317DD38C, xact: 00000000, curses: 33F1D17C, usrses: 33F1D17C
—————————————-
SO: 33E51F1C, type: 2, owner: 00000000, flag: INIT/-/-/0×00
(process) Oracle pid=30, calls cur/top: 317DD38C/317DD38C, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kglpnal: child: alloc space: latch
Context saved from call: 1
waiting for 3213df50 Child library cache pin level=6 child#=2
Location from where latch is held: kglpndl: child
Context saved from call: 1
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
10 (49, 1336152864, 49)
30 (24, 1336152864, 24)
waiter count=2
gotten 348654 times wait, failed first 2 sleeps 1
gotten 0 times nowait, failed: 0
on wait list for 3213df50
Process Group: DEFAULT, pseudo proc: 33E7F368
O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2444
OSD pid info: Windows thread id: 2444, image: ORACLE.EXE (SHAD)
……省略显示其他内容
注意上述trace文件中有如下内容:
waiting for 3213df50 Child library cache pin level=6 child#=2
但注意这里没有holding Child library cache!所以我们已经证明了硬解析会持有Child library cache pin,但是是在释放了Child library cache的情况下,也就是说硬解析的时候不会同时持有Child library cache和Child library cache pin。
第四步:我们来证明软解析也会去持有Child library cache,所以过度的软解析实际上也会导致Child library cache的争用:
先到session 1、session 2和session 3中去做一下准备工作:
Session 1:
SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’library cache’;
‘ORADEBUGPOKE0X’||ADDR||’40X00
————————————-
oradebug poke 0x3213DE80 4 0×00000001
oradebug poke 0x3213DE18 4 0×00000001
oradebug poke 0x3213DDB0 4 0×00000001
Session 2:
SQL> show parameter session_cache;
NAME TYPE VALUE
———————————— ———– ——————————
session_cached_cursors integer 20
这里session_cached_cursors的值大于0,所以这里执行同样的sql三次以上的时候才会是软软解析,如果同样的sql的执行次数大于一次但不超过三次,则是软解析。
SQL> select * from scott.emp;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
—– ———- ——— —– ———– ——— ——— ——
7981 CUIHUA 7981
7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7800 JAME4 CLERK 7698 1981-12-3 950.00 30
13 rows selected
注意这里子cursor和执行次数均为1:
SQL> select sql_text, SQL_ID, VERSION_COUNT,executions from v$sqlarea where sql_text like ‘select * from scott.emp%’;
SQL_TEXT SQL_ID VERSION_COUNT EXECUTIONS
——————————————————————————– ————- ————- ———-
select * from scott.emp dzk32w36udhc5 1 1
SQL> select PLAN_HASH_VALUE, CHILD_NUMBER from v$sql where sql_id=’dzk32w36udhc5′;
PLAN_HASH_VALUE CHILD_NUMBER
————— ————
3956160932 0
SQL> select sid from v$mystat where rownum=1;
SID
———-
134
SQL> select spid from v$process where addr in (select paddr from v$session where sid=134);
SPID
————
2740
Session 3:
SQL> oradebug setospid 2740
Oracle pid: 30, Windows thread id: 2740, image: ORACLE.EXE (SHAD)
接着我们回到session 1,去手工持有所有的Child library cache:
Session 1:
SQL> oradebug setmypid
已处理的语句
SQL> oradebug poke 0x3213DE80 4 0×00000001
BEFORE: [3213DE80, 3213DE84) = 00000000
AFTER: [3213DE80, 3213DE84) = 00000001
SQL> oradebug poke 0x3213DE18 4 0x00000001
BEFORE: [3213DE18, 3213DE1C) = 00000000
AFTER: [3213DE18, 3213DE1C) = 00000001
SQL> oradebug poke 0x3213DDB0 4 0x00000001
BEFORE: [3213DDB0, 3213DDB4) = 00000000
AFTER: [3213DDB0, 3213DDB4) = 00000001
接着到session 2去执行一下我们刚才已经在session 1中执行过的sql,这里也会hang住:
Session 2:
SQL> select * from scott.emp;
……这里hang住了
接着我们到session 3中去对session 2做一下process dump:
Session 3:
SQL> oradebug dump processstate 10
已处理的语句
SQL> oradebug tracefile_name
d:\oracle\admin\cuihua\udump\cuihua_ora_2740.trc
d:\oracle\admin\cuihua\udump\cuihua_ora_2740.trc的内容:
Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_2740.trc
Thu Jan 28 09:08:16 2010
ORACLE V10.2.0.1.0 - Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options
Windows XP Version V5.1 Service Pack 2
CPU : 2 - type 586
Process Affinity : 0x00000000
Memory (Avail/Total): Ph:1387M/2030M, Ph+PgF:2882M/3921M, VA:1272M/2047M
Instance name: cuihua
Redo thread mounted by this instance: 1
Oracle process number: 24
Windows thread id: 2740, image: ORACLE.EXE (SHAD)
*** SERVICE NAME:(cuihua) 2010-01-28 09:08:16.375
*** SESSION ID:(140.4) 2010-01-28 09:08:16.375
Memory Notification: Library Cache Object loaded into SGA
Heap size 2210K exceeds notification threshold (2048K)
LIBRARY OBJECT HANDLE: handle=33872c70 mutex=33872D24(0)
name=XDB.XDbD/PLZ01TcHgNAgAIIegtw==
hash=e0f82b0c545a707da230a62675d34c80 timestamp=08-30-2005 14:28:09
namespace=XDBS flags=KGHP/TIM/SML/[02000000]
kkkk-dddd-llll=0000-0000-0000 lock=S pin=S latch#=2 hpc=0002 hlc=0002
lwt=33872CCC[33872CCC,33872CCC] ltm=33872CD4[33872CD4,33872CD4]
pwt=33872CB0[33872CB0,33872CB0] ptm=33872CB8[33872CB8,33872CB8]
ref=33872CEC[33872CEC,33872CEC] lnd=33872CF8[33872CF8,33872CF8]
LIBRARY OBJECT: object=303c0ba0
type=XDBS flags=EXS/LOC[0005] pflags=[0000] status=VALD load=X
DATA BLOCKS:
data# heap pointer status pins change whr alloc(K) size(K)
—– ——– ——– ——— —- —— — ——– ——–
0 33872c00 303c0c5c I/P/A/-/- 0 NONE 00 0.34 0.00
1 303c0c98 3038daa8 I/P/A/-/- 1 NONE 00 2210.98 2224.10
Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_2740.trc
Sat May 05 08:48:42 2012
ORACLE V10.2.0.1.0 – Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production
With the Partitioning, OLAP and Data Mining options
Windows XP Version V5.1 Service Pack 2
CPU : 2 – type 586
Process Affinity : 0×00000000
Memory (Avail/Total): Ph:1290M/2030M, Ph+PgF:2663M/3921M, VA:1268M/2047M
Instance name: cuihua
Redo thread mounted by this instance: 1
Oracle process number: 30
Windows thread id: 2740, image: ORACLE.EXE (SHAD)
*** 2012-05-05 08:48:42.781
*** ACTION NAME:(Command Window – New) 2012-05-05 08:48:42.687
*** MODULE NAME:(PL/SQL Developer) 2012-05-05 08:48:42.687
*** SERVICE NAME:(cuihua) 2012-05-05 08:48:42.687
*** SESSION ID:(134.7) 2012-05-05 08:48:42.687
Received ORADEBUG command ‘dump processstate 10′ from process Windows thread id: 400, image:
===================================================
PROCESS STATE
————-
Process global information:
process: 33E51F1C, call: 317C80A0, xact: 00000000, curses: 33F21C9C, usrses: 33F21C9C
—————————————-
SO: 33E51F1C, type: 2, owner: 00000000, flag: INIT/-/-/0×00
(process) Oracle pid=30, calls cur/top: 317C80A0/317C80A0, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kglpin: child: heap processing: latch
Context saved from call: 1
waiting for 3213de18 Child library cache level=5 child#=2
Location from where latch is held: kglpndl: child: after processing: latch
Context saved from call: 1
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
30 (39, 1336178921, 39)
waiter count=1
gotten 179887 times wait, failed first 15 sleeps 1
gotten 206 times nowait, failed: 0
on wait list for 3213de18
Process Group: DEFAULT, pseudo proc: 33E7F368
O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2740
OSD pid info: Windows thread id: 2740, image: ORACLE.EXE (SHAD)
……省略显示其他内容
注意上述trace文件中有如下内容:
waiting for 3213de18 Child library cache level=5 child#=2
所以我们已经证明了软解析也会去持有Child library cache(这个其实很正常,因为软解析还是要扫描library cache object handle链表的嘛),即过度的软解析实际上也会导致Child library cache的争用。
手工释放上述Child library cache后回到session 2,刚才hang住的sql已恢复正常:
SQL> select * from scott.emp;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
—– ———- ——— —– ———– ——— ——— ——
7981 CUIHUA 7981
7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7800 JAME4 CLERK 7698 1981-12-3 950.00 30
13 rows selected
从如下结果可以看到,子cursor的数量还是1,执行次数变成了2,即刚才第二次执行的sql毫无疑问是软解析:
SQL> select sql_text, SQL_ID, VERSION_COUNT,executions from v$sqlarea where sql_text like ‘select * from scott.emp%’;
SQL_TEXT SQL_ID VERSION_COUNT EXECUTIONS
——————————————————————————– ————- ————- ———-
select * from scott.emp dzk32w36udhc5 1 2
第五步:我们来证明软解析还是会去持有Child library cache lock,但是是在释放了Child library cache的情况下,也就是说软解析会去持有Child library cache lock,但不会同时持有Child library cache和Child library cache lock:
证明步骤和思路与第二步一模一样,这里不再赘述,我们直接来看相关的process dump的trace文件:
d:\oracle\admin\cuihua\udump\cuihua_ora_2988.trc的内容:
Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_2988.trc
Sat May 05 01:50:36 2012
ORACLE V10.2.0.1.0 – Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production
With the Partitioning, OLAP and Data Mining options
Windows XP Version V5.1 Service Pack 2
CPU : 2 – type 586
Process Affinity : 0×00000000
Memory (Avail/Total): Ph:1031M/2030M, Ph+PgF:2330M/3921M, VA:1269M/2047M
Instance name: cuihua
Redo thread mounted by this instance: 1
Oracle process number: 24
Windows thread id: 2988, image: ORACLE.EXE (SHAD)
*** 2012-05-05 01:50:36.531
*** ACTION NAME:(Command Window – New) 2012-05-05 01:50:36.468
*** MODULE NAME:(PL/SQL Developer) 2012-05-05 01:50:36.468
*** SERVICE NAME:(cuihua) 2012-05-05 01:50:36.468
*** SESSION ID:(132.36) 2012-05-05 01:50:36.468
Received ORADEBUG command ‘dump processstate 10′ from process Windows thread id: 972, image:
===================================================
PROCESS STATE
————-
Process global information:
process: 33E4FB7C, call: 33F61154, xact: 00000000, curses: 33F1F70C, usrses: 33F1F70C
—————————————-
SO: 33E4FB7C, type: 2, owner: 00000000, flag: INIT/-/-/0×00
(process) Oracle pid=24, calls cur/top: 33F61154/33F61154, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Location from where call was made: kgllkdl: child: cleanup: latch
Context saved from call: 2
waiting for 3213e0f0 Child library cache lock level=6 child#=1
Location from where latch is held: kgllkdl: child: cleanup: latch
Context saved from call: 2
state=busy, wlstate=free
waiters [orapid (seconds since: put on list, posted, alive check)]:
25 (123, 1336153833, 123)
27 (123, 1336153833, 123)
11 (120, 1336153833, 120)
20 (114, 1336153833, 114)
19 (111, 1336153833, 111)
24 (39, 1336153833, 39)
waiter count=6
gotten 261770 times wait, failed first 74 sleeps 22
gotten 0 times nowait, failed: 0
on wait list for 3213e0f0
Process Group: DEFAULT, pseudo proc: 33E7F368
O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2988
OSD pid info: Windows thread id: 2988, image: ORACLE.EXE (SHAD)
……省略显示其他内容
注意上述trace文件中有如下内容:
waiting for 3213e0f0 Child library cache lock level=6 child#=1
但注意这里没有holding Child library cache!所以我们已经证明了软解析依然会持有Child library cache lock,但是是在释放了Child library cache的情况下,也就是说软解析会去持有Child library cache lock,但不会同时持有Child library cache和Child library cache lock。
第六步:我们来证明软解析还是会去持有Child library cache pin,但是是在释放了Child library cache的情况下,也就是说软解析会去持有Child library cache pin,但不会同时持有Child library cache和Child library cache pin:
证明步骤和思路与第五步一模一样,这里不再赘述。
结合我文章开头提到的那三篇文章(尤其是大熊的那篇文章)和上面的六步证明过程,我们得出如下关于硬解析和软解析时latch持有情况的结论(这个结论是近似的,并没有包括所有的情况):
1、SQL在硬解析的时候,会先去持有Child library cache以扫描library cache object handle链表,接着会释放Child library cache(因为找不到相关的library cache object handle),接下来会再次持有Child library cache(因为要修改library cache object handle链表),然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存,接着释放Child shared pool(因为已经分配完了),接着再释放Child library cache;然后因为要以null模式获得library cache lock,所以要先持有Child library cache lock;接着要把相关sql的执行计划load到对应的子cursor的library cache object handle中,即要以exclusive模式获得library cache pin,所以要先持有Child library cache pin;
2、SQL在软解析的时候和硬解析类似,只不过不需要持有Child shared pool并且持有Child library cache的时间会更短。
最后我们来说一下硬解析和软解析在latch争用情况下的本质区别:
1、软解析不会导致Child shared pool的争用;
2、硬解析跟软解析比会导致更为剧烈的Child library cache的争用,这个当然是因为硬解析在不释放Child library cache的情况下进而会去持有Child shared pool。
测试的真仔细!
关于硬分析持有的latch,我看大部分的书都是这样描述:
硬分析应该是先持有SQL handle的library cache lock ,然后在生成计划的过程中,对SQLhandle的LCO以exclusive 方式持有library cache pin.
但是楼主的说法,好像恰好反过来了,请楼主能否进一步解释下2者持有的顺序,lock和pin的对象分别是针对的对象是谁?
我后一篇文章会解释为什么是library cache pin在前,library cache lock在后。
ok,等待您的大作!
这个你是对的,应该是先持有library cache lock latch,再持有library cache pin latch,我已经改过来了。
测试非常详细,看完后加深了对软解析、硬解析的理解。
你好,goog job!
我在重现以下过程:
第三步:我们来证明硬解析会去持有Child library cache pin,但是是在释放了Child library cache的情况下,也就是说硬解析的时候不会同时持有Child library cache和Child library cache pin
时出现了一个问题,向你请教。
环境:windows 32
oracle:10.2.0.1
准备信息如下:
session1:
SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’library cache pin’;
‘ORADEBUGPOKE0X’||ADDR||’40X00000001′
————————————————
oradebug poke 0x240BDF50 4 0×00000001
oradebug poke 0x240BDEE8 4 0×00000001
oradebug poke 0x240BDE80 4 0×00000001
SQL> alter system flush shared_pool;
系统已更改。
SQL> oradebug setmypid
已处理的语句
session2的准备信息如下:
SQL> select sid from v$mystat where rownum=1;
SID
———-
148
SQL> select spid from v$process where addr in (select paddr from v$session where sid=148);
SPID
————————
4072
session3准备信息如下:
SQL> oradebug setospid 4072
Oracle pid: 19, Windows thread id: 4072, image: ORACLE.EXE (SHAD)
接着
在session1中poke了library cache pin
SQL> oradebug poke 0x240BDF50 4 0×00000001
BEFORE: [240BDF50, 240BDF54) = 00000000
AFTER: [240BDF50, 240BDF54) = 00000001
SQL> oradebug poke 0x240BDEE8 4 0×00000001
BEFORE: [240BDEE8, 240BDEEC) = 00000000
AFTER: [240BDEE8, 240BDEEC) = 00000001
SQL> oradebug poke 0x240BDE80 4 0×00000001
BEFORE: [240BDE80, 240BDE84) = 00000000
AFTER: [240BDE80, 240BDE84) = 00000001
session2中执行以下查询,被hang住:
SQL> select * from scott.emp;
select * from scott.emp
hang住
session3中dump process信息,此时出现了如下错误:
SQL> oradebug dump processstate 10
ORA-00072: 进程 “Windows thread id: 4072, image: ORACLE.EXE (SHAD)” 不活动
同时,发现session2中有如下提示错误:
select * from scott.emp
*
第 1 行出现错误:
ORA-03113: 通信通道的文件结束
请问这是如何导致的,经过测试library cache lock的对的,但这个library cache pin的测试做了很多次,一直都是上面的错误信息,是我测试的哪一步出了问题呢?
还是同样的测试,刚特意在centos 5.7 64bit, oracle 10.2.0.1 64bit的环境上进行,就没有这个问题,也就是说在linux上的测试结果和你的是一样的,可是在windows上,试了很多次,一直提示如上的错误,问题可能出在什么地方呢?