（转载）硬解析和软解析情况下latch持有情况分析

最新推荐文章于 2021-05-04 13:08:36 发布

dd334566788566

最新推荐文章于 2021-05-04 13:08:36 发布

阅读量182

点赞数

文章标签：数据库数据结构与算法

原文链接：http://www.cnblogs.com/taowang2016/archive/2013/02/25/2932069.html

版权

http://www.dbsnake.net/hard-and-soft-parse-latch-analysis.html#comment-359

硬解析和软解析情况下latch持有情况分析

Posted: May 5, 2012 | Author: Cui Hua | Filed under: Oracle | 8 Comments »

为了方便阐述，这篇文章里我用了如下一些术语：

1、用“library cache lock”代表名为library cache lock的enqueue；

2、用“library cache pin”代表名为library cache pin的enqueue；

3、用“Child library cache”代表名为library cache的子latch；

4、用“Child shared pool”代表名为shared pool的子latch；

5、用“Child library cache lock”代表名为library cache lock的子latch；

6、用“Child library cache pin”代表名为library cache pin的子latch；

你可能需要先阅读如下的三篇文章才能看懂我在说什么：

shared pool latch和library cache latch

library cache lock和library cache pin到底是什么

library cache lock和library cache pin到底是什么（续）

另外需要注意的是：从11gR1开始，Oracle用mutex替换了各种跟library cache相关的latch，所以这篇文章的结论并不适用于11g。

我们首先来看library cache的经典架构：

library_cache_architecture

其中我们可以看到在每一个library cache object handle上都存在4个链表，它们分别是lock owners，lock waiters，Pin owners，Pin waiters。所以实际上library cache object handle你可以把它看成是library cache lock和library cache pin所保护的resource。而向lock owners和lock waiters里添加/修改一个library cache lock时是需要先持有Child library cache lock；向Pin owners和Pin waiters里添加/修改一个library cache pin时是需要先持有Child library cache pin。

无论是硬解析还是软解析，在扫描/修改library cache object handle链表的时候是需要先持有Child library cache；另外，硬解析还涉及到要在shared pool里分配内存，所以硬解析还会持有Child shared pool。

好了，我们来分6步进一步细化和扩展上述结论，这6步考察latch持有情况的思路均为：同时开3个session，session 1中先持有所有的相关子latch；session 2在session 1持有完所有的相关子latch后去执行硬解析/软解析，这个时候session 2一定会hang住；同时我们在session 3中去做session 2的process dump，从session 3的dump文件中即可看到session 2中的latch持有与等待的情况。

好了，我们开始。

第一步：先来证明硬解析会在持有Child library cache且并不释放的情况下转而去持有Child shared pool：

先分别到session 1、session 2、session 3里做一下准备工作：

Session 1:

SQL> select name,value from all_parameters where name=’_kghdsidx_count’;

NAME VALUE

——————————————– —————————————————————-

_kghdsidx_count 1

SQL> select name,level#,child#,addr,gets from v$latch_children where name like ‘library cache%’ or name=’shared pool’;

NAME LEVEL# CHILD# ADDR GETS

————————————————– ———- ———- ——– ———-

shared pool 7 1 04E5D308 783011

shared pool 7 2 04E5D370 26

shared pool 7 3 04E5D3D8 26

shared pool 7 4 04E5D440 26

shared pool 7 5 04E5D4A8 26

shared pool 7 6 04E5D510 26

shared pool 7 7 04E5D578 26

library cache 5 1 3213DE80 406403

library cache 5 2 3213DE18 322236

library cache 5 3 3213DDB0 96778

library cache lock 6 1 3213E0F0 107337

library cache lock 6 2 3213E088 86643

library cache lock 6 3 3213E020 28278

library cache pin 6 1 3213DFB8 204881

library cache pin 6 2 3213DF50 170659

library cache pin 6 3 3213DEE8 51722

library cache pin allocation 3 1 3213E228 561

library cache pin allocation 3 2 3213E1C0 38

library cache pin allocation 3 3 3213E158 206

library cache lock allocation 3 1 3213E360 722

library cache lock allocation 3 2 3213E2F8 46

library cache lock allocation 3 3 3213E290 229

library cache hash chains 9 1 3213E498 0

library cache hash chains 9 2 3213E430 0

library cache hash chains 9 3 3213E3C8 0

25 rows selected

从上述结果可以看到，Child library cache的level为5，Child library cache lock和Child library cache pin的level为6，Child shared pool的level为7，所以已经具备了在不释放Child library cache的情况下进而去持有Child shared pool的条件。

SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’shared pool’ and child#=1;

‘ORADEBUGPOKE0X’||ADDR||’40X00

————————————-

oradebug poke 0x04E5D308 4 0×00000001

因为要测硬解析，这里我flush一下shared pool：

SQL> alter system flush shared_pool;

系统已更改。

Session 2:

SQL> select sid from v$mystat where rownum=1;

SID

———-

137

SQL> select spid from v$process where addr in (select paddr from v$session where sid=138);

SPID

————

3952

Session 3:

SQL> oradebug setospid 3952

Oracle pid: 24, Windows thread id: 3952, image: ORACLE.EXE (SHAD)

准备工作已经做完，我们现在去session 1里手工持有上述Child shared pool：

Session 1:

SQL> oradebug setmypid

已处理的语句

SQL> oradebug poke 0x04E5D308 4 0×00000001

BEFORE: [4E5D308, 4E5D30C) = 00000000

AFTER: [4E5D308, 4E5D30C) = 00000001

然后到session 2中去随便执行一条sql，因为刚才在session 1里面flush了shared pool，所以这里一定是硬解析，并且这里一定会hang住：

Session 2：

SQL> select * from scott.emp;

……这里hang住了

接着我们到session 3中去对session 2做一下process dump：

Session 3:

SQL> oradebug dump processstate 10

已处理的语句

SQL> oradebug tracefile_name

d:\oracle\admin\cuihua\udump\cuihua_ora_3952.trc

d:\oracle\admin\cuihua\udump\cuihua_ora_3952.trc的内容：

Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_3952.trc

Sat May 05 01:13:57 2012

ORACLE V10.2.0.1.0 - Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU : 2 - type 586

Process Affinity : 0x00000000

Memory (Avail/Total): Ph:1005M/2030M, Ph+PgF:2333M/3921M, VA:1273M/2047M

Instance name: cuihua

Redo thread mounted by this instance: 1

Oracle process number: 24

Windows thread id: 3952, image: ORACLE.EXE (SHAD)

*** 2012-05-05 01:13:57.468

*** ACTION NAME:(Command Window - New) 2012-05-05 01:13:57.406

*** MODULE NAME:(PL/SQL Developer) 2012-05-05 01:13:57.406

*** SERVICE NAME:(cuihua) 2012-05-05 01:13:57.406

*** SESSION ID:(137.75) 2012-05-05 01:13:57.406

Received ORADEBUG command 'dump processstate 10' from process Windows thread id: 2924, image:

===================================================

PROCESS STATE

-------------

Process global information:

process: 33E4FB7C, call: 33F606A8, xact: 00000000, curses: 33F254F4, usrses: 33F254F4

----------------------------------------

SO: 33E4FB7C, type: 2, owner: 00000000, flag: INIT/-/-/0x00

(process) Oracle pid=24, calls cur/top: 33F606A8/33F606A8, flag: (0) -

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 0 0 0

last post received-location: No post

last process to post me: none

last post sent: 0 0 0

last post sent-location: No post

last process posted by me: none

(latch info) wait_event=0 bits=20

Location from where call was made: kghupr1: Chunk Header

Context saved from call: 815704132

waiting for 4e5d308 Child shared pool level=7 child#=1

Location from where latch is held: kghupr1: Chunk Header

Context saved from call: 737738784

state=busy, wlstate=free

waiters [orapid (seconds since: put on list, posted, alive check)]:

10 (33, 1336151634, 33)

20 (33, 1336151634, 33)

24 (15, 1336151634, 15)

waiter count=3

gotten 857154 times wait, failed first 316 sleeps 24

gotten 0 times nowait, failed: 0

on wait list for 4e5d308

holding (efd=5) 3213de18 Child library cache level=5 child#=2

Location from where latch is held: kglpndl: child: before processing: latch

Context saved from call: 1

state=busy, wlstate=free

Process Group: DEFAULT, pseudo proc: 33E7F368

O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3952

OSD pid info: Windows thread id: 3952, image: ORACLE.EXE (SHAD)

……省略显示其他内容

注意上述trace文件中有如下内容：

waiting for 4e5d308 Child shared pool level=7 child#=1

holding (efd=5) 3213de18 Child library cache level=5 child#=2

所以我们已经证明了硬解析会在持有Child library cache且并不释放的情况下转而去持有Child shared pool。

第二步：我们来证明硬解析会去持有Child library cache lock，但是是在释放了Child library cache的情况下，也就是说硬解析的时候不会同时持有Child library cache和Child library cache lock：

先在session 1里手工持有所有的Child library cache lock：

Session 1:

SQL> alter system flush shared_pool;

系统已更改。

SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’library cache lock’;

‘ORADEBUGPOKE0X’||ADDR||’40X00

————————————-

oradebug poke 0x3213E0F0 4 0×00000001

oradebug poke 0x3213E088 4 0×00000001

oradebug poke 0x3213E020 4 0×00000001

SQL> oradebug poke 0x3213E0F0 4 0×00000001

BEFORE: [3213E0F0, 3213E0F4) = 00000000

AFTER: [3213E0F0, 3213E0F4) = 00000001

SQL> oradebug poke 0x3213E088 4 0x00000001

BEFORE: [3213E088, 3213E08C) = 00000000

AFTER: [3213E088, 3213E08C) = 00000001

SQL> oradebug poke 0x3213E020 4 0x00000001

BEFORE: [3213E020, 3213E024) = 00000000

AFTER: [3213E020, 3213E024) = 00000001

然后到session 2中去随便执行一条sql，因为刚才在session 1里面flush了shared pool，所以这里一定是硬解析，并且这里一定会hang住：

Session 2:

SQL> select * from scott.emp;

……这里hang住了

接着我们到session 3中去对session 2做一下process dump：

Session 3:

SQL> oradebug dump processstate 10

已处理的语句

SQL> oradebug tracefile_name

d:\oracle\admin\cuihua\udump\cuihua_ora_3140.trc

d:\oracle\admin\cuihua\udump\cuihua_ora_3140.trc的内容：

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU : 2 - type 586

Process Affinity : 0x00000000

Memory (Avail/Total): Ph:1142M/2030M, Ph+PgF:2355M/3921M, VA:1345M/2047M

Instance name: cuihua

Redo thread mounted by this instance: 0 <none>

Oracle process number: 0

Windows thread id: 3140, image: ORACLE.EXE (SHAD)

KCCDEBUG_LEVEL = 0

Dynamic strand is set to TRUE

Running with 2 shared and 18 private strand(s). Zero-copy redo is FALSE

Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_3140.trc

Sat May 05 01:26:29 2012

ORACLE V10.2.0.1.0 - Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU : 2 - type 586

Process Affinity : 0x00000000

Memory (Avail/Total): Ph:969M/2030M, Ph+PgF:2332M/3921M, VA:1267M/2047M

Instance name: cuihua

Redo thread mounted by this instance: 1

Oracle process number: 31

Windows thread id: 3140, image: ORACLE.EXE (SHAD)

*** 2012-05-05 01:26:29.609

*** ACTION NAME:(Command Window - New) 2012-05-05 01:26:29.562

*** MODULE NAME:(PL/SQL Developer) 2012-05-05 01:26:29.562

*** SERVICE NAME:(cuihua) 2012-05-05 01:26:29.562

*** SESSION ID:(153.27) 2012-05-05 01:26:29.562

Received ORADEBUG command 'dump processstate 10' from process Windows thread id: 2844, image:

===================================================

PROCESS STATE

-------------

Process global information:

process: 33E5250C, call: 31A8ACBC, xact: 00000000, curses: 33F38174, usrses: 33F38174

----------------------------------------

SO: 33E5250C, type: 2, owner: 00000000, flag: INIT/-/-/0x00

(process) Oracle pid=31, calls cur/top: 31A8ACBC/31A8ACBC, flag: (0) -

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 0 0 0

last post received-location: No post

last process to post me: none

last post sent: 0 0 0

last post sent-location: No post

last process posted by me: none

(latch info) wait_event=0 bits=0

Location from where call was made: kgllkdl: child: cleanup: latch

Context saved from call: 1

waiting for 3213e088 Child library cache lock level=6 child#=2

Location from where latch is held: kgllkdl: child: cleanup: latch

Context saved from call: 1

state=busy, wlstate=free

waiters [orapid (seconds since: put on list, posted, alive check)]:

10 (15, 1336152387, 15)

31 (3, 1336152387, 3)

waiter count=2

gotten 135516 times wait, failed first 6 sleeps 5

gotten 0 times nowait, failed: 0

on wait list for 3213e088

Process Group: DEFAULT, pseudo proc: 33E7F368

O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 3140

OSD pid info: Windows thread id: 3140, image: ORACLE.EXE (SHAD)

……省略显示其他内容

注意上述trace文件中有如下内容：

waiting for 3213e088 Child library cache lock level=6 child#=2

但注意这里没有holding Child library cache！所以我们已经证明了硬解析会持有Child library cache lock，但是是在释放了Child library cache的情况下，也就是说硬解析的时候不会同时持有Child library cache和Child library cache lock。

第三步：我们来证明硬解析会去持有Child library cache pin，但是是在释放了Child library cache的情况下，也就是说硬解析的时候不会同时持有Child library cache和Child library cache pin：

证明步骤和思路与第二步一模一样，这里不再赘述，我们直接来看相关的process dump的trace文件：

d:\oracle\admin\cuihua\udump\cuihua_ora_2444.trc的内容：

Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_2444.trc

Sat May 05 01:34:27 2012

ORACLE V10.2.0.1.0 – Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU : 2 – type 586

Process Affinity : 0×00000000

Memory (Avail/Total): Ph:963M/2030M, Ph+PgF:2334M/3921M, VA:1269M/2047M

Instance name: cuihua

Redo thread mounted by this instance: 1

Oracle process number: 30

Windows thread id: 2444, image: ORACLE.EXE (SHAD)

*** 2012-05-05 01:34:27.609

*** ACTION NAME:(Command Window – New) 2012-05-05 01:34:27.546

*** MODULE NAME:(PL/SQL Developer) 2012-05-05 01:34:27.546

*** SERVICE NAME:(cuihua) 2012-05-05 01:34:27.546

*** SESSION ID:(130.15) 2012-05-05 01:34:27.546

Received ORADEBUG command ‘dump processstate 10′ from process Windows thread id: 800, image:

===================================================

PROCESS STATE

————-

Process global information:

process: 33E51F1C, call: 317DD38C, xact: 00000000, curses: 33F1D17C, usrses: 33F1D17C

—————————————-

SO: 33E51F1C, type: 2, owner: 00000000, flag: INIT/-/-/0×00

(process) Oracle pid=30, calls cur/top: 317DD38C/317DD38C, flag: (0) -

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 0 0 0

last post received-location: No post

last process to post me: none

last post sent: 0 0 0

last post sent-location: No post

last process posted by me: none

(latch info) wait_event=0 bits=0

Location from where call was made: kglpnal: child: alloc space: latch

Context saved from call: 1

waiting for 3213df50 Child library cache pin level=6 child#=2

Location from where latch is held: kglpndl: child

Context saved from call: 1

state=busy, wlstate=free

waiters [orapid (seconds since: put on list, posted, alive check)]:

10 (49, 1336152864, 49)

30 (24, 1336152864, 24)

waiter count=2

gotten 348654 times wait, failed first 2 sleeps 1

gotten 0 times nowait, failed: 0

on wait list for 3213df50

Process Group: DEFAULT, pseudo proc: 33E7F368

O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2444

OSD pid info: Windows thread id: 2444, image: ORACLE.EXE (SHAD)

……省略显示其他内容

注意上述trace文件中有如下内容：

waiting for 3213df50 Child library cache pin level=6 child#=2

但注意这里没有holding Child library cache！所以我们已经证明了硬解析会持有Child library cache pin，但是是在释放了Child library cache的情况下，也就是说硬解析的时候不会同时持有Child library cache和Child library cache pin。

第四步：我们来证明软解析也会去持有Child library cache，所以过度的软解析实际上也会导致Child library cache的争用：

先到session 1、session 2和session 3中去做一下准备工作：

Session 1:

SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’library cache’;

‘ORADEBUGPOKE0X’||ADDR||’40X00

————————————-

oradebug poke 0x3213DE80 4 0×00000001

oradebug poke 0x3213DE18 4 0×00000001

oradebug poke 0x3213DDB0 4 0×00000001

Session 2:

SQL> show parameter session_cache;

NAME TYPE VALUE

———————————— ———– ——————————

session_cached_cursors integer 20

这里session_cached_cursors的值大于0，所以这里执行同样的sql三次以上的时候才会是软软解析，如果同样的sql的执行次数大于一次但不超过三次，则是软解析。

SQL> select * from scott.emp;

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO

—– ———- ——— —– ———– ——— ——— ——

7981 CUIHUA 7981

7369 SMITH CLERK 7902 1980-12-17 800.00 20

7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30

7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30

7566 JONES MANAGER 7839 1981-4-2 2975.00 20

7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30

7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30

7782 CLARK MANAGER 7839 1981-6-9 2450.00 10

7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20

7839 KING PRESIDENT 1981-11-17 5000.00 10

7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30

7876 ADAMS CLERK 7788 1987-5-23 1100.00 20

7800 JAME4 CLERK 7698 1981-12-3 950.00 30

13 rows selected

注意这里子cursor和执行次数均为1：

SQL> select sql_text, SQL_ID, VERSION_COUNT,executions from v$sqlarea where sql_text like ‘select * from scott.emp%’;

SQL_TEXT SQL_ID VERSION_COUNT EXECUTIONS

——————————————————————————– ————- ————- ———-

select * from scott.emp dzk32w36udhc5 1 1

SQL> select PLAN_HASH_VALUE, CHILD_NUMBER from v$sql where sql_id=’dzk32w36udhc5′;

PLAN_HASH_VALUE CHILD_NUMBER

————— ————

3956160932 0

SQL> select sid from v$mystat where rownum=1;

SID

———-

134

SQL> select spid from v$process where addr in (select paddr from v$session where sid=134);

SPID

————

2740

Session 3:

SQL> oradebug setospid 2740

Oracle pid: 30, Windows thread id: 2740, image: ORACLE.EXE (SHAD)

接着我们回到session 1，去手工持有所有的Child library cache：

Session 1:

SQL> oradebug setmypid

已处理的语句

SQL> oradebug poke 0x3213DE80 4 0×00000001

BEFORE: [3213DE80, 3213DE84) = 00000000

AFTER: [3213DE80, 3213DE84) = 00000001

SQL> oradebug poke 0x3213DE18 4 0x00000001

BEFORE: [3213DE18, 3213DE1C) = 00000000

AFTER: [3213DE18, 3213DE1C) = 00000001

SQL> oradebug poke 0x3213DDB0 4 0x00000001

BEFORE: [3213DDB0, 3213DDB4) = 00000000

AFTER: [3213DDB0, 3213DDB4) = 00000001

接着到session 2去执行一下我们刚才已经在session 1中执行过的sql，这里也会hang住：

Session 2:

SQL> select * from scott.emp;

……这里hang住了

接着我们到session 3中去对session 2做一下process dump：

Session 3:

SQL> oradebug dump processstate 10

已处理的语句

SQL> oradebug tracefile_name

d:\oracle\admin\cuihua\udump\cuihua_ora_2740.trc

d:\oracle\admin\cuihua\udump\cuihua_ora_2740.trc的内容：

Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_2740.trc

Thu Jan 28 09:08:16 2010

ORACLE V10.2.0.1.0 - Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU : 2 - type 586

Process Affinity : 0x00000000

Memory (Avail/Total): Ph:1387M/2030M, Ph+PgF:2882M/3921M, VA:1272M/2047M

Instance name: cuihua

Redo thread mounted by this instance: 1

Oracle process number: 24

Windows thread id: 2740, image: ORACLE.EXE (SHAD)

*** SERVICE NAME:(cuihua) 2010-01-28 09:08:16.375

*** SESSION ID:(140.4) 2010-01-28 09:08:16.375

Memory Notification: Library Cache Object loaded into SGA

Heap size 2210K exceeds notification threshold (2048K)

LIBRARY OBJECT HANDLE: handle=33872c70 mutex=33872D24(0)

name=XDB.XDbD/PLZ01TcHgNAgAIIegtw==

hash=e0f82b0c545a707da230a62675d34c80 timestamp=08-30-2005 14:28:09

namespace=XDBS flags=KGHP/TIM/SML/[02000000]

kkkk-dddd-llll=0000-0000-0000 lock=S pin=S latch#=2 hpc=0002 hlc=0002

lwt=33872CCC[33872CCC,33872CCC] ltm=33872CD4[33872CD4,33872CD4]

pwt=33872CB0[33872CB0,33872CB0] ptm=33872CB8[33872CB8,33872CB8]

ref=33872CEC[33872CEC,33872CEC] lnd=33872CF8[33872CF8,33872CF8]

LIBRARY OBJECT: object=303c0ba0

type=XDBS flags=EXS/LOC[0005] pflags=[0000] status=VALD load=X

DATA BLOCKS:

data# heap pointer status pins change whr alloc(K) size(K)

—– ——– ——– ——— —- —— — ——– ——–

0 33872c00 303c0c5c I/P/A/-/- 0 NONE 00 0.34 0.00

1 303c0c98 3038daa8 I/P/A/-/- 1 NONE 00 2210.98 2224.10

Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_2740.trc

Sat May 05 08:48:42 2012

ORACLE V10.2.0.1.0 – Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU : 2 – type 586

Process Affinity : 0×00000000

Memory (Avail/Total): Ph:1290M/2030M, Ph+PgF:2663M/3921M, VA:1268M/2047M

Instance name: cuihua

Redo thread mounted by this instance: 1

Oracle process number: 30

Windows thread id: 2740, image: ORACLE.EXE (SHAD)

*** 2012-05-05 08:48:42.781

*** ACTION NAME:(Command Window – New) 2012-05-05 08:48:42.687

*** MODULE NAME:(PL/SQL Developer) 2012-05-05 08:48:42.687

*** SERVICE NAME:(cuihua) 2012-05-05 08:48:42.687

*** SESSION ID:(134.7) 2012-05-05 08:48:42.687

Received ORADEBUG command ‘dump processstate 10′ from process Windows thread id: 400, image:

===================================================

PROCESS STATE

————-

Process global information:

process: 33E51F1C, call: 317C80A0, xact: 00000000, curses: 33F21C9C, usrses: 33F21C9C

—————————————-

SO: 33E51F1C, type: 2, owner: 00000000, flag: INIT/-/-/0×00

(process) Oracle pid=30, calls cur/top: 317C80A0/317C80A0, flag: (0) -

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 0 0 0

last post received-location: No post

last process to post me: none

last post sent: 0 0 0

last post sent-location: No post

last process posted by me: none

(latch info) wait_event=0 bits=0

Location from where call was made: kglpin: child: heap processing: latch

Context saved from call: 1

waiting for 3213de18 Child library cache level=5 child#=2

Location from where latch is held: kglpndl: child: after processing: latch

Context saved from call: 1

state=busy, wlstate=free

waiters [orapid (seconds since: put on list, posted, alive check)]:

30 (39, 1336178921, 39)

waiter count=1

gotten 179887 times wait, failed first 15 sleeps 1

gotten 206 times nowait, failed: 0

on wait list for 3213de18

Process Group: DEFAULT, pseudo proc: 33E7F368

O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2740

OSD pid info: Windows thread id: 2740, image: ORACLE.EXE (SHAD)

……省略显示其他内容

注意上述trace文件中有如下内容：

waiting for 3213de18 Child library cache level=5 child#=2

所以我们已经证明了软解析也会去持有Child library cache（这个其实很正常，因为软解析还是要扫描library cache object handle链表的嘛），即过度的软解析实际上也会导致Child library cache的争用。

手工释放上述Child library cache后回到session 2，刚才hang住的sql已恢复正常:

SQL> select * from scott.emp;

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO

—– ———- ——— —– ———– ——— ——— ——

7981 CUIHUA 7981

7369 SMITH CLERK 7902 1980-12-17 800.00 20

7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30

7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30

7566 JONES MANAGER 7839 1981-4-2 2975.00 20

7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30

7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30

7782 CLARK MANAGER 7839 1981-6-9 2450.00 10

7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20

7839 KING PRESIDENT 1981-11-17 5000.00 10

7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30

7876 ADAMS CLERK 7788 1987-5-23 1100.00 20

7800 JAME4 CLERK 7698 1981-12-3 950.00 30

13 rows selected

从如下结果可以看到，子cursor的数量还是1，执行次数变成了2，即刚才第二次执行的sql毫无疑问是软解析：

SQL> select sql_text, SQL_ID, VERSION_COUNT,executions from v$sqlarea where sql_text like ‘select * from scott.emp%’;

SQL_TEXT SQL_ID VERSION_COUNT EXECUTIONS

——————————————————————————– ————- ————- ———-

select * from scott.emp dzk32w36udhc5 1 2

第五步：我们来证明软解析还是会去持有Child library cache lock，但是是在释放了Child library cache的情况下，也就是说软解析会去持有Child library cache lock，但不会同时持有Child library cache和Child library cache lock：

证明步骤和思路与第二步一模一样，这里不再赘述，我们直接来看相关的process dump的trace文件：

d:\oracle\admin\cuihua\udump\cuihua_ora_2988.trc的内容：

Dump file d:\oracle\admin\cuihua\udump\cuihua_ora_2988.trc

Sat May 05 01:50:36 2012

ORACLE V10.2.0.1.0 – Production vsnsta=0

vsnsql=14 vsnxtr=3

Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production

With the Partitioning, OLAP and Data Mining options

Windows XP Version V5.1 Service Pack 2

CPU : 2 – type 586

Process Affinity : 0×00000000

Memory (Avail/Total): Ph:1031M/2030M, Ph+PgF:2330M/3921M, VA:1269M/2047M

Instance name: cuihua

Redo thread mounted by this instance: 1

Oracle process number: 24

Windows thread id: 2988, image: ORACLE.EXE (SHAD)

*** 2012-05-05 01:50:36.531

*** ACTION NAME:(Command Window – New) 2012-05-05 01:50:36.468

*** MODULE NAME:(PL/SQL Developer) 2012-05-05 01:50:36.468

*** SERVICE NAME:(cuihua) 2012-05-05 01:50:36.468

*** SESSION ID:(132.36) 2012-05-05 01:50:36.468

Received ORADEBUG command ‘dump processstate 10′ from process Windows thread id: 972, image:

===================================================

PROCESS STATE

————-

Process global information:

process: 33E4FB7C, call: 33F61154, xact: 00000000, curses: 33F1F70C, usrses: 33F1F70C

—————————————-

SO: 33E4FB7C, type: 2, owner: 00000000, flag: INIT/-/-/0×00

(process) Oracle pid=24, calls cur/top: 33F61154/33F61154, flag: (0) -

int error: 0, call error: 0, sess error: 0, txn error 0

(post info) last post received: 0 0 0

last post received-location: No post

last process to post me: none

last post sent: 0 0 0

last post sent-location: No post

last process posted by me: none

(latch info) wait_event=0 bits=0

Location from where call was made: kgllkdl: child: cleanup: latch

Context saved from call: 2

waiting for 3213e0f0 Child library cache lock level=6 child#=1

Location from where latch is held: kgllkdl: child: cleanup: latch

Context saved from call: 2

state=busy, wlstate=free

waiters [orapid (seconds since: put on list, posted, alive check)]:

25 (123, 1336153833, 123)

27 (123, 1336153833, 123)

11 (120, 1336153833, 120)

20 (114, 1336153833, 114)

19 (111, 1336153833, 111)

24 (39, 1336153833, 39)

waiter count=6

gotten 261770 times wait, failed first 74 sleeps 22

gotten 0 times nowait, failed: 0

on wait list for 3213e0f0

Process Group: DEFAULT, pseudo proc: 33E7F368

O/S info: user: SYSTEM, term: Y8908CUIHUA, ospid: 2988

OSD pid info: Windows thread id: 2988, image: ORACLE.EXE (SHAD)

……省略显示其他内容

注意上述trace文件中有如下内容：

waiting for 3213e0f0 Child library cache lock level=6 child#=1

但注意这里没有holding Child library cache！所以我们已经证明了软解析依然会持有Child library cache lock，但是是在释放了Child library cache的情况下，也就是说软解析会去持有Child library cache lock，但不会同时持有Child library cache和Child library cache lock。

第六步：我们来证明软解析还是会去持有Child library cache pin，但是是在释放了Child library cache的情况下，也就是说软解析会去持有Child library cache pin，但不会同时持有Child library cache和Child library cache pin：

证明步骤和思路与第五步一模一样，这里不再赘述。

结合我文章开头提到的那三篇文章（尤其是大熊的那篇文章）和上面的六步证明过程，我们得出如下关于硬解析和软解析时latch持有情况的结论（这个结论是近似的，并没有包括所有的情况）：

1、SQL在硬解析的时候，会先去持有Child library cache以扫描library cache object handle链表，接着会释放Child library cache（因为找不到相关的library cache object handle），接下来会再次持有Child library cache（因为要修改library cache object handle链表），然后在不释放Child library cache的情况下进而去持有Child shared pool以分配内存，接着释放Child shared pool（因为已经分配完了），接着再释放Child library cache；然后因为要以null模式获得library cache lock，所以要先持有Child library cache lock；接着要把相关sql的执行计划load到对应的子cursor的library cache object handle中，即要以exclusive模式获得library cache pin，所以要先持有Child library cache pin；

2、SQL在软解析的时候和硬解析类似，只不过不需要持有Child shared pool并且持有Child library cache的时间会更短。

最后我们来说一下硬解析和软解析在latch争用情况下的本质区别：

1、软解析不会导致Child shared pool的争用；

2、硬解析跟软解析比会导致更为剧烈的Child library cache的争用，这个当然是因为硬解析在不释放Child library cache的情况下进而会去持有Child shared pool。

8 Comments on “硬解析和软解析情况下latch持有情况分析”

1. boypoo
  
  May 6, 2012 at 8:48 am
  
  测试的真仔细！
  
  Reply
2. overmars
  
  May 7, 2012 at 10:13 pm
  
  关于硬分析持有的latch，我看大部分的书都是这样描述：
  硬分析应该是先持有SQL handle的library cache lock ，然后在生成计划的过程中，对SQLhandle的LCO以exclusive 方式持有library cache pin.
  
  但是楼主的说法，好像恰好反过来了，请楼主能否进一步解释下2者持有的顺序，lock和pin的对象分别是针对的对象是谁？
  
  Reply
  - Cui Hua
    
    May 7, 2012 at 10:20 pm
    
    我后一篇文章会解释为什么是library cache pin在前，library cache lock在后。
    
    Reply
3. overmars
  
  May 8, 2012 at 1:31 am
  
  ok，等待您的大作！
  
  Reply
  - Cui Hua
    
    May 14, 2012 at 9:47 pm
    
    这个你是对的，应该是先持有library cache lock latch，再持有library cache pin latch，我已经改过来了。
    
    Reply
4. wzh
  
  January 22, 2013 at 11:27 am
  
  测试非常详细，看完后加深了对软解析、硬解析的理解。
  
  Reply
5. taowang2016
  
  February 22, 2013 at 5:34 pm
  
  你好，goog job!
  我在重现以下过程：
  第三步：我们来证明硬解析会去持有Child library cache pin，但是是在释放了Child library cache的情况下，也就是说硬解析的时候不会同时持有Child library cache和Child library cache pin
  时出现了一个问题，向你请教。
  
  环境：windows 32
  oracle:10.2.0.1
  
  准备信息如下：
  session1：
  SQL> select ‘oradebug poke 0x’ || addr || ‘ 4 0×00000001′ from v$latch_children where name=’library cache pin’;
  
  ‘ORADEBUGPOKE0X’||ADDR||’40X00000001′
  ————————————————
  oradebug poke 0x240BDF50 4 0×00000001
  oradebug poke 0x240BDEE8 4 0×00000001
  oradebug poke 0x240BDE80 4 0×00000001
  
  SQL> alter system flush shared_pool;
  
  系统已更改。
  
  SQL> oradebug setmypid
  已处理的语句
  
  session2的准备信息如下：
  SQL> select sid from v$mystat where rownum=1;
  
  SID
  ———-
  148
  
  SQL> select spid from v$process where addr in (select paddr from v$session where sid=148);
  
  SPID
  ————————
  4072
  
  session3准备信息如下：
  SQL> oradebug setospid 4072
  
  Oracle pid: 19, Windows thread id: 4072, image: ORACLE.EXE (SHAD)
  
  接着
  在session1中poke了library cache pin
  
  SQL> oradebug poke 0x240BDF50 4 0×00000001
  BEFORE: [240BDF50, 240BDF54) = 00000000
  AFTER: [240BDF50, 240BDF54) = 00000001
  SQL> oradebug poke 0x240BDEE8 4 0×00000001
  BEFORE: [240BDEE8, 240BDEEC) = 00000000
  AFTER: [240BDEE8, 240BDEEC) = 00000001
  SQL> oradebug poke 0x240BDE80 4 0×00000001
  BEFORE: [240BDE80, 240BDE84) = 00000000
  AFTER: [240BDE80, 240BDE84) = 00000001
  
  session2中执行以下查询，被hang住：
  
  SQL> select * from scott.emp;
  select * from scott.emp
  hang住
  
  session3中dump process信息，此时出现了如下错误：
  
  SQL> oradebug dump processstate 10
  ORA-00072: 进程 “Windows thread id: 4072, image: ORACLE.EXE (SHAD)” 不活动
  
  同时，发现session2中有如下提示错误：
  
  select * from scott.emp
  *
  第 1 行出现错误:
  ORA-03113: 通信通道的文件结束
  
  请问这是如何导致的，经过测试library cache lock的对的，但这个library cache pin的测试做了很多次，一直都是上面的错误信息，是我测试的哪一步出了问题呢？
  
  Reply
  - taowang2016
    
    February 22, 2013 at 6:18 pm
    
    还是同样的测试，刚特意在centos 5.7 64bit， oracle 10.2.0.1 64bit的环境上进行，就没有这个问题，也就是说在linux上的测试结果和你的是一样的，可是在windows上，试了很多次，一直提示如上的错误，问题可能出在什么地方呢？
    
    Reply