结论
1,阻塞process allocation latch,数据库会出现latch free及dfs lock handle2,dfs lock handle通过systemstate dump找不到有价值的信息
却可通过v$session的p1,p2,p3找到有价值的信息
3,因为dfs lock handle的p1,p2,p3含义不知,只能借助于mos,找到文章:Understanding 'DFS Lock Handle' and Cross-Instance (CI) Call Waits [ID 458292.1]
定位到锁CI
4,关于锁CI的含义请见v$lock_type
5, 关于ID1及ID2请见 Understanding 'DFS Lock Handle' and Cross-Instance (CI) Call Waits [ID 458292.1]
6,根据ID1及ID2不同的值,即会在不同的解决思路
7,ORACLE获取锁TYEP的脚本
select chr(bitand(p1,-16777216)/16777215) || chr(bitand(p1, 16711680)/65535) "Lock_type" from dual; --lock_type对应v$lock.type
测试
--oradebug suspend process allocation latch前
SQL> select sid,serial#,paddr from v$session where sid=(select sid from v$mystat where rownum=1);
SID SERIAL# PADDR
---------- ---------- ----------------
153 742 0000000083A60520
SQL> select pid,spid from v$process where addr='0000000083A60520';
PID SPID
---------- ------------
22 28711
SQL> create table t_process(a int);
Table created.
SQL> select sid,serial#,program,event,blocking_session,p1,p1text,p2,p2text,p3,p3text from v$session where sid=153;
SID SERIAL# PROGRAM EVENT BLOCKING_SESSION P1 P1TEXT P2 P2TEXT P3 P3TEXT
---------- ---------- ------------------------------ --------------- ---------------- ---------- --------------- ---------- --------------- ---------- ---------------
153 742 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message 1650815232 driver id 1 #bytes 0
from client
---oradebug suspend process allocation latch后
SQL> select name,addr,latch#,level# from v$latch where name='process allocation';
NAME ADDR LATCH# LEVEL#
-------------------- ---------------- ---------- ----------
process allocation 0000000060007498 3 1
SQL> select 'oradebug poke 0x'||addr||' 4 0x00000001;' from v$latch where latch#=3;
'ORADEBUGPOKE0X'||ADDR||'40X00000001;'
----------------------------------------------
oradebug poke 0x0000000060007498 4 0x00000001;
SQL> oradebug setmypid
Statement processed.
SQL> oradebug poke 0x0000000060007498 4 0x00000001;
BEFORE: [060007498, 06000749C) = 00000000
AFTER: [060007498, 06000749C) = 00000001
可见dml不受影响
SQL> insert into t_process values(1);
1 row created.
SQL> commit;
Commit complete.
而发生检查点,不过过了一会儿就正常了,这里就不好说process allocation latch对于发生检查点的影响程度了
SQL> alter system checkpoint;
等待dfs lock handle
SQL> select sid,serial#,program,event,blocking_session,p1,p1text,p2,p2text,p3,p3text from v$session where sid=153;
SID SERIAL# PROGRAM EVENT BLOCKING_SESSION P1 P1TEXT P2 P2TEXT P3 P3TEXT
---------- ---------- ------------------------------ --------------- ---------------- ---------- --------------- ---------- --------------- ---------- ---------------
153 742 sqlplus@jingfa1 (TNS V1-V3) DFS lock handle 1128857605 type|mode 2 id1 2 id2
SQL> select sid,serial#,program,event from v$session where type='USER' and wait_class<>'Idle';
SID SERIAL# PROGRAM EVENT
---------- ---------- ------------------------------ --------------------------------------------------
132 866 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message to client
新增2个无法登陆的会话,可见后台台进程mmon及lmon分别在等待dfs lock handle及latch free
SQL> select sid,serial#,program,event from v$session where wait_class<>'Idle';
SID SERIAL# PROGRAM EVENT
---------- ---------- ------------------------------ --------------------------------------------------
132 866 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message to client
156 1 oracle@jingfa1 (MMON) DFS lock handle
167 1 oracle@jingfa1 (LMON) latch free
新增一个无法登陆会话,MMON又变成等待os thread startup
SQL> select sid,serial#,program,event from v$session where wait_class<>'Idle';
SID SERIAL# PROGRAM EVENT
---------- ---------- ------------------------------ --------------------------------------------------
132 866 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message to client
156 1 oracle@jingfa1 (MMON) os thread startup
167 1 oracle@jingfa1 (LMON) latch free
继续新增一个无法登陆会话,等待会话由3个变成了4个
SQL> select sid,serial#,program,event from v$session where wait_class<>'Idle';
SID SERIAL# PROGRAM EVENT
---------- ---------- ------------------------------ --------------------------------------------------
132 866 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message to client
156 1 oracle@jingfa1 (MMON) DFS lock handle
160 1 oracle@jingfa1 (CKPT) os thread startup
167 1 oracle@jingfa1 (LMON) latch free
新增会话,同上
新增会话,由4个会话变成了3个会话,少了CKPT进程
SQL> select sid,serial#,program,event from v$session where wait_class<>'Idle';
SID SERIAL# PROGRAM EVENT
---------- ---------- ------------------------------ --------------------------------------------------
132 866 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message to client
156 1 oracle@jingfa1 (MMON) DFS lock handle
167 1 oracle@jingfa1 (LMON) latch free
新增会话,同上
分析dump,暂未分析出有价值的内容
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump systemstate 3
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/admin/jingfa/udump/jingfa1_ora_27463.trc
转换思路,dfs lock handle几个参数含义到底是什么
SQL> select sid,serial#,program,event,p1,p1text,p2,p2text,p3,p3text from v$session where wait_class<>'Idle';
SID SERIAL# PROGRAM EVENT P1 P1TEXT P2 P2TEXT P3 P3TEXT
---------- ---------- ------------------------------ --------------- ---------- --------------- ---------- --------------- ---------- ---------------
132 866 sqlplus@jingfa1 (TNS V1-V3) SQL*Net message 1650815232 driver id 1 #bytes 0
to client
156 1 oracle@jingfa1 (MMON) DFS lock handle 1128857605 type|mode 10 id1 2 id2
167 1 oracle@jingfa1 (LMON) latch free 1610642584 address 3 number 177447 tries
也上DUMP文件匹配,但还是不知是什么含义
SQL> select to_char('1128857605','xxxxxxxxx') from dual;
TO_CHAR('1
----------
43490005
经过查阅MOS发现如下文章:
Understanding 'DFS Lock Handle' and Cross-Instance (CI) Call Waits [ID 458292.1]
获知:id1及id2的含义:可见id1=10,即 Create Remote parallel query Server ,说明与进程创建有关系,也就是说无法创建进程
id2=2,即调用后台进程中一个函数
1 reuse (checkpoint and invalidate block range)
2 LGWR Checkpointing and Hot Backup
3 DBWR syncronization of SGA with control file
4 log file add/drop/rename notification
5 miscellaneous CTWR operations
7 Manged standby recovery related
8 alter rollback segment optimal
9 Signal Query Servers/coordinator
10 Create Remote parallel query Server
11 Set Global Partitions
12 Stop Disk Writes
13 Drop Sort Segments
14 Release unused space from Sort Segments
15 instance Recovery for Parallel operation Group
16 Validate parallel slave Lock Value
17 check transaction state objects
18 object reuse request
19 rolling release checks
20 propagate begin backup scn for a file
21 refresh top plan (for db scheduler
22 clear checkpoint progress record
23 drop temp file
24 QUiesce database Restricted
25 Update Dscn Tracking (ktcndt
26 Purge dictionary Object number Cache
27 set Database Force Logging mode
28 invalidate cached file address translations
29 Cursor Unauthorize Mode
30 process waiters after row cache requeue
31 Active Change Directory extent relocation
32 block change tracking state change
33 kgl mulitversion obsolete
34 set previous resetlogs data
35 set recovery destination pointer
36 fast object reuse request
38 ASM diskgroup discovery wait
39 ASM diskgroup release
40 ASM push DB updates
41 ASM add ACD chunk
42 ASM map resize message
43 ASM map lock message
44 ASM map unlock message (phase 1
45 ASM map unlock message (phase 2
46 ASM generate add disk redo marker
d47 ASM check of PST validity
48 ASM offline disk CIC
49 Logical Standby Sync Point SCN
50 update SQL Tuning Base existence bitvector
51 PQ induced Checkpointing
52 ASM F1X0 relocation
53 Scheduler autostart
54 KZS increment grant/revoke counter
55 ASM disk operation message
56 ASM I/O error emulation
57 DB Supp log cursor invalidation
58 Cache global range invalidation
59 Cache global object invalidation
60 ASM Pre-Existing Extent Lock wait
61 Perform a ksk action through DBWR
62 ASM diskgroup refresh wait
63 KCBO object checkpoint
64 KCBO object pq checkpoint
65 global health check event
66 Oracle Label Security refresh
67 thread internal enable
68 cross-instance registration
69 KGL purge unused subheaps
70 clear pin instance flag
71 Rolling operations CIC
The following can be used to understand id2:
1 used to pass in parameters
2 used to invoke the function in backgroud process
3 used to indicate the foreground has not returned
4 mounted excl, use to allocate mechanism
5 used to queue up interested clients
基于上述分析,获取fs lock handle到底是什么锁
SQL> select chr(bitand(1128857605,-16777216)/16777215) || chr(bitand(1128857605, 16711680)/65535) "Lock_name" from dual;
Lock_name
--------------------
CI
获取CI锁的含义,用于协调跨实例函数的调用,可见用于RAC
SQL> select type,name,id1_tag,id2_tag,is_user,description from v$lock_type where type='CI';
TYPE NAME ID1_TAG ID2_TAG IS_USER DESCRIPTION
--------------- -------------------- ---------- ---------- ---------- --------------------------------------------------
CI Cross-Instance Call opcode type NO Coordinates cross-instance function invocations
Invocation
发现没有任何会话持CI锁
SQL> select sid,type,id1,id2,lmode,request from v$lock where type='CI';
no rows selected
一运行全局GV$LOCK马上就HANG住了
SQL> select sid,type,id1,id2,lmode,request from gv$lock where type='CI';
在另一个节点查询看看,也没有内容
SQL> select sid,type,id1,id2,lmode,request from v$lock where type='CI';
no rows selected
---换个思路,查看对应节点的告警日志
上述查询一直HNAG时,告警日志是没有信息的,但如果你强制取消,出现如下信息,可见是启动并行进程时报错
引申,DB HANG或出故障时,查询GV视图会去创建并行查询进程(到于为何要创建这个并行查询进程,将在下文分析)
Fri Nov 13 12:06:58 2015
Timed out trying to start process PZ99.
这个就和前面的联系起来吧,哈哈,因为oradebug suspend process allocation latch
我们resume process allocation latch
[oracle@jingfa1 bdump]$ ps -ef|grep ora_|wc -l
25
[oracle@jingfa1 bdump]$
我们再次运行gv$lock查询
可见没有进程没有变化
[oracle@jingfa1 bdump]$ ps -ef|grep ora_|wc -l
25
[oracle@jingfa1 bdump]$
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/9240380/viewspace-1837910/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/9240380/viewspace-1837910/