顾名思义,hanganalyze 是用来在 oracle hang 的时候分析的。对于 oracle 来说,hang 的表现 基本上是 由于 internal deadlock,对于这种 deadlock ,oracle 是没办法自动判断并清除的,而对于 TM 级别的 deadlock,oracle 能自动判断并且能主动停止其中一个。
这里,我简单说下 hanganalyze 的用法,并举一个例子:
1、 介绍下 hanganalyze 级别的定义
10 Dump all processes (IGN state)
5 Level 4 + Dump all processes involved in wait chains (NLEAF state)
4 Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state)
3 Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
1-2 Only HANGANALYZE output, no process dump at all
一般情况下,我们都定义到 级别 3,metalink 上说,级别 3 以上,需要 oracle support 的人告知后,才能用。
2、 hanganalyze 一些术语的解释
Examples of session “headers”:
Oracle 8.x chain header:
Oracle9i chain header: :
Where:
sid = Session ID
sess_srno = Serial#
proc_ptr = Process Pointer
ospid = OS Process Id
wait_event = Waitevent
cnode = Node Id (Only available since Oracle9i) 节点代号,如果为rac,其值就存在,单节点的值为0
Examples of STATE OF THE NODES DESCRIPTION[番茄花园1]
Oracle 8.x : [nodenum]/sid/sess_srno/session/state/start/finish/[adjlist]/predecessor
Oracle9i: [nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor
Where:
Nodenum[番茄花园2] = This is sequential number used by HANGANALYZE to identify each session
sid = Session ID
sess_srno = Serial#
ospid = OS Process Id
state = State of the node
adjlist = adjacent node (Usually represents a blocker node)
predecessor = predecessor node (Usually represents a waiter node)
cnode = Node number (Only available since Oracle9i)
3、 hanganalyze 的用法
单节点下,可以:
HANGANALYZE may be executed using the following syntax:
ALTER SESSION SET EVENTS 'immediate trace name HANGANALYZE level ';
Or when logged in with the “SYSDBA” role,
ORADEBUG hanganalyze
Rac 下可以:
To perform. cluster wide HANGANALYZE use the following syntax:
ORADEBUG setmypid
ORADEBUG setinst all
ORADEBUG -g def hanganalyze
4、 hanganalyze 的解释
hanganalyze 由几个主要部分组成
CYCLES: This section reports the process dependencies between sessions that are in a deadlock condition. Cycles are considered “true” hangs. –出现 cycle 表明,系统hang住了
Cycle 1 : :
<980/3887/0xe4214964/24065/latch free>
-- <2518/352/0xe4216560/24574/latch free>
-- <55/10/0xe41236a8/13751/latch free>
OPEN CHAINS: This section reports sessions involved on a wait chain. A wait chains means that one session is blocking one or more other sessions. –出现 open chains 系统不一定会 hang,只是表明系统有可能变慢,也就是 某些session 阻塞了 另外一些 session
Open chains found:
Chain 1 : :
<2/1/0xe411b0f4/12280/db file parallel write>
Chain 2 : :
<3/1/0xe411b410/12282/No Wait>
Chain 6 : :
<18/1631/0xe4243cf8/25457/db file scattered read>
-- <229/1568/0xe422b84c/8460/buffer busy waits>
Chain 17 : :
<56/11/0xe4123ce0/13755/latch free>
-- <2384/599/0xe41890dc/22488/latch free> --这个表明 56的sid 阻塞了 下面两个sid[番茄花园3]
-- <32/2703/0xe41fa284/25693/latch free>
OTHER CHAINS: It refers to chains of blockers and waiters related to other sessions identified under “open chains”, but not blocked directly by the process reported on the "open chain".
Other chains found:
Chain 676 : :
<20/93/0xe411d644/13597/latch free>
Chain 677 : :
<27/1201/0xe41d3188/15809/latch free>
Chain 678 : :
<36/1532/0xe428be8c/4232/latch free>
-- <706/1216/0xe4121aac/23317/latch free>
Chain 679 : :
<43/12/0xe4122d54/13745/latch free>
Chain 680 : :
<80/2/0xe41290d4/13811/library cache pin>
-- <1919/1134/0xe421fdbc/3343/enqueue>
State of nodes: --这部分内容比较重要,列出了具体的 session 信息,和阻塞信息
The following describes the important states to be considered:
IN_HANG: This might be considered as the most critical STATE. Basically a node in this state is involved in a deadlock, or is hung. Usually there will be another “adjacent node” in the same status. For example:
[nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor
[16]/0/17/154/0x24617be0/26800/IN_HANG/29/32/[185]/19
[185]/1/16/4966/0x24617270//IN_HANG/30/31/[16]/16
In this example the node [16] is waiting for node [185], and the other way around; this is a cyclical condition (deadlock).
LEAF and LEAF_NW: Leaf nodes are considered on top of the wait chain (usually blockers). They are considered “Blockers” when there is another session waiting. This can be easily identified using the “predecesor” field. If there is a node referenced in the ‘prdecessor’ field, the node is considered as “blocker”, otherwise it is considered as a “slow” session waiting for some resource.
The difference between LEAF and LEAF_NW is that LEAF nodes are not waiting for something, while LEAF_NW are not waiting or may be using the CPU. A typical representation of these nodes when they are considered blockers is:
[ nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor
[16]/0/17/154/0x24617be0/26800/LEAF/29/30//19
[19]/0/20/13/0x24619830/26791/NLEAF/33/34/[16]/186
In this example, node [16] is blocking node [19]. Notice that node [16] has node [19] in the predecessor field.
Also notice that node [19] has node [16] in the adjacent list.
NLEAF : These sessions are usually considered as “stuck” sessions. It means that there is another session holding a resource needed by the session in this state. By using the adjlist, you can determine which node is the blocker of this process. When many sessions are found in this state, it is likely the database is experiencing a performance problem rather than a hang problem.
IGN and IGN_DMP : Sessions in this state are usually considered as IDLE sessions, unless they reference a node in the “ adjlist” field. In this case, the node is waiting for another node, so it will be considered as a ‘stuck’ session as well.
Extending the previous example,
[nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor
[16]/0/17/154/0x24617be0/26800/LEAF/29/30//19
[19]/0/20/13/0x24619830/26791/NLEAF/33/34/[16]/186
[189]/1/20/36/0x24619830//IGN/95/96/[19]/none
[176]/1/7/1/0x24611d80//IGN/75/76//none
You may notice that node [189] is waiting for node [19] which in turn is waiting for node [16], while node [176] is an IDLE session . This maybe the case when a session has a DML lock but never finished the transaction.
In Oracle9i, two new states were introduced to differentiate between LEAF nodes that have other nodes waiting behind them (i.e., LEAF nodes that are blockers) vs. LEAF nodes that are not affecting other nodes.
SINGLE_NODE and SINGLE_NODE_NW:
This can be described the same as LEAF and LEAF_NW, except that they don't have processes depending on them.
5、 hanganalyze 的例子
以下是一个例子,先主要说明下这个例子:
Session 1(sid 71):执行 update 语句,跟新表中的一行,但是不提交
UPDATE Test_c c SET c.ch1='bb' WHERE c.ch1='b'
Session 2-14:分别都执行上面的语句,由于 sid 71 未提交,导致这几个 session 出现等待,这时候,执行 hanganalyze,可看到如下内容:
==============
HANG ANALYSIS:
==============
Found 13 objects waiting for
<0/71/2410/0x20d85bc4/3568/SQL*Net message from client> --当一个 session 导致了超过 10 个的session 去等待的时候,就会出现上面的内容
Open chains found:
Chain 1 : :
<0/71/2410/0x20d85bc4/3568/SQL*Net message from client>
-- <0/42/2/0x20d8cd2c/6112/enq: TX - row lock contention> --这里 只记录了一个 session 去等待 sid 71 这个session
Other chains found:
Chain 2 : :
<0/52/6959/0x20d855cc/5024/jobq slave wait>
Chain 3 : :
<0/58/2141/0x20d88b84/4300/No Wait>
Chain 4 : :
<0/59/1/0x20d8799c/2608/Streams AQ: waiting for time man>
Chain 5 : :
<0/60/1/0x20d873a4/2604/Streams AQ: qmn slave idle wait>
Chain 6 : :
<0/62/1/0x20d86dac/2452/Streams AQ: qmn coordinator idle>
Extra information that will be dumped at higher levels:
[level 4] : 1 node dumps -- [REMOTE_WT] [LEAF] [LEAF_NW]
[level 5] : 6 node dumps -- [SINGLE_NODE] [SINGLE_NODE_NW] [IGN_DMP]
[level 10] : 26 node dumps -- [IGN]
State of nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/start/finish/[adjlist]/predecessor):
[41]/0/42/2/0x20dd427c/6112/IGN_DMP/1/4/[70]/none
[42]/0/43/2/0x20dd5664/4288/IGN/5/6/[70]/none
[43]/0/44/2/0x20dd6a4c/4688/IGN/7/8/[70]/none
[44]/0/45/2/0x20dd7e34/1448/IGN/9/10/[70]/none
[45]/0/46/2/0x20dd921c/2596/IGN/11/12/[70]/none
[46]/0/47/938/0x20dda604/4256/IGN/13/14/[70]/none
[47]/0/48/229/0x20ddb9ec/6028/IGN/15/16/[70]/none
[48]/0/49/3/0x20ddcdd4/5296/IGN/17/18/[70]/none
[49]/0/50/11/0x20dde1bc/2984/IGN/19/20/[70]/none
[50]/0/51/548/0x20ddf5a4/3156/IGN/21/22/[70]/none
[51]/0/52/6959/0x20de098c/5024/SINGLE_NODE/23/24//none
[52]/0/53/1426/0x20de1d74/1100/IGN/25/26//none
[53]/0/54/8445/0x20de315c/3536/IGN/27/28/[70]/none
[57]/0/58/2141/0x20de80fc/4300/SINGLE_NODE_NW/29/30//none
[58]/0/59/1/0x20de94e4/2608/SINGLE_NODE/31/32//none
[59]/0/60/1/0x20dea8cc/2604/SINGLE_NODE/33/34//none
[61]/0/62/1/0x20ded09c/2452/SINGLE_NODE/35/36//none
[63]/0/64/116/0x20def86c/5080/IGN/37/38/[70]/none
[66]/0/67/3/0x20df3424/3188/IGN/39/40//none
[67]/0/68/3/0x20df480c/3764/IGN/41/42//none
[69]/0/70/1227/0x20df6fdc/4500/IGN/43/44/[70]/none
[70]/0/71/2410/0x20df83c4/3568/LEAF/2/3//41
[71]/0/72/1/0x20df97ac/3572/IGN/45/46//none
[72]/0/73/1/0x20dfab94/3564/IGN/47/48//none
[73]/0/74/1/0x20dfbf7c/3556/IGN/49/50//none
[74]/0/75/1/0x20dfd364/3548/IGN/51/52//none
[75]/0/76/1/0x20dfe74c/3540/IGN/53/54//none
[76]/0/77/1/0x20dffb34/3532/IGN/55/56//none
[77]/0/78/1/0x20e00f1c/3528/IGN/57/58//none
[78]/0/79/1/0x20e02304/3524/IGN/59/60//none
[79]/0/80/1/0x20e036ec/3496/IGN/61/62//none
[80]/0/81/1/0x20e04ad4/3492/IGN/63/64//none
[81]/0/82/1/0x20e05ebc/3488/IGN/65/66//none
====================
END OF HANG ANALYSIS
====================
如果出现的事态紧急,这时,我们可以 观察下 出现 blocker 的那个 sid 的语句,如果有必要,就kill 掉
参考文档:
Metalink:ID 215858.1
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/14730395/viewspace-683845/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/14730395/viewspace-683845/