oracle hang analyze,使用oradebug dump hanganalyze分析oracle hang系列一

背景

oracle db有可能因为诸多因系而hang,可以借助某些工具进行诊断分析,本文主要学习如何用不用oradebug dump hanganalyze 3进行分析产生的TRACE FILE.

以及所包含的内容结构。

结论

1,oradebug hanganalyze 3产生的TRACE FILE包括3部分内容

2,各部分内容如下:

第一部分

Chains most likely to have caused the hang:

[a] Chain 1 Signature: <='latch: cache buffers chains'

Chain 1 Signature Hash: 0xccebf225

[b] Chain 2 Signature: <='buffer busy waits'

Chain 2 Signature Hash: 0x9a13abed

[c] Chain 3 Signature: <='latch: cache buffers chains'

Chain 3 Signature Hash: 0xccebf225

第二部分(当然这里面又分为intersecting chains或者non-intersecting chains)

===============================================================================

Non-intersecting chains:

-------------------------------------------------------------------------------

Chain 1:

-------------------------------------------------------------------------------

Oracle session identified by:

{

instance: 1 (guowang.guowang)

os id: 10574

process id: 10, oracle@seconary (DBW0)

session id: 10

session serial #: 1

}

is waiting for 'latch: cache buffers chains' with wait info:

{

p1: 'address'=0xda4f8850

p2: 'number'=0x96

p3: 'tries'=0x0

time in wait: 0.012628 sec

timeout after: never

wait id: 50243

blocking: 0 sessions

current sql:

short stack: ksedsts()+461

wait history:

* time between current wait and wait #1: 0.000058 sec

1.       event: 'db file async I/O submit'

time waited: 0.328988 sec

wait id: 50242           p1: 'requests'=0x1

p2: 'interrupt'=0x0

p3: 'timeout'=0x0

* time between wait #1 and #2: 0.000076 sec

2.       event: 'db file async I/O submit'

time waited: 0.000748 sec

wait id: 50241           p1: 'requests'=0x1

p2: 'interrupt'=0x0

p3: 'timeout'=0x0

* time between wait #2 and #3: 0.000086 sec

3.       event: 'db file async I/O submit'

time waited: 0.000177 sec

wait id: 50240           p1: 'requests'=0x1

p2: 'interrupt'=0x0

p3: 'timeout'=0x0

}

and is blocked by

=> Oracle session identified by:

{

instance: 1 (guowang.guowang)

os id: 6664

process id: 91, oracle@seconary (J063)

session id: 32

session serial #: 9465

}

which is not in a wait:

{

last wait: 0.000000 sec ago

blocking: 6 sessions

current sql:

short stack:

}

Chain 1 Signature: <='latch: cache buffers chains'

Chain 1 Signature Hash: 0xccebf225

第三部分

State of LOCAL nodes

([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):

[9]/1/10/1/0xdcba9740/10574/NLEAF/[31]

[11]/1/12/1/0xdcba3a60/10578/SINGLE_NODE/

中间类似内容略

[346]/1/347/9522/0xdc7d75d0/6413/NLEAF/[118]

[348]/1/349/7696/0xdc7d18f0/6433/NLEAF/[204]

关于各列含义见下:

根据其每列含义

([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):

数据库实例编号         会话sid    会话serial#   会话的地址saddr   会话对应的操作系统ID          表明会话是否等待           如有值,表明是持锁会话的CHAIN编号,否则为空

[163]   /    1              /   164    /   10419    /   0xdc9ea7e0   /    4046              /           NLEAF         /          [301]

3,non-intersecting chains的特色就是所包含的会话的阻塞会话,不隶属于任何chain

而intersecting chains包含的会话的阻塞会话隶属于另一个chains中的持锁会话,也就是说chains包含在另一个chains中

4,chains中包括每个会话以及阻塞会话的详细信息,包括SID,PID,SPID以及当前运行的SQL

所调用堆栈;以及近期的历史等待会话列表

5,关于 ([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]) 中的state列的各值含义如下,暂不全面(仍需要进一步测试)

NLEAF表明是等待会话

SINGLE_NODE表明它是等待任何会话或资源

LEAF表明它是持锁会话,即它不等待任何会话或资源

LEAF_NEW它也是持锁会话

测试

1,数据库版本

SQL> select * from v$version where rownum=1;

BANNER

----------------------------------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

2,数据库未有任何HANG时,产生hanganalyze

SQL> oradebug setmypid

Statement processed.

SQL> oradebug hanganalyze 3

Hang Analysis in /oracle/diag/rdbms/guowang/guowang/trace/guowang_ora_13018.trc

*** 2015-11-02 07:02:02.135

===============================================================================

HANG ANALYSIS:

instances (db_name.oracle_sid): guowang.guowang

oradebug_node_dump_level: 3 --trace level级别

analysis initiated by oradebug --标明hanganalyze由oradebug生成

===============================================================================

Chains most likely to have caused the hang:--导致HANG的原因

===============================================================================

No chains found.

===============================================================================

Extra information that will be dumped at higher levels:

State of ALL nodes  --全部节点的状态(我估计适用于RAC)

([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]): --每个列的含义

*** 2015-11-02 07:02:02.163

===============================================================================

END OF HANG ANALYSIS

===============================================================================

*** 2015-11-02 07:02:02.164

===============================================================================

HANG ANALYSIS DUMPS:

oradebug_node_dump_level: 3

===============================================================================

State of LOCAL nodes  --本地节点的状态

([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):--同state of all nodes

No processes qualify for dumping.

===============================================================================

HANG ANALYSIS DUMPS: END

===============================================================================

4,手工模拟产生锁等待,产生hanganalyze

测试会话1

SQL> select sid,serial# from v$session where sid=(select sid from v$mystat where rownum=1);

SID    SERIAL#

---------- ----------

302       9226

SQL> select pid,spid from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));

PID SPID

---------- ------------------------------------------------

165 3812

SQL> create table t_hang(a int,b int);

Table created.

SQL> insert into t_hang values(1,1);

1 row created.

SQL> commit;

Commit complete.

测试会话2

SQL> select sid,serial# from v$session where sid=(select sid from v$mystat where rownum=1);

SID    SERIAL#

---------- ----------

164      10419

SQL> select pid,spid from v$process where addr=(select paddr from v$session where sid=(select sid from v$mystat where rownum=1));

PID SPID

---------- ------------------------------------------------

184 4046

hang住

SQL> update t_hang set a=2 where a=1;

SQL> oradebug setmypid

Statement processed.

SQL> oradebug hanganalyze 3

Hang Analysis in /oracle/diag/rdbms/guowang/guowang/trace/guowang_ora_6813.trc

*** 2015-11-02 07:16:19.252

===============================================================================

HANG ANALYSIS:

instances (db_name.oracle_sid): guowang.guowang

oradebug_node_dump_level: 3

analysis initiated by oradebug

===============================================================================

Chains most likely to have caused the hang:

[a] Chain 1 Signature: <='latch: cache buffers chains'

Chain 1 Signature Hash: 0xccebf225

[b] Chain 2 Signature: <='buffer busy waits'

Chain 2 Signature Hash: 0x9a13abed

[c] Chain 3 Signature: <='latch: cache buffers chains'

Chain 3 Signature Hash: 0xccebf225

--经在TRACE FILE过滤等待会话的sid,即164,以后可以直接查session id:164或者对应的进程号process id:184,或者对应的操作系统进程os id:4096

发现它是在一个chain中,这个chain中清晰标明了,等待会话到底在等待什么资源,以及是哪个会话阻塞它,并且会列出阻塞会话的详细 信息

Chain 10:

-------------------------------------------------------------------------------

Oracle session identified by:

{

instance: 1 (guowang.guowang)

os id: 4046

process id: 184, oracle@seconary (TNS V1-V3)

session id: 164

session serial #: 10419

}

is waiting for 'enq: TX - row lock contention' with wait info:

{

p1: 'name|mode'=0x54580006

p2: 'usn<<16 | slot'=0xca001d

p3: 'sequence'=0x2842

time in wait: 1 min 23 sec

timeout after: never

wait id: 33

blocking: 0 sessions

current sql: update t_hang set a=2 where a=1  -会列出当前等待会话正在执行的SQL以及调用堆栈

short stack: ksedsts()+461

wait history:  --等待会话近期的等待事件历史列表,且基于时间反向排序

* time between current wait and wait #1: 0.043281 sec

1.       event: 'Disk file operations I/O'

time waited: 0.000090 sec

wait id: 32              p1: 'FileOperation'=0x2

p2: 'fileno'=0x4

p3: 'filetype'=0x2

* time between wait #1 and #2: 0.000764 sec

2.       event: 'SQL*Net message from client'

time waited: 1 min 0 sec

wait id: 31              p1: 'driver id'=0x62657100

p2: '#bytes'=0x1

* time between wait #2 and #3: 0.000007 sec

3.       event: 'SQL*Net message to client'

time waited: 0.000003 sec

wait id: 30              p1: 'driver id'=0x62657100

p2: '#bytes'=0x1

}

and is blocked by  --标明被哪个会话所阻塞

=> Oracle session identified by:

{

instance: 1 (guowang.guowang)

os id: 3812

process id: 165, oracle@seconary (TNS V1-V3)

session id: 302

session serial #: 9226

}

which is waiting for 'SQL*Net message from client' with wait info:  -同上理,也会列出持锁会话近期的历史等待事件列

{

p1: 'driver id'=0x62657100

p2: '#bytes'=0x1

time in wait: 2 min 49 sec

timeout after: never

wait id: 42

blocking: 1 session

current sql:

short stack: ksedsts()+461

wait history:

* time between current wait and wait #1: 0.000015 sec

1.       event: 'SQL*Net message to client'

time waited: 0.000005 sec

wait id: 41              p1: 'driver id'=0x62657100

p2: '#bytes'=0x1

* time between wait #1 and #2: 0.338132 sec

2.       event: 'SQL*Net message from client'

time waited: 24.359100 sec

wait id: 40              p1: 'driver id'=0x62657100

p2: '#bytes'=0x1

* time between wait #2 and #3: 0.000013 sec

3.       event: 'SQL*Net message to client'

time waited: 0.000005 sec

wait id: 39              p1: 'driver id'=0x62657100

p2: '#bytes'=0x1

}

经后查,hanganalyze会列出所有ORACLE进程的信息

并且chain又分为intersecting chains和non-intersecting chains

我们来分析下intersecting chains和non-intersecting chains的区别是什么,先看

non-intersecting chains

经过对比分析,non-intersecting chains的特色就是所包含的会话的阻塞会话,不隶属于任何chain

Chain 2:

-------------------------------------------------------------------------------

Oracle session identified by:

{

instance: 1 (guowang.guowang)

os id: 6728

process id: 146, oracle@seconary (J117)

session id: 29

session serial #: 10069

}

is waiting for 'buffer busy waits' with wait info:

{

p1: 'file#'=0x1

p2: 'block#'=0x15ab9

p3: 'class#'=0x1

time in wait: 0.012849 sec

timeout after: never

wait id: 31

blocking: 0 sessions

current sql:

short stack:

wait history:

* time between current wait and wait #1: 0.000028 sec

1.       event: 'enq: TX - index contention'

time waited: 0.016712 sec

wait id: 30              p1: 'name|mode'=0x54580004

p2: 'usn<<16 | slot'=0x12a0009

p3: 'sequence'=0x1953

* time between wait #1 and #2: 0.000066 sec

2.       event: 'buffer busy waits'

time waited: 0.552796 sec

wait id: 29              p1: 'file#'=0x1

p2: 'block#'=0x15ab9

p3: 'class#'=0x1

* time between wait #2 and #3: 0.000340 sec

3.       event: 'buffer busy waits'

time waited: 0.497199 sec

wait id: 28              p1: 'file#'=0x1

p2: 'block#'=0x225eb

p3: 'class#'=0x1

}

and is blocked by

=> Oracle session identified by:

{

instance: 1 (guowang.guowang)

os id: 6439  --也就是说持锁会话不隶属于chain

process id: 111, oracle@seconary (J083)

session id: 205

session serial #: 9696

再来看看intersecting chains

发现intersecting chains包含的会话的阻塞会话隶属于另一个chains中的持锁会话,也就是说chains包含在另一个chains中

-------------------------------------------------------------------------------

Chain 12:

-------------------------------------------------------------------------------

Oracle session identified by:

{

instance: 1 (guowang.guowang)

os id: 6612

process id: 59, oracle@seconary (J031)

session id: 33

session serial #: 11792

}

is waiting for 'buffer busy waits' with wait info:

{

p1: 'file#'=0x1

p2: 'block#'=0x15ab9

p3: 'class#'=0x1

time in wait: 0.007726 sec

timeout after: never

wait id: 0

blocking: 0 sessions

current sql:

short stack:

}

and is blocked by 'instance: 1, os id: 6439, session id: 205',

which is a member of 'Chain 2'.

Chain 12 Signature: <='buffer busy waits'

Chain 12 Signature Hash: 0x9a13abed

最后学习下state of all nodes(注:如果是单实例数据库,其state of all nodes和state of local nodes内容相同)

先摘录一节内容

State of ALL nodes

([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):

[9]/1/10/1/0xdcba9740/10574/NLEAF/[31]

[11]/1/12/1/0xdcba3a60/10578/SINGLE_NODE/

[28]/1/29/10069/0xdcb724f0/6728/NLEAF/[204]

[31]/1/32/9465/0xdcb699a0/6664/LEAF_NW/

[32]/1/33/11792/0xdcb66b30/6612/NLEAF/[204]

[33]/1/34/3/0xdcb63cc0/10707/SINGLE_NODE/

[38]/1/39/9453/0xdcb55490/6754/NLEAF/[204]

其实每个列的含义都有注明,我们依次来分析,我采用的方法仍是基于我上述构建的测试2个会话即持锁会话以及等待锁会话,在TRACE FILE查等待会话164,匹配结果如下:

[163]/1/164/10419/0xdc9ea7e0/4046/NLEAF/[301]

根据其每列含义

([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):

我们来分解下:

SQL> select saddr from v$session where sid=164;

SADDR

----------------

00000000DC9EA7E0

数据库实例编号         会话sid    会话serial#   会话的地址saddr   会话对应的操作系统ID          表明会话是否等待           如有值,表明是持锁会话的CHAIN编号,否则为空

[163]   /    1              /   164    /   10419    /   0xdc9ea7e0   /    4046              /           NLEAF         /          [301]

但是163是什么含义呢,以及301是何义呢,会不会是chains的编号呢,还有NLEAF是何义,另急,我们试着分析下,我试着查301,定位到如下内容

[301]/1/302/9226/0xdc859f80/3812/LEAF/

经过对比分析,301就是持锁会话所在的chains的编号,所以可知NLEAF表明是等待锁的会话,而LEAF就是不等待锁的会话

还不不够,因为state列发现有几个不同的值:NLEAF,SINGLE_NODE,LEAF_NEW,继续分析

以chain 9为例

([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):

[9]/1/10/1/0xdcba9740/10574/NLEAF/[31]

从adjlist 31定位到chain 31

[31]/1/32/9465/0xdcb699a0/6664/LEAF_NW/

先看chain 9

Chain 1:

-------------------------------------------------------------------------------

Oracle session identified by:

{

instance: 1 (guowang.guowang)

os id: 10574

process id: 10, oracle@seconary (DBW0)

session id: 10

session serial #: 1

}

is waiting for 'latch: cache buffers chains' with wait info:

{

p1: 'address'=0xda4f8850

p2: 'number'=0x96

p3: 'tries'=0x0

time in wait: 0.012628 sec

timeout after: never

wait id: 50243

blocking: 0 sessions

current sql:

short stack: ksedsts()+461

wait history:

* time between current wait and wait #1: 0.000058 sec

1.       event: 'db file async I/O submit'

time waited: 0.328988 sec

wait id: 50242           p1: 'requests'=0x1

p2: 'interrupt'=0x0

p3: 'timeout'=0x0

* time between wait #1 and #2: 0.000076 sec

2.       event: 'db file async I/O submit'

time waited: 0.000748 sec

wait id: 50241           p1: 'requests'=0x1

p2: 'interrupt'=0x0

p3: 'timeout'=0x0

* time between wait #2 and #3: 0.000086 sec

3.       event: 'db file async I/O submit'

time waited: 0.000177 sec

wait id: 50240           p1: 'requests'=0x1

p2: 'interrupt'=0x0

p3: 'timeout'=0x0

}

and is blocked by

=> Oracle session identified by:

{

instance: 1 (guowang.guowang)

os id: 6664

process id: 91, oracle@seconary (J063)

session id: 32

session serial #: 9465

再看下SINGLE_NODE的含义是什么

[33]/1/34/3/0xdcb63cc0/10707/SINGLE_NODE/

可见SINGLE_NODE表明会话不被任何会话或进程所阻塞

Chain 146:

-------------------------------------------------------------------------------

Oracle session identified by:

{

instance: 1 (guowang.guowang)

os id: 10707

process id: 29, oracle@seconary (CJQ0)

session id: 34

session serial #: 3

}

is waiting for 'job scheduler coordinator slave wait' with wait info:

{

time in wait: 0.449655 sec

timeout after: 15.550345 sec

wait id: 81976

blocking: 0 sessions

current sql:

short stack: ksedsts()+461

wait history:

* time between current wait and wait #1: 0.000108 sec

1.       event: 'rdbms ipc message'

time waited: 2.064533 sec

wait id: 81975           p1: 'timeout'=0xc8

* time between wait #1 and #2: 0.006081 sec

2.       event: 'rdbms ipc message'

time waited: 0.501429 sec

wait id: 81974           p1: 'timeout'=0x32

* time between wait #2 and #3: 0.000706 sec

3.       event: 'job scheduler coordinator slave wait'

time waited: 16.003771 sec

wait id: 81973

}

Chain 146 Signature: 'job scheduler coordinator slave wait'

Chain 146 Signature Hash: 0x8c8a9c97

总结:NLEAF表明是等待会话

SINGLE_NODE表明它是等待任何会话或资源

LEAF表明它是持锁会话,即它不等待任何会话或资源

LEAF_NEW它也是持锁会话

LEAF和LEAF_NEW的区别是什么?我们将在下文进行分析与测试。

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/9240380/viewspace-1823479/,如需转载,请注明出处,否则将追究法律责任。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值