一次hanganalyze分析实例
问题
在某项目组遇到一次xtclusteruser表被无数个会话占住,尝试把所有会话全部kill session后问题仍没有解决,锁仍然还在,而且kill session时执行时间异常的长(约10s左右)。由于应用中对该表的操作特别多,类似这样的操作会继续增多,如果不能找到最根源的占住资源的会话并把它kill的话,结果将是整个系统将无法正常使用。而且查资源堵塞情况也没有查到堵塞。
遇以上问题,好像我们常用的手段也就都使用过了,再强硬的手段就是重启数据库了(这样显得太不专业了)。
分析:
Hanganalyze工具可以查看数据库中各种资源的堵塞情况。
具体如下(前部分是现在在我机器上做的,后半部分是生产环境上生成的trc文件):
Microsoft Windows XP [版本
5.1.2600]
(C) 版权所有 1985-2001 Microsoft Corp.
C:\Documents and
Settings\Administrator>cd\
C:\>set ORACLE_SID=DEMO
C:\>SQLPLUS / AS SYSDBA
SQL*Plus: Release 10.2.0.1.0 - Production
on 星期日 2月
20 11:19:41 2011
Copyright (c) 1982, 2005,
Oracle. All rights
reserved.
连接到:
Oracle Database 10g Enterprise Edition Release
10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining
options
SQL> SHOW PARAMETER
INSTANCE
NAME
TYPE
VALUE
------------------------------------ -----------
------------------------------
active_instance_count
integer
cluster_database_instances
integer
1
instance_groups
string
instance_name
string
DEMO
instance_number
integer
0
instance_type
string
RDBMS
open_links_per_instance
integer
4
parallel_instance_group
string
parallel_server_instances
integer
1
SQL> ORADEBUG HANGANALYZE
3
Hang Analysis in
d:\oracle\product\10.2.0\admin\DEMO\udump\DEMO_ora_3728.trc
SQL>
贴出当时的trc文件
/oracle/app/admin/jxcwgk/udump/jxcwgk1_ora_246610.trc
Oracle Database 10g Enterprise Edition Release
10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters,
OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME =
/oracle/app/product/10.2.0/db_1
System
name:
AIX
Node
name:
jxcwgka
Release:
3
Version:
5
Machine:
00C23B654C00
Instance name: jxcwgk1
Redo thread mounted by this instance: 1
Oracle process number: 144
Unix process pid: 246610, image: oracle@jxcwgka
(TNS V1-V3)
*** ACTION NAME:() 2011-01-19
15:48:29.328
*** MODULE NAME:(sqlplus@jxcwgka (TNS V1-V3))
2011-01-19 15:48:29.328
*** SERVICE NAME:(SYS$USERS) 2011-01-19
15:48:29.328
*** SESSION ID:(368.4554) 2011-01-19
15:48:29.328
*** 2011-01-19 15:48:29.328
==============
HANG ANALYSIS:
==============
Open chains found:
Other chains found:
Chain 1 :
:
<0/355/10547/0x8236848/205666/enq: TX - row lock
contention>
Chain 2 :
:
<0/359/43794/0xf2409e0/247480/db file sequential
read>
Chain 3 :
:
<0/367/4937/0xf2448e0/241678/enq: TX - row lock
contention>
Chain 4 :
:
<0/368/4554/0x925da60/246610/No
Wait>
Chain 5 :
:
<0/374/1014/0x925caa0/169098/enq: TX - row lock
contention>
Chain 6 :
:
<0/377/297/0x925c2c0/205966/enq: TX - row lock
contention>
Chain 7 :
:
<0/378/240/0x8237808/156828/enq: TX - row lock
contention>
Chain 8 :
:
<0/381/1834/0x8239788/234354/enq: TX - row lock
contention>
Chain 9 :
:
<0/384/992/0x925a340/209766/enq: TX - row lock
contention>
Chain 10 :
:
<0/385/565/0x8237fe8/230270/enq: TX - row lock
contention>
Chain 11 :
:
<0/386/48/0xa259078/173146/enq: TX - row lock
contention>
Chain 12 :
:
<0/387/80/0x925bae0/205494/enq: TX - row lock
contention>
Chain 13 :
:
<0/389/22/0x8236068/292316/enq: TX - row lock
contention>
Chain 14 :
:
<0/390/148/0xf242180/239136/enq: TX - row lock
contention>
Chain 15 :
:
<0/392/725/0x82387c8/258282/enq: TX - row lock
contention>
…未完!
解决:
从上面的信息中看得比较清楚了,alter system kill session
‘355,10547’;后系统恢复正常。
说明:
LEVEL
1-2 Only HANGANALYZE output, no process dump at all(只有hanganalyeze输出,不dump任何进程)
3 Level 2 + Dump only processes thought to be in a hang (IN_HANG
state)(DUMP IN_HANGE状态的进程)
4 Level 3 + Dump leaf nodes (blockers) in wait chains
(LEAF,LEAF_NW,IGN_DMP state)
5 Level 4 + Dump all processes involved in wait chains (NLEAF
state)
10 Dump all processes (IGN state)
提供人:张允君