Author: rainnyzhong
Date:2010-1-15
1. 症状描述:
FALB12从EXCEL IMPORT DATA到DB,预计事务会运行1个多小时,在开始操作后40分钟左右,ORACLE挂死,任何用户都不可以再登陆了。
2. 分析
(1) 下面是挂死时OS的资源状况:
09:37:54 up 73 days, 23:31, 1 user, load average: 0.11, 0.24, 0.18
194 processes: 191 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 6.7% 0.0% 0.3% 0.3% 0.0% 1.9% 90.4%
Mem: 2186616k av, 2168780k used, 17836k free, 0k shrd, 154716k buff
1633584k actv, 498176k in_d, 64108k in_c
Swap: 2040244k av, 18624k used, 2021620k free 2283068k cached
从top命令的输出中,我们可以看出,MEMORY使用几乎已经达到100%。
(2) 下面是ORACLE报警日志的内容:
Thu Jan 14 16:22:44 2010
Errors in file /opt/oracle9i/admin/ORA16G/udump/ora16g_ora_29758.trc:
ORA-00604: error occurred at recursive SQL level 4
ORA-04021: timeout occurred while waiting to lock object
ORA-00604: error occurred at recursive SQL level 4
ORA-04021: timeout occurred while waiting to lock object
ORA-00604: error occurred at recursive SQL level 3
ORA-04021: timeout occurred while waiting to lock object
Thu Jan 14 16:26:06 2010
可以看出,在解析递归SQL时出问题了,在等待某种锁时延时了。
(3) 下面是跟踪文件ora16g_ora_29758.trc的部分内容
*** 2010-01-14 16:22:44.412
=====================
PARSE ERROR #8:len=56 dep=2 uid=0 oct=3 lid=0 tim=1233845082434255 err=604
select value$ from props$ where name = 'GLOBAL_DB_NAME'
*** 2010-01-14 16:22:44.412
ksedmp: internal or fatal error
ORA-00604: error occurred at recursive SQL level 4
ORA-04021: timeout occurred while waiting to lock object
ORA-00604: error occurred at recursive SQL level 4
ORA-04021: timeout occurred while waiting to lock object
ORA-00604: error occurred at recursive SQL level 3
ORA-04021: timeout occurred while waiting to lock object
Current SQL statement for this session:
select value$ from props$ where name = 'GLOBAL_DB_NAME'
……………
SO: 0x5e6ec29c, type: 4, owner: 0x5e61d48c, flag: INIT/-/-/0x00
(session) sid: 132 trans: (nil), creator: 0x5e61d48c, flag: (c0000041) USR/- BSY/-/-/-/-/-
DID: 0001-0015-00000004, short-term DID: 0000-0000-00000000
txn branch: (nil)
oct: 0, prv: 0, sql: (nil), psql: (nil), user: 86/FALB12
O/S info: user: , term: , ospid: 1234, machine: ecv09
program:
last wait for 'library cache pin' blocking sess=0x(nil) seq=7 wait_time=2929670 seconds since wait started=906
handle address=5e24edf4, pin address=5b0f5364, 100*mode+namespace=c8
Dumping Session Wait History
for 'library cache pin' count=1 wait_time=2929670
handle address=5e24edf4, pin address=5b0f5364, 100*mode+namespace=c8
for 'library cache pin' count=1 wait_time=2929670
handle address=5e24edf4, pin address=5b0f5364, 100*mode+namespace=c8
从跟踪文件可以看出,在解析ORACLE递归SQL时报错。何为recursive SQL?
当用户发布SQL语句时,ORACLE需要对SQL语句进行语法与语义检查,这时,ORACLE会发布一些内部的SQL语句,对这条SQL所访问的对象进行各种检查,如检查访问的表存不存在,用户有没有权限对该表进行访问等,这种ORACLE内部的SQL称之为递归SQL。递归SQL也是分LEVEL的。如,如果是由LEVEL 1递归SQL再引起的recursive SQL,则此条recursive SQL的LEVEL就是2,由此类推。
从跟踪文件我们可以看出,ORACLE在等待library cache pin。我们知道,ORACLE解析用户的SQL和PLSQL代码是在共享池(share pool)中进行的,库缓存是共享池的一个组件(library cache,简称LC),还有一个缓存叫做字典缓存(dictionary cache,简称DC),ORACLE每次解析SQL语句时,都要获取一块LC内存地址并把它锁住(独占,即把内存钉住,称之为pin),此例中,很显然,ORACLE没办法再钉住任何一块LC的内存地址了,所以一直在等。当新的用户登陆时,ORACLE要发布内部SQL对该用户进行权限验证等,这些递归SQL也需要获取LC内存来进行解析,由于LC已经没有任何空闲的地址了,所以,这些用户验证的SQL也没办法运行,所以导致新的用户没办法登录ORCLE(SYS用户也没办法登录),整个数据库出现挂死的现象。
3. 诊断
初步诊断为ORACLE内存严重不足。IMPORT DATA时由于事务很长,没办法释放资源(表锁,内存锁等),导致ORACLE挂死。
4. 建议
将IMPORT DATA放到一个比较大内存的ORACLE服务器上执行,且在做IMPORT DATA时,不要做太多其它的操作。
[@more@]来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/684234/viewspace-1030665/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/684234/viewspace-1030665/