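Today one of our production databases hung, and in the end the only way to resolve it was to restart the database.
The database log reported the following errors: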
---------------------
Tue Jan 4 09:02:46 2011
skgpspawn failed:category = 27143, depinfo = 23, p = pipe, loc = skgpspawn2
skgpspawn failed:category = 27143, depinfo = 23, p = pipe, loc = skgpspawn2
skgpspawn failed:category = 27143, depinfo = 23, p = pipe, loc = skgpspawn2
skgpspawn failed:category = 27143, depinfo = 23, p = pipe, loc = skgpspawn2
skgpspawn failed:category = 27143, depinfo = 23, p = pipe, loc = skgpspawn2
skgpspawn failed:category = 27143, depinfo = 23, p = pipe, loc = skgpspawn2
----------------------
The operating system log reported the following errors:
----------------------
Jan 4 09:07:59 cqrm1-1 cmcld: Unable to copy file to cqrm1-1: I/O error
Jan 4 09:07:59 cqrm1-1 cmcld: Failed to dump Flight Recorder log buffer.
Jan 4 09:08:13 cqrm1-1 prngd[2938]: pipe() failed: File table overflow
Jan 4 09:09:10 cqrm1-1 cmclconfd[7615]: Unable to lookup cluster information in CDB: File table overflow
Jan 4 09:09:10 cqrm1-1 cmclconfd[7615]: cl_msg_tcp_send: Invalid argument
Jan 4 09:09:04 cqrm1-1 prngd[2938]: pipe() failed: File table overflow
Jan 4 09:09:10 cqrm1-1 inetd[1062]: hacl-cfg/udp: Exit status 1
Jan 4 09:09:55 cqrm1-1 prngd[2938]: pipe() failed: File table overflow
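At that point every command run on the system failed. Just as I was about to power-cycle the host, commands started working again. Running sar -v 5 10 showed that the file table had grown to more than 30,000 entries and was full, which matches the "File table overflow" messages in the system log.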
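To catch this condition before the table fills up, the file-sz column of sar -v can be checked automatically. The following is a minimal Python sketch, not something from the original incident. It assumes that sar -v prints a header line containing a `file-sz` column and reports that column as `used/limit`; the sample text and its numbers are made up for illustration, and the function names are hypothetical.

```python
import re

# Hypothetical sample of `sar -v` output; column layout and values are
# assumptions for illustration, not data captured during the incident.
SAMPLE = """\
09:10:00 text-sz  ov  proc-sz  ov  inod-sz  ov  file-sz  ov
09:10:05   N/A   N/A 120/400    0  300/2000   0 1000/1000   0
09:10:10   N/A   N/A 121/400    0  300/2000   0  987/1000   0
"""


def file_table_usage(sar_output):
    """Return (timestamp, used, limit) for each sample's file-sz column."""
    column = None
    samples = []
    for line in sar_output.splitlines():
        tokens = line.split()
        if "file-sz" in tokens:
            # Header line: remember the position of the file-sz column.
            column = tokens.index("file-sz")
            continue
        if column is None or len(tokens) <= column:
            continue
        match = re.fullmatch(r"(\d+)/(\d+)", tokens[column])
        if match:
            samples.append((tokens[0], int(match.group(1)), int(match.group(2))))
    return samples


def nearly_full(samples, threshold=0.9):
    """Keep only the samples whose used/limit ratio reaches the threshold."""
    return [s for s in samples if s[2] > 0 and s[1] / s[2] >= threshold]


if __name__ == "__main__":
    for stamp, used, limit in nearly_full(file_table_usage(SAMPLE)):
        print(f"{stamp}: file table at {used}/{limit}")
```

In practice the text would come from running sar -v on the host rather than from a constant, and the threshold would be tuned to how quickly sessions normally grow.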
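Because this was a production database, there was no time to analyze the cause at leisure. While the system was still responsive, I quickly issued the command to halt the cluster package, killed the LOCAL=NO processes, and restarted the database, after which service returned to normal.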
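The LOCAL=NO processes are the Oracle dedicated server processes created for connections that arrive through the listener; they normally appear in ps -ef as `oracle<SID> (LOCAL=NO)`. As a rough illustration of how they can be picked out, here is a Python sketch that extracts their PIDs from ps -ef text. It assumes the PID is the second whitespace-separated field, and the SID `ORCL` and the sample lines are hypothetical. The sketch only lists PIDs; actually killing them is left to the operator.

```python
# Hypothetical `ps -ef` output; the SID "ORCL" and all PIDs are invented.
SAMPLE_PS = """\
     UID   PID  PPID  C    STIME TTY       TIME COMMAND
  oracle  4321     1  0 08:55:10 ?         0:01 oracleORCL (LOCAL=NO)
  oracle  4322     1  0 08:55:12 ?         0:00 oracleORCL (LOCAL=NO)
  oracle  1200     1  0  Jan  3  ?         0:05 ora_pmon_ORCL
  oracle  4400  4399  0 09:01:00 pts/1     0:00 oracleORCL (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
"""


def local_no_pids(ps_output):
    """Return the PIDs of processes whose command line contains (LOCAL=NO)."""
    pids = []
    for line in ps_output.splitlines():
        if "(LOCAL=NO)" not in line:
            continue
        fields = line.split()
        # Assumed `ps -ef` layout: UID PID PPID C STIME TTY TIME COMMAND
        if len(fields) > 1 and fields[1].isdigit():
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    print(local_no_pids(SAMPLE_PS))
```

Background processes such as pmon and local (LOCAL=YES) sessions are deliberately excluded, since killing those would have a different effect than dropping remote client sessions.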
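After the database came back up, I/O waits were abnormally high. Investigation showed that one of the batteries in the storage array had failed, leaving one controller unusable. The resulting severe performance degradation caused the number of Oracle sessions to keep climbing, which ultimately overflowed the file table.
The problem was finally resolved by replacing the battery in the disk array.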
From "ITPUB Blog", link: http://blog.itpub.net/11088128/viewspace-683221/. If you repost this, please credit the source; otherwise legal action may be pursued.