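1. Incident Overview
On September 9, the application could not connect to the RAC database.
1.1 Time
September 9, 2013
1.2 Location
Beijing; on-site and remote operations
1.3
1.4 Event
The user reported that the RAC database could not accept application connections.
2. Analysis
An engineer was dispatched to the site urgently, collected the logs, and found the following errors in the alert log: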
Sun Sep 8 16:31:35 2013
Process startup failed, error stack:
Sun Sep 8 16:31:35 2013
Errors in file /oracle/admin/drutt/bdump/drutt1_psp0_3820.trc:
ORA-27300: OS system dependent operation:fork failed with status: 12
ORA-27301: OS failure message: Not enough space
ORA-27302: failure occurred at: skgpspawn3
Sun Sep 8 16:31:35 2013
Process J005 died, see its trace file
Sun Sep 8 16:31:35 2013
kkjcre1p: unable to spawn jobq slave process
Sun Sep 8 16:31:35 2013
Errors in file /oracle/admin/drutt/bdump/drutt1_cjq0_3881.trc:
Mon Sep 9 01:40:34 2013
Process startup failed, error stack:
Mon Sep 9 01:40:34 2013
Errors in file /oracle/admin/drutt/bdump/drutt1_psp0_3820.trc:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn5
Mon Sep 9 01:40:35 2013
Process J005 died, see its trace file
Mon Sep 9 01:40:35 2013
kkjcre1p: unable to spawn jobq slave process
Mon Sep 9 01:40:35 2013
Errors in file /oracle/admin/drutt/bdump/drutt1_cjq0_3881.trc:
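On a busy instance the alert log is long, so it can help to pull out every fork failure together with the timestamp it was logged under instead of reading the file by eye. The following is a minimal sketch that assumes the layout shown in the excerpt above (a timestamp line such as `Mon Sep 9 01:40:34 2013`, followed by the error lines); the function name `find_fork_failures` is hypothetical and not part of any Oracle tool.

```python
import re
from datetime import datetime

# Timestamp lines look like "Mon Sep 9 01:40:34 2013" (format taken from the excerpt above).
TS_RE = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")
# ORA-27300 lines carry the OS error status after "status:".
FORK_RE = re.compile(r"ORA-27300: .*fork failed with status: (\d+)")


def find_fork_failures(lines):
    """Return (timestamp, status) for every ORA-27300 fork failure (hypothetical helper)."""
    results = []
    last_ts = None  # most recent timestamp line seen so far
    for line in lines:
        line = line.strip()
        if TS_RE.match(line):
            last_ts = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        match = FORK_RE.search(line)
        if match:
            results.append((last_ts, int(match.group(1))))
    return results


if __name__ == "__main__":
    sample = [
        "Sun Sep 8 16:31:35 2013",
        "ORA-27300: OS system dependent operation:fork failed with status: 12",
        "Mon Sep 9 01:40:34 2013",
        "ORA-27300: OS system dependent operation:fork failed with status: 11",
    ]
    for ts, status in find_fork_failures(sample):
        print(ts, "status", status)
```

Run against the full alert log (for example `find_fork_failures(open(path))`), this gives a timeline that shows the failures began on September 8, well before users reported the outage on September 9.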
3. Root Cause
The errors show that new processes could not be forked. Our assessment is that the operating system could not allocate the memory and process resources needed to serve new session connections, which caused the connection failures.
Related documents found on MetaLink (Oracle's official support site):
Troubleshooting ORA-27300 ORA-27301 ORA-27302 errors (Doc ID 579365.1)
Ora-27300 OS system dependent operation:fork failed with status: 11 (Doc ID 392006.1)
Database Crashes With ORA-04030 ORA-07445 ORA-27300 ORA-27301 ORA-27302 (Doc ID 580552.1)
Skgpspawn Errors In Alert Log, New Connections to Database Fail (Doc ID 435787.1)
Analysis:
Status 11 (EAGAIN): The system lacked the necessary resources to create another process, or the system-imposed limit on the total number of processes under execution, system-wide or by a single user {CHILD_MAX}, would be exceeded. EAGAIN corresponds to status 11.
Likely cause: the maximum number of processes allowed per user may be too low.
Status 12 (ENOMEM): Not enough core/memory.
During an exec or a break, the program asked for more memory than the system had available. This error also occurs when too many segmentation registers would be required for the arrangement of text, data, or stack segments.
Likely cause: insufficient swap space.
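The two status values lead to two different diagnoses, so a small lookup table makes the mapping explicit when triaging future alerts. This is a minimal sketch; the errno numbering (11 = EAGAIN, 12 = ENOMEM) is taken from this report and is platform-specific, and `FORK_STATUS` and `diagnose_fork_status` are hypothetical names used only for illustration.

```python
# Fork status codes as they appear in the ORA-27300 lines of this report.
# The numbering (11 = EAGAIN, 12 = ENOMEM) is platform-specific; verify it on your OS.
FORK_STATUS = {
    11: ("EAGAIN", "process limit reached: check per-user process limits (nproc/maxuprc)"),
    12: ("ENOMEM", "not enough memory: check swap space and physical memory"),
}


def diagnose_fork_status(status):
    """Map an ORA-27300 fork status to (errno name, likely cause). Hypothetical helper."""
    return FORK_STATUS.get(status, ("UNKNOWN", "see Doc ID 579365.1"))


if __name__ == "__main__":
    for status in (12, 11):
        name, cause = diagnose_fork_status(status)
        print(f"status {status}: {name} -> {cause}")
```

Unknown codes fall back to the general troubleshooting note rather than guessing a cause.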
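4. Recommendations
1. Check the kernel parameter nproc. According to the Oracle installation guide, nproc should be at least 4096 and maxuprc should be nproc*9/10. If the current number of processes exceeds these settings, adjust both values to match actual demand.
2. Swap space was insufficient at the time. Check swap usage and monitor system performance. Currently 8 GB of swap is configured on a host with 16 GB of physical memory.
3. A memory-leak bug in the system causing the resource-allocation failures cannot be ruled out.
4. If the problem recurs, examine memory usage, swap usage, and the system logs; restarting the server to release resources is recommended.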
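Recommendations 1 and 2 reduce to simple arithmetic that can be scripted into a periodic check. The following is a minimal sketch assuming the rule stated above (nproc at least 4096, maxuprc = nproc*9/10, rounded down here to a whole process count). The function names and the 80% swap alert threshold are hypothetical, and on the real host the inputs would come from the OS's kernel-parameter and swap reports rather than being typed in.

```python
MIN_NPROC = 4096  # minimum from the Oracle installation guide, per recommendation 1
SWAP_ALERT_PCT = 80.0  # hypothetical alert threshold, not from the report


def recommended_maxuprc(nproc):
    """maxuprc = nproc * 9/10, rounded down to a whole process count."""
    return nproc * 9 // 10


def check_process_limits(nproc, maxuprc, current_procs):
    """Return a list of findings about the process limits (hypothetical helper)."""
    findings = []
    if nproc < MIN_NPROC:
        findings.append(f"nproc={nproc} is below the minimum {MIN_NPROC}")
    if maxuprc < recommended_maxuprc(nproc):
        findings.append(f"maxuprc={maxuprc} is below nproc*9/10={recommended_maxuprc(nproc)}")
    if current_procs >= maxuprc:
        # At the limit, the next fork fails with status 11 (EAGAIN).
        findings.append(f"{current_procs} processes have reached maxuprc={maxuprc}")
    return findings


def swap_used_pct(swap_total_gb, swap_free_gb):
    """Percentage of configured swap currently in use."""
    return 100.0 * (swap_total_gb - swap_free_gb) / swap_total_gb


if __name__ == "__main__":
    # 8 GB of swap is from the report; the other inputs are illustrative only.
    for msg in check_process_limits(nproc=4096, maxuprc=3686, current_procs=3700):
        print(msg)
    used = swap_used_pct(swap_total_gb=8, swap_free_gb=1)
    if used > SWAP_ALERT_PCT:
        print(f"swap {used:.1f}% used, above {SWAP_ALERT_PCT}%")
```

Running a check like this from a scheduler gives early warning before forks start failing, which fits recommendation 4's advice to watch memory and swap if the problem recurs.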
Source: ITPUB Blog, http://blog.itpub.net/500314/viewspace-1063633/. When reposting, please credit the source; otherwise legal action will be taken.