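I ran into an Oracle 10.2.0.5 bug that hung the entire Oracle RAC cluster. Luckily it happened on a weekend; otherwise it would have been yet another production incident. Bug number: 9949948
The logs contain the following entries:
Node 1: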
Sat Nov 20 13:19:45 CST 2010
Errors in file /u01/app/oracle/admin/qhportal/bdump/qhportal1_p038_8366.trc:
Sat Nov 20 14:18:44 CST 2010
IPC Send timeout detected. Receiver ospid 28185
Sat Nov 20 14:18:55 CST 2010
Errors in file /u01/app/oracle/admin/qhportal/bdump/qhportal1_p000_28185.trc:
Sat Nov 20 14:58:33 CST 2010
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=505
System State dumped to trace file /u01/app/oracle/admin/qhportal/udump/qhportal1_ora_14836.trc
Sat Nov 20 14:59:10 CST 2010
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=532
Sat Nov 20 15:12:04 CST 2010
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=407
System State dumped to trace file /u01/app/oracle/admin/qhportal/udump/qhportal1_ora_5168.trc
Node 2:
Sat Nov 20 14:19:01 CST 2010
IPC Send timeout detected.Sender: ospid 6270
Receiver: inst 1 binc 20 ospid 28185
Sat Nov 20 14:19:01 CST 2010
IPC Send timeout detected.Sender: ospid 22968
Receiver: inst 1 binc 20 ospid 28185
Sat Nov 20 14:58:50 CST 2010
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=19
System State dumped to trace file /u01/app/oracle/admin/qhportal/bdump/qhportal2_mmon_1487.trc
The trace files contain a large number of messages like the following:
WARNING:Could not increase the asynch I/O limit to 1412 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 1442 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 1472 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 1502 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 1532 for SQL direct I/O. It is set to 128
WARNING:Could not increase the asynch I/O limit to 1562 for SQL direct I/O. It is set to 128
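To get a feel for how often these warnings occur and how far the requested limit has climbed, the trace file can be scanned for them. What follows is a minimal Python sketch, not an Oracle tool: the function name `summarize_aio_warnings` is hypothetical, and the regular expression assumes the exact message format shown in the excerpt above.

```python
import re

# Matches the warning format shown in the trace excerpt above (assumed stable).
AIO_WARNING = re.compile(
    r"WARNING:Could not increase the asynch I/O limit to (\d+) "
    r"for SQL direct I/O\. It is set to (\d+)"
)


def summarize_aio_warnings(lines):
    """Return (warning count, highest requested limit, set of limits in effect)."""
    count = 0
    max_requested = 0
    limits_in_effect = set()
    for line in lines:
        match = AIO_WARNING.search(line)
        if match:
            count += 1
            max_requested = max(max_requested, int(match.group(1)))
            limits_in_effect.add(int(match.group(2)))
    return count, max_requested, limits_in_effect


# Two lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "WARNING:Could not increase the asynch I/O limit to 1412 for SQL direct I/O. It is set to 128",
    "WARNING:Could not increase the asynch I/O limit to 1442 for SQL direct I/O. It is set to 128",
]

if __name__ == "__main__":
    print(summarize_aio_warnings(SAMPLE))
```

For a real trace file, the same function can be given an open file object instead of the sample list, since it only iterates over lines.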
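A search on Metalink suggests this is probably a bug in 10.2.0.5.
The workaround is either to raise the operating system's AIO limit or to disable asynchronous I/O:
1. Raise the limit: increase AIO-MAX-NR (the Linux kernel setting fs.aio-max-nr), as the Metalink note below describes.
2. Disable asynchronous I/O:
Set the Oracle parameter DISK_ASYNC_IO=FALSE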
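Before choosing between the two options, it can help to see how close the host is to its AIO limit. On Linux the kernel exposes the current usage and the ceiling as `/proc/sys/fs/aio-nr` and `/proc/sys/fs/aio-max-nr`. The sketch below reads both; the function names are hypothetical, and the `proc_root` parameter exists only so that the code can be pointed at a different directory.

```python
from pathlib import Path


def read_aio_counters(proc_root="/proc/sys/fs"):
    """Return (aio_nr, aio_max_nr) as integers from the Linux /proc interface."""
    root = Path(proc_root)
    aio_nr = int((root / "aio-nr").read_text().strip())
    aio_max_nr = int((root / "aio-max-nr").read_text().strip())
    return aio_nr, aio_max_nr


def aio_headroom(aio_nr, aio_max_nr):
    """Number of AIO slots still available system-wide."""
    return aio_max_nr - aio_nr


if __name__ == "__main__":
    # Only meaningful on Linux; skip quietly where the files do not exist.
    if Path("/proc/sys/fs/aio-max-nr").exists():
        used, limit = read_aio_counters()
        print(f"aio-nr={used} aio-max-nr={limit} free={aio_headroom(used, limit)}")
```

If the headroom is small, the limit can be raised with `sysctl -w fs.aio-max-nr=<value>` and made persistent in `/etc/sysctl.conf`. The appropriate value depends on the workload, and the original post does not state which value was used.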
The original Metalink note follows:
Bug 9949948 - Linux: Process spin under ksfdrwat0 if OS Async IO not configured high enough [ID 9949948.8]
Modified: 04-AUG-2010 Type: PATCH Status: PUBLISHED
This note gives a brief overview bug 9949948.
The content was last updated on: 03-AUG-2010
Affects:
Product (Component): Oracle Server (Rdbms)
Range of versions believed to be affected: Versions >= 10.2.0.5 but BELOW 11.1
Versions confirmed as being affected
- 10.2.0.5
Platforms affected
- Linux X86-64bit
- Linux 32bit
It is believed to be a regression in default behaviour thus:
Regression introduced in 10.2.0.5
Fixed:
This issue is fixed in
- 11.1.0.6 (Base Release)
Description
This problem is introduced in 10.2.0.5. It only affects platforms where Oracle has to reserve async IO slots, such as Linux platforms. If the OS async IO layer is underconfigured and an Oracle process cannot get sufficient AIO slots then rather than reverting to using non AIO call the process may go into an infinite spin under ksfdrwat0.

Rediscovery notes:
The spin will be preceded by messages in the trace file of the form:
  WARNING:io_submit failed due to kernel limitations MAXAIO for process=0 pending aio=0
  WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=65518
  WARNING:1 Oracle process running out of OS kernelI/O resources aiolimit=0
Notice specifically that the value for aiolimit is reported as "0" for this bug.
The process then spins in ksfdrwat0 typically with a stack showing
  skgfqio ()
  ksfdgo ()
  ksfdwtio ()
  ksfdwat1 ()
  ksfdrwat0 () <<< Spin point
  ksfdblock ()
  kcflwi ()
  kcflci ()
  kcblci ()
  kcblcio ()
  kcblgt ()
  kcbldrget ()
It will show repeated waits for "i/o slave wait", which can be misleading as that is normally considered an idle wait event.

Workaround
Raise the OS AIO limits such that the number of concurrent slot requirements never exceeds the OS limit, i.e. increase AIO-MAX-NR
OR
Disable async IO (Set DISK_ASYNC_IO=FALSE)
Please note: The above is a summary description only. Actual symptoms can vary. Matching to any symptoms here does not confirm that you are encountering this problem. Always consult with Oracle Support for advice.
References
Bug:9949948 (This link will only work for PUBLISHED bugs)
Note:245840.1 Information on the sections in this article
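The rediscovery messages quoted in the note can be checked mechanically. The Python sketch below looks for the kernel-limits line and an `aiolimit=0` value in trace text and computes how many AIO slots were left when the message was written. The function names are hypothetical and the patterns assume the message format quoted in the note. As the note itself warns, a match does not confirm the bug.

```python
import re

# Patterns assume the message format quoted in the Metalink note.
KERNEL_LIMITS = re.compile(r"AIO-MAX-NR=(\d+)\s+AIO-NR=(\d+)")
AIOLIMIT = re.compile(r"aiolimit=(\d+)")


def free_slots(trace_text):
    """Return AIO-MAX-NR minus AIO-NR from the first limits line, or None."""
    match = KERNEL_LIMITS.search(trace_text)
    if match is None:
        return None
    return int(match.group(1)) - int(match.group(2))


def matches_rediscovery(trace_text):
    """Heuristic only: kernel-limits line present and aiolimit reported as 0."""
    limits = KERNEL_LIMITS.search(trace_text)
    aiolimit = AIOLIMIT.search(trace_text)
    return (
        limits is not None
        and aiolimit is not None
        and int(aiolimit.group(1)) == 0
    )


# The three messages quoted in the note, used as sample input.
SAMPLE_TRACE = "\n".join([
    "WARNING:io_submit failed due to kernel limitations MAXAIO for process=0 pending aio=0",
    "WARNING:asynch I/O kernel limits is set at AIO-MAX-NR=65536 AIO-NR=65518",
    "WARNING:1 Oracle process running out of OS kernelI/O resources aiolimit=0",
])

if __name__ == "__main__":
    print(matches_rediscovery(SAMPLE_TRACE), free_slots(SAMPLE_TRACE))
```

With the values quoted in the note, only 65536 - 65518 = 18 slots remained system-wide, which matches the note's description of an underconfigured AIO layer.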
Source: ITPUB blog, http://blog.itpub.net/23135684/viewspace-678596/. If you repost this article, please credit the source; otherwise legal action may be pursued.