中午11时40分左右,QDCJ(132.113.64.68)数据库所有连接断开。远程登录主机后发现Instance Crash。立即进入SQLPLUS手动重启,并检查Alert.log。发现类似下面的错误(截取为之后一次宕机信息):
Thu Aug 25 12:11:11 2005
Errors in file d:/oracle/admin/xdcj/udump/xdcj_ora_3256.trc:
ORA-00600: 内部错误代码,参数: [1433], [60], [], [], [], [], [], []
Thu Aug 25 12:12:17 2005
Errors in file d:/oracle/admin/xdcj/bdump/xdcj_dbw0_2656.trc:
ORA-00600: internal error code, arguments: [1433], [60], [], [], [], [], [], []
Thu Aug 25 12:12:17 2005
DBW0: terminating instance due to error 600
Thu Aug 25 12:12:21 2005
Errors in file d:/oracle/admin/xdcj/bdump/xdcj_pmon_3744.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []
Thu Aug 25 12:12:23 2005
Errors in file d:/oracle/admin/xdcj/udump/xdcj_ora_3388.trc:
ORA-00600: 内部错误代码,参数: [1433], [60], [], [], [], [], [], []
未及退出远程桌面,又发现数据库Instance Crash及如上的错误信息,重启无效。
Ora-600错误一般与自身的Bug、内部特性有关,登陆Metalink搜索后发现相关的提示如下:
<quote>
文档 ID: 注释:138413.1
主题: ORA-600 [1433]
类型: REFERENCE
状态: PUBLISHED
内容类型: TEXT/X-HTML
创建日期: 02-APR-2001
上次修订日期: 23-JUN-2005
Note: For additional ORA-600 related information please read Note 146580.1
PURPOSE:
This article represents a partially published OERI note.
It has been published because the ORA-600 error has been
reported in at least one confirmed bug.
Therefore, the SUGGESTIONS section of this article may help
in terms of identifying the cause of the error.
This specific ORA-600 error may be considered for full publication
at a later date. If/when fully published, additional information
will be available here on the nature of this error.
SUGGESTIONS:
If the Known Issues section below does not help in terms of identifying
a solution, please submit the trace files and alert.log to Oracle
Support Services for further analysis.
Known Issues:
Bug# 2108072 See Note 2108072.8
OERI:1433 if LGWR archival is ASYNC and network congested
Fixed: 9.2.0.2, 10.1.0.2
Historic Issues:
Bug 404949
ORA-600 [1433] [60], INSTANCE CRASHES
Fixed: 8.0
Workaround: Increase _messages init.ora parameter to a large value.
</quote>
同时检查Oracle支持文档:
<quote>
文档 ID: 注释:2108072.8
主题: Support Description of Bug 2108072
类型: PATCH
状态: PUBLISHED
内容类型: TEXT/X-HTML
创建日期: 13-AUG-2003
上次修订日期: 17-NOV-2003
Click here for details of sections in this note.
Bug 2108072 OERI:1433 if LGWR archival is ASYNC and network congested
This note gives a brief overview of bug 2108072.
Affects:
Product (Component) Oracle Server (Rdbms)
Range of versions believed to be affected Versions >= 9.0 but < 10G
Versions confirmed as being affected
* 9.0.1.4
* 9.2.0.1
Platforms affected Generic (all / most platforms affected)
Fixed:
This issue is fixed in
* 9.0.1.5 iAS Patch Set
* 9.2.0.2 (Server Patch Set)
* 10g Production Base Release
Symptoms:
* Instance may crash
* Internal Error may occur (ORA-600)
Related To:
* Physical Standby Database / Dataguard
Description
When LGWR archival is turned on in ASYNC mode then sometimes during
network congestion the instance crashes with ORA 600[1433].
Workaround:
Set '_message' to a large value and bounce the instance.
The full bug text (if published) can be seen at Bug 2108072
This link will not work for UNPUBLISHED bugs.
</quote>
从上面的两篇文档可以发现,主要原因Bug 2108072,9.0.1.4和9.2.0.1数据库在同时满足网络拥塞和LGWR为异步模式(现在大多数操作系统上都是支持异步模式,同步读写在近几 年的系统上还没有看到过)两个条件时会引发Ora-600错误,并可能Instance Crash。处理的方法是升版到9.2.0.2以上;其他的权宜之计是增加Oracle隐藏(Undocumented)的初始化参数 _messages。决定先采用后者。
继续搜索没有找到_messages的说明,只能使用如下语句在其他9i数据库获得相关资料(必须使用SYS用户):
SELECT ksppinm, ksppstvl, ksppdesc
FROM x$ksppi x, x$ksppcv y
WHERE x.indx = y.indx
AND translate(ksppinm,'_','#') like '#%'
AND ksppinm = '_messages';
找到:
_messages 1600 message queue resources - dependent on # processes & # buffers
看起来是指定消息队列的容量,缺省值是1600了,无法确定增加多少(从文档看Oracle是建议增加到解决故障为止),考虑翻番到3200。建立 pfile后增加相应的“*._messages=3200”并使用pfile启动后正常,运行上面语句后发现参数值已经修改为3200,加上一定负载观 察十多分钟正常。建立spfile,手工关闭数据库,再次使用spfile启动后故障解除。
Thu Aug 25 12:11:11 2005
Errors in file d:/oracle/admin/xdcj/udump/xdcj_ora_3256.trc:
ORA-00600: 内部错误代码,参数: [1433], [60], [], [], [], [], [], []
Thu Aug 25 12:12:17 2005
Errors in file d:/oracle/admin/xdcj/bdump/xdcj_dbw0_2656.trc:
ORA-00600: internal error code, arguments: [1433], [60], [], [], [], [], [], []
Thu Aug 25 12:12:17 2005
DBW0: terminating instance due to error 600
Thu Aug 25 12:12:21 2005
Errors in file d:/oracle/admin/xdcj/bdump/xdcj_pmon_3744.trc:
ORA-00600: internal error code, arguments: [], [], [], [], [], [], [], []
Thu Aug 25 12:12:23 2005
Errors in file d:/oracle/admin/xdcj/udump/xdcj_ora_3388.trc:
ORA-00600: 内部错误代码,参数: [1433], [60], [], [], [], [], [], []
未及退出远程桌面,又发现数据库Instance Crash及如上的错误信息,重启无效。
Ora-600错误一般与自身的Bug、内部特性有关,登陆Metalink搜索后发现相关的提示如下:
<quote>
文档 ID: 注释:138413.1
主题: ORA-600 [1433]
类型: REFERENCE
状态: PUBLISHED
内容类型: TEXT/X-HTML
创建日期: 02-APR-2001
上次修订日期: 23-JUN-2005
Note: For additional ORA-600 related information please read Note 146580.1
PURPOSE:
This article represents a partially published OERI note.
It has been published because the ORA-600 error has been
reported in at least one confirmed bug.
Therefore, the SUGGESTIONS section of this article may help
in terms of identifying the cause of the error.
This specific ORA-600 error may be considered for full publication
at a later date. If/when fully published, additional information
will be available here on the nature of this error.
SUGGESTIONS:
If the Known Issues section below does not help in terms of identifying
a solution, please submit the trace files and alert.log to Oracle
Support Services for further analysis.
Known Issues:
Bug# 2108072 See Note 2108072.8
OERI:1433 if LGWR archival is ASYNC and network congested
Fixed: 9.2.0.2, 10.1.0.2
Historic Issues:
Bug 404949
ORA-600 [1433] [60], INSTANCE CRASHES
Fixed: 8.0
Workaround: Increase _messages init.ora parameter to a large value.
</quote>
同时检查Oracle支持文档:
<quote>
文档 ID: 注释:2108072.8
主题: Support Description of Bug 2108072
类型: PATCH
状态: PUBLISHED
内容类型: TEXT/X-HTML
创建日期: 13-AUG-2003
上次修订日期: 17-NOV-2003
Click here for details of sections in this note.
Bug 2108072 OERI:1433 if LGWR archival is ASYNC and network congested
This note gives a brief overview of bug 2108072.
Affects:
Product (Component) Oracle Server (Rdbms)
Range of versions believed to be affected Versions >= 9.0 but < 10G
Versions confirmed as being affected
* 9.0.1.4
* 9.2.0.1
Platforms affected Generic (all / most platforms affected)
Fixed:
This issue is fixed in
* 9.0.1.5 iAS Patch Set
* 9.2.0.2 (Server Patch Set)
* 10g Production Base Release
Symptoms:
* Instance may crash
* Internal Error may occur (ORA-600)
Related To:
* Physical Standby Database / Dataguard
Description
When LGWR archival is turned on in ASYNC mode then sometimes during
network congestion the instance crashes with ORA 600[1433].
Workaround:
Set '_message' to a large value and bounce the instance.
The full bug text (if published) can be seen at Bug 2108072
This link will not work for UNPUBLISHED bugs.
</quote>
从上面的两篇文档可以发现,主要原因Bug 2108072,9.0.1.4和9.2.0.1数据库在同时满足网络拥塞和LGWR为异步模式(现在大多数操作系统上都是支持异步模式,同步读写在近几 年的系统上还没有看到过)两个条件时会引发Ora-600错误,并可能Instance Crash。处理的方法是升版到9.2.0.2以上;其他的权宜之计是增加Oracle隐藏(Undocumented)的初始化参数 _messages。决定先采用后者。
继续搜索没有找到_messages的说明,只能使用如下语句在其他9i数据库获得相关资料(必须使用SYS用户):
SELECT ksppinm, ksppstvl, ksppdesc
FROM x$ksppi x, x$ksppcv y
WHERE x.indx = y.indx
AND translate(ksppinm,'_','#') like '#%'
AND ksppinm = '_messages';
找到:
_messages 1600 message queue resources - dependent on # processes & # buffers
看起来是指定消息队列的容量,缺省值是1600了,无法确定增加多少(从文档看Oracle是建议增加到解决故障为止),考虑翻番到3200。建立 pfile后增加相应的“*._messages=3200”并使用pfile启动后正常,运行上面语句后发现参数值已经修改为3200,加上一定负载观 察十多分钟正常。建立spfile,手工关闭数据库,再次使用spfile启动后故障解除。