A fellow forum member's database alert log reported the following errors:
Timed out trying to start process J000.
Wed Jul 30 16:16:05 2008
Timed out trying to start process J000.
Wed Jul 30 16:19:14 2008
Errors in file /home/oracle/admin/rac/bdump/rac4_pmon_22083.trc:
ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [], [], []
ORA-27302: failure occurred at: skgslfr
Wed Jul 30 16:19:17 2008
Errors in file /home/oracle/admin/rac/bdump/rac4_pmon_22083.trc:
ORA-00600: internal error code, arguments: [513], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [], [], []
ORA-27302: failure occurred at: skgslfr
Wed Jul 30 16:19:17 2008
Errors in file /home/oracle/admin/rac/bdump/rac4_pmon_22083.trc:
ORA-00600: internal error code, arguments: [513], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [OSDEP_INTERNAL], [], [], [], [], [], [], []
ORA-27302: failure occurred at: skgslfr
Wed Jul 30 16:19:17 2008
PMON: terminating instance due to error 600
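From the alert log: J000 failed to start, the instance hit ORA-00600, and PMON terminated the instance.
Reading through the trace file, I found more than 100 dead processes, and at the end of the trace file there was information about the SQ enqueue (the sequence number enqueue).
On asking, it turned out the instance had hung first: nobody could log in, not even as DBA. So many processes were killed by hand, the instance went down as a result, and a restart brought it back. This has happened more than once. During the hang, the alert log recorded rollback segments growing wildly, and the number of sessions also rose sharply. The environment is Linux + 9.2.0.4, apparently a 4-node RAC.
So I concluded that the ORA-00600 was caused by the killed processes and needed no further handling, and turned to why the instance hung in the first place and to the abnormal symptoms seen during the hang.
I'll leave the problem there for now. The point is that ORA-600 itself is not the scariest thing; what is scary is the root cause behind it.
Both the first person to handle this problem and the second have a few things to think about here :-)
The final cause cannot be confirmed. One known factor is the cache size of audses$, so I'll record the related material here as well: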
PURPOSE
-------
Prevent hangs in RAC due to high login rate and low cache setting of AUDSES$ sequence.
Default of 20 is not adequate in many cases.
PROBLEM:
--------
The default cache setting for the SYS.AUDSES$ sequence is 20; this is too low for a RAC system
where logins can occur concurrently from multiple nodes. During high login rate
such low value can cause slowness and even hangs. Some of the symptoms are:
- Checkpoint not completing on all RAC nodes
- Waits expire on row cache enqueue lock dc_sequences
- RAC hangs due to QMON deadlocking
During those hangs session login is not possible (or *extremely* slow) due
to extremely high contention on the above sequence.
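One way to look for the dc_sequences symptom listed above is to compare dictionary cache statistics across the RAC instances. The query below is only a sketch: it assumes a session with SELECT access to GV$ROWCACHE, which may not be obtainable while the hang is in progress. High conflict counts suggest contention on sequences in general. They do not prove that AUDSES$ is the culprit.

```sql
-- Sketch: per-instance dictionary cache activity for sequences.
-- Assumes SELECT access to GV$ROWCACHE (e.g. a DBA-privileged session).
select inst_id,
       parameter,
       gets,
       getmisses,
       modifications,
       dlm_requests,
       dlm_conflicts
  from gv$rowcache
 where parameter = 'dc_sequences'
 order by inst_id;
```

Taking the same snapshot during normal load and again near a login storm gives a rough baseline for comparison.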
SOLUTION:
---------
Manually increase that sequence cache on each affected database:
alter sequence sys.audses$ cache 10000;
This is fixed in 10.2.0.3 patchset. (Affected releases 9i to 10.2.0.2)
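The current setting can be read from the data dictionary before and after the change. A minimal check, assuming a DBA-privileged session:

```sql
-- Show the current cache size of the audit session sequence.
select sequence_owner,
       sequence_name,
       cache_size,
       order_flag
  from dba_sequences
 where sequence_owner = 'SYS'
   and sequence_name  = 'AUDSES$';
```

A CACHE_SIZE of 20 means the default described above is still in place.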
Source: "ITPUB博客" (ITPUB Blog), link: http://blog.itpub.net/7591490/viewspace-1008198/. If you repost this, please credit the source; otherwise legal action will be pursued.