ORA-600(KCLEXPAND_5)错误

今天碰到一个比较严重的bug,导致了RAC环境的一个实例重新启动。

 

 

alert文件中,错误信息如下:

Wed Oct 21 11:15:19 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade2_lmon_16650.trc:
ORA-00600: internal error code, arguments: [kclexpand_5], [], [], [], [], [], [], []
Wed Oct 21 11:15:19 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade2_lmon_16650.trc:
ORA-00600: internal error code, arguments: [kclexpand_5], [], [], [], [], [], [], []
Wed Oct 21 11:15:19 2009
LMON: terminating instance due to error 481
Wed Oct 21 11:15:20 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade2_lms0_16654.trc:
ORA-00481: LMON process terminated with error
Wed Oct 21 11:15:20 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade2_lms1_16658.trc:
ORA-00481: LMON process terminated with error
Wed Oct 21 11:15:20 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade2_mman_16662.trc:
ORA-00481: LMON process terminated with error
Wed Oct 21 11:15:20 2009
Trace dumping is performing id=[cdmp_20091021111520]
Wed Oct 21 11:15:20 2009
Errors in file /data/oracle/admin/newtrade/bdump/newtrade2_lmd0_16652.trc:
ORA-00481: LMON process terminated with error
Wed Oct 21 11:15:21 2009
Shutting down instance (abort)
License high water mark = 129
Wed Oct 21 11:15:24 2009
Instance terminated by LMON, pid = 16650
Wed Oct 21 11:15:26 2009
Instance terminated by USER, pid = 6371
Wed Oct 21 11:15:44 2009
Starting ORACLE instance (normal)
Wed Oct 21 11:16:04 2009
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 ce1 10.0.1.0 configured from OCR for use as a cluster interconnect
Interface type 1 ce0 172.0.2.0 configured from OCR for use as  a public interface
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.3.0.
System parameters with non-default values:
  processes                = 500
  sessions                 = 555
  __shared_pool_size       = 7482638336
  __large_pool_size        = 16777216
  __java_pool_size         = 16777216
  __streams_pool_size      = 0
  spfile                   = /dev/vx/rdsk/datavg/newtrade_spfile
  backup_tape_io_slaves    = FALSE
  sga_target               = 20971520000
  control_files            = /dev/vx/rdsk/datavg/newtrade_control1, /dev/vx/rdsk/datavg/newtrade_control2, /dev/vx/rdsk/datavg/newtrade_control3
  control_file_record_keep_time= 30
  db_block_size            = 16384
  __db_cache_size          = 13438550016
  compatible               = 10.2.0.3.0
  log_archive_dest_1       = LOCATION=/data1/oracle/oradata/newtrade/archivelog
  log_archive_dest_2       = service=newtrade1
  log_archive_local_first  = FALSE
  log_archive_format       = %t_%s_%r.dbf
  db_file_multiblock_read_count= 16
  cluster_database         = TRUE
  cluster_database_instances= 2
  thread                   = 2
  instance_number          = 2
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS2
  undo_retention           = 10800
  remote_login_passwordfile= EXCLUSIVE
  db_domain                = us.oracle.com
  dispatchers              = (PROTOCOL=TCP) (SERVICE=newtradeXDB)
  local_listener           = (ADDRESS = (PROTOCOL = TCP)(HOST = 172.0.2.59)(PORT = 1521))
  remote_listener          = LISTENERS_NEWTRADE
  job_queue_processes      = 10
  background_dump_dest     = /data/oracle/admin/newtrade/bdump
  user_dump_dest           = /data/oracle/admin/newtrade/udump
  core_dump_dest           = /data/oracle/admin/newtrade/cdump
  audit_file_dest          = /data/oracle/admin/newtrade/adump
  db_name                  = newtrade
  open_cursors             = 5000
  pga_aggregate_target     = 4294967296
Cluster communication is configured to use the following interface(s) for this instance
  10.0.1.4
Wed Oct 21 11:16:05 2009
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=3, OS id=7660
DIAG started with pid=5, OS id=7662
PSP0 started with pid=7, OS id=7664
LMON started with pid=9, OS id=7666
LMD0 started with pid=2, OS id=7668
LMS0 started with pid=4, OS id=7670
LMS1 started with pid=11, OS id=7674
MMAN started with pid=13, OS id=7678
DBW0 started with pid=6, OS id=7680
DBW1 started with pid=15, OS id=7693
LGWR started with pid=17, OS id=7713
CKPT started with pid=19, OS id=7715
SMON started with pid=21, OS id=7717
RECO started with pid=23, OS id=7719
CJQ0 started with pid=8, OS id=7721
MMON started with pid=25, OS id=7723
Wed Oct 21 11:16:06 2009
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=27, OS id=7725
Wed Oct 21 11:16:06 2009
starting up 1 shared server(s) ...
Wed Oct 21 11:16:06 2009
lmon registered with NM - instance id 2 (internal mem no 1)
Wed Oct 21 11:16:10 2009
Reconfiguration started (old inc 0, new inc 34)
List of nodes:
 0 1
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
 * domain 0 valid = 1 according to instance 0
Wed Oct 21 11:16:10 2009
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Wed Oct 21 11:16:10 2009
 LMS 1: 0 GCS shadows cancelled, 0 closed
Wed Oct 21 11:16:10 2009
 LMS 0: 0 GCS shadows cancelled, 0 closed
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
Wed Oct 21 11:16:11 2009
 LMS 0: 0 GCS shadows traversed, 0 replayed
Wed Oct 21 11:16:11 2009
 LMS 1: 0 GCS shadows traversed, 0 replayed
Wed Oct 21 11:16:11 2009
 Submitted all GCS remote-cache requests
 Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=14, OS id=7992
Wed Oct 21 11:16:16 2009
ALTER DATABASE   MOUNT
Wed Oct 21 11:16:20 2009
Setting recovery target incarnation to 1
Wed Oct 21 11:16:20 2009
Successful mount of redo thread 2, with mount id 2817896731
Wed Oct 21 11:16:20 2009
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Completed: ALTER DATABASE   MOUNT
Wed Oct 21 11:16:21 2009
ALTER DATABASE OPEN
Picked broadcast on commit scheme to generate SCNs
Wed Oct 21 11:16:21 2009
LGWR: STARTING ARCH PROCESSES
ARC0 started with pid=16, OS id=8141
Wed Oct 21 11:16:21 2009
ARC0: Archival started
ARC1: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC1 started with pid=31, OS id=8143
Wed Oct 21 11:16:21 2009
Thread 2 opened at log sequence 5162
  Current log# 6 seq# 5162 mem# 0: /dev/vx/rdsk/datavg/newtrade_redo2_3_1_1g
  Current log# 6 seq# 5162 mem# 1: /dev/vx/rdsk/datavg/newtrade_redo2_3_2_1g
Successful open of redo thread 2
Wed Oct 21 11:16:21 2009
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Wed Oct 21 11:16:21 2009
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
Wed Oct 21 11:16:21 2009
ARC0: Becoming the heartbeat ARCH
Wed Oct 21 11:16:21 2009
SMON: enabling cache recovery
Wed Oct 21 11:16:26 2009
Successfully onlined Undo Tablespace 4.
Wed Oct 21 11:16:26 2009
SMON: enabling tx recovery
Wed Oct 21 11:16:26 2009
Database Characterset is ZHS16GBK
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
QMNC started with pid=18, OS id=8302
Wed Oct 21 11:16:28 2009
Completed: ALTER DATABASE OPEN

在对应的trace文件中包含下面的信息:

Dump event group for SESSION
Dump event group for SYSTEM
DYNAMICALLY MODIFIED PARAMETERS:
  control_file_record_keep_time= 30
  log_archive_dest_1       = LOCATION=/data1/oracle/oradata/newtrade/archivelog
  log_archive_dest_2       = service=newtrade1
  log_archive_local_first  = FALSE
  undo_retention           = 10800
  local_listener           = (ADDRESS = (PROTOCOL = TCP)(HOST = 172.0.2.59)(PORT = 1521))
  remote_listener          = LISTENERS_NEWTRADE
  open_cursors             = 5000
error 600 detected in background process
ORA-00600: internal error code, arguments: [kclexpand_5], [], [], [], [], [], [], []
ksuitm: waiting for [5] seconds before killing DIAG

由于ORA-600错误出现后,Oracle开始结束后台的进程,且上面的trace文件中包含了ksuitm: waiting for [5] seconds before killing DIAG的信息,不难推断出是由于ORA-600kclexpand_5)错误的出现,才导致了当前节点上实例的重启。

查询METALINK,果然是Oraclebug,文档描述为:Doc ID:  6658484.8。在RAC节点间获取主站点信息时出现的问题,导致了其中一个实例被重新启动。这个bug影响10.2.0.310.2.0.4版本。在10.2.0.511.1.0.7中,这个问题被解决。

引发错误的具体还不清楚,因此暂时不好确定这个问题出现的几率。不过这个bug会导致一个实例重启,应该说对数据库环境的影响还是很大的。

Oracle虽然提供了单独的补丁,但是提供的平台十分有限,以SOLARIS SPARC 64平台为例,如果要打这个补丁,数据库的版本必须是10.2.0.4

 

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/4227/viewspace-617117/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/4227/viewspace-617117/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值