ORA-00600 [keltnfy-ldmInit] [46] [1] reappears, still unexplained, followed by ORA-01242

cuiheji0772

The following errors appeared in the alert log, after which the database crashed.

Thu Aug 16 20:21:25 2012

Detected change in CPU count to 8

Thu Aug 16 20:23:29 2012

Process J000 died, see its trace file

Thu Aug 16 20:23:29 2012

kkjcre1p: unable to spawn jobq slave process

Thu Aug 16 20:23:29 2012

Errors in file /db/oracle10g/admin/benguo/bdump/benguo_cjq0_3249.trc:

Thu Aug 16 20:25:25 2012

Detected change in CPU count to 8

Thu Aug 16 12:25:55 2012

Errors in file /db/oracle10g/admin/benguo/udump/benguo_ora_31075.trc:

ORA-00600: Message 600 not found; No message file for product=RDBMS, facility=ORA; arguments: [keltnfy-ldmInit] [46] [1]

Thu Aug 16 20:28:25 2012

Determining CPU socket count failed!

Detected change in CPU count to 1

Thu Aug 16 20:29:15 2012

Process J000 died, see its trace file

Thu Aug 16 20:29:15 2012

kkjcre1p: unable to spawn jobq slave process

Thu Aug 16 20:29:15 2012

Errors in file /db/oracle10g/admin/benguo/bdump/benguo_cjq0_3249.trc:

Thu Aug 16 20:29:26 2012

Detected change in CPU count to 8

Thu Aug 16 20:29:50 2012

OER 7451 in Load Indicator : Error Code = Linux-x86_64 Error: 11086: Unknown system error

Thu Aug 16 20:30:26 2012

Determining CPU socket count failed!

Detected change in CPU count to 1

Thu Aug 16 20:31:26 2012

Detected change in CPU count to 8

Thu Aug 16 22:22:49 2012

Errors in file /db/oracle10g/admin/benguo/bdump/benguo_lgwr_3241.trc:

ORA-00471: DBWR process terminated with error

Instance terminated by DBW0, pid = 3239

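ORA-00600 [keltnfy-ldmInit] [46] [1] is usually attributed to a hostname mismatch, which typically shows up as the database failing to start. However, I checked /etc/hosts and the output of hostname on the database server, and the two are consistent. When I went to read the trace files named in the alert log, none of them existed any more, which left me quite puzzled.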
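The hostname comparison described above can be made repeatable with a small script. This is a minimal sketch, not an Oracle-provided check: the function names `hosts_entries`, `hostname_ip` and `check_this_host` are made up for illustration, and it only looks at /etc/hosts (it does not consult DNS or compare names case-insensitively).

```python
import socket


def hosts_entries(hosts_text):
    """Map every name in /etc/hosts-style text to the IP it is listed under."""
    names = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *aliases = line.split()
        for alias in aliases:
            names.setdefault(alias, ip)  # keep the first entry for a name
    return names


def hostname_ip(hostname, hosts_text):
    """Return the IP /etc/hosts assigns to hostname, or None if it is absent."""
    return hosts_entries(hosts_text).get(hostname)


def check_this_host(hosts_path="/etc/hosts"):
    """Look up the local hostname in the hosts file (hypothetical helper)."""
    name = socket.gethostname()
    with open(hosts_path) as f:
        return name, hostname_ip(name, f.read())
```

Calling `check_this_host()` on the database server returns the hostname together with the address /etc/hosts gives it. A result of `None` for the address would mean the hostname is not listed at all, which is the kind of mismatch usually blamed for this error.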

[oracle@server127 bdump]$ uptime

10:31:30 up 24 days, 23:33, 3 users, load average: 2.12, 2.29, 2.55

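The system had not been rebooted either. Fortunately the database still came up normally with startup. I do not know whether the crash is related to CPU messages such as "Detected change in CPU count to 8" and "Determining CPU socket count failed".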
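To see how often the reported CPU count flips and how those flips line up with the other errors, the alert log can be reduced to a timeline of CPU-count changes. The sketch below assumes the 10g alert-log layout shown in the excerpts (a timestamp line such as `Thu Aug 16 20:28:25 2012` followed by message lines) and an English/C locale for day and month names. The function names are illustrative only.

```python
import re
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Thu Aug 16 20:28:25 2012"
CPU_CHANGE = re.compile(r"Detected change in CPU count to (\d+)")


def parse_timestamp(line):
    """Return a datetime if the line is an alert-log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def cpu_count_changes(alert_text):
    """Yield (timestamp, new_cpu_count) for each CPU-count change message."""
    current = None  # most recent timestamp line seen so far
    for line in alert_text.splitlines():
        ts = parse_timestamp(line)
        if ts is not None:
            current = ts
            continue
        match = CPU_CHANGE.search(line)
        if match:
            yield current, int(match.group(1))
```

Running `list(cpu_count_changes(open(path).read()))` against the alert log (with `path` pointing at the instance's alert file) gives a list that can be compared against the times of the J000 failures and the ORA-00600 entry.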

Shortly afterwards the database on this server crashed again:

Fri Aug 17 10:46:02 2012

Process J000 died, see its trace file

Fri Aug 17 10:46:02 2012

kkjcre1p: unable to spawn jobq slave process

Fri Aug 17 10:46:02 2012

Errors in file /db/oracle10g/admin/benguo/bdump/benguo_cjq0_19052.trc:

Fri Aug 17 10:48:47 2012

Errors in file /db/oracle10g/admin/benguo/udump/benguo_ora_19187.trc:

ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode

Fri Aug 17 10:48:48 2012

Errors in file /db/oracle10g/admin/benguo/bdump/benguo_lgwr_19044.trc:

ORA-01242: data file suffered media failure: database in NOARCHIVELOG mode

Instance terminated by CKPT, pid = 19046
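Since the trace files referenced after the first crash had already disappeared, it can help to collect every trace path from the alert log right away and check which ones are still on disk. This is a minimal sketch that assumes the `Errors in file <path>:` layout seen in the excerpts. `trace_references` and `missing_traces` are hypothetical helper names.

```python
import os
import re

TRACE_REF = re.compile(r"^Errors in file (\S+?):?\s*$")
ORA_CODE = re.compile(r"\bORA-\d{5}\b")


def trace_references(alert_text):
    """Return {trace_path: [ORA codes logged directly under that reference]}."""
    refs = {}
    current = None  # trace file whose error stack is being read
    for line in alert_text.splitlines():
        line = line.strip()
        if not line:
            continue  # the excerpts have blank lines between entries
        match = TRACE_REF.match(line)
        if match:
            current = match.group(1)
            refs.setdefault(current, [])
        elif current and ORA_CODE.search(line):
            refs[current].extend(ORA_CODE.findall(line))
        else:
            current = None  # any other line ends the error stack
    return refs


def missing_traces(refs, exists=os.path.exists):
    """List referenced trace files that are no longer present."""
    return [path for path in refs if not exists(path)]
```

The `exists` parameter defaults to `os.path.exists` and is only there so the check can be exercised without touching the real bdump and udump directories.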

[oracle@server127 bdump]$ oerr ora 01242

01242, 00000, "data file suffered media failure: database in NOARCHIVELOG mode"

// *Cause: The database is in NOARCHIVELOG mode and a database file was

// detected as inaccessible due to media failure.

// *Action: Restore accessibility to the file mentioned in the error stack

// and restart the instance.

I have run into ORA-01242 before, in a case caused by bad sectors on a disk:

http://blog.itpub.net/post/43172/527958

The explanation given on Metalink:

The File suffered media failure as before that there was some I/O error in writing to the datafile as seen in the alert.log. The root-cause is that the datafile was locked by an OS-tool making a filesystem backup, like Netbackup or ArcServ. The RDBMS could not open the datafile and failed accordingly .

The instance will crash in NOARCHIVELOG-mode, while in ARCHIVELOG-mode, the instance will remain running, but the datafile will be put OFFLINE and will require recovery.

Solution

If the Media recovery is required then
-- restore the old backup of the datafile
-- recover the datafile/tablespace
If there was no logswitch after the failure then the file can be recovered from the current redo log and no need to restore the old backup , so just recover database/tablespace will do

Also make sure that the backup window does not exceed and does not clash with the db open time

Online backup should be recommended , to avoid these problems

This database server does not run NetBackup, though, so the disk is probably still the cause.

The Linux system log contains the following errors:

end_request: I/O error, dev sr0, sector 6979968

Buffer I/O error on device sr0, logical block 872496

sr 1:0:0:0: SCSI error: return code = 0x08000002

sr0: Current: sense key: Medium Error

Add. Sense: No seek complete

end_request: I/O error, dev sr0, sector 0

Buffer I/O error on device sr0, logical block 0

Buffer I/O error on device sr0, logical block 1

Buffer I/O error on device sr0, logical block 2

Buffer I/O error on device sr0, logical block 3
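To judge whether the disk holding the datafiles is really affected, the kernel I/O errors can be tallied per device. The sketch below matches only the two message shapes that appear in the excerpt above, and the function name is illustrative. On Linux, `sr0` is conventionally the first SCSI/SATA optical (CD/DVD) drive, so it is worth confirming which devices actually back the datafiles before attributing the crash to these particular lines.

```python
import re
from collections import Counter

# Matches "end_request: I/O error, dev sr0, ..." and
# "Buffer I/O error on device sr0, ..." and captures the device name.
IO_ERROR = re.compile(r"(?:I/O error, dev|Buffer I/O error on device) ([\w-]+)")


def io_errors_by_device(syslog_text):
    """Count kernel I/O error lines per block device name."""
    counts = Counter()
    for line in syslog_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding the system log text into `io_errors_by_device` shows at a glance whether any device other than `sr0` reports errors around the time of the ORA-01242 crash.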