ORA-00600 [2103]的分析

昨天下午5点有一台Production DB Crash, Alert.log察看发现是ORA-00600 [2103]

这个ERROR我是第一次见到。

[@more@]

----------------------------------------------------

Wed Jun 3 01:17:45 2009

ARC1: Unable to archive log 3 thread 1 sequence 1935

Log actively being archived by another process

Wed Jun 3 01:17:46 2009

Creating archive destination LOG_ARCHIVE_DEST_1: '/u01/archive/ems/arc1_1935.dbf'

ARCH: Completed archiving log 3 thread 1 sequence 1935

Wed Jun 3 02:52:20 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_arc0_1250.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

Wed Jun 3 02:57:32 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_arc1_26800.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

Wed Jun 3 02:57:33 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_rsm0_1222.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

Wed Jun 3 02:58:19 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_arc1_26800.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

Wed Jun 3 02:58:19 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_arc0_1250.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

Wed Jun 3 02:58:19 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_arc1_26800.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

Wed Jun 3 02:58:19 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_arc0_1250.trc:

ORA-00600: internal error code, arguments: [2103], [1], [0], [1], [900], [], [], []

RSM0 started with pid=22, OS id=3191

------------------------------------------------------------------------

Metalink查了下:正好有一个文档

Diagnosing ORA-600 [2103] issues in a NON RAC environment

照这篇文档的看法,原因主要是这几种:

An ORA-600 [2103] is signaled when an Oracle process cannot get a CF(Control File) enqueue for 900 seconds.

- Very slow I/O subsystem where the Control files are stored.

- Frequent log switching, redo logs to small or low number.

- Async IO issue or multiple db_writers, you can't use both of them, back one out

- OS / Hardware issues

查了一下相关的TRC,只有

/u01/app/oracle/admin/ems/bdump/ems_rsm0_1222.trc

有当时的system state dump.

CF enqueue的状况如下:

CKPT:

O/S info: user: oracle, term: UNKNOWN, ospid: 24365

OSD pid info: Unix process pid: 24365, image: oracle@sh-iecdb-01 (CKPT)

……

SO: 0x822cfe60, type: 4, owner: 0x82296840, flag: INIT/-/-/0x00

(session) trans: (nil), creator: 0x82296840, flag: (51) USR/- BSY/-/-/-/-/-

DID: 0001-0005-00000005, short-term DID: 0000-0000-00000000

txn branch: (nil)

oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS

waiting for 'control file parallel write' blocking sess=0x0 seq=12984 wait_time=0

files=3, blocks=3, requests=3

temporary object counter: 0

SO: 0x823be098, type: 6, owner: 0x8233f800, flag: INIT/-/-/0x00

(enqueue) CF-00000000-00000003 DID: 0001-0005-00000005

lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

res: 82416e20, mode: X, prv: 82416e30, sess: 822cfe60, proc: 82296840

RMAN Process:

O/S info: user: oracle, term: , ospid: 2194, machine: sh-iecdb-01

program: rman@sh-iecdb-01 (TNS V1-V3)

application name: rman@sh-iecdb-01 (TNS V1-V3), hash value=0

action name: 0000002 STARTED, hash value=145214725

waiting for 'control file sequential read' blocking sess=0x0 seq=321 wait_time=0

SO: 0x823bd798, type: 6, owner: 0x822e9a50, flag: INIT/-/-/0x00

(enqueue) CF-00000000-00000002 DID: 0001-0016-0001F0FB

lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

res: 8241a948, mode: X, prv: 8241a958, sess: 822e9a50, proc: 8229c3a0

----------------------------------------

SO: 0x823bd960, type: 6, owner: 0x82341d08, flag: INIT/-/-/0x00

(enqueue) CF-00000000-00000004 DID: 0001-0016-0001F0FB

lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

res: 82421080, mode: S, prv: 82421090, sess: 822e9a50, proc: 8229c3a0

----------------------------------------

SO: 0x823bd830, type: 6, owner: 0x82341d08, flag: INIT/-/-/0x00

(enqueue) CF-00000000-00000000 DID: 0001-0016-0001F0FB

lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

res: 82411bb0, mode: X, prv: 82411bc0, sess: 822e9a50, proc: 8229c3a0

----------------------------------------

ARC0:

last wait for 'enqueue' blocking sess=0x0 seq=64935 wait_time=2930614

name|mode=43460004, id1=0, id2=0

ARC1:

last wait for 'enqueue' blocking sess=0x0 seq=19514 wait_time=2929638

name|mode=43460004, id1=0, id2=0

可见CKPTRMAN持有CF Enqueue, 并且分别在等待

control file parallel write & control file sequential read

ARCn在等待 CF enqueue超时。

到这里原因就明白了,打开Crontab发现2点和2点半各有一个RMAN backup job.

这台Server和另外一台Server上的Production DB 互相做Data Guard, 2点钟会做standby DBbackup to NFS mount point, 2点半会做本机Productiondelete backup archived log

Server本身性能并不强,只是2Raid1,因为本身的Loading并不高,是做为报关用的DB

以前资料量比较小,第一个JOB会在第二个之前完成,现在资料量大了之后,有可能2JOB会出现同时跑的情况,9IRMAN又是出了名的耗I/O, 使得在对Control fileI/O上出现很大问题。

根据Backup job log显示,第一个JOB完成在304分,第二个JOB完成在258

ORA-00600出现在252分。

在晚上1719分的时侯,当redo需要archive时,

Wed Jun 3 17:19:54 2009

LGWR: Detected ARCH process failure

LGWR: Detected ARCH process failure

LGWR: STARTING ARCH PROCESSES

Wed Jun 3 17:19:54 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_lgwr_24363.trc:

ORA-00445: background process "ARC0" did not start after 120 seconds

ARC0 started with pid=24, OS id=5991

Wed Jun 3 17:19:54 2009

Errors in file /u01/app/oracle/admin/ems/bdump/ems_lgwr_24363.trc:

ORA-00449: background process 'ARC0' unexpectedly terminated with error 445

ORA-00445: background process "" did not start after seconds

Wed Jun 3 17:19:54 2009

LGWR: terminating instance due to error 449

Instance terminated by LGWR, pid = 24363

Metalink上还有一个类似的例子是RMAN backup to NFS的时侯出现这个错误。

它认为是在backup control file to a non-catalog backup的时侯,control file concurrently update, 从而在control file 上产生deadlock.

这个问题只出现在backup destination NFS的情况,影响范围9.0.1~10.2.0.3

解决方案是把control filebackup放在local.

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/10856805/viewspace-1022782/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/10856805/viewspace-1022782/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值