An ORA-00494 received after getting off work

On the way home from work I got an email from a user: one of their databases had crashed unexpectedly.

When I got home I took a look at the alert log. Damn, the redo logs were switching this frequently:

Mon Nov  3 08:14:29 2014
Thread 1 advanced to log sequence 132924 (LGWR switch)
  Current log# 4 seq# 132924 mem# 0: /jsocr/ocrdata/paes/redo04.log
Mon Nov  3 08:14:47 2014
Thread 1 advanced to log sequence 132925 (LGWR switch)
  Current log# 5 seq# 132925 mem# 0: /jsocr/ocrdata/paes/redo05.log
Mon Nov  3 08:14:58 2014
Thread 1 advanced to log sequence 132926 (LGWR switch)
  Current log# 6 seq# 132926 mem# 0: /jsocr/ocrdata/paes/redo06.log
Mon Nov  3 08:15:19 2014
Thread 1 advanced to log sequence 132927 (LGWR switch)
  Current log# 4 seq# 132927 mem# 0: /jsocr/ocrdata/paes/redo04.log
Mon Nov  3 08:15:46 2014
Thread 1 advanced to log sequence 132928 (LGWR switch)
  Current log# 5 seq# 132928 mem# 0: /jsocr/ocrdata/paes/redo05.log
Mon Nov  3 08:16:07 2014
Thread 1 advanced to log sequence 132929 (LGWR switch)
  Current log# 6 seq# 132929 mem# 0: /jsocr/ocrdata/paes/redo06.log
Mon Nov  3 08:19:23 2014
Thread 1 advanced to log sequence 132930 (LGWR switch)
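
A quick way to quantify how often the logs are switching, as a sketch against the standard v$log_history view (nothing custom assumed):

SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*)                               AS switches
FROM   v$log_history
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY 1;

Here the alert.log alone already shows a switch roughly every 20 seconds, far beyond the few switches per hour you would normally want.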

 

Checking the situation at the time the instance went down:

Fri Oct 31 17:30:52 2014
Errors in file /oracle10/admin/paes/bdump/paes_lgwr_233790.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 196896'
Fri Oct 31 17:30:57 2014
System State dumped to trace file /oracle10/admin/paes/bdump/paes_lgwr_233790.trc
Killing enqueue blocker (pid=196896) on resource CF-00000000-00000000

 

Regarding this ORA-00494, Metalink describes it as follows:


This could be due to any of the following:

Cause#1: The LGWR has killed the CKPT process, causing the instance to crash.
From the alert.log we can see:
•The database has waited too long for a CF enqueue, so the next error is reported:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 38356'


•Then the LGWR killed the blocker, which in this case was the CKPT process, causing the instance to crash.

Checking the alert.log further, we can see that the frequency of redo log file switches is very high (almost every minute).

Cause#2: Checking the I/O statistics in the AWR report, we find that:
The average read time in milliseconds (Av Rd(ms)) for the database files located on the mount point "/oracle/oa1l/data/" points to an I/O issue, as per the data collected:

Tablespace Av Rd(ms)
=========== ==========
APPS_TS_TX_DATA 928.93
APPS_TS_TX_IDX 531.43
APPS_TS_SUMMARY 60.14
TEMP 76.57
SYSTEM 38.08
APPS_UNDOTS1 103.75
SYSAUX 61.43
APPS_TS_INTERFACE 38.50
APPS_TS_QUEUES 72.60
APPS_TS_ARCHIVE 49.90
NOETIX_TS 83.39
APPS_TS_SEED 123.20
TOOLS 30.00
APPS_TS_NOLOGGING 50.00
As per Doc ID 1275596.1 "How to Tell if the IO of the Database is Slow", a typical multi-block synchronous read of 64 x 8k blocks (512 kB total) should average at most 20 milliseconds before 'slow IO' becomes a concern; smaller requests should be faster (10-20 ms), whereas for larger requests the elapsed time should be no more than 25 ms. Here the average read times for all tablespaces are greater than 20 ms.
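
If an AWR report is not at hand, a rough per-tablespace read latency can be pulled from v$filestat; this is only a sketch (READTIM is recorded in hundredths of a second, so multiplying by 10 gives milliseconds, and the figures are cumulative since instance startup rather than interval-based like AWR):

SELECT ts.name AS tablespace,
       ROUND(SUM(fs.readtim) * 10 / NULLIF(SUM(fs.phyrds), 0), 2) AS av_rd_ms
FROM   v$filestat fs, v$datafile df, v$tablespace ts
WHERE  fs.file# = df.file#
AND    df.ts#   = ts.ts#
GROUP  BY ts.name
ORDER  BY av_rd_ms DESC;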

Cause#3: The problem has been investigated in Bug 7692631 - 'DATABASE CRASHES WITH ORA-494 AFTER UPGRADE TO 10.2.0.4'
and unpublished Bug 7914003 'KILL BLOCKER AFTER ORA-494 LEADS TO FATAL BG PROCESS BEING KILLED'


Solution

Solution#1:
We usually suggest configuring the redo logs so that a log switch happens only every 20~30 minutes, to reduce contention on the control files.
You can use the V$INSTANCE_RECOVERY view column OPTIMAL_LOGFILE_SIZE to determine a recommended size for your online redo logs. This field shows the redo log file size in megabytes that is considered optimal based on the current setting of FAST_START_MTTR_TARGET. If this field consistently shows a value greater than the size of your smallest online log, then you should configure all your online logs to be at least this size.
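
For example (a sketch; OPTIMAL_LOGFILE_SIZE is reported in MB and is only populated when FAST_START_MTTR_TARGET is set):

-- recommended redo log size based on FAST_START_MTTR_TARGET
SELECT optimal_logfile_size FROM v$instance_recovery;

-- current online redo log sizes, for comparison
SELECT group#, bytes/1024/1024 AS size_mb FROM v$log;

If the recommended value is consistently larger than the smallest size_mb above, the online logs are undersized.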

Solution#2:
Check the storage used for the database, as the collected data indicates this is an I/O issue.

Solution#3:


This kill blocker interface / ORA-494 was introduced in 10.2.0.4. This new mechanism will kill *any* kind of blocking process, non-background or background.
•The difference will be that if the enqueue holder is a non-background process, even if it is killed, the instance can function without it.
•In case the holder is a background process, for example the LGWR, the kill of the holder leads to instance crash.

If you want to avoid the kill of the blocker (background or non-background process), you can set

_kill_controlfile_enqueue_blocker=false.


This means that no type of blocker will be killed anymore, although the resolution to this problem should focus on why the process is holding the enqueue for so long. Also, you may prefer to only avoid killing background processes, since they are vital to the instance, while still allowing non-background blockers to be killed.

This has been addressed in a secondary bug - unpublished Bug 7914003 'KILL BLOCKER AFTER ORA-494 LEADS TO FATAL BG PROCESS BEING KILLED' which was closed as Not a bug.

 

In order to prevent a background blocker from being killed, you can set the following init.ora parameter to 1 (default is 3).

_kill_enqueue_blocker=1
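
A hedged sketch of how either underscore parameter can be set (hidden parameters should only be changed on Oracle Support's advice; SCOPE=SPFILE plus an instance restart is the conservative route):

-- disable the kill-blocker mechanism entirely
ALTER SYSTEM SET "_kill_controlfile_enqueue_blocker" = FALSE SCOPE=SPFILE;

-- or keep killing non-background blockers but spare background processes
ALTER SYSTEM SET "_kill_enqueue_blocker" = 1 SCOPE=SPFILE;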


In the end I advised the user to resize the redo logs to cut down on the control file contention caused by excessive log switches. On a production system, disabling the kill-blocker mechanism should only be a last resort.
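
The usual way to resize is to add new, larger groups and drop the old ones once they are inactive; this is only a sketch, and the group numbers 7-9, file names and 1024M size are examples, not the values actually used here:

ALTER DATABASE ADD LOGFILE GROUP 7 ('/jsocr/ocrdata/paes/redo07.log') SIZE 1024M;
ALTER DATABASE ADD LOGFILE GROUP 8 ('/jsocr/ocrdata/paes/redo08.log') SIZE 1024M;
ALTER DATABASE ADD LOGFILE GROUP 9 ('/jsocr/ocrdata/paes/redo09.log') SIZE 1024M;
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;
-- drop an old group only after v$log shows it as INACTIVE
ALTER DATABASE DROP LOGFILE GROUP 4;
ALTER DATABASE DROP LOGFILE GROUP 5;
ALTER DATABASE DROP LOGFILE GROUP 6;

Remember that dropping a group does not delete its file at the OS level; clean those up afterwards.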

 

Replied to the email and went to bed!
