An ORA-00494 received after getting off work

On the way home from work I got an email from a user: one of their databases had crashed unexpectedly.

When I got home I took a look at the alert log. Damn, the redo logs were switching this frequently:

Mon Nov  3 08:14:29 2014
Thread 1 advanced to log sequence 132924 (LGWR switch)
  Current log# 4 seq# 132924 mem# 0: /jsocr/ocrdata/paes/redo04.log
Mon Nov  3 08:14:47 2014
Thread 1 advanced to log sequence 132925 (LGWR switch)
  Current log# 5 seq# 132925 mem# 0: /jsocr/ocrdata/paes/redo05.log
Mon Nov  3 08:14:58 2014
Thread 1 advanced to log sequence 132926 (LGWR switch)
  Current log# 6 seq# 132926 mem# 0: /jsocr/ocrdata/paes/redo06.log
Mon Nov  3 08:15:19 2014
Thread 1 advanced to log sequence 132927 (LGWR switch)
  Current log# 4 seq# 132927 mem# 0: /jsocr/ocrdata/paes/redo04.log
Mon Nov  3 08:15:46 2014
Thread 1 advanced to log sequence 132928 (LGWR switch)
  Current log# 5 seq# 132928 mem# 0: /jsocr/ocrdata/paes/redo05.log
Mon Nov  3 08:16:07 2014
Thread 1 advanced to log sequence 132929 (LGWR switch)
  Current log# 6 seq# 132929 mem# 0: /jsocr/ocrdata/paes/redo06.log
Mon Nov  3 08:19:23 2014
Thread 1 advanced to log sequence 132930 (LGWR switch)
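
A quick way to quantify how often the logs are switching, as a sketch against the standard v$log_history view (nothing custom assumed):

SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*)                               AS switches
FROM   v$log_history
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY 1;

Here the alert.log alone already shows a switch roughly every 20 seconds, far beyond the few switches per hour you would normally want.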

 

Checking the situation at the time the instance went down:

Fri Oct 31 17:30:52 2014
Errors in file /oracle10/admin/paes/bdump/paes_lgwr_233790.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 196896'
Fri Oct 31 17:30:57 2014
System State dumped to trace file /oracle10/admin/paes/bdump/paes_lgwr_233790.trc
Killing enqueue blocker (pid=196896) on resource CF-00000000-00000000

 

Regarding this ORA-00494, Metalink describes it as follows:


This could be due to any of the following:

Cause#1: The LGWR has killed the CKPT process, causing the instance to crash.
From the alert.log we can see:
•The database has waited too long for a CF enqueue, so the next error is reported:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 38356'


•Then the LGWR killed the blocker, which in this case was the CKPT process, causing the instance to crash.

Checking the alert.log further, we can see that the frequency of redo log file switches is very high (almost every minute).

Cause#2: Checking the I/O statistics in the AWR report, we find that:
The average read time in milliseconds (Av Rd(ms)) for the database files located on the mount point "/oracle/oa1l/data/" points to an I/O issue, as per the data collected:

Tablespace Av Rd(ms)
=========== ==========
APPS_TS_TX_DATA 928.93
APPS_TS_TX_IDX 531.43
APPS_TS_SUMMARY 60.14
TEMP 76.57
SYSTEM 38.08
APPS_UNDOTS1 103.75
SYSAUX 61.43
APPS_TS_INTERFACE 38.50
APPS_TS_QUEUES 72.60
APPS_TS_ARCHIVE 49.90
NOETIX_TS 83.39
APPS_TS_SEED 123.20
TOOLS 30.00
APPS_TS_NOLOGGING 50.00
As per Doc ID 1275596.1 "How to Tell if the IO of the Database is Slow", a typical multi-block synchronous read of 64 x 8k blocks (512 kB total) should average at most 20 milliseconds before 'slow IO' becomes a concern; smaller requests should be faster (10-20 ms), whereas for larger requests the elapsed time should be no more than 25 ms. Here the average read times for all tablespaces are greater than 20 ms.
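
If an AWR report is not at hand, a rough per-tablespace read latency can be pulled from v$filestat; this is only a sketch (READTIM is recorded in hundredths of a second, so multiplying by 10 gives milliseconds, and the figures are cumulative since instance startup rather than interval-based like AWR):

SELECT ts.name AS tablespace,
       ROUND(SUM(fs.readtim) * 10 / NULLIF(SUM(fs.phyrds), 0), 2) AS av_rd_ms
FROM   v$filestat fs, v$datafile df, v$tablespace ts
WHERE  fs.file# = df.file#
AND    df.ts#   = ts.ts#
GROUP  BY ts.name
ORDER  BY av_rd_ms DESC;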

Cause#3: The problem has been investigated in Bug 7692631 - 'DATABASE CRASHES WITH ORA-494 AFTER UPGRADE TO 10.2.0.4'
and unpublished Bug 7914003 'KILL BLOCKER AFTER ORA-494 LEADS TO FATAL BG PROCESS BEING KILLED'


Solution

Solution#1:
We usually suggest configuring the redo logs so that a log switch happens only every 20~30 minutes, to reduce contention on the control files.
You can use the V$INSTANCE_RECOVERY view column OPTIMAL_LOGFILE_SIZE to determine a recommended size for your online redo logs. This field shows the redo log file size in megabytes that is considered optimal based on the current setting of FAST_START_MTTR_TARGET. If this field consistently shows a value greater than the size of your smallest online log, then you should configure all your online logs to be at least this size.
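
For example (a sketch; OPTIMAL_LOGFILE_SIZE is reported in MB and is only populated when FAST_START_MTTR_TARGET is set):

-- recommended redo log size based on FAST_START_MTTR_TARGET
SELECT optimal_logfile_size FROM v$instance_recovery;

-- current online redo log sizes, for comparison
SELECT group#, bytes/1024/1024 AS size_mb FROM v$log;

If the recommended value is consistently larger than the smallest size_mb above, the online logs are undersized.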

Solution#2:
Check the storage used for the database, as the collected data indicates this is an I/O issue.

Solution#3:


This kill blocker interface / ORA-494 was introduced in 10.2.0.4. This new mechanism will kill *any* kind of blocking process, non-background or background.
•The difference will be that if the enqueue holder is a non-background process, even if it is killed, the instance can function without it.
•In case the holder is a background process, for example the LGWR, the kill of the holder leads to instance crash.

If you want to avoid the kill of the blocker (background or non-background process), you can set

_kill_controlfile_enqueue_blocker=false.


This means that no type of blocker will be killed anymore, although the resolution to this problem should focus on why the process is holding the enqueue for so long. Also, you may prefer to only avoid killing background processes, since they are vital to the instance, while still allowing non-background blockers to be killed.

This has been addressed in a secondary bug - unpublished Bug 7914003 'KILL BLOCKER AFTER ORA-494 LEADS TO FATAL BG PROCESS BEING KILLED' which was closed as Not a bug.

 

In order to prevent a background blocker from being killed, you can set the following init.ora parameter to 1 (default is 3).

_kill_enqueue_blocker=1
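
A hedged sketch of how either underscore parameter can be set (hidden parameters should only be changed on Oracle Support's advice; SCOPE=SPFILE plus an instance restart is the conservative route):

-- disable the kill-blocker mechanism entirely
ALTER SYSTEM SET "_kill_controlfile_enqueue_blocker" = FALSE SCOPE=SPFILE;

-- or keep killing non-background blockers but spare background processes
ALTER SYSTEM SET "_kill_enqueue_blocker" = 1 SCOPE=SPFILE;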


In the end I advised the user to resize the redo logs to cut down on the control file contention caused by excessive log switches. On a production system, disabling the kill-blocker mechanism should only be a last resort.
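
The usual way to resize is to add new, larger groups and drop the old ones once they are inactive; this is only a sketch, and the group numbers 7-9, file names and 1024M size are examples, not the values actually used here:

ALTER DATABASE ADD LOGFILE GROUP 7 ('/jsocr/ocrdata/paes/redo07.log') SIZE 1024M;
ALTER DATABASE ADD LOGFILE GROUP 8 ('/jsocr/ocrdata/paes/redo08.log') SIZE 1024M;
ALTER DATABASE ADD LOGFILE GROUP 9 ('/jsocr/ocrdata/paes/redo09.log') SIZE 1024M;
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;
-- drop an old group only after v$log shows it as INACTIVE
ALTER DATABASE DROP LOGFILE GROUP 4;
ALTER DATABASE DROP LOGFILE GROUP 5;
ALTER DATABASE DROP LOGFILE GROUP 6;

Remember that dropping a group does not delete its file at the OS level; clean those up afterwards.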

 

Replied to the email and went to bed!
