记一次郁闷的故障处理

首先来看一下故障发生的来龙去脉:
数据库版本:9.2.0.4
操作系统版本:5200-04
1、日志推进至95761,准备归档95760,未释放redo02.log
Fri Feb 25 04:22:00 2011
Thread 1 advanced to log sequence 95760
Current log# 2 seq# 95760 mem# 0: /zzrkdata/zzrk/redo02.log
。。。
Thread 1 advanced to log sequence 95761
Current log# 1 seq# 95761 mem# 0: /zzrkdata/zzrk/redo01.log
Fri Feb 25 04:23:11 2011
ARC0: Evaluating archive log 2 thread 1 sequence 95760
ARC0: Beginning to archive log 2 thread 1 sequence 95760
Creating archive destination LOG_ARCHIVE_DEST_1: '/zzrkarch/zzrk/arch_1_95760.arc'

2、ARC1不能归档95760,和ARC0进程发生争用。
Fri Feb 25 04:24:15 2011
ARC1: Completed archiving log 1 thread 1 sequence 95761
ARC1: Evaluating archive log 2 thread 1 sequence 95760
ARC1: Unable to archive log 2 thread 1 sequence 95760
Log actively being archived by another process

3、由于不能归档,导致rman备份失败,进而导致归档目录慢,业务挂起
Mon Mar 14 04:20:09 2011
Errors in file /oracle/app/admin/zzrk/bdump/zzrk_arc1_3846274.trc:
ORA-19502: write error on file "/zzrkarch/zzrk/arch_1_97795.arc", blockno 114689 (blocksize=512)
ORA-27063: skgfospo: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device

4、日志准备推进至98186,准备归档98184,未释放redo04.log
Thu Mar 17 04:26:54 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 98185
All online logs needed archiving
Current log# 4 seq# 98184 mem# 0: /zzrkdata/zzrk/redo04.log
。。。
Thu Mar 17 04:27:09 2011
Thread 1 advanced to log sequence 98185
Current log# 1 seq# 98185 mem# 0: /zzrkdata/zzrk/redo01.log
Thu Mar 17 04:27:09 2011
ARC1: Evaluating archive log 2 thread 1 sequence 95760
ARC1: Unable to archive log 2 thread 1 sequence 95760
Log actively being archived by another process
ARC1: Evaluating archive log 4 thread 1 sequence 98184
ARC1: Beginning to archive log 4 thread 1 sequence 98184
Creating archive destination LOG_ARCHIVE_DEST_1: '/zzrkarch/zzrk/arch_1_98184.arc'

5、由于归档失败,98185占用redo01.log,未能推进至98186
Thu Mar 17 04:27:56 2011
ORACLE Instance zzrk - Can not allocate log, archival required
Thu Mar 17 04:27:56 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 98186
All online logs needed archiving
Current log# 1 seq# 98185 mem# 0: /zzrkdata/zzrk/redo01.log
Thu Mar 17 09:52:12 2011
ARCH: Evaluating archive log 2 thread 1 sequence 95760
ARCH: Unable to archive log 2 thread 1 sequence 95760
Log actively being archived by another process
ARCH: Evaluating archive log 4 thread 1 sequence 98184
ARCH: Unable to archive log 4 thread 1 sequence 98184
Log actively being archived by another process
Thu Mar 17 09:53:41 2011
Shutting down instance: further logons disabled
Shutting down instance (immediate)
License high water mark = 360

6、手动添加日志文件,推进至98186,并占用redo03.log
Thu Mar 17 09:57:12 2011
/* OracleOEM */ ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 3 ('/zzrkdata/zzrk/redo03.log') SIZE 102400K REUSE;
Thu Mar 17 09:57:17 2011
Thread 1 advanced to log sequence 98186
Current log# 3 seq# 98186 mem# 0: /zzrkdata/zzrk/redo03.log

7、手动添加数据文件,推进至98187,并占用redo05.log,未能推进至98188
Thread 1 cannot allocate new log, sequence 98187
All online logs needed archiving
Current log# 3 seq# 98186 mem# 0: /zzrkdata/zzrk/redo03.log
Thu Mar 17 09:57:45 2011
/* OracleOEM */ ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 5 ('/zzrkdata/zzrk/redo05.log') SIZE 102400K
Thu Mar 17 09:57:50 2011
Thread 1 advanced to log sequence 98187
Current log# 5 seq# 98187 mem# 0: /zzrkdata/zzrk/redo05.log
。。。
Thu Mar 17 09:58:19 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 98188

8、98185归档成功,释放redo01.log资源,推进至98188,占用redo01.log
Thu Mar 17 10:04:13 2011
ARCH: Evaluating archive log 2 thread 1 sequence 95760
ARCH: Unable to archive log 2 thread 1 sequence 95760
Log actively being archived by another process
ARCH: Evaluating archive log 4 thread 1 sequence 98184
ARCH: Unable to archive log 4 thread 1 sequence 98184
Log actively being archived by another process
ARCH: Evaluating archive log 1 thread 1 sequence 98185
ARCH: Beginning to archive log 1 thread 1 sequence 98185
Creating archive destination LOG_ARCHIVE_DEST_1: '/zzrkarch/zzrk/arch_1_98185.arc'
ARCH: Completed archiving log 1 thread 1 sequence 98185
Thu Mar 17 10:04:19 2011
Thread 1 advanced to log sequence 98188
Current log# 1 seq# 98188 mem# 0: /zzrkdata/zzrk/redo01.log

9、98186,98187归档成功,释放redo03.log,redo05.log
Thu Mar 17 10:04:28 2011
ARCH: Completed archiving log 3 thread 1 sequence 98186
ARCH: Evaluating archive log 2 thread 1 sequence 95760
ARCH: Unable to archive log 2 thread 1 sequence 95760
Log actively being archived by another process
ARCH: Evaluating archive log 4 thread 1 sequence 98184
ARCH: Unable to archive log 4 thread 1 sequence 98184
Log actively being archived by another process
ARCH: Evaluating archive log 5 thread 1 sequence 98187
ARCH: Beginning to archive log 5 thread 1 sequence 98187
Creating archive destination LOG_ARCHIVE_DEST_1: '/zzrkarch/zzrk/arch_1_98187.arc'
ARCH: Completed archiving log 5 thread 1 sequence 98187

9、推进至98190。98189占用redo03.log,98190占用redo05.log。未能推进至98191
Thu Mar 17 10:04:38 2011
Thread 1 advanced to log sequence 98189
Current log# 3 seq# 98189 mem# 0: /zzrkdata/zzrk/redo03.log
Thu Mar 17 10:05:38 2011
ORACLE Instance zzrk - Can not allocate log, archival required
Thu Mar 17 10:05:38 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 98190
All online logs needed archiving
Current log# 3 seq# 98189 mem# 0: /zzrkdata/zzrk/redo03.log
Thu Mar 17 10:06:42 2011
Thread 1 advanced to log sequence 98190
Current log# 5 seq# 98190 mem# 0: /zzrkdata/zzrk/redo05.log
Thu Mar 17 10:07:10 2011
ORACLE Instance zzrk - Can not allocate log, archival required
Thu Mar 17 10:07:10 2011
ARCH: Connecting to console port...
Thread 1 cannot allocate new log, sequence 98191
All online logs needed archiving
Current log# 5 seq# 98190 mem# 0: /zzrkdata/zzrk/redo05.log

10、手动添加日志文件失败,
Thu Mar 17 10:08:14 2011
/* OracleOEM */ ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 6 ('/zzrkdata/zzrk/redo06.log') SIZE 102400K
Thu Mar 17 10:08:14 2011
ORA-1185 signalled during: /* OracleOEM */ ALTER DATABASE ADD LOGFILE THREAD ...
默认情况下MAXLOGMEMBERS为5
CREATE CONTROLFILE REUSE DATABASE "ORA9I" NORESETLOGS NOARCHIVELOG
-- SET STANDBY TO MAXIMIZE PERFORMANCE
MAXLOGFILES 50
MAXLOGMEMBERS 5
MAXDATAFILES 100
MAXINSTANCES 1
MAXLOGHISTORY 2722

ora10g@localhost ~]$ oerr ora 1185
01185, 00000, "logfile group number %s is invalid"
// *Cause: An ALTER DATABASE ADD LOGFILE command specified a log number for
// the new log which is too large.
// *Action: Specify a correct logfile number.

11、尝试关闭数据库,失败。采用手工杀进程
Shutting down instance (abort)
License high water mark = 360
USER: terminating instance due to error 1092
Thu Mar 17 10:30:14 2011
USER: terminating instance due to error 1092
Thu Mar 17 10:30:14 2011
DIAG: terminating instance due to error 1092
Thu Mar 17 10:30:14 2011
LMD0: terminating instance due to error 1092
Thu Mar 17 10:30:14 2011
USER: terminating instance due to error 1092
Thu Mar 17 10:30:16 2011
USER: terminating instance due to error 1092
Thu Mar 17 10:30:16 2011
PMON: terminating instance due to error 1092
Thu Mar 17 10:30:33 2011
Instance terminated by DIAG, pid = 454866

[ora10g@localhost ~]$ oerr ora 1092
01092, 00000, "ORACLE instance terminated. Disconnection forced"
// *Cause: The instance this process was connected to was terminated
// abnormally, probably via a shutdown abort. This process
// was forced to disconnect from the instance.
// *Action: Examine the alert log for more details. When the instance has been
// restarted, retry action.

12、重启Oracle,数据库完成crash recover,但open阶段hang,如果事务恢复的话,磁盘会有相应活动信息,但通过iostat观察,磁盘没有任何io信息,于是决定数据库再次abort
Thu Mar 17 10:40:56 2011
Starting ORACLE instance (normal)
Thu Mar 17 10:40:56 2011
Global Enqueue Service Resources = 64, pool = 8
Thu Mar 17 10:40:56 2011
Global Enqueue Service Enqueues = 128
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
GES IPC: Receivers 1 Senders 1
GES IPC: Buffers Receive 1000 Send 1430 Reserve 1000
GES IPC: Msg Size Regular 440 Batch 2048
SCN scheme 3
Using log_archive_dest parameter default value
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.4.0.
System parameters with non-default values:
processes = 600
timed_statistics = TRUE
shared_pool_size = 838860800
large_pool_size = 167772160
java_pool_size = 33554432
control_files = /zzrkdata/zzrk/control01.ctl
db_block_size = 16384
db_cache_size = 4009754624
compatible = 9.2.0.0.0
log_archive_start = TRUE
log_archive_dest_1 = location=/zzrkarch/zzrk
log_archive_format = arch_%t_%s.arc
db_file_multiblock_read_count= 32
fast_start_mttr_target = 300
undo_management = AUTO
undo_tablespace = UNDOTBS1
undo_retention = 10800
remote_login_passwordfile= EXCLUSIVE
db_domain =
instance_name = zzrk
hash_join_enabled = TRUE
background_dump_dest = /oracle/app/admin/zzrk/bdump
user_dump_dest = /oracle/app/admin/zzrk/udump
core_dump_dest = /oracle/app/admin/zzrk/cdump
sort_area_size = 10485760
db_name = zzrk
open_cursors = 1500
star_transformation_enabled= FALSE
query_rewrite_enabled = FALSE
pga_aggregate_target = 1024000000
Thu Mar 17 10:40:56 2011
cluster interconnect IPC version:Oracle UDP/IP
IPC Vendor 1 proto 2 Version 1.0
PMON started with pid=2
DIAG started with pid=3
LMON started with pid=4
LMD0 started with pid=5
DBW0 started with pid=6
LGWR started with pid=7
CKPT started with pid=8
SMON started with pid=9
RECO started with pid=10
Thu Mar 17 10:41:00 2011
ARCH: STARTING ARCH PROCESSES
ARC0 started with pid=11
ARC0: Archival started
ARC1 started with pid=12
ARC1: Archival started
Thu Mar 17 10:41:00 2011
ARCH: STARTING ARCH PROCESSES COMPLETE
Thu Mar 17 10:41:00 2011
ARC1: Thread not mounted
Thu Mar 17 10:41:00 2011
ARC0: Thread not mounted
Thu Mar 17 10:41:00 2011
ALTER DATABASE MOUNT
Thu Mar 17 10:41:01 2011
lmon registered with NM - instance id 1 (internal mem no 0)
Thu Mar 17 10:41:01 2011
Reconfiguration started
List of nodes: 0,
Global Resource Directory frozen
one node partition
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Resources and enqueues cleaned out
Resources remastered 0
set master node info
Submitted all remote-enqueue requests
Update rdomain variables
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
0 GCS shadows traversed, 0 replayed, 0 unopened
Submitted all GCS remote-cache requests
Thu Mar 17 10:41:04 2011
Reconfiguration complete
Post SMON to start 1st pass IR
Thu Mar 17 10:41:08 2011
Successful mount of redo thread 1, with mount id 1306399872.
Thu Mar 17 10:41:08 2011
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE MOUNT
Thu Mar 17 10:41:08 2011
ALTER DATABASE OPEN
Thu Mar 17 10:41:08 2011
Beginning crash recovery of 1 threads
Thu Mar 17 10:41:08 2011
Started first pass scan
Thu Mar 17 10:41:09 2011
Completed first pass scan
184006 redo blocks read, 7257 data blocks need recovery
Thu Mar 17 10:41:10 2011
Started recovery at
Thread 1: logseq 98190, block 20793, scn 0.0
Recovery of Online Redo Log: Thread 1 Group 5 Seq 98190 Reading mem 0
Mem# 0 errs 0: /zzrkdata/zzrk/redo05.log
Thu Mar 17 10:41:17 2011
Completed redo application
Thu Mar 17 10:41:51 2011
Ended recovery at
Thread 1: logseq 98190, block 204799, scn 2825.638644552
7257 data blocks read, 5528 data blocks written, 184006 redo blocks read
Crash recovery completed successfully
Thu Mar 17 10:41:52 2011
LGWR: Primary database is in CLUSTER CONSISTENT mode
Thu Mar 17 11:08:38 2011
操作系统查看ls -l /zzrkdata/zzrk/redo05.log出现hang
在数据库查看状态为
SQL> select * from v$log;

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARCHIVED
---------- ---------- ---------- ---------- ---------- ---------
STATUS FIRST_CHANGE#
------------------------------------------------ -------------
FIRST_TIME
------------------
1 1 98188 104857600 1 NO
INACTIVE 1.2134E+13
17-MAR-11

2 1 95760 104857600 1 NO
INACTIVE 1.2117E+13
25-FEB-11

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARCHIVED
---------- ---------- ---------- ---------- ---------- ---------
STATUS FIRST_CHANGE#
------------------------------------------------ -------------
FIRST_TIME
------------------

3 1 98189 104857600 1 NO
INACTIVE 1.2134E+13
17-MAR-11

4 1 98184 104857600 1 NO
INACTIVE 1.2134E+13

GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARCHIVED
---------- ---------- ---------- ---------- ---------- ---------
STATUS FIRST_CHANGE#
------------------------------------------------ -------------
FIRST_TIME
------------------
17-MAR-11

5 1 98190 104857600 1 NO
INVALIDATED 1.2134E+13
17-MAR-11

13、郁闷的,只有重启主机,经过漫长的等待。。。。查看ls -l redo*是否正常。
再次启动数据库至mount状态,为保险起见,采用recover database;但是报如下错,这是因为上次启动时已经完成crash recover。
Thu Mar 17 12:34:47 2011
Media Recovery Start
Media Recovery failed with error 264
ORA-283 signalled during: ALTER DATABASE RECOVER database ...

[ora10g@localhost ~]$ oerr ora 264
00264, 00000, "no recovery required"
// *Cause: An attempt was made to perform media recovery on files that do not // need any type of recovery.
// *Action: Do not attempt to perform media recovery on the selected
// files. Check to see that the filenames were entered properly.
// If not, retry the command with the proper filenames.

14、接下来就是open数据库,进行tx recover
Thu Mar 17 12:35:26 2011
alter database open
Thu Mar 17 12:35:26 2011
LGWR: Primary database is in CLUSTER CONSISTENT mode
Thu Mar 17 12:35:26 2011
ARCH: Evaluating archive log 2 thread 1 sequence 95760
ARCH: Beginning to archive log 2 thread 1 sequence 95760
Creating archive destination LOG_ARCHIVE_DEST_1: '/zzrkarch/zzrk/arch_1_95760.arc'
ARCH: Completed archiving log 2 thread 1 sequence 95760
Thu Mar 17 12:35:31 2011
LGWR: Primary database is in CLUSTER CONSISTENT mode
Thread 1 advanced to log sequence 98191
Thread 1 opened at log sequence 98191
Current log# 2 seq# 98191 mem# 0: /zzrkdata/zzrk/redo02.log
Successful open of redo thread 1.
Thu Mar 17 12:35:32 2011
SMON: enabling cache recovery
Thu Mar 17 12:35:32 2011
ARC0: Evaluating archive log 4 thread 1 sequence 98184
ARC0: Beginning to archive log 4 thread 1 sequence 98184
Creating archive destination LOG_ARCHIVE_DEST_1: '/zzrkarch/zzrk/arch_1_98184.arc'
Thu Mar 17 12:35:33 2011
Undo Segment 1 Onlined
Undo Segment 2 Onlined
Undo Segment 3 Onlined
Undo Segment 4 Onlined
Undo Segment 5 Onlined
Undo Segment 6 Onlined
Undo Segment 7 Onlined
Undo Segment 8 Onlined
Undo Segment 9 Onlined
Undo Segment 10 Onlined
Successfully onlined Undo Tablespace 1.
Thu Mar 17 12:35:33 2011
SMON: enabling tx recovery
Thu Mar 17 12:35:33 2011
Database Characterset is ZHS16GBK
Thu Mar 17 12:35:34 2011
SMON: about to recover undo segment 16
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as available
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as available
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as available
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as available
Thu Mar 17 12:35:34 2011
replication_dependency_tracking turned off (no async multimaster replication found)
Thu Mar 17 12:35:34 2011
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as available
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as available
Thu Mar 17 12:35:35 2011
Completed: alter database open
Thu Mar 17 12:35:35 2011
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as available
SMON: about to recover undo segment 16
SMON: mark undo segment 16 as available
。。。
SMON: Parallel transaction recovery tried
。。。

故障是解决了,但是郁闷的是最终通过重启主机来解决的。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值