Handling an Oracle Failure Caused by a Storage Outage - 惜分飞

The storage suddenly went offline and the database crashed, reporting a large number of errors such as ORA-00206, ORA-00202, ORA-15081, and Linux-x86_64 Error: 5: Input/output error:

Sun Jul 21 20:00:11 2024

Thread 1 advanced to log sequence 1594398 (LGWR switch)

  Current log# 5 seq# 1594398 mem# 0: +DATA/xff/onlinelog/group_5.412.906718739

Sun Jul 21 20:53:17 2024

WARNING: Write Failed. group:2 disk:0 AU:506916 offset:49152 size:16384

Sun Jul 21 20:53:17 2024

WARNING: Read Failed. group:2 disk:2 AU:506931 offset:49152 size:16384

WARNING: failed to read mirror side 1 of virtual extent 4 logical extent 0 of file 415 in group [2.34109396]

from disk ORACLE_DATA_0002  allocation unit 506931 reason error; if possible, will try another mirror side

Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ckpt_42142.trc:

ORA-15080: synchronous I/O operation to a disk failed

ORA-27061: waiting for async I/Os failed

Linux-x86_64 Error: 5: Input/output error

Additional information: -1

Additional information: 16384

WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0

of file 415 in group 2 on disk 0 allocation unit 506916

Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ckpt_42142.trc:

ORA-00206: error in writing (block 3, # blocks 1) of control file

ORA-00202: control file: '+DATA/xff/controlfile/current.415.906718737'

ORA-15081: failed to submit an I/O operation to a disk

ORA-15081: failed to submit an I/O operation to a disk

Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ckpt_42142.trc:

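ORA-00221: error on write to control file

ORA-00206: error in writing (block 3, # blocks 1) of control file

ORA-00202: control file: '+DATA/xff/controlfile/current.415.906718737'

ORA-15081: failed to submit an I/O operation to a disk

ORA-15081: failed to submit an I/O operation to a disk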

CKPT (ospid: 42142): terminating the instance due to error 221

Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_lmon_42087.trc:

ORA-00202: control file: '+DATA/xff/controlfile/current.415.906718737'

ORA-15081: failed to submit an I/O operation to a disk

ORA-27072: File I/O error

Linux-x86_64 Error: 5: Input/output error

Additional information: 4

Additional information: 1038194784

Additional information: -1

Sun Jul 21 20:53:19 2024

ORA-1092 : opitsk aborting process

Sun Jul 21 20:53:24 2024

ORA-1092 : opitsk aborting process

Sun Jul 21 20:53:24 2024

License high water mark = 59

Sun Jul 21 20:53:28 2024

Instance terminated by CKPT, pid = 42142

USER (ospid: 64660): terminating the instance

Instance terminated by USER, pid = 64660
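In a crash like this the alert log mixes ASM warnings, ORA- errors, and OS errors, and it helps to see at a glance what dominates. A minimal Python sketch, assuming the alert log (or an excerpt of it, here abridged from the lines above with English message text) is available as plain text; `triage`, `ALERT_EXCERPT`, and the regular expressions are hypothetical helpers, not part of any Oracle tool:

```python
import re
from collections import Counter

# Abridged excerpt of the alert log above, using the English message text.
ALERT_EXCERPT = """\
WARNING: Write Failed. group:2 disk:0 AU:506916 offset:49152 size:16384
WARNING: Read Failed. group:2 disk:2 AU:506931 offset:49152 size:16384
ORA-15080: synchronous I/O operation to a disk failed
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+DATA/xff/controlfile/current.415.906718737'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
ORA-00221: error on write to control file
ORA-00206: error in writing (block 3, # blocks 1) of control file
ORA-00202: control file: '+DATA/xff/controlfile/current.415.906718737'
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
CKPT (ospid: 42142): terminating the instance due to error 221
ORA-1092 : opitsk aborting process
"""

ORA_RE = re.compile(r"\bORA-(\d+)")
OS_ERR_RE = re.compile(r"Linux-x86_64 Error: (\d+): (.+)")
ASM_IO_RE = re.compile(r"WARNING: (Read|Write) Failed\. group:(\d+) disk:(\d+)")


def triage(text):
    """Count ORA- codes (normalized to 5 digits), list OS errors, and collect failing ASM disks."""
    ora = Counter(code.zfill(5) for code in ORA_RE.findall(text))
    os_errors = OS_ERR_RE.findall(text)  # list of (errno, message) tuples
    asm_disks = sorted({(int(g), int(d)) for _, g, d in ASM_IO_RE.findall(text)})
    return ora, os_errors, asm_disks


ora, os_errors, asm_disks = triage(ALERT_EXCERPT)
for code, n in ora.most_common():
    print(f"ORA-{code}: {n}")
print("OS errors:", os_errors)
print("ASM (group, disk) with failed I/O:", asm_disks)
```

On this excerpt ORA-15081 is the most frequent code, and the only OS-level error is errno 5 (Input/output error) together with failed reads and writes on disks 0 and 2 of ASM group 2, which points at the storage layer rather than at the database itself.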

After the storage was restored, starting the database failed with ORA-600 [2131]:

Mon Jul 22 09:10:04 2024

ALTER DATABASE   MOUNT

This instance was first to mount

Mon Jul 22 09:10:04 2024

Sweep [inc][490008]: completed

Sweep [inc2][490008]: completed

NOTE: Loaded library: System

SUCCESS: diskgroup ORACLE_DATA was mounted

NOTE: dependency between database rac and diskgroup resource ora.ORACLE_DATA.dg is established

Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_14301.trc  (incident=492409):

ORA-00600: internal error code, arguments: [2131], [33], [32], [], [], [], [], [], [], [], [], []

Incident details in: /users/oracle/app/db/diag/rdbms/xff/xff1/incident/incdir_492409/xff1_ora_14301_i492409.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

ORA-600 signalled during: ALTER DATABASE   MOUNT...

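The customer tried to recover by recreating the control file. Because the analysis was wrong, three datafiles were left out of the recreated control file, and a RESETLOGS was then forced with consistency checks bypassed. The database did not open; instead it failed with ORA-600 [2662]: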

alter database open resetlogs

RESETLOGS is being done without consistancy checks. This may result

in a corrupted database. The database should be recreated.

RESETLOGS after incomplete recovery UNTIL CHANGE 9965567206652

Clearing online redo logfile 1 +DATA/xff/onlinelog/group_1.414.906718739

Clearing online log 1 of thread 1 sequence number 0

Clearing online redo logfile 1 complete

Clearing online redo logfile 2 +DATA/xff/onlinelog/group_2.413.906718739

Clearing online log 2 of thread 1 sequence number 0

Clearing online redo logfile 2 complete

Clearing online redo logfile 5 +DATA/xff/onlinelog/group_5.412.906718739

Clearing online log 5 of thread 1 sequence number 0

Clearing online redo logfile 5 complete

Expanded controlfile section 2 from 1 to 63 records

The number of logical blocks in section 2 remains the same

Expanded controlfile section 1 from 4 to 66 records

Requested to grow by 62 records; added 32 blocks of records

Expanded controlfile section 30 from 1 to 63 records

The number of logical blocks in section 30 remains the same

Expanded controlfile section 29 from 1 to 63 records

The number of logical blocks in section 29 remains the same

Control file has been expanded to support 63 threads

Mon Jul 22 23:04:07 2024

Redo thread 2 enabled by open resetlogs or standby activation

Online log +DATA/xff/onlinelog/group_1.414.906718739: Thread 1 Group 1 was previously cleared

Online log +DATA/xff/onlinelog/group_2.413.906718739: Thread 1 Group 2 was previously cleared

Online log +DATA/xff/onlinelog/group_3.501.1175036643: Thread 2 Group 3 was previously cleared

Online log +DATA/xff/onlinelog/group_4.502.1175036645: Thread 2 Group 4 was previously cleared

Online log +DATA/xff/onlinelog/group_5.412.906718739: Thread 1 Group 5 was previously cleared

Mon Jul 22 23:04:08 2024

Setting recovery target incarnation to 2

Initializing SCN for created control file

Database SCN compatibility initialized to 3

Warning - High Database SCN: Current SCN value is 9965567206655, threshold SCN value is 0

If you have not previously reported this warning on this database,

please notify Oracle Support so that additional diagnosis can be performed.

Mon Jul 22 23:04:09 2024

Assigning activation ID 2763017873 (0xa4b04e91)

Thread 1 opened at log sequence 1

  Current log# 1 seq# 1 mem# 0: +DATA/xff/onlinelog/group_1.414.906718739

Successful open of redo thread 1

MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set

Mon Jul 22 23:04:10 2024

SMON: enabling cache recovery

Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_64210.trc  (incident=624374):

ORA-00600: internal error code, arguments: [2662], [2320], [1243079939], [2320], [1243211805], [12583040], [], [], [], [], [], []

Incident details in: /users/oracle/app/db/diag/rdbms/xff/xff1/incident/incdir_624374/xff1_ora_64210_i624374.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_64210.trc:

ORA-00600: internal error code, arguments: [2662], [2320], [1243079939], [2320], [1243211805], [12583040], [], [], [], [], [], []

Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_ora_64210.trc:

ORA-00600: internal error code, arguments: [2662], [2320], [1243079939], [2320], [1243211805], [12583040], [], [], [], [], [], []

Error 600 happened during db open, shutting down database

USER (ospid: 64210): terminating the instance due to error 600

Instance terminated by USER, pid = 64210

ORA-1092 signalled during: alter database open resetlogs...
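ORA-600 [2662] is commonly documented as "block SCN is ahead of current SCN": the two arguments after [2662] are the wrap and base of the current SCN, the next two are the wrap and base of the dependent SCN found in a block, and the fifth, when present, is the data block address (DBA) of that block. A small Python sketch that decodes the arguments above, assuming a smallfile tablespace (10-bit relative file number, 22-bit block number); the function names are illustrative only:

```python
def scn_from_wrap_base(wrap, base):
    """Combine the high-order wrap and the 32-bit base of an SCN into one integer."""
    return (wrap << 32) | base


def split_dba(dba):
    """Split a smallfile data block address into (relative file#, block#): 10 + 22 bits."""
    return dba >> 22, dba & 0x3FFFFF


# Arguments from: ORA-00600: [2662], [2320], [1243079939], [2320], [1243211805], [12583040]
cur_wrap, cur_base, dep_wrap, dep_base, dba = 2320, 1243079939, 2320, 1243211805, 12583040

current_scn = scn_from_wrap_base(cur_wrap, cur_base)
dependent_scn = scn_from_wrap_base(dep_wrap, dep_base)
file_no, block_no = split_dba(dba)

print("current SCN  :", current_scn)
print("dependent SCN:", dependent_scn)
print("ahead by     :", dependent_scn - current_scn)
print("block        : file", file_no, "block", block_no)
```

With these arguments the current SCN works out to 9965567206659 (in line with the "Current SCN value is 9965567206655" warning logged just before), the dependent SCN is 131,866 higher, and the DBA decodes to relative file 3, block 128. Inside the database, DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK perform the same split.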

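From this point on the recovery becomes considerably harder. Three datafiles in the ASM disk group were left out when the control file was recreated, and RESETLOGS was then performed, so the resetlogs SCN of those three files no longer matches that of the other datafiles. The fix is to modify the affected resetlogs SCN with the Oracle Recovery Tools utility or with bbed, and then recreate the control file again: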

SQL> @rectl.sql

Control file created.

SQL> RECOVER DATABASE;

Media recovery complete
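Both problems in this case (datafiles missing from the recreated control file, and those files then carrying a different resetlogs SCN) can be checked for before a CREATE CONTROLFILE script such as rectl.sql is run. A minimal Python sketch of two such pre-flight checks, assuming the datafile names present in the disk group and the (file#, RESETLOGS_CHANGE#) pairs, for example from V$DATAFILE_HEADER in MOUNT state, have already been exported as plain data. The script, file names, and SCN values below are made up for illustration and are not from this case:

```python
import re
from collections import Counter


def datafiles_in_script(sql_text):
    """Upper-cased file names quoted in the DATAFILE clause of a CREATE CONTROLFILE script."""
    m = re.search(r"\bDATAFILE\b(.*?)(?:\bCHARACTER\s+SET\b|;)", sql_text, re.I | re.S)
    if m is None:
        return set()
    return {name.upper() for name in re.findall(r"'([^']+)'", m.group(1))}


def missing_datafiles(sql_text, files_on_disk):
    """Datafiles present in the disk group but not listed in the script (case-insensitive)."""
    listed = datafiles_in_script(sql_text)
    return sorted(f for f in files_on_disk if f.upper() not in listed)


def resetlogs_outliers(headers):
    """Given (file#, resetlogs_change#) pairs, return the majority SCN and the files that differ."""
    if not headers:
        return None, []
    majority, _ = Counter(s for _, s in headers).most_common(1)[0]
    return majority, sorted(f for f, s in headers if s != majority)


# Hypothetical CREATE CONTROLFILE script (abridged).
SCRIPT = """
CREATE CONTROLFILE REUSE DATABASE "XFF" NORESETLOGS ARCHIVELOG
    MAXDATAFILES 1024
LOGFILE
  GROUP 1 '+DATA/xff/onlinelog/group_1.414.906718739' SIZE 512M
DATAFILE
  '+DATA/xff/datafile/system.256.906718601',
  '+DATA/xff/datafile/sysaux.257.906718601'
CHARACTER SET ZHS16GBK;
"""

# Hypothetical datafile names found in the disk group.
ON_DISK = [
    "+DATA/xff/datafile/system.256.906718601",
    "+DATA/xff/datafile/sysaux.257.906718601",
    "+DATA/xff/datafile/users.259.906718603",
]

# Hypothetical (file#, RESETLOGS_CHANGE#) rows.
ROWS = [(1, 2000), (2, 2000), (3, 2000), (4, 1000), (5, 2000), (6, 1000), (7, 1000)]

print("not in script:", missing_datafiles(SCRIPT, ON_DISK))
print("majority resetlogs SCN, outliers:", resetlogs_outliers(ROWS))
```

The majority rule in `resetlogs_outliers` is only a heuristic: it flags the odd files out, and deciding which resetlogs SCN is the right one still needs the kind of analysis described above.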

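Next, the ORA-600 [2662] error from the earlier open attempt was resolved by advancing the database SCN, which the Patch_SCN tool can do quickly. The database then opened successfully: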

SQL> ALTER DATABASE OPEN;

Database altered.
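Whatever tool is used to advance the SCN (Patch_SCN in this case), the target has to lie above every dependent SCN reported by ORA-600 [2662]. A small Python sketch of that arithmetic, and of the conversion between the decimal and wrap.base notations that both appear in the alert log; the function names are hypothetical and the safety margin of 1,000,000 is an illustrative assumption, not a value from the source:

```python
SCN_BASE_BITS = 32


def to_scn(wrap, base):
    """wrap.base notation -> decimal SCN."""
    return (wrap << SCN_BASE_BITS) + base


def to_wrap_base(scn):
    """Decimal SCN -> (wrap, base), as in 'scn 2320.1263080114' style log lines."""
    return divmod(scn, 1 << SCN_BASE_BITS)


def min_patch_target(current_scn, dependent_scns, margin):
    """Smallest SCN above the current SCN and every dependent SCN, plus a safety margin."""
    return max([current_scn, *dependent_scns]) + 1 + margin


current = to_scn(2320, 1243079939)    # current SCN from the ORA-600 [2662] arguments
dependent = to_scn(2320, 1243211805)  # dependent SCN from the same error
target = min_patch_target(current, [dependent], margin=1_000_000)  # margin is illustrative
print("target SCN:", target, "wrap.base:", to_wrap_base(target))

# SCN reported by block recovery after the open, in both notations.
after_open = to_scn(2320, 1263080114)
print("after open:", after_open, "advanced by:", after_open - current)
```

The last lines show that "scn 2320.1263080114" in the block recovery messages below is 9965587206834 in decimal, 20,000,175 above the current SCN from the ORA-600 [2662] arguments, which is consistent with the SCN having been advanced before this open.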

However, the alert log showed a large number of errors such as ORA-600 [4194], ORA-01595, and Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0xC21D511] [PC:0x97F4EFA, kgegpa()+40]:

Wed Jul 24 15:24:21 2024

alter database open

Beginning crash recovery of 1 threads

 parallel recovery started with 32 processes

Started redo scan

Completed redo scan

 read 0 KB redo, 0 data blocks need recovery

…………

Database Characterset is ZHS16GBK

No Resource Manager plan active

Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_smon_40279.trc  (incident=777938):

ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

replication_dependency_tracking turned off (no async multimaster replication found)

Starting background process QMNC

Wed Jul 24 15:24:40 2024

QMNC started with pid=79, OS id=54632

Block recovery from logseq 2, block 74 to scn 9965587206835

Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0

  Mem# 0: +DATA/xff/onlinelog/redo02

LOGSTDBY: Validating controlfile with logical metadata

Wed Jul 24 15:24:40 2024

Block recovery stopped at EOT rba 2.82.16

Block recovery completed at rba 2.82.16, scn 2320.1263080114

Block recovery from logseq 2, block 74 to scn 9965587206833

Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0

  Mem# 0: +DATA/xff/onlinelog/redo02

Block recovery completed at rba 2.82.16, scn 2320.1263080114

Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_smon_40279.trc:

ORA-01595: error freeing extent (4) of rollback segment (20)

ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []

LOGSTDBY: Validation complete

Wed Jul 24 15:24:41 2024

Sweep [inc][777938]: completed

Sweep [inc2][777938]: completed

Wed Jul 24 15:24:41 2024

Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_q001_54657.trc  (incident=778362):

ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Starting background process SMCO

Wed Jul 24 15:24:42 2024

SMCO started with pid=83, OS id=54691

Block recovery from logseq 2, block 74 to scn 9965587206835

Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0

  Mem# 0: +DATA/xff/onlinelog/redo02

Block recovery completed at rba 2.82.16, scn 2320.1263080118

Block recovery from logseq 2, block 74 to scn 9965587206838

Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0

  Mem# 0: +DATA/xff/onlinelog/redo02

Block recovery completed at rba 2.83.16, scn 2320.1263080119

Error 600 in kwqmnpartition(), aborting txn

Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_q001_54657.trc  (incident=778363):

ORA-25319: queue table repartitioning aborted

Completed: alter database open

Block recovery from logseq 2, block 74 to scn 9965587206835

Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0

  Mem# 0: +DATA/rac/onlinelog/redo02

Block recovery completed at rba 2.82.16, scn 2320.1263080118

Block recovery from logseq 2, block 74 to scn 9965587207538

Recovery of Online Redo Log: Thread 1 Group 2 Seq 2 Reading mem 0

  Mem# 0: +DATA/rac/onlinelog/redo02

Block recovery completed at rba 2.1097.16, scn 2320.1263080819

Errors in file /users/oracle/app/db/diag/rdbms/rac/rac1/trace/rac1_cjq0_55657.trc  (incident=778427):

ORA-00600: internal error code, arguments: [600], [ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []], [], [], [], [], [], [], [], [], [], []

Incident details in: /users/oracle/app/db/diag/rdbms/xff/xff1/incident/incdir_778427/xff1_cjq0_55657_i778427.trc

Exception [type:SIGSEGV, Address not mapped to object][ADDR:0xC21D511][PC:0x97F4EFA, kgegpa()+40][flags: 0x0, count: 1]

Exception [type:SIGSEGV, Address not mapped to object][ADDR:0xC21D511][PC:0x97F396E, kgebse()+776][flags: 0x2, count: 2]

Exception [type:SIGSEGV, Address not mapped to object][ADDR:0xC21D511][PC:0x97F396E, kgebse()+776][flags: 0x2, count: 2]

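The errors point to damaged undo. After the faulty undo rollback segments were dealt with, the database opened normally, and a logical migration of the data was scheduled to complete the recovery.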
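The undo diagnosis rests on the pattern in the log: ORA-600 [4194] raised by SMON alongside ORA-01595 for rollback segment 20, and the same ORA-600 from other background processes. A small Python sketch that tallies ORA-600 first arguments per process (taken from the trace file name) and collects the rollback segment numbers named in ORA-01595. It assumes the English message text shown above; `undo_symptoms` and the regular expressions are hypothetical helpers:

```python
import re
from collections import defaultdict

# Abridged excerpt of the alert log above, using the English message text.
LOG = """\
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_smon_40279.trc  (incident=777938):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_smon_40279.trc:
ORA-01595: error freeing extent (4) of rollback segment (20)
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Errors in file /users/oracle/app/db/diag/rdbms/xff/xff1/trace/xff1_q001_54657.trc  (incident=778362):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
"""

# Trace names look like <instance>_<process>_<ospid>.trc, e.g. xff1_smon_40279.trc.
TRACE_RE = re.compile(r"Errors in file \S*/(\w+?)_(\w+?)_(\d+)\.trc")
ORA600_RE = re.compile(r"ORA-00600: [^\[]*\[(\w+)\]")
RBS_RE = re.compile(r"ORA-01595: .*rollback segment \((\d+)\)")


def undo_symptoms(text):
    """Group ORA-600 first arguments by process and collect rollback segments from ORA-01595."""
    by_process = defaultdict(list)
    segments = set()
    process = None
    for line in text.splitlines():
        m = TRACE_RE.search(line)
        if m:
            process = m.group(2)  # e.g. 'smon'
            continue
        m = ORA600_RE.search(line)
        if m and process:
            by_process[process].append(m.group(1))
        m = RBS_RE.search(line)
        if m:
            segments.add(int(m.group(1)))
    return dict(by_process), sorted(segments)


by_process, segments = undo_symptoms(LOG)
print("ORA-600 first arguments by process:", by_process)
print("rollback segments named in ORA-01595:", segments)
```

This only surfaces the pattern; identifying and handling the affected undo segments still has to be done in the database itself.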
