OC4J ORA600 6002（IBM 4700磁阵控制器重启造成的数据库宕机故障）

最新推荐文章于 2024-06-19 23:59:25 发布

congliang1851

最新推荐文章于 2024-06-19 23:59:25 发布

阅读量595

点赞数

文章标签：数据库操作系统

IBM磁阵重启后数据库上报：（凌晨5点41分数据库宕机）

Fri Jul 24 02:00:00 2015

clearing Resource Manager plan via parameter
Fri Jul 24 05:41:31 2015
********************* ATTENTION: ********************
The controlfile header block returned by the OS
has a sequence number that is too old.
The controlfile might be corrupted.
PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE
without following the steps below.
RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE
TO THE DATABASE, if the controlfile is truly corrupted.
In order to re-start the instance safely,
please do the following:
(1) Save all copies of the controlfile for later
analysis and contact your OS vendor and Oracle support.
(2) Mount the instance and issue:
ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
(3) Unmount the instance.
(4) Use the script in the trace file to
RE-CREATE THE CONTROLFILE and open the database.
*****************************************************
CKPT (ospid: 1103): terminating the instance
Instance terminated by CKPT, pid = 1103

凌晨5点47分21秒分数据库重启（由于控制文件损坏数据库没有mount上）
Fri Jul 24 05:47:21 2015
Adjusting the default value of parameter parallel_max_servers
from 1280 to 185 due to the value of parameter processes (200)
Starting ORACLE instance (normal)
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /home/oracle/product/11g/dbs/arch
Autotune of undo retention is turned on.
IMODE=BR
ILAT =24
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 11.1.0.7.0.
Using parameter settings in server-side spfile /dev/raw/raw28
System parameters with non-default values:
processes = 200
sessions = 225
spfile = "/dev/raw/raw28"
nls_language = "SIMPLIFIED CHINESE"
nls_territory = "CHINA"
sga_target = 1536M
control_files = "/dev/raw/raw29"
control_files = "/dev/raw/raw30"
control_files = "/dev/raw/raw31"
db_block_size = 8192
compatible = "11.1.0.0.0"
undo_tablespace = "UNDOTBS1"
recyclebin = "OFF"
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
local_listener = "LISTENER_ORA11G"
audit_file_dest = "/home/oracle/admin/ora11g/adump"
audit_trail = "NONE"
db_name = "ora11g"
open_cursors = 300
pga_aggregate_target = 8464M
diagnostic_dest = "/home/oracle"
Fri Jul 24 05:47:21 2015
PMON started with pid=2, OS id=27064
Fri Jul 24 05:47:21 2015
VKTM started with pid=6, OS id=27066 at elevated priority
VKTM running at (20)ms precision
Fri Jul 24 05:47:21 2015
DIAG started with pid=10, OS id=27070
Fri Jul 24 05:47:21 2015
DBRM started with pid=14, OS id=27072
Fri Jul 24 05:47:21 2015
PSP0 started with pid=18, OS id=27074
Fri Jul 24 05:47:21 2015
DIA0 started with pid=22, OS id=27076
Fri Jul 24 05:47:21 2015
MMAN started with pid=26, OS id=27078
Fri Jul 24 05:47:21 2015
DBW0 started with pid=30, OS id=27080
Fri Jul 24 05:47:22 2015
DBW1 started with pid=3, OS id=27082
Fri Jul 24 05:47:22 2015
DBW2 started with pid=4, OS id=27084
Fri Jul 24 05:47:22 2015
DBW3 started with pid=5, OS id=27086
Fri Jul 24 05:47:22 2015
DBW4 started with pid=34, OS id=27088
Fri Jul 24 05:47:22 2015
DBW5 started with pid=7, OS id=27090
Fri Jul 24 05:47:22 2015
DBW6 started with pid=8, OS id=27092
Fri Jul 24 05:47:22 2015
DBW7 started with pid=9, OS id=27094
Fri Jul 24 05:47:22 2015
LGWR started with pid=11, OS id=27096
Fri Jul 24 05:47:22 2015
CKPT started with pid=38, OS id=27098
Fri Jul 24 05:47:22 2015
SMON started with pid=42, OS id=27100
Fri Jul 24 05:47:22 2015
RECO started with pid=46, OS id=27102
Fri Jul 24 05:47:22 2015
MMON started with pid=50, OS id=27104
Fri Jul 24 05:47:22 2015
MMNL started with pid=54, OS id=27106
ORACLE_BASE from environment = /home/oracle
Fri Jul 24 05:47:22 2015
ALTER DATABASE MOUNT
ORA-214 signalled during: ALTER DATABASE MOUNT...
Fri Jul 24 05:47:22 2015
Checker run found 1 new persistent data failures

上午10点发现数据库宕机

先把数据库shutdown
Fri Jul 24 10:33:11 2015
Instance shutdown complete

查看告警日志控制文件损坏数据库mount不上
转储恢复控制文件：数据库正常打开
startup mount
Fri Jul 24 10:33:17 2015
Adjusting the default value of parameter parallel_max_servers
from 1280 to 185 due to the value of parameter processes (200)
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /home/oracle/product/11g/dbs/arch
Autotune of undo retention is turned on.
IMODE=BR
ILAT =24
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 11.1.0.7.0.
Using parameter settings in server-side spfile /home/oracle/product/11g/dbs/spfileora11g.ora
System parameters with non-default values:
processes = 200
sessions = 225
nls_language = "SIMPLIFIED CHINESE"
nls_territory = "CHINA"
sga_target = 1536M
control_files = "/dev/raw/raw29"
control_files = "/dev/raw/raw31"
db_block_size = 8192
compatible = "11.1.0.0.0"
undo_tablespace = "UNDOTBS1"
recyclebin = "OFF"
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
local_listener = "LISTENER_ORA11G"
audit_file_dest = "/home/oracle/admin/ora11g/adump"
audit_trail = "NONE"
db_name = "ora11g"
open_cursors = 300
pga_aggregate_target = 8464M
diagnostic_dest = "/home/oracle"
Fri Jul 24 10:33:17 2015
PMON started with pid=2, OS id=18668
Fri Jul 24 10:33:17 2015
VKTM started with pid=6, OS id=18670 at elevated priority
VKTM running at (20)ms precision
Fri Jul 24 10:33:17 2015
DIAG started with pid=10, OS id=18674
Fri Jul 24 10:33:17 2015
DBRM started with pid=14, OS id=18676
Fri Jul 24 10:33:17 2015
PSP0 started with pid=18, OS id=18678
Fri Jul 24 10:33:17 2015
DIA0 started with pid=22, OS id=18680
Fri Jul 24 10:33:17 2015
MMAN started with pid=26, OS id=18682
Fri Jul 24 10:33:17 2015
DBW0 started with pid=30, OS id=18684
Fri Jul 24 10:33:17 2015
DBW1 started with pid=3, OS id=18686
Fri Jul 24 10:33:17 2015
DBW2 started with pid=4, OS id=18688
Fri Jul 24 10:33:17 2015
DBW3 started with pid=5, OS id=18690
Fri Jul 24 10:33:17 2015
DBW4 started with pid=34, OS id=18692
Fri Jul 24 10:33:17 2015
DBW5 started with pid=7, OS id=18694
Fri Jul 24 10:33:17 2015
DBW6 started with pid=8, OS id=18696
Fri Jul 24 10:33:17 2015
DBW7 started with pid=9, OS id=18698
Fri Jul 24 10:33:17 2015
LGWR started with pid=11, OS id=18700
Fri Jul 24 10:33:17 2015
CKPT started with pid=38, OS id=18702
Fri Jul 24 10:33:17 2015
SMON started with pid=42, OS id=18704
Fri Jul 24 10:33:17 2015
RECO started with pid=46, OS id=18706
Fri Jul 24 10:33:17 2015
MMON started with pid=50, OS id=18708
Fri Jul 24 10:33:17 2015
MMNL started with pid=54, OS id=18710
ORACLE_BASE from environment = /home/oracle
Fri Jul 24 10:33:18 2015
ALTER DATABASE MOUNT
Setting recovery target incarnation to 1
Successful mount of redo thread 1, with mount id 4282081454
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE MOUNT

上午10点33分26秒数据库完全打开
alter database open
Starting background process QMNC
Fri Jul 24 10:33:26 2015
QMNC started with pid=82, OS id=18800
Starting background process SMCO
Fri Jul 24 10:33:26 2015
SMCO started with pid=86, OS id=18802
Completed: ALTER DATABASE OPEN

打开后数据库上报 ora-01013和00353告警
和ORA-01013告警
Fri Jul 24 10:33:27 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_18804.trc:
Fri Jul 24 10:33:28 2015
Starting background process CJQ0
Fri Jul 24 10:33:28 2015
CJQ0 started with pid=94, OS id=18818
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_18804.trc:
ORA-01013: user requested cancel of current operation

Fri Jul 24 10:33:38 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_18804.trc (incident=91922):
ORA-00353: log corruption near block 7640641 change 327466050 time 07/24/2015 05:10:39
ORA-00334: archived log: '/dev/raw/raw40'

Fri Jul 24 10:33:52 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_m000_18892.trc (incident=92114):
ORA-00353: 日志损坏接近块 7640641 更改 327466050 时间 07/24/2015 05:10:39
ORA-00312: 联机日志 2 线程 1: '/dev/raw/raw40'

Fri Jul 24 10:35:59 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_19944.trc (incident=91925):
ORA-00353: log corruption near block 7640641 change 327466050 time 07/24/2015 05:10:39
ORA-00334: archived log: '/dev/raw/raw40'
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_19944.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 7640641 change 327466050 time 07/24/2015 05:10:39
ORA-00334: archived log: '/dev/raw/raw40'

文件系统只读造成ORA-00354: corrupt redo log block header
SQL> alter database clear logfile group 1;

SQL> alter database clear unarchived logfile group 1;

Database altered.
上午10点52分清空了没有归档成功的日志文件raw40

Fri Jul 24 10:52:34 2015

alter database clear unarchived logfile '/dev/raw/raw40'
Clearing online log 2 of thread 1 sequence number 188
Fri Jul 24 10:53:05 2015
Completed: alter database clear unarchived logfile '/dev/raw/raw40'

相应的告警消失
但是数据库一直上报ORA-01013
Fri Jul 24 10:53:18 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23280.trc:
Fri Jul 24 10:53:20 2015
Trace dumping is performing id=[cdmp_20150724105320]
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23280.trc:
ORA-01013: user requested cancel of current operation
Fri Jul 24 10:54:23 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23506.trc:
Fri Jul 24 10:54:25 2015
Trace dumping is performing id=[cdmp_20150724105425]
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23506.trc:
ORA-01013: user requested cancel of current operation
Fri Jul 24 10:55:29 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23835.trc:
Fri Jul 24 10:55:30 2015
Trace dumping is performing id=[cdmp_20150724105530]
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23835.trc:
ORA-01013: user requested cancel of current operation

查看trace文件ora11g_ora_23835.trc
oracle@L2GW-408-A7-APPDB-3:~/diag/rdbms/ora11g/ora11g/trace> more /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23835.trc

上午10点57分重启数据库
继续报错
ORA-01013: user requested cancel of current operation
Fri Jul 24 10:57:29 2015
SMCO started with pid=70, OS id=24286
Completed: ALTER DATABASE OPEN
Starting background process CJQ0
Fri Jul 24 10:57:30 2015
CJQ0 started with pid=74, OS id=24300
Fri Jul 24 10:57:39 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_24338.trc:
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_24338.trc:
ORA-01013: user requested cancel of current operation

再次查看trace文件ora11g_ora_24338.trc:\

过滤seg/找到itl事物槽的obj

oracle@L2GW-408-A7-APPDB-3:~/diag/rdbms/ora11g/ora11g/trace> more /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_24338.trc
Trace file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_24338.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /home/oracle/product/11g
System name: Linux
Node name: L2GW-408-A7-APPDB-3
Release: 2.6.16.60-0.21-smp
Version: #1 SMP Tue May 6 12:41:02 UTC 2008
Machine: x86_64
Instance name: ora11g
Redo thread mounted by this instance: 1
Oracle process number: 86
Unix process pid: 24338, image: oracle@L2GW-408-A7-APPDB-3

*** 2015-07-24 10:57:39.927
*** SESSION ID:(27.2) 2015-07-24 10:57:39.927
*** CLIENT ID:() 2015-07-24 10:57:39.927
*** SERVICE NAME:(ora11g) 2015-07-24 10:57:39.927
*** MODULE NAME:(SQL Loader Conventional Path Load) 2015-07-24 10:57:39.927
*** ACTION NAME:(01htaccount.ctl) 2015-07-24 10:57:39.927

Block header dump: 0x004260ae
Object id on Block? Y
seg/obj: 0x27 csc: 0x00.13850bd0 itc: 19 flg: O typ: 2 - INDEX
fsl: 0 fnx: 0x4260af ver: 0x01

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x000a.00c.00087156 0x00c6ed59.1842.01 CB-- 0 scn 0x0000.13837fa8
0x02 0x0009.016.00017479 0x00c1fa2d.0aaa.30 C--- 0 scn 0x0000.138478e2
0x03 0x000a.002.0008750e 0x00c88302.184d.13 --U- 2 fsc 0x0014.13850bd1
0x04 0x000a.007.00087169 0x00c87b1a.184c.03 C--- 0 scn 0x0000.1384879a
0x05 0x0009.00d.0001748a 0x00c1faf4.0aaa.1e C--- 0 scn 0x0000.138487aa
0x06 0x000a.020.00086f6e 0x00c87b1c.184c.20 C--- 0 scn 0x0000.13848ae4
0x07 0x000a.001.000871d1 0x00c87b7f.184c.07 C--- 0 scn 0x0000.13849def
0x08 0x000a.010.00087363 0x00c87b81.184c.1d C--- 0 scn 0x0000.13849dfd
0x09 0x000a.019.000871c3 0x00c87b81.184c.24 C--- 0 scn 0x0000.13849e11
0x0a 0x0002.01e.00003e4e 0x00d0cf2d.02f1.27 C--- 0 scn 0x0000.13849e1a
0x0b 0x000a.015.00087281 0x00c88055.184d.16 C--- 0 scn 0x0000.1384d219
0x0c 0x000a.021.000874d5 0x00c882f3.184d.13 C--- 0 scn 0x0000.13850350
0x0d 0x0009.002.000174c7 0x00c1fb8f.0aaa.2e C--- 0 scn 0x0000.1384d2bd
0x0e 0x0009.003.000174ab 0x00c1fb90.0aaa.04 C--- 0 scn 0x0000.1384d322
0x0f 0x0006.011.00003e85 0x00c80ee5.0424.11 C--- 0 scn 0x0000.1384d347
0x10 0x0009.010.0001745f 0x00c1fb73.0aaa.1f C--- 0 scn 0x0000.1384d350
0x11 0x0008.013.000043a4 0x00c18740.03e8.15 C--- 0 scn 0x0000.1384de73
0x12 0x000a.015.0008728d 0x00c882d1.184d.0d C--- 0 scn 0x0000.1384f254
0x13 0x000a.008.0008742f 0x00c882ed.184d.10 C--- 0 scn 0x0000.13850269

Block header dump: 0x004075f0
Object id on Block? Y
seg/obj: 0x25 csc: 0x00.13039a19 itc: 2 flg: - typ: 2 - INDEX
     fsl: 0 fnx: 0x0 ver: 0x01

Itl           Xid                  Uba         Flag Lck        Scn/Fsc
0x01   0x000a.001.00082135 0x00c93059.1798.43 --U-    1 fsc 0x0000.13039a1a
0x02   0x000a.01f.0001e840 0x00ce2f07.0477.1a C---    0 scn 0x0000.0319cf92

SQL> select object_name from dba_objects where object_id=to_number('25','xxxxx');

OBJECT_NAME
----------------------------------------------------------------------------------------------------
I_OBJ4

SQL> select object_name from dba_objects where object_id=to_number('25','xxxxx');

OBJECT_NAME
----------------------------------------------------------------------------------------------------
I_OBJ2

文档ID：1587581.1

Solution

1- Take the backup of the database .

2- start the database in upgrade mode , then execute the utlmmig.sql script.

$sqlplus "/as sysdba"

SQL> shutdown immediate

SQL> startup upgrade

SQL> ALTER SESSION SET NLS_LENGTH_SEMANTICS = BYTE;

SQL> @?/rdbms/admin/utlmmig.sql

Note : the same steps can be used to recreate the indexes , this can be confirmed by checking the LAST_DDL_TIME value in dba_objects before and after executing the above action plan :

for example : select LAST_DDL_TIME from dba_objects where OBJECT_NAME='I_OBJ4';

跑完脚本后数据库14:32分

shutdown immediat

Sun Jul 26 15:33:59 2015
Instance shutdown complete

15点33分51秒启动数据库

SQL>startup

Sun Jul 26 15:33:51 2015
Adjusting the default value of parameter parallel_max_servers
from 1280 to 185 due to the value of parameter processes (200)
Starting ORACLE instance (normal)

15点36分数据库启动完成

Sun Jul 26 15:36:15 2015
QMNC started with pid=70, OS id=15319
Completed: ALTER DATABASE OPEN

跑完脚本数据库启动完成ora-01013告警消失，但是ora-600 6002告警一直上报

Sun Jul 26 15:39:07 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_15715.trc (incident=126183):
ORA-00600: internal error code, arguments: [6002], [0], [255], [2], [0], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ora11g/ora11g/incident/incdir_126183/ora11g_ora_15715_i126183.trc
Sun Jul 26 15:39:09 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_15713.trc (incident=126087):
ORA-00600: internal error code, arguments: [6002], [0], [255], [2], [0], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ora11g/ora11g/incident/incdir_126087/ora11g_ora_15713_i126087.trc

重建用户所有索引合计16000+的索引

plsql导出创建脚本

drop在create

2个小时重建

重建索引过程中数据库上报

持续至20:22分14秒（每隔10分钟上报一次）

Sun Jul 26 19:31:29 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Sun Jul 26 19:41:38 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Sun Jul 26 19:51:47 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Sun Jul 26 20:01:56 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Sun Jul 26 20:12:04 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Sun Jul 26 20:22:14 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes

索引重建在20:30分左右完成

Sun Jul 26 20:22:14 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Sun Jul 26 20:36:52 2015
Thread 1 advanced to log sequence 203 (LGWR switch)
Current log# 1 seq# 203 mem# 0: /dev/raw/raw38
Sun Jul 26 20:37:21 2015
Archived Log entry 2 added for thread 1 sequence 202 ID 0xfa968840 dest 1:
Sun Jul 26 22:09:13 2015

数据库恢复正常

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/30204651/viewspace-1752803/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/30204651/viewspace-1752803/