OC4J ORA600 6002(IBM 4700磁阵控制器重启造成的数据库宕机故障)

IBM磁阵重启后数据库上报:(凌晨5点41分数据库宕机)


 Fri Jul 24 02:00:00 2015

clearing Resource Manager plan via parameter
 Fri Jul 24 05:41:31 2015
********************* ATTENTION: ******************** 
 The controlfile header block returned by the OS
 has a sequence number that is too old. 
 The controlfile might be corrupted.
 PLEASE DO NOT ATTEMPT TO START UP THE INSTANCE 
 without following the steps below.
 RE-STARTING THE INSTANCE CAN CAUSE SERIOUS DAMAGE 
 TO THE DATABASE, if the controlfile is truly corrupted.
 In order to re-start the instance safely, 
 please do the following:
 (1) Save all copies of the controlfile for later 
     analysis and contact your OS vendor and Oracle support.
 (2) Mount the instance and issue: 
     ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
 (3) Unmount the instance. 
 (4) Use the script in the trace file to
     RE-CREATE THE CONTROLFILE and open the database. 
*****************************************************
CKPT (ospid: 1103): terminating the instance
Instance terminated by CKPT, pid = 1103


凌晨5点47分21秒分数据库重启(由于控制文件损坏数据库没有mount上)
Fri Jul 24 05:47:21 2015
Adjusting the default value of parameter parallel_max_servers
from 1280 to 185 due to the value of parameter processes (200)
Starting ORACLE instance (normal)
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /home/oracle/product/11g/dbs/arch
Autotune of undo retention is turned on. 
IMODE=BR
ILAT =24
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 11.1.0.7.0.
Using parameter settings in server-side spfile /dev/raw/raw28
System parameters with non-default values:
  processes                = 200
  sessions                 = 225
  spfile                   = "/dev/raw/raw28"
  nls_language             = "SIMPLIFIED CHINESE"
  nls_territory            = "CHINA"
  sga_target               = 1536M
  control_files            = "/dev/raw/raw29"
  control_files            = "/dev/raw/raw30"
  control_files            = "/dev/raw/raw31"
  db_block_size            = 8192
  compatible               = "11.1.0.0.0"
  undo_tablespace          = "UNDOTBS1"
  recyclebin               = "OFF"
  remote_login_passwordfile= "EXCLUSIVE"
  db_domain                = ""
  local_listener           = "LISTENER_ORA11G"
  audit_file_dest          = "/home/oracle/admin/ora11g/adump"
  audit_trail              = "NONE"
  db_name                  = "ora11g"
  open_cursors             = 300
  pga_aggregate_target     = 8464M
  diagnostic_dest          = "/home/oracle"
Fri Jul 24 05:47:21 2015
PMON started with pid=2, OS id=27064 
Fri Jul 24 05:47:21 2015
VKTM started with pid=6, OS id=27066 at elevated priority
VKTM running at (20)ms precision
Fri Jul 24 05:47:21 2015
DIAG started with pid=10, OS id=27070 
Fri Jul 24 05:47:21 2015
DBRM started with pid=14, OS id=27072 
Fri Jul 24 05:47:21 2015
PSP0 started with pid=18, OS id=27074 
Fri Jul 24 05:47:21 2015
DIA0 started with pid=22, OS id=27076 
Fri Jul 24 05:47:21 2015
MMAN started with pid=26, OS id=27078 
Fri Jul 24 05:47:21 2015
DBW0 started with pid=30, OS id=27080 
Fri Jul 24 05:47:22 2015
DBW1 started with pid=3, OS id=27082 
Fri Jul 24 05:47:22 2015
DBW2 started with pid=4, OS id=27084 
Fri Jul 24 05:47:22 2015
DBW3 started with pid=5, OS id=27086 
Fri Jul 24 05:47:22 2015
DBW4 started with pid=34, OS id=27088 
Fri Jul 24 05:47:22 2015
DBW5 started with pid=7, OS id=27090 
Fri Jul 24 05:47:22 2015
DBW6 started with pid=8, OS id=27092 
Fri Jul 24 05:47:22 2015
DBW7 started with pid=9, OS id=27094 
Fri Jul 24 05:47:22 2015
LGWR started with pid=11, OS id=27096 
Fri Jul 24 05:47:22 2015
CKPT started with pid=38, OS id=27098 
Fri Jul 24 05:47:22 2015
SMON started with pid=42, OS id=27100 
Fri Jul 24 05:47:22 2015
RECO started with pid=46, OS id=27102 
Fri Jul 24 05:47:22 2015
MMON started with pid=50, OS id=27104 
Fri Jul 24 05:47:22 2015
MMNL started with pid=54, OS id=27106 
ORACLE_BASE from environment = /home/oracle
Fri Jul 24 05:47:22 2015
ALTER DATABASE   MOUNT
ORA-214 signalled during: ALTER DATABASE   MOUNT...
Fri Jul 24 05:47:22 2015
Checker run found 1 new persistent data failures

上午10点发现数据库宕机

先把数据库shutdown
Fri Jul 24 10:33:11 2015
Instance shutdown complete

查看告警日志控制文件损坏数据库mount不上
转储恢复控制文件:数据库正常打开
startup mount
Fri Jul 24 10:33:17 2015
Adjusting the default value of parameter parallel_max_servers
from 1280 to 185 due to the value of parameter processes (200)
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /home/oracle/product/11g/dbs/arch
Autotune of undo retention is turned on. 
IMODE=BR
ILAT =24
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 11.1.0.7.0.
Using parameter settings in server-side spfile /home/oracle/product/11g/dbs/spfileora11g.ora
System parameters with non-default values:
  processes                = 200
  sessions                 = 225
  nls_language             = "SIMPLIFIED CHINESE"
  nls_territory            = "CHINA"
  sga_target               = 1536M
  control_files            = "/dev/raw/raw29"
  control_files            = "/dev/raw/raw31"
  db_block_size            = 8192
  compatible               = "11.1.0.0.0"
  undo_tablespace          = "UNDOTBS1"
  recyclebin               = "OFF"
  remote_login_passwordfile= "EXCLUSIVE"
  db_domain                = ""
  local_listener           = "LISTENER_ORA11G"
  audit_file_dest          = "/home/oracle/admin/ora11g/adump"
  audit_trail              = "NONE"
  db_name                  = "ora11g"
  open_cursors             = 300
  pga_aggregate_target     = 8464M
  diagnostic_dest          = "/home/oracle"
Fri Jul 24 10:33:17 2015
PMON started with pid=2, OS id=18668 
Fri Jul 24 10:33:17 2015
VKTM started with pid=6, OS id=18670 at elevated priority
VKTM running at (20)ms precision
Fri Jul 24 10:33:17 2015
DIAG started with pid=10, OS id=18674 
Fri Jul 24 10:33:17 2015
DBRM started with pid=14, OS id=18676 
Fri Jul 24 10:33:17 2015
PSP0 started with pid=18, OS id=18678 
Fri Jul 24 10:33:17 2015
DIA0 started with pid=22, OS id=18680 
Fri Jul 24 10:33:17 2015
MMAN started with pid=26, OS id=18682 
Fri Jul 24 10:33:17 2015
DBW0 started with pid=30, OS id=18684 
Fri Jul 24 10:33:17 2015
DBW1 started with pid=3, OS id=18686 
Fri Jul 24 10:33:17 2015
DBW2 started with pid=4, OS id=18688 
Fri Jul 24 10:33:17 2015
DBW3 started with pid=5, OS id=18690 
Fri Jul 24 10:33:17 2015
DBW4 started with pid=34, OS id=18692 
Fri Jul 24 10:33:17 2015
DBW5 started with pid=7, OS id=18694 
Fri Jul 24 10:33:17 2015
DBW6 started with pid=8, OS id=18696 
Fri Jul 24 10:33:17 2015
DBW7 started with pid=9, OS id=18698 
Fri Jul 24 10:33:17 2015
LGWR started with pid=11, OS id=18700 
Fri Jul 24 10:33:17 2015
CKPT started with pid=38, OS id=18702 
Fri Jul 24 10:33:17 2015
SMON started with pid=42, OS id=18704 
Fri Jul 24 10:33:17 2015
RECO started with pid=46, OS id=18706 
Fri Jul 24 10:33:17 2015
MMON started with pid=50, OS id=18708 
Fri Jul 24 10:33:17 2015
MMNL started with pid=54, OS id=18710 
ORACLE_BASE from environment = /home/oracle
Fri Jul 24 10:33:18 2015
ALTER DATABASE   MOUNT
Setting recovery target incarnation to 1
Successful mount of redo thread 1, with mount id 4282081454
Database mounted in Exclusive Mode
Lost write protection disabled
Completed: ALTER DATABASE   MOUNT

上午10点33分26秒数据库完全打开
alter database open
Starting background process QMNC
Fri Jul 24 10:33:26 2015
QMNC started with pid=82, OS id=18800 
Starting background process SMCO
Fri Jul 24 10:33:26 2015
SMCO started with pid=86, OS id=18802 
Completed: ALTER DATABASE OPEN

打开后数据库上报 ora-01013和00353告警
和ORA-01013告警
Fri Jul 24 10:33:27 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_18804.trc:
Fri Jul 24 10:33:28 2015
Starting background process CJQ0
Fri Jul 24 10:33:28 2015
CJQ0 started with pid=94, OS id=18818 
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_18804.trc:
ORA-01013: user requested cancel of current operation

Fri Jul 24 10:33:38 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_18804.trc  (incident=91922):
ORA-00353: log corruption near block 7640641 change 327466050 time 07/24/2015 05:10:39
ORA-00334: archived log: '/dev/raw/raw40'

Fri Jul 24 10:33:52 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_m000_18892.trc  (incident=92114):
ORA-00353: 日志损坏接近块 7640641 更改 327466050 时间 07/24/2015 05:10:39
ORA-00312: 联机日志 2 线程 1: '/dev/raw/raw40'

Fri Jul 24 10:35:59 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_19944.trc  (incident=91925):
ORA-00353: log corruption near block 7640641 change 327466050 time 07/24/2015 05:10:39
ORA-00334: archived log: '/dev/raw/raw40'
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_19944.trc:
ORA-00354: corrupt redo log block header
ORA-00353: log corruption near block 7640641 change 327466050 time 07/24/2015 05:10:39
ORA-00334: archived log: '/dev/raw/raw40'

文件系统只读造成ORA-00354: corrupt redo log block header
SQL> alter database clear logfile group 1;

SQL> alter database clear unarchived logfile group 1;

Database altered.
上午10点52分清空了没有归档成功的日志文件raw40

Fri Jul 24 10:52:34 2015

alter database clear unarchived logfile '/dev/raw/raw40'
Clearing online log 2 of thread 1 sequence number 188
Fri Jul 24 10:53:05 2015
Completed: alter database clear unarchived logfile '/dev/raw/raw40'

相应的告警消失
但是数据库一直上报ORA-01013
Fri Jul 24 10:53:18 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23280.trc:
Fri Jul 24 10:53:20 2015
Trace dumping is performing id=[cdmp_20150724105320]
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23280.trc:
ORA-01013: user requested cancel of current operation
Fri Jul 24 10:54:23 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23506.trc:
Fri Jul 24 10:54:25 2015
Trace dumping is performing id=[cdmp_20150724105425]
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23506.trc:
ORA-01013: user requested cancel of current operation
Fri Jul 24 10:55:29 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23835.trc:
Fri Jul 24 10:55:30 2015
Trace dumping is performing id=[cdmp_20150724105530]
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23835.trc:
ORA-01013: user requested cancel of current operation

查看trace文件ora11g_ora_23835.trc
oracle@L2GW-408-A7-APPDB-3:~/diag/rdbms/ora11g/ora11g/trace> more /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_23835.trc

上午10点57分重启数据库
继续报错
ORA-01013: user requested cancel of current operation
Fri Jul 24 10:57:29 2015
SMCO started with pid=70, OS id=24286 
Completed: ALTER DATABASE OPEN
Starting background process CJQ0
Fri Jul 24 10:57:30 2015
CJQ0 started with pid=74, OS id=24300 
Fri Jul 24 10:57:39 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_24338.trc:
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_24338.trc:
ORA-01013: user requested cancel of current operation

再次查看trace文件ora11g_ora_24338.trc:\

过滤seg/找到itl事物槽的obj


 oracle@L2GW-408-A7-APPDB-3:~/diag/rdbms/ora11g/ora11g/trace> more /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_24338.trc
Trace file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_24338.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /home/oracle/product/11g
System name:    Linux
Node name:      L2GW-408-A7-APPDB-3
Release:        2.6.16.60-0.21-smp
Version:        #1 SMP Tue May 6 12:41:02 UTC 2008
Machine:        x86_64
Instance name: ora11g
Redo thread mounted by this instance: 1
Oracle process number: 86
Unix process pid: 24338, image: oracle@L2GW-408-A7-APPDB-3


*** 2015-07-24 10:57:39.927
*** SESSION ID:(27.2) 2015-07-24 10:57:39.927
*** CLIENT ID:() 2015-07-24 10:57:39.927
*** SERVICE NAME:(ora11g) 2015-07-24 10:57:39.927
*** MODULE NAME:(SQL Loader Conventional Path Load) 2015-07-24 10:57:39.927
*** ACTION NAME:(01htaccount.ctl) 2015-07-24 10:57:39.927

Block header dump:  0x004260ae
 Object id on Block? Y
 seg/obj: 0x27  csc: 0x00.13850bd0  itc: 19  flg: O  typ: 2 - INDEX
     fsl: 0  fnx: 0x4260af ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.00c.00087156  0x00c6ed59.1842.01  CB--    0  scn 0x0000.13837fa8
0x02   0x0009.016.00017479  0x00c1fa2d.0aaa.30  C---    0  scn 0x0000.138478e2
0x03   0x000a.002.0008750e  0x00c88302.184d.13  --U-    2  fsc 0x0014.13850bd1
0x04   0x000a.007.00087169  0x00c87b1a.184c.03  C---    0  scn 0x0000.1384879a
0x05   0x0009.00d.0001748a  0x00c1faf4.0aaa.1e  C---    0  scn 0x0000.138487aa
0x06   0x000a.020.00086f6e  0x00c87b1c.184c.20  C---    0  scn 0x0000.13848ae4
0x07   0x000a.001.000871d1  0x00c87b7f.184c.07  C---    0  scn 0x0000.13849def
0x08   0x000a.010.00087363  0x00c87b81.184c.1d  C---    0  scn 0x0000.13849dfd
0x09   0x000a.019.000871c3  0x00c87b81.184c.24  C---    0  scn 0x0000.13849e11
0x0a   0x0002.01e.00003e4e  0x00d0cf2d.02f1.27  C---    0  scn 0x0000.13849e1a
0x0b   0x000a.015.00087281  0x00c88055.184d.16  C---    0  scn 0x0000.1384d219
0x0c   0x000a.021.000874d5  0x00c882f3.184d.13  C---    0  scn 0x0000.13850350
0x0d   0x0009.002.000174c7  0x00c1fb8f.0aaa.2e  C---    0  scn 0x0000.1384d2bd
0x0e   0x0009.003.000174ab  0x00c1fb90.0aaa.04  C---    0  scn 0x0000.1384d322
0x0f   0x0006.011.00003e85  0x00c80ee5.0424.11  C---    0  scn 0x0000.1384d347
0x10   0x0009.010.0001745f  0x00c1fb73.0aaa.1f  C---    0  scn 0x0000.1384d350
0x11   0x0008.013.000043a4  0x00c18740.03e8.15  C---    0  scn 0x0000.1384de73
0x12   0x000a.015.0008728d  0x00c882d1.184d.0d  C---    0  scn 0x0000.1384f254
0x13   0x000a.008.0008742f  0x00c882ed.184d.10  C---    0  scn 0x0000.13850269
 



 
Block header dump:  0x004075f0
 Object id on Block? Y
 seg/obj: 0x25  csc: 0x00.13039a19  itc: 2  flg: -  typ: 2 - INDEX
     fsl: 0  fnx: 0x0 ver: 0x01
 
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x000a.001.00082135  0x00c93059.1798.43  --U-    1  fsc 0x0000.13039a1a
0x02   0x000a.01f.0001e840  0x00ce2f07.0477.1a  C---    0  scn 0x0000.0319cf92



SQL> select object_name from dba_objects where object_id=to_number('25','xxxxx');

OBJECT_NAME
----------------------------------------------------------------------------------------------------
I_OBJ4


SQL> select object_name from dba_objects where object_id=to_number('25','xxxxx');

OBJECT_NAME
----------------------------------------------------------------------------------------------------
I_OBJ2


文档ID:1587581.1

Solution

1- Take the backup of the database .

2- start the database in upgrade mode , then execute the utlmmig.sql script.

$sqlplus "/as sysdba"

SQL> shutdown immediate

SQL> startup upgrade

SQL> ALTER SESSION SET NLS_LENGTH_SEMANTICS = BYTE;

SQL> @?/rdbms/admin/utlmmig.sql

 

Note : the same steps can be used to recreate the indexes , this can be confirmed by checking the LAST_DDL_TIME value in dba_objects before and after executing the above action plan :

for example : select LAST_DDL_TIME from dba_objects where OBJECT_NAME='I_OBJ4'; 


跑完脚本后数据库14:32分

shutdown  immediat

Sun Jul 26 15:33:59 2015
Instance shutdown complete



15点33分51秒启动数据库

SQL>startup

Sun Jul 26 15:33:51 2015
Adjusting the default value of parameter parallel_max_servers
from 1280 to 185 due to the value of parameter processes (200)
Starting ORACLE instance (normal)


15点36分数据库启动完成

Sun Jul 26 15:36:15 2015
QMNC started with pid=70, OS id=15319
Completed: ALTER DATABASE OPEN


跑完脚本数据库启动完成ora-01013告警消失,但是ora-600  6002告警一直上报

Sun Jul 26 15:39:07 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_15715.trc  (incident=126183):
ORA-00600: internal error code, arguments: [6002], [0], [255], [2], [0], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ora11g/ora11g/incident/incdir_126183/ora11g_ora_15715_i126183.trc
Sun Jul 26 15:39:09 2015
Errors in file /home/oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_15713.trc  (incident=126087):
ORA-00600: internal error code, arguments: [6002], [0], [255], [2], [0], [], [], [], [], [], [], []
Incident details in: /home/oracle/diag/rdbms/ora11g/ora11g/incident/incdir_126087/ora11g_ora_15713_i126087.trc

重建用户所有索引合计16000+的索引

plsql导出创建脚本

drop在create

2个小时重建


重建索引过程中数据库上报

持续至20:22分14秒(每隔10分钟上报一次)

Sun Jul 26 19:31:29 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Sun Jul 26 19:41:38 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Sun Jul 26 19:51:47 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Sun Jul 26 20:01:56 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Sun Jul 26 20:12:04 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Sun Jul 26 20:22:14 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes


索引重建在20:30分左右完成


Sun Jul 26 20:22:14 2015
DDE: Problem Key 'ORA 600 [6002]' was completely flood controlled (0x6)
Further messages for this problem key will be suppressed for up to 10 minutes
Sun Jul 26 20:36:52 2015
Thread 1 advanced to log sequence 203 (LGWR switch)
  Current log# 1 seq# 203 mem# 0: /dev/raw/raw38
Sun Jul 26 20:37:21 2015
Archived Log entry 2 added for thread 1 sequence 202 ID 0xfa968840 dest 1:
Sun Jul 26 22:09:13 2015


数据库恢复正常

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/30204651/viewspace-1752803/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/30204651/viewspace-1752803/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值