测试环境启动oracle rac时,发现数据库无法正常打开。
1.现象如下:
SQL> startup
ORACLE instance started.
Total System Global Area 1068937216 bytes
Fixed Size 2235208 bytes
Variable Size 822084792 bytes
Database Buffers 239075328 bytes
Redo Buffers 5541888 bytes
Database mounted.
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1],
[315], [70505], [70506], [], [], [], [], [], [], []
SQL>
2.查看alert日志:
Mon Jan 04 16:18:48 2016
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
[name='eth1:1', type=1, ip=169.254.12.15, mac=08-00-27-ec-9f-91, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth0' configured from GPnP for use as a public interface.
[name='eth0', type=1, ip=192.168.77.185, mac=08-00-27-d9-2f-ac, net=192.168.77.0/24, mask=255.255.255.0, use=public/1]
Public Interface 'eth0:1' configured from GPnP for use as a public interface.
[name='eth0:1', type=1, ip=192.168.77.187, mac=08-00-27-d9-2f-ac, net=192.168.77.0/24, mask=255.255.255.0, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1
System name: Linux
Node name: racnode1.oracle.com
Release: 2.6.18-308.el5
Version: #1 SMP Fri Jan 27 17:17:51 EST 2012
Machine: x86_64
Using parameter settings in server-side pfile /u01/app/oracle/product/11.2.0/dbhome_1/dbs/initRACDB_1.ora
System parameters with non-default values:
processes = 150
spfile = "+DATADG/racdb/spfileracdb.ora"
memory_target = 1G
control_files = "+DATADG/racdb/control01.ctl"
control_files = "+DATADG/racdb/control02.ctl"
db_block_size = 8192
compatible = "11.2.0.0.0"
cluster_database = TRUE
remote_login_passwordfile= "EXCLUSIVE"
db_domain = ""
dispatchers = "(PROTOCOL=TCP) (SERVICE=RACDBXDB)"
remote_listener = "racnode-cluster-scan:1521"
audit_file_dest = "/u01/app/oracle/admin/RACDB/adump"
audit_trail = "DB"
db_name = "RACDB"
open_cursors = 300
diagnostic_dest = "/u01/app/oracle"
Cluster communication is configured to use the following interface(s) for this instance
169.254.12.15
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
.......................................................
.......................................................
Mon Jan 04 16:18:56 2016
LMD0 started with pid=12, OS id=13703
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [1920 - 2560]
Mon Jan 04 16:18:56 2016
LMS0 started with pid=13, OS id=13707 at elevated priority
.......................................................
.......................................................
Mon Jan 04 16:18:56 2016
MMON started with pid=24, OS id=13753
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
Mon Jan 04 16:18:56 2016
MMNL started with pid=25, OS id=13757
starting up 1 shared server(s) ...
NOTE: initiating MARK startup
Starting background process MARK
Mon Jan 04 16:18:58 2016
MARK started with pid=28, OS id=13773
NOTE: MARK has subscribed
lmon registered with NM - instance number 1 (internal mem no 0)
Reconfiguration started (old inc 0, new inc 2)
List of instances:
1 (myinst: 1)
Global Resource Directory frozen ---------这个是什么意思?
* allocate domain 0, invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
Mon Jan 04 16:18:59 2016
LCK0 started with pid=30, OS id=13785
Mon Jan 04 16:18:59 2016
Starting background process RSMN
Mon Jan 04 16:18:59 2016
RSMN started with pid=31, OS id=13789
ORACLE_BASE from environment = /u01/app/oracle
Mon Jan 04 16:19:02 2016
ALTER DATABASE MOUNT
This instance was first to mount
Mon Jan 04 16:19:02 2016
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.77.187)(PORT=1521))))' SCOPE=MEMORY SID='RACDB_1';
NOTE: Loaded library: System
SUCCESS: diskgroup DATADG was mounted
NOTE: dependency between database RACDB and diskgroup resource ora.DATADG.dg is established
Mon Jan 04 16:19:07 2016
Successful mount of redo thread 1, with mount id 897839478
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: ALTER DATABASE MOUNT
Mon Jan 04 16:19:08 2016
ALTER DATABASE OPEN
This instance was first to open
Beginning crash recovery of 1 threads
parallel recovery started with 2 processes
Started redo scan
Completed redo scan
read 4 KB redo, 9 data blocks need recovery
Errors in file /u01/app/oracle/diag/rdbms/racdb/RACDB_1/trace/RACDB_1_ora_13919.trc (incident=11059):
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/racdb/RACDB_1/incident/incdir_11059/RACDB_1_ora_13919_i11059.trc
Mon Jan 04 16:19:11 2016
Dumping diagnostic data in directory=[cdmp_20160104161911], requested by (instance=1, osid=13919), summary=[incident=11059].
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Abort recovery for domain 0
Aborting crash recovery due to error 600 -----------------实例恢复出错导致ora-600错误
Errors in file /u01/app/oracle/diag/rdbms/racdb/RACDB_1/trace/RACDB_1_ora_13919.trc:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], []
Abort recovery for domain 0
Errors in file /u01/app/oracle/diag/rdbms/racdb/RACDB_1/trace/RACDB_1_ora_13919.trc:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...
Mon Jan 04 16:20:01 2016
Sweep [inc][11059]: completed
Sweep [inc2][11059]: completed
[oracle@racnode1 trace]$
3.查看trace文件
[oracle@racnode1 trace]$ cat /u01/app/oracle/diag/rdbms/racdb/RACDB_1/trace/RACDB_1_ora_13919.trc
Trace file /u01/app/oracle/diag/rdbms/racdb/RACDB_1/trace/RACDB_1_ora_13919.trc
.......................................................................................
.......................................................................................
*** 2016-01-04 16:19:09.223
*** SESSION ID:(17.3) 2016-01-04 16:19:09.223
*** CLIENT ID:() 2016-01-04 16:19:09.223
*** SERVICE NAME:() 2016-01-04 16:19:09.223
*** MODULE NAME:(sqlplus@racnode1.oracle.com (TNS V1-V3)) 2016-01-04 16:19:09.223
*** ACTION NAME:() 2016-01-04 16:19:09.223
2016-01-04 16:19:09.202395 : Start recovery for domain=0, valid=0, flags=0x4
Successfully allocated 2 recovery slaves
Using 67 overflow buffers per recovery slave
Thread 1 checkpoint: logseq 315, block 2, scn 8074391
cache-low rba: logseq 315, block 70496
on-disk rba: logseq 315, block 70506, scn 8093866 ------------ on-disk rba 70506
start recovery at logseq 315, block 70496, scn 0
Reading mirrors of log member '+DATADG/racdb/redo01.log' (thread 1 seq 315 block 70505)
ksfdrfms:Mirror Read file=+DATADG/racdb/redo01.log fob=0x9c438b58 bufp=0x2b13e9de0000 blkno=70505 nbytes=601600
ksfdrfms: Read success from mirror side=1 logical extent number=0 disk=DATADG_0000 path=/dev/asm-diske
Mirror I/O done from ASM disk /dev/asm-diske
ksfdrnms:Mirror Read file=+DATADG/racdb/redo01.log fob=0x9c438b58 bufp=0x2b13e9de0000 nbytes=601600
ksfdrnms: Read success from mirror side=2 logical extent number=1 disk=DATADG_0001 path=/dev/asm-diskf
Mirror I/O done from ASM disk /dev/asm-diskf
ksfdrnms:Mirror Read file=+DATADG/racdb/redo01.log fob=0x9c438b58 bufp=0x2b13e9de0000 nbytes=601600
*** 2016-01-04 16:19:09.515
Started writing zeroblks thread 1 seq 315 blocks 70505-70512
*** 2016-01-04 16:19:09.516
Completed writing zeroblks thread 1 seq 315
==== Redo read statistics for thread 1 ====
Total physical reads (from disk and memory): 4096Kb
-- Redo read_disk statistics --
Read rate (ASYNC): 4Kb in 0.04s => 0.11 Mb/sec
Longest record: 0Kb, moves: 0/16 (0%)
Change moves: 0/5 (0%), moved: 0Mb
Longest LWN: 1Kb, moves: 0/5 (0%), moved: 0Mb
Last redo scn: 0x0000.007b8095 (8093845)
----------------------------------------------
----- Recovery Hash Table Statistics ---------
Hash table buckets = 262144
Longest hash chain = 1
Average hash chain = 9/9 = 1.0
Max compares per lookup = 1
Avg compares per lookup = 10/24 = 0.4
----------------------------------------------
WARNING! Crash recovery of thread 1 seq 315 is
ending at redo block 70505 but should not have ended before
redo block 70506
*** 2016-01-04 16:19:10.329
Incident 11059 created, dump file: /u01/app/oracle/diag/rdbms/racdb/RACDB_1/incident/incdir_11059/RACDB_1_ora_13919_i11059.trc
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], []
2016-01-04 16:19:11.323857 : Abort recovery for domain 0, flags = 0x4
2016-01-04 16:19:11.323886 : kjb_abort_recovery: domain flags=0x0, valid=0
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], []
2016-01-04 16:19:11.324075 : Abort recovery for domain 0, flags = 0x4
2016-01-04 16:19:11.324083 : aborting recovery of 0 (0) with cluster inc 2 (0) recovery 1
2016-01-04 16:19:11.324090 : kjb_abort_recovery: domain flags=0x0, valid=0
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], []
[oracle@racnode1 trace]$
重要信息:
WARNING! Crash recovery of thread 1 seq 315 is
ending at redo block 70505 but should not have ended before
redo block 70506
根据报错信息和trace文件可以推断:
该问题应该是由于服务器异常锻炼,导致LGWR写online redo log时失败,下次重新启动数据库时需要做实例恢复,实例恢复需要的所有信息无法在redo log中得到,因为上次断电导致redo log写失败了。
ora-600里几个参数[kcratr_nab_less_than_odr], [1], [315], [70505], [70506], [], [], [], [], [], [], []意思应该如下:
实例1恢复需要用到日志序列号log sequence为315的日志,需要用到的日志快为70506,但是只能读到70505,70505以后的日志块写失败了,无法读到,所以数据库就起不来了。
查看实例1当前日志,确实是315。
SQL> select * from v$log;
GROUP# THREAD# SEQUENCE# BYTES BLOCKSIZE MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM NEXT_CHANGE# NEXT_TIME
---------- ---------- ---------- ---------- ---------- ---------- --- ---------------- ------------- --------- ------------ ---------
1 1 315 52428800 512 1 NO CURRENT 8074391 26-DEC-15 2.8147E+14
2 1 314 52428800 512 1 NO INACTIVE 8051838 26-DEC-15 8074391 26-DEC-15
3 2 43 52428800 512 1 NO CURRENT 4947005 16-SEP-15 2.8147E+14
4 2 42 52428800 512 1 NO INACTIVE 4251425 12-AUG-15 4947005 16-SEP-15
4. 参照MetaLink文档ID 1299564.1的方案来恢复数据库:
recover database until cancel using backup controlfile;
alter database open resetlogs;
参考文章:
eygle的博客 http://www.eygle.com/archives/2010/05/kcratr_nab_less_than_odr.html