公司的测试服务环境被整崩溃了,简单记录一下错误。
同事发现一个测试数据库无法登录,对数据库执行SHUTDOWN ABORT后,重新启动bootstrap报错:
[ora11g@hpserver2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Sun Mar 4 00:37:58 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 2555744256 bytes
Fixed Size 2230912 bytes
Variable Size 637535616 bytes
Database Buffers 1862270976 bytes
Redo Buffers 53706752 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-08103: object no longer exists
Process ID: 20964
Session ID: 125 Serial number: 5
数据库怎么会突然出现这么严重的问题,检查告警日志发现了如下的错误:
Errors in file /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/trace/orcl11g_ora_31804.trc (incident=127465):
ORA-00600: internal error code, arguments: [ktecgsc:kcbz_objdchk], [0], [0], [1], [], [], [], [], [], [], [], []
Incident details in: /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/incident/incdir_127465/orcl11g_ora_31804_i127465.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Dumping diagnostic data in directory=[cdmp_20120302133841], requested by (instance=1, sid=31804), summary=[incident=127465].
Sweep [inc][127465]: completed
Sweep [inc2][127465]: completed
Errors in file /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/trace/orcl11g_ora_31900.trc (incident=127449):
ORA-00600: internal error code, arguments: [kkdlGetBaseUser2:authIdType], [0], [27], [GLOBAL_AQ_USER_ROLE], [], [], [], [], [], [], [], []
Incident details in: /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/incident/incdir_127449/orcl11g_ora_31900_i127449.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Dumping diagnostic data in directory=[cdmp_20120302134009], requested by (instance=1, sid=31900), summary=[incident=127449].
Sweep [inc][127449]: completed
Sweep [inc2][127449]: completed
检查第一个ORA-600对应的TRACE,发现出现错误的会话在执行导入:
[ora11g@hpserver2 ~]$ more /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/trace/orcl11g_ora_31804.trc
Trace file /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/trace/orcl11g_ora_31804.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security and Real Application Testing options
ORACLE_HOME = /u02/app/oracle/product/11.2.0/db_1
System name: Linux
Node name: hpserver2.enmotech.com
Release: 2.6.32-100.28.5.el6.x86_64
Version: #1 SMP Wed Feb 2 18:40:23 EST 2011
Machine: x86_64
Instance name: orcl11g
Redo thread mounted by this instance: 1
Oracle process number: 29
Unix process pid: 31804, image: oracle@hpserver2.enmotech.com (TNS V1-V3)
*** 2012-03-02 13:38:34.271
*** SESSION ID:(141.3959) 2012-03-02 13:38:34.271
*** CLIENT ID:() 2012-03-02 13:38:34.271
*** SERVICE NAME:(SYS$USERS) 2012-03-02 13:38:34.271
*** MODULE NAME:(imp@hpserver2.enmotech.com (TNS V1-V3)) 2012-03-02 13:38:34.271
*** ACTION NAME:() 2012-03-02 13:38:34.271
*** SESSION ID:(141.3959) 2012-03-02 13:38:34.270
DATA seg.obj=-2, on-disk bj=27, dsflg=0, dsobj=74, cls=4
Incident 127465 created, dump file: /u02/app/oracle/diag/rdbms/orcl11g/orcl11g/incident/incdir_127465/orcl11g_ora_31804_i127465.trc
ORA-00600: internal error code, arguments: [ktecgsc:kcbz_objdchk], [0], [0], [1], [], [], [], [], [], [], [], []
*** SESSION ID:(141.3959) 2012-03-02 13:38:40.815
DATA seg.obj=0, on-disk bj=27, dsflg=0, dsobj=74, cls=4
和同事确认了一下,确实有个同事在导入数据字典表,而且本来应该导入到单独的用户下,但是导入时忘了添加TOUSER,导致数据被导入到SYS用户下,造成了数据字典的不一致,从而引发了这个问题。
这里并不是要解决这个错误,事实上如果没有完整的备份,数据库已经很难恢复了。这里主要是记录这个问题,希望给其他人引以为戒。
DBA总要做一些高风险的操作,而DBA又有足够的知识和能力导致问题一般情况下难以产生的前提要求得以实现。
比如当前这次事故,如果不是DBA,没有一定专业的知识,无法使用IMP方式来导入数据字典表。而对于缺少了TOUSER参数的情况下,表会导入到SYS用户下,但是这会出现表已经存在的错误,而使得导入失败。而对于DBA来说,这种常见错误并不会引发DBA的足够重视,一个IGNORE=Y参数的使用,最终酿成了这次事故。
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/4227/viewspace-717603/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/4227/viewspace-717603/