今天用户上报DB不能登录的问题,顺藤摸瓜解决DB意外宕机问题。
关键字:Instance terminate ora-07445 IO orapw
[@more@]今天开发的同事上报一个开发测试库登录不了,报没有监听的错误。
处理过程
吸取上次教训(《library cache lock解决一则》中有述)先登录主机验证监听是否开启
果然是没开,上报信息无误。
开启监听
$>lsnrctl start sid
顺便用sys登录DB看了一下DB状态,结果是ORACLE NOT AVAILABLE。原来DB都停了,startup起来。从本机远程登录验证远程登录服务是否恢复,服务恢复。
用户的问题到这里就算是解决了。不过问题如果就这些内容那我自己都觉得写这篇blog没啥意思了。
这时一位同事说了一句:这个库最近怎么老是自己就停了?这一句话让我觉得其中有问题值得研究,我决定查查是什么原因导致这个库“自己就停了”。虽然说只是开发测试环境,但我觉得也不应该这么不稳定。
查看alert_sid.log
发现以下错误导致DB在临晨terminate。
**************************************************************
Tue Jul 19 02:51:57 BEIST 2011
Errors in file /oracle/admin/ccicvsmk/bdump/ccicvsmk_ckpt_393436.trc:
ORA-00206: Message 206 not found; product=RDBMS; facility=ORA
; arguments: [3] [1]
ORA-00202: Message 202 not found; product=RDBMS; facility=ORA
; arguments: [/oracle/oradata/ccicvsmk/control03.ctl]
ORA-27072: Message 27072 not found; product=RDBMS; facility=ORA
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 10
Additional information: 3
……
Tue Jul 19 02:52:55 BEIST 2011
Errors in file /oracle/admin/ccicvsmk/bdump/ccicvsmk_q002_1167456.trc:
ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA
; arguments: [000114e0] [SIGBUS] [unknown code] [0x1000114E0]
ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA
; arguments: [000114e0] [SIGBUS] [unknown code] [0x1000114E0]
ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA
; arguments: [ksihsmratt+0080] [SIGBUS] [unknown code] [0x700000000000050]
Tue Jul 19 02:55:25 BEIST 2011
USER: terminating instance due to error 472
Instance terminated by USER, pid = 1200234
**************************************************************
ORA-00206,ORA-00202,ORA-27072,ORA-07445错误都直接指向文件读写和IO问题。
trace文件ccicvsmk_ckpt_393436.trc中的内容:
**************************************************************
*** 2011-07-19 02:51:57.281
ksedmp: internal or fatal error
ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA
; arguments: [000114e0] [SIGBUS] [unknown code] [0x1000114E0]
ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA
; arguments: [000114e0] [SIGBUS] [unknown code] [0x1000114E0]
ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA
; arguments: [ksihinster+0080] [SIGBUS] [unknown code] [0x7000000000005A2]
ORA-00221: Message 221 not found; product=RDBMS; facility=ORA
ORA-00206: Message 206 not found; product=RDBMS; facility=ORA
; arguments: [3] [1]
ORA-00202: Message 202 not found; product=RDBMS; facility=ORA
; arguments: [/oracle/oradata/ccicvsmk/control03.ctl]
ORA-27072: Message 27072 not found; product=RDBMS; facility=ORA
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: 10
Additional information: 3
ORA-00206: Message 206 not found; product=RDBMS; facility=ORA
; arguments: [3] [1]
ORA-00202: Message 202 not found; product=RDBMS; facility=ORA
; arguments: [/oracle/oradata/ccicvsmk/control02.ctl]
ORA
ksedmp: Obtaining call stack failed twice. not retrying
Exception signal: 10 (SIGBUS), code: 8 (unknown code), addr: 0x103014080, PC: [0x103014080, 03014080]
Registers:
iar: 0000000103014080, msr: a00000000000d0b2
lr: 000000010270b274, cr: 0000000042248228
r00: 000000010000b4e0, r01: 00000001103b11d0, r02: 0000000110231d88,
r03: 0000000000000001, r04: 070000004f755aa0, r05: 000000004e2480cd,
r06: 015776e5b94ba5a3, r07: 1231a94d98df266d, r08: 0950b8fcb8c42c40,
r09: 12e0be826d694b2f, r10: 0000000000000000, r11: 0000000000003600,
r12: 000000010000b4d8, r13: 000000011026ec98, r14: 00000001103b18f0,
r15: 000000000000009a, r16: 0000000000000002, r17: 00000001101b92a0,
r18: 0ffffffffffffed0, r19: 09fffffff000a798, r20: 000000000000000a,
r21: 000000000000000b, r22: 0000000000000006, r23: 0000000000000001,
r24: 0000000104c1f988, r25: 0000000000000001, r26: 0000000000000000,
r27: 0000000000000001, r28: 0000000000000000, r29: 0000000110008c48,
r30: 0000000000000000, r31: 00000001103b1240, ssexhd: crashing the process...
Background_Core_Dump = partial
**************************************************************
trace文件ccicvsmk_q002_1167456.trc中的内容。
**************************************************************
*** 2011-07-19 02:52:55.762
ksedmp: internal or fatal error
ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA
; arguments: [000114e0] [SIGBUS] [unknown code] [0x1000114E0]
ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA
; arguments: [000114e0] [SIGBUS] [unknown code] [0x1000114E0]
ORA-07445: Message 7445 not found; product=RDBMS; facility=ORA
; arguments: [ksihsmratt+0080] [SIGBUS] [unknown code] [0x700000000000050]
ksedmp: Obtaining call stack failed twice. not retrying
Exception signal: 10 (SIGBUS), code: 8 (unknown code), addr: 0x103014080, PC: [0x103014080, 03014080]
Registers:
iar: 0000000103014080, msr: a00000000000d0b2
lr: 000000010270b274, cr: 0000000042448228
r00: 000000010000b4e0, r01: 00000001103b11d0, r02: 0000000110231d88,
r03: 0000000000000001, r04: 070000004f75b1f0, r05: 000000004e248107,
r06: 015776e6ba54bdd4, r07: 1231a95b367c4dd5, r08: 0950b903b1668e40,
r09: 12e0be826d694b2f, r10: 0000000000000000, r11: 0000000000003600,
r12: 000000010000b4d8, r13: 000000011026ec98, r14: 00000001103b18f0,
r15: 000000000000009a, r16: 0000000000000002, r17: 00000001101b92a0,
r18: 070000004f75b4a0, r19: 070000004f75b2e0, r20: 000000000000000a,
r21: 000000000000000b, r22: 0000000000000006, r23: 0000000000000001,
r24: 0000000104c1f988, r25: 0000000000000001, r26: 0000000000000000,
r27: 0000000000000001, r28: 0000000000000000, r29: 0000000110008c48,
r30: 0000000000000000, r31: 00000001103b1240, ssexhd: crashing the process...
Shadow_Core_Dump = PARTIAL
**************************************************************
查询主机OS错误日志。
**************************************************************
$ errpt |more
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
E86653C3 0719025111 P H LVDD I/O ERROR DETECTED BY LVM
AE3E3FAD 0620151911 I O SYSJ2 FSCK FOUND ERRORS
D1E21BA3 0620151611 I S errdemon LOG FILE EXPANDED TO REQUESTED SIZE
A6DF45AA 0620151611 I O RMCdaemon The daemon is started.
AE3E3FAD 0620151511 I O SYSJ2 FSCK FOUND ERRORS
AE3E3FAD 0620151511 I O SYSJ2 FSCK FOUND ERRORS
AE3E3FAD 0620151511 I O SYSJ2 FSCK FOUND ERRORS
AE3E3FAD 0620151511 I O SYSJ2 FSCK FOUND ERRORS
AE3E3FAD 0620151511 I O SYSJ2 FSCK FOUND ERRORS
AE3E3FAD 0620151511 I O SYSJ2 FSCK FOUND ERRORS
E86653C3 0620151211 P H LVDD I/O ERROR DETECTED BY LVM
9DBCFDEE 0620151511 T O errdemon ERROR LOGGING TURNED ON
A39F8A49 0620150411 T S syserrlg ERROR LOGGING BUFFER OVERFLOW
E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM
E86653C3 0612112511 P H LVDD I/O ERROR DETECTED BY LVM
**************************************************************
果然在临晨02:51:11发生IO错误,然后不到一分钟DB开始报文件读写错误,几分钟后数据库terminate。
将此IO问题上报主机组同事解决。
问题至此告一段落。待主机组同事解决后,再酌情补充。
sys用户无法远程登录
在问题处理过程中还遇到sys用户无法远程登录,顺带处理了,如下
Applies to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
AIX dztest2 3 5 000B06FAD400
Symptoms:
使用SYS用户远程登录,提示insufficient privilege。其他用户无此问题。
Cause:
根据问题现象基本可以确定是orapw密码文件有问题
查询参数remote_login_passwordfile确定DB身份认证方式。
SQL> show parameter remote_login_passwordfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
remote_login_passwordfile string EXCLUSIVE
不是OS认证(OS认证该参数为NONE)则需要orapw密码文件。进入$ORACLE_HOME/dbs没有发现对应的密码文件。问题原因可以确定,没有orapw密码文件。
Solution:
创建密码文件
$>orapwd file=orapwsid password=password entries=num
再次尝试使用用户SYS远程登录,成功。
查看当前具有sysdba权限用户
SQL> select * from v$pwfile_users;
USERNAME SYSDB SYSOP
------------------------------ ----- -----
SYS TRUE TRUE
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/21129591/viewspace-1052718/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/21129591/viewspace-1052718/