ORA-27300,ORA-27301,ORA-27302,ORA-27303导致实例宕的问题分析

4.29上午,生产环境数据库用户无法登陆,提示归档日志错误,只能以超级管理员登陆,但是关于日志的相关操作都hang住,由于时间紧迫,未做hanganzlyze操作,正常停库过程也hang,只能以shutdown abort停掉实例,然后在重新启库过程中正常,未遇到错误提示,数据库正常打开,操作暂时恢复,持续有10分钟左右,中间日志切换无反应,用户登录又hang,于是再做shutdown abort->startup,过5分钟后数据库正常,linux系统,11.2.0.4版本数据库,那么接下来分析问题:

1、  根据数据库警告日志:

  TueApr 28 18:42:53 2015

Errors in file/u01/app/oracle/diag/rdbms/iokbl/iokbl/trace/iokbl_lgwr_128898.trc:

ORA-27152: attempt topost process failed

ORA-27300: OS system dependentoperation:semop failed with status: 7

ORA-27301: OS failuremessage: Argument list too long

ORA-27302: failureoccurred at: sskgpwvp2

ORA-27303: additionalinformation: old_semid post_count 0x158003 39

LGWR (ospid: 128898):terminating the instance due to error 27152

Tue Apr 28 18:42:53 2015

System state dump requested by (instance=1,osid=128898 (LGWR)), summary=[abnormal instance termination].

System State dumped to trace file/u01/app/oracle/diag/rdbms/iokbl/iokbl/trace/iokbl_diag_128882_20150428184253.trc

Dumping diagnostic data indirectory=[cdmp_20150428184253], requested by (instance=1, osid=128898 (LGWR)),summary=[abnormal instance termination].

Instance terminated byLGWR, pid = 128898

Wed Apr 29 10:14:18 2015

Starting ORACLE instance(normal)

          通过红色字体,能够了解到,在4.28日18:42:53时,数据库报错,提示 尝试发送进程失败,操作系统问题,status为7,失败信息为:参数列表过长等等,然后看到LGWR进程导致实例终止。最后在4.29日10:14:18启动实例。LGWR进程为ORALCE核心进程之一,此进程若die 则ORACLE实例会宕。

2、观察iokbl_lgwr_128898.trc追踪文件,发现:

*** 2015-04-2818:42:53.168

Warning: log write elapsed time 33787ms, size 1KB

error 27152detected in background process

ORA-27152:attempt to post process failed

ORA-27300: OSsystem dependent operation:semop failed with status: 7

ORA-27301: OSfailure message: Argument list too long

ORA-27302:failure occurred at: sskgpwvp2

ORA-27303: additionalinformation: old_semid post_count 0x158003 39

kjzduptcctx:Notifying DIAG for crash event

----- AbridgedCall Stack Trace -----

ksedsts()+465<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+63<-ksuitm()+5570<-ksbrdp()+3507<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253

----- End ofAbridged Call Stack Trace -----

 

*** 2015-04-2818:42:53.234

LGWR (ospid:128898): terminating the instance due to error 27152

ksuitm: waitingup to [5] seconds before killing DIAG(128882)

写日志消耗33787ms,大小1KB,IO已明显迟缓。LGWR日志会记录超过500MS的信息。

3、分析SYSTEM STATE DUMP文件,此处直接用ass.awk工具转换:

Ass.Awk Version 1.0.9 - Processing/u01/app/oracle/diag/rdbms/iokbl/iokbl/trace/iokbl_diag_128882_20150428184253.trc

 

System State 1

~~~~~~~~~~~~~~~~

1:                                     

2: 0: waiting for 'pmon timer'        

3: 0: waiting for 'rdbms ipc message' 

4: 0: waiting for 'VKTM Logical Idle Wait'

5: 0: waiting for 'rdbms ipc message' 

6: 0: waiting for 'DIAG idle wait'    

7: 0: waiting for 'rdbms ipc message' 

8: 0: waiting for 'DIAG idle wait'    

9: 0: waiting for 'rdbms ipc message' 

10: 0: waiting for 'rdbms ipc message' 

11: 0: waiting for 'rdbms ipc message' 

12: 0: waiting for 'rdbms ipc message' 

13: 0: waiting for 'rdbms ipc message' 

14:                                    

15: 0: waiting for 'rdbms ipc message' 

16: 0: waiting for 'smon timer'        

17: 0: waiting for 'rdbms ipc message' 

18: 0: waiting for 'rdbms ipc message' 

19: 0: waiting for 'ASM background timer'

20: 0: waiting for 'rdbms ipc message' 

21: 0: waiting for 'rdbms ipc message' 

22:                                    

23: 0: waiting for 'wait for unread messageon broadcast channel'

24:                                    

25: 0: waiting for 'log file sync'     

26: 0: waiting for 'class slave wait'  

27: 0: waiting for 'SQL*Net message fromclient'

28: 0: waiting for 'rdbms ipc message' 

29: 0: waiting for 'rdbms ipc message' 

30: 0: waiting for 'control file sequentialread'

31: 0: waiting for 'rdbms ipc message' 

32: 0: waiting for 'rdbms ipc message' 

33: 0: waiting for 'class slave wait'  

34: 0: waiting for 'Streams AQ: qmncoordinator idle wait'

35: 0: waiting for 'log file sync'     

36: 0: waiting for 'log file sync'     

37: 0: waiting for 'log file sync'     

38: 0: waiting for 'SQL*Net message fromclient'

39: 0: waiting for 'log file sync'     

40: 0: waiting for 'log file sync'     

41: 0: waiting for 'log file sync'     

42: 0: waiting for 'log file sync'     

43: 0: waiting for 'log file sync'     

44: 0: waiting for 'SQL*Net message fromclient'

45: 0: waiting for 'SQL*Net message fromclient'

46: 0: waiting for 'rdbms ipc message' 

47: 0: waiting for 'log file sync'     

48: 0: waiting for 'SQL*Net message fromclient'

49: 0: waiting for 'log file sync'     

50: 0: waiting for 'Space Manager: slaveidle wait'

51: 0: waiting for 'SQL*Net message fromclient'

52: 0: waiting for 'log file sync'     

53: 0: waiting for 'SQL*Net message fromclient'

54: 0: waiting for 'log file sync'     

55: 0: waiting for 'log file sync'     

56: 0: waiting for 'log file sync'     

57: 0: waiting for 'Streams AQ: qmn slaveidle wait'

58: 0: waiting for 'log file sync'     

59: 0: waiting for 'log file sync'     

60: 0: waiting for 'SQL*Net message fromclient'

61: 0: waiting for 'rdbms ipc message' 

62: 0: waiting for 'log file sync'     

63: 0: waiting for 'log file sync'     

64: 0: waiting for 'Streams AQ: qmn slaveidle wait'

66: 0: waiting for 'class slave wait'  

67: 0: waiting for 'class slave wait'  

68: 0: waiting for 'SQL*Net message fromclient'

69: 0: waiting for 'log file sync'     

70: 0: waiting for 'log file sync'     

71: 0: waiting for 'SQL*Net message fromclient'

72: 9: waited for 'Streams AQ: waiting fortime management or cleanup tasks'

73: 0: waiting for 'log file sync'     

74: 0: waiting for 'log file sync'     

75: 0: waiting for 'log file sync'     

76: 0: waiting for 'log file sync'     

77: 0: waiting for 'log file sync'     

78: 0: waiting for 'log file sync'     

79: 0: waiting for 'log file sync'     

80: 0: waiting for 'log file sync'     

81: 0: waiting for 'log file sync'     

82: 0: waiting for 'log file sync'     

83: 0: waiting for 'log file sync'     

84: 0: waiting for 'log file sync'     

85: 0: waiting for 'log file sync'     

86: 0: waiting for 'log file sync'     

87: 0: waiting for 'log file sync'     

88: 0: waiting for 'log file sync'     

89: 0: waiting for 'log file sync'     

90: 0: waiting for 'log file sync'     

 

NO BLOCKING PROCESSES FOUND

 

 

126139 Lines Processed.

可以看出,主要等待事件为log file sync,那么初步分析为此事件导致LGWR进程死掉从而导致实例宕。

4、根据oracle官方文档介绍,ora-27300---ora-27303错误的介绍是:

27300, 00000, "OS system dependentoperation:%s failed with status: %s"

// *Cause: OS system call error

// *Action: contact Oracle Support

通过 MOS查找以上报错信息,未找到status为7的troubleshouting。

5、根据AWR 报告显示,在17:00-18:00之间,在数据库实例宕之前,IO问题已经明显,如图:


6、再看操作系统日志:

Apr 28 18:43:24 iokbldb1 kernel: sd1:0:2:1: [sdag] Unhandled error code

Apr 28 18:43:24 iokbldb1 kernel: sd1:0:2:1: [sdag]

Apr 28 18:43:24 iokbldb1 kernel: Result:hostbyte=DID_ERROR driverbyte=DRIVER_OK

Apr 28 18:43:24 iokbldb1 kernel: sd1:0:2:1: [sdag] CDB:

Apr 28 18:43:24 iokbldb1 kernel: Read(10):28 00 00 0b f4 00 00 04 00 00

Apr 28 18:43:24 iokbldb1 kernel:end_request: I/O error, dev sdag, sector 783360

Apr 28 18:43:24 iokbldb1 kernel:device-mapper: multipath: Failing path 66:0.

Apr 28 18:43:24 iokbldb1 multipathd: 66:0:mark as failed

Apr 28 18:43:24 iokbldb1 multipathd:mpathc: remaining active paths: 7

Apr 28 18:43:25 iokbldb1 kernel: sd1:0:1:1: [sdab] Unhandled error code

Apr 28 18:43:25 iokbldb1 kernel: sd1:0:1:1: [sdab]

Apr 28 18:43:25 iokbldb1 kernel: Result:hostbyte=DID_ERROR driverbyte=DRIVER_OK

Apr 28 18:43:25 iokbldb1 kernel: sd1:0:1:1: [sdab] CDB:

Apr 28 18:43:25 iokbldb1 kernel: Read(10):28 00 00 04 48 00 00 04 00 00

Apr 28 18:43:25 iokbldb1 kernel:end_request: I/O error, dev sdab, sector 280576

Apr 28 18:43:25 iokbldb1 kernel:device-mapper: multipath: Failing path 65:176.

另通过dmesg|grep error可以看到:

IO是有问题的。

日志一直是关于磁盘错误提示,分析可能为存储问题,导致数据库读写文件等待时间长,最终导致LGWR进程死掉,实例宕。
第二天,IBM存储工程师检测存储时发现主机与存储之间链路有问题,更换光纤后正常。
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值