aix系统下oracle之ora-00471,SGA: allocation forcing component growth、cursor: pin S wait on X和row cach......

ALTER DATABASE BACKUP CONTROLFILE TO '/usr/mac/tmp/ctrl_macdb.dbf' REUSE

Thu Aug 30 04:26:34 BEIST 2012

Completed: ALTER DATABASE BACKUP CONTROLFILE TO '/usr/mac/tmp/ctrl_macdb.dbf' REUSE

Thu Aug 30 06:12:09 BEIST 2012

Thread 1 advanced to log sequence 6774 (LGWR switch)

Current log# 4 seq# 6774 mem# 0: /dev/rlv_redo41

Thu Aug 30 09:33:42 BEIST 2012

Thread 1 advanced to log sequence 6775 (LGWR switch)

Current log# 5 seq# 6775 mem# 0: /dev/rlv_redo51

Thu Aug 30 11:34:02 BEIST 2012

IPC Send timeout detected. Receiver ospid 336138

Thu Aug 30 11:34:02 BEIST 2012

Errors in file /home/oracle/app/oracle/admin/macdb/bdump/macdb1_pz99_336138.trc:

Thu Aug 30 11:40:35 BEIST 2012

IPC Send timeout detected. Receiver ospid 336138

Thu Aug 30 11:40:35 BEIST 2012

Errors in file /home/oracle/app/oracle/admin/macdb/bdump/macdb1_pz99_336138.trc:

Thu Aug 30 11:45:34 BEIST 2012

MMNL absent for 1201 secs; Foregrounds taking over

Thu Aug 30 11:45:45 BEIST 2012

IPC Send timeout detected. Receiver ospid 336138

Thu Aug 30 11:45:45 BEIST 2012

Errors in file /home/oracle/app/oracle/admin/macdb/bdump/macdb1_pz99_336138.trc:

Thu Aug 30 11:56:52 BEIST 2012

Errors in file /home/oracle/app/oracle/admin/macdb/bdump/macdb1_pmon_360920.trc:

ORA-00471: DBWR process terminated with error

Thu Aug 30 11:56:52 BEIST 2012

PMON: terminating instance due to error 471

Thu Aug 30 11:56:52 BEIST 2012

System state dump is made for local instance

System State dumped to trace file /home/oracle/app/oracle/admin/macdb/bdump/macdb1_diag_217786.trc

Thu Aug 30 11:56:55 BEIST 2012

Shutting down instance (abort)

License high water mark = 2975

Thu Aug 30 11:56:59 BEIST 2012

Instance terminated by PMON, pid = 360920

Thu Aug 30 11:57:05 BEIST 2012

Instance terminated by USER, pid = 6008998

Thu Aug 30 11:57:14 BEIST 2012

Starting ORACLE instance (normal)

sskgpgetexecname failed to get name

LICENSE_MAX_SESSION = 0

LICENSE_SESSIONS_WARNING = 0

Interface type 1 en11 192.168.55.0 configured from OCR for use as a cluster interconnect

Interface type 1 en10 133.0.175.0 configured from OCR for use as  a public interface

Picked latch-free SCN scheme 3

IPC Send timeout detected.信息说明PZ99进程节点间通信超时,查看其后台TRACE文件可以发现如下信息:

/home/oracle/app/oracle/admin/macdb/bdump/macdb1_pz99_336138.trc

Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP, Data Mining

and Real Application Testing options

ORACLE_HOME = /home/oracle/app/oracle/product/10g

System name:    AIX

Node name:      macdb1

Release:        3

Version:        5

Machine:        00CAF8E64C00

Instance name: macdb1

Redo thread mounted by this instance: 1

Oracle process number: 34

Unix process pid: 336138, image: oracle@macdb1 (PZ99)

*** 2012-08-30 11:28:52.444

*** SERVICE NAME:(SYS$USERS) 2012-08-30 11:28:52.441

*** SESSION ID:(4926.15944) 2012-08-30 11:28:52.441

SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes

SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes

SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes

SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes

SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes

SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes

SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes

SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes

*** 2012-08-30 11:56:48.878

SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes

SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes

SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes

SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes

SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes

SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes

SKGXPSEGRCV: MESSAGE TRUNCATED user data 56 bytes payload 1688 bytes

SKGXPSEGRCV: trucated message buffer data skgxpmsg meta data header 0xffffffffffe6e20 len 56 bytes

SKGXPSEGRCV的信息与Metalink Note描述的问题较为吻合,但是查询其他所有ORACLE进程的后台TRACE没有再发现SKGXPSEGRCV信息,而且PZ99进程之前在等待的是'SGA: allocation forcing component growth'事件:

SO: 700000184b4fb38, type: 2, owner: 0, flag: INIT/-/-/0x00

(process) Oracle pid=34, calls cur/top: 70000011738e4c0/70000018240c1f8, flag: (0) -

int error: 2147494036, call error: 0, sess error: 0, txn error 0

(post info) last post received: 7629 0 4

last post received-location: kslpsr

last process to post me: 700000187b8ece8 1 6

last post sent: 0 0 24

last post sent-location: ksasnd

last process posted by me: 700000187b8ece8 1 6

(latch info) wait_event=0 bits=0

Process Group: DEFAULT, pseudo proc: 700000187d89228

O/S info: user: oracle, term: UNKNOWN, ospid: 336138 (DEAD)

OSD pid info: Unix process pid: 336138, image: oracle@macdb1 (PZ99)

----------------------------------------

SO: 700000181263928, type: 4, owner: 700000184b4fb38, flag: INIT/-/-/0x00

(session) sid: 4926 trans: 0, creator: 700000184b4fb38, flag: (41) USR/- BSY/-/-/-/-/-

DID: 0001-0022-000179C2, short-term DID: 0001-0022-000179C1

txn branch: 0

oct: 3, prv: 0, sql: 0, psql: 0, user: 0/SYS

service name: SYS$USERS

O/S info: user: oracle, term: , ospid: 336138, machine: macdb2

program: oracle@macdb1 (PZ99)

application name: racgimon@macdb2 (TNS V1-V3), hash value=2386761235

last wait for 'SGA: allocation forcing component growth' wait_time=0.000789 sec, seconds since wait started=3

=0, =0, =0

blocking sess=0x0 seq=3823

Dumping Session Wait History

for 'SGA: allocation forcing component growth' count=1 wait_time=0.000789 sec

=0, =0, =0

for 'SGA: allocation forcing component growth' count=1 wait_time=0.000732 sec

=0, =0, =0

for 'SGA: allocation forcing component growth' count=1 wait_time=0.000513 sec

=0, =0, =0

for 'SGA: allocation forcing component growth' count=1 wait_time=0.000521 sec

=0, =0, =0

分析RAC后台诊断进程DIAG进程生成的SYSTEMSTATE DUMP信息:

oracle @ macdb1 : /home/oracle $awk -f ass1033.awk

/home/oracle/app/oracle/admin/macdb/bdump/macdb1_diag_217786.trc

Starting Systemstate 1

..............................................................................

.....

Ass.Awk Version 1.0.33

~~~~~~~~~~~~~~~~~~~~~~

Source file : /home/oracle/app/oracle/admin/macdb/bdump/macdb1_diag_217786.trc

System State 1  (2012-08-30 11:56:52.566)

~~~~~~~~~~~~~~   ~~~~~~~~~~~~~~~~~~~~~~~

1:                                      [DEAD]

2:  last wait for 'pmon timer'

3:  last wait for 'DIAG idle wait'

4:  last wait for 'rdbms ipc message'

5:  last wait for 'rdbms ipc message'

6:  last wait for 'ges remote message'  [DEAD]

7:  last wait for 'gcs remote message'  [DEAD]

8:  last wait for 'gcs remote message'  [DEAD]

9:  last wait for 'gcs remote message'  [DEAD]

10: last wait for 'gcs remote message'  [DEAD]

11: last wait for 'gcs remote message'  [DEAD]

12: last wait for 'gcs remote message'  [DEAD]

13: last wait for 'gcs remote message'  [DEAD]

14: last wait for 'gcs remote message'  [DEAD]

15: last wait for 'gcs remote message'  [DEAD]

16: last wait for 'gcs remote message'  [DEAD]

17: last wait for 'gcs remote message'  [DEAD]

18: last wait for 'gcs remote message'  [DEAD]

19: last wait for 'rdbms ipc message'   [DEAD]

20: last wait for 'rdbms ipc message'   [DEAD]

21: last wait for 'rdbms ipc message'   [DEAD]

22: waiting for 'rdbms ipc message'     [DEAD]

23: waiting for 'rdbms ipc message'     [DEAD]

24: last wait for 'rdbms ipc message'   [DEAD]

25: waiting for 'rdbms ipc message'     [DEAD]

26: last wait for 'rdbms ipc message'   [DEAD]

27: last wait for 'rdbms ipc message'   [DEAD]

28: last wait for 'SGA: allocation forcing component growth' [DEAD]

29: waiting for 'rdbms ipc message'     [DEAD]

30: last wait for 'SGA: allocation forcing component growth' [DEAD]

31: last wait for 'SGA: allocation forcing component growth' [DEAD]

32: last wait for 'cursor: pin S wait on X'[Mutex 5d978f5c] [DEAD]

33: waiting for 'rdbms ipc message'     [DEAD]

34: last wait for 'SGA: allocation forcing component growth' [DEAD]

Cmd: Select

35: waiting for 'SQL*Net message from client' [DEAD]

36: waiting for 'rdbms ipc message'     [DEAD]

37: waiting for 'rdbms ipc message'     [DEAD]

38: waiting for 'SQL*Net message from client' [DEAD]

39: waiting for 'row cache lock'       [Rcache object=70000018ef42ea8,] [DEAD]

40: last wait for 'cursor: pin S wait on X'[Mutex fe59717e] [DEAD]

Cmd: Select

41: waiting for 'row cache lock'       [Rcache object=70000018ef42ea8,] [DEAD]

42: last wait for 'SGA: allocation forcing component growth' [DEAD]

Cmd: Select

43: waiting for 'Streams AQ: qmn coordinator idle wait' [DEAD]

44: last wait for 'row cache lock'     [Rcache object=70000018ef42ea8,] [DEAD]

45: waiting for 'Streams AQ: waiting for time management or cleanup tasks' [DEAD]

46: last wait for 'cursor: pin S wait on X'[Mutex bc5573b6] [DEAD]

47: waiting for 'SQL*Net message from client' [DEAD]

48: last wait for 'cursor: pin S wait on X'[Mutex fe59717e] [DEAD]

Cmd: Select

49: waiting for 'SQL*Net message from client' [DEAD]

50: waiting for 'SQL*Net message from client' [DEAD]

51: waiting for 'SQL*Net message from client' [DEAD]

52: last wait for 'cursor: pin S wait on X'[Mutex fe59717e] [DEAD]

Cmd: Select

53: last wait for 'SGA: allocation forcing component growth' [DEAD]

Cmd: Update

54: last wait for 'cursor: pin S wait on X'[Mutex bc5573b6] [DEAD]

Cmd: Select

55: last wait for 'cursor: pin S wait on X'[Mutex a154ba91] [DEAD]

Cmd: Select

56: last wait for 'SGA: allocation forcing component growth' [DEAD]

Cmd: Select

57: waiting for 'SQL*Net message from client' [DEAD]

58: last wait for 'SGA: allocation forcing component growth' [DEAD]

59: waiting for 'row cache lock'       [Rcache object=70000018ef42ea8,] [DEAD]

60: last wait for 'SGA: allocation forcing component growth' [DEAD]

Cmd: Select

61: last wait for 'cursor: pin S wait on X'[Mutex f3002747] [DEAD]

Cmd: Select

62: last wait for 'row cache lock'     [Rcache object=70000018ef42ea8,] [DEAD]

63: waiting for 'SQL*Net message from client' [DEAD]

64: last wait for 'row cache lock'     [Rcache object=70000018ef42ea8,] [DEAD]

65: last wait for 'row cache lock'     [Rcache object=70000018ef42ea8,] [DEAD]

66: waiting for 'SQL*Net message from client' [DEAD]

67: waiting for 'SQL*Net message from client' [DEAD]

68: waiting for 'SQL*Net message from client' [DEAD]

69: waiting for 'SQL*Net message from client' [DEAD]

70: waiting for 'SQL*Net message from client' [DEAD]

71: waiting for 'SQL*Net message from client' [DEAD]

72: waiting for 'SQL*Net message from client' [DEAD]

73: waiting for 'SQL*Net message from client' [DEAD]

74: waiting for 'SQL*Net message from client' [DEAD]

75: waiting for 'SQL*Net message from client' [DEAD]

76: waiting for 'SQL*Net message from client' [DEAD]

77: waiting for 'SQL*Net message from client' [DEAD]

78: last wait for 'library cache load lock'[LOAD: 70000015c903b10] [DEAD]

Cmd: Select

79: waiting for 'SQL*Net message from client' [DEAD]

80: waiting for 'SQL*Net message from client' [DEAD]

81: waiting for 'SQL*Net message from client' [DEAD]

82: waiting for 'SQL*Net message from client' [DEAD]

83: waiting for 'SQL*Net message from client' [DEAD]

Blockers

~~~~~~~~

Above is a list of all the processes. If they are waiting for a resource

then it will be given in square brackets. Below is a summary of the

waited upon resources, together with the holder of that resource.

Notes:

~~~~~

o A process id of '???' implies that the holder was not found in the

systemstate. (The holder may have released the resource before we

dumped the state object tree of the blocking process).

o Lines with 'Enqueue conversion' below can be ignored *unless*

other sessions are waiting on that resource too. For more, see

http://dlsunuk11.uk.oracle.com/Public/TOOLS/Ass.html#enqcnv)

Resource Holder State

Mutex 5d978f5c    ??? Blocker

Rcache object=70000018ef42ea8,    ??? Blocker

Mutex fe59717e    ??? Blocker

Mutex bc5573b6    ??? Blocker

Mutex a154ba91    ??? Blocker

Mutex f3002747    ??? Blocker

LOAD: 70000015c903b10    ??? Blocker

Blockers According to Tracefile Wait Info:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. This may not work for 64bit platforms. See bug 2902997 for details.

2. If the blocking process is shown as 0 then that session may no longer be

present.

3. If resources are held across code layers then sometimes the tracefile wait

info will not recognise the problem.

No blockers seen.

Object Names

~~~~~~~~~~~~

Mutex 5d978f5c

Rcache object=70000018ef42ea8,  cid=14(dc_profiles)

Mutex fe59717e

Mutex bc5573b6

Mutex a154ba91

Mutex f3002747

LOAD: 70000015c903b10

Summary of Wait Events Seen (count>10)

~~~~~~~~~~~~~~~~~~~~~~~~~~~

No wait events seen more than 10 times

可以从对SYSTEMSTATE DUMP的分析中看到,大量进程在等待'cursor: pin S wait on X' 'SGA: allocation forcing component growth' 和 'row cache lock'。

其中'row cache lock'等待的对象是(Rcache object=70000018ef42ea8,  cid=14(dc_profiles),dc_profiles字典缓存是session登陆的重要资源,该字典缓存锁被持有会导致新的会话无法登陆。

由于系统hang住的原因11:00-12:00的AWR未正常收集,但是参考10:00-11:00的AWR可以发现shared pool存在SHRINK操作:

Cache Sizes

Begin        End

Buffer Cache:        4,816M        4,832M        Std Block Size:        16K

Shared Pool Size:        1,264M        1,248M        Log Buffer:        30,708K

Shared Pool由1264MB收缩至1248MB,共享池的收缩可能导致部分关键的字典缓存被锁住或者FLUSH OUT,进而引起严重的解析等待和row cache lock,这将阻塞新的会话登陆。

但是在DIAG进程的SYSTEM DUMP中未体现出Blocker阻塞进程的信息,这可能是DIAG进程收集信息存在时间限制,在指定时间内仅够DUMP不完整的process信息。

由于缺乏完整的SYSTEMSTATE DUMP信息将阻碍我们进一步诊断分析问题,所以我们建议在系统hang住无反映的情况下,先收集SYSTEMSTATE DUMP再重启数据库实例,收集的方法如下:

--获取SYSTEMSTATE的方法:

使用ORADEBUG做

sqlplus / as sysdba

oradebug setmypid;

oradebug unlimit;

oradebug dump systemstate 266;

! sleep 60

oradebug dump systemstate 266;

oradebug tracefile_name;

建议

1.通过修改AUTO SGA自动内存管理模式为手动SGA管理(SGA_TARGET=0),避免shared pool的shrink行为,从而避免上述故障再次发生。

2.对于系统hang住的情况,优先收集systemstate dump再重启实例,方法如上文所述。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值