第一次遇到ORA-04031的错误!!!!

29 篇文章 0 订阅
27 篇文章 0 订阅

早上CRM的人说Exadata的数据库无法登陆,一看发现节点一个节点Hang住了,使用SQL*Plus登陆会卡死,且日志中提示:无法分配共享内存。

查看alert日志发现,其中出现了很多ORA-04031的错:

Errors in file /u01/app/oracle/diag/rdbms/srcbfin/SRCBFIN2/trace/SRCBFIN2_smon_12501.trc:

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^e949e218","kglHeapInitialize:temp")

Tue Sep 25 08:55:46 2012

Errors in file /u01/app/oracle/diag/rdbms/srcbfin/SRCBFIN2/trace/SRCBFIN2_smon_12501.trc:

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^e949e218","kglHeapInitialize:temp")

Errors in file /u01/app/oracle/diag/rdbms/srcbfin/SRCBFIN2/trace/SRCBFIN2_smon_12501.trc:

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^e949e218","kglHeapInitialize:temp")

Tue Sep 25 08:56:06 2012

Errors in file /u01/app/oracle/diag/rdbms/srcbfin/SRCBFIN2/trace/SRCBFIN2_smon_12501.trc:

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^e949e218","kglHeapInitialize:temp")

Errors in file /u01/app/oracle/diag/rdbms/srcbfin/SRCBFIN2/trace/SRCBFIN2_smon_12501.trc:

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select t.ts#,t.file#,t.block...","KGLH0^19d2357e","kglHeapInitialize:temp")

Tue Sep 25 08:56:26 2012

Errors in file /u01/app/oracle/diag/rdbms/srcbfin/SRCBFIN2/trace/SRCBFIN2_smon_12501.trc:

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","unknown object","KGLH0^e949e218","kglHeapInitialize:temp")

 

截取trace文件的部分内容如下:

*** 2012-09-25 05:18:53.209

AUTO MEM: backup fail a, 0, 25296896, 384, 385, 0, 100

 

*** 2012-09-25 05:18:53.209

=================================

Begin 4031 Diagnostic Information

=================================

The following information assists Oracle in diagnosing

causes of ORA-4031 errors.  This trace may be disabled

by setting the init.ora _4031_dump_bitvec = 0

=====================================

Allocation Request Summary Informaton

=====================================

Current information setting:  04014fff

  SGA Heap Dump Interval=3600 seconds

  Dump Interval=300 seconds

  Last Dump Time=09/25/2012 05:18:52

  Dump Count=1

Allocation request for: kglHeapInitialize:temp

  Heap: 0xc565e0020, size: 32

******************************************************

HEAP DUMP heap name="sga heap"  desc=0x60001190

 extent sz=0x9800 alt=248 het=32767 rec=9 flg=-126 opc=0

 parent=(nil) owner=(nil) nex=(nil) xsz=0x1 heap=(nil)

 fl2=0x60, nex=(nil)

 ds for latch 1: 0x60056bf0 0x60058448 0x60059ca0 0x6005b4f8

 ds for latch 2: 0x600604b8 0x60061d10 0x60063568 0x60064dc0

 ds for latch 3: 0x60069d80 0x6006b5d8 0x6006ce30 0x6006e688

 ds for latch 4: 0x60073648 0x60074ea0 0x600766f8 0x60077f50

 ds for latch 5: 0x6007cf10 0x6007e768 0x6007ffc0 0x60081818

 ds for latch 6: 0x600867d8 0x60088030 0x60089888 0x6008b0e0

 reserved granule count 0 (granule size 134217728)

******************************************************

HEAP DUMP heap name="KGLH0^e949e218"  desc=0xc565e0020

 extent sz=0xfe8 alt=32767 het=56 rec=0 flg=2 opc=2

 parent=0x60001190 owner=0xc565dffd0 nex=(nil) xsz=0x8000000 heap=(nil)

 fl2=0x26, nex=(nil)

 Subheap has 0 bytes of memory allocated

====================

Process State Object

====================

----------------------------------------

SO: 0xc00d41558, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3

 proc=0xc00d41558, name=process, file=ksu.h LINE:12616, pg=0

(process) Oracle pid:24, ser:1, calls cur/top: 0xb9713fb58/0xc1147da10

          flags : (0x16) SYSTEM

          flags2: (0x0),  flags3: (0x10)

          intr error: 0, call error: 0, sess error: 0, txn error 0

          intr queue: empty

  ksudlp FALSE at location: 0

  (post info) last post received: 4248 0 2

              last post received-location: ksl2.h LINE:2362 ID:kslpsr

              last process to post me: c00d404a8 1 6

              last post sent: 0 0 26

              last post sent-location: ksa2.h LINE:285 ID:ksasnd

              last process posted by me: c00d404a8 1 6

  (latch info) wait_event=0 bits=0

  Process Group: DEFAULT, pseudo proc: 0xc58e7a120

  O/S info: user: oracle, term: UNKNOWN, ospid: 12501

  OSD pid info: Unix process pid: 12501, image: oracle@dm01db02.srcb.com (SMON)

PSO child state object changes :

Dump of memory from 0x0000000C10CE8E58 to 0x0000000C10CE9060

C10CE8E50                   00000000 00000000          [........]

C10CE8E60 00000000 00000000 00000000 00000000  [................]

  Repeat 31 times

=========================

User Session State Object

=========================

----------------------------------------

SO: 0xc20ea7708, type: 4, owner: 0xc00d41558, flag: INIT/-/-/0x00 if: 0x3 c: 0x3

 proc=0xc00d41558, name=session, file=ksu.h LINE:12624, pg=0

(session) sid: 1 ser: 1 trans: (nil), creator: 0xc00d41558

          flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-

          flags2: (0x40409) -/-/INC

          DID: , short-term DID:

          txn branch: (nil)

          oct: 0, prv: 0, sql: (nil), psql: 0xc4c326750, user: 0/SYS

ksuxds FALSE at location: 0

service name: SYS$BACKGROUND

Current Wait Stack:

  Not in wait; last wait ended 0.000602 sec ago

Wait State:

  fixed_waits=0 flags=0x21 boundary=(nil)/-1

Session Wait History:

    elapsed time of 0.000629 sec since last wait

 0: waited for 'SGA: allocation forcing component growth'

    =0x0, =0x0, =0x0

    wait_id=5546611 seq_num=42378 snap_id=101

    wait times: snap=0.000000 sec, exc=5.041349 sec, total=5.050030 sec

    wait times: max=infinite

    wait counts: calls=100 os=100

    occurred after 0.000000 sec of elapsed time

 1: waited for 'SGA: allocation forcing component growth'

    =0x0, =0x0, =0x0

    wait_id=5546711 seq_num=42377 snap_id=1

    wait times: snap=0.000101 sec, exc=0.000101 sec, total=0.000101 sec

    wait times: max=infinite

    wait counts: calls=1 os=1

    occurred after 0.000000 sec of elapsed time

 

Sampled Session History of session 1 serial 1

---------------------------------------------------

The sampled session history is constructed by sampling

the target session every 1 second. The sampling process

captures at each sample if the session is in a non-idle wait,

an idle wait, or not in a wait. If the session is in a

non-idle wait then one interval is shown for all the samples

the session was in the same non-idle wait. If the

session is in an idle wait or not in a wait for

consecutive samples then one interval is shown for all

the consecutive samples. Though we display these consecutive

samples  in a single interval the session may NOT be continuously

idle or not in a wait (the sampling process does not know).

 

The history is displayed in reverse chronological order.

 

sample interval: 1 sec, max history 120 sec

---------------------------------------------------

  [1 sample,                                                          05:18:52]

    waited for 'SGA: allocation forcing component growth', seq_num: 42362

      p1: ''=0x0

      p2: ''=0x0

      p3: ''=0x0

      time_waited: >= 0 sec (still in wait)

  [1 sample,                                                          05:18:51]

    waited for 'SGA: allocation forcing component growth', seq_num: 42322

      p1: ''=0x0

      p2: ''=0x0

      p3: ''=0x0

      time_waited: 0.050906 sec (sample interval: 0 sec)

  [1 sample,                                                          05:18:50]

    waited for 'SGA: allocation forcing component growth', seq_num: 42282

      p1: ''=0x0

      p2: ''=0x0

      p3: ''=0x0

      time_waited: 0.050941 sec (sample interval: 0 sec)

    idle wait at each sample

  [5 samples not collected,                                05:17:59 - 05:18:03]

  [55 samples,                                             05:17:04 - 05:17:58]

    idle wait at each sample

  [5 samples not collected,                                05:16:59 - 05:17:03]

  [17 samples,                                             05:16:53 - 05:16:58]

    idle wait at each sample

---------------------------------------------------

Sampled Session History Summary:

  longest_non_idle_wait: 'SGA: allocation forcing component growth'

  [1 sample, 05:18:50 ]

      time_waited: 0.050941 sec (sample interval: 0 sec)

---------------------------------------------------

temporary object counter: 0

----------------------------------------

 

从以上日志中,会发现存在很多和SGA分配的等待事件:SGA:allocation forcing component growth,看来这个问题可以暂时确定为是“SGA被撑爆了”。这个等待事件在AWR报告中也可以看到:

Top 5 Timed Foreground Events

Event

Waits

Time(s)

Avg wait (ms)

% DB time

Wait Class

DB CPU

 

129

 

26.90

 

SGA: allocation forcing component growth

612

105

172

21.96

Other

latch: shared pool

11

88

8040

18.48

Concurrency

SQL*Net more data to client

384,182

55

0

11.59

Network

cell smart table scan

6,645

16

2

3.44

User I/O

 

SGA中的两个主要的区域是Share Pool和Buffer Cache,先使用以下语句查看这个两个区域在的值在SGA中所占比例的变化情况:

共享池最近3天的变化趋势 :

select to_char(b.end_interval_time,'yyyy-mm-dd hh24:mi:ss') eit,a.value/1024/1024

from dba_hist_parameter a ,dba_hist_snapshot b

where a.snap_id=b.snap_id

and a.instance_number=b.instance_number

and a.instance_number=2

and a.parameter_name like '%shared_pool_size%'

and b.end_interval_time>=trunc(sysdate)-3

order by b.end_interval_time;

 

db cache最近3天的变化趋势:

select to_char(b.end_interval_time,'yyyy-mm-dd hh24:mi:ss') eit,a.value/1024/1024

from dba_hist_parameter a ,dba_hist_snapshot b

where a.snap_id=b.snap_id

and a.instance_number=b.instance_number

and a.instance_number=2

and a.parameter_name like '%db_cache_size%'

and b.end_interval_time>=trunc(sysdate)-3

order by b.end_interval_time;

 

把结果放到excel里面生成如下图,其中蓝色区域为SharePool的值,红色区域为BufferCache的值:


从上图中可以看出,其实从昨天开始,SharePool就不断侵占BufferCache的空间,最终导致BufferCache空间所剩无几,直至应用失败。这个在AWR报告中也可以看出来:

Cache Sizes

Begin

End

Buffer Cache:

256M

256M

Std Block Size:

8K

Shared Pool Size:

48,128M

48,128M

Log Buffer:

35,252K

 

看来此次的故障是由于SGA中SharedPool被某些回话(作业)所大量侵占之后,导致其他回话无法申请SharePool所致,并且那些侵占SharedPool的回话(作业)还在不断的申请SharedPool空间。后来经过排查发现是最近应用上了个程序补数据,并且是使用游标的方式,同时由于应用程序没仔细检查,导致无限循环,作业无法释放资源所致。

解决方法:

1、  重启实例

2、  由于之前SharedPool和BufferCache都是自动管理的,没有设置为最小固定值,所以SharedPool才会侵占BufferCache的空间,因此修改SharedPool和BufferCache为固定值,且保留一定的冗余,但是这个并不是避免该问题的方案,因为如果SharedPool再次被回话(作业)完全侵占也是会出现此问题。


可以参考一下网上大牛们关于ORA-04031错误的分析:

Eygle:               DBA手记:共享池的改进与ORA-04031的变化
 老熊的三分地:ORA-04031案例一则
 DBA Notes:   如何解决ORA-04031 错误




 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值