oracle exp 04031,ORA-04031 故障详细处理案例一则。

weixin_39924179

于 2021-04-04 03:08:10 发布

阅读量171

点赞数

文章标签： oracle exp 04031

博文地址：ORA-04031 故障解决案例详细分析一则(彩色生动标注版)

环境：

Oracle 10g

AIX 6.1

HACMP 6

现象：

1.数据库Down。

2.alert日志中频繁报ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool",...)错误。

3.数据库每晚的逻辑导出报错：

TESTDB:/expbackup#>tail -200 backehr140322.log

EXP-00056: ORACLE error 1034 encountered

ORA-01034: ORACLE not available

ORA-27101: shared memory realm does not exist

IBM AIX RISC System/6000 Error: 2: No such file or directory

EXP-00005: all allowable logon attempts failed

EXP-00000: Export terminated unsuccessfully

即使是手动发起备份脚本，也是报同样的错误，备份无法进行，但是每晚的RMAN定时备份能正常进行，每天的备份集状态正常。

分析：

Oracle的解释：

[oracle@TESTDB ~]$ oerr ORA 4031

04031, 00000, "unable to allocate %s bytes of shared memory (\"%s\",\"%s\",\"%s\",\"%s\")"

// *Cause: More shared memory is needed than was allocated in the shared

// pool or Streams pool.

// *Action: If the shared pool is out of memory, either use the

// DBMS_SHARED_POOL package to pin large packages,

// reduce your use of shared memory, or increase the amount of

// available shared memory by increasing the value of the

// initialization parameters SHARED_POOL_RESERVED_SIZE and

// SHARED_POOL_SIZE.

// If the large pool is out of memory, increase the initialization

// parameter LARGE_POOL_SIZE.

// If the error is issued from an Oracle Streams or XStream process,

// increase the initialization parameter STREAMS_POOL_SIZE or increase

// the capture or apply parameter MAX_SGA_SIZE.

查阅了一些Oracle的资料，发现ORA-04031错误一般可能由于两个原因引起的：

1.内存中存在大量碎片，导致在分配内存的时候，没有连续的内存可存放，此问题一般是需要在开发的角度上入手，比如增加绑定变量，减少应解析来改善和避免；

2.内存容量不足，需要扩大内存。

Alert Log在数据库Down之前的报了很多ORA-04031错误：

Sat Mar 22 19:00:09 2014

Errors in file /oracle/admin/oratest/bdump/oratest_j001_233886.trc:

ORA-12012: error on auto execute of job 8950

ORA-04031: unable to allocate ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select name,password,datats#...","Typecheck","kgghteInit")

ORA-06512: at "SYS.DBMS_SPACE", line 3289

ORA-06512: at line 1

ORA-00604: error occurred at recursive SQL level 5

ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select name,password,datats#...","Typecheck","kgghteInit")

ORA-06512: at "SYS.PRVT_ADVISOR", line 1624

ORA-06512: at "SYS.DBMS_ADVISOR", line 186

ORA-06512: at "SYS.DBMS_SPACE", line 1500

ORA-06512: at "SYS.DBMS_SPACE", line 1566

bytes of shared memory ("","","","")

Sat Mar 22 20:28:28 2014

Errors in file /oracle/admin/oratest/bdump/orahtest_j000_278630.trc:

ORA-12012: error on auto execute of job 42781

ORA-04031: unable to allocate ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select x.inst_id,x.indx+1,ks...","Typecheck","kgghtInit")

ORA-06512: at "SYS.DBMS_ISCHED", line 3047

ORA-06512: at "SYS.DBMS_SCHEDULER", line 1926

ORA-06512: at "EXFSYS.DBMS_RLMGR_DR", line 3150

ORA-01403: no data found

ORA-06512: at line 1

bytes of shared memory ("","","","")

Sat Mar 22 20:55:46 2014

Errors in file /oracle/admin/oratest/bdump/oratest_reco_319580.trc:

ORA-04031: unable to allocate 4120 bytes of shared memory ("shared pool","select host,userid,password,...","Typecheck","kgghteInit")

Sat Mar 22 20:55:46 2014

RECO: terminating instance due to error 4031

Instance terminated by RECO, pid = 319580

其中重点查看导致Instance Down掉的这条日志，是由于进程号为319580的RECO进程导致的，提示是无法在shared pool中分配4120bytes的内存块，具体的信息需要分析一下数据库的跟踪文件：orahr_reco_319580.trc，在这里，不要用tkprof之类的工具进行处理，直接用文本编辑器打开分析，其中比较值得注意的地方有：

==============================

Memory Utilization of Subpool 1

================================

Allocation Name Size

_________________________ __________

"free memory " 4306672

"miscellaneous " 304

"Undo Meta Data " 144

"dpslut_kfdsg " 512

"File Space Usage " 11336

"trace buffer " 950272

"trace_knlasg " 504

由此看到，虽然free memory并不大，但是其确实大于报错中提到的4120bytes，由此看来，有可能引起该报错的原因有两个，就是上文中提到的：

1.确实有碎片；2.内存可能不足。

在取数据分析的时候，正好是上班时间，因此只能通过一些临时手段进行该问题的处理，比较有效的方法是：

刷共享池：alter system flush shared_pool;

作此操作一定要注意！千万不要在业务繁忙的时候做，很有可能引起宕机！可以在中午或者是下班时刻执行，执行之前，如果有条件，最好做一次全备，不论是逻辑导出还是RMAN备份。

内存不足这个原因需要做进一步的分析，用数据说话，不能单凭感觉和经验。

跟客户和相关负责人了解了一下情况，得知客户在不久前刚进行过一次数据库的迁移，我感觉可能是由于在迁移过程中有地方疏忽造成的隐患。

由于之前帮客户处理数据库问题的时候，抓过数据库的各种信息，所以手头上有当时数据库的详细状态记录；经查看parameter后，发现，有很多参数跟迁移前不同，其中sga_target和sga_max_size这两个值比以前小几十倍。

因此初步断定，当前的内存确实可能较之前比非常不足，跟客户反映后，客户同意无业务时间进行参数修改和重启数据库的工作。

于是在下班后，将数据库参数修改，命令如下：

alter system set sga_max_size=5000M scope=spfile;

alter system set sga_target=5000M scope=spfile;

重启数据库后，手动发起逻辑备份脚本，发现执行成功！

待做完全备以后，将数据库切换到HACMP的另一节点，修改参数，重启，导出均成功后，确定操作有效，判断正确。

至此为止基本的处理已经告一段落，但是问题的解决要除根！具体是因为哪个环节导致的这个隐患的存在一定要找出来，否则每一次迁移都要承受数据库宕机之苦。

经调查，发现当初迁移的时候，是将参数文件通过create pfile from spfile生成的，然后拷贝到另一台机器上进行的startup from pfile=‘***’。

后来打开这个pfile一看，里面只有短短的10几行，但是在数据库迁移前后我发现有20多个参数是有区别的，因此判断是在恢复操作中很多没有明文声明的参数可能是按照默认的值设定的，但是这些参数在迁移之前可能是有人为了适应应用环境手动调整过的。

因此，恢复参数文件的方法是存在问题的，单纯的移动到目的地，然后通过startup pfile='***'看来是会导致很多参数无法复原到之前的状态。

在此，我建议参数文件用Oracle RMAN去恢复，因此在备份的时候，一定要注意备份相关的控制文件和参数文件，备份相关的操作演示请参阅：RMAN备份、恢复实验室之备份篇【备份实验】

恢复演示如下：

1.启动实例并恢复参数文件：

RMAN> startup nomount;

startup failed: ORA-01078: failure in processing system parameters

LRM-00109: could not open parameter file '/oracle/app/oracle/product/11.2.0/db_1/dbs/initorcl.ora'

starting Oracle instance without parameter file for retrieval of spfile

Oracle instance started

Total System Global Area 158662656 bytes

Fixed Size 2211448 bytes

Variable Size 92275080 bytes

Database Buffers 58720256 bytes

Redo Buffers 5455872 bytes

RMAN> restore spfile from '/oracle/app/oracle/product/11.2.0/db_1/dbs/c-1347976258-20140318-01';

Starting restore at 18-MAR-14

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=96 device type=DISK

channel ORA_DISK_1: restoring spfile from AUTOBACKUP /oracle/app/oracle/product/11.2.0/db_1/dbs/c-1347976258-20140318-01

channel ORA_DISK_1: SPFILE restore from AUTOBACKUP complete

Finished restore at 18-MAR-14

2.设置DBID:

RMAN> set dbid=1347976258

executing command: SET DBID

3.恢复控制文件：

RMAN> restore controlfile from '/oracle/app/oracle/product/11.2.0/db_1/dbs/c-1347976258-20140318-01';

Starting restore at 18-MAR-14

using target database control file instead of recovery catalog

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=125 device type=DISK

channel ORA_DISK_1: restoring control file

channel ORA_DISK_1: restore complete, elapsed time: 00:00:08

output file name=/oracle/app/oradata/ORCL/controlfile/o1_mf_8xm4g27m_.ctl

Finished restore at 18-MAR-14

4.将数据库置于MOUNT状态：

RMAN> alter database mount;

database mounted

released channel: ORA_DISK_1

5.查看备份集信息：

由于控制文件中有备份信息，因此恢复控制文件后便可看到具体的备份信息：

List of Backups

===============

Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag

------- -- -- - ----------- --------------- ------- ------- ---------- ---

37 B 0 A DISK 02-AUG-13 1 1 NO TAG20130802T054625

38 B 0 A DISK 02-AUG-13 1 1 NO TAG20130802T054625

39 B 1 A DISK 02-AUG-13 1 1 NO TAG20130802T055026

40 B 1 A DISK 02-AUG-13 1 1 NO TAG20130802T055026

41 B 1 A DISK 18-OCT-13 1 1 NO TAG20131018T005620

42 B 1 A DISK 18-OCT-13 1 1 NO TAG20131018T005620

43 B A A DISK 19-NOV-13 1 1 NO TAG20131119T013353

44 B 0 A DISK 19-NOV-13 1 1 NO TAG20131119T013521

45 B 0 A DISK 19-NOV-13 1 1 NO TAG20131119T013521

46 B A A DISK 19-NOV-13 1 1 NO TAG20131119T013801

47 B A A DISK 07-FEB-14 1 1 NO TAG20140207T002846

48 B 1 A DISK 07-FEB-14 1 1 NO TAG20140207T003003

49 B A A DISK 07-FEB-14 1 1 NO TAG20140207T003109

50 B F A DISK 07-FEB-14 1 1 NO TAG20140207T003110

51 B A A DISK 07-FEB-14 1 1 NO TAG20140207T011201

52 B 1 A DISK 07-FEB-14 1 1 NO TAG20140207T011203

53 B A A DISK 07-FEB-14 1 1 NO TAG20140207T011249

54 B F A DISK 07-FEB-14 1 1 NO TAG20140207T011250

55 B A A DISK 07-FEB-14 1 1 NO TAG20140207T013906

56 B 0 A DISK 07-FEB-14 1 1 NO TAG20140207T013908

57 B A A DISK 07-FEB-14 1 1 NO TAG20140207T014045

58 B F A DISK 07-FEB-14 1 1 NO TAG20140207T014046

59 B 1 A DISK 10-MAR-14 1 1 NO ZHANGZICHAO_STUDY

60 B F A DISK 10-MAR-14 1 1 NO TAG20140310T035229

61 B A A DISK 18-MAR-14 1 1 NO TAG20140318T032915

62 B 1 A DISK 18-MAR-14 1 1 NO TAG20140318T033619

63 B A A DISK 18-MAR-14 1 1 NO TAG20140318T034112

64 B F A DISK 18-MAR-14 1 1 NO TAG20140318T034114

65 B F A DISK 18-MAR-14 1 1 NO TAG20140318T034213

具体查看备份集的方法有很多，请参阅：RMAN备份、恢复实验室之备份篇【查看备份情况】

确认备份集都在之后，进行数据库恢复即可！

6. Restore Database:

RMAN> restore database;

Starting restore at 18-MAR-14

allocated channel: ORA_DISK_1

channel ORA_DISK_1: SID=133 device type=DISK

channel ORA_DISK_1: starting datafile backup set restore

channel ORA_DISK_1: specifying datafile(s) to restore from backup set

channel ORA_DISK_1: restoring datafile 00001 to /oracle/app/oradata/ORCL/datafile/o1_mf_system_9h7jv8g1_.dbf

channel ORA_DISK_1: restoring datafile 00002 to /oracle/app/oradata/ORCL/datafile/o1_mf_sysaux_8xm483f7_.dbf

channel ORA_DISK_1: restoring datafile 00003 to /oracle/app/oradata/ORCL/datafile/o1_mf_undotbs1_8xm483gd_.dbf

channel ORA_DISK_1: restoring datafile 00004 to /oracle/app/oradata/ORCL/datafile/o1_mf_users_9ksobds9_.dbf

channel ORA_DISK_1: restoring datafile 00005 to /oracle/app/oradata/ORCL/datafile/o1_mf_example_9h7hct57_.dbf

channel ORA_DISK_1: restoring datafile 00006 to /oracle/app/oradata/catalog_rman/rmantbs01.dbf

channel ORA_DISK_1: reading from backup piece /oracle/app/oracle/product/11.2.0/db_1/dbs/1rp002ls_1_1

channel ORA_DISK_1: piece handle=/oracle/app/oracle/product/11.2.0/db_1/dbs/1rp002ls_1_1 tag=TAG20140207T013908

channel ORA_DISK_1: restored backup piece 1

channel ORA_DISK_1: restore complete, elapsed time: 00:06:39

Finished restore at 18-MAR-14

7. Recoer Database:

RMAN> recover database;

Starting recover at 18-MAR-14

using channel ORA_DISK_1

channel ORA_DISK_1: starting incremental datafile backup set restore

channel ORA_DISK_1: specifying datafile(s) to restore from backup set

destination for restore of datafile 00002: /oracle/app/oradata/ORCL/datafile/o1_mf_sysaux_9lgs3p0k_.dbf

destination for restore of datafile 00003: /oracle/app/oradata/ORCL/datafile/o1_mf_undotbs1_9lgs3pmp_.dbf

destination for restore of datafile 00004: /oracle/app/oradata/ORCL/datafile/o1_mf_users_9lgs3q3d_.dbf

destination for restore of datafile 00005: /oracle/app/oradata/ORCL/datafile/o1_mf_example_9lgs3p6t_.dbf

destination for restore of datafile 00006: /oracle/app/oradata/catalog_rman/rmantbs01.dbf

channel ORA_DISK_1: reading from backup piece /oracle/app/oracle/product/11.2.0/db_1/dbs/1up2pv13_1_1

channel ORA_DISK_1: piece handle=/oracle/app/oracle/product/11.2.0/db_1/dbs/1up2pv13_1_1 tag=ZHANGZICHAO_STUDY

channel ORA_DISK_1: restored backup piece 1

channel ORA_DISK_1: restore complete, elapsed time: 00:00:11

channel ORA_DISK_1: starting incremental datafile backup set restore

channel ORA_DISK_1: specifying datafile(s) to restore from backup set

destination for restore of datafile 00001: /oracle/app/oradata/ORCL/datafile/o1_mf_system_9lgs3ov6_.dbf

destination for restore of datafile 00002: /oracle/app/oradata/ORCL/datafile/o1_mf_sysaux_9lgs3p0k_.dbf

destination for restore of datafile 00003: /oracle/app/oradata/ORCL/datafile/o1_mf_undotbs1_9lgs3pmp_.dbf

destination for restore of datafile 00004: /oracle/app/oradata/ORCL/datafile/o1_mf_users_9lgs3q3d_.dbf

destination for restore of datafile 00005: /oracle/app/oradata/ORCL/datafile/o1_mf_example_9lgs3p6t_.dbf

destination for restore of datafile 00006: /oracle/app/oradata/catalog_rman/rmantbs01.dbf

channel ORA_DISK_1: reading from backup piece /oracle/app/oracle/product/11.2.0/db_1/dbs/21p3f19n_1_1

channel ORA_DISK_1: piece handle=/oracle/app/oracle/product/11.2.0/db_1/dbs/21p3f19n_1_1 tag=TAG20140318T033619

channel ORA_DISK_1: restored backup piece 1

channel ORA_DISK_1: restore complete, elapsed time: 00:00:07

starting media recovery

archived log for thread 1 with sequence 134 is already on disk as file /oracle/app/oradata/ORCL/onlinelog/o1_mf_2_8xm4g6s5_.log

archived log for thread 1 with sequence 135 is already on disk as file /oracle/app/oradata/ORCL/onlinelog/o1_mf_3_8xm4g8gt_.log

archived log file name=/oracle/app/oradata/ORCL/onlinelog/o1_mf_2_8xm4g6s5_.log thread=1 sequence=134

archived log file name=/oracle/app/oradata/ORCL/onlinelog/o1_mf_3_8xm4g8gt_.log thread=1 sequence=135

media recovery complete, elapsed time: 00:00:03

Finished recover at 18-MAR-14

8.开启数据库：

RMAN> alter database open resetlogs;

database opened

操作完成！

weixin_39924179

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫