oracle rac ora 01012,多cpu环境中运行root.sh失败,asm报ORA-04031

有朋友和我反馈,说他们在装linux 6.5上面装11.2.0.3的rac出现异常,root.sh在第一个节点执行就失败了,请求帮助

6ca578c8b7d23dbe11bcade2a66f6737.png

根据上面记录,查看asmca日志

[main] [ 2015-07-24 12:49:35.885 CST ] [SQLEngine.reInitialize:738] Reinitializing SQLEngine...

[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889] OracleHome.getVersion called. Current Version: 11.2.0.3.0

[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:957] Current Version From Inventory: 11.2.0.3.0

[main] [ 2015-07-24 12:49:35.885 CST ] [OracleHome.getVersion:889] OracleHome.getVersion called. Current Version: 11.2.0.3.0

[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957] Current Version From Inventory: 11.2.0.3.0

[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:889] OracleHome.getVersion called. Current Version: 11.2.0.3.0

[main] [ 2015-07-24 12:49:35.886 CST ] [OracleHome.getVersion:957] Current Version From Inventory: 11.2.0.3.0

[main] [ 2015-07-24 12:49:35.886 CST ] [SQLPlusEngine.getCmmdParams:222] m_home 11.2.0.3.0

[main] [ 2015-07-24 12:49:35.887 CST ] [SQLPlusEngine.getCmmdParams:223] version > 112 true

[main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:555] Default NLS_LANG: AMERICAN_AMERICA.AL32UTF8

[main] [ 2015-07-24 12:49:35.887 CST ] [SQLEngine.getEnvParams:565] NLS_LANG: AMERICAN_AMERICA.AL32UTF8

[main] [ 2015-07-24 12:49:35.888 CST ] [SQLEngine.initialize:325] Execing SQLPLUS/SVRMGR process...

[main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:362] m_bReaderStarted: false

[main] [ 2015-07-24 12:49:35.900 CST ] [SQLEngine.initialize:366] Starting Reader Thread...

[main] [ 2015-07-24 12:49:35.901 CST ] [SQLEngine.initialize:415] Waiting for m_bReaderStarted to be true

[main] [ 2015-07-24 12:49:35.972 CST ] [SQLEngine.done:2189] Done called

[main] [ 2015-07-24 12:49:35.972 CST ] [UsmcaLogger.logException:173] SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM

[main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:174] ORA-01012: not logged on

[main] [ 2015-07-24 12:49:35.973 CST ] [UsmcaLogger.logException:175] oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-01012: not logged on

oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1658)

oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeQuery(SQLEngine.java:831)

oracle.sysman.assistants.usmca.backend.USMInstance.configureLocalASM(USMInstance.java:3036)

oracle.sysman.assistants.usmca.service.UsmcaService.configureLocalASM(UsmcaService.java:1049)

oracle.sysman.assistants.usmca.model.UsmcaModel.performConfigureLocalASM(UsmcaModel.java:944)

oracle.sysman.assistants.usmca.model.UsmcaModel.performOperation(UsmcaModel.java:797)

oracle.sysman.assistants.usmca.Usmca.execute(Usmca.java:174)

oracle.sysman.assistants.usmca.Usmca.main(Usmca.java:369)

[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:173] SEVERE:method oracle.sysman.assistants.usmca.backend.USMInstance:configureLocalASM

[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:174] ORA-03113: end-of-file on communication channel

[main] [ 2015-07-24 12:49:35.989 CST ] [UsmcaLogger.logException:175] oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-03113: end-of-file on communication channel

这里可以看出来,asm实例无法登陆(ORA-01012和ORA-03113),根据这样的错误,分析asm日志

Reconfiguration complete

Fri Jul 24 12:49:29 2015

LCK0 started with pid=22, OS id=46913

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmd0_46887.trc (incident=81):

ORA-04031: unable to allocate 7072 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","ges resource ")

Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_81/+ASM1_lmd0_46887_i81.trc

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc (incident=177):

ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")

Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_177/+ASM1_lck0_46913_i177.trc

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lmon_46885.trc (incident=73):

ORA-04031: unable to allocate 632 bytes of shared memory ("shared pool","unknown object","sga heap(1,1)","name-service ")

Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_73/+ASM1_lmon_46885_i73.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_lck0_46913.trc:

ORA-04031: unable to allocate 760 bytes of shared memory ("shared pool","unknown object","KKSSP^1343","kglss")

System state dump requested by (instance=1, osid=46913 (LCK0)), summary=[abnormal instance termination].

System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_diag_46879.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

LCK0 (ospid: 46913): terminating the instance due to error 4031

Fri Jul 24 12:49:35 2015

ORA-1092 : opitsk aborting process

Instance terminated by LCK0, pid = 46913

进一步分析asm日志,发现是大家熟悉的asm的ORA-4031问题,那就是说明数据库在执行root.sh的时候使用默认参数文件启动asm的时候shared pool不够大(根据ORACLE最佳实践,建议memory_target=1536M及其以上值),从而出现该问题。类似Bug 14292825 ORA-4031 in ASM as default memory parameters values for 11.2 ASM instances low,根据官方描述该问题在11.2.0.4中修复

26176c71e99d8ab01fa3a42db411b908.png

通过asm日志发现相关默认值配置

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

With the Real Application Clusters and Automatic Storage Management options.

ORACLE_HOME = /u01/app/11.2.0/grid

System name:Linux

Node name:RAC01

Release:2.6.32-358.el6.x86_64

Version:#1 SMP Tue Jan 29 11:47:41 EST 2013

Machine:x86_64

Using parameter settings in client-side pfile /u01/app/11.2.0/grid/dbs/init+ASM1.ora on machine RAC01

System parameters with non-default values:

large_pool_size = 16M

instance_type = "asm"

remote_login_passwordfile= "EXCLUSIVE"

asm_power_limit = 1

diagnostic_dest = "/u01/app/grid"

Cluster communication is configured to use the following interface(s) for this instance

10.10.10.31

cluster interconnect IPC version:Oracle UDP/IP (generic)

IPC Vendor 1 proto 2

Fri Jul 24 12:49:27 2015

通过查询/proc/cpuinfo,检查cpu数量

processor: 191

vendor_id: GenuineIntel

cpu family: 6

model: 62

model name: Intel(R) Xeon(R) CPU E7-8850 v2 @ 2.30GHz

stepping: 7

cpu MHz: 1200.000

cache size: 24576 KB

physical id: 7

siblings: 24

core id: 13

cpu cores: 12

apicid: 251

initial apicid: 251

fpu: yes

fpu_exception: yes

cpuid level: 13

wp: yes

而根据How To Determine The Default Number Of Subpools Allocated During Startup (Doc ID 455179.1)中描述

最多7个subpool(这里一共有192个cpu,因此subpool数量为7)

0cc1538a5281c227e060d9d62f7aa5cd.png

每个suppool最少512m内存,因此shared pool最小需要3.5G(而默认值几百M,远远不够)

603c52058e13e9e1d157cd88b905372b.png

由于cpu多,导致shared pool的Subpools 更加多,使得shared pool的需求量更加大。至此本次故障原因可以总结:

由于cpu较多,需要更多的shared pool,而11.2.0.3中由于asm默认内存分配较少,导致在asm启动之时出现shared pool不足(本身默认值小,而且shared pool需求大,从而出现了ORA-04031就不奇怪了),因为运行root.sh过程中asm无法正常启动,从而使得root.sh运行失败。

处理办法:临时disable部分cpu,然后重新执行root.sh,修改asm内存分配,再enable cpu.

特别说明:此故障acs的兄弟遇到过,所以这次我能够快速反应,感谢acs兄弟们的帮忙,另外有权限的朋友可以看看:3-10479952701和3-7976215751等sr描述

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值