这个机器的CPU_COUNT设置有问题,机器实际只有2个CPU,每个CPU 6个核,总共12个核.
PARALLEL_MAX_SERVERS 是并行SQL所能用的进程数上限,这个值是由以下公式算出,
(CPU_COUNT x PARALLEL_THREADS_PER_CPU x (2 if PGA_AGGREGATE_TARGET > 0; otherwise 1) x 5)DAC中, CPU_COUNT=24, PARALLEL_THREADS_PER_CPU=2, PGA_AGGREGATE_TARGET= 3464M,算得PARALLEL_MAX_SERVERS=480
这就是为啥并行SQL语句中被人为指定最大并行度为48(24*2=48),
从2月23日挂机后的日志中可以
看到DB重启时先自动做了如下改动.Tue Feb 23 12:16:59 2016Adjusting the default value of parameter parallel_max_serversfrom 480 to 285 due to the value of parameter processes (300)可见最大上限已经达到,
这里可以把CPU_COUNT降低为12,这样PARALLEL_MAX_SERVERS=240,可以防止之前进程过多导致的高负载和挂机现象的发生.
select * from v$parameter where name like '%cpu%' /*'parallel_max_servers'*/ /*'resource_manager_plan' */
*********************************************************************************************
前面一篇文章讲述了oracle cpu_count过高引发了一个bug,导数oracle无法启动,详见连接
http://blog.csdn.net/xionglang7/article/details/9181791
这个是测试库的问题,但是过了两天生产库也出现同样的问题了,所以大家彻查了一下原因。
目前了解的情况是与CPU核数过高、数据库的并行参数(paraller_max_server)设置过高(256)等有关,ORACLE的算法在低版本有BUG(在cpu核数超过128、并行参数设置过高时容易触发)。
Alert日志显示在进行一个insert语句时触发了该bug引起宕机。之后重启时由于并行恢复参数设置问题,同样触发了BUG,导致无法启动。经查询metalink,通过降低CPU核数绕过该bug,完成启动。目前metalink上未找到当前版本的补丁,后续将开展升级整改。
那么我们就来看看paraller_max_server这个东西是怎么计算的?
metlink上记录:
- With 11.2.0.2 there is a new method to compute the default for PARALLEL_MAX_SERVERS.
- In the Oracle Rdbms Reference Guide we find:
- parallel_max_servers = PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 5
- In the formula, the value assigned to concurrent_parallel_users running at the default degree of parallelism on an
- instance is dependent on the memory management setting.
- - If automatic memory management is disabled (manual mode), then the value of concurrent_parallel_users is 1.
- - If PGA automatic memory management is enabled, then the value of concurrent_parallel_users is 2.
- - If global memory management or SGA memory target is used in addition to PGA automatic memory management,
- then the value of concurrent_parallel_users is 4.
- The value is capped by processes -15 (this is true for versions prior 11.2.0.2 as well).
- As example we have the following values
- parallel_threads_per_cpu = 2
- cpu_count = 4
- pga_aggregate_target = 500M
- sga_target = 900M
- processes = 150
- parallel_max_servers = 2 * 4 * 4 * 5 = 160
- parallel_max_servers = min( 150-15 , 160 ) = 135
- So with these values we get a default of 135 for parallel_max_servers.
- Note if the parallel_max_servers is reduced due to value of processes, then you see similar to the following in alert log (e.g. at instance start up):
- Mon May 06 18:43:06 2013
- Adjusting the default value of parameter parallel_max_servers
- from 160 to 135 due to the value of parameter processes (150)
- Starting ORACLE instance (normal)
所以如果这个参数太高,并行的进程就太大了,导数数据库无法承受。
这个参数值可以通过下面的语句查询:
select * from dba_hist_parameter b where b.parameter_name='parallel_max_servers' order by b.snap_iddesc
从上面的内容看出oracle并行进程不能设置的太高。
CPU_COUNT参数:
http://docs.oracle.com/cd/B19306_01/server.102/b14237/initparams032.htm#REFRN10023
PARALLEL_MAX_SERVERS 参数:
http://docs.oracle.com/cd/B19306_01/server.102/b14237/initparams153.htm#REFRN10158