又见bug,动态调整SGA参数导致数据库crash

最近遇到这样一个事情,在动态的调整sga的一个参数的时候,数据库就down了:

SQL > alter system set DB_KEEP_CACHE_SIZE = 1 g ;
alter system set DB_KEEP_CACHE_SIZE = 1 g
*
ERROR at line 1 :
ORA - 03113 : end - of - file on communication channel
 
 
SQL >

对应的,在alertlog中的报错为:

Sat Nov 13 03:56:00 2010
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/bdump/micprd_mman_3264732.trc:
ORA-00600: internal error code, arguments: [kmgs_update_target_size_1], [0], [], [], [], [], [], []
Sat Nov 13 03:56:01 2010
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/bdump/micprd_mman_3264732.trc:
ORA-00600: internal error code, arguments: [kmgs_update_target_size_1], [0], [], [], [], [], [], []
Sat Nov 13 03:56:01 2010
MMAN: terminating instance due to error 822
Sat Nov 13 03:56:01 2010
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/bdump/micprd_mmon_2969868.trc:
ORA-00600: internal error code, arguments: [kmgs_perform_parameter_updates_1], [0], [], [], [], [], [], []
Sat Nov 13 03:56:03 2010
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/bdump/micprd_mmon_2969868.trc:
ORA-00600: internal error code, arguments: [kmgs_parameter_update_timeout_1], [600], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kmgs_perform_parameter_updates_1], [0], [], [], [], [], [], []
Instance terminated by MMAN, pid = 3264732

对应的tace文件中的call stack为:

----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              000000000 ? FFFFFFFFFFFC7D4 ?
ksedmp+0290          bl       ksedst               1048DFFC0 ?
ksfdmp+0018          bl       03F5B014
kgerinv+00dc         bl       _ptrgl
kgeasnmierr+004c     bl       kgerinv              FFFFFFFFFFFCB00 ? 000000000 ?
                                                   101045830 ? 000000010 ?
                                                   FFFFFFFFFFFCBA0 ?
kmgs_update_target_  bl       kgeasnmierr          110190110 ? 110440040 ?
size+006c                                          104A818A0 ? 100000001 ?
                                                   000000000 ? 000000000 ?
                                                   12E0BE826D694B2F ?
                                                   000000000 ?
kmgs_update_paramet  bl       kmgs_update_target_  900000000083144 ?
er_ctx+0354                   size                 FFFFFFFFFFFCDA0 ?
kmgs_update_param_m  bl       kmgs_update_paramet  000000002 ? FFFFFFFFFFFD468 ?
anual_helper+0078             er_ctx               010008000 ? 110011970 ?
kmgs_update_param_m  bl       kmgs_update_param_m  000000000 ? 000000001 ?
anual+0064                    anual_helper         000000002 ?
kmgsdrv+16cc         bl       kmgs_update_param_m  9000000000F6F64 ?
                              anual
ksbabs+03a8          bl       _ptrgl
ksbrdp+0408          bl       _ptrgl
opirip+03fc          bl       03F59B54
opidrv+0448          bl       opirip               110296850 ? 410298190 ?
                                                   FFFFFFFFFFFF960 ?
sou2o+0090           bl       opidrv               3202AB393C ? 4A0144924 ?
                                                   FFFFFFFFFFFF960 ?
opimai_real+0150     bl       01FC4EF4
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0098         bl       main                 000000000 ? 000000000 ?

经查metalink,应该是这以下的2个bug有关:

  1. Bug 6737235: DATABASE WITH SGA_TARGET CRASH WHEN ALTER SGA COMPONENTS
  2. Bug 6146397: ORA-00600 [KMGS_UPDATE_TARGET_SIZE_1], [0] INSTANCE TERMINATED BY MMAN

我操作的数据库是aix上oracle 10.2.0.3数据库。目前这2个bug都还没有在10203上的fix,只是建议升级到10204或者用workaround的解决。workaround是用alter system scope=spfile的方式修改,再重启数据库生效。

这让我感觉10g上的自动内存管理不太靠谱,虽然自动内存管理可以动态调整buffer cache和其他几个pool如share pool之间的比例,但是如果一个系统是成熟的系统,很好的进行了绑定变量,那么就不太会发生各个池子之间比例的变动。如果连动态的调整一下sga中的一个组件,都会导致MMAN将数据库down掉的话,那oracle是不是太脆弱了呢?

另外,在查询关于这个ora-600的相关报错信息的时候,还查到了另一个有意思的bug:

  1. MMON Terminates Instance With ORA-600 [kmgs_pre_process_request_6] or ORA-600 [kmgs_update_target_size_1] When Resizing Caches [ID 373802.1]

这个是Bug 4433838,影响10201至10203的所有平台,这个bug虽然和我的情况有点出入,文档说是MMON进程终止实例,我的情况是MMAN进程终止实例。但文档说触发这个bug的cause是sga的大小为4G的整数倍。呵呵,碰巧,我的sga_target和sga_max_size为12G,正是4G的倍数。而且ora-600之后的参数和trace文件中的信息也和我类似。

因此,综上,在10204以下的版本设置调整SGA的时候,需要注意以下几点:
1、SGA大小不要设置成4G的整数倍。
2、调整SGA内的组件的时候,不要动态的调整,利用scope=spfile再重启数据库进行修改。

附:

Bug 6737235: DATABASE WITH SGA_TARGET CRASH WHEN ALTER SGA COMPONENTS     
 
显示 Bug 属性 Bug 属性
类型     B - Defect     已在产品版本中修复     -
严重性     2 - Severe Loss of Service     产品版本     10.2.0.3
状态     36 - Duplicate Bug. To Filer     平台     23 - Oracle Solaris on SPARC (64-bit)
创建时间     08-Jan-2008     平台版本     -
更新时间     11-Aug-2009     基本 Bug     6192679
数据库版本     10.2.0.3         
影响平台     Generic         
产品源     Oracle         
 
显示相关产品 相关产品
产品线     Oracle Database Products     系列     Oracle Database
区域     Oracle Database     产品     5 - Oracle Server - Enterprise Edition
 
Hdr: 6737235 10.2.0.3 RDBMS 10.2.0.3 MEMORY MGMT PRODID-5 PORTID-23 6192679
Abstract: DATABASE WITH SGA_TARGET CRASH WHEN ALTER SGA COMPONENTS
 
*** 01/08/08 10:47 pm ***
TAR:
----
 
PROBLEM:
--------
The parameter setting are as follows:
 
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ---------
    shared_pool_size                     big integer 608M
    sga_max_size                         big integer 12G
    sga_target                           big integer 12G
    db_cache_size                        big integer 9G
    *.java_pool_size=209715200
    *.large_pool_size=209715200
    *.streams_pool_size=50331648
 
DEFAULT buffer cache in the v$sga_dynamic_components shows the below value :
 
CURRENT_SIZE      MIN_SIZE    USER_SPECIFIED_SIZE
------------      --------    -------------------
1.1744E+10     1.1744E+10      9663676416
 
When try to increase the shared_pool_size to 700M :
 
    SQL> alter system set shared_pool_size=700M;
    alter system set shared_pool_size=700M
    *
    ERROR at line 1:
    ORA-2097: parameter cannot be modified because specified value is
invalid
    ORA-4033: Insufficient memory to grow pool
    
    
    SQL> select name from v$database;
    select name from v$database
    *
    ERROR at line 1:
    ORA-3135: connection lost contact
 
After the above error message the database crashes.
below is the message in the alert log :
    Errors in file
/u01/app/oracle/admin/LSTDDB1/bdump/lstddb1_mman_12427.trc:
    ORA-600: internal error code, arguments: [kmgs_update_target_size_1],
[0], [], [], [], [], [], []
 
 
When try to decrease the size of the default buffer cache getting:
 
    SQL> alter system set db_cache_size=10G;
    alter system set db_cache_size=10G
    *
    ERROR at line 1:
    ORA-3113: end-of-file on communication channel
 
and the database crashes.
the alert shows the following:
 
Errors in file /u01/app/oracle/admin/LSTDDB1/bdump/lstddb1_mman_11825.trc:
ORA-600: internal error code, arguments: [kmgs_pre_process_request_6], [6],
[700], [576], [3], [0x67C7B9398], [], []
 
 
DIAGNOSTIC ANALYSIS:
--------------------
 
WORKAROUND:
-----------
None
 
RELATED BUGS:
-------------
Bug 6192679
 
REPRODUCIBILITY:
----------------
Everytime at customer site
 
TEST CASE:
----------
 
STACK TRACE:
------------
 
SUPPORTING INFORMATION:
-----------------------
 
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
 
DIAL-IN INFORMATION:
--------------------
 
IMPACT DATE:
------------
 
*** 01/08/08 10:49 pm ***
*** 01/08/08 10:53 pm ***
*** 01/08/08 10:53 pm *** (CHG: Sta->16)
*** 01/09/08 07:51 am *** (CHG: Asg->NEW OWNER OWNER)
*** 01/09/08 08:53 am *** (CHG: Sta->10 SubComp->MEMORY MGMT)
*** 01/09/08 08:53 am ***
*** 01/10/08 08:20 pm ***
*** 01/10/08 08:20 pm *** (CHG: Sta->16)
*** 01/11/08 08:33 am *** (CHG: Sta->10)
*** 01/11/08 08:33 am ***
*** 01/28/08 10:43 pm ***
*** 01/28/08 10:43 pm *** (CHG: Sta->16)
*** 01/29/08 06:17 am *** (CHG: Sta->10)
*** 01/29/08 06:17 am ***
*** 01/29/08 06:42 am *** (CHG: Sta->16)
*** 01/29/08 06:42 am ***
*** 01/29/08 06:44 am *** (CHG: Sta->10)
*** 01/29/08 06:44 am *** (CHG: Sta->16)
*** 01/29/08 10:00 am *** (CHG: Sta->11)
*** 01/29/08 10:00 am ***
*** 01/29/08 10:00 am *** (ADD: Impact/Symptom->INTERNAL ERROR )
*** 01/29/08 10:01 am *** (CHG: Asg->NEW OWNER OWNER)
*** 01/29/08 10:01 am ***
*** 02/13/08 06:07 pm ***
*** 05/06/08 06:21 pm ***
*** 05/06/08 06:29 pm ***
*** 05/06/08 06:31 pm ***
*** 08/07/08 10:57 am *** (CHG: Sta->36)
Bug 6146397: ORA-00600 [KMGS_UPDATE_TARGET_SIZE_1], [0] INSTANCE TERMINATED BY MMAN     
 
显示 Bug 属性 Bug 属性
类型     B - Defect     已在产品版本中修复     -
严重性     3 - Minimal Loss of Service     产品版本     10.2.0.3.0
状态     33 - Suspended, Req'd Info not Avail     平台     212 - IBM AIX on POWER Systems (64-bit)
创建时间     21-Jun-2007     平台版本     -
更新时间     17-Sep-2007     基本 Bug     -
数据库版本     10.2.0.3         
影响平台     Generic         
产品源     Oracle         
 
显示相关产品 相关产品
产品线     Oracle Database Products     系列     Oracle Database
区域     Oracle Database     产品     5 - Oracle Server - Enterprise Edition
 
Hdr: 6146397 10.2.0.3 RDBMS 10.2.0.3.0 MEMORY MGMT PRODID-5 PORTID-212
Abstract: ORA-600  [KMGS_UPDATE_TARGET_SIZE_1], [0]  INSTANCE TERMINATED BY MMAN
 
*** 06/21/07 05:05 pm ***
TAR:
----
 
PROBLEM:
--------
When dynamically allocating memory for running multiple sized buffer pools
statement fails and instance terminates
 
DIAGNOSTIC ANALYSIS:
--------------------
ALTER SYSTEM SET db_16k_cache_size='256M' SCOPE=BOTH;
ORA-600: internal error code, arguments: [kmgs_update_target_size_1], [0],
[],
..
MMAN: terminating instance due to error 822
 
WORKAROUND:
-----------
1) ALTER SYSTEM SET db_16k_cache_size='
< size > ' scope=spfile
2) Restart instance
 
RELATED BUGS:
-------------
 
REPRODUCIBILITY:
----------------
 
TEST CASE:
----------
 
STACK TRACE:
------------
ksedst ksedmp ksfdmp kgerinv kgeasnmierr kmgs_update_target_size
kmgs_update_parameter_ctx kmgs_update_param_manual_helper
kmgs_update_param_manual kmgsdrv ksbabs ksbrdp opirip opidrv
sou2o opimai_real main start
 
SUPPORTING INFORMATION:
-----------------------
irp3_mman_1794226.trc
alert_IRP3.log
 
24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------
 
DIAL-IN INFORMATION:
--------------------
 
IMPACT DATE:
------------
 
*** 06/21/07 05:13 pm *** (CHG: Sta->16)
*** 06/21/07 10:32 pm *** (CHG: Asg->NEW OWNER OWNER)
*** 06/22/07 06:26 am ***
*** 06/22/07 06:28 am *** (CHG: Sta->10)
*** 09/17/07 10:27 pm *** (CHG: Sta->33 SubComp->MEMORY MGMT)
*** 09/17/07 10:27 pm ***
MMON Terminates Instance With ORA-600 [kmgs_pre_process_request_6] or ORA-600 [kmgs_update_target_size_1] When Resizing Caches [ID 373802.1]
      修改时间 23-MAR-2010     类型 PROBLEM     状态 PUBLISHED     
 
In this Document
  Symptoms
  Cause
  Solution
  References
 
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3
Information in this document applies to any platform.
Symptoms
 
The MMAN process terminates the instance with the the following error is reported in the database alert.log file:
ORA-00600: internal error code, arguments: [kmgs_pre_process_request_6]
 
OR
 
ORA-00600: internal error code, arguments: [kmgs_update_target_size_1]
 
The call stack from the ORA-00600 trace file will be similar to:
ksddoa
< - ksdpcg < - ksdpec < - ksfpec < - kgerinv < - kgeasnmierr < - quest < - kmgsdrv
 
or
 
kmgs_update_target_size < - kmgs_update_parameter_ctx < - kmgs_update_param_manual_helper < - kmgs_update_param_manual
 
Cause
This is due to Bug 4433838
Similar issues have been filed as bugs which have also been determined to be duplicates of Bug 4433838 .
 
The error occurs when the parameter SGA_TARGET is set to an exact multiple of 4Gb .
 
For instance , if the SGA_TARGET value in HEXADECIMAL has 00000000 as the last 4 bytes then you can hit this problem . The bug indicates this happens on an exact multiple of 4G .
 
Example:
SGA_TARGET ( 20 Gb ) = 21474836480 -- > HEXADECIMAL -->( 0x500000000) --->(value in HEXADECIMAL has 00000000 as the last 4 bytes).
 
SGA_TARGET (4 Gb) = 4294967296 --> HEXADECIMAL -->( 0x100000000)  --->(value in HEXADECIMAL has 00000000 as the last 4 bytes).
Solution
 
1.  Upgrade to 10.2.0.4 or higher.
 
OR
 
2 Apply the one-off Patch 4433838 depending on its availability for your patchset-level and platform
 
OR
 
3.  Use the workaround:
 
    Set your SGA_TARGET parameter to any non-multiple values of  4 Gb.
    In other words, ensure that your SGA_TARGET value in HEXADECIMAL does not have 00000000 as the last 4 bytes.
 
    Example:
    If your SGA_TARGET is set to 20 Gb (21474836480) = 0x500000000 (in Hexadecimal).  The workaround consists of changing its value to 0x510000000 (in Hexadecimal) = 21743271936
 
    If your SGA_TARGET is set to 4 Gb (4294967296) = 0x100000000 (in Hexadecimal).  The workaround consists of changing its value to 0x100000001 (in Hexadecimal) = 4294967297
 
 
References
BUG:4433838 - ORA-600 [KMGS_PRE_PROCESS_REQUEST_6] TERMINATES INSTANCE WHEN RESIZING CACHES
BUG:6192679 - MMAN ORA-600[KMGS_UPDATE_TARGET_SIZE_1],[0] AND INSTANCE DOWN
 
显示相关信息 相关的
产品
 
    * Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Enterprise Edition
 
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值