又见bug，动态调整SGA参数导致数据库crash

最新推荐文章于 2024-08-26 17:03:22 发布

huyangg

最新推荐文章于 2024-08-26 17:03:22 发布

阅读量1.2k

点赞数

分类专栏： oracle10g

oracle10g 专栏收录该内容

4 篇文章 0 订阅

订阅专栏

最近遇到这样一个事情，在动态的调整sga的一个参数的时候，数据库就down了：

SQL > alter system set DB_KEEP_CACHE_SIZE = 1 g ;
alter system set DB_KEEP_CACHE_SIZE = 1 g
*
ERROR at line 1 :
ORA - 03113 : end - of - file on communication channel

SQL >

对应的，在alertlog中的报错为：

Sat Nov 13 03:56:00 2010
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/bdump/micprd_mman_3264732.trc:
ORA-00600: internal error code, arguments: [kmgs_update_target_size_1], [0], [], [], [], [], [], []
Sat Nov 13 03:56:01 2010
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/bdump/micprd_mman_3264732.trc:
ORA-00600: internal error code, arguments: [kmgs_update_target_size_1], [0], [], [], [], [], [], []
Sat Nov 13 03:56:01 2010
MMAN: terminating instance due to error 822
Sat Nov 13 03:56:01 2010
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/bdump/micprd_mmon_2969868.trc:
ORA-00600: internal error code, arguments: [kmgs_perform_parameter_updates_1], [0], [], [], [], [], [], []
Sat Nov 13 03:56:03 2010
Errors in file /u01/app/oracle/product/10.2.0/admin/MICPRD/bdump/micprd_mmon_2969868.trc:
ORA-00600: internal error code, arguments: [kmgs_parameter_update_timeout_1], [600], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kmgs_perform_parameter_updates_1], [0], [], [], [], [], [], []
Instance terminated by MMAN, pid = 3264732

对应的tace文件中的call stack为：

----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c bl ksedst1 000000000 ? FFFFFFFFFFFC7D4 ?
ksedmp+0290 bl ksedst 1048DFFC0 ?
ksfdmp+0018 bl 03F5B014
kgerinv+00dc bl _ptrgl
kgeasnmierr+004c bl kgerinv FFFFFFFFFFFCB00 ? 000000000 ?
   101045830 ? 000000010 ?
   FFFFFFFFFFFCBA0 ?
kmgs_update_target_ bl kgeasnmierr 110190110 ? 110440040 ?
size+006c 104A818A0 ? 100000001 ?
   000000000 ? 000000000 ?
   12E0BE826D694B2F ?
   000000000 ?
kmgs_update_paramet bl kmgs_update_target_ 900000000083144 ?
er_ctx+0354 size FFFFFFFFFFFCDA0 ?
kmgs_update_param_m bl kmgs_update_paramet 000000002 ? FFFFFFFFFFFD468 ?
anual_helper+0078 er_ctx 010008000 ? 110011970 ?
kmgs_update_param_m bl kmgs_update_param_m 000000000 ? 000000001 ?
anual+0064 anual_helper 000000002 ?
kmgsdrv+16cc bl kmgs_update_param_m 9000000000F6F64 ?
   anual
ksbabs+03a8 bl _ptrgl
ksbrdp+0408 bl _ptrgl
opirip+03fc bl 03F59B54
opidrv+0448 bl opirip 110296850 ? 410298190 ?
   FFFFFFFFFFFF960 ?
sou2o+0090 bl opidrv 3202AB393C ? 4A0144924 ?
   FFFFFFFFFFFF960 ?
opimai_real+0150 bl 01FC4EF4
main+0098 bl opimai_real 000000000 ? 000000000 ?
__start+0098 bl main 000000000 ? 000000000 ?

经查metalink，应该是这以下的2个bug有关：

Bug 6737235: DATABASE WITH SGA_TARGET CRASH WHEN ALTER SGA COMPONENTS
Bug 6146397: ORA-00600 [KMGS_UPDATE_TARGET_SIZE_1], [0] INSTANCE TERMINATED BY MMAN

我操作的数据库是aix上oracle 10.2.0.3数据库。目前这2个bug都还没有在10203上的fix，只是建议升级到10204或者用workaround的解决。workaround是用alter system scope=spfile的方式修改，再重启数据库生效。

这让我感觉10g上的自动内存管理不太靠谱，虽然自动内存管理可以动态调整buffer cache和其他几个pool如share pool之间的比例，但是如果一个系统是成熟的系统，很好的进行了绑定变量，那么就不太会发生各个池子之间比例的变动。如果连动态的调整一下sga中的一个组件，都会导致MMAN将数据库down掉的话，那oracle是不是太脆弱了呢？

另外，在查询关于这个ora-600的相关报错信息的时候，还查到了另一个有意思的bug：

MMON Terminates Instance With ORA-600 [kmgs_pre_process_request_6] or ORA-600 [kmgs_update_target_size_1] When Resizing Caches [ID 373802.1]

这个是Bug 4433838，影响10201至10203的所有平台，这个bug虽然和我的情况有点出入，文档说是MMON进程终止实例，我的情况是MMAN进程终止实例。但文档说触发这个bug的cause是sga的大小为4G的整数倍。呵呵，碰巧，我的sga_target和sga_max_size为12G，正是4G的倍数。而且ora-600之后的参数和trace文件中的信息也和我类似。

因此，综上，在10204以下的版本设置调整SGA的时候，需要注意以下几点：
1、SGA大小不要设置成4G的整数倍。 2、调整SGA内的组件的时候，不要动态的调整，利用scope=spfile再重启数据库进行修改。

附：

Bug 6737235: DATABASE WITH SGA_TARGET CRASH WHEN ALTER SGA COMPONENTS

显示 Bug 属性 Bug 属性
类型 B - Defect 已在产品版本中修复 -
严重性 2 - Severe Loss of Service 产品版本 10.2.0.3
状态 36 - Duplicate Bug. To Filer 平台 23 - Oracle Solaris on SPARC (64-bit)
创建时间 08-Jan-2008 平台版本 -
更新时间 11-Aug-2009 基本 Bug 6192679
数据库版本 10.2.0.3
影响平台 Generic
产品源 Oracle

显示相关产品相关产品
产品线 Oracle Database Products 系列 Oracle Database
区域 Oracle Database 产品 5 - Oracle Server - Enterprise Edition

Hdr: 6737235 10.2.0.3 RDBMS 10.2.0.3 MEMORY MGMT PRODID-5 PORTID-23 6192679
Abstract: DATABASE WITH SGA_TARGET CRASH WHEN ALTER SGA COMPONENTS

*** 01/08/08 10:47 pm ***
TAR:
----

PROBLEM:
--------
The parameter setting are as follows:

   NAME TYPE VALUE
   ------------------------------------ ----------- ---------
   shared_pool_size big integer 608M
   sga_max_size big integer 12G
   sga_target big integer 12G
   db_cache_size big integer 9G
   *.java_pool_size=209715200
   *.large_pool_size=209715200
   *.streams_pool_size=50331648

DEFAULT buffer cache in the v$sga_dynamic_components shows the below value :

CURRENT_SIZE MIN_SIZE USER_SPECIFIED_SIZE
------------ -------- -------------------
1.1744E+10 1.1744E+10 9663676416

When try to increase the shared_pool_size to 700M :

   SQL> alter system set shared_pool_size=700M;
   alter system set shared_pool_size=700M
   *
   ERROR at line 1:
   ORA-2097: parameter cannot be modified because specified value is
invalid
   ORA-4033: Insufficient memory to grow pool


   SQL> select name from v$database;
   select name from v$database
   *
   ERROR at line 1:
   ORA-3135: connection lost contact

After the above error message the database crashes.
below is the message in the alert log :
   Errors in file
/u01/app/oracle/admin/LSTDDB1/bdump/lstddb1_mman_12427.trc:
   ORA-600: internal error code, arguments: [kmgs_update_target_size_1],
[0], [], [], [], [], [], []

When try to decrease the size of the default buffer cache getting:

   SQL> alter system set db_cache_size=10G;
   alter system set db_cache_size=10G
   *
   ERROR at line 1:
   ORA-3113: end-of-file on communication channel

and the database crashes.
the alert shows the following:

Errors in file /u01/app/oracle/admin/LSTDDB1/bdump/lstddb1_mman_11825.trc:
ORA-600: internal error code, arguments: [kmgs_pre_process_request_6], [6],
[700], [576], [3], [0x67C7B9398], [], []

DIAGNOSTIC ANALYSIS:
--------------------

WORKAROUND:
-----------
None

RELATED BUGS:
-------------
Bug 6192679

REPRODUCIBILITY:
----------------
Everytime at customer site

TEST CASE:
----------

STACK TRACE:
------------

SUPPORTING INFORMATION:
-----------------------

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------

DIAL-IN INFORMATION:
--------------------

IMPACT DATE:
------------

*** 01/08/08 10:49 pm ***
*** 01/08/08 10:53 pm ***
*** 01/08/08 10:53 pm *** (CHG: Sta->16)
*** 01/09/08 07:51 am *** (CHG: Asg->NEW OWNER OWNER)
*** 01/09/08 08:53 am *** (CHG: Sta->10 SubComp->MEMORY MGMT)
*** 01/09/08 08:53 am ***
*** 01/10/08 08:20 pm ***
*** 01/10/08 08:20 pm *** (CHG: Sta->16)
*** 01/11/08 08:33 am *** (CHG: Sta->10)
*** 01/11/08 08:33 am ***
*** 01/28/08 10:43 pm ***
*** 01/28/08 10:43 pm *** (CHG: Sta->16)
*** 01/29/08 06:17 am *** (CHG: Sta->10)
*** 01/29/08 06:17 am ***
*** 01/29/08 06:42 am *** (CHG: Sta->16)
*** 01/29/08 06:42 am ***
*** 01/29/08 06:44 am *** (CHG: Sta->10)
*** 01/29/08 06:44 am *** (CHG: Sta->16)
*** 01/29/08 10:00 am *** (CHG: Sta->11)
*** 01/29/08 10:00 am ***
*** 01/29/08 10:00 am *** (ADD: Impact/Symptom->INTERNAL ERROR )
*** 01/29/08 10:01 am *** (CHG: Asg->NEW OWNER OWNER)
*** 01/29/08 10:01 am ***
*** 02/13/08 06:07 pm ***
*** 05/06/08 06:21 pm ***
*** 05/06/08 06:29 pm ***
*** 05/06/08 06:31 pm ***
*** 08/07/08 10:57 am *** (CHG: Sta->36)

Bug 6146397: ORA-00600 [KMGS_UPDATE_TARGET_SIZE_1], [0] INSTANCE TERMINATED BY MMAN

显示 Bug 属性 Bug 属性
类型 B - Defect 已在产品版本中修复 -
严重性 3 - Minimal Loss of Service 产品版本 10.2.0.3.0
状态 33 - Suspended, Req'd Info not Avail 平台 212 - IBM AIX on POWER Systems (64-bit)
创建时间 21-Jun-2007 平台版本 -
更新时间 17-Sep-2007 基本 Bug -
数据库版本 10.2.0.3
影响平台 Generic
产品源 Oracle

显示相关产品相关产品
产品线 Oracle Database Products 系列 Oracle Database
区域 Oracle Database 产品 5 - Oracle Server - Enterprise Edition

Hdr: 6146397 10.2.0.3 RDBMS 10.2.0.3.0 MEMORY MGMT PRODID-5 PORTID-212
Abstract: ORA-600 [KMGS_UPDATE_TARGET_SIZE_1], [0] INSTANCE TERMINATED BY MMAN

*** 06/21/07 05:05 pm ***
TAR:
----

PROBLEM:
--------
When dynamically allocating memory for running multiple sized buffer pools
statement fails and instance terminates

DIAGNOSTIC ANALYSIS:
--------------------
ALTER SYSTEM SET db_16k_cache_size='256M' SCOPE=BOTH;
ORA-600: internal error code, arguments: [kmgs_update_target_size_1], [0],
[],
..
MMAN: terminating instance due to error 822

WORKAROUND:
-----------
1) ALTER SYSTEM SET db_16k_cache_size=' < size > ' scope=spfile
2) Restart instance

RELATED BUGS:
-------------

REPRODUCIBILITY:
----------------

TEST CASE:
----------

STACK TRACE:
------------
ksedst ksedmp ksfdmp kgerinv kgeasnmierr kmgs_update_target_size
kmgs_update_parameter_ctx kmgs_update_param_manual_helper
kmgs_update_param_manual kmgsdrv ksbabs ksbrdp opirip opidrv
sou2o opimai_real main start

SUPPORTING INFORMATION:
-----------------------
irp3_mman_1794226.trc
alert_IRP3.log

24 HOUR CONTACT INFORMATION FOR P1 BUGS:
----------------------------------------

DIAL-IN INFORMATION:
--------------------

IMPACT DATE:
------------

*** 06/21/07 05:13 pm *** (CHG: Sta->16)
*** 06/21/07 10:32 pm *** (CHG: Asg->NEW OWNER OWNER)
*** 06/22/07 06:26 am ***
*** 06/22/07 06:28 am *** (CHG: Sta->10)
*** 09/17/07 10:27 pm *** (CHG: Sta->33 SubComp->MEMORY MGMT)
*** 09/17/07 10:27 pm ***

MMON Terminates Instance With ORA-600 [kmgs_pre_process_request_6] or ORA-600 [kmgs_update_target_size_1] When Resizing Caches [ID 373802.1]
   修改时间 23-MAR-2010 类型 PROBLEM 状态 PUBLISHED

In this Document
Symptoms
Cause
Solution
References

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 10.2.0.3
Information in this document applies to any platform.
Symptoms

The MMAN process terminates the instance with the the following error is reported in the database alert.log file:
ORA-00600: internal error code, arguments: [kmgs_pre_process_request_6]

OR

ORA-00600: internal error code, arguments: [kmgs_update_target_size_1]

The call stack from the ORA-00600 trace file will be similar to:
ksddoa < - ksdpcg < - ksdpec < - ksfpec < - kgerinv < - kgeasnmierr < - quest < - kmgsdrv

or

kmgs_update_target_size < - kmgs_update_parameter_ctx < - kmgs_update_param_manual_helper < - kmgs_update_param_manual

Cause
This is due to Bug 4433838
Similar issues have been filed as bugs which have also been determined to be duplicates of Bug 4433838 .

The error occurs when the parameter SGA_TARGET is set to an exact multiple of 4Gb .

For instance , if the SGA_TARGET value in HEXADECIMAL has 00000000 as the last 4 bytes then you can hit this problem . The bug indicates this happens on an exact multiple of 4G .

Example:
SGA_TARGET ( 20 Gb ) = 21474836480 -- > HEXADECIMAL -->( 0x500000000) --->(value in HEXADECIMAL has 00000000 as the last 4 bytes).

SGA_TARGET (4 Gb) = 4294967296 --> HEXADECIMAL -->( 0x100000000) --->(value in HEXADECIMAL has 00000000 as the last 4 bytes).
Solution

1. Upgrade to 10.2.0.4 or higher.

OR

2 Apply the one-off Patch 4433838 depending on its availability for your patchset-level and platform

OR

3. Use the workaround:

   Set your SGA_TARGET parameter to any non-multiple values of 4 Gb.
   In other words, ensure that your SGA_TARGET value in HEXADECIMAL does not have 00000000 as the last 4 bytes.

   Example:
   If your SGA_TARGET is set to 20 Gb (21474836480) = 0x500000000 (in Hexadecimal). The workaround consists of changing its value to 0x510000000 (in Hexadecimal) = 21743271936

   If your SGA_TARGET is set to 4 Gb (4294967296) = 0x100000000 (in Hexadecimal). The workaround consists of changing its value to 0x100000001 (in Hexadecimal) = 4294967297

References
BUG:4433838 - ORA-600 [KMGS_PRE_PROCESS_REQUEST_6] TERMINATES INSTANCE WHEN RESIZING CACHES
BUG:6192679 - MMAN ORA-600[KMGS_UPDATE_TARGET_SIZE_1],[0] AND INSTANCE DOWN

显示相关信息相关的
产品

   * Oracle Database Products > Oracle Database > Oracle Database > Oracle Server - Enterprise Edition

原文地址： http://www.oracleblog.org/working-case/do-not-set-sga-multiple-of-4gb/