一条sql语句导致的数据库宕机问题及分析

之前分享过一篇博文,是一条sql语句"导致"的数据库宕机,上次是另有原因,这次真碰到一个案例,而且是在重要的环境上,希望大家引以为戒。
数据库是基于Linux64的版本,版本是11.2.0.2.0,已经打了最新的psu.
数据库的访问用户数大约在1000左右,当时查看服务器的cpu已经是100%了,有大约10个进程都是cpu 100%,数据库逻辑读也是超高,一秒钟大约是接近百兆的情况,sga是12G,已用了sga的自动管理(sga_target=0), 查看内存组件时发现buffer_cache已经有shrink的迹象,而且buffer_cache的min_size还是有一点小,就在可用范围内给buffer cache 增大了几百兆的样子,生成了一个ADDM, 报告里第一条就是希望设置sga_target为一个特定的值,性能可能会有一定的提升,当时想,sga_max_size都已经是12G了,设置sga_target=12G也没有问题吧
就按照它的提示做了,
alter system set sga_target=12G;
结果命令提顿了几秒钟,然后就崩出来一个end_of_communicaiton的ora错误,我感觉出问题了,已查看进程,数据库是真down掉了。
查看alert日志,发现时由于resize_sga的ora-600问题导致的,所有的在线进程都被自动给kill掉了。

然后马上和相应的team来协调,把数据库先startup了。再查看具体的信息。
alert日志如下:
Thread 1 advanced to log sequence 14054 (LGWR switch)
  Current log# 2 seq# 14054 mem# 0: /dbtestPR1/oracle/TEST01/redolog_A2/redo/redo02A.log
  Current log# 2 seq# 14054 mem# 1: /dbtestPR1/oracle/TEST01/redolog_B2/redo/redo02B.log
Wed Apr 09 20:07:10 2014
Archived Log entry 14090 added for thread 1 sequence 14053 ID 0xb8c6d509 dest 1:
Wed Apr 09 20:40:13 2014
Errors in file /opt/app/oracle/dbtestpr1/diag/rdbms/TEST01/TEST01/trace/TEST01_mman_27182.trc (incident=360075):
ORA-00600: internal error code, arguments: [kmgsb_resize_sga_target_1], [0], [768], [4], [], [], [], [], [], [], [], []
Incident details in: /opt/app/oracle/dbtestpr1/diag/rdbms/TEST01/TEST01/incident/incdir_360075/TEST01_mman_27182_i360075.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /opt/app/oracle/dbtestpr1/diag/rdbms/TEST01/TEST01/trace/TEST01_mman_27182.trc:
ORA-00600: internal error code, arguments: [kmgsb_resize_sga_target_1], [0], [768], [4], [], [], [], [], [], [], [], []
MMAN (ospid: 27182): terminating the instance due to error 822
Wed Apr 09 20:40:14 2014
opiodr aborting process unknown ospid (25518) as a result of ORA-1092
Wed Apr 09 20:40:14 2014
ORA-1092 : opitsk aborting process
Wed Apr 09 20:40:14 2014
Wed Apr 09 20:40:14 2014
opiodr aborting process unknown ospid (27776) as a result of ORA-1092opiodr aborting process unknown ospid (10547) as a result of ORA-1092
Wed Apr 09 20:40:14 2014
opiodr aborting process unknown ospid (7458) as a result of ORA-1092
Wed Apr 09 20:40:14 2014
Wed Apr 09 20:40:14 2014
ORA-1092 : opitsk aborting process
ORA-1092 : opitsk aborting process
Wed Apr 09 20:40:14 2014
ORA-1092 : opitsk aborting process
Wed Apr 09 20:40:14 2014
opiodr aborting process unknown ospid (30719) as a result of ORA-1092
Wed Apr 09 20:40:14 2014
ORA-1092 : opitsk aborting process
.......
ORA-1092 : opitsk aborting process
Wed Apr 09 20:40:14 2014
System state dump requested by (instance=1, osid=27182 (MMAN)), summary=[abnormal instance termination].
System State dumped to trace file /opt/app/oracle/dbtestpr1/diag/rdbms/TEST01/TEST01/trace/TEST01_diag_27176.trc
Instance terminated by MMAN, pid = 27182

查看metalink,确实有这样的一个bug.
Bug 10173135 - Resize SGA_TARGET crashes instance with ORA-600 [kmgsb_resize_sga_target_1] (Doc ID 10173135.8)

而且影响的版本如下:
Product (Component) Oracle Server (Rdbms)
Range of versions believed to be affected Versions BELOW 12.1
Versions confirmed as being affected
Platforms affected Generic (all / most platforms affected)

真是撞到枪口上了,查看了下,在11.2.0.3.0中才修复了这个问题。
然后自我总结了下,发现sga的自动管理操作还是需要谨慎,新特性的使用也是如此,一定要有足够的把握才能使用。



来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/23718752/viewspace-1141131/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/23718752/viewspace-1141131/

1 解决Oracle 9.2.0.6版本数据库由于ORA-07445宕机问题 故障现象: XX网数据库宕机,查看日志发现以下内容: Wed Jun 8 20:24:17 2005 Errors in file /u02/app/oracle/admin/unicom/udump/unicom_ora_661.trc: ORA-07445: \263\366\317\326\322\354\263\243: \272\313\320\304\327\252\264\242 [0000000101C3089C] [SIGSEGV] [Address not mappe d to object] [0x000000000] [] [] Wed Jun 8 20:24:22 2005 Errors in file /u02/app/oracle/admin/unicom/bdump/unicom_pmon_11598.trc: ORA-07445: exception encountered: core dump [0000000101C399A0] [SIGSEGV] [Address not mapped to object] [0x000000000] [] [] Wed Jun 8 20:24:23 2005 Errors in file /u02/app/oracle/admin/unicom/bdump/unicom_pmon_11598.trc: ORA-07445: exception encountered: core dump [0000000101C399A0] [SIGSEGV] [Address not mapped to object] [0x000000000] [] [] ORA-00602: internal programming exception ORA-07445: exception encountered: core dump [0000000101C399A0] [SIGSEGV] [Address not mapped to object] [0x000000000] [] [] Wed Jun 8 20:24:33 2005 CKPT: terminating instance due to error 472 Instance terminated by CKPT, pid = 11604 Wed Jun 8 21:04:47 2005 Starting ORACLE instance (normal) 解决办法: Oracle工程师建议安装Oracle补丁p3949307_9206_SOLARIS64,经过测试,安装步骤如下: (注意,首先shutdown数据库) 1,解压补丁文件 unzip p3949307_9206_SOLARIS64.zip 解开后的目录是:4060756 2,修改oraclehomeproperties.xml文件,该文件在$ORACLE_HOME/inventory/ContentsXML目录下。 cp oraclehomeproperties.xml oraclehomeproperties.xmlb.bak vi oraclehomeproperties.xml 更改数字453 ->23,存盘退出 3,修改PATH路径为 PATH=$ORACLE_HOME/bin:/usr/ccs/bin:${PATH} 4,执行opatch apply命令 cd 4060756 $ORACLE_HOME/OPatch/opatch apply 5,安装成功后会出现如下结果 Updating inventory... /oracle92/app/oracle/product/9.2.0.1/OPatch/opatch.pl version: 1.0.0.0.51 Copyright (c) 2001-2004 Oracle Corporation. All Rights Reserved. OPatch succeeded.
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值