The culprit behind an abnormal database crash: the OOM killer

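The database version is 10.2.0.4 and the operating system is Linux AS 4 Update 4.

Twice in the past week, one database instance has terminated abnormally with the following errors: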

Fri Mar 19 12:15:09 2010
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl_pmon_13421.trc:
ORA-00470: LGWR process terminated with error
Fri Mar 19 12:15:09 2010
PMON: terminating instance due to error 470


The trace file reads as follows:

/u01/app/oracle/admin/orcl/bdump/orcl_pmon_13421.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10201
System name: Linux
Node name: qht113
Release: 2.6.9-55.ELsmp
Version: #1 SMP Fri Apr 20 17:03:35 EDT 2007
Machine: i686
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 13421, image: oracle@qht113 (PMON)

*** 2010-03-19 12:14:58.874
*** SERVICE NAME:(SYS$BACKGROUND) 2010-03-19 12:14:58.679
*** SESSION ID:(555.1) 2010-03-19 12:14:58.679
Background process LGWR found dead
......

error 470 detected in background process
*** 2010-03-19 12:15:09.299
ORA-00470: LGWR process terminated with error

According to the error message, the LGWR process failed and the instance was terminated as a result.

Check the OS log for the same period:

# vi /var/log/messages

Mar 19 12:14:03 qht113 kernel: 0 bounce buffer pages
Mar 19 12:14:03 qht113 kernel: Free swap: 9988688kB
Mar 19 12:14:03 qht113 kernel: 1179648 pages of RAM
Mar 19 12:14:03 qht113 kernel: 818784 pages of HIGHMEM
Mar 19 12:14:03 qht113 kernel: 142184 reserved pages
Mar 19 12:14:03 qht113 kernel: 4277680 pages shared
Mar 19 12:14:03 qht113 kernel: 8025 pages swap cached
Mar 19 12:14:03 qht113 kernel: Out of Memory: Killed process 29907 (oracle).
Mar 19 12:14:03 qht113 kernel: oom-killer: gfp_mask=0xd0
Mar 19 12:14:03 qht113 kernel: Mem-info:
Mar 19 12:14:03 qht113 kernel: DMA per-cpu:
Mar 19 12:14:03 qht113 kernel: cpu 0 hot: low 2, high 6, batch 1
Mar 19 12:14:03 qht113 kernel: cpu 0 cold: low 0, high 2, batch 1
Mar 19 12:14:03 qht113 kernel: cpu 1 hot: low 2, high 6, batch 1
Mar 19 12:14:03 qht113 kernel: cpu 1 cold: low 0, high 2, batch 1
Mar 19 12:14:03 qht113 kernel: Normal per-cpu:
Mar 19 12:14:03 qht113 kernel: cpu 0 hot: low 32, high 96, batch 16
Mar 19 12:14:03 qht113 kernel: cpu 0 cold: low 0, high 32, batch 16
Mar 19 12:14:03 qht113 kernel: cpu 1 hot: low 32, high 96, batch 16
Mar 19 12:14:03 qht113 kernel: cpu 1 cold: low 0, high 32, batch 16
Mar 19 12:14:03 qht113 kernel: HighMem per-cpu:
Mar 19 12:14:03 qht113 kernel: cpu 0 hot: low 32, high 96, batch 16
Mar 19 12:14:03 qht113 kernel: cpu 0 cold: low 0, high 32, batch 16
Mar 19 12:14:03 qht113 kernel: cpu 1 hot: low 32, high 96, batch 16
Mar 19 12:14:03 qht113 kernel: cpu 1 cold: low 0, high 32, batch 16
Mar 19 12:14:03 qht113 kernel:
Mar 19 12:14:03 qht113 kernel: Free pages: 202364kB (189056kB HighMem)
Mar 19 12:14:03 qht113 kernel: Active:365041 inactive:591049 dirty:0 writeback:0 unstable:0 free:50591 slab:11215 mapped:392246 pagetables:14438
Mar 19 12:14:03 qht113 kernel: DMA free:12516kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:890 all_unreclaimable? yes
Mar 19 12:14:03 qht113 kernel: protections[]: 0 0 0
Mar 19 12:14:03 qht113 kernel: Normal free:792kB min:928kB low:1856kB high:2784kB active:343544kB inactive:458948kB present:901120kB pages_scanned:1035962 all_unreclaimable? yes
Mar 19 12:14:03 qht113 kernel: protections[]: 0 0 0
Mar 19 12:14:03 qht113 kernel: HighMem free:189056kB min:512kB low:1024kB high:1536kB active:1116620kB inactive:1905248kB present:3801088kB pages_scanned:0 all_unreclaimable? no
Mar 19 12:14:03 qht113 kernel: protections[]: 0 0 0
Mar 19 12:14:03 qht113 kernel: DMA: 1*4kB 2*8kB 3*16kB 3*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12516kB
Mar 19 12:14:03 qht113 kernel: Normal: 12*4kB 13*8kB 2*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 792kB
Mar 19 12:14:03 qht113 kernel: HighMem: 1190*4kB 8601*8kB 6366*16kB 402*32kB 4*64kB 0*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 189056kB
Mar 19 12:14:03 qht113 kernel: Swap cache: add 12012517, delete 12004498, find 5695310/7634496, race 2+35
Mar 19 12:14:03 qht113 kernel: 0 bounce buffer pages
Mar 19 12:14:03 qht113 kernel: Free swap: 9988688kB
Mar 19 12:14:03 qht113 kernel: 1179648 pages of RAM
Mar 19 12:14:03 qht113 kernel: 818784 pages of HIGHMEM
Mar 19 12:14:03 qht113 kernel: 142184 reserved pages
Mar 19 12:14:03 qht113 kernel: 4141749 pages shared
Mar 19 12:14:03 qht113 kernel: 8025 pages swap cached
Mar 19 12:14:03 qht113 kernel: Out of Memory: Killed process 29909 (oracle).
Mar 19 12:14:03 qht113 kernel: oom-killer: gfp_mask=0xd0
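
The per-zone summary lines in the log above can be checked mechanically. The following is a minimal sketch, assuming the kernel prints zone lines in the `free:...kB min:...kB` form seen here; the function name `zones_below_min` is made up for illustration. It reports any zone whose free memory has dropped below its `min` watermark.

```python
import re

# Matches zone summary lines such as:
#   "... kernel: Normal free:792kB min:928kB low:1856kB ..."
# (format assumed from the log excerpt in this post)
ZONE_RE = re.compile(r"kernel: (DMA|Normal|HighMem) free:(\d+)kB min:(\d+)kB")


def zones_below_min(log_text):
    """Return {zone: (free_kb, min_kb)} for zones whose free memory is under 'min'."""
    starved = {}
    for line in log_text.splitlines():
        m = ZONE_RE.search(line)
        if m:
            zone = m.group(1)
            free_kb, min_kb = int(m.group(2)), int(m.group(3))
            if free_kb < min_kb:
                starved[zone] = (free_kb, min_kb)
    return starved


# Zone lines copied from the /var/log/messages excerpt above.
SAMPLE = """\
Mar 19 12:14:03 qht113 kernel: DMA free:12516kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:890 all_unreclaimable? yes
Mar 19 12:14:03 qht113 kernel: Normal free:792kB min:928kB low:1856kB high:2784kB active:343544kB inactive:458948kB present:901120kB pages_scanned:1035962 all_unreclaimable? yes
Mar 19 12:14:03 qht113 kernel: HighMem free:189056kB min:512kB low:1024kB high:1536kB active:1116620kB inactive:1905248kB present:3801088kB pages_scanned:0 all_unreclaimable? no
"""

print(zones_below_min(SAMPLE))  # {'Normal': (792, 928)}
```

On this excerpt only the Normal zone is below its watermark (792kB free against a 928kB minimum), while HighMem still has 189056kB free and almost 10GB of swap is unused. That pattern points at low memory rather than total memory.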

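The log shows oracle processes being killed by the OOM killer, so it is very likely that the OOM killer also killed the LGWR process. Thinking back, when I added tablespaces to this database, the connected session would often drop suddenly partway through. I had always assumed this was a disk problem, but it now looks like the system was short of memory and the OOM killer was killing those processes.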
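
To see which processes the OOM killer hit, the "Killed process" lines can be pulled out of the log. This is only a sketch, assuming the message format shown in the excerpt above; `killed_processes` is a hypothetical helper name. Note that the excerpt does not show the PID of LGWR, so the result cannot by itself confirm that LGWR was among the victims.

```python
import re

# Message format assumed from the log in this post:
#   "kernel: Out of Memory: Killed process 29907 (oracle)."
KILL_RE = re.compile(r"Out of Memory: Killed process (\d+) \(([^)]+)\)")


def killed_processes(log_text):
    """Return a list of (pid, command) tuples for every OOM kill in the text."""
    return [(int(m.group(1)), m.group(2)) for m in KILL_RE.finditer(log_text)]


# The two kill messages copied from the /var/log/messages excerpt.
SAMPLE = """\
Mar 19 12:14:03 qht113 kernel: Out of Memory: Killed process 29907 (oracle).
Mar 19 12:14:03 qht113 kernel: oom-killer: gfp_mask=0xd0
Mar 19 12:14:03 qht113 kernel: Out of Memory: Killed process 29909 (oracle).
"""

print(killed_processes(SAMPLE))  # [(29907, 'oracle'), (29909, 'oracle')]
```

Comparing these PIDs with the background process PIDs recorded by the instance, if they are available, would show whether a background process such as LGWR was actually one of the targets.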

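See Metalink notes Note:228203.1, Note:452000.1 and Note:445163.1. Put simply, the OOM killer is a protection mechanism: when Linux runs short of memory, it kills processes to free memory so that the system as a whole does not fail in a worse way. It is meant to pick expendable processes, but as the log above shows, its victims can include Oracle processes. RHEL 4 added a new parameter, vm.lower_zone_protection. Its unit is MB, and with the default value of 0, LowMem is 16MB. The recommendation is to set vm.lower_zone_protection = 200 or even higher to avoid fragmentation of the LowMem zone, which should resolve this problem.

Solution:
# vi /etc/sysctl.conf (add the following parameter)
vm.lower_zone_protection=200
# sysctl -p (apply the setting)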
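
After editing the file, it can be worth confirming that the setting is really present before relying on it. The following is a rough sketch, not part of the original procedure: `parse_sysctl_conf` is a hypothetical helper, and the sample text stands in for the real /etc/sysctl.conf.

```python
def parse_sysctl_conf(text):
    """Parse sysctl.conf-style text into a {key: value} dict of strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical excerpt; on a real host, read /etc/sysctl.conf instead.
SAMPLE_CONF = """\
# Kernel sysctl configuration (hypothetical excerpt)
kernel.sysrq = 0
vm.lower_zone_protection=200
"""

settings = parse_sysctl_conf(SAMPLE_CONF)
print(settings.get("vm.lower_zone_protection"))  # 200
```

This only checks the configuration file. The value the kernel is actually using should still be verified on the host itself after running sysctl -p.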

Tip: how to disable and re-enable the OOM killer:

# echo "0" > /proc/sys/vm/oom-kill (disable)

# echo "1" > /proc/sys/vm/oom-kill (enable)

References:

http://hi.baidu.com/edeed/blog/item/03e5cd116ae4b816b9127bde.html

http://www.dbanotes.net/database/linux_outofmemory_oom_killer.html

From "ITPUB Blog", link: http://blog.itpub.net/271283/viewspace-1032172/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
