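The database version is 10.2.0.4 and the operating system is Linux AS 4 Update 4.
Twice in the past week, one database instance has terminated abnormally with the following errors: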
Fri Mar 19 12:15:09 2010
Errors in file /u01/app/oracle/admin/orcl/bdump/orcl_pmon_13421.trc:
ORA-00470: LGWR process terminated with error
Fri Mar 19 12:15:09 2010
PMON: terminating instance due to error 470
The trace file reads as follows:
/u01/app/oracle/admin/orcl/bdump/orcl_pmon_13421.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10201
System name: Linux
Node name: qht113
Release: 2.6.9-55.ELsmp
Version: #1 SMP Fri Apr 20 17:03:35 EDT 2007
Machine: i686
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 13421, image: oracle@qht113 (PMON)
*** 2010-03-19 12:14:58.874
*** SERVICE NAME:(SYS$BACKGROUND) 2010-03-19 12:14:58.679
*** SESSION ID:(555.1) 2010-03-19 12:14:58.679
Background process LGWR found dead
......
error 470 detected in background process
*** 2010-03-19 12:15:09.299
ORA-00470: LGWR process terminated with error
According to the error message, the LGWR process failed and PMON terminated the instance.
Check the OS log for the same period:
# vi /var/log/messages
Mar 19 12:14:03 qht113 kernel: 0 bounce buffer pages
Mar 19 12:14:03 qht113 kernel: Free swap: 9988688kB
Mar 19 12:14:03 qht113 kernel: 1179648 pages of RAM
Mar 19 12:14:03 qht113 kernel: 818784 pages of HIGHMEM
Mar 19 12:14:03 qht113 kernel: 142184 reserved pages
Mar 19 12:14:03 qht113 kernel: 4277680 pages shared
Mar 19 12:14:03 qht113 kernel: 8025 pages swap cached
Mar 19 12:14:03 qht113 kernel: Out of Memory: Killed process 29907 (oracle).
Mar 19 12:14:03 qht113 kernel: oom-killer: gfp_mask=0xd0
Mar 19 12:14:03 qht113 kernel: Mem-info:
Mar 19 12:14:03 qht113 kernel: DMA per-cpu:
Mar 19 12:14:03 qht113 kernel: cpu 0 hot: low 2, high 6, batch 1
Mar 19 12:14:03 qht113 kernel: cpu 0 cold: low 0, high 2, batch 1
Mar 19 12:14:03 qht113 kernel: cpu 1 hot: low 2, high 6, batch 1
Mar 19 12:14:03 qht113 kernel: cpu 1 cold: low 0, high 2, batch 1
Mar 19 12:14:03 qht113 kernel: Normal per-cpu:
Mar 19 12:14:03 qht113 kernel: cpu 0 hot: low 32, high 96, batch 16
Mar 19 12:14:03 qht113 kernel: cpu 0 cold: low 0, high 32, batch 16
Mar 19 12:14:03 qht113 kernel: cpu 1 hot: low 32, high 96, batch 16
Mar 19 12:14:03 qht113 kernel: cpu 1 cold: low 0, high 32, batch 16
Mar 19 12:14:03 qht113 kernel: HighMem per-cpu:
Mar 19 12:14:03 qht113 kernel: cpu 0 hot: low 32, high 96, batch 16
Mar 19 12:14:03 qht113 kernel: cpu 0 cold: low 0, high 32, batch 16
Mar 19 12:14:03 qht113 kernel: cpu 1 hot: low 32, high 96, batch 16
Mar 19 12:14:03 qht113 kernel: cpu 1 cold: low 0, high 32, batch 16
Mar 19 12:14:03 qht113 kernel:
Mar 19 12:14:03 qht113 kernel: Free pages: 202364kB (189056kB HighMem)
Mar 19 12:14:03 qht113 kernel: Active:365041 inactive:591049 dirty:0 writeback:0 unstable:0 free:50591 slab:11215 mapped:392246 pagetables:14438
Mar 19 12:14:03 qht113 kernel: DMA free:12516kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:890 all_unreclaimable? yes
Mar 19 12:14:03 qht113 kernel: protections[]: 0 0 0
Mar 19 12:14:03 qht113 kernel: Normal free:792kB min:928kB low:1856kB high:2784kB active:343544kB inactive:458948kB present:901120kB pages_scanned:1035962 all_unreclaimable? yes
Mar 19 12:14:03 qht113 kernel: protections[]: 0 0 0
Mar 19 12:14:03 qht113 kernel: HighMem free:189056kB min:512kB low:1024kB high:1536kB active:1116620kB inactive:1905248kB present:3801088kB pages_scanned:0 all_unreclaimable? no
Mar 19 12:14:03 qht113 kernel: protections[]: 0 0 0
Mar 19 12:14:03 qht113 kernel: DMA: 1*4kB 2*8kB 3*16kB 3*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12516kB
Mar 19 12:14:03 qht113 kernel: Normal: 12*4kB 13*8kB 2*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 792kB
Mar 19 12:14:03 qht113 kernel: HighMem: 1190*4kB 8601*8kB 6366*16kB 402*32kB 4*64kB 0*128kB 2*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 189056kB
Mar 19 12:14:03 qht113 kernel: Swap cache: add 12012517, delete 12004498, find 5695310/7634496, race 2+35
Mar 19 12:14:03 qht113 kernel: 0 bounce buffer pages
Mar 19 12:14:03 qht113 kernel: Free swap: 9988688kB
Mar 19 12:14:03 qht113 kernel: 1179648 pages of RAM
Mar 19 12:14:03 qht113 kernel: 818784 pages of HIGHMEM
Mar 19 12:14:03 qht113 kernel: 142184 reserved pages
Mar 19 12:14:03 qht113 kernel: 4141749 pages shared
Mar 19 12:14:03 qht113 kernel: 8025 pages swap cached
Mar 19 12:14:03 qht113 kernel: Out of Memory: Killed process 29909 (oracle).
Mar 19 12:14:03 qht113 kernel: oom-killer: gfp_mask=0xd0
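The log shows oracle processes being killed by the OOM killer, and it very likely killed the LGWR process too. This also explains something seen earlier: when adding tablespaces to the database, the connected session would often drop partway through. I had always assumed a disk problem, but it now looks like memory ran short and the OOM killer killed the process.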
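To see how often this happens and which processes were hit, the kill records can be pulled out of the OS log. The following is a minimal sketch, assuming the RHEL 4 message format shown above ("Out of Memory: Killed process PID (name).") and the standard 15-character syslog timestamp; the function name oom_kills is made up for illustration.

```shell
# List processes killed by the OOM killer, as recorded in a syslog file.
# Assumes lines such as:
#   Mar 19 12:14:03 qht113 kernel: Out of Memory: Killed process 29907 (oracle).
oom_kills() {
    # $1: path to the log file, e.g. /var/log/messages
    # \1 = 15-character timestamp, \2 = PID, \3 = process name
    sed -n 's/^\(.\{15\}\) .*Out of Memory: Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 pid=\2 name=\3/p' "$1"
}
```

Running `oom_kills /var/log/messages` prints one line per kill. The kernel message only names the executable (oracle), not which background process it was, so matching a kill to LGWR still depends on comparing timestamps with the alert log.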
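See Metalink documents Note:228203.1, Note:452000.1 and Note:445163.1. Put simply, the OOM Killer is a protection mechanism: when memory runs short, it kills processes it considers expendable so that Linux does not run into more serious trouble.
RHEL 4 adds a new parameter, vm.lower_zone_protection. Its unit is MB, and at the default value of 0 LowMem is 16MB. The recommendation is to set vm.lower_zone_protection = 200 or even higher to avoid fragmentation of the LowMem zone, which should resolve this problem.
Solution: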
# vi /etc/sysctl.conf (add the following parameter)
vm.lower_zone_protection=200
# sysctl -p (apply the setting)
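The OOM report above already points at LowMem: the Normal zone has 792kB free against a min watermark of 928kB, while HighMem still has 189056kB free and swap is almost unused. A rough sketch for spotting this in a log is shown below. It assumes the per-zone line format printed by this kernel ("ZONE free:NkB min:NkB ..."), and the function name zones_below_min is hypothetical.

```shell
# Report memory zones whose free memory is below the zone's "min" watermark,
# using the per-zone lines the kernel prints in an OOM report, e.g.:
#   ... kernel: Normal free:792kB min:928kB low:1856kB high:2784kB ...
zones_below_min() {
    # $1: path to the log file, e.g. /var/log/messages
    awk '
        / free:[0-9]+kB min:[0-9]+kB / {
            zone = ""; free = -1; min = -1
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^free:[0-9]+kB$/) {
                    zone = $(i - 1)              # zone name precedes "free:"
                    v = $i; gsub(/[^0-9]/, "", v); free = v + 0
                } else if ($i ~ /^min:[0-9]+kB$/) {
                    v = $i; gsub(/[^0-9]/, "", v); min = v + 0
                }
            }
            if (zone != "" && free < min)
                print zone, "free=" free "kB", "min=" min "kB"
        }' "$1"
}
```

After applying the setting and letting the system run under load, `zones_below_min /var/log/messages` can be rerun to see whether new reports of this kind still appear; `sysctl vm.lower_zone_protection` shows the value currently in effect on kernels that provide this parameter.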
Tip: how to disable and re-enable the OOM Killer (0 disables it, 1 enables it):
# echo "0" > /proc/sys/vm/oom-kill
# echo "1" > /proc/sys/vm/oom-kill
References:
http://hi.baidu.com/edeed/blog/item/03e5cd116ae4b816b9127bde.html
http://www.dbanotes.net/database/linux_outofmemory_oom_killer.html
From the ITPUB blog, link: http://blog.itpub.net/271283/viewspace-1032172/. If you repost this, please credit the source; otherwise legal action may be taken.