Handling frequent out-of-memory kills on Oracle after enabling HugePages

1. Problem:

A CentOS 6.8 + Oracle 11.2.0.1 RAC environment. Over the past few days the two hosts have each rebooted, one after the other.

 

2. Troubleshooting:

The database alert log shows:
opiodr aborting process unknown ospid (65714) as a result of ORA-609
Fri Apr 02 04:19:48 2021
Process q002 died, see its trace file
Fri Apr 02 04:34:57 2021
PMON (ospid: 11122): terminating the instance due to error 471
Fri Apr 02 04:34:57 2021
System state dump is made for local instance
System State dumped to trace file /home/app/oracle/diag/rdbms/xxdb/xxdb1/trace/xxdb1_diag_11130.trc
Fri Apr 02 04:34:58 2021
ORA-1092 : opitsk aborting process
Fri Apr 02 04:34:58 2021
License high water mark = 64
Instance terminated by PMON, pid = 11122
USER (ospid: 66586): terminating the instance
Instance terminated by USER, pid = 66586
Fri Apr 02 04:35:33 2021
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0

The operating system log shows:

Apr  1 11:14:05 XXRAC01 ntpd[8971]: 0.0.0.0 0515 05 clock_sync
Apr  1 15:07:38 XXRAC01 sshd[24383]: Accepted password for oracle from 192.168.108.80 port 57869 ssh2
Apr  2 04:19:06 XXRAC01 kernel: INFO: task jbd2/dm-11-8:6568 blocked for more than 120 seconds.
Apr  2 04:19:06 XXRAC01 kernel:      Not tainted 2.6.32-642.el6.x86_64 #1
Apr  2 04:19:06 XXRAC01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  2 04:19:06 XXRAC01 kernel: jbd2/dm-11-8  D 0000000000000032     0  6568      2 0x00000000
Apr  2 04:19:06 XXRAC01 kernel: ffff884017f23d20 0000000000000046 0000000000000000 ffff884017f23ce4
Apr  2 04:19:06 XXRAC01 kernel: ffff880100000000 ffff88407fe87600 000038b1992f04b9 ffff880191b56ec0
Apr  2 04:19:06 XXRAC01 kernel: 0000000000000200 0000000103b37a02 ffff88401339b068 ffff884017f23fd8
Apr  2 04:19:06 XXRAC01 kernel: Call Trace:
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffffa04c080f>] jbd2_journal_commit_transaction+0x19f/0x14f0 [jbd2]
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff8108f14c>] ? lock_timer_base+0x3c/0x70
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffffa04c6a38>] kjournald2+0xb8/0x220 [jbd2]
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffffa04c6980>] ? kjournald2+0x0/0x220 [jbd2]
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff810a662e>] kthread+0x9e/0xc0
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff810a6590>] ? kthread+0x0/0xc0
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Apr  2 04:19:06 XXRAC01 kernel: INFO: task oraagent.bin:9962 blocked for more than 120 seconds.
Apr  2 04:19:06 XXRAC01 kernel:      Not tainted 2.6.32-642.el6.x86_64 #1
Apr  2 04:19:06 XXRAC01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  2 04:19:06 XXRAC01 kernel: oraagent.bin  D 0000000000000015     0  9962      1 0x00000000
Apr  2 04:19:06 XXRAC01 kernel: ffff881a456aba88 0000000000000086 0000000000000000 ffffffff8117fb38
Apr  2 04:19:06 XXRAC01 kernel: ffff881a456aba70 ffff8814f3b79000 000038b1a838d49b ffff8840103ec7e0
Apr  2 04:19:06 XXRAC01 kernel: ffff881a456aba28 0000000103b37ab5 ffff884017215068 ffff881a456abfd8
Apr  2 04:19:06 XXRAC01 kernel: Call Trace:
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff8117fb38>] ? ____cache_alloc_node+0x108/0x160
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffffa04befca>] start_this_handle+0x25a/0x480 [jbd2]
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffffa04bf3d5>] jbd2_journal_start+0xb5/0x100 [jbd2]
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffffa04fef96>] ext4_journal_start_sb+0x56/0xe0 [ext4]
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffffa04e7f6a>] ext4_dirty_inode+0x2a/0x60 [ext4]
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff811c695b>] __mark_inode_dirty+0x3b/0x160
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff811b7032>] file_update_time+0xf2/0x170
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff811309e0>] __generic_file_aio_write+0x230/0x490
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff81130cc8>] generic_file_aio_write+0x88/0x100
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffffa04e1e08>] ext4_file_write+0x58/0x190 [ext4]
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff81199b4a>] do_sync_write+0xfa/0x140
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff8123ac26>] ? security_file_permission+0x16/0x20
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff81199e48>] vfs_write+0xb8/0x1a0
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff8119b35f>] ? fget_light_pos+0x3f/0x50
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff8119a981>] sys_write+0x51/0xb0
Apr  2 04:19:06 XXRAC01 kernel: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Apr  2 04:19:06 XXRAC01 kernel: INFO: task cssdmonitor:10000 blocked for more than 120 seconds.
Apr  2 04:19:06 XXRAC01 kernel:      Not tainted 2.6.32-642.el6.x86_64 #1
Apr  2 04:19:06 XXRAC01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  2 04:19:06 XXRAC01 kernel: cssdmonitor   D 0000000000000020     0 10000      1 0x00000000
Apr  2 04:19:06 XXRAC01 kernel: ffff881a4573ba88 0000000000000086 0000000000000000 0000000200000000
Apr  2 04:19:06 XXRAC01 kernel: 000000010000001e ffff880191dd0b00 000038b2702f3525 0000000000000008
Apr  2 04:19:06 XXRAC01 kernel: 0000000000016ec0 0000000103b3883c ffff88401cb27ad8 ffff881a4573bfd8
Apr  2 04:19:06 XXRAC01 kernel: Call Trace:


Apr  2 04:27:42 XXRAC01 kernel: Out of memory: Kill process 65090 (oracle) score 111 or sacrifice child
Apr  2 04:27:42 XXRAC01 kernel: Killed process 65090, UID 54321, (oracle) total-vm:157506752kB, anon-rss:6100kB, file-rss:31659480kB
Apr  2 04:27:55 XXRAC01 kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Apr  2 04:27:55 XXRAC01 kernel: oracle cpuset=/ mems_allowed=0
Apr  2 04:27:55 XXRAC01 kernel: Pid: 11224, comm: oracle Not tainted 2.6.32-642.el6.x86_64 #1
Apr  2 04:27:55 XXRAC01 kernel: Call Trace:

 

 

Check memory usage with free -g; free memory is relatively low even though this was taken right after the reboot (most of the 157 GB shown as "used" is the pre-allocated HugePages pool):

# free -g
             total       used       free     shared    buffers     cached
Mem:           252        157         94          0          0          2
-/+ buffers/cache:        155         97
Swap:           19          0         19

 

Check HugePages usage:

#cat /proc/meminfo |grep Huge
AnonHugePages:         0 kB
HugePages_Total:   76900
HugePages_Free:    76900
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
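
For reference, the pool size can be converted to gigabytes directly from those two fields. A minimal one-liner (nothing Oracle-specific, just the multiplication):

# awk '/HugePages_Total/{t=$2} /Hugepagesize/{s=$2} END{printf "HugePages pool: %.1f GB\n", t*s/1024/1024}' /proc/meminfo
HugePages pool: 150.2 GB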

 

#cat /etc/security/limits.conf

oracle  hard  memlock  118536840

 

From the above, the system has already reserved about 150 GB of HugePages (76900 × 2 MB), yet every one of those pages is still free: Oracle is not using the HugePages at all. The reason is the memlock limit of 118536840 kB (about 113 GB), which is smaller than the 150 GB SGA, so at instance startup Oracle cannot lock the SGA into HugePages and falls back to ordinary 4 kB pages, while the 150 GB pool stays reserved and unavailable to everything else.
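
One way to confirm what the running instance actually got is to look at the limits of its pmon process (a minimal check, only valid while the instance is up; ora_pmon_xxdb1 is this node's instance name, taken from the trace path above, and /proc reports the value in bytes rather than kB):

# grep "Max locked memory" /proc/$(pgrep -f ora_pmon_xxdb1)/limits

If the reported value is below the SGA size, the SGA cannot be placed in HugePages.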

Database SGA size:

SQL> show parameter sga

NAME                     TYPE     VALUE
------------------------------------ ----------- ------------------------------
lock_sga                 boolean     FALSE
pre_page_sga                 boolean     FALSE
sga_max_size                 big integer 150G
sga_target                 big integer 150G

SQL> show parameter pga

NAME                     TYPE     VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target             big integer 40G
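
Note that the HugePages pool roughly matches the SGA: 150 GB / 2 MB = 76800 pages, and 76900 are reserved, so the pool can hold the entire SGA, provided the oracle user is allowed to lock that much memory.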

 

When the SGA allocated by the database grows to around 100 GB in use, the system runs out of memory: the OOM killer kills oracle processes, and the resulting memory shortage ends with the operating system rebooting.
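
The rough arithmetic behind the 100 GB threshold, using the figures above:

Total RAM                               252 GB
HugePages pool (76900 x 2 MB, unused)  ~150 GB
Left for SGA, PGA, OS and processes    ~100 GB

So a 150 GB SGA allocated from regular pages is bound to exhaust memory well before it is fully populated.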

 

3. Solution

Therefore, raise the following memlock settings in limits.conf to a value larger than the HugePages pool (209715200 kB = 200 GB, comfortably above the 150 GB pool), so that Oracle can lock its SGA into HugePages and the out-of-memory condition cannot recur:

#cat /etc/security/limits.conf

oracle  soft  memlock  209715200
oracle  hard  memlock  209715200

 

Finally, restart the database after making the change.
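
Before the restart, a quick sanity check that the new limit is picked up for fresh oracle sessions (the value is reported in kB and should match the limits.conf entries above):

# su - oracle -c 'ulimit -l'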

 

After running for a while, the change in HugePages usage can be observed:

# cat /proc/meminfo |grep Huge
AnonHugePages:         0 kB
HugePages_Total:   76900
HugePages_Free:     8075
HugePages_Rsvd:     7976
HugePages_Surp:        0
Hugepagesize:       2048 kB
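
The numbers line up with the 150 GB SGA: (76900 - 8075) x 2 MB is roughly 134 GB of HugePages in use, and a further 7976 pages (about 16 GB) are reserved, so essentially the entire SGA now lives in HugePages, consistent with the fix taking effect.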
