1. Problem:
An environment running CentOS 6.8 + Oracle 11.2.0.1 RAC. Over the past few days, the two hosts have each been rebooting repeatedly.
2. Troubleshooting:
The database alert log shows:
opiodr aborting process unknown ospid (65714) as a result of ORA-609
Fri Apr 02 04:19:48 2021
Process q002 died, see its trace file
Fri Apr 02 04:34:57 2021
PMON (ospid: 11122): terminating the instance due to error 471
Fri Apr 02 04:34:57 2021
System state dump is made for local instance
System State dumped to trace file /home/app/oracle/diag/rdbms/xxdb/xxdb1/trace/xxdb1_diag_11130.trc
Fri Apr 02 04:34:58 2021
ORA-1092 : opitsk aborting process
Fri Apr 02 04:34:58 2021
License high water mark = 64
Instance terminated by PMON, pid = 11122
USER (ospid: 66586): terminating the instance
Instance terminated by USER, pid = 66586
Fri Apr 02 04:35:33 2021
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
The OS log (/var/log/messages) shows:
Apr 1 11:14:05 XXRAC01 ntpd[8971]: 0.0.0.0 0515 05 clock_sync
Apr 1 15:07:38 XXRAC01 sshd[24383]: Accepted password for oracle from 192.168.108.80 port 57869 ssh2
Apr 2 04:19:06 XXRAC01 kernel: INFO: task jbd2/dm-11-8:6568 blocked for more than 120 seconds.
Apr 2 04:19:06 XXRAC01 kernel: Not tainted 2.6.32-642.el6.x86_64 #1
Apr 2 04:19:06 XXRAC01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 2 04:19:06 XXRAC01 kernel: jbd2/dm-11-8 D 0000000000000032 0 6568 2 0x00000000
Apr 2 04:19:06 XXRAC01 kernel: ffff884017f23d20 0000000000000046 0000000000000000 ffff884017f23ce4
Apr 2 04:19:06 XXRAC01 kernel: ffff880100000000 ffff88407fe87600 000038b1992f04b9 ffff880191b56ec0
Apr 2 04:19:06 XXRAC01 kernel: 0000000000000200 0000000103b37a02 ffff88401339b068 ffff884017f23fd8
Apr 2 04:19:06 XXRAC01 kernel: Call Trace:
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffffa04c080f>] jbd2_journal_commit_transaction+0x19f/0x14f0 [jbd2]
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff8108f14c>] ? lock_timer_base+0x3c/0x70
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffffa04c6a38>] kjournald2+0xb8/0x220 [jbd2]
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffffa04c6980>] ? kjournald2+0x0/0x220 [jbd2]
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff810a662e>] kthread+0x9e/0xc0
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff810a6590>] ? kthread+0x0/0xc0
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Apr 2 04:19:06 XXRAC01 kernel: INFO: task oraagent.bin:9962 blocked for more than 120 seconds.
Apr 2 04:19:06 XXRAC01 kernel: Not tainted 2.6.32-642.el6.x86_64 #1
Apr 2 04:19:06 XXRAC01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 2 04:19:06 XXRAC01 kernel: oraagent.bin D 0000000000000015 0 9962 1 0x00000000
Apr 2 04:19:06 XXRAC01 kernel: ffff881a456aba88 0000000000000086 0000000000000000 ffffffff8117fb38
Apr 2 04:19:06 XXRAC01 kernel: ffff881a456aba70 ffff8814f3b79000 000038b1a838d49b ffff8840103ec7e0
Apr 2 04:19:06 XXRAC01 kernel: ffff881a456aba28 0000000103b37ab5 ffff884017215068 ffff881a456abfd8
Apr 2 04:19:06 XXRAC01 kernel: Call Trace:
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff8117fb38>] ? ____cache_alloc_node+0x108/0x160
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffffa04befca>] start_this_handle+0x25a/0x480 [jbd2]
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffffa04bf3d5>] jbd2_journal_start+0xb5/0x100 [jbd2]
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffffa04fef96>] ext4_journal_start_sb+0x56/0xe0 [ext4]
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffffa04e7f6a>] ext4_dirty_inode+0x2a/0x60 [ext4]
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff811c695b>] __mark_inode_dirty+0x3b/0x160
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff811b7032>] file_update_time+0xf2/0x170
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff811309e0>] __generic_file_aio_write+0x230/0x490
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff81130cc8>] generic_file_aio_write+0x88/0x100
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffffa04e1e08>] ext4_file_write+0x58/0x190 [ext4]
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff81199b4a>] do_sync_write+0xfa/0x140
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff8123ac26>] ? security_file_permission+0x16/0x20
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff81199e48>] vfs_write+0xb8/0x1a0
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff8119b35f>] ? fget_light_pos+0x3f/0x50
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff8119a981>] sys_write+0x51/0xb0
Apr 2 04:19:06 XXRAC01 kernel: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
Apr 2 04:19:06 XXRAC01 kernel: INFO: task cssdmonitor:10000 blocked for more than 120 seconds.
Apr 2 04:19:06 XXRAC01 kernel: Not tainted 2.6.32-642.el6.x86_64 #1
Apr 2 04:19:06 XXRAC01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 2 04:19:06 XXRAC01 kernel: cssdmonitor D 0000000000000020 0 10000 1 0x00000000
Apr 2 04:19:06 XXRAC01 kernel: ffff881a4573ba88 0000000000000086 0000000000000000 0000000200000000
Apr 2 04:19:06 XXRAC01 kernel: 000000010000001e ffff880191dd0b00 000038b2702f3525 0000000000000008
Apr 2 04:19:06 XXRAC01 kernel: 0000000000016ec0 0000000103b3883c ffff88401cb27ad8 ffff881a4573bfd8
Apr 2 04:19:06 XXRAC01 kernel: Call Trace:
Apr 2 04:27:42 XXRAC01 kernel: Out of memory: Kill process 65090 (oracle) score 111 or sacrifice child
Apr 2 04:27:42 XXRAC01 kernel: Killed process 65090, UID 54321, (oracle) total-vm:157506752kB, anon-rss:6100kB, file-rss:31659480kB
Apr 2 04:27:55 XXRAC01 kernel: oracle invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Apr 2 04:27:55 XXRAC01 kernel: oracle cpuset=/ mems_allowed=0
Apr 2 04:27:55 XXRAC01 kernel: Pid: 11224, comm: oracle Not tainted 2.6.32-642.el6.x86_64 #1
Apr 2 04:27:55 XXRAC01 kernel: Call Trace:
Checking memory usage with free -g shows relatively little free memory. (This was observed after the reboot.)
# free -g
total used free shared buffers cached
Mem: 252 157 94 0 0 2
-/+ buffers/cache: 155 97
Swap: 19 0 19
Check HugePages usage:
#cat /proc/meminfo |grep Huge
AnonHugePages: 0 kB
HugePages_Total: 76900
HugePages_Free: 76900
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
#cat /etc/security/limits.conf
oracle hard memlock 118536840
From the above, the system has reserved roughly 150 GB of HugePages (76900 × 2 MB), yet HugePages_Free equals HugePages_Total: Oracle is not using huge pages at all. The reason is that the oracle memlock hard limit (118536840 kB ≈ 113 GB) is smaller than the SGA, so Oracle cannot lock the SGA in huge pages.
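The mismatch is plain shell arithmetic over the figures above (a sketch; every number is taken from the meminfo and limits.conf output):

```shell
# HugePages reserved at boot vs. the oracle memlock limit (both in kB)
hugepages_kb=$((76900 * 2048))        # 157491200 kB
memlock_kb=118536840                  # hard memlock from limits.conf
echo $((hugepages_kb / 1024 / 1024))  # 150 (GB of HugePages)
echo $((memlock_kb / 1024 / 1024))    # 113 (GB lockable by oracle)
# memlock < HugePages size (and < sga_max_size = 150G), so Oracle cannot
# lock its SGA in huge pages and silently falls back to regular 4 kB pages,
# while the 150 GB of HugePages stays reserved and unusable by anyone.
```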
Database SGA size:
SQL> show parameter sga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
lock_sga boolean FALSE
pre_page_sga boolean FALSE
sga_max_size big integer 150G
sga_target big integer 150G
SQL> show parameter pga
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
pga_aggregate_target big integer 40G
Because the SGA is therefore allocated from regular 4 KB pages while ~150 GB of RAM stays reserved as unused HugePages, the system runs out of memory once the instance's SGA usage grows to around 100 GB: the OOM killer kills oracle processes, and the memory shortage ultimately causes the operating system to reboot.
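The memory accounting behind that conclusion can be sketched the same way (rough figures in GB, from the output above; per-process and OS overhead are ignored):

```shell
# rough memory accounting (GB)
total=252                              # physical RAM (free -g)
hugepages=$((76900 * 2 / 1024))        # ~150 GB reserved but unused by Oracle
sga=150                                # sga_target, now on regular 4 kB pages
pga=40                                 # pga_aggregate_target
echo $((total - hugepages))            # 102 GB left for SGA + PGA + OS
# once the regular-page SGA grows toward ~100 GB the box is out of memory
# even before counting PGA, and the OOM killer starts killing processes
echo $((sga + pga > total - hugepages))   # 1 (demand exceeds what is left)
```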
3. Solution
Therefore, raise the following memlock limits in limits.conf to a value larger than the total HugePages memory, which prevents the OOM from recurring:
#cat /etc/security/limits.conf
oracle soft memlock 209715200
oracle hard memlock 209715200
Finally, restart the database after the adjustment.
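Note that the new limit applies only to sessions opened after the edit. Assuming the stock pam_limits setup on CentOS 6, a quick check from root is:

```shell
# open a fresh login shell as oracle and check max locked memory (reported in kB)
su - oracle -c 'ulimit -l'
# expected: 209715200, i.e. 200 GB, comfortably above the ~150 GB of HugePages
echo $((209715200 / 1024 / 1024))     # 200
```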
After the system had been running for a while, the observed HugePages usage changed:
# cat /proc/meminfo |grep Huge
AnonHugePages: 0 kB
HugePages_Total: 76900
HugePages_Free: 8075
HugePages_Rsvd: 7976
HugePages_Surp: 0
Hugepagesize: 2048 kB
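These counters can be converted to GB to confirm the SGA now lives in huge pages (a sketch; HugePages_Rsvd pages are committed to the SGA but not yet faulted in):

```shell
# derive huge-page usage from the counters above (2 MB pages)
total=76900; free=8075; rsvd=7976
echo $(( (total - free) * 2 / 1024 ))          # 134 GB actually faulted in
echo $(( (total - free + rsvd) * 2 / 1024 ))   # 150 GB committed
# the committed figure matches sga_max_size = 150G: the SGA is in huge pages
```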