ologgerd process causes high database load

Latest recommended article published on 2021-04-08 20:27:24

congmo2314

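The load on one of a customer's hosts exceeded 50 and the host responded very slowly. Process 27941 was found to be consuming a high share of resources.

A search on Metalink showed that this is an Oracle bug. The database version is 11.2.0.4.

The bug information is as follows: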

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
Information in this document applies to any platform.

Symptoms

A node is evicted from the cluster due to network communication error.

GI Alert log reports following errors and one node gets evicted:

Node 1 GI Alert log
------------------------
CRS-1612:Network communication with node prodrac2(2) missing for 50% of timeout interval. Removal of this node from cluster in 29.240 seconds
..
CRS-1610:Network communication with node prodrac2 (2) missing for 90% of timeout interval. Removal of this node from cluster in 3.740 seconds
CRS-1607:Node utx2db02 is being evicted in cluster incarnation 278185525; details at (:CSSNM00007:) in /orabase1/app/11.2.0.3/grid_6/log/utx2db01/cssd/ocssd.log.

Node 2 GI Alert log
------------------------
CRS-1610:Network communication with node prodrac1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 3.740 seconds
CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /orabase1/app/11.2.0.3/grid_6/log/utx2db02/cssd/ocssd.log.
CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /orabase1/app/11.2.0.3/grid_6/log/utx2db02/cssd/ocssd.log
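
The missed-heartbeat warnings above follow a regular pattern, so they can be picked out of an alert log mechanically. The following is a minimal sketch, assuming the message layout is exactly as quoted in this note; the function name `parse_heartbeat_warning` and the dictionary keys are hypothetical and not part of any Oracle tool.

```python
import re

# Matches the CRS-1612 / CRS-1610 style warnings quoted above.
# The layout is inferred from the sample lines only (an assumption).
HEARTBEAT_RE = re.compile(
    r"CRS-(?P<code>\d+):Network communication with node "
    r"(?P<node>[^\s(]+)\s*\((?P<num>\d+)\) "
    r"missing for (?P<pct>\d+)% of timeout interval\.\s+"
    r"Removal of this node from cluster in (?P<secs>[\d.]+) seconds"
)


def parse_heartbeat_warning(line):
    """Return a dict describing a missed-heartbeat warning, or None."""
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    return {
        "code": "CRS-" + match.group("code"),
        "node": match.group("node"),
        "node_number": int(match.group("num")),
        "percent_missed": int(match.group("pct")),
        "seconds_to_eviction": float(match.group("secs")),
    }
```

Feeding each alert log line through this function and watching `seconds_to_eviction` shrink gives a quick timeline of how close a node came to eviction.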

Top output shows that the Cluster Health Monitor (CHM) daemon ologgerd is using high CPU and starts spinning before the reboot:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31439 root RT 0 375m 142m 58m S 161.8 0.1 359:42.49 /orabase1/app/11.2.0.3/grid_6/bin/ologgerd -m utx2db01 -r -d /orabase1/app/11.2.0.3/grid_
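
To spot a spinning process like this one in captured `top` output, the lines can be filtered by the %CPU column. This is a rough sketch, assuming the default 12-column batch layout shown above (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND); the helper names `parse_top_line` and `find_cpu_hogs` and the 100% threshold are illustrative choices, not taken from the bug note.

```python
def parse_top_line(line):
    """Parse one process line of top output; return None for other lines."""
    # Split into at most 12 fields so the command keeps its arguments.
    fields = line.split(None, 11)
    if len(fields) < 12 or not fields[0].isdigit():
        return None  # header, blank line, or unexpected layout
    return {
        "pid": int(fields[0]),
        "user": fields[1],
        "cpu": float(fields[8]),   # %CPU column
        "mem": float(fields[9]),   # %MEM column
        "command": fields[11],
    }


def find_cpu_hogs(lines, threshold=100.0):
    """Return parsed entries whose %CPU is at or above the threshold."""
    hogs = []
    for line in lines:
        entry = parse_top_line(line)
        if entry is not None and entry["cpu"] >= threshold:
            hogs.append(entry)
    return hogs
```

A value above 100 in the %CPU column, as in the ologgerd line above, means the process is keeping more than one core busy.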

The call stack also shows page allocation failure for ologgerd:

Jan 14 03:16:16 utx2db02 kernel: Free swap = 13181784kB
Jan 14 03:16:21 utx2db02 kernel: Total swap = 25165816kB
Jan 14 03:16:30 utx2db02 kernel: ologgerd: page allocation failure. order:4, mode:0xd0
Jan 14 03:16:35 utx2db02 kernel: Pid: 31475, comm: ologgerd Not tainted 2.6.32-400.21.1.el5uek #1 <<<<<<<<<<
Jan 14 03:16:40 utx2db02 kernel: Call Trace:
Jan 14 03:16:44 utx2db02 kernel: [] __alloc_pages_nodemask+0x524/0x595
Jan 14 03:17:01 utx2db02 kernel: [] kmem_getpages+0x4f/0xf4
Jan 14 03:17:05 utx2db02 kernel: [] fallback_alloc+0x12e/0x1ce
Jan 14 03:17:06 utx2db02 kernel: [] ____cache_alloc_node+0x121/0x134
Jan 14 03:17:07 utx2db02 kernel: [] kmem_cache_alloc_node_notrace+0x84/0xb9
Jan 14 03:17:09 utx2db02 kernel: [] __kmalloc_node+0x46/0x73
Jan 14 03:17:13 utx2db02 kernel: [] ? __alloc_skb+0x72/0x13d
Jan 14 03:17:13 utx2db02 kernel: [] __alloc_skb+0x72/0x13d
Jan 14 03:17:15 utx2db02 kernel: [] sk_stream_alloc_skb+0x3d/0xaf
Jan 14 03:17:16 utx2db02 kernel: [] tcp_sendmsg+0x176/0x6cf
Jan 14 03:17:16 utx2db02 kernel: [] __sock_sendmsg+0x5e/0x67
Jan 14 03:17:18 utx2db02 kernel: [] sock_sendmsg+0xcc/0xe5
Jan 14 03:17:19 utx2db02 kernel: [] ? radix_tree_delete+0xf1/0x194
Jan 14 03:17:20 utx2db02 kernel: [] ? autoremove_wake_function+0x0/0x3d
Jan 14 03:17:21 utx2db02 kernel: [] ? security_sk_alloc+0x16/0x18
Jan 14 03:17:23 utx2db02 kernel: [] ? fget_light+0x58/0x73
Jan 14 03:17:25 utx2db02 kernel: [] ? sockfd_lookup_light+0x20/0x58
Jan 14 03:17:26 utx2db02 kernel: [] sys_sendto+0x12f/0x171
Jan 14 03:17:27 utx2db02 kernel: [] ? audit_syscall_entry+0x103/0x12f
Jan 14 03:17:31 utx2db02 kernel: [] system_call_fastpath+0x16/0x1b
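
The `order:4` in the failure message means the kernel could not find 2^4 = 16 physically contiguous pages. A small sketch for pulling such failures out of a syslog file is given below. It assumes the message format shown above and a 4 KiB page size (typical for x86_64, but not stated in the note); the name `parse_alloc_failure` and the returned keys are hypothetical.

```python
import re

# Layout inferred from the "page allocation failure" line quoted above.
ALLOC_FAIL_RE = re.compile(
    r"kernel: (?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)

PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages


def parse_alloc_failure(line):
    """Return details of a page allocation failure line, or None."""
    match = ALLOC_FAIL_RE.search(line)
    if match is None:
        return None
    order = int(match.group("order"))
    pages = 2 ** order  # an order-n request asks for 2**n contiguous pages
    return {
        "comm": match.group("comm"),
        "order": order,
        "pages": pages,
        "size_kib": pages * PAGE_SIZE_KIB,
        "mode": match.group("mode"),
    }
```

Under the 4 KiB assumption, the ologgerd failure above corresponds to a 64 KiB contiguous request. A request of that size can fail because of memory fragmentation even when plenty of memory and swap is still free.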