ologgerd process causes high load on a database host


 

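The load average on one of a customer's hosts exceeded 50 and the host was responding very slowly. Process 27941 was found to be consuming a high share of resources.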
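One way to confirm which process is responsible is to rank the process list by CPU share. The following is a minimal sketch, assuming a Linux host with procps `ps`; the helper name `top_cpu_pid` is made up for illustration and is not part of any Oracle tooling.

```shell
# Print the PID and command name of the row with the highest %CPU.
# Expects "PID %CPU COMMAND" rows on stdin; a header line is skipped
# because its first field is not numeric.
top_cpu_pid() {
    awk '$1 ~ /^[0-9]+$/ && ($2 + 0) > max { max = $2 + 0; pid = $1; cmd = $3 }
         END { if (pid != "") print pid, cmd }'
}

# Usage on a live host:
#   ps -eo pid,pcpu,comm | top_cpu_pid
```

Reading from stdin keeps the helper usable on saved `ps` output as well as on a live system.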

A search on Metalink (My Oracle Support) showed that this is an Oracle bug. The database version is 11.2.0.4.

The bug information is as follows:

    
 

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
Information in this document applies to any platform.

Symptoms

A node is evicted from the cluster due to network communication error.

GI Alert log reports following errors and one node gets evicted:  

Node 1 GI Alert log
------------------------
CRS-1612:Network communication with node prodrac2(2) missing for 50% of timeout interval. Removal of this node from cluster in 29.240 seconds
..
CRS-1610:Network communication with node prodrac2 (2) missing for 90% of timeout interval. Removal of this node from cluster in 3.740 seconds
CRS-1607:Node utx2db02 is being evicted in cluster incarnation 278185525; details at (:CSSNM00007:) in /orabase1/app/11.2.0.3/grid_6/log/utx2db01/cssd/ocssd.log.

 

Node 2 GI Alert log
------------------------
CRS-1610:Network communication with node prodrac1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 3.740 seconds
CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /orabase1/app/11.2.0.3/grid_6/log/utx2db02/cssd/ocssd.log.
CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /orabase1/app/11.2.0.3/grid_6/log/utx2db02/cssd/ocssd.log
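To check whether a node shows the same symptom, the CRS message codes quoted above can be filtered out of a GI alert log. This is only a sketch: the function name `css_eviction_msgs` and the `ALERT_LOG` variable are hypothetical, and the code list contains just the five codes that appear in the excerpts above.

```shell
# Keep only the CSS network-timeout and eviction messages seen in the
# excerpts above (CRS-1607, 1609, 1610, 1612, 1656). Reads stdin.
css_eviction_msgs() {
    grep -E 'CRS-(1607|1609|1610|1612|1656):'
}

# Usage (ALERT_LOG is a placeholder for your GI alert log file):
#   css_eviction_msgs < "$ALERT_LOG"
```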

 

Top output shows the Cluster Health Monitor (CHM) daemon ologgerd using high CPU and spinning before the reboot:

  PID USER PR NI VIRT  RES SHR S  %CPU %MEM     TIME+ COMMAND
31439 root RT  0 375m 142m 58m S 161.8  0.1 359:42.49 /orabase1/app/11.2.0.3/grid_6/bin/ologgerd -m utx2db01 -r -d /orabase1/app/11.2.0.3/grid_


The call stack also shows page allocation failure for ologgerd:

Jan 14 03:16:16 utx2db02 kernel: Free swap = 13181784kB
Jan 14 03:16:21 utx2db02 kernel: Total swap = 25165816kB
Jan 14 03:16:30 utx2db02 kernel: ologgerd: page allocation failure. order:4, mode:0xd0
Jan 14 03:16:35 utx2db02 kernel: Pid: 31475, comm: ologgerd Not tainted 2.6.32-400.21.1.el5uek #1 <<<<<<<<<<
Jan 14 03:16:40 utx2db02 kernel: Call Trace:
Jan 14 03:16:44 utx2db02 kernel: [] __alloc_pages_nodemask+0x524/0x595
Jan 14 03:17:01 utx2db02 kernel: [] kmem_getpages+0x4f/0xf4
Jan 14 03:17:05 utx2db02 kernel: [] fallback_alloc+0x12e/0x1ce
Jan 14 03:17:06 utx2db02 kernel: [] ____cache_alloc_node+0x121/0x134
Jan 14 03:17:07 utx2db02 kernel: [] kmem_cache_alloc_node_notrace+0x84/0xb9
Jan 14 03:17:09 utx2db02 kernel: [] __kmalloc_node+0x46/0x73
Jan 14 03:17:13 utx2db02 kernel: [] ? __alloc_skb+0x72/0x13d
Jan 14 03:17:13 utx2db02 kernel: [] __alloc_skb+0x72/0x13d
Jan 14 03:17:15 utx2db02 kernel: [] sk_stream_alloc_skb+0x3d/0xaf
Jan 14 03:17:16 utx2db02 kernel: [] tcp_sendmsg+0x176/0x6cf
Jan 14 03:17:16 utx2db02 kernel: [] __sock_sendmsg+0x5e/0x67
Jan 14 03:17:18 utx2db02 kernel: [] sock_sendmsg+0xcc/0xe5
Jan 14 03:17:19 utx2db02 kernel: [] ? radix_tree_delete+0xf1/0x194
Jan 14 03:17:20 utx2db02 kernel: [] ? autoremove_wake_function+0x0/0x3d
Jan 14 03:17:21 utx2db02 kernel: [] ? security_sk_alloc+0x16/0x18
Jan 14 03:17:23 utx2db02 kernel: [] ? fget_light+0x58/0x73
Jan 14 03:17:25 utx2db02 kernel: [] ? sockfd_lookup_light+0x20/0x58
Jan 14 03:17:26 utx2db02 kernel: [] sys_sendto+0x12f/0x171
Jan 14 03:17:27 utx2db02 kernel: [] ? audit_syscall_entry+0x103/0x12f
Jan 14 03:17:31 utx2db02 kernel: [] system_call_fastpath+0x16/0x1b

 

Changes

 None.

Cause

ologgerd uses high CPU and performs heavy I/O against the disk where the BDB (the Berkeley Database used by CHM) resides.

This is due to BUG 13867435 - OLOGGERD USING A LOT OF RESOURCES.

 

Solution

Apply Patch 13867435 - OLOGGERD USING A LOT OF RESOURCES on top of 11.2.0.3.

The bug is fixed in 11.2.0.4 GI PSU.

 

I handled it as follows:

1. grid@woqurac1:/home/grid>crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               Instance Shutdown
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
ora.crf
      1        ONLINE  ONLINE       rac1                     -- check this resource
ora.crsd
      1        ONLINE  OFFLINE
ora.cssd
      1        ONLINE  OFFLINE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        OFFLINE OFFLINE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1
ora.evmd
      1        ONLINE  OFFLINE
ora.gipcd
      1        ONLINE  ONLINE       rac1
ora.gpnpd
      1        ONLINE  ONLINE       rac1
ora.mdnsd
      1        ONLINE  ONLINE       rac1
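Picking the ora.crf row out of that listing by eye is easy to get wrong, so the lookup can be scripted. The following is a sketch, assuming the `-t` layout shown above (a resource name on its own line, followed by a detail row); `res_state` is a made-up helper name.

```shell
# Print "TARGET STATE" for one resource from `crsctl stat res -t -init`
# output on stdin. Blank lines between rows are ignored.
res_state() {
    awk -v res="$1" '
        NF == 1 && $1 == res { found = 1; next }   # resource name line
        found && NF >= 3     { print $2, $3; exit } # first detail row after it
    '
}

# Usage (as the grid user):
#   crsctl stat res -t -init | res_state ora.crf
```

Before the change this should report the resource as online; running it again after step 2 is a simple way to confirm that ora.crf has actually stopped.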

2. Run the following commands on every node (as the root user)

# <GI_HOME>/bin/crsctl stop res ora.crf -init

# <GI_HOME>/bin/crsctl modify res ora.crf -attr ENABLED=0 -init
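Because these two commands have to be repeated on each node, it can help to wrap them so they are printed for review before anything is changed. The sketch below is an illustration only: `disable_crf` and the `DRY_RUN` switch are hypothetical names, and the Grid Infrastructure home must be supplied by the caller.

```shell
# Stop the ora.crf resource and disable it on the local node.
# By default the commands are only printed; set DRY_RUN=0 and run as
# root to execute them.
disable_crf() {
    gi_home="$1"        # Grid Infrastructure home, supplied by the caller
    run=""
    [ "${DRY_RUN:-1}" = "1" ] && run="echo"
    $run "$gi_home/bin/crsctl" stop res ora.crf -init &&
    $run "$gi_home/bin/crsctl" modify res ora.crf -attr ENABLED=0 -init
}

# Usage (GI_HOME is a placeholder for your own Grid home):
#   disable_crf "$GI_HOME"              # print the commands only
#   DRY_RUN=0 disable_crf "$GI_HOME"    # execute them, as root
```

Chaining with `&&` means the resource is only disabled if the stop succeeded, so a failed stop is noticed rather than masked.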

Source: ITPUB blog, http://blog.itpub.net/31134212/viewspace-2093362/. Please credit the source when reposting.
