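1 Fault description
After logging in to the OS and running top, the process consuming the most system resources turned out to be ologgerd.
The top output on the database server is as follows: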
[root@orcl02 ~]# top
top - 09:51:34 up 1109 days, 17:45, 2 users, load average: 87.29, 83.10, 75.26
Tasks: 690 total, 58 running, 632 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.6 us, 95.2 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 65808320 total, 685060 free, 11065128 used, 54058132 buff/cache
KiB Swap: 2097148 total, 317256 free, 1779892 used. 25127296 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 5381 root      rt  -5  780236 162836  69000 S 105.6  0.2   7856:59 ologgerd
32626 grid      -2   0 1345116  12068  10596 R  69.4  0.0   5636:23 oracle
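
The same check can be done non-interactively by filtering batch-mode top output. The following is a minimal sketch, not part of the original diagnosis: high_cpu is a hypothetical helper name, and it assumes the default top column layout shown above (%CPU in column 9, COMMAND in column 12).

```shell
# high_cpu THRESHOLD: read `top -b -n 1` output on stdin and print
# "PID %CPU COMMAND" for every process at or above THRESHOLD percent CPU.
# (high_cpu is a hypothetical helper, not part of any Oracle tooling.)
high_cpu() {
    awk -v t="${1:-50}" '
        $1 == "PID" { in_table = 1; next }   # the process table starts after the header row
        in_table && NF >= 12 && $9 + 0 >= t { print $1, $9, $12 }
    '
}

# Usage (not run here): top -b -n 1 | high_cpu 50
```

Against the output above, a threshold of 50 would list both ologgerd and the oracle process; raising it to 100 leaves only ologgerd.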
2 Fault analysis
My Oracle Support (Metalink) has a document describing a similar problem: Node Eviction due to OLOGGERD High CPU (Doc ID 1636942.1)
ologgerd uses a lot of CPU and does lots of I/O to the disk where the BDB (Berkeley Database used by CHM) resides.
This is due to BUG 13867435 - OLOGGERD USING A LOT OF RESOURCES.
3 Fault resolution
The fix is to apply the patch:
Apply Patch 13867435 - OLOGGERD USING A LOT OF RESOURCES on top of 11.2.0.3.
The bug is fixed in 11.2.0.4 GI PSU.
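
Before choosing between the one-off patch and the PSU, it helps to confirm which Grid Infrastructure version is active. The sketch below assumes that `crsctl query crs activeversion` prints the version in square brackets (for example `[11.2.0.3.0]`); gi_version and is_11203 are hypothetical helper names introduced only for illustration.

```shell
# gi_version: extract the bracketed version string from
# `crsctl query crs activeversion` output read on stdin.
# (Assumes the version appears as [x.x.x.x.x]; hypothetical helper.)
gi_version() {
    sed -n 's/.*\[\([0-9.]*\)\].*/\1/p'
}

# is_11203 VERSION: succeed if VERSION is an 11.2.0.3 release,
# i.e. the release that Patch 13867435 is applied on top of.
is_11203() {
    case "$1" in
        11.2.0.3|11.2.0.3.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (not run here):
#   is_11203 "$(crsctl query crs activeversion | gi_version)" && echo "candidate for Patch 13867435"
```

This only identifies the base release; whether a given PSU already contains the fix still has to be checked against the patch README.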
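Alternatively, the problem can be worked around by disabling the resource:
ologgerd is part of the Oracle Cluster Health Monitor (CHM), which Oracle Support uses as a tool for debugging RAC problems. If the ologgerd process is consuming a lot of CPU, you can stop it by running the following on both nodes: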
crsctl stop resource ora.crf -init
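
After stopping the resource, its state can be confirmed with `crsctl stat res ora.crf -init`. The sketch below is an illustration under one assumption: that the command prints NAME=, TYPE=, TARGET= and STATE= lines, with STATE optionally followed by "on <node>". crf_state is a hypothetical helper name.

```shell
# crf_state: read `crsctl stat res ora.crf -init` output on stdin and print
# the first word of the STATE attribute (for example ONLINE or OFFLINE).
# (Hypothetical helper; assumes KEY=VALUE lines as described in the lead-in.)
crf_state() {
    awk -F= '$1 == "STATE" { split($2, words, " "); print words[1] }'
}

# Usage (not run here): crsctl stat res ora.crf -init | crf_state
```

A result of OFFLINE on each node would indicate that the resource, and with it ologgerd, is no longer running there.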
If you want to disable ologgerd permanently, run:
crsctl delete resource ora.crf -init
Oracle ologgerd process consuming excessive CPU resources