iLO chip triggering server hangs, from the H3C case library: an HPE ProLiant DL380 Gen9 server at a customer site hung intermittently...

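The hardware AHS log contains no alarms and shows nothing abnormal during the periods when the server hung. It records only the manually triggered server restart shortly after each hang. In addition, the server's BIOS and P440ar controller firmware are slightly out of date; neither is at the latest version.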

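Analysis of the operating system's sosreport logs showed OOM (Out Of Memory) records in the period before each hang. The cause was eventually traced to the user's l3fw process: after that process was stopped, the fault did not recur. This confirms that the user's own process exhausted memory, which ultimately left the server hung and unresponsive. The problem is unrelated to the server hardware.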

The detailed log analysis is as follows:

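1. The messages logs for the 13th, 14th, 15th, 16th, and 17th all record numerous out-of-memory events in which the kernel killed the l3fw process, as shown below: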

Mar 13 19:43:03 localhost kernel: Out of memory: Kill process 8676 (l3fw) score 982 or sacrifice child

Mar 13 19:43:03 localhost kernel: Killed process 8676 (l3fw) total-vm:97243004kB, anon-rss:63878444kB, file-rss:0kB

Mar 13 19:43:03 localhost kernel: l3fw: page allocation failure: order:0, mode:0x2015a

Mar 13 19:43:03 localhost kernel: CPU: 0 PID: 8676 Comm: l3fw Not tainted 3.10.0-123.el7.x86_64 #1

Mar 14 09:27:18 localhost kernel: Out of memory: Kill process 4748 (l3fw) score 982 or sacrifice child

Mar 14 09:27:18 localhost kernel: Killed process 4748 (l3fw) total-vm:97241980kB, anon-rss:63826664kB, file-rss:0kB

Mar 15 13:21:31 localhost kernel: Out of memory: Kill process 7628 (l3fw) score 981 or sacrifice child

Mar 15 13:21:31 localhost kernel: Killed process 7628 (l3fw) total-vm:97111932kB, anon-rss:63811384kB, file-rss:356kB

Mar 16 10:44:47 localhost kernel: Out of memory: Kill process 12456 (l3fw) score 980 or sacrifice child

Mar 16 10:44:47 localhost kernel: Killed process 12456 (l3fw) total-vm:97045372kB, anon-rss:63801988kB, file-rss:0kB

Mar 17 10:42:41 localhost kernel: Out of memory: Kill process 6881 (l3fw) score 980 or sacrifice child

Mar 17 10:42:41 localhost kernel: Killed process 6881 (l3fw) total-vm:96980860kB, anon-rss:63894712kB, file-rss:564kB
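
The repeated kill records above can be tallied automatically instead of read by eye. The following is a minimal Python sketch, not part of the original case analysis. It assumes the log lines follow the "Killed process" format shown in the excerpt, and it treats the kernel's kB figures as KiB. The names KILLED_RE, parse_oom_kills, and SAMPLE_LINES are hypothetical, introduced only for illustration.

```python
import re

# Pattern for the kernel's "Killed process" line as it appears in the excerpt.
# Whitespace is matched loosely because spacing varies between the quoted lines.
KILLED_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s*\S+\s+kernel:\s*"
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)\s*"
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)

KIB_PER_GIB = 1024 ** 2


def parse_oom_kills(lines):
    """Return one dict per 'Killed process' line found in the given lines."""
    kills = []
    for line in lines:
        match = KILLED_RE.match(line.strip())
        if match is None:
            continue  # not a kill record; skip it
        kills.append({
            "timestamp": match.group("ts"),
            "pid": int(match.group("pid")),
            "comm": match.group("comm"),
            "total_vm_gib": int(match.group("total_vm")) / KIB_PER_GIB,
            "anon_rss_gib": int(match.group("anon_rss")) / KIB_PER_GIB,
        })
    return kills


# Two records copied from the excerpt above, plus one unrelated line.
SAMPLE_LINES = [
    "Mar 13 19:43:03 localhost kernel: Killed process 8676 (l3fw) "
    "total-vm:97243004kB, anon-rss:63878444kB, file-rss:0kB",
    "Mar 13 19:43:03 localhost kernel: l3fw cpuset=/ mems_allowed=0-1",
    "Mar 14 09:27:18 localhost kernel: Killed process 4748 (l3fw) "
    "total-vm:97241980kB, anon-rss:63826664kB, file-rss:0kB",
]

if __name__ == "__main__":
    for kill in parse_oom_kills(SAMPLE_LINES):
        print(f"{kill['timestamp']}  pid={kill['pid']}  {kill['comm']}  "
              f"anon-rss={kill['anon_rss_gib']:.2f} GiB")
```

Run against the full messages files, a tally like this makes the pattern in the excerpt easy to see: each killed process is l3fw, and each time its anonymous resident memory is roughly 61 GiB.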

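2. The memory exhaustion left the operating system unresponsive, yet the hardware reported no errors. Taking the log from the 13th as an example: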

Mar 13 19:43:03 localhost kernel: l3fw invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0

Mar 13 19:43:03 localhost kernel: l3fw cpuset=/ mems_allowed=0-1

Mar 13 19:43:03 localhost kernel: CPU: 13 PID: 8696 Comm: l3fw Not tainted 3.10.0-123.el7.x86_64 #1

Mar 13 19:43:03 localhost kernel: active_anon:15173182 inactive_anon:979308 isolated_anon:0

active_file:0 inactive_file:0 isolated_file:0

unevictable:0 dirty:0 writeback:0 unstable:0

free:52843 slab_reclaimable:14200 slab_unreclaimable:25628

mapped:2296 shmem:2312 pagetables:50753 bounce:0

free_cma:0

Mar 13 19:43:03 localhost kernel: Node 0 DMA free:15748kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes

Mar 13 19:43:03 localhost kernel: lowmem_reserve[]: 0 1641 31847 31847

Mar 13 19:43:03 localhost kernel: Node 0 DMA32 free:121684kB min:2304kB low:2880kB high:3456kB active_anon:1150040kB inactive_anon:411036kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1948156kB managed:1681388kB mlocked:0kB dirty:0kB writeback:0kB mapped:44kB shmem:40kB slab_reclaimable:1212kB slab_unreclaimable:3268kB kernel_stack:40kB pagetables:4500kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11 all_unreclaimable? yes

Mar 13 19:43:03 localhost kernel: lowmem_reser
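
The page counters in the memory summary of this excerpt (active_anon, inactive_anon, free, and so on) are easier to interpret once converted to bytes. The sketch below is an illustration rather than part of the original analysis. It assumes a 4 KiB page size, the usual default on x86_64 (the kernel in the log is an x86_64 build), and copies the counter values from the excerpt. The names MEM_INFO_PAGES and pages_to_gib are hypothetical.

```python
PAGE_SIZE_KIB = 4          # assumption: default 4 KiB pages on x86_64
KIB_PER_GIB = 1024 ** 2

# Page counts copied from the memory summary in the Mar 13 excerpt.
MEM_INFO_PAGES = {
    "active_anon": 15173182,
    "inactive_anon": 979308,
    "active_file": 0,
    "inactive_file": 0,
    "free": 52843,
    "slab_reclaimable": 14200,
    "slab_unreclaimable": 25628,
    "pagetables": 50753,
}


def pages_to_gib(pages, page_size_kib=PAGE_SIZE_KIB):
    """Convert a page count to GiB for the given page size."""
    return pages * page_size_kib / KIB_PER_GIB


anon_pages = MEM_INFO_PAGES["active_anon"] + MEM_INFO_PAGES["inactive_anon"]
file_pages = MEM_INFO_PAGES["active_file"] + MEM_INFO_PAGES["inactive_file"]

anon_gib = pages_to_gib(anon_pages)
file_gib = pages_to_gib(file_pages)
free_gib = pages_to_gib(MEM_INFO_PAGES["free"])

if __name__ == "__main__":
    print(f"anonymous memory: {anon_gib:.2f} GiB")
    print(f"file cache:       {file_gib:.2f} GiB")
    print(f"free:             {free_gib:.2f} GiB")
```

Under that page-size assumption, anonymous memory comes to roughly 61.6 GiB, the file cache is empty, and only about 0.2 GiB is free. That is consistent with the anon-rss of about 61 GiB reported for l3fw when it was killed, which supports the conclusion that a single user process consumed nearly all of the memory.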
