Notes on resolving frequent crashes of a test server

linux-crash

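Problem

The test server kept crashing: about once a week at first, and later every time the application service was started.
Server OS: CentOS 6.5
Kernel version: 2.6.32-431.el6.x86_64

Server system log analysis

I checked the log at /var/log/messages; the entries below are the errors that appeared most often.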

Dec  4 14:11:46 localhost abrtd: Init complete, entering main loop
Dec  4 14:11:53 localhost modem-manager: (ttyS1) closing serial device...
Dec  4 14:11:53 localhost modem-manager: (ttyS1) opening serial device...
Dec  4 14:11:59 localhost modem-manager: (ttyS1) closing serial device...
Dec  4 14:12:16 localhost kernel: {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
Dec  4 14:12:16 localhost kernel: {1}[Hardware Error]: APEI generic hardware error status
Dec  4 14:12:16 localhost kernel: {1}[Hardware Error]: severity: 2, corrected
Dec  4 14:12:16 localhost kernel: {1}[Hardware Error]: section: 0, severity: 2, corrected
Dec  4 14:12:16 localhost kernel: {1}[Hardware Error]: flags: 0x01
Dec  4 14:12:16 localhost kernel: {1}[Hardware Error]: primary
Dec  4 14:12:16 localhost kernel: {1}[Hardware Error]: fru_text: CorrectedErr
Dec  4 14:12:16 localhost kernel: {1}[Hardware Error]: section_type: memory error
Dec  4 14:12:16 localhost kernel: {1}[Hardware Error]: node: 15424
Dec  4 14:12:16 localhost kernel: {1}[Hardware Error]: device: 12343
Dec  4 14:12:16 localhost kernel: {1}[Hardware Error]: error_type: 2, single-bit ECC
Dec  4 14:12:16 localhost kernel: [Hardware Error]: Machine check events logged 【crash】
Dec  9 04:05:06 localhost kernel: imklog 5.8.10, log source = /proc/kmsg started. 【reboot】
Dec  9 04:05:06 localhost rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1601" x-info="http://www.rsyslog.com"] start
Dec  9 04:05:06 localhost kernel: Initializing cgroup subsys cpuset

Dec  9 04:05:11 localhost abrtd: Init complete, entering main loop
Dec  9 04:05:19 localhost modem-manager: (ttyS1) closing serial device...
Dec  9 04:05:19 localhost modem-manager: (ttyS1) opening serial device...
Dec  9 04:05:25 localhost modem-manager: (ttyS1) closing serial device...
Dec  9 04:05:52 localhost kernel: {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
Dec  9 04:05:52 localhost kernel: {1}[Hardware Error]: APEI generic hardware error status
Dec  9 04:05:52 localhost kernel: {1}[Hardware Error]: severity: 2, corrected
Dec  9 04:05:52 localhost kernel: {1}[Hardware Error]: section: 0, severity: 2, corrected
Dec  9 04:05:52 localhost kernel: {1}[Hardware Error]: flags: 0x01
Dec  9 04:05:52 localhost kernel: {1}[Hardware Error]: primary
Dec  9 04:05:52 localhost kernel: {1}[Hardware Error]: fru_text: CorrectedErr
Dec  9 04:05:52 localhost kernel: {1}[Hardware Error]: section_type: memory error
Dec  9 04:05:52 localhost kernel: {1}[Hardware Error]: node: 24208
Dec  9 04:05:52 localhost kernel: {1}[Hardware Error]: device: 12343
Dec  9 04:05:52 localhost kernel: {1}[Hardware Error]: error_type: 2, single-bit ECC
Dec  9 04:05:52 localhost kernel: [Hardware Error]: Machine check events logged 【crash】
Dec 11 10:40:00 localhost kernel: imklog 5.8.10, log source = /proc/kmsg started. 【reboot】
Dec 11 10:40:00 localhost rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1603" x-info="http://www.rsyslog.com"] start
Dec 11 10:40:00 localhost kernel: Initializing cgroup subsys cpuset
Dec 11 10:40:00 localhost kernel: Initializing cgroup subsys cpu

When I first saw these errors I was baffled. Seeing "Hardware Error", I assumed the machine was beyond saving.
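When a log is long, a small script can pull out just the hardware error records. The following is a minimal sketch, assuming the syslog line layout seen in the excerpt above. The function name `summarize_hardware_errors` and the regular expression are illustrative, not part of any tool used in the original troubleshooting.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   Dec  4 14:12:16 localhost kernel: {1}[Hardware Error]: section_type: memory error
# The "{1}" record prefix is optional, as in the "Machine check events logged" line.
HW_ERROR = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+"
    r"(?:\{\d+\})?\[Hardware Error\]:\s*(?P<msg>.*)$"
)


def summarize_hardware_errors(lines):
    """Group [Hardware Error] lines by timestamp and count error_type values.

    `lines` is any iterable of log lines (a list, or an open file object).
    Returns (events, types): events maps timestamp -> list of messages,
    types is a Counter keyed by the text after "error_type:".
    """
    events = {}
    for line in lines:
        match = HW_ERROR.match(line)
        if match:
            events.setdefault(match.group("ts"), []).append(
                match.group("msg").strip()
            )

    types = Counter()
    for messages in events.values():
        for msg in messages:
            if msg.startswith("error_type:"):
                types[msg.split(":", 1)[1].strip()] += 1
    return events, types
```

To use it, pass an open file, for example `summarize_hardware_errors(open("/var/log/messages", errors="replace"))`. Grouping by timestamp works here only because each error record in the excerpt is logged within a single second; a busier log might need a different grouping key.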

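Solution

Searching Bing for the key phrase "Hardware error from APEI Generic Hardware Error Source: 1" turned up a fairly close match, an article titled "APEI Generic Hardware Error". Its gist is that the problem is caused by an issue between the system and ECC memory.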
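Since the log points at ECC memory, it can help to look at the kernel's own corrected and uncorrected error counters. The sketch below reads the EDAC sysfs files `ce_count` and `ue_count`. It assumes an EDAC driver is loaded and exposes `/sys/devices/system/edac/mc`, which may not be the case on every kernel or platform (it was not checked in the original troubleshooting); the function name `read_edac_counts` is hypothetical.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int, "ue": int}} read from EDAC sysfs.

    Returns an empty dict when the directory is missing, which usually
    means no EDAC driver is loaded (an assumption, not a diagnosis).
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts

    # Each memory controller appears as mc0, mc1, ...
    for controller in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for key, filename in (("ce", "ce_count"), ("ue", "ue_count")):
            counter_file = controller / filename
            if counter_file.is_file():
                entry[key] = int(counter_file.read_text().strip())
        counts[controller.name] = entry
    return counts
```

Calling `read_edac_counts()` on the affected machine would show whether the corrected error count ("ce") keeps growing between checks. The `root` parameter exists only so the function can be pointed at a different directory for testing.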

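I then tried two things:

  • 1. Pulled out the memory modules, cleaned off the dust, and reinserted them in different slots [problem persisted after reboot]
  • 2. Upgraded the kernel (from 2.6.32-431.el6.x86_64 to 3.17.1)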

The server has now been running for more than a week without crashing, and no further errors have appeared in /var/log/messages.
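After an upgrade like this, it is worth confirming that the machine actually booted the new kernel rather than the old one. A minimal sketch, assuming the release string starts with a dotted numeric version, as in 2.6.32-431.el6.x86_64; the helper names are illustrative.

```python
import platform
import re


def kernel_version_tuple(release):
    """Turn '2.6.32-431.el6.x86_64' into (2, 6, 32), ignoring the distro suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch level (e.g. "4.4") is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def running_kernel_at_least(minimum=(3, 17, 1)):
    """Compare the running kernel (platform.release()) with a minimum version."""
    return kernel_version_tuple(platform.release()) >= minimum
```

On the upgraded server, `running_kernel_at_least()` should return `True`; if it returns `False`, the boot loader is probably still defaulting to the old kernel entry.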

Afterthoughts

The problem may be related to the several sudden power outages that preceded it.

References

Viewing Linux logs
CentOS kernel upgrade
List of the latest Linux kernels

Reposted from: https://my.oschina.net/wenjinglian/blog/1591609
