Linux: Finding the Cause of a System Shutdown from the System Logs

ArchitecTang

Last modified 2022-09-13 17:18:03

First published 2022-09-13 15:07:11

Article link: https://blog.csdn.net/qq_28345657/article/details/126833131

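When a server becomes unreachable or unresponsive without warning, the cause may be a system crash or a hardware failure. Checking the system logs with commands such as `last reboot` and `last -x` can reveal why the system shut down or rebooted, for example a pressed power button, overheating, a battery problem, or file system corruption. Specific log entries, such as the `systemd-journald` message about a corrupted or uncleanly shut down journal file, help pinpoint the problem. Checking the UPS logs is also part of the diagnosis.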

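Sometimes a server that has been running fine suddenly becomes unreachable and cannot execute any operation. After a forced reboot, the system logs are the place to look for the cause of the failure.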

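Only programs with root privileges can shut the system down cleanly. So when the system is shut down in the normal way, it was done either by a user with root privileges or by an acpi script. In both cases the logs will tell us which. Pressing the power button, overheating, or a low battery (on laptops) can all trigger an acpi shutdown.
1. First, try the following commands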

last reboot | less    # list the most recent reboot entries
last -x | less        # also list shutdown entries and runlevel changes
or
last -x | grep shutdown | less

2. Inspect the output of last -x

last -x | head | tac

Normal shutdown example
A normal shutdown followed by power-on looks like this:

runlevel (to lvl 3)   3.10.0-1160.el7. Mon Aug 29 17:00 - 17:00  (00:00)    
shutdown system down  3.10.0-1160.el7. Mon Aug 29 17:00 - 17:01  (00:00)    
reboot   system boot  3.10.0-1160.el7. Mon Aug 29 17:01 - 14:42 (14+21:41)  
runlevel (to lvl 3)   3.10.0-1160.el7. Mon Aug 29 17:01 - 14:42 (14+21:41)

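3. Unexpected shutdown example

After an unexpected shutdown, the reboot entry is not preceded by a shutdown entry: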

reboot   system boot  3.10.0-1127.19.1 Sun Sep 11 01:31 - 14:40 (2+13:09)   
runlevel (to lvl 3)   3.10.0-1127.19.1 Sat Sep 10 17:31 - 14:40 (2+21:09)
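
The difference between the two examples above can be checked mechanically. The following is a minimal sketch, not part of last(1): `find_unclean_boots` is a hypothetical helper name, and it assumes the input is `last -x` output already reversed into chronological order (oldest first). The first boot in the record is skipped because nothing is known about what preceded it.

```shell
# find_unclean_boots: hypothetical helper, not part of last(1).
# Reads `last -x` output from stdin in chronological order (oldest first)
# and prints every "reboot" entry that has no "shutdown" entry recorded
# since the previous boot. The first boot seen is never reported.
find_unclean_boots() {
    awk '
        $1 == "shutdown" { clean = 1; next }   # a clean shutdown was recorded
        $1 == "reboot" {
            if (seen_boot && !clean) print "unclean: " $0
            seen_boot = 1
            clean = 0                          # wait for the next shutdown entry
        }
    '
}

# Usage on a live system (last prints newest first, so reverse it with tac):
#   last -x | tac | find_unclean_boots
```

A reported line is only a hint that the previous session ended without a recorded shutdown; the logs in /var/log/ still have to be checked for the actual cause.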

Check the logs in /var/log/
A bash command to filter the logs:

# Drop routine noise first, then keep lines that mention recovery, power, shutdown, rsyslogd or ups
grep -iv ': starting\|kernel: .*: Power Button\|watching system buttons\|Stopped Cleaning Up\|Started Crash recovery kernel' \
/var/log/messages /var/log/syslog /var/log/apcupsd* \
| grep -iw 'recover[a-z]*\|power[a-z]*\|shut[a-z ]*down\|rsyslogd\|ups'

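When power is cut unexpectedly or a hardware failure occurs, the file system cannot be unmounted cleanly, so the next boot may log lines like these: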

[    3.238424] IPVS: [rr] scheduler registered.
[    3.475768] systemd-journald[479]: Received request to flush runtime journal from PID 1
[    3.483416] systemd-journald[479]: File /var/log/journal/20200914151306980406746494236010/system.journal corrupted or uncleanly shut down, renaming and re
[    3.483812] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0x700, revision 0

When the system is powered off by pressing the power button, the following is logged:

systemd-logind: Power key pressed.
systemd-logind: Powering Off...
systemd-logind: System is powering down.

When the server shuts down normally, the following is logged:

rsyslogd: ... exiting on signal 15

When the system shuts down because of overheating, the following is logged:

critical temperature reached...,shutting down

If you have a UPS and run a daemon that monitors power and shuts the machine down, you should of course check its logs as well (NUT logs to /var/log/messages, while apcupsd logs to /var/log/apcupsd*).
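
The log signatures quoted above can be turned into a small classifier. This is a minimal sketch under stated assumptions: `classify_shutdown_lines` and its labels are hypothetical names, and the patterns are only the message fragments quoted in this article, so the exact wording on another distribution or kernel version may differ.

```shell
# classify_shutdown_lines: hypothetical helper that labels log lines read
# from stdin, using only the message fragments quoted in this article.
# Lines that match none of the patterns are silently skipped.
classify_shutdown_lines() {
    while IFS= read -r line; do
        case $line in
            *"Power key pressed"*)
                printf 'power-button: %s\n' "$line" ;;
            *"exiting on signal 15"*)
                printf 'clean-shutdown: %s\n' "$line" ;;
            *[Cc]"ritical temperature reached"*)
                printf 'overheat: %s\n' "$line" ;;
            *"corrupted or uncleanly shut down"*)
                printf 'unclean-shutdown: %s\n' "$line" ;;
        esac
    done
}

# Usage (the log path is an assumption; pick the file your system writes):
#   classify_shutdown_lines < /var/log/messages
```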

4. Description from the last manual

last [...] prints information about connect times of users. 
Records are printed from most recent to least recent.  
[...]
The special users reboot and shutdown log in when the system reboots
or (surprise) shuts down.

We use head to keep the 10 most recent events and tac to reverse their order, so that the newest-first ordering of the last output does not cause confusion.
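
The effect of the head and tac combination is easy to see on stand-in data. In this sketch, `recent_chronological` is a hypothetical wrapper name, and the numbers produced by seq simply play the role of log entries printed newest first.

```shell
# recent_chronological: hypothetical wrapper around the head | tac idiom.
# Keeps the first 10 lines of stdin (the 10 newest entries when the input
# is newest first) and prints them oldest first.
recent_chronological() {
    head -n 10 | tac
}

# Usage with stand-in data, where 15 is the "newest" entry:
#   seq 15 -1 1 | recent_chronological
# On a live system:
#   last -x | recent_chronological
```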

5. Some other log files worth exploring

/var/log/debug
/var/log/syslog (will be pretty full and may be harder to browse)
/var/log/user.log
/var/log/kern.log
/var/log/boot
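
Not every distribution writes all of these files, so it can help to check which ones exist before searching them. The helper below is a minimal sketch; `list_readable_logs` is a hypothetical name, and it only tests the paths it is given.

```shell
# list_readable_logs: hypothetical helper; prints each argument that is a
# readable regular file, so later searches skip logs this system lacks.
list_readable_logs() {
    for f in "$@"; do
        if [ -f "$f" ] && [ -r "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage with the files listed above:
#   list_readable_logs /var/log/debug /var/log/syslog /var/log/user.log \
#       /var/log/kern.log /var/log/boot
```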

6. The cause I found in my case:

systemd-journald[479]: File /var/log/journal/....../system.journal corrupted or uncleanly shut down, renaming and replacing.

The system crash was caused by a software or hardware failure on the underlying host machine.