Linux abnormal reboot caused by a physical node failure - Case 1

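Problem description: a Linux VM rebooted unexpectedly, and the root cause needs to be identified.

Findings:

Information obtained from the messages log

VM kernel version: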

Jun 20 03:34:51 test01 kernel: Linux version 2.6.32-642.1.1.el6.x86_64 (mockbuild@worker1.bsys.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Tue May 31 21:57:07 UTC 2016

The log shows that the VM rebooted at about 2019/6/20 03:34 CST:

Jun 20 03:31:04 test01 kernel: hv_utils: Shutdown request received - graceful shutdown initiated

Jun 20 03:31:04 test01 init: tty (/dev/tty1) main process (2703) killed by TERM signal

Jun 20 03:31:04 test01 init: tty (/dev/tty2) main process (2705) killed by TERM signal

Jun 20 03:31:04 test01 init: tty (/dev/tty3) main process (2707) killed by TERM signal

Jun 20 03:31:04 test01 init: tty (/dev/tty4) main process (2709) killed by TERM signal

Jun 20 03:31:04 test01 init: tty (/dev/tty5) main process (2711) killed by TERM signal

Jun 20 03:31:04 test01 init: tty (/dev/tty6) main process (2713) killed by TERM signal

Jun 20 03:31:04 test01 init: serial (ttyS0) main process (2723) killed by TERM signal

Jun 20 03:31:05 test01 abrtd: Got signal 15, exiting

Jun 20 03:31:10 test01 dnsmasq[1710]: exiting on receipt of SIGTERM

Jun 20 03:31:13 test01 acpid: exiting

Jun 20 03:31:13 test01 rpcbind: rpcbind terminating on signal. Restart with "rpcbind -w"

Jun 20 03:31:13 test01 init: Disconnected from system bus

Jun 20 03:31:14 test01 auditd[1424]: The audit daemon is exiting.

Jun 20 03:31:14 test01 kernel: type=1305 audit(1560972674.940:131977): audit_pid=0 old=1424 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1

Jun 20 03:31:15 test01 kernel: type=1305 audit(1560972675.039:131978): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditctl_t:s0 res=1

Jun 20 03:31:15 test01 kernel: Kernel logging (proc) stopped.

Jun 20 03:31:15 test01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1458" x-info="http://www.rsyslog.com"] exiting on signal 15.

Jun 20 03:34:51 test01 kernel: imklog 5.8.10, log source = /proc/kmsg started.

Jun 20 03:34:51 test01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1457" x-info="http://www.rsyslog.com"] start

Jun 20 03:34:51 test01 kernel: Initializing cgroup subsys cpuset

Jun 20 03:34:51 test01 kernel: Initializing cgroup subsys cpu

Jun 20 03:34:51 test01 kernel: Linux version 2.6.32-642.1.1.el6.x86_64 (mockbuild@worker1.bsys.centos.org) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-17) (GCC) ) #1 SMP Tue May 31 21:57:07 UTC 2016

Jun 20 03:34:51 test01 kernel: Command line: ro root=UUID=adc76f7c-fef6-4075-941e-e7ce50fb3e50 rd_NO_LUKS KEYBOARDTYPE=pc KEYTABLE=us LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 console=ttyS0,115200n8 earlyprintk=ttyS0,115200 rootdelay=300 rd_NO_LVM rd_NO_DM

Jun 20 03:34:51 test01 kernel: KERNEL supported cpus:

Jun 20 03:34:51 test01 kernel: Intel GenuineIntel

Jun 20 03:34:51 test01 kernel: AMD AuthenticAMD

Jun 20 03:34:51 test01 kernel: Centaur CentaurHauls
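The downtime window can be read off the excerpt above mechanically. What follows is a minimal Python sketch, not part of the original investigation. It assumes the year 2019 (syslog time stamps omit the year) and that CST here means UTC+8; the helper names `parse_ts` and `first_match` are hypothetical, and the rsyslogd lines are abridged. It also cross-checks the epoch value embedded in the audit record against the syslog time stamp on the same line.

```python
from datetime import datetime, timedelta, timezone

# Lines copied (abridged) from the messages excerpt above.
LINES = [
    "Jun 20 03:31:04 test01 kernel: hv_utils: Shutdown request received - graceful shutdown initiated",
    "Jun 20 03:31:14 test01 kernel: type=1305 audit(1560972674.940:131977): audit_pid=0 old=1424",
    "Jun 20 03:31:15 test01 rsyslogd: exiting on signal 15.",
    "Jun 20 03:34:51 test01 rsyslogd: start",
]

# Assumption: CST in this article is China Standard Time (UTC+8).
CST = timezone(timedelta(hours=8))


def parse_ts(line, year=2019):
    """Parse the 15-character 'Mon DD HH:MM:SS' syslog prefix."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def first_match(lines, needle):
    """Return the time stamp of the first line containing needle, else None."""
    for line in lines:
        if needle in line:
            return parse_ts(line)
    return None


shutdown_request = first_match(LINES, "Shutdown request received")
logging_stopped = first_match(LINES, "exiting on signal 15")
logging_started = first_match(LINES, "rsyslogd: start")

shutdown_seconds = (logging_stopped - shutdown_request).total_seconds()
gap_seconds = (logging_started - logging_stopped).total_seconds()

# The audit record carries seconds since the Unix epoch; converting it to
# UTC+8 should reproduce the syslog time stamp printed on the same line.
audit_local = datetime.fromtimestamp(1560972674.940, tz=CST)

print(f"graceful shutdown took {shutdown_seconds:.0f} s")
print(f"no log entries for {gap_seconds:.0f} s")
print(f"audit epoch in UTC+8: {audit_local:%Y-%m-%d %H:%M:%S}")
```

With these inputs the sketch reports an 11-second graceful shutdown followed by a 216-second gap before logging resumes, and the audit epoch converts to 2019-06-20 03:31:14 in UTC+8, which agrees with the syslog time stamp.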

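Shortly before the reboot, the kernel logged a hung task warning with a call trace (the message is a warning about a blocked task, not an actual kernel panic): INFO: task jbd2/sda1-8:540 blocked for more than 120 seconds. Note: jbd2 is the kernel thread that writes the file system journal.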

Jun 19 00:27:01 test01 auditd[1424]: Audit daemon rotating log files

Jun 20 03:28:27 test01 kernel: INFO: task jbd2/sda1-8:540 blocked for more than 120 seconds.

Jun 20 03:28:27 test01 kernel: Not tainted 2.6.32-642.1.1.el6.x86_64 #1

Jun 20 03:28:27 test01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Jun 20 03:28:27 test01 kernel: jbd2/sda1-8 D 0000000000000000 0 540 2 0x00000000

Jun 20 03:28:27 test01 kernel: ffff880433257b30 0000000000000046 0000000000000000 ffff880433336d80

Jun 20 03:28:27 test01 kernel: 000000002a5c72b4 ffffffffa307745d ffff880433257ae0 ffff880433257ad0

Jun 20 03:28:27 test01 kernel: ffffffffa0045958 0000000000000000 ffff8804316505f8 ffff880433257fd8

Jun 20 03:28:27 test01 kernel: Call Trace:

Jun 20 03:28:27 test01 kernel: [] ? read_hv_clock_tsc+0x38/0x80 [hv_vmbus]

Jun 20 03:28:27 test01 kernel: [] ? sync_page+0x0/0x50

Jun 20 03:28:27 test01 kernel: [] io_schedule+0x73/0xc0

Jun 20 03:28:27 test01 kernel: [] sync_page+0x3d/0x50

Jun 20 03:28:27 test01 kernel: [] __wait_on_bit+0x5f/0x90

Jun 20 03:28:27 test01 kernel: [] wait_on_page_bit+0x73/0x80

Jun 20 03:28:27 test01 kernel: [] ? wake_bit_function+0x0/0x50

Jun 20 03:28:27 test01 kernel: [] ? pagevec_lookup_tag+0x25/0x40

Jun 20 03:28:27 test01 kernel: [] wait_on_page_writeback_range+0xfb/0x190

Jun 20 03:28:27 test01 kernel: [] filemap_fdatawait+0x2f/0x40

Jun 20 03:28:27 test01 kernel: [] jbd2_journal_commit_transaction+0x7e9/0x14f0 [jbd2]

Jun 20 03:28:27 test01 kernel: [] ? try_to_del_timer_sync+0x7b/0xe0

Jun 20 03:28:27 test01 kernel: [] kjournald2+0xb8/0x220 [jbd2]

Jun 20 03:28:27 test01 kernel: [] ? autoremove_wake_function+0x0/0x40

Jun 20 03:28:27 test01 kernel: [] ? kjournald2+0x0/0x220 [jbd2]

Jun 20 03:28:27 test01 kernel: [] kthread+0x9e/0xc0

Jun 20 03:28:27 test01 kernel: [] child_rip+0xa/0x20

Jun 20 03:28:27 test01 kernel: [] ? kthread+0x0/0xc0

Jun 20 03:28:27 test01 kernel: [] ? child_rip+0x0/0x20
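Warnings like the one above are easy to miss in a large messages file. As a rough illustration, the Python sketch below scans log lines for hung task warnings and estimates the latest moment at which the task could have become blocked, by subtracting the reported threshold from the time stamp. The function name `find_hung_tasks` and the assumed year 2019 are illustrative choices, and the sample lines are lightly normalised copies of the excerpt.

```python
import re
from datetime import datetime, timedelta

# Matches the kernel's hung task warning and captures task name, PID and threshold.
HUNG_TASK = re.compile(r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(lines, year=2019):
    """Yield (time, task, pid, seconds) for each hung task warning found."""
    for line in lines:
        m = HUNG_TASK.search(line)
        if m:
            # Syslog time stamps omit the year, so it is supplied by the caller.
            when = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
            yield when, m.group(1), int(m.group(2)), int(m.group(3))


LINES = [
    "Jun 19 00:27:01 test01 auditd[1424]: Audit daemon rotating log files",
    "Jun 20 03:28:27 test01 kernel: INFO: task jbd2/sda1-8:540 blocked for more than 120 seconds.",
    "Jun 20 03:28:27 test01 kernel: Call Trace:",
]

hits = list(find_hung_tasks(LINES))
for when, task, pid, seconds in hits:
    # The task had already been blocked for at least `seconds` when the
    # warning was printed, so this is an upper bound on the start time.
    blocked_since = when - timedelta(seconds=seconds)
    print(f"{task} (pid {pid}) blocked no later than {blocked_since:%H:%M:%S}")
```

For this excerpt the sketch finds one warning and places the start of the stall at 03:26:27 or earlier, that is, several minutes before the shutdown request at 03:31:04.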

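The secure log only records the shutdown signal and contains no further details. Note: signal 15 is SIGTERM, the termination signal (default action: terminate the process).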

Jun 20 03:31:05 test01 sshd[1811]: Received signal 15; terminating.

Jun 20 03:34:59 test01 sshd[1808]: Server listening on 0.0.0.0 port 22.

Jun 20 03:34:59 test01 sshd[1808]: Server listening on :: port 22.
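To confirm which signal a daemon reports, the number in the log line can be mapped to its name with Python's standard `signal` module. This is only a small sketch; the two sample lines are taken from the messages and secure excerpts in this article, and the regular expression is an illustrative assumption about their wording.

```python
import re
import signal

LINES = [
    "Jun 20 03:31:05 test01 abrtd: Got signal 15, exiting",
    "Jun 20 03:31:05 test01 sshd[1811]: Received signal 15; terminating.",
]

# Capture the numeric signal that each daemon reports.
SIGNAL_RE = re.compile(r"signal (\d+)")

names = []
for line in LINES:
    m = SIGNAL_RE.search(line)
    if m:
        # signal.Signals maps a signal number to its symbolic name.
        names.append(signal.Signals(int(m.group(1))).name)

print(names)
```

Both lines resolve to SIGTERM, which is what a graceful shutdown sends to running services, so these entries are consistent with an orderly shutdown rather than a crash inside the guest.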

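Checking the physical node that hosted the affected Linux VM showed that the node had already suffered a system fault at around 6/20 03:10 CST. The VM was automatically migrated to another available node, and this process caused the VM to reboot automatically.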
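Putting the host-side and guest-side times on one timeline makes the ordering explicit. The sketch below is illustrative only: the event labels are hypothetical, the year 2019 is taken from the incident date, and the host fault time is approximate ("around 03:10"), so the offsets are approximate as well.

```python
from datetime import datetime

# Times taken from this article; listed out of order on purpose.
events = [
    (datetime(2019, 6, 20, 3, 34, 51), "guest boots again"),
    (datetime(2019, 6, 20, 3, 10, 0), "host node fault (approximate)"),
    (datetime(2019, 6, 20, 3, 31, 4), "guest receives shutdown request"),
    (datetime(2019, 6, 20, 3, 28, 27), "guest logs jbd2 hung task warning"),
]

events.sort()  # chronological order; all time stamps are distinct
origin = events[0][0]

# Offset of every event from the earliest one, in whole seconds.
offsets = [(int((t - origin).total_seconds()), label) for t, label in events]

for seconds, label in offsets:
    minutes, rest = divmod(seconds, 60)
    print(f"+{minutes:02d}:{rest:02d}  {label}")
```

Sorted this way, the host fault comes first and every guest-side symptom follows it: the hung task warning roughly 18 minutes later, the shutdown request about 21 minutes later, and the boot about 25 minutes later.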

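Summary: in some cases, a call trace in a server's system log is not necessarily caused by a problem at the operating system level. It may instead be caused by a fault in the physical node (for a virtual machine) or in the hardware.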
