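Preface
As the operating system sees wider use, we keep running into system problems that cannot be diagnosed at the application level. This article describes how to investigate such problems on the Inspur InLinux operating system using the crash tool and the vmcore file generated by kdump.
Operating system version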
[root@localhost ~]# cat /etc/os-release
NAME="InLinux"
VERSION="23.12 (LTS-SP1)"
ID="InLinux"
VERSION_ID="23.12"
PRETTY_NAME="InLinux 23.12 (LTS-SP1)"
ANSI_COLOR="0;31"
BUILD_TIME="2024-04-23_15:40:11"
Installing components
Install crash
yum install -y crash-debuginfo crash
Install kernel-debuginfo and kernel-debugsource
These packages are installed to obtain vmlinux:
yum install -y kernel-debugsource kernel-debuginfo
After installation, vmlinux is located at:
/usr/lib/debug/lib/modules/$(uname -r)/vmlinux
Install kexec-tools
yum install kernel-debuginfo-$(uname -r) kexec-tools
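kdump can only capture a vmcore if the kernel was booted with memory reserved through the crashkernel= parameter (the boot command line in the log output later in this article shows crashkernel=512M). The following is a minimal sketch of a pre-flight check, assuming a POSIX shell. The helper name has_crashkernel is hypothetical, and the kdump service name mentioned in the comment is an assumption about this distribution.

```shell
#!/bin/sh
# has_crashkernel [FILE]
# Hypothetical helper: succeeds if the kernel command line in FILE
# (default: /proc/cmdline) contains a crashkernel=<value> parameter.
has_crashkernel() (
    cmdline_file=${1:-/proc/cmdline}
    # Put each boot parameter on its own line, then look for crashkernel=...
    tr ' ' '\n' < "$cmdline_file" | grep -q '^crashkernel=.'
)

# Possible usage on the machine under test:
#   has_crashkernel || echo "no crashkernel= reservation; kdump cannot capture a vmcore"
# Assumption: the kdump service installed by kexec-tools is named "kdump",
# so `systemctl is-active kdump` would report whether it is running.
```

The function only inspects the command line; it does not verify that the reserved size is sufficient for the machine.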
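Using crash
Notes
To analyze a problem, crash needs both a vmcore file and a vmlinux file. Keep the following points in mind:
- The vmcore file is generated by kdump and is usually found under /var/crash/. If there are several, choose the one that matches the crash you want to analyze.
- The vmlinux file is located at /usr/lib/debug/lib/modules/$(uname -r)/vmlinux. This path is for the running kernel; a vmcore from a different kernel needs the vmlinux of that kernel's release.
- Make sure the kernel and kernel-debuginfo versions are exactly the same.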
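The path and version requirements above can be checked before starting crash. The sketch below assumes a POSIX shell; vmlinux_for_release is a hypothetical helper name, and its optional second argument (a root prefix) exists only so that the function can be exercised against a fake directory tree.

```shell
#!/bin/sh
# vmlinux_for_release RELEASE [ROOT]
# Hypothetical helper: print the path of the debug vmlinux for a kernel
# release, or fail with a hint if kernel-debuginfo for it is missing.
vmlinux_for_release() (
    release=$1
    root=${2:-}
    path="$root/usr/lib/debug/lib/modules/$release/vmlinux"
    if [ -f "$path" ]; then
        printf '%s\n' "$path"
    else
        printf 'no vmlinux for %s; is kernel-debuginfo-%s installed?\n' \
            "$release" "$release" >&2
        exit 1   # exits only the subshell, so the function returns 1
    fi
)

# Possible usage for the running kernel:
#   vmlinux=$(vmlinux_for_release "$(uname -r)") || exit 1
```

Passing the release string explicitly, rather than always calling uname -r inside the function, makes it usable for a vmcore that came from a different kernel.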
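Generating a test vmcore
A vmcore is a snapshot of system memory that kdump captures at the moment the system crashes or panics. For testing, a vmcore can be produced by triggering a panic on purpose.
Trigger a crash manually, wait a few minutes for the virtual machine to reboot automatically, and then check that kdump has saved the vmcore. Run this only on a test machine, because the second command panics the kernel immediately. The trigger commands are: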
# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger
The generated vmcore file is located under /var/crash/.
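When /var/crash/ holds several dumps, the most recent one can be picked by modification time. This is a minimal sketch assuming a POSIX shell and the layout seen in this article, where each dump lives in its own subdirectory as a file named vmcore; latest_vmcore is a hypothetical helper name.

```shell
#!/bin/sh
# latest_vmcore [DIR]
# Hypothetical helper: print the most recently modified DIR/*/vmcore
# (DIR defaults to /var/crash); return non-zero if none exists.
latest_vmcore() (
    dir=${1:-/var/crash}
    # ls -t lists newest first; errors from an unmatched glob are discarded.
    newest=$(ls -t "$dir"/*/vmcore 2>/dev/null | head -n 1)
    [ -n "$newest" ] && printf '%s\n' "$newest"
)

# Possible usage:
#   vmcore=$(latest_vmcore) || echo "no vmcore found under /var/crash"
```

Parsing ls output is acceptable here because kdump directory names contain no newlines; a more defensive script would use find instead.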
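Running the command
Run crash as follows:
crash {vmcore file} {debug kernel vmlinux}
- The first argument is the vmcore file generated by kdump; it can be produced by the simulated crash described above.
- The second argument is vmlinux, which is installed by the kernel-debuginfo package.
For example: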
crash /var/crash/127.0.0.1-2024-08-05-08\:47\:09/vmcore /usr/lib/debug/lib/modules/$(uname -r)/vmlinux
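crash can also be run non-interactively, which is convenient for collecting a first report from a vmcore. The sketch below relies on two crash options, -i (read commands from a file) and -s (suppress the startup banner). The helper names write_crash_cmds and crash_report are hypothetical, and the list of commands is only an illustrative selection.

```shell
#!/bin/sh
# write_crash_cmds FILE
# Write one crash command per line; the final "exit" makes crash quit
# instead of dropping to the interactive prompt.
write_crash_cmds() {
    cat > "$1" <<'EOF'
log
bt
bt -a
ps
exit
EOF
}

# crash_report VMCORE VMLINUX OUTFILE
# Run the commands above against a dump and save the combined output.
crash_report() (
    cmds=$(mktemp) || exit 1
    write_crash_cmds "$cmds"
    crash -s -i "$cmds" "$1" "$2" > "$3"
    status=$?
    rm -f "$cmds"
    exit "$status"   # exits only the subshell; becomes the function's status
)
```

For example, crash_report /var/crash/127.0.0.1-2024-08-05-08:47:09/vmcore /usr/lib/debug/lib/modules/$(uname -r)/vmlinux report.txt would leave the output of all four commands in report.txt.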
Starting a crash debugging session
[root@localhost ~]# crash /var/crash/127.0.0.1-2024-08-05-08\:47\:09/vmcore /usr/lib/debug/lib/modules/$(uname -r)/vmlinux
crash 8.0.2-1.ile2312sp1
Copyright (C) 2002-2022 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
Copyright (C) 2015, 2021 VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
WARNING: kernel version inconsistency between vmlinux and dumpfile
KERNEL: /usr/lib/debug/lib/modules/5.10.0-197.0.0.110.ile2312sp1.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2024-08-05-08:47:09/vmcore [PARTIAL DUMP]
CPUS: 16
DATE: Mon Aug 5 08:47:04 CST 2024
UPTIME: 2 days, 15:28:13
LOAD AVERAGE: 0.00, 0.05, 0.09
TASKS: 227
NODENAME: localhost.localdomain
RELEASE: 5.10.0-197.0.0.110.ile2312sp1.x86_64
VERSION: #1 SMP Tue Apr 30 10:18:42 UTC 2024
MACHINE: x86_64 (2194 Mhz)
MEMORY: 16 GB
PANIC: "Kernel panic - not syncing: sysrq triggered crash"
PID: 7961
COMMAND: "bash"
TASK: ffff9c36410bb400 [THREAD_INFO: ffff9c36410bb400]
CPU: 1
STATE: TASK_RUNNING (PANIC)
crash>
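Viewing logs: the log and dmesg commands
Most application problems can be diagnosed from the system log. Both commands print the kernel message buffer stored in the vmcore; in crash, dmesg is a built-in alias for log.
log command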
crash> log
[ 0.000000] Linux version 5.10.0-197.0.0.110.ile2312sp1.x86_64 (abuild@obsworker208) (gcc_old (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Apr 30 10:18:42 UTC 2024
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.10.0-197.0.0.110.ile2312sp1.x86_64 root=UUID=17bb1f2f-3fb1-49de-b9b2-747a32892161 ro cgroup_disable=files apparmor=0 crashkernel=512M
[ 0.000000] signal: max sigframe size: 944
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffd8fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bffd9000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000043fffffff] usable
...
[ 503.309075] capability: warning: `yum' uses 32-bit capabilities (legacy support in use)
[228492.577924] sysrq: Trigger a crash
[228492.578847] Kernel panic - not syncing: sysrq triggered crash
[228492.579793] CPU: 1 PID: 7961 Comm: bash Kdump: loaded Not tainted 5.10.0-197.0.0.110.ile2312sp1.x86_64 #1
[228492.581162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[228492.582513] Call Trace:
[228492.583225] dump_stack+0x57/0x6e
[228492.583939] panic+0x10e/0x2ef
[228492.584615] ? printk+0x58/0x73
[228492.585332] sysrq_handle_crash+0x16/0x20
[228492.586137] __handle_sysrq.cold+0x43/0x11a
[228492.586561] write_sysrq_trigger+0x34/0x60
[228492.586983] proc_reg_write+0x40/0x90
[228492.587384] vfs_write+0xde/0x250
[228492.587764] ksys_write+0x5f/0xe0
[228492.588151] do_syscall_64+0x40/0x80
[228492.588554] entry_SYSCALL_64_after_hwframe+0x62/0xc7
[228492.589013] RIP: 0033:0x7f72f9878c67
[228492.589401] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[228492.590702] RSP: 002b:00007fffc84d3cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[228492.591383] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f72f9878c67
[228492.592025] RDX: 0000000000000002 RSI: 0000559b08fc7870 RDI: 0000000000000001
[228492.592673] RBP: 0000559b08fc7870 R08: 00007f72f992c380 R09: 00007f72f992c400
[228492.593313] R10: 00007f72f992c300 R11: 0000000000000246 R12: 0000000000000002
[228492.593967] R13: 00007f72f996e5a0 R14: 0000000000000002 R15: 00007f72f996e7a0
[228492.597097] kexec: Bye!
crash>
dmesg command
crash> dmesg
[ 0.000000] Linux version 5.10.0-197.0.0.110.ile2312sp1.x86_64 (abuild@obsworker208) (gcc_old (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Apr 30 10:18:42 UTC 2024
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.10.0-197.0.0.110.ile2312sp1.x86_64 root=UUID=17bb1f2f-3fb1-49de-b9b2-747a32892161 ro cgroup_disable=files apparmor=0 crashkernel=512M
[ 0.000000] signal: max sigframe size: 944
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffd8fff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000bffd9000-0x00000000bfffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000043fffffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] SMBIOS 2.8 present.
[ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: cpu 0, msr 409401001, primary cpu clock
[ 0.000000] kvm-clock: using sched offset of 475504025149 cycles
[ 0.000008] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000016] tsc: Detected 2194.908 MHz processor
[ 0.001274] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[ 0.001278] e820: remove [mem 0x000a0000-0x000fffff] usable
[ 0.001283] last_pfn = 0x440000 max_arch_pfn = 0x400000000
[ 0.001322] MTRR default type: write-back
[ 0.001324] MTRR fixed ranges enabled:
[ 0.001325] 00000-9FFFF write-back
[ 0.001326] A0000-BFFFF uncachable
[ 0.001327] C0000-FFFFF write-protect
[ 0.001328] MTRR variable ranges enabled:
...
[ 503.309075] capability: warning: `yum' uses 32-bit capabilities (legacy support in use)
[228492.577924] sysrq: Trigger a crash
[228492.578847] Kernel panic - not syncing: sysrq triggered crash
[228492.579793] CPU: 1 PID: 7961 Comm: bash Kdump: loaded Not tainted 5.10.0-197.0.0.110.ile2312sp1.x86_64 #1
[228492.581162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[228492.582513] Call Trace:
[228492.583225] dump_stack+0x57/0x6e
[228492.583939] panic+0x10e/0x2ef
[228492.584615] ? printk+0x58/0x73
[228492.585332] sysrq_handle_crash+0x16/0x20
[228492.586137] __handle_sysrq.cold+0x43/0x11a
[228492.586561] write_sysrq_trigger+0x34/0x60
[228492.586983] proc_reg_write+0x40/0x90
[228492.587384] vfs_write+0xde/0x250
[228492.587764] ksys_write+0x5f/0xe0
[228492.588151] do_syscall_64+0x40/0x80
[228492.588554] entry_SYSCALL_64_after_hwframe+0x62/0xc7
[228492.589013] RIP: 0033:0x7f72f9878c67
[228492.589401] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[228492.590702] RSP: 002b:00007fffc84d3cb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[228492.591383] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f72f9878c67
[228492.592025] RDX: 0000000000000002 RSI: 0000559b08fc7870 RDI: 0000000000000001
[228492.592673] RBP: 0000559b08fc7870 R08: 00007f72f992c380 R09: 00007f72f992c400
[228492.593313] R10: 00007f72f992c300 R11: 0000000000000246 R12: 0000000000000002
[228492.593967] R13: 00007f72f996e5a0 R14: 0000000000000002 R15: 00007f72f996e7a0
[228492.597097] kexec: Bye!
crash>
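The kernel log is long, and the interesting part usually starts at the panic message. A minimal sketch, assuming the log has first been saved to a file (at the crash prompt, output can be redirected, for example log > /tmp/kernel.log); panic_section is a hypothetical helper name and the file path is only an example.

```shell
#!/bin/sh
# panic_section FILE
# Hypothetical helper: print a saved kernel log from the first
# "Kernel panic" line through the end of the file.
panic_section() {
    # p becomes 1 at the first matching line; the bare pattern "p"
    # then prints that line and every line after it.
    awk '/Kernel panic/ { p = 1 } p' "$1"
}

# Possible usage:
#   panic_section /tmp/kernel.log
```

For the log shown above, this would print everything from "Kernel panic - not syncing: sysrq triggered crash" down to "kexec: Bye!".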
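bt command
bt prints the call stack (backtrace). With no arguments, it unwinds the stack of the current context using the stack pointer (SP) and frame pointer (FP).
When the log alone is not enough to identify the problem, use bt to inspect the kernel call stack and investigate further.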
crash> bt
PID: 7961 TASK: ffff9c36410bb400 CPU: 1 COMMAND: "bash"
#0 [ffffbfbac6b3bde0] panic at ffffffff9089b0f2
#1 [ffffbfbac6b3be60] sysrq_handle_crash at ffffffff904fad86
#2 [ffffbfbac6b3be68] __handle_sysrq.cold at ffffffff908c13b8
#3 [ffffbfbac6b3be98] write_sysrq_trigger at ffffffff904fb6a4
#4 [ffffbfbac6b3beb0] proc_reg_write at ffffffff9022c900
#5 [ffffbfbac6b3bec8] vfs_write at ffffffff9019946e
#6 [ffffbfbac6b3bf00] ksys_write at ffffffff901998cf
#7 [ffffbfbac6b3bf38] do_syscall_64 at ffffffff908e0750
#8 [ffffbfbac6b3bf50] entry_SYSCALL_64_after_hwframe at ffffffff90a000da
RIP: 00007f72f9878c67 RSP: 00007fffc84d3cb8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f72f9878c67
RDX: 0000000000000002 RSI: 0000559b08fc7870 RDI: 0000000000000001
RBP: 0000559b08fc7870 R8: 00007f72f992c380 R9: 00007f72f992c400
R10: 00007f72f992c300 R11: 0000000000000246 R12: 0000000000000002
R13: 00007f72f996e5a0 R14: 0000000000000002 R15: 00007f72f996e7a0
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash>
bt -T: shows all symbols found on a task's stack, from just above thread_info through the rest of the stack. It usually prints more than bt without options.
crash> bt -T
PID: 7961 TASK: ffff9c36410bb400 CPU: 1 COMMAND: "bash"
[ffffbfbac6b3b480] __update_blocked_fair at ffffffff8ff2977d
[ffffbfbac6b3b4f0] raw_spin_rq_unlock at ffffffff8ff1e7ea
[ffffbfbac6b3b530] update_nohz_stats at ffffffff8ff2cba0
[ffffbfbac6b3b538] cpumask_next_and at ffffffff903c773a
[ffffbfbac6b3b540] find_busiest_group at ffffffff8ff3d1c2
[ffffbfbac6b3b670] can_migrate_task at ffffffff8ff39855
[ffffbfbac6b3b690] detach_tasks at ffffffff8ff39e4f
[ffffbfbac6b3b6e0] load_balance at ffffffff8ff3dc66
[ffffbfbac6b3b788] __update_load_avg_cfs_rq at ffffffff8ff4f19c
[ffffbfbac6b3b798] __update_load_avg_se at ffffffff8ff4ee88
[ffffbfbac6b3b7a0] update_curr at ffffffff8ff30c0e
[ffffbfbac6b3b7e8] set_next_entity at ffffffff8ff2e453
[ffffbfbac6b3b818] pick_next_task_fair at ffffffff8ff3eaf9
[ffffbfbac6b3b890] vsnprintf at ffffffff903d68cc
[ffffbfbac6b3b8a8] number at ffffffff903d1c4f
[ffffbfbac6b3b8f0] widen_string at ffffffff903d24fb
[ffffbfbac6b3b910] vsnprintf at ffffffff903d690e
[ffffbfbac6b3b928] number at ffffffff903d1c4f
[ffffbfbac6b3b970] widen_string at ffffffff903d24fb
[ffffbfbac6b3b980] number at ffffffff903d1c4f
[ffffbfbac6b3b9c8] widen_string at ffffffff903d24fb
[ffffbfbac6b3b9e8] vsnprintf at ffffffff903d690e
[ffffbfbac6b3ba40] vgacon_scroll at ffffffff9042d9cf
[ffffbfbac6b3ba68] desc_read_finalized_seq at ffffffff8ff6971f
[ffffbfbac6b3ba70] con_scroll at ffffffff9050595a
[ffffbfbac6b3ba90] prb_read at ffffffff8ff697f0
[ffffbfbac6b3baa8] kvm_io_delay at ffffffff8fe751a0
[ffffbfbac6b3bab0] atomic_notifier_call_chain at ffffffff8ff15257
[ffffbfbac6b3bb08] _prb_read_valid at ffffffff8ff699dd
[ffffbfbac6b3bb60] prb_read_valid at ffffffff8ff6a6d7
[ffffbfbac6b3bc28] vprintk_emit at ffffffff8ff689a8
[ffffbfbac6b3bc70] printk at ffffffff908a0b78
[ffffbfbac6b3bcd0] machine_kexec.cold at ffffffff90897c76
[ffffbfbac6b3bd20] __crash_kexec at ffffffff8ffb409a
[ffffbfbac6b3bda8] __crash_kexec at ffffffff8ffb40c8
[ffffbfbac6b3bde0] panic at ffffffff9089b0f2
[ffffbfbac6b3be08] printk at ffffffff908a0b78
[ffffbfbac6b3be60] sysrq_handle_crash at ffffffff904fad86
[ffffbfbac6b3be68] __handle_sysrq.cold at ffffffff908c13b8
[ffffbfbac6b3be98] write_sysrq_trigger at ffffffff904fb6a4
[ffffbfbac6b3beb0] proc_reg_write at ffffffff9022c900
[ffffbfbac6b3bec8] vfs_write at ffffffff9019946e
[ffffbfbac6b3bf00] ksys_write at ffffffff901998cf
[ffffbfbac6b3bf38] do_syscall_64 at ffffffff908e0750
[ffffbfbac6b3bf50] entry_SYSCALL_64_after_hwframe at ffffffff90a000da
RIP: 00007f72f9878c67 RSP: 00007fffc84d3cb8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f72f9878c67
RDX: 0000000000000002 RSI: 0000559b08fc7870 RDI: 0000000000000001
RBP: 0000559b08fc7870 R8: 00007f72f992c380 R9: 00007f72f992c400
R10: 00007f72f992c300 R11: 0000000000000246 R12: 0000000000000002
R13: 00007f72f996e5a0 R14: 0000000000000002 R15: 00007f72f996e7a0
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash>
bt -a: shows the stack traces of all active tasks.
crash> bt -a
PID: 0 TASK: ffffffff91812940 CPU: 0 COMMAND: "swapper/0"
#0 [fffffe1743122e50] crash_nmi_callback at ffffffff8fe6011b
#1 [fffffe1743122e58] nmi_handle at ffffffff8fe2b408
#2 [fffffe1743122ea0] default_do_nmi at ffffffff908e1e22
#3 [fffffe1743122ec8] exc_nmi at ffffffff908e2042
#4 [fffffe1743122ef0] end_repeat_nmi at ffffffff90a01549
[exception RIP: default_idle+19]
RIP: ffffffff908f12f3 RSP: ffffffff91803ec0 RFLAGS: 00000246
RAX: ffffffff908f12e0 RBX: ffffffff91812940 RCX: ffff9c396f636c80
RDX: 00000000003f3e9a RSI: 0000000000000000 RDI: ffff9c396f627820
RBP: 0000000000000000 R8: 000000cd42e4dffb R9: 0000000000000001
R10: 0000000000000001 R11: 00000000000123ed R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ffffffff91803ec0] default_idle at ffffffff908f12f3
#6 [ffffffff91803ec0] default_idle_call at ffffffff908f1544
#7 [ffffffff91803ec8] cpuidle_idle_call at ffffffff8ff28345
#8 [ffffffff91803f00] do_idle at ffffffff8ff283f2
#9 [ffffffff91803f18] cpu_startup_entry at ffffffff8ff285c9
#10 [ffffffff91803f28] start_kernel at ffffffff9226c856
#11 [ffffffff91803f50] secondary_startup_64_no_verify at ffffffff8fe00107
PID: 7961 TASK: ffff9c36410bb400 CPU: 1 COMMAND: "bash"
#0 [ffffbfbac6b3bde0] panic at ffffffff9089b0f2
#1 [ffffbfbac6b3be60] sysrq_handle_crash at ffffffff904fad86
#2 [ffffbfbac6b3be68] __handle_sysrq.cold at ffffffff908c13b8
#3 [ffffbfbac6b3be98] write_sysrq_trigger at ffffffff904fb6a4
#4 [ffffbfbac6b3beb0] proc_reg_write at ffffffff9022c900
#5 [ffffbfbac6b3bec8] vfs_write at ffffffff9019946e
#6 [ffffbfbac6b3bf00] ksys_write at ffffffff901998cf
#7 [ffffbfbac6b3bf38] do_syscall_64 at ffffffff908e0750
#8 [ffffbfbac6b3bf50] entry_SYSCALL_64_after_hwframe at ffffffff90a000da
RIP: 00007f72f9878c67 RSP: 00007fffc84d3cb8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f72f9878c67
RDX: 0000000000000002 RSI: 0000559b08fc7870 RDI: 0000000000000001
RBP: 0000559b08fc7870 R8: 00007f72f992c380 R9: 00007f72f992c400
R10: 00007f72f992c300 R11: 0000000000000246 R12: 0000000000000002
R13: 00007f72f996e5a0 R14: 0000000000000002 R15: 00007f72f996e7a0
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
...
PID: 0 TASK: ffff9c3640340000 CPU: 15 COMMAND: "swapper/15"
#0 [fffffe3f47bcfe50] crash_nmi_callback at ffffffff8fe6011b
#1 [fffffe3f47bcfe58] nmi_handle at ffffffff8fe2b408
#2 [fffffe3f47bcfea0] default_do_nmi at ffffffff908e1e22
#3 [fffffe3f47bcfec8] exc_nmi at ffffffff908e2042
#4 [fffffe3f47bcfef0] end_repeat_nmi at ffffffff90a01549
[exception RIP: default_idle+19]
RIP: ffffffff908f12f3 RSP: ffffbfbac00ebee8 RFLAGS: 00000242
RAX: ffffffff908f12e0 RBX: ffff9c3640340000 RCX: ffff9c396fdb6c80
RDX: 000000000037b5da RSI: 0000000000000083 RDI: 000000000000000f
RBP: 0000000000000000 R8: 0000d03ebd5eae08 R9: 0000000000000001
R10: 0000000000000001 R11: 0000000000002800 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ffffbfbac00ebee8] default_idle at ffffffff908f12f3
#6 [ffffbfbac00ebee8] default_idle_call at ffffffff908f1544
#7 [ffffbfbac00ebef0] cpuidle_idle_call at ffffffff8ff28345
#8 [ffffbfbac00ebf28] do_idle at ffffffff8ff283f2
#9 [ffffbfbac00ebf40] cpu_startup_entry at ffffffff8ff285c9
#10 [ffffbfbac00ebf50] secondary_startup_64_no_verify at ffffffff8fe00107
crash>
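ps command
ps shows the state of the processes in the system, much like the ps command on a running system. A leading > marks a task that was active on a CPU at the time of the crash.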
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
> 0 0 0 ffffffff91812940 RU 0.0 0 0 [swapper/0]
0 0 1 ffff9c36402e0000 RU 0.0 0 0 [swapper/1]
> 0 0 2 ffff9c36402e4e00 RU 0.0 0 0 [swapper/2]
> 0 0 3 ffff9c36402e3400 RU 0.0 0 0 [swapper/3]
> 0 0 4 ffff9c3640311a00 RU 0.0 0 0 [swapper/4]
> 0 0 5 ffff9c3640310000 RU 0.0 0 0 [swapper/5]
> 0 0 6 ffff9c3640314e00 RU 0.0 0 0 [swapper/6]
> 0 0 7 ffff9c3640313400 RU 0.0 0 0 [swapper/7]
> 0 0 8 ffff9c3640320000 RU 0.0 0 0 [swapper/8]
> 0 0 9 ffff9c3640324e00 RU 0.0 0 0 [swapper/9]
> 0 0 10 ffff9c3640323400 RU 0.0 0 0 [swapper/10]
> 0 0 11 ffff9c3640321a00 RU 0.0 0 0 [swapper/11]
> 0 0 12 ffff9c3640344e00 RU 0.0 0 0 [swapper/12]
> 0 0 13 ffff9c3640343400 RU 0.0 0 0 [swapper/13]
> 0 0 14 ffff9c3640341a00 RU 0.0 0 0 [swapper/14]
> 0 0 15 ffff9c3640340000 RU 0.0 0 0 [swapper/15]
1 0 4 ffff9c364028ce00 IN 0.1 170388 19044 systemd
2 0 12 ffff9c364028b400 IN 0.0 0 0 [kthreadd]
3 2 0 ffff9c3640289a00 ID 0.0 0 0 [rcu_gp]
4 2 0 ffff9c3640288000 ID 0.0 0 0 [rcu_par_gp]
6 2 0 ffff9c36402bb400 ID 0.0 0 0 [kworker/0:0H]
8 2 0 ffff9c36402b8000 ID 0.0 0 0 [mm_percpu_wq]
9 2 0 ffff9c36402cb400 IN 0.0 0 0 [rcu_tasks_rude_]
10 2 0 ffff9c36402c9a00 IN 0.0 0 0 [rcu_tasks_trace]
11 2 0 ffff9c36402c8000 IN 0.0 0 0 [ksoftirqd/0]
12 2 5 ffff9c36402cce00 ID 0.0 0 0 [rcu_sched]
13 2 0 ffff9c36402e1a00 IN 0.0 0 0 [migration/0]
14 2 0 ffff9c3640369a00 IN 0.0 0 0 [cpuhp/0]
15 2 1 ffff9c3640368000 IN 0.0 0 0 [cpuhp/1]
16 2 1 ffff9c364036ce00 IN 0.0 0 0 [migration/1]
17 2 1 ffff9c364036b400 IN 0.0 0 0 [ksoftirqd/1]
19 2 1 ffff9c3640383400 ID 0.0 0 0 [kworker/1:0H]
20 2 2 ffff9c3640381a00 IN 0.0 0 0 [cpuhp/2]
21 2 2 ffff9c3640380000 IN 0.0 0 0 [migration/2]
22 2 2 ffff9c36403ab400 IN 0.0 0 0 [ksoftirqd/2]
24 2 2 ffff9c36403a8000 ID 0.0 0 0 [kworker/2:0H]
25 2 3 ffff9c36403ace00 IN 0.0 0 0 [cpuhp/3]
26 2 3 ffff9c36403d0000 IN 0.0 0 0 [migration/3]
27 2 3 ffff9c36403d4e00 IN 0.0 0 0 [ksoftirqd/3]
29 2 3 ffff9c36403d1a00 ID 0.0 0 0 [kworker/3:0H]
30 2 4 ffff9c36403f3400 IN 0.0 0 0 [cpuhp/4]
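With hundreds of tasks, a per-state summary is often a quicker first look than the full listing. The sketch below assumes the ps output has been saved to a file (for example with ps > /tmp/ps.txt at the crash prompt) and that the columns are laid out as shown above, with the state in the fifth column; ps_state_counts is a hypothetical helper name.

```shell
#!/bin/sh
# ps_state_counts FILE
# Hypothetical helper: count tasks per state (the ST column) in saved
# crash "ps" output and print "STATE COUNT" lines sorted by state.
ps_state_counts() {
    awk '
        { sub(/^>/, "") }              # drop the active-task marker
        $1 ~ /^[0-9]+$/ { n[$5]++ }    # data rows start with a numeric PID
        END { for (s in n) print s, n[s] }
    ' "$1" | sort
}

# Possible usage:
#   ps_state_counts /tmp/ps.txt
```

A large number of tasks in the UN (uninterruptible) state, for instance, is a common hint that processes were blocked on I/O or on a lock.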
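dis command
Disassembling with dis
dis <address> disassembles the code at the given address; the -l option also shows source line information.
First use bt to find the addresses in the call chain, then run dis on the address of interest.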
crash> bt
PID: 7961 TASK: ffff9c36410bb400 CPU: 1 COMMAND: "bash"
#0 [ffffbfbac6b3bde0] panic at ffffffff9089b0f2
#1 [ffffbfbac6b3be60] sysrq_handle_crash at ffffffff904fad86
#2 [ffffbfbac6b3be68] __handle_sysrq.cold at ffffffff908c13b8
#3 [ffffbfbac6b3be98] write_sysrq_trigger at ffffffff904fb6a4
#4 [ffffbfbac6b3beb0] proc_reg_write at ffffffff9022c900
#5 [ffffbfbac6b3bec8] vfs_write at ffffffff9019946e
#6 [ffffbfbac6b3bf00] ksys_write at ffffffff901998cf
#7 [ffffbfbac6b3bf38] do_syscall_64 at ffffffff908e0750
#8 [ffffbfbac6b3bf50] entry_SYSCALL_64_after_hwframe at ffffffff90a000da
RIP: 00007f72f9878c67 RSP: 00007fffc84d3cb8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f72f9878c67
RDX: 0000000000000002 RSI: 0000559b08fc7870 RDI: 0000000000000001
RBP: 0000559b08fc7870 R8: 00007f72f992c380 R9: 00007f72f992c400
R10: 00007f72f992c300 R11: 0000000000000246 R12: 0000000000000002
R13: 00007f72f996e5a0 R14: 0000000000000002 R15: 00007f72f996e7a0
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash> dis ffffffff908e0750
0xffffffff908e0750 <do_syscall_64+64>: mov %rax,0x50(%r12)
crash>
mount command
mount lists the file systems that were mounted at the time of the crash.
crash> mount
MOUNT SUPERBLK TYPE DEVNAME DIRNAME
ffff9c36401f1180 ffff9c364004a800 rootfs rootfs /
ffff9c3948ee4140 ffff9c3645c9b800 proc proc /proc
ffff9c3948ee43c0 ffff9c3645c99000 sysfs sysfs /sys
ffff9c3948ee1900 ffff9c396f64e000 devtmpfs devtmpfs /dev
ffff9c3948ee0280 ffff9c396f64e800 securityfs securityfs /sys/kernel/security
ffff9c3949405400 ffff9c3645c9f800 tmpfs tmpfs /dev/shm
ffff9c3949404c80 ffff9c3645c98000 devpts devpts /dev/pts
ffff9c3949404b40 ffff9c3645c99800 tmpfs tmpfs /run
ffff9c3949406b40 ffff9c3645c98800 tmpfs tmpfs /sys/fs/cgroup
ffff9c3949406280 ffff9c3645c9d000 cgroup cgroup /sys/fs/cgroup/systemd
ffff9c3949407e00 ffff9c3645c9f000 bpf none /sys/fs/bpf
ffff9c3949412280 ffff9c3641002800 cgroup cgroup /sys/fs/cgroup/pids
ffff9c39494143c0 ffff9c3641006800 cgroup cgroup /sys/fs/cgroup/net_cls,net_prio
ffff9c3949414b40 ffff9c3641006000 cgroup cgroup /sys/fs/cgroup/perf_event
ffff9c3949415180 ffff9c3641004000 cgroup cgroup /sys/fs/cgroup/cpuset
ffff9c3949414500 ffff9c3641005800 cgroup cgroup /sys/fs/cgroup/hugetlb
ffff9c3949416c80 ffff9c3641002000 cgroup cgroup /sys/fs/cgroup/freezer
ffff9c3949417900 ffff9c3641007000 cgroup cgroup /sys/fs/cgroup/cpu,cpuacct
ffff9c3949416280 ffff9c3641005000 cgroup cgroup /sys/fs/cgroup/rdma
ffff9c39497e1900 ffff9c3641000800 cgroup cgroup /sys/fs/cgroup/devices
ffff9c39497e1cc0 ffff9c3641001800 cgroup cgroup /sys/fs/cgroup/blkio
ffff9c39497e1a40 ffff9c3641003000 cgroup cgroup /sys/fs/cgroup/memory
ffff9c3645c42780 ffff9c3645cf7000 xfs /dev/vda1 /
ffff9c3948e2ef00 ffff9c364226c000 selinuxfs selinuxfs /sys/fs/selinux
ffff9c3641092140 ffff9c3645ef6000 autofs systemd-1 /proc/sys/fs/binfmt_misc
ffff9c3949469900 ffff9c394778b800 hugetlbfs hugetlbfs /dev/hugepages
ffff9c3641321400 ffff9c364226d800 mqueue mqueue /dev/mqueue
ffff9c3948c80000 ffff9c396f64a800 debugfs debugfs /sys/kernel/debug
ffff9c3648648a00 ffff9c3640f3d000 tracefs tracefs /sys/kernel/tracing
ffff9c3948d4bcc0 ffff9c3948cc0800 tmpfs tmpfs /tmp
ffff9c3948e197c0 ffff9c3648430000 configfs configfs /sys/kernel/config
ffff9c396f66ba40 ffff9c3645cf5000 fusectl fusectl /sys/fs/fuse/connections
ffff9c3949413a40 ffff9c36438bd800 xfs /dev/vda2 /boot
crash>
net command
net shows network-related information, such as the network devices and their IP addresses.
crash> net
NET_DEVICE NAME IP ADDRESS(ES)
ffff9c3641554000 lo 127.0.0.1
ffff9c3645863000 ens3 192.168.xxx.xxx
crash>
Exiting crash
exit command
crash> exit
help
The commands above are the most commonly used ones. To learn more, run the help command inside crash:
crash> help
* files mod sbitmapq union
alias foreach mount search vm
ascii fuser net set vtop
bpf gdb p sig waitq
bt help ps struct whatis
btop ipcs pte swap wr
dev irq ptob sym q
dis kmem ptov sys
eval list rd task
exit log repeat timer
extend mach runq tree
crash version: 8.0.2-1.ile2312sp1 gdb version: 10.2
For help on any command above, enter "help <command>".
For help on input options, enter "help input".
For help on output options, enter "help output".
crash>
Use help <command> to get help on a specific command:
crash> help ps
NAME
ps - display process status information
SYNOPSIS
ps [-k|-u|-G|-y policy] [-s] [-p|-c|-t|-[l|m][-C cpu]|-a|-g|-r|-S|-A]
[pid | task | command] ...
DESCRIPTION
This command displays process status for selected, or all, processes
in the system. If no arguments are entered, the process data is
is displayed for all processes. Specific processes may be selected
by using the following identifier formats:
pid a process PID.
task a hexadecimal task_struct pointer.
command a command name. If a command name is made up of letters that
are all numerical values, precede the name string with a "\".
If the command string is enclosed within "'" characters, then
the encompassed string must be a POSIX extended regular expression
that will be used to match task names.
The process list may be further restricted by the following options:
-k restrict the output to only kernel threads.
-u restrict the output to only user tasks.
-G display only the thread group leader in a thread group.
-y policy restrict the output to tasks having a specified scheduling policy
expressed by its integer value or by its (case-insensitive) name;
multiple policies may be entered in a comma-separated list:
0 or NORMAL
1 or FIFO
2 or RR
3 or BATCH
4 or ISO
5 or IDLE
6 or DEADLINE
The process identifier types may be mixed. For each task, the following
items are displayed:
1. the process PID.
2. the parent process PID.
3. the CPU number that the task ran on last.
4. the task_struct address or the kernel stack pointer of the process.
(see -s option below)
5. the task state (RU, IN, UN, ZO, ST, TR, DE, SW, WA, PA, ID, NE).
6. the percentage of physical memory being used by this task.
7. the virtual address size of this task in kilobytes.
8. the resident set size of this task in kilobytes.
9. the command name.
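Summary
This article gave a brief introduction to troubleshooting with crash on the Inspur InLinux operating system. Readers can explore the tool in more depth according to their own needs.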