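1. Examine the function call stack and read the kernel source to establish what each function on it does.
2. Examine the arguments passed to the faulting function (the exception RIP) at the time of the crash and check whether they are valid.
3. Recover the full cmdline of the process that triggered the crash and use it to infer what the process was doing.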
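I. Reading the command-line arguments of the crashed process
Trigger a panic with echo c > /proc/sysrq-trigger to generate a vmcore. If no vmcore is produced, the kdump tooling is probably not installed or configured correctly.
# Get the name of the process that panicked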
KERNEL: /usr/lib/debug/boot/vmlinux-5.15.0-52-generic
DUMPFILE: /var/crash/202211042234/vmcore.202211042234
CPUS: 4
DATE: Fri Nov 4 22:33:14 CST 2022
UPTIME: 00:59:42
LOAD AVERAGE: 0.24, 0.06, 0.02
TASKS: 487
NODENAME: curtis-Aspire-E5-471G
RELEASE: 5.15.0-52-generic
VERSION: #58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022
MACHINE: x86_64 (2394 Mhz)
MEMORY: 7.9 GB
PANIC: "Kernel panic - not syncing: sysrq triggered crash"
PID: 4634
COMMAND: "bash"
TASK: ffff9ed18589b200 [THREAD_INFO: ffff9ed18589b200]
CPU: 2
STATE: TASK_RUNNING (PANIC)
crash> task_struct.comm ffff9ed18589b200
comm = "bash\000gnome-term"
# Get the address of the process's mm_struct
crash> task_struct.mm ffff9ed18589b200
mm = 0xffff9ed1d7bb9dc0
# Get the start and end addresses of the area that holds the command-line arguments
crash> mm_struct.arg_start,arg_end 0xffff9ed1d7bb9dc0
arg_start = 140726798788586 -> 0x7FFD82DA27EA
arg_end = 140726798788596 -> 0x7FFD82DA27F4
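The decimal values printed by crash can be converted to hexadecimal and subtracted to get the size of the argument area. A minimal sketch using only the two numbers above; the comparison with "/bin/bash" is an assumption based on the process name, not something crash reports at this step:

```python
# Values copied from the mm_struct output above.
arg_start = 140726798788586
arg_end = 140726798788596

# Size of the argument area in bytes.
arg_len = arg_end - arg_start

print(hex(arg_start))  # 0x7ffd82da27ea
print(hex(arg_end))    # 0x7ffd82da27f4
print(arg_len)         # 10

# Assumption: the command line is "/bin/bash" plus its terminating NUL,
# which would account for exactly these 10 bytes.
assumed_cmdline = b"/bin/bash\0"
print(len(assumed_cmdline) == arg_len)  # True
```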
# Translate the virtual addresses to physical addresses
crash> vtop 0x7FFD82DA27EA
VIRTUAL PHYSICAL
7ffd82da27ea 1c48df7ea
crash> vtop 0x7FFD82DA27F4
VIRTUAL PHYSICAL
7ffd82da27f4 1c48df7f4
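A quick sanity check on the vtop results: address translation replaces the page frame but keeps the offset within the page, so the low 12 bits of each virtual and physical address should match. A small sketch, assuming 4 KiB pages (the usual x86_64 base page size); the helper name is illustrative:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages

def page_offset(addr):
    """Return the offset of addr inside its page."""
    return addr & (PAGE_SIZE - 1)

# (virtual, physical) pairs copied from the vtop output above.
pairs = [
    (0x7ffd82da27ea, 0x1c48df7ea),
    (0x7ffd82da27f4, 0x1c48df7f4),
]

for virt, phys in pairs:
    same = page_offset(virt) == page_offset(phys)
    print(hex(virt), "->", hex(phys), "offset", hex(page_offset(virt)), same)

# Both addresses fall in the same physical page frame.
frames = {phys // PAGE_SIZE for _, phys in pairs}
print([hex(f) for f in frames])  # ['0x1c48df']
```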
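# Read the data from physical memory
# Why does the crash triggered by echo c > /proc/sysrq-trigger show only /bin/bas?
# Most likely because rd prints whole 64-bit words by default and the range from start to end is only 10 bytes,
# so only the first full 8-byte word is displayed and the trailing "h" and NUL are left out.
# -p address argument is a physical address
# -e addr display memory until reaching specified ending hexadecimal address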
crash> rd -p 1c48df7ea -e 1c48df7f4
1c48df7ea: 7361622f6e69622f /bin/bas
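The hexadecimal word printed by rd is a little-endian 64-bit value, so its bytes have to be reversed to read the string. A minimal sketch that decodes the word and counts how many full 8-byte words fit between the two physical addresses; the variable names are illustrative:

```python
import struct

# 64-bit word copied from the rd output above.
word = 0x7361622f6e69622f

# x86_64 is little-endian: the lowest byte of the word is the first byte in memory.
raw = struct.pack("<Q", word)
text = raw.decode("ascii")
print(text)  # /bin/bas

# Range passed to rd: start address and the -e end address.
start, end = 0x1c48df7ea, 0x1c48df7f4
full_words, leftover = divmod(end - start, 8)
print(full_words, leftover)  # 1 2
```

Only one complete word fits in the 10-byte range, which matches the single line of output; the two leftover bytes would hold the final "h" and the terminating NUL if the command line is /bin/bash.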
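II. Viewing the call stack with bt
# bt -t displays every text symbol found on the stack, from the last known frame to the top of the stack, so stale entries can appear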
crash> bt -t
PID: 4634 TASK: ffff9ed18589b200 CPU: 2 COMMAND: "bash"
START: machine_kexec at ffffffff93a8aea0
[ffffb850c1a5bc38] machine_kexec at ffffffff93a8aea0
[ffffb850c1a5bc98] __crash_kexec at ffffffff93b98732
[ffffb850c1a5bd20] __crash_kexec at ffffffff93b98761
[ffffb850c1a5bd68] panic at ffffffff946be4d5
[ffffb850c1a5bde8] sysrq_handle_crash at ffffffff941def3a
[ffffb850c1a5bdf8] __handle_sysrq.cold at ffffffff94713280
[ffffb850c1a5be40] write_sysrq_trigger at ffffffff941dfa28
[ffffb850c1a5be58] proc_reg_write at ffffffff93e381ba
[ffffb850c1a5be78] vfs_write at ffffffff93d81899
[ffffb850c1a5beb0] ksys_write at ffffffff93d83c07
[ffffb850c1a5bef0] __x64_sys_write at ffffffff93d83caa
[ffffb850c1a5bf00] do_syscall_64 at ffffffff9475e27c
[ffffb850c1a5bf10] syscall_exit_to_user_mode at ffffffff94762977
[ffffb850c1a5bf18] __x64_sys_openat at ffffffff93d80440
[ffffb850c1a5bf28] do_syscall_64 at ffffffff9475e289
[ffffb850c1a5bf50] entry_SYSCALL_64_after_hwframe at ffffffff94800099
RIP: 00007ff530f4e0a7 RSP: 00007ffd82da0758 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff530f4e0a7
RDX: 0000000000000002 RSI: 000055eb20053fc0 RDI: 0000000000000001
RBP: 000055eb20053fc0 R8: 000000000000000a R9: 0000000000000001
R10: 000055eb1ff8c017 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ff53102d6a0 R14: 00007ff5310294a0 R15: 00007ff5310288a0
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
# bt -l shows the source file and line number for each function in the call stack
crash> bt -l
PID: 4634 TASK: ffff9ed18589b200 CPU: 2 COMMAND: "bash"
#0 [ffffb850c1a5bc38] machine_kexec at ffffffff93a8aea0
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/arch/x86/kernel/machine_kexec_64.c: 357
#1 [ffffb850c1a5bc98] __crash_kexec at ffffffff93b98732
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/kernel/kexec_core.c: 964
#2 [ffffb850c1a5bd68] panic at ffffffff946be4d5
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/arch/x86/include/asm/smp.h: 62
#3 [ffffb850c1a5bde8] sysrq_handle_crash at ffffffff941def3a
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/drivers/tty/sysrq.c: 155
#4 [ffffb850c1a5bdf8] __handle_sysrq.cold at ffffffff94713280
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/include/linux/rcupdate.h: 719
#5 [ffffb850c1a5be40] write_sysrq_trigger at ffffffff941dfa28
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/drivers/tty/sysrq.c: 1166
#6 [ffffb850c1a5be58] proc_reg_write at ffffffff93e381ba
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/fs/proc/inode.c: 335
#7 [ffffb850c1a5be78] vfs_write at ffffffff93d81899
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/fs/read_write.c: 593
#8 [ffffb850c1a5beb0] ksys_write at ffffffff93d83c07
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/fs/read_write.c: 649
#9 [ffffb850c1a5bef0] __x64_sys_write at ffffffff93d83caa
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/fs/read_write.c: 658
#10 [ffffb850c1a5bf00] do_syscall_64 at ffffffff9475e27c
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/arch/x86/entry/common.c: 50
#11 [ffffb850c1a5bf18] __x64_sys_openat at ffffffff93d80440
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/fs/open.c: 1242
#12 [ffffb850c1a5bf28] do_syscall_64 at ffffffff9475e289
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/arch/x86/entry/common.c: 86
#13 [ffffb850c1a5bf50] entry_SYSCALL_64_after_hwframe at ffffffff94800099
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/arch/x86/entry/entry_64.S: 118
RIP: 00007ff530f4e0a7 RSP: 00007ffd82da0758 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff530f4e0a7
RDX: 0000000000000002 RSI: 000055eb20053fc0 RDI: 0000000000000001
RBP: 000055eb20053fc0 R8: 000000000000000a R9: 0000000000000001
R10: 000055eb1ff8c017 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ff53102d6a0 R14: 00007ff5310294a0 R15: 00007ff5310288a0
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
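The user-mode register dump at the bottom of the backtrace records which system call was in flight. A minimal sketch, assuming the x86_64 syscall convention (syscall number in ORIG_RAX, first three arguments in RDI, RSI, RDX) and a hand-written excerpt of the syscall table; the dictionary and function names are illustrative only:

```python
# Register values copied from the user-mode frame in the bt output above.
user_regs = {
    "ORIG_RAX": 0x0000000000000001,
    "RDI": 0x0000000000000001,
    "RSI": 0x000055EB20053FC0,
    "RDX": 0x0000000000000002,
}

# Hand-written excerpt of the x86_64 syscall table (not exhaustive).
SYSCALL_NAMES = {0: "read", 1: "write", 2: "open", 3: "close", 257: "openat"}

def describe_syscall(regs):
    """Format the call as name(fd, buf, count).

    The argument labels fit read/write; other syscalls use the same
    registers with different meanings.
    """
    name = SYSCALL_NAMES.get(regs["ORIG_RAX"], "unknown")
    return f"{name}(fd={regs['RDI']}, buf={hex(regs['RSI'])}, count={regs['RDX']})"

summary = describe_syscall(user_regs)
print(summary)  # write(fd=1, buf=0x55eb20053fc0, count=2)
```

A count of 2 is consistent with echo writing "c" followed by a newline. Since ORIG_RAX names write, the __x64_sys_openat entry in the trace is probably a stale address left on the stack rather than a real caller.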
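III. Loading debug information for a kernel module
The invalid address access analysed in section IV was made deliberately in my own kernel driver. In real projects an Oops can just as easily occur while debugging a driver, and crash lets us load the debug information of a kernel module that was loaded at the time of the crash.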
crash> mod -s kprobe_example /home/curtis/write_code/kprobe_example/kprobe_example.ko
MODULE NAME BASE SIZE OBJECT FILE
ffffffffc14590c0 kprobe_example ffffffffc1457000 16384 /home/curtis/write_code/kprobe_example/kprobe_example.ko
# dis -l shows the source file and line of the instruction at which the oops occurred
crash> dis -l ffffffff846817f0
/build/linux-hwe-5.15-p6caTn/linux-hwe-5.15-5.15.0/lib/string.c: 387
0xffffffff846817f0 <strcmp+16>: cmp (%rsi,%rax,1),%dl
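The faulting instruction uses the AT&T operand (%rsi,%rax,1), which means the byte at address RSI + RAX * 1 is compared with DL. A minimal sketch that computes this effective address from the RSI and RAX values shown in the exception frame of the Oops backtrace in section IV; the helper name is illustrative:

```python
MASK64 = (1 << 64) - 1  # addresses wrap at 64 bits

def effective_address(base, index, scale=1, disp=0):
    """Compute base + index * scale + disp as a 64-bit address."""
    return (base + index * scale + disp) & MASK64

# Values from the exception frame: RSI is the second strcmp argument,
# RAX is the current character index.
rsi = 0xFFFFFFFFFFFFFFDC
rax = 0x0

fault_addr = effective_address(rsi, rax, 1)
print(hex(fault_addr))  # 0xffffffffffffffdc

# The symbol offset also gives the start of strcmp.
rip = 0xFFFFFFFF846817F0
strcmp_start = rip - 16
print(hex(strcmp_start))  # 0xffffffff846817e0
```

With RAX still 0, the fault happens on the very first byte read through the second argument.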
IV. Worked example
Construct an Oops in a driver through an invalid address access. The source of the function that triggers it is:
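static int __kprobes handler_pre(struct kprobe *p, struct pt_regs *regs)
{
	char buff[] = "just for test string!";
	/* Deliberately invalid kernel address; reading through it triggers the Oops. */
	char *tmp = (char *)0xffffffffffffffdc;

	/* strcmp dereferences tmp on the first comparison and faults. */
	if (!strcmp(buff, tmp)) {
		pr_info("Found target file!\n");
	}
	return 0;
}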
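It can help to see where 0xffffffffffffffdc sits in the address space. Read as a signed 64-bit value it is a small negative number, which puts it in the last page of the address space. The sketch below also checks it against the kernel's error-pointer range (the top MAX_ERRNO = 4095 values, as used by IS_ERR_VALUE), a range that is not expected to be mapped; the helper names are illustrative, and the source does not say the address was chosen for this reason:

```python
MAX_ERRNO = 4095  # same constant the kernel uses for IS_ERR_VALUE

def to_signed64(value):
    """Interpret an unsigned 64-bit value as two's-complement signed."""
    return value - (1 << 64) if value >= (1 << 63) else value

def is_err_value(value):
    """True if value lies in the top MAX_ERRNO values of the address space."""
    return value >= (1 << 64) - MAX_ERRNO

tmp = 0xFFFFFFFFFFFFFFDC

print(to_signed64(tmp))   # -36
print(is_err_value(tmp))  # True
```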
Open the dump file with crash to debug it.
$ sudo crash ./vmcore.202211051708 /usr/lib/debug/boot/vmlinux-5.15.0-52-generic
KERNEL: /usr/lib/debug/boot/vmlinux-5.15.0-52-generic [TAINTED]
DUMPFILE: ./vmcore.202211051708
CPUS: 4
DATE: Sat Nov 5 17:07:40 CST 2022
UPTIME: 00:12:21
LOAD AVERAGE: 0.35, 0.46, 0.38
TASKS: 510
NODENAME: curtis-Aspire-E5-471G
RELEASE: 5.15.0-52-generic
VERSION: #58~20.04.1-Ubuntu SMP Thu Oct 13 13:09:46 UTC 2022
MACHINE: x86_64 (2394 Mhz)
MEMORY: 7.9 GB
PANIC: "Oops: 0000 [#1] SMP PTI" (check log for details)
PID: 833
COMMAND: "rs:main Q:Reg"
TASK: ffff88a107351900 [THREAD_INFO: ffff88a107351900]
CPU: 2
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 833 TASK: ffff88a107351900 CPU: 2 COMMAND: "rs:main Q:Reg"
#0 [ffffad6040e47980] machine_kexec at ffffffff8408aea0
#1 [ffffad6040e479e0] __crash_kexec at ffffffff84198732
#2 [ffffad6040e47ab0] crash_kexec at ffffffff84199a1c
#3 [ffffad6040e47ac0] oops_end at ffffffff840429fa
#4 [ffffad6040e47ae8] page_fault_oops at ffffffff8409d31d
#5 [ffffad6040e47b70] kernelmode_fixup_or_oops at ffffffff8409d522
#6 [ffffad6040e47bb0] __bad_area_nosemaphore at ffffffff8409d6fd
#7 [ffffad6040e47bf8] bad_area_nosemaphore at ffffffff8409d756
#8 [ffffad6040e47c08] do_kern_addr_fault at ffffffff8409e272
#9 [ffffad6040e47c30] exc_page_fault at ffffffff84d62527
#10 [ffffad6040e47c60] asm_exc_page_fault at ffffffff84e00b66
[exception RIP: strcmp+16] // register values at the moment of the crash
RIP: ffffffff846817f0 RSP: ffffad6040e47d10 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffad6040e47d88 RCX: ffffffff843817e5
RDX: 000000000000006a RSI: ffffffffffffffdc RDI: ffffad6040e47d1a
RBP: ffffad6040e47d38 R8: 0000000000000001 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffff7
R13: ffffffffc1459000 R14: ffffffff843817e1 R15: ffff88a25729ff40
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#11 [ffffad6040e47d10] handler_pre at ffffffffc1457068 [kprobe_example]
#12 [ffffad6040e47d40] kprobe_ftrace_handler at ffffffff8408f6bb
#13 [ffffad6040e47e08] vfs_write at ffffffff843817e1
#14 [ffffad6040e47e38] vfs_write at ffffffff843817e5
#15 [ffffad6040e47e48] ksys_write at ffffffff84383c07
#16 [ffffad6040e47ea8] __x64_sys_write at ffffffff84383caa
#17 [ffffad6040e47eb8] do_syscall_64 at ffffffff84d5e27c
#18 [ffffad6040e47ef0] syscall_exit_to_user_mode at ffffffff84d62977
#19 [ffffad6040e47f08] do_syscall_64 at ffffffff84d5e289
#20 [ffffad6040e47f20] do_syscall_64 at ffffffff84d5e289
#21 [ffffad6040e47f50] entry_SYSCALL_64_after_hwframe at ffffffff84e00099
RIP: 00007f8aff44b2cf RSP: 00007f8afe2a0860 RFLAGS: 00000293
RAX: ffffffffffffffda RBX: 00007f8af40013a0 RCX: 00007f8aff44b2cf
RDX: 0000000000000060 RSI: 00007f8af40013a0 RDI: 0000000000000007
RBP: 00007f8af40010c0 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 00007f8af4028dd0
R13: 0000000000000060 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
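Frame #11 is tagged [kprobe_example], and this can be cross-checked against the module base and size that mod -s reported in section III. A minimal sketch with the three values copied from the outputs in this document; the variable names are illustrative:

```python
# From the mod -s output: module base address and size in bytes.
module_base = 0xFFFFFFFFC1457000
module_size = 16384

# From frame #11 of the backtrace: address inside handler_pre.
handler_pre_addr = 0xFFFFFFFFC1457068

offset = handler_pre_addr - module_base
inside_module = 0 <= offset < module_size

print(hex(offset), inside_module)  # 0x68 True
```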
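# Inspect the arguments of strcmp at the time of the crash
# On x86_64 (System V calling convention), RDI/RSI/RDX/RCX hold the first through fourth arguments of a function call
crash> rd 0xffffad6040e47d1a 3 // first argument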
ffffad6040e47d1a: 726f66207473756a 7473207473657420 just for test st
ffffad6040e47d2a: c7000021676e6972 ring!...
crash> rd ffffffffffffffdc 3 // second argument: the invalid address whose access caused the oops; the address type is 64-bit KVADDR
rd: read error: kernel virtual address: ffffffffffffffdc type: "64-bit KVADDR"
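The three words read from the first argument can be decoded back into the string from the driver source. A minimal sketch that reverses the little-endian byte order and stops at the first NUL; it also checks the RDX value from the exception frame, which by strcmp+16 no longer holds a function argument but appears to hold the character just loaded from the first string:

```python
import struct

# 64-bit words copied from the rd output for the first argument.
words = [0x726F66207473756A, 0x7473207473657420, 0xC7000021676E6972]

# Pack each word as little-endian to recover the byte order in memory.
raw = b"".join(struct.pack("<Q", w) for w in words)

# A C string ends at the first NUL byte; bytes after it are unrelated stack data.
first_arg = raw.split(b"\0", 1)[0].decode("ascii")
print(first_arg)  # just for test string!

# RDX and RAX from the exception frame.
rdx = 0x6A
rax = 0x0
print(chr(rdx) == first_arg[rax])  # True
```

So the first argument is intact, and the comparison failed on index 0 only because the second pointer could not be read.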