When the kernel crashes, it prints an oops report like the following:
[31484.746132] BUG: kernel NULL pointer dereference, address: 0000000000000000
[31484.746135] #PF: supervisor write access in kernel mode
[31484.746136] #PF: error_code(0x0002) - not-present page
[31484.746137] PGD 0 P4D 0
[31484.746156] Oops: 0002 [#1] SMP PTI
[31484.746173] CPU: 0 PID: 5573 Comm: insmod Tainted: G OE 5.4.0-29-generic #33-Ubuntu
[31484.746175] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[31484.746179] RIP: 0010:main_init+0x51/0x1000 [kbug]
[31484.746181] Code: 6b f0 64 4c 8d 8f 90 0a 00 00 41 b8 18 00 00 00 48 c7 c1 50 30 41 c0 48 c7 c2 70 30 41 c0 48 c7 c7 a0 30 41 c0 e8 af af ff ff <c7> 04 25 00 00 00 00 00 00 00 00 b8 ff ff ff ff 5d c3 00 00 00 00
[31484.746183] RSP: 0018:ffffac2c426bbc60 EFLAGS: 00010286
[31484.746184] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[31484.746185] RDX: ffff95f1f9427740 RSI: ffff95f1f94178c8 RDI: ffff95f1f94178c8
[31484.746186] RBP: ffffac2c426bbc60 R08: ffff95f1f94178c8 R09: 0000000000000004
[31484.746187] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffc0417000
[31484.746188] R13: ffff95f1c3ddb1e0 R14: ffffffffc0414018 R15: ffffffffc0414000
[31484.746190] FS: 00007f3d7e201540(0000) GS:ffff95f1f9400000(0000) knlGS:0000000000000000
[31484.746191] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[31484.746192] CR2: 0000000000000000 CR3: 0000000002ca6004 CR4: 00000000001606f0
[31484.746218] Call Trace:
[31484.746332] do_one_initcall+0x4a/0x1fa
[31484.746395] ? _cond_resched+0x19/0x30
[31484.746442] ? kmem_cache_alloc_trace+0x163/0x230
[31484.746462] do_init_module+0x62/0x250
[31484.746465] load_module+0x10b8/0x1200
[31484.746469] __do_sys_finit_module+0xbe/0x120
[31484.746471] ? __do_sys_finit_module+0xbe/0x120
[31484.746473] __x64_sys_finit_module+0x1a/0x20
[31484.746491] do_syscall_64+0x57/0x190
[31484.746495] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Line 1 of the report, BUG: kernel NULL pointer dereference, address: 0000000000000000
gives the cause of the crash: a NULL pointer dereference.
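Lines 2 and 3 of the report (the #PF lines) carry the same diagnosis in encoded form. As a rough sketch, the error_code value 0x0002 can be decoded by hand from the x86 page-fault error code bits. The function name decode_pf_error_code and the wording of the labels are illustrative choices, not kernel identifiers.

```python
# Low bits of the x86 page-fault error code, as defined by the hardware.
# Each entry: (mask, meaning when set, meaning when clear or None).
# The label strings are illustrative wording, not kernel output.
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user access", "supervisor access"),
    (0x08, "reserved bit set in a paging entry", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return a list of human-readable descriptions for a #PF error code."""
    descriptions = []
    for mask, if_set, if_clear in PF_BITS:
        text = if_set if code & mask else if_clear
        if text is not None:
            descriptions.append(text)
    return descriptions


print(decode_pf_error_code(0x0002))
# ['not-present page', 'write access', 'supervisor access']
```

This agrees with what the kernel printed: a supervisor write access to a not-present page, which is what a store through a NULL pointer looks like.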
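Line 8 of the report, RIP: 0010:main_init+0x51/0x1000 [kbug]
gives the location of the crash: offset 0x51 into the function main_init (0x1000 is the size the kernel reports for that function), in the module kbug.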
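The fields of the RIP line can also be pulled out mechanically. The following is a minimal sketch, assuming the line has the symbol+offset/size [module] shape shown above. The regular expression and the helper name parse_rip are illustrative and have not been checked against every oops format.

```python
import re

# Matches e.g. "RIP: 0010:main_init+0x51/0x1000 [kbug]".
# The trailing "[module]" part is optional (absent for built-in code).
RIP_RE = re.compile(
    r"RIP:\s+[0-9a-f]{4}:(?P<symbol>[\w.]+)"
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_rip(line):
    """Extract symbol, offset, size and module from an oops RIP line."""
    match = RIP_RE.search(line)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
        "module": match.group("module"),  # None when no [module] is present
    }


info = parse_rip("[31484.746179] RIP: 0010:main_init+0x51/0x1000 [kbug]")
print(info)
# {'symbol': 'main_init', 'offset': 81, 'size': 4096, 'module': 'kbug'}

# The symbol and offset are exactly what gdb's list command needs.
print("list *{symbol}+{offset:#x}".format(**info))
# list *main_init+0x51
```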
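With this information, gdb can map the crash to a source line. The steps are:
- Rebuild the kbug module with debug information, using -O1 -g. The offset only maps to the right line if the rebuilt code matches the module that crashed, so keep the remaining compiler options the same as in the original build.
- Load the kbug module in gdb.
- Run list *main_init+0x51 to see where the crash occurred.
For example: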
root@pc:/home/user/project/linux/kernel/kbug# gdb kbug.ko
Reading symbols from kbug.ko...
(gdb) list *main_init+0x51
0xfd is in main_init (/home/user/project/linux/kernel/kbug/src/main.c:26).
21 */
22 static __init int main_init(void)
23 {
24 klogw(log_tag "kernel module load.");
25
26 *(int*)0 = 0;
27
28 return -1;
29 }
30
(gdb)
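In this example the crash is at /home/user/project/linux/kernel/kbug/src/main.c:26. The statement on that line, *(int*)0 = 0;, writes through a NULL pointer, which is exactly what caused the crash.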
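The same lookup can be run without an interactive session, which is convenient when several oops reports need to be checked. The sketch below relies on gdb's -batch and -ex options. The helper names gdb_list_command and locate_crash are hypothetical, and kbug.ko is assumed to be the debug build in the current directory.

```python
import shlex
import subprocess


def gdb_list_command(obj_path, symbol, offset):
    """Build a gdb command line that lists the source at symbol+offset and exits."""
    return ["gdb", "-batch", "-ex", f"list *{symbol}+{offset:#x}", obj_path]


def locate_crash(obj_path, symbol, offset):
    """Run gdb non-interactively and return whatever it printed on stdout."""
    result = subprocess.run(
        gdb_list_command(obj_path, symbol, offset),
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect the output even if gdb fails
    )
    return result.stdout


# Show the equivalent shell command for the example in this article.
print(shlex.join(gdb_list_command("kbug.ko", "main_init", 0x51)))
# gdb -batch -ex 'list *main_init+0x51' kbug.ko
```

Calling locate_crash("kbug.ko", "main_init", 0x51) should return the same source listing as the interactive session, provided gdb is installed and the module was built with debug information.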
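The above covers a crash inside a module we wrote ourselves. A crash inside the Linux kernel itself can be handled in the same way.
The following example is taken from the Linux kernel documentation: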
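# Rebuild the kernel with debug information enabled
$ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
$ make vmlinux
# Load the kernel image in gdb and use list to view the source
$ gdb vmlinux
(gdb) list *vt_ioctl+0xda8
# Alternatively, rebuild only the part of the tree that crashed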
$ make drivers/tty/
$ gdb drivers/tty/vt/vt_ioctl.o
(gdb) list *vt_ioctl+0xda8