Example
Adding a NULL pointer dereference to the driver code makes the kernel crash with the following oops:
[ 5.529101] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 5.529266] pgd = c0004000
[ 5.529323] [00000000] *pgd=00000000
[ 5.529780] Internal error: Oops: 817 [#1] SMP ARM
[ 5.530051] Modules linked in:
[ 5.530244] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.0.0 #3
[ 5.530339] Hardware name: ARM-Versatile Express
[ 5.530578] task: ee860000 ti: ee84a000 task.ti: ee84a000
[ 5.531114] PC is at pl031_probe+0x120/0x1e8
[ 5.531361] LR is at 0xeea68131
[ 5.531451] pc : [<c0363998>] lr : [<eea68131>] psr: 60000013
[ 5.531451] sp : ee84be60 ip : eea68db0 fp : 00000000
[ 5.531768] r10: c063b294 r9 : 000000a2 r8 : 00000001
[ 5.531946] r7 : c078de00 r6 : ee18b9c0 r5 : eea8c200 r4 : c0774444
[ 5.532154] r3 : 00000100 r2 : 00000126 r1 : eea68aac r0 : 00000000
[ 5.532393] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 5.532545] Control: 10c5387d Table: 8e1b806a DAC: 00000015
[ 5.532717] Process swapper/0 (pid: 1, stack limit = 0xee84a210)
[ 5.533023] Stack: (0xee84be60 to 0xee84c000)
[ 5.533197] be60: c05a10f0 ee18b9c0 c078dd68 eea8c200 c078dd14 00000000 eea8c2c4 c027506c
[ 5.533470] be80: c07b8c5c eea8c200 c078dd14 c07820c8 00000000 c02aaef4 eea8c200 c078dd14
[ 5.534479] bea0: eea8c234 c07820c8 00000000 c02ab170 00000000 c078dd14 c02ab0e4 c02a9550
[ 5.534797] bec0: ee83025c eea55334 c078dd14 eea68e00 00000000 c02aa7a4 c05a10f0 00000006
[ 5.535009] bee0: c078dd14 c078dd14 c07765e0 ee18b880 c0625514 c02ab78c c07765e0 c07765e0
[ 5.535251] bf00: ee18b880 c0008950 ee829480 c04923c4 00000011 00000000 00000000 c0124150
[ 5.535508] bf20: 00000000 c077946c 60000113 00000003 000000a2 c05e5024 eefeb334 c003a77c
[ 5.536181] bf40: c059f498 eefeb339 00000006 00000006 c0779454 eefeb300 c0644fd8 00000006
[ 5.536394] bf60: c063b28c c07994c0 c07994c0 c063b294 00000000 c0609e48 00000006 00000006
[ 5.536705] bf80: c0609594 c04870c4 000094c0 c04870c4 00000000 00000000 00000000 00000000
[ 5.536876] bfa0: 00000000 c04870d0 00000000 c000e4c0 00000000 00000000 00000000 00000000
[ 5.537111] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 5.537287] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 5.538294] [<c0363998>] (pl031_probe) from [<c027506c>] (amba_probe+0xc0/0x144)
[ 5.538656] [<c027506c>] (amba_probe) from [<c02aaef4>] (driver_probe_device+0x80/0x22c)
[ 5.539030] [<c02aaef4>] (driver_probe_device) from [<c02ab170>] (__driver_attach+0x8c/0x90)
[ 5.539401] [<c02ab170>] (__driver_attach) from [<c02a9550>] (bus_for_each_dev+0x68/0x9c)
[ 5.539667] [<c02a9550>] (bus_for_each_dev) from [<c02aa7a4>] (bus_add_driver+0x148/0x1f0)
[ 5.539941] [<c02aa7a4>] (bus_add_driver) from [<c02ab78c>] (driver_register+0x78/0xf8)
[ 5.540086] [<c02ab78c>] (driver_register) from [<c0008950>] (do_one_initcall+0x8c/0x1d8)
[ 5.540466] [<c0008950>] (do_one_initcall) from [<c0609e48>] (kernel_init_freeable+0x1d4/0x274)
[ 5.540924] [<c0609e48>] (kernel_init_freeable) from [<c04870d0>] (kernel_init+0xc/0xe8)
[ 5.541238] [<c04870d0>] (kernel_init) from [<c000e4c0>] (ret_from_fork+0x14/0x34)
[ 5.541843] Code: e3a02000 ebf3d7a5 e3500000 03a03c01 (05803000)
[ 5.542446] ---[ end trace 34ef2349d98ebf53 ]---
[ 5.542819] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Locating the faulting source line
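The oops already names the function that crashed. In the example above, the line [ 5.531114] PC is at pl031_probe+0x120/0x1e8 shows that the faulting instruction is in pl031_probe, at offset 0x120 from the start of the function, and that the function is 0x1e8 bytes long.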
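As a minimal sketch, the symbol, offset, and function size can be pulled out of the "PC is at" line with a regular expression. The helper name parse_pc_line and the pattern are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches the symbol+offset/size form that the kernel prints for the faulting PC.
PC_LINE = re.compile(
    r"PC is at (?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_pc_line(line):
    """Return (symbol, offset, size) from a 'PC is at ...' oops line, or None."""
    match = PC_LINE.search(line)
    if match is None:
        return None
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
    )


symbol, offset, size = parse_pc_line("[ 5.531114] PC is at pl031_probe+0x120/0x1e8")
print(symbol, hex(offset), hex(size))  # pl031_probe 0x120 0x1e8
```

The function returns None for lines that carry no symbol, such as "LR is at 0xeea68131" in the same oops.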
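When the function is simple, reading its implementation is usually enough to spot the problem. When the crash is inside a complex function, you need to pin down exactly which line of code is at fault.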
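The addr2line tool can map the faulting address to a source line.
Prerequisite: enable the CONFIG_DEBUG_INFO option in the kernel configuration and save it, so that vmlinux is built with debug information.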
1) Look up the function's base address in System.map. In this example, the entry for pl031_probe is c0363878 t pl031_probe.
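2) The address to look up is the base address plus the offset. In this example, the virtual address of the faulting instruction is 0xc0363878 + 0x120 = 0xc0363998, which matches the pc value [<c0363998>] printed in the oops.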
3) Run the addr2line tool from the cross-compilation toolchain against the vmlinux file. The command is:
root@ubuntu:~/work/QEMU/linux-4.0# arm-linux-gnueabi-addr2line -C -f -e vmlinux 0xc0363998
pl031_probe
/root/work/QEMU/linux-4.0/drivers/rtc/rtc-pl031.c:388
The output shows that the faulting code is at line 388 of rtc-pl031.c.
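The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names find_symbol_base and addr2line_command are made up here, and the System.map text is reduced to the single entry quoted in step 1. The script only builds the addr2line command line; it does not run the tool.

```python
def find_symbol_base(system_map_text, symbol):
    """Return the address of `symbol` from System.map-style text, or None."""
    for line in system_map_text.splitlines():
        fields = line.split()
        # Each System.map line has the form "<hex address> <type> <name>".
        if len(fields) == 3 and fields[2] == symbol:
            return int(fields[0], 16)
    return None


def addr2line_command(vmlinux, address, tool="arm-linux-gnueabi-addr2line"):
    """Build the argument list for the addr2line invocation used in step 3."""
    return [tool, "-C", "-f", "-e", vmlinux, hex(address)]


# One-line excerpt; in practice, read the System.map produced by the kernel build.
SYSTEM_MAP_EXCERPT = "c0363878 t pl031_probe\n"

base = find_symbol_base(SYSTEM_MAP_EXCERPT, "pl031_probe")
fault_address = base + 0x120  # base address + offset from the oops
assert fault_address == 0xC0363998  # same value as "pc : [<c0363998>]"
print(" ".join(addr2line_command("vmlinux", fault_address)))
```

Note that a static function name can appear more than once in a real System.map; this sketch simply returns the first match, so the result should be checked against the pc value from the oops, as the assertion does.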
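When investigating a problem, we often do not have the vmlinux file that matches the kernel on the device, or the device's kernel was built without CONFIG_DEBUG_INFO. In that case, rebuild the kernel from the same version of the source, ideally with the same toolchain and configuration. The link address of the function may differ from the one on the crashed device, but as long as the implementation of the crashing function is unchanged, the new base address plus the offset still identifies the line within the function where the crash occurred.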
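A minimal sketch of that calculation follows. The rebuilt base address below is hypothetical, chosen only for illustration, and rebuilt_fault_address is an illustrative name. The optional size comparison is an added sanity check, not something the procedure above requires: the oops reports the function size (0x1e8 here), and the size in the rebuilt kernel could be read, for example, from the output of GNU nm with the -S option on the new vmlinux.

```python
def rebuilt_fault_address(rebuilt_base, offset, oops_size=None, rebuilt_size=None):
    """Map an oops offset onto the same function in a rebuilt kernel.

    If both sizes are given and differ, the rebuilt function is not the same
    code as the one that crashed, so the offset cannot be trusted.
    """
    if oops_size is not None and rebuilt_size is not None and oops_size != rebuilt_size:
        raise ValueError("function size differs between the oops and the rebuilt kernel")
    return rebuilt_base + offset


# HYPOTHETICAL base address of pl031_probe in a rebuilt kernel (not from the source).
REBUILT_BASE = 0xC0364A10

address = rebuilt_fault_address(REBUILT_BASE, 0x120, oops_size=0x1E8, rebuilt_size=0x1E8)
print(hex(address))  # 0xc0364b30
```

The resulting address is then passed to addr2line together with the rebuilt vmlinux, exactly as in step 3.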