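Debugging a Kernel Oops

A kernel Oops is somewhat like a segmentation fault (segfault) in user space. The kernel normally dumps the CPU registers and the call stack, and from that information you can track down the code that caused the problem.

The following example shows how.

1. First, write a simple kernel module: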

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
 
static void create_oops() {
        *(int *)0 = 0;
}
 
static int __init my_oops_init(void) {
        printk("oops from the module\n");
        create_oops();
        return 0;
}
static void __exit my_oops_exit(void) {
        printk("Goodbye world\n");
}
 
module_init(my_oops_init);
module_exit(my_oops_exit);

Clearly, this module will fail as soon as it is loaded.

Save this code as oops.c in a directory named oops.

Then build it:

export ARCH=arm
export CROSS_COMPILE=arm-linux-gnueabi-
make -C /home/charles/code/linux-3.2 M=`pwd` modules

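Alternatively, write a Makefile like the one below. Note that -C must point at the source tree of the kernel the target actually runs; the command above uses linux-3.2, whereas this Makefile points at linux-3.10.28.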

obj-m := oops.o

ARCH = arm
CROSS_COMPILE = arm-linux-gnueabi-
EXTRA_CFLAGS = -g -O0
all:
	$(MAKE) ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) -C $(HOME)/code/linux-3.10.28 M=$(PWD) modules
clean:
	$(MAKE) ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) -C $(HOME)/code/linux-3.10.28 M=$(PWD) clean



The build produces a set of files:

:~/code/oops$ ls
Makefile       Module.symvers  oops.ko     oops.mod.o
modules.order  oops.c          oops.mod.c  oops.o
:~/code/oops$ cat Makefile 
obj-m := oops.o

Next, copy oops.ko to /lib/modules/3.2.0/ on the target machine (actually a QEMU virtual machine):

~ # ls /lib/modules/3.2.0/
oops.ko

Then load the module:

~ # modprobe oops
Disabling lock debugging due to kernel taint
oops: module license 'unspecified' taints kernel.
oops from the module
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = 8738c000
[00000000] *pgd=673c6831, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#1] SMP
Modules linked in: oops(P+)
CPU: 0    Tainted: P           O  (3.2.0 #1)
PC is at my_oops_init+0x10/0x1c [oops]
LR is at my_oops_init+0xc/0x1c [oops]
pc : [<7f002010>]    lr : [<7f00200c>]    psr: 60000013
sp : 873c5eb0  ip : 88820000  fp : 7f002000
r10: 873c4000  r9 : 8046d100  r8 : 0000001c
r7 : 00000001  r6 : 873f7a80  r5 : 7f000074  r4 : 7f000074
r3 : 804554ac  r2 : 804554ac  r1 : 60000093  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c53c7d  Table: 6738c06a  DAC: 00000015
Process modprobe (pid: 474, stack limit = 0x873c42f0)
Stack: (0x873c5eb0 to 0x873c6000)
5ea0:                                     00000000 80008678 80572140 8009cc3c
5ec0: 00000000 00000000 870834a0 88890000 00000001 80058ec0 7f0000bc 7f000074
5ee0: 7f000074 873f7a80 00000001 0000001c 7f0000bc 00000024 000000b2 80058588
5f00: 7f000080 000b70ca 0001d9c1 00000000 800567d4 000a2974 7f0001b0 873c4000
5f20: 00000068 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5f40: 88890000 00005d02 888942a0 8889410e 88895c50 870834e0 000001c4 00000214
5f60: 00000000 00000000 00000025 00000026 0000000f 00000000 0000000d 00000000
5f80: 00000004 000b70ca 000c30e8 00000000 00000080 8000e2a8 873c4000 00000000
5fa0: 0001d9c1 8000e100 000b70ca 000c30e8 000c30e8 00005d02 000a2974 00000000
5fc0: 000b70ca 000c30e8 00000000 00000080 000b70d8 7ec5ff80 000b70ca 0001d9c1
5fe0: 2acc76a0 7ec5f990 0001d359 2acc76b0 800d0010 000c30e8 00000000 00000000
[<7f002010>] (my_oops_init+0x10/0x1c [oops]) from [<80008678>] (do_one_initcall+0xfc/0x164)
[<80008678>] (do_one_initcall+0xfc/0x164) from [<80058588>] (sys_init_module+0xd10/0x1a60)
[<80058588>] (sys_init_module+0xd10/0x1a60) from [<8000e100>] (ret_fast_syscall+0x0/0x30)
Code: e92d4008 e59f000c eb4c4240 e3a00000 (e5800000) 
---[ end trace a9cf7df06d0f6920 ]---
Segmentation fault

The dump shows the values of the pc, lr (link register) and sp registers, along with the call stack.

my_oops_init+0x10/0x1c
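has the form symbol+offset/length: the faulting instruction is 0x10 bytes into my_oops_init, which is 0x1c bytes long.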
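
The symbol+offset/length notation is regular enough to pick apart mechanically. The following is a minimal sketch in user-space C, not kernel code; the struct oops_location type and the parse_oops_location function are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type: one "symbol+offset/length" location. */
struct oops_location {
    char symbol[64];        /* function name, e.g. "my_oops_init" */
    unsigned long offset;   /* bytes from the start of the function */
    unsigned long length;   /* total size of the function in bytes */
};

/*
 * Parse a string such as "my_oops_init+0x10/0x1c".
 * Returns 0 on success, -1 if the text does not have the expected shape.
 */
int parse_oops_location(const char *text, struct oops_location *out)
{
    assert(text != NULL && out != NULL);

    /* %63[^+] reads the symbol up to '+'; %lx accepts an optional 0x prefix. */
    if (sscanf(text, "%63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->length) != 3)
        return -1;

    /* The faulting instruction must lie inside the function. */
    if (out->offset >= out->length)
        return -1;

    return 0;
}
```

Called with the string from the "PC is at" line, this should report the symbol my_oops_init, offset 0x10 and length 0x1c.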

2. Now start debugging.

First, on the host machine, load the module into gdb:

$ arm-linux-gnueabi-gdb oops.ko 
GNU gdb (crosstool-NG linaro-1.13.1-2012.04-20120426 - Linaro GCC 2012.04) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-build_pc-linux-gnu --target=arm-linux-gnueabi".
For bug reporting instructions, please see:
<https://bugs.launchpad.net/gcc-linaro>...
Reading symbols from /home/charles/code/oops/oops.ko...done.

Then add the symbol file:

(gdb) add-symbol-file oops.ko 0x7f002000
add symbol table from file "oops.ko" at
	.text_addr = 0x7f002000
(y or n) y  
Reading symbols from /home/charles/code/oops/oops.ko...done.
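
0x7f002000 is the address at which the module's code was loaded. Because my_oops_init is marked __init, that is the address of the .init.text section, which can be read on the target as follows: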

~ # cat /sys/module/oops/sections/.init.text 
0x7f002000

The pc value tells us which function faulted, so disassemble it:

(gdb) disassemble  my_oops_init
Dump of assembler code for function my_oops_init:
   0x00000000 <+0>:	push	{r3, lr}
   0x00000004 <+4>:	ldr	r0, [pc, #12]	; 0x18 <my_oops_init+24>
   0x00000008 <+8>:	bl	0x8 <my_oops_init+8>
   0x0000000c <+12>:	mov	r0, #0
   0x00000010 <+16>:	str	r0, [r0]
   0x00000014 <+20>:	pop	{r3, pc}
   0x00000018 <+24>:	andeq	r0, r0, r0
End of assembler dump.
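
Using the offset 0x10 from above, the instruction being executed at the time of the fault is at:

0x00000000 + 0x10 = 0x00000010, which is str r0, [r0] (a store to address 0, since the preceding mov set r0 to 0)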
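
The Oops itself offers a cross-check: on the "Code:" line, the word in parentheses is the instruction at the faulting pc. The sketch below, in user-space C, extracts that word and decodes it, assuming a 32-bit ARM (A32) instruction and handling only the immediate-offset load/store form. The names faulting_word, decode_ldst_imm and struct arm_ldst_imm are hypothetical, introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the parenthesised word from an Oops "Code:" line.
 * Returns 0 on success, -1 if no "(hex)" group is found.
 */
int faulting_word(const char *code_line, uint32_t *word)
{
    assert(code_line != NULL && word != NULL);

    const char *open = strchr(code_line, '(');
    if (open == NULL)
        return -1;

    char *end = NULL;
    unsigned long value = strtoul(open + 1, &end, 16);
    if (end == open + 1 || *end != ')')
        return -1;

    *word = (uint32_t)value;
    return 0;
}

/* Fields of an A32 load/store with a 12-bit immediate offset. */
struct arm_ldst_imm {
    int is_load;     /* L bit (20): 1 = load, 0 = store */
    int is_byte;     /* B bit (22): 1 = byte access, 0 = word access */
    int add_offset;  /* U bit (23): 1 = base + imm12, 0 = base - imm12 */
    unsigned rn;     /* base register, bits 19:16 */
    unsigned rd;     /* transferred register, bits 15:12 */
    unsigned imm12;  /* immediate offset, bits 11:0 */
};

/*
 * Decode only the immediate-offset form (bits 27:25 == 0b010) of a
 * conditional instruction. Returns -1 for anything else.
 */
int decode_ldst_imm(uint32_t word, struct arm_ldst_imm *out)
{
    assert(out != NULL);

    if ((word >> 28) == 0xF)            /* unconditional space: not handled */
        return -1;
    if (((word >> 25) & 0x7) != 0x2)    /* not a load/store immediate */
        return -1;

    out->is_load    = (word >> 20) & 1;
    out->is_byte    = (word >> 22) & 1;
    out->add_offset = (word >> 23) & 1;
    out->rn         = (word >> 16) & 0xF;
    out->rd         = (word >> 12) & 0xF;
    out->imm12      = word & 0xFFF;
    return 0;
}
```

For the log above, the extracted word is e5800000, which decodes to a word store with base register r0, source register r0 and offset 0, i.e. str r0, [r0], the same instruction gdb shows at <+16>.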

(gdb) l *0x00000010
0x10 is in my_oops_init (/home/charles/code/oops/oops.c:6).
1	#include <linux/kernel.h>
2	#include <linux/module.h>
3	#include <linux/init.h>
4	 
5	static void create_oops() {
6	        *(int *)0 = 0;
7	}
8	 
9	static int __init my_oops_init(void) {
10	        printk("oops from the module\n");

That is line 6: the NULL-pointer write in create_oops(), which the compiler inlined into my_oops_init.

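That approach actually makes things more complicated than necessary: you do not need to know where the oops module was loaded in the kernel.

The string

my_oops_init+0x10/0x1c
is enough to locate the faulting line inside my_oops_init.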

$ arm-linux-gnueabi-gdb oops.ko 
Reading symbols from /home/charles/code/oops/oops.ko...done.
(gdb) disassemble  my_oops_
my_oops_exit  my_oops_init  
(gdb) disassemble  my_oops_init 
Dump of assembler code for function my_oops_init:
   0x00000000 <+0>:	push	{r3, lr}
   0x00000004 <+4>:	ldr	r0, [pc, #12]	; 0x18 <my_oops_init+24>
   0x00000008 <+8>:	bl	0x8 <my_oops_init+8>
   0x0000000c <+12>:	mov	r0, #0
   0x00000010 <+16>:	str	r0, [r0]
   0x00000014 <+20>:	pop	{r3, pc}
   0x00000018 <+24>:	andeq	r0, r0, r0
End of assembler dump.
(gdb) print /x  0x00000000+0x10
$1 = 0x10
(gdb) list *0x10
0x10 is in my_oops_init (/home/charles/code/oops/oops.c:6).
1	#include <linux/kernel.h>
2	#include <linux/module.h>
3	#include <linux/init.h>
4	 
5	static void create_oops() {
6	        *(int *)0 = 0;
7	}
8	 
9	static int __init my_oops_init(void) {
10	        printk("oops from the module\n");
(gdb) 


References:

1. http://www.linuxforu.com/2011/01/understanding-a-kernel-oops/

