Debugging Segmentation Faults in Linux Drivers

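This article examines segmentation faults in kernel drivers. Using a NULL pointer dereference as the worked example, it shows how to locate the faulting instruction, disassemble the module, and analyze the stack backtrace, so that even complex bugs can be tracked down quickly.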


I. Heap and stack

Before analyzing a segmentation fault, first a quick look at what the heap and the stack are.

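Heap: usually allocated and freed by the developer. If a block is never freed, some OSes may reclaim it automatically when the program exits. The allocator typically tracks blocks in a structure similar to a linked list. Unlike a queue, the heap imposes no ordering: blocks can be freed in any order, regardless of when they were allocated.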

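Stack: allocated and released automatically as functions are called and return. It holds function arguments and local variables. The stack operates on a last in, first out (LIFO) basis.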
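The difference in lifetime can be sketched in user-space C. This is only an illustration; the function name sum_with_heap_buffer is made up for this example and is not part of the driver discussed below.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sums 0..n-1 through a temporary heap buffer (hypothetical helper).
 * Returns the sum, or -1 if the allocation fails. n must be non-negative.
 */
int sum_with_heap_buffer(int n)
{
	int total = 0;	/* stack: released automatically when the function returns */
	int *buf;

	assert(n >= 0);
	if (n == 0)
		return 0;

	buf = malloc((size_t)n * sizeof(*buf));	/* heap: lives until free() */
	if (!buf)
		return -1;

	for (int i = 0; i < n; i++)
		buf[i] = i;
	for (int i = 0; i < n; i++)
		total += buf[i];

	free(buf);	/* heap memory must be released explicitly */
	return total;
}
```

Calling sum_with_heap_buffer(5) adds 0+1+2+3+4. The local variable total disappears with the stack frame, while buf would leak if free() were omitted.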

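A stack defines a few operations. The two most important are PUSH and POP.
PUSH: adds an element to the top of the stack.
POP: does the opposite; it removes the element at the top of the stack and decreases the stack size by one.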
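A minimal sketch of these two operations on a fixed-size array follows. The type int_stack, the capacity, and the function names are assumptions made for illustration; a hardware stack such as the ARM one discussed later works on the same principle but grows downward in memory.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAPACITY 16	/* arbitrary size chosen for this sketch */

struct int_stack {
	int items[STACK_CAPACITY];
	size_t size;	/* number of elements currently stored */
};

/* PUSH: add an element on top. Returns 0 on success, -1 if the stack is full. */
int stack_push(struct int_stack *s, int value)
{
	assert(s != NULL);
	if (s->size == STACK_CAPACITY)
		return -1;
	s->items[s->size++] = value;
	return 0;
}

/* POP: remove the top element into *out. Returns 0 on success, -1 if empty. */
int stack_pop(struct int_stack *s, int *out)
{
	assert(s != NULL && out != NULL);
	if (s->size == 0)
		return -1;
	*out = s->items[--s->size];
	return 0;
}
```

Pushing 1 and then 2 and popping twice yields 2 first and 1 second, which is the last in, first out behavior described above.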

II. Analyzing a segmentation fault in a kernel driver

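Segmentation faults are usually caused by a bad pointer address. They are very common in C code and are often fatal bugs, sometimes very hard to deal with. This section walks through resolving a simple segmentation fault in a driver. More complex ones follow much the same process, but need a combination of methods such as print statements and GDB. Debugging is a tedious and painful process, yet most programmers keep coming back to it and never tire of it. That must be true love!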

1. Here is the oops message from a segmentation fault. Start by reading what it says.

Unable to handle kernel NULL pointer dereference at virtual address 00000000
//the kernel dereferenced a NULL pointer
pgd = 7fc41725
[00000000] *pgd=337a9831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] ARM
Modules linked in: buttons(O)
CPU: 0 PID: 856 Comm: buttons_test Tainted: G           O      4.19.8 #9
Hardware name: SMDK2440
PC is at sixth_drv_open+0x58/0x17c [buttons]
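//PC is the address of the instruction that caused the fault
//often the PC is printed only as a raw address, with no indication of which function it belongs to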
LR is at sixth_drv_open+0x4c/0x17c [buttons]
pc : [<bf0002dc>]    lr : [<bf0002d0>]    psr: 60000013
sp : c35abdf0  ip : 00000000  fp : 00000000
r10: c35abf70  r9 : c353f8c0  r8 : c3724a70
r7 : c0675008  r6 : 00000000  r5 : c36586c0  r4 : bf000ae8
r3 : 00000000  r2 : 347b7ccd  r1 : 00000000  r0 : 00000000
//register values at the time the faulting instruction executed
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
Control: c000717f  Table: 33598000  DAC: 00000051
Process buttons_test (pid: 856, stack limit = 0xaced0220)
//the process running when the fault occurred is buttons_test
Stack: (0xc35abdf0 to 0xc35ac000)//stack contents
bde0:                                     bf0009cc bf000af0 bf000b20 c00c8a60
be00: 00000000 347b7ccd 00000002 c353f8c0 c3724a70 c353f8c8 c00c89b0 c304e770
be20: c353f8c0 c00c09f0 c35abec0 00000000 00000002 00000000 c304e770 c00d2214
be40: 00000000 33ff418f 00000000 00000000 00000006 00000041 c36e8858 00000054
be60: 00000000 c00a1408 c35ad540 c35aa000 c35ad490 c35ad600 00000054 00000002
be80: c3724a70 c340fd30 c31df220 c359adb8 00000000 347b7ccd 00000000 00000003
bea0: c35abf70 c0675008 00000001 fffff000 c35aa000 00000000 bec32d54 c00d3efc
bec0: c340fd30 c31df220 1c9e12f6 00000007 c35e0015 0000001c 00000000 c304a198
bee0: c3724a70 00000101 00000002 00000036 00000000 00000000 00000000 c35abf00
bf00: c36e8820 b6f13000 00100877 00000000 b6f13000 00100875 00000000 c35ad7e0
bf20: b6f14000 347b7ccd 00000003 c365d300 c35e0000 00000000 00000000 00000002
bf40: ffffff9c c00e0bf0 00000002 347b7ccd ffffff9c 00000003 c0675008 ffffff9c
bf60: c35e0000 c00c1ef8 00000005 b6f13000 00000002 c00a0000 00000006 00000100
bf80: 00000001 347b7ccd 00008588 00000000 000083d4 00000005 c00091e4 c35aa000
bfa0: 00000000 c0009000 00008588 00000000 00008618 00000002 bec32eac 00000000
bfc0: 00008588 00000000 000083d4 00000005 00000000 00000000 b6f14000 bec32d54
bfe0: 00000000 bec32d38 000084fc b6e7a35c 60000010 00008618 00000000 00000000

[<bf0002dc>] (sixth_drv_open [buttons]) from [<c00c8a60>] (chrdev_open+0xb0/0x170)
[<c00c8a60>] (chrdev_open) from [<c00c09f0>] (do_dentry_open+0x1dc/0x380)
[<c00c09f0>] (do_dentry_open) from [<c00d2214>] (path_openat+0x504/0xf4c)
[<c00d2214>] (path_openat) from [<c00d3efc>] (do_filp_open+0x6c/0xe0)
[<c00d3efc>] (do_filp_open) from [<c00c1ef8>] (do_sys_open+0x128/0x1f4)
[<c00c1ef8>] (do_sys_open) from [<c0009000>] (ret_fast_syscall+0x0/0x50)
//(backtrace)

2. Locating the PC

From the oops output, PC=0xbf0002dc.

a. List the symbol addresses of the kernel and the loaded drivers

   cat /proc/kallsyms >kallsyms.dis

Search for addresses close to PC=0xbf0002dc to identify the faulting module. My search results are below; the fault is clearly in the buttons driver.

bf000000 t $a   [buttons]
bf000000 t sixth_drv_poll       [buttons]
bf000040 t $d   [buttons]
bf000048 t $a   [buttons]
bf000164 t $d   [buttons]
bf000174 t $a   [buttons]
bf0001ac t $d   [buttons]
bf0001b4 t $a   [buttons]
bf0001b4 t buttons_remove       [buttons]
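The search for the closest symbol can be sketched as a small lookup routine. The table below is a hand-picked subset, not the real kallsyms output: sixth_drv_poll and buttons_remove come from the listing above, sixth_drv_open is derived from the oops line (0xbf0002dc - 0x58), and the two kernel entries are the function start addresses that appear in the kernel disassembly later in this article. The names ksym, symtab and nearest_symbol are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ksym {
	uint32_t addr;		/* start address of the symbol */
	const char *name;
};

/* Hand-picked subset of symbols; a real kallsyms dump has thousands more. */
static const struct ksym symtab[] = {
	{ 0xbf000000u, "sixth_drv_poll" },
	{ 0xbf0001b4u, "buttons_remove" },
	{ 0xbf000284u, "sixth_drv_open" },	/* 0xbf0002dc - 0x58, from the oops */
	{ 0xc00c0814u, "do_dentry_open" },
	{ 0xc00c89b0u, "chrdev_open" },
};

#define SYMTAB_LEN (sizeof(symtab) / sizeof(symtab[0]))

/*
 * Returns the symbol with the highest start address that is <= addr,
 * or NULL if addr lies below every symbol. *offset receives addr - start.
 */
const struct ksym *nearest_symbol(const struct ksym *tab, size_t n,
				  uint32_t addr, uint32_t *offset)
{
	const struct ksym *best = NULL;

	assert(tab != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tab[i].addr <= addr && (!best || tab[i].addr > best->addr))
			best = &tab[i];
	}
	if (best && offset)
		*offset = addr - best->addr;
	return best;
}
```

Looking up 0xbf0002dc in this table returns sixth_drv_open with offset 0x58, which agrees with the "PC is at sixth_drv_open+0x58/0x17c" line of the oops. With an incomplete table the nearest entry is only a candidate, so the result should be confirmed against the full symbol list.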

b. Disassemble the faulting module

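If the address lies in the kernel, disassemble the kernel; if it lies in a loaded module, disassemble that module. How can you tell whether the PC belongs to the kernel itself? Look in System.map in the root of the kernel tree for addresses close to the PC. If there are any, the fault is in the kernel itself and the kernel should be disassembled; otherwise, disassemble the module.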

Here the fault is in a module, so disassemble the module: arm-linux-objdump -D buttons.ko >buttons.dis
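As a quick first guess before consulting System.map, the address range alone often tells kernel and module apart on 32-bit ARM. The sketch below is a shortcut, not the System.map method described above, and it rests on two assumptions that must be checked against the actual kernel configuration: a 3G/1G split (PAGE_OFFSET = 0xc0000000) and a 16 MiB module area directly below PAGE_OFFSET. The names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch; verify against the kernel configuration. */
#define PAGE_OFFSET_ASSUMED	0xc0000000u	/* 3G/1G user/kernel split */
#define MODULES_VADDR_ASSUMED	(PAGE_OFFSET_ASSUMED - 0x01000000u)	/* 16 MiB */

static_assert(MODULES_VADDR_ASSUMED < PAGE_OFFSET_ASSUMED,
	      "module area must sit below PAGE_OFFSET");

enum addr_region {
	ADDR_USER,		/* below the module area */
	ADDR_MODULE,		/* inside the assumed module area */
	ADDR_KERNEL_SPACE,	/* at or above PAGE_OFFSET */
};

/* Classifies a virtual address by range only; it does not consult symbols. */
enum addr_region classify_address(uint32_t addr)
{
	if (addr >= PAGE_OFFSET_ASSUMED)
		return ADDR_KERNEL_SPACE;
	if (addr >= MODULES_VADDR_ASSUMED)
		return ADDR_MODULE;
	return ADDR_USER;
}
```

Under these assumptions the PC 0xbf0002dc falls in the module area and the return address 0xc00c8a60 falls in kernel space, consistent with what the symbol search shows.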

c. Analyze the disassembly

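PC = 0xbf0002dc, which the oops reports as offset 0x58 into the function (sixth_drv_open+0x58). The module's symbols start at 0xbf000000, so this PC corresponds to 0x000002dc in the disassembly of buttons. Look in the disassembly for the function that contains 0x000002dc; it is the following one, and indeed 0x284 + 0x58 = 0x2dc: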

00000284 <sixth_drv_open>:
 284:	e92d4010 	push	{r4, lr}
 288:	e5913020 	ldr	r3, [r1, #32]
 28c:	e24dd008 	sub	sp, sp, #8
 290:	e3130b02 	tst	r3, #2048	; 0x800
 294:	e59f012c 	ldr	r0, [pc, #300]	; 3c8 <sixth_drv_open+0x144>
 298:	0a000034 	beq	370 <sixth_drv_open+0xec>
 29c:	ebfffffe 	bl	0 <down_trylock>
 2a0:	e3500000 	cmp	r0, #0
 2a4:	1a000033 	bne	378 <sixth_drv_open+0xf4>
 2a8:	e3a03000 	mov	r3, #0
 2ac:	e59f4118 	ldr	r4, [pc, #280]	; 3cc <sixth_drv_open+0x148>
 2b0:	e59f1118 	ldr	r1, [pc, #280]	; 3d0 <sixth_drv_open+0x14c>
 2b4:	e2842008 	add	r2, r4, #8
 2b8:	e58d2004 	str	r2, [sp, #4]
 2bc:	e58d1000 	str	r1, [sp]
 2c0:	e5940010 	ldr	r0, [r4, #16]
 2c4:	e1a02003 	mov	r2, r3
 2c8:	e59f1104 	ldr	r1, [pc, #260]	; 3d4 <sixth_drv_open+0x150>
 2cc:	ebfffffe 	bl	0 <request_threaded_irq>
 2d0:	e3500000 	cmp	r0, #0
 2d4:	1a000036 	bne	3b4 <sixth_drv_open+0x130>
 2d8:	e3a01000 	mov	r1, #0
 2dc:	e5912000 	ldr	r2, [r1]
 2e0:	e59fe0f0 	ldr	lr, [pc, #240]	; 3d8 <sixth_drv_open+0x154>
 2e4:	e59fc0f0 	ldr	ip, [pc, #240]	; 3dc <sixth_drv_open+0x158>

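This gives the instruction at 0x000002dc: "2dc:    e5912000     ldr    r2, [r1]", which loads a value from the address held in r1. The instruction just before it sets r1 to 0, and the oops also shows r1 = 00000000, so the address is clearly invalid. Now look at sixth_drv_open in the buttons source. This case is so simple that the cause is obvious at a glance: the pointer val is dereferenced without ever being initialized. Each case needs its own analysis, but reading the assembly around the faulting instruction is usually enough to infer the offending statement, and adding print statements will pin it down precisely.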

static int sixth_drv_open(struct inode *inode, struct file *file)
{
	int ret;
	unsigned int *val;	/* never initialized: this is the bug */

	if (file->f_flags & O_NONBLOCK){
		if (down_trylock(&button_lock))
			return -EBUSY;
	}else{
		/* acquire the semaphore */
		down(&button_lock);
	}
	/* configure GPF0 and GPF2 as input pins */
	/* configure GPG3 and GPG11 as input pins */
	ret = request_irq(pins_desc[0].irq,  buttons_irq, 0, "S2", &pins_desc[0]);
	if (ret) {
		printk("request_irq %d for EINT0 err : %d!\n", pins_desc[0].irq, ret);
		//return ret;
	}
	
	/* faulting statement: compiles to "ldr r2, [r1]" with r1 = 0 */
	*val = *val + 0x0333;
	
	ret = request_irq(pins_desc[1].irq,  buttons_irq, 0, "S3", &pins_desc[1]);
	if (ret) {
		printk("request_irq for EINT2 err : %d!\n", ret);
		//return ret;
	}
	ret = request_irq(pins_desc[2].irq, buttons_irq, 0, "S4", &pins_desc[2]);
	if (ret) {
		printk("request_irq for EINT11 err : %d!\n", ret);
		//return ret;
	}
	ret = request_irq(pins_desc[3].irq, buttons_irq, 0, "S5", &pins_desc[3]);	
	if (ret) {
		printk("request_irq for EINT19 err : %d!\n", ret);
		//return ret;
	}
	
	return 0;
}
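As a cross-check on the reading of the faulting instruction, the word e5912000 can be decoded by hand. The sketch below handles only the ARM (A32) load/store word or byte form with an immediate offset, which is the form used here; the struct and function names are assumptions for this example, and anything outside that form is rejected rather than decoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct arm_ldr_str {
	bool is_load;		/* L bit (20): 1 = LDR, 0 = STR */
	bool add_offset;	/* U bit (23): 1 = base + offset */
	bool pre_indexed;	/* P bit (24): 1 = offset applied before access */
	unsigned rn;		/* base register number */
	unsigned rd;		/* register loaded or stored */
	unsigned imm12;		/* 12-bit immediate offset */
};

/*
 * Decodes an A32 LDR/STR with immediate offset.
 * Returns false unless bits 27:26 are 01 and bit 25 (register offset) is 0.
 */
bool decode_ldr_str_imm(uint32_t insn, struct arm_ldr_str *out)
{
	assert(out != NULL);
	if (((insn >> 26) & 0x3u) != 0x1u)
		return false;
	if ((insn >> 25) & 0x1u)
		return false;

	out->pre_indexed = (insn >> 24) & 0x1u;
	out->add_offset = (insn >> 23) & 0x1u;
	out->is_load = (insn >> 20) & 0x1u;
	out->rn = (insn >> 16) & 0xfu;
	out->rd = (insn >> 12) & 0xfu;
	out->imm12 = insn & 0xfffu;
	return true;
}
```

For 0xe5912000 this yields a load with base register r1, destination r2 and offset 0, that is "ldr r2, [r1]". Combined with r1 = 00000000 in the register dump, the access address is 0, matching the "virtual address 00000000" reported at the top of the oops.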

III. Analyzing the stack backtrace

Here are the registers and stack contents excerpted from the oops:

pc : [<bf0002dc>]    lr : [<bf0002d0>]    psr: 60000013
sp : c35abdf0  ip : 00000000  fp : 00000000
r10: c35abf70  r9 : c353f8c0  r8 : c3724a70
r7 : c0675008  r6 : 00000000  r5 : c36586c0  r4 : bf000ae8
r3 : 00000000  r2 : 347b7ccd  r1 : 00000000  r0 : 00000000

Stack: (0xc35abdf0 to 0xc35ac000)//stack contents
bde0:                                     bf0009cc bf000af0 bf000b20 c00c8a60
be00: 00000000 347b7ccd 00000002 c353f8c0 c3724a70 c353f8c8 c00c89b0 c304e770
be20: c353f8c0 c00c09f0 c35abec0 00000000 00000002 00000000 c304e770 c00d2214
be40: 00000000 33ff418f 00000000 00000000 00000006 00000041 c36e8858 00000054
be60: 00000000 c00a1408 c35ad540 c35aa000 c35ad490 c35ad600 00000054 00000002
be80: c3724a70 c340fd30 c31df220 c359adb8 00000000 347b7ccd 00000000 00000003
bea0: c35abf70 c0675008 00000001 fffff000 c35aa000 00000000 bec32d54 c00d3efc
bec0: c340fd30 c31df220 1c9e12f6 00000007 c35e0015 0000001c 00000000 c304a198
bee0: c3724a70 00000101 00000002 00000036 00000000 00000000 00000000 c35abf00
bf00: c36e8820 b6f13000 00100877 00000000 b6f13000 00100875 00000000 c35ad7e0
bf20: b6f14000 347b7ccd 00000003 c365d300 c35e0000 00000000 00000000 00000002
bf40: ffffff9c c00e0bf0 00000002 347b7ccd ffffff9c 00000003 c0675008 ffffff9c
bf60: c35e0000 c00c1ef8 00000005 b6f13000 00000002 c00a0000 00000006 00000100
bf80: 00000001 347b7ccd 00008588 00000000 000083d4 00000005 c00091e4 c35aa000
bfa0: 00000000 c0009000 00008588 00000000 00008618 00000002 bec32eac 00000000
bfc0: 00008588 00000000 000083d4 00000005 00000000 00000000 b6f14000 bec32d54
bfe0: 00000000 bec32d38 000084fc b6e7a35c 60000010 00008618 00000000 00000000

When a function runs, it gets its own frame on the stack, which holds the return address, saved registers, local variables and so on.

The analysis in Part II used the PC to establish that the fault occurred in sixth_drv_open. But how did execution reach this function?

00000284 <sixth_drv_open>:
 284:	e92d4010 	push	{r4, lr}
 288:	e5913020 	ldr	r3, [r1, #32]
 28c:	e24dd008 	sub	sp, sp, #8
 290:	e3130b02 	tst	r3, #2048	; 0x800
 294:	e59f012c 	ldr	r0, [pc, #300]	; 3c8 <sixth_drv_open+0x144>
 298:	0a000034 	beq	370 <sixth_drv_open+0xec>
 29c:	ebfffffe 	bl	0 <down_trylock>

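The disassembly in buttons.dis starts with a push instruction, which saves registers on the stack, including the function's return address in lr (lr is register r14 in ARM assembly). It is followed by a sub instruction that reserves another 8 bytes. The stack frame of sixth_drv_open is therefore four 32-bit words: two reserved by sub and two pushed (r4 and lr). Registers are pushed in register order, with lower-numbered registers at lower addresses, so lr is the highest word of the frame. Starting at sp = c35abdf0, the four words in the stack dump are bf0009cc bf000af0 bf000b20 c00c8a60, so lr = c00c8a60. In other words, the function that called sixth_drv_open resumes at c00c8a60.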

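With the return address c00c8a60 known, locate the function it belongs to. Searching kallsyms.dis for nearby addresses shows that c00c8a60 belongs to the kernel. Disassemble the kernel and search the output: the address lies in chrdev_open, which called sixth_drv_open. This gives chrdev_open->sixth_drv_open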

c00c89b0 <chrdev_open>:
c00c89b0:   e92d43f0    push    {r4, r5, r6, r7, r8, r9, lr}
c00c89b4:   e59f715c    ldr r7, [pc, #348]  ; c00c8b18 <chrdev_open+0x168>
c00c89b8:   e24dd00c    sub sp, sp, #12
c00c89bc:   e5973000    ldr r3, [r7]
c00c89c0:   e1a08000    mov r8, r0
c00c89c4:   e1a09001    mov r9, r1
c00c89c8:   e58d3004    str r3, [sp, #4]
c00c89cc:   e5905130    ldr r5, [r0, #304]  ; 0x130

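Reading this code: chrdev_open pushes seven registers and reserves 12 more bytes, ten words in total. Its frame starts right after the four words of sixth_drv_open, at c35abe00, so the saved lr is the tenth word, at c35abe24: lr = c00c09f0. Searching the kernel disassembly for this address leads to do_dentry_open, giving do_dentry_open->chrdev_open->sixth_drv_open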

c00c0814 <do_dentry_open>:
c00c0814:   e92d41f0    push    {r4, r5, r6, r7, r8, lr}
c00c0818:   e1a04000    mov r4, r0
c00c081c:   e1a05001    mov r5, r1
c00c0820:   e2806008    add r6, r0, #8
c00c0824:   e1a00006    mov r0, r6
c00c0828:   e1a07002    mov r7, r2
c00c082c:   eb003492    bl  c00cda7c <path_get>

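Reading this code: do_dentry_open pushes six registers, so its saved lr is six words further up the stack, at c35abe3c: lr = c00d2214. Search the kernel disassembly again, and continue in the same way for the remaining frames.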

The call chain obtained this way matches the backtrace printed at the end of the oops:

[<bf0002dc>] (sixth_drv_open [buttons]) from [<c00c8a60>] (chrdev_open+0xb0/0x170)
[<c00c8a60>] (chrdev_open) from [<c00c09f0>] (do_dentry_open+0x1dc/0x380)
[<c00c09f0>] (do_dentry_open) from [<c00d2214>] (path_openat+0x504/0xf4c)
[<c00d2214>] (path_openat) from [<c00d3efc>] (do_filp_open+0x6c/0xe0)
[<c00d3efc>] (do_filp_open) from [<c00c1ef8>] (do_sys_open+0x128/0x1f4)
[<c00c1ef8>] (do_sys_open) from [<c0009000>] (ret_fast_syscall+0x0/0x50)
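The manual walk over the first three frames can be sketched in C. The stack words are copied from the oops starting at sp = c35abdf0, and the frame sizes are read off the three prologues shown above. The sketch assumes that lr is the highest word of each frame (true when lr is the highest-numbered register in the push list) and that no function adjusts sp again before making its call. The function name walk_frames is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stack words from the oops, starting at sp = 0xc35abdf0. */
static const uint32_t stack_dump[] = {
	0xbf0009ccu, 0xbf000af0u, 0xbf000b20u, 0xc00c8a60u,	/* bdf0 */
	0x00000000u, 0x347b7ccdu, 0x00000002u, 0xc353f8c0u,	/* be00 */
	0xc3724a70u, 0xc353f8c8u, 0xc00c89b0u, 0xc304e770u,	/* be10 */
	0xc353f8c0u, 0xc00c09f0u, 0xc35abec0u, 0x00000000u,	/* be20 */
	0x00000002u, 0x00000000u, 0xc304e770u, 0xc00d2214u,	/* be30 */
};

/*
 * Frame sizes in 32-bit words, read off each prologue:
 *   sixth_drv_open: push {r4, lr}     (2) + sub sp, sp, #8  (2) = 4
 *   chrdev_open:    push {r4-r9, lr}  (7) + sub sp, sp, #12 (3) = 10
 *   do_dentry_open: push {r4-r8, lr}  (6)                       = 6
 */
static const size_t frame_words[] = { 4, 10, 6 };

/*
 * Walks the frames from the top of the stack and stores the saved lr of
 * each one in ret_addrs. Stops early if a frame would run past the dump.
 * Returns the number of return addresses written.
 */
size_t walk_frames(const uint32_t *stack, size_t stack_words,
		   const size_t *frames, size_t nframes, uint32_t *ret_addrs)
{
	size_t pos = 0, found = 0;

	assert(stack != NULL && frames != NULL && ret_addrs != NULL);
	for (size_t i = 0; i < nframes; i++) {
		if (frames[i] == 0 || pos + frames[i] > stack_words)
			break;
		pos += frames[i];
		ret_addrs[found++] = stack[pos - 1];	/* lr is the last word */
	}
	return found;
}
```

Run over the data above, the three return addresses come out as c00c8a60, c00c09f0 and c00d2214, the same values that appear in the first three lines of the backtrace.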

Summary: applying debugging techniques sensibly lets you resolve bugs quickly.
