Linux Runtime Error Analysis: Working from the Oops Message Printed by the Kernel

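First, a quick plug: my WeChat official account is now live. Search for 张笑生的地盘. It mainly covers embedded software development, regular investing in stocks and funds, football, and so on. Please follow it; if you have a question, leave me a message there and I will do my best to answer.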
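This series records my notes from studying Wei Dongshan's (韦东山) embedded development course, organized into fairly detailed class notes for fellow students to refer to.
If you have not yet bought Mr. Wei's tutorial videos, or do not know where to buy them, here is a mini-program link: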
![Mini-program link](https://img-blog.csdnimg.cn/20191022081505705.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6y9ibG9nLmNzZG4ubmV0L215c2VsZnpoYW5namk=,size_16,color_FFFFFF,t_70)
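I. Why
Once a driver is written it has to be debugged, and errors are unavoidable during debugging. How do we find the cause of such an error? This article takes a case where a bug crashes the kernel and uses the printed Oops (kernel fault) message to track down the faulty line in our driver.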
II. How
a. Background: after one of our drivers ran, the kernel printed the following Oops:

Unable to handle kernel paging request at virtual address 56000050
pgd = c3acc000
[56000050] *pgd=00000000
Internal error: Oops: 5 [#1] ARM
Modules linked in: first_drv(O)
CPU: 0    Tainted: G           O  (3.4.2 #13)
PC is at first_drv_open+0x18/0x40 [first_drv]
LR is at first_drv_open+0xc/0x40 [first_drv]
pc : [<bf000018>]    lr : [<bf00000c>]    psr: 60000013
sp : c3809df8  ip : bf00043c  fp : 00000000
r10: c34093b8  r9 : 00000002  r8 : c380def0
r7 : c3842a20  r6 : c3abde88  r5 : c3a244a0  r4 : 00000000
r3 : c03dc018  r2 : 56000050  r1 : c03f43e8  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: c000717f  Table: 33acc000  DAC: 00000015
Process my_test (pid: 906, stack limit = 0xc3808270)
Stack: (0xc3809df8 to 0xc380a000)
9de0:                                                       00000000 c00901d4
9e00: c3409f38 00000000 c3809ef8 c3abde88 c3842a20 c3650f18 c0090150 c008ba68
9e20: c3809ef8 c3842a20 00000000 c381c8a0 c389f000 00000002 c34093b8 c008bc04
9e40: c381c8a0 00000000 c3abde88 c3abde88 c3809ef8 00000000 c3809ebc c0099b54
9e60: c389f005 c00775f4 000000b7 00000026 c34093b8 c3abde88 c03e16f8 c3809ef8
9e80: ffffff9c c3809f78 c3808000 c389f000 c3808000 c3809ebc be82dd74 c0099fcc
9ea0: c3809ec4 c0077d3c c3acedb8 00000028 c3a6bde0 c3808000 b6eea0dc c380def0
9ec0: c3650f18 00000000 b6eea0dc 00000001 c3809f78 c3809ef8 ffffff9c c389f000
9ee0: c3808000 c389f000 be82dd74 c009a378 00000041 c3809fb0 c380def0 c3650f18
9f00: 002b0d71 00000003 c389f005 c380d450 c3401318 c3abde88 00000101 00000000
9f20: 00000000 00000000 c3842ca0 0000001c 00000000 000b6f0d 00000003 c381dd40
9f40: c381dd48 c381dd44 00000000 00000002 00000000 00000000 00000001 00000002
9f60: ffffff9c 00000000 00000003 c008b684 be82da5c c007e2c0 00000002 c0070000
9f80: 00000026 00000100 b6f10000 00008468 00000000 00008328 00000005 c0015448
9fa0: 00000000 c00152a0 00008468 00000000 000084ec 00000002 be82decc 00000001
9fc0: 00008468 00000000 00008328 00000005 00000000 00000000 b6f10000 be82dd74
9fe0: 00000000 be82dd58 00008404 b6e7635c 60000010 000084ec 00000000 00000000
[<bf000018>] (first_drv_open+0x18/0x40 [first_drv]) from [<c00901d4>] (chrdev_open+0x84/0x134)
[<c00901d4>] (chrdev_open+0x84/0x134) from [<c008ba68>] (__dentry_open+0x1fc/0x2d0)
[<c008ba68>] (__dentry_open+0x1fc/0x2d0) from [<c008bc04>] (nameidata_to_filp+0x60/0x68)
[<c008bc04>] (nameidata_to_filp+0x60/0x68) from [<c0099b54>] (do_last+0x310/0x6cc)
[<c0099b54>] (do_last+0x310/0x6cc) from [<c0099fcc>] (path_openat+0xbc/0x380)
[<c0099fcc>] (path_openat+0xbc/0x380) from [<c009a378>] (do_filp_open+0x30/0x84)
[<c009a378>] (do_filp_open+0x30/0x84) from [<c008b684>] (do_sys_open+0xf4/0x1b0)
[<c008b684>] (do_sys_open+0xf4/0x1b0) from [<c00152a0>] (ret_fast_syscall+0x0/0x2c)
Code: eb4b2d89 e59fc028 e3a00000 e59c2000 (e5923000)
---[ end trace 5e576730776b9c9e ]---

b. Analyzing the Oops message above
(1) The faulting access

Unable to handle kernel paging request at virtual address 56000050
The kernel faulted while trying to access virtual address 0x56000050.
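
The full log also contains the line `Internal error: Oops: 5`, and `[56000050] *pgd=00000000` shows that the first-level page table entry for this address is empty. As a rough sketch, and assuming the classic ARMv4/v5 two-level page table fault encoding (where the low four bits of the reported value select the fault type), the number could be decoded as below. The helper name `fault_status_name` is hypothetical and only a few fault types are listed.

```c
/* Hypothetical helper: map the low four bits of the ARM fault status value
 * (the number printed after "Oops:") to a short description.
 * Assumption: classic ARMv4/v5 two-level page tables. Only translation,
 * domain and permission faults are listed; anything else is "other". */
const char *fault_status_name(unsigned int fsr)
{
    switch (fsr & 0xfu) {
    case 0x5: return "section translation fault";
    case 0x7: return "page translation fault";
    case 0x9: return "section domain fault";
    case 0xb: return "page domain fault";
    case 0xd: return "section permission fault";
    case 0xf: return "page permission fault";
    default:  return "other";
    }
}
```

Under that assumption, `fault_status_name(5)` reports a section translation fault, which fits an address that has no mapping at all.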

(2) Address of the faulting instruction

pgd = c3acc000
[56000050] *pgd=00000000
Internal error: Oops: 5 [#1] ARM
Modules linked in: first_drv(O)
CPU: 0    Tainted: G           O  (3.4.2 #13)
PC is at first_drv_open+0x18/0x40 [first_drv]
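PC is the address of the instruction that faulted: 0x18 is that instruction's offset within the function, and 0x40 is the size of the function.
In many cases, though, the kernel prints only a raw PC address and does not say which function it falls in.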
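
To make the `symbol+offset/size` notation concrete, here is a minimal sketch that splits such a string into its three parts. The function name `parse_symbol_ref` is made up for illustration, and the code assumes the text has exactly the form shown in the log.

```c
#include <stdio.h>

/* Split a symbol reference such as "first_drv_open+0x18/0x40 [first_drv]"
 * into name, offset and size. Returns 1 on success, 0 if the text does not
 * have that form. name must have room for at least 64 bytes. */
int parse_symbol_ref(const char *text, char *name,
                     unsigned long *offset, unsigned long *size)
{
    /* %63[^+] reads everything up to the '+';
     * %lx accepts hexadecimal with an optional 0x prefix. */
    return sscanf(text, "%63[^+]+%lx/%lx", name, offset, size) == 3;
}
```

For the PC line in this log the result is the name `first_drv_open`, offset 0x18 and size 0x40. Checking that the offset is smaller than the size is a quick sanity test that the PC really lies inside the named function.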

(3) Value of the LR register at the time of the fault

LR is at first_drv_open+0xc/0x40 [first_drv]
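This is the value of LR (the link register), which holds the return address of the most recent call. Here it is offset 0xc in first_drv_open, the instruction right after the bl to printk at offset 0x8 in the disassembly shown later.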

(4) Register values at the moment the instruction faulted

pc : [<bf000018>]    lr : [<bf00000c>]    psr: 60000013
sp : c3809df8  ip : bf00043c  fp : 00000000
r10: c34093b8  r9 : 00000002  r8 : c380def0
r7 : c3842a20  r6 : c3abde88  r5 : c3a244a0  r4 : 00000000
r3 : c03dc018  r2 : 56000050  r1 : c03f43e8  r0 : 00000000
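These are the register values captured when the faulting instruction executed. Note that r2 holds 56000050, the same address reported in the first line of the Oops.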

(5) The current process when the fault occurred

Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: c000717f  Table: 33acc000  DAC: 00000015
Process my_test (pid: 906, stack limit = 0xc3808270)
The process running when the fault occurred was my_test (pid 906).

(6) Stack contents at the time of the fault

Stack dump:
Stack: (0xc3809df8 to 0xc380a000)
9de0:                                                       00000000 c00901d4
9e00: c3409f38 00000000 c3809ef8 c3abde88 c3842a20 c3650f18 c0090150 c008ba68
9e20: c3809ef8 c3842a20 00000000 c381c8a0 c389f000 00000002 c34093b8 c008bc04
9e40: c381c8a0 00000000 c3abde88 c3abde88 c3809ef8 00000000 c3809ebc c0099b54
9e60: c389f005 c00775f4 000000b7 00000026 c34093b8 c3abde88 c03e16f8 c3809ef8
9e80: ffffff9c c3809f78 c3808000 c389f000 c3808000 c3809ebc be82dd74 c0099fcc
9ea0: c3809ec4 c0077d3c c3acedb8 00000028 c3a6bde0 c3808000 b6eea0dc c380def0
9ec0: c3650f18 00000000 b6eea0dc 00000001 c3809f78 c3809ef8 ffffff9c c389f000
9ee0: c3808000 c389f000 be82dd74 c009a378 00000041 c3809fb0 c380def0 c3650f18
9f00: 002b0d71 00000003 c389f005 c380d450 c3401318 c3abde88 00000101 00000000
9f20: 00000000 00000000 c3842ca0 0000001c 00000000 000b6f0d 00000003 c381dd40
9f40: c381dd48 c381dd44 00000000 00000002 00000000 00000000 00000001 00000002
9f60: ffffff9c 00000000 00000003 c008b684 be82da5c c007e2c0 00000002 c0070000
9f80: 00000026 00000100 b6f10000 00008468 00000000 00008328 00000005 c0015448
9fa0: 00000000 c00152a0 00008468 00000000 000084ec 00000002 be82decc 00000001
9fc0: 00008468 00000000 00008328 00000005 00000000 00000000 b6f10000 be82dd74
9fe0: 00000000 be82dd58 00008404 b6e7635c 60000010 000084ec 00000000 00000000

(7) Backtrace at the time of the fault, that is, the chain of function calls; read it from bottom to top to follow the calls in order

Backtrace: read it in reverse (bottom to top). To have this backtrace printed, the corresponding option must be selected in menuconfig.
[<bf000018>] (first_drv_open+0x18/0x40 [first_drv]) from [<c00901d4>] (chrdev_open+0x84/0x134)
[<c00901d4>] (chrdev_open+0x84/0x134) from [<c008ba68>] (__dentry_open+0x1fc/0x2d0)
[<c008ba68>] (__dentry_open+0x1fc/0x2d0) from [<c008bc04>] (nameidata_to_filp+0x60/0x68)
[<c008bc04>] (nameidata_to_filp+0x60/0x68) from [<c0099b54>] (do_last+0x310/0x6cc)
[<c0099b54>] (do_last+0x310/0x6cc) from [<c0099fcc>] (path_openat+0xbc/0x380)
[<c0099fcc>] (path_openat+0xbc/0x380) from [<c009a378>] (do_filp_open+0x30/0x84)
[<c009a378>] (do_filp_open+0x30/0x84) from [<c008b684>] (do_sys_open+0xf4/0x1b0)
[<c008b684>] (do_sys_open+0xf4/0x1b0) from [<c00152a0>] (ret_fast_syscall+0x0/0x2c)
Code: eb4b2d89 e59fc028 e3a00000 e59c2000 (e5923000)
---[ end trace 5e576730776b9c9e ]---

do_sys_open ---> do_filp_open ---> path_openat ---> do_last ---> nameidata_to_filp ---> __dentry_open ---> chrdev_open ---> first_drv_open
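
Each backtrace line has the form "callee from caller", which is why the call chain above is obtained by reading the lines bottom to top. The following sketch pulls the two function names out of one line. The name `parse_backtrace_line` is hypothetical, and the code assumes the exact line format printed in this log.

```c
#include <stdio.h>

/* Extract the callee and caller names from one backtrace line of the form
 *   [<addr>] (callee+0xOFF/0xSIZE ...) from [<addr>] (caller+0xOFF/0xSIZE)
 * Returns 1 on success, 0 otherwise. Both buffers need at least 64 bytes. */
int parse_backtrace_line(const char *line, char *callee, char *caller)
{
    /* %*x skips an address, %*[^)] skips the rest of the first parenthesis. */
    return sscanf(line, "[<%*x>] (%63[^+]%*[^)]) from [<%*x>] (%63[^+]",
                  callee, caller) == 2;
}
```

Applied to the first backtrace line, the callee is `first_drv_open` and the caller is `chrdev_open`, matching the last arrow of the chain.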

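Note that the backtrace is not always printed; it depends on enabling a frame-related option in .config (on many kernels this is likely CONFIG_FRAME_POINTER).
c. How to analyze it
(1) Use the PC value to decide whether the instruction belongs to the kernel or to a separately loaded module
The log shows PC = 0xbf000018, and the crash happened while executing the instruction there. What does this address belong to: the kernel itself, or a driver loaded with insmod?
1.1 First check whether it is a kernel address. Look at System.map to find the address range of the kernel's functions: c0104000 ~ c05d11e8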

vi System.map    // System.map is under /work/system/linux-3.4.2
Press Shift+G to jump to the last line of the file
Press gg (lowercase) to jump to the first line of the file

Clearly 0xbf000018 is outside this range, so it belongs to a driver loaded with insmod.
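
The check itself is a plain range comparison. A minimal sketch, where the function name `in_kernel_text` is made up and the bounds are whatever you read from the first and last lines of your own System.map:

```c
/* Return 1 if pc lies inside [text_start, text_end], otherwise 0. */
int in_kernel_text(unsigned long pc,
                   unsigned long text_start, unsigned long text_end)
{
    return pc >= text_start && pc <= text_end;
}
```

With the range quoted above, `in_kernel_text(0xbf000018UL, 0xc0104000UL, 0xc05d11e8UL)` is 0, so the address has to be looked up among the loaded modules instead.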
(2) Given that it is an insmod-loaded driver, how do we determine which one?
After the crash, first look at the function address ranges of the loaded drivers:

# cat /proc/kallsyms      (lists the addresses of kernel functions and of loaded module functions; run this on the board)
or
# cat /proc/kallsyms > kallsylog.txt

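In this output, find the closest symbol address that is ≤ 0xbf000018. A symbol's address is where the function starts, so the function containing PC is the one with the largest start address that does not exceed PC.
For example, we find: bf000000 t first_drv_open [first_drv]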
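
The lookup just described can be sketched as a linear scan over the kallsyms text. This is only an illustration: the function name `nearest_symbol` is hypothetical, the input is assumed to be well-formed "address type name" lines, and symbol names are assumed to be shorter than 64 characters.

```c
#include <stdio.h>
#include <string.h>

/* Scan kallsyms-style text ("address type name [module]" per line) and find
 * the symbol with the highest address that is still <= pc.
 * On success, copies the name into best_name (at least 64 bytes), stores the
 * symbol address in *best_addr and returns 1; returns 0 if nothing matches. */
int nearest_symbol(const char *kallsyms, unsigned long pc,
                   char *best_name, unsigned long *best_addr)
{
    int found = 0;
    const char *line = kallsyms;

    while (line && *line) {
        unsigned long addr;
        char type;
        char name[64];

        if (sscanf(line, "%lx %c %63s", &addr, &type, name) == 3 &&
            addr <= pc && (!found || addr > *best_addr)) {
            *best_addr = addr;
            strcpy(best_name, name);
            found = 1;
        }
        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line)
            line++;
    }
    return found;
}
```

Subtracting the returned symbol address from PC gives the offset inside the function; for this Oops that is 0xbf000018 - 0xbf000000 = 0x18, the same offset the kernel printed.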
(3) Having found first_drv.ko, how do we determine which statement in the function faulted?
Disassemble it on the PC: arm-linux-objdump -D first_drv.ko > first_drv.dis, then find first_drv_open in the .dis file

    In the .ko file                  After insmod
    00000000 <first_drv_open>:       bf000000 t first_drv_open [first_drv]
    00000018                         pc = 0xbf000018

So the fault occurred while executing the instruction at offset 0x18 in first_drv_open. Which instruction is that? Open first_drv.dis:

00000000 <first_drv_open>:
   0:	e92d4010 	push	{r4, lr}
   4:	e59f002c 	ldr	r0, [pc, #44]	; 38 <first_drv_open+0x38>
   8:	ebfffffe 	bl	0 <printk>
   c:	e59fc028 	ldr	ip, [pc, #40]	; 3c <first_drv_open+0x3c>
  10:	e3a00000 	mov	r0, #0	; 0x0
  14:	e59c2000 	ldr	r2, [ip]
  18:	e5923000 	ldr	r3, [r2]	; <-- the fault occurs here
  1c:	e3c33c3f 	bic	r3, r3, #16128	; 0x3f00
  20:	e5823000 	str	r3, [r2]
  24:	e59c1000 	ldr	r1, [ip]
  28:	e5913000 	ldr	r3, [r1]
  2c:	e3833c15 	orr	r3, r3, #5376	; 0x1500
  30:	e5813000 	str	r3, [r1]
  34:	e8bd8010 	pop	{r4, pc}

So what is the value of r2 at this point? From the register values captured when the instruction faulted (shown earlier), r2 = 56000050:

pc : [<bf000018>]    lr : [<bf00000c>]    psr: 60000013
sp : c3809df8  ip : bf00043c  fp : 00000000
r10: c34093b8  r9 : 00000002  r8 : c380def0
r7 : c3842a20  r6 : c3abde88  r5 : c3a244a0  r4 : 00000000
r3 : c03dc018  r2 : 56000050  r1 : c03f43e8  r0 : 00000000
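
The `Code:` line of the Oops marks the faulting instruction word in parentheses, here e5923000, which matches the word at offset 0x18 in the disassembly. As a cross-check, the sketch below decodes the register fields of an ARM load/store word with an immediate offset. The helper name `decode_ldr_str_imm` is hypothetical, and only this one instruction form is handled.

```c
/* Decode the fields of an ARM "single data transfer" (LDR/STR) instruction
 * with an immediate offset. Returns 1 and fills the outputs if the word has
 * that form, 0 otherwise. Hypothetical helper for illustration only. */
int decode_ldr_str_imm(unsigned int insn, int *is_load,
                       unsigned int *rn, unsigned int *rd,
                       unsigned int *imm12)
{
    /* Bits 27:25 must be 010 for the immediate-offset form. */
    if (((insn >> 25) & 0x7u) != 0x2u)
        return 0;
    *is_load = (int)((insn >> 20) & 1u); /* L bit: 1 = LDR, 0 = STR */
    *rn      = (insn >> 16) & 0xfu;      /* base register number */
    *rd      = (insn >> 12) & 0xfu;      /* destination/source register number */
    *imm12   = insn & 0xfffu;            /* offset magnitude */
    return 1;
}
```

For 0xe5923000 this yields a load with base register r2, destination r3 and offset 0, that is `ldr r3, [r2]`, so the address being read is exactly the value of r2.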

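Clearly the fault was caused by reading address 0x56000050. Checking the source code, we find that this physical register address is assigned directly to a pointer and then dereferenced without ever being mapped: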

gpfcon = (volatile unsigned long *)0x56000050;

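It should be changed so that the physical address is first mapped into the kernel's virtual address space with ioremap: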

gpfcon = (volatile unsigned long *)ioremap(0x56000050, 16);