Linux内核异常调试工具与方法

常见调试通道(工具)

串口调试通道

通过串口shell查看/修改寄存器:

查看一个寄存器空间内容:
  • a.1 启动linux内核,进入shell界面

  • a.2 执行“cd /sys/class/sunxi_dump”

  • a.3 执行“echo xxxx > dump”,xxx为想要查看的地址,虚拟/物理地址都可以

  • a.4 执行“cat dump”,即可看到寄存器的内容

# cd /sys/class/sunxi_dump/
# echo 0x01f01400 >dump
# cat dump
0x00020200
#
查看一段寄存器空间的内容:
# echo 0x01f01400,0x01f01408 >dump
# cat dump
0x01f01400: 0x00020200 0x00000000 0x00000000
#
修改单个寄存器内容:
  • d.1 步骤与修改单个寄存器类似,只将a.3步骤的输入换成一组地址即可
  • d.2 执行“echo xxx1 xxx2,xxx3 xxx4 >write,其中xxx1 xxx3为地址,xxx2- xxx4为对应的内容,组与组之间用逗号分隔
# cd /sys/class/sunxi_dump/
# echo 0x01c20800 0x22222222,0x01c20804 0x22222222 >write
# cat write
reg         to_write    after_write
0x01c20800  0x22222222  0x22222222
0x01c20804  0x22222222  0x22222222
#

查看/修改axp寄存器

查看axp寄存器
  • a.1 执行echo xxx1 > /sys/class/axp/axp_reg 其中xxx1为寄存器地址
  • a.2 cat /sys/class/axp/axp_reg
/ # echo 0x8f > /sys/class/axp/axp_reg
/ # cat /sys/class/axp/axp_reg
REG[0x8f]=0x85
/ #
修改axp寄存器
  • b.1 往axp寄存器0x0f写入值0x55:echo 0x0f55 > /sys/class/axp/axp_reg
  • b.2 读出axp寄存器0x0f的值:echo 0x0f > /sys/class/axp/axp_reg ; cat /sys/class/axp/axp_reg
/ # echo 0x0f55 > /sys/class/axp/axp_reg
/ # echo 0x0f > /sys/class/axp/axp_reg
/ # cat /sys/class/axp/axp_reg
REG[0xf]=0x55
/ #

ADB调试通道

查看当前已经连接的设备:

> adb devices
List of devices attached
0402101560      device
  • 其中 0402101560 为其设备号,这个用于区别多设备相连接时的标识。如果此时需要连
    接一个设备请指定相应的设备号,比如:adb -s 0402101560 shell

通过 adb 进入到 shell 控制界面:

adb -s 0402101560 shell
/ #
  • 如果此时只有一个设备在线连接,可以省略设备号直接使用 adb shell 进入控制台。进入控制台
    后,与平时在 linux 下操作一样。

通过adb将数据推送到小机端

> adb push E:\temp_file\temp.txt /tmp/temp.txt
E:\temp_file\temp.txt: 1 file pushed. 0.0 MB/s (11 bytes in 0.010s)
>
  • 上述命令是把文件temp.txt 推送到小机端。

通过adb将数据从小机端推送到PC端

> adb pull /tmp/temp.txt E:\temp_file
/tmp/temp.txt: 1 file pulled. 0.0 MB/s (11 bytes in 0.010s)
>

adnroid环境super动态分区overlayfs remount方式推库,具体操作如下:

  • 推库的前提是需要unlock设备,如果设备没有Unlock,需要先进行设备解锁,具体步骤如下:
    设置系统开发者选项OEM unlocking打开选项,然后执行:
$ adb reboot bootloader
$ fastboot oem unlock
$ fastboot reboot
  • 等待重启,再次连接上adb
$ adb root
$ adb disable-verity
$ adb reboot
等重启完毕
$ adb root
$ adb remount
$ adb push xxx /system
$ adb push xxx /vendor
$ adb push xxx /produc

Jtag调试通道

  • 当整个操作系统挂死后,由于系统不再运行并且无法响应外界的信号, 则无法通过正常途径进行调
    试连接时,就可以通过 Jtag 进行调试。 Jtag 是一种硬件通过边界扫描进行调试的通道, 通过这个
    通道可以帮助我们查找到一些系统硬件状态, 了解当前系统信息, 从而推导出错的原因. 具体做法如
    下:

DS5 连接

略, 详见《DS-5 使用说明》

cast方式连接

csat初始化
  • cast连接前提,要保证ds-5硬件设备,并与板子正常连接。即灯必须要如下图所示才算正常
    在这里插入图片描述

  • a.打开csat控制台:

点击打开 “开始-> 所有程序->ARM DS-5->DS -5commad prompt”,出一个控制台窗口。

  • b. 在控制台内输入 csat,进入 csat 交互程序:
D:\Program Files\DS-5\bin>csat
###############################################
# CSAT - CoreSight Access Tool v2.0.7 #
# #
# [with Trace Commands v 2.0.1.1] #
# #
# Copyright 2007-2011 ARM Limited #
###############################################
CoreSight Component Data file read successfully.
%>
  • c. 连接相应的 D-Stream 设备
%>con usb
Attempting to connect to ...USB
Connected to:DSTREAM
Base H/W: V2 Rev C-00
TurboTAP Rev: 0.12
DSTREAM Probe V1 Rev B-00
Firmware: 4.11.0, Build 8
%>
  • 这里由于使用的是 USB 方式连接 D-Stream 设备, 而且主机只有一台 D-Stream, 因而使用 con
    usb, 如果使用网线连接 D-Stream, 或者有多台 D-Stream 连接, 则可以使用 reviscan 扫描出
    存在的设备, 然后再使用以下命令进行连接确定的某一台设备, 其中 xxx 为此设备的 IP 地址:
    %>con TCP:xxx.xxx.xxx

  • d. 扫描已经连接的设备

%>chain dev=auto clk=A
Jtag clock set to 50000000
ID:0 ARMCS-DP
%>
  • 从上面可以看到,已经扫描出一个 ID 为 0 的设备,其中 jtag clock 设置为 50000000

  • e. 打开设备

%>dvo 0
Open connection to device ID : 0x4BA00477, version 0x00000006
Msg returned with RVMOpenConn: ARM-DP Template using Rv-Msg.
%>
  • dvo 命令为打开一个设备,需要输入一个设备 ID 号,此 ID 是之前通过扫描获取得到的 ID 号,
    此时已经进入了 jtag 调试通道模式,接下来就可以通过此通道对设备进行读写的访问了,包括一
    些内存的访问。
csat查看一段寄存器空间的值
%>dmr 0 0x01c20800 8
0x01C20800 : 0x00000000 0x00000000 0x00000000 0x00000000
0x01C20810 : 0x00000000 0x55555555 0x00000005 0x00000000
  • dmr 为 dpmemread 命令的缩写,其后面要带一个 ID 号,紧接着是地址,最后为需要读取地址
    空间长度,这里为一个单元为字,如果需要以字节为单元读取,可以使用以下命令:
%>dmr 0 0x01c20800.B 32
0x01C20800 : 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x01C20810 : 0x00 0x00 0x00 0x00 0x55 0x00 0x00 0x00 0x05 0x00 0x00 0x00 0x00 0x00 0x00 0x00
csat修改一个寄存器的值
%>dmw 0 0x01c20800 0x22222222
Wrote 1 words to memory via AP0
%>
  • dmw 为 dpmemwrite 命令的缩写,其后面要带一个 ID 号,紧接着是地址,最后为需要写入的
    数据。
csat填充一段地址空间
%>dmf 0 0x40000000 0xffffffff 256
Filled pattern 0x4294967295, 256 times as words to memory via AP
%>
  • dmf 为 dpmemfill 命令的缩写, 其后面要带一个 ID 号, 紧接着是地址, 最后为需要写入的数据,
    及长度. 上述命令为使用 0xffffffff 数据从 0x40000000 空间开始填充 256 个字,即 1K 的空间
csat导出一段内存数据
%>dfs 0 0x40000000 256 MEMDP.BIN
%>
  • dfs 为 dpfsave 命令的缩写, 其后面要带一个 ID 号, 紧接着就是地址, 与所需要的地址范围长度,
    最后参数为内要保存的内容的文件名, 可以不写, 如空就是默认的文件名. 通常文件保存在程序所
    在安装目录, 如不带路径情况下. 此种方法对于异常死机串口无打印, 无法使用 DS-5 时十分方便,
    常用于导出内核日志 log_buf, 以及显示, 多媒体在内存中的数据.
导入一份二进制文件到内存
%>dfl 0 bin 0x8000 myfile.bin
Load a binary file into 0x8000 on AP 0
%>
  • dfl 为 dpfileload 命令的缩写, 其后面要带一个 ID 号, 紧接着就是文件的格式 bin, 然后是需要加
    载的到内存地址, 最后为需要加载的文件. 通过这个命令可以将这个文件导入到内存空间之中.

::: note
以上提到的地址都为物理地址, 因为 csat 只能识别物理地址, 如果要访问一个虚拟地址, 则需要自
行进行转换.
:::

Linux异常总结与分析

信息收集测试类

  • memtester: 用于内存压力测试, 通常可以用于校验DRAM数据是否正确.

  • lsof: 用于查看当前系统所打开的文件句柄,在分析程序耗尽文件描述符时非常有用.

  • free: 查看当前系统内存使用情况.

  • mpstat: 常用于监测当前系统调试及负载情况, 在调试开关核, 及功耗时十分有用

  • iostat: 监测当前系统IO负载情况.

  • vmstat: 用于分析当前系统虚拟内存使用情况, 经常被使用来分析系统内存泄漏.

  • top: 查看当前系统所有进程运行情况, 比如哪些进程占用率高.

  • tcpdump: 网络抓包工具, 常用于分析网络驱动, 及网络协议问题.

  • netcat: 强大的tcp/udp工具, 可用于构造各式各样的网络包,简单排查网络问题.

  • dd: 简单的文件读写工具, 可用于磁盘的裸读/写,常用于排查与IO相关的系统问题.

  • hexdump: 各种进制转换工具, 常用于二进制转换,同时可用于构造二进制文件.

Linux环境下的工具实在太多, 篇幅原因无法枚举, 实际工作中都是多种工具结合使用, 找到问题根源, 比如网络问题, 可以由ping, route, iptables之类结合使用. 再如内存问题可以由vmstat, free, slabtop组合完成, 在工作中需要及时积累各种工具的特性.

代码分析调试类:

  • objdump: 用查看目标文件或者可执行的目标文件的构成的gcc工具, 通常用于反汇编找到相应源码.

  • addr2line: 地址定位工具, 可用于定位一个指令执行地址所在的文件与行号.

  • readelf: ELF格式文件查看工具, 常用于查看一个应用程序的构成, 或者编译后文件各section代码的构成与偏移.

  • strace: 系统调用与信号跟踪工具, 用于分析一个应用程序的系统调用, 以及应用程序signal响应分析.

  • perf: Linux下系统性能分析功能, 可以用于分析当前系统的性能情况, 如Cache性能, 代码段执行占比等.

debugfs应用介绍

  • debugfs: 调试用的文件系统, 通常内核会预留一些调试接口来给用户使用,
    用户可以通过以下步骤命令使用.

挂载debugfs

a. 查看文件系统是否已经挂载:

/ # mount
/dev/mmcblk0p4 on / type ext4 (rw,noatime,nodiratime,nobarrier,noauto_da_alloc,errors=remount-ro,data=ordered)
proc on /proc type proc (rw,relatime)
devpts on /dev/pts type devpts (rw,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /dev/shm type tmpfs (rw,relatime,mode=777)
tmpfs on /tmp type tmpfs (rw,relatime)
tmpfs on /run type tmpfs (rw,nosuid,nodev,relatime,mode=755)
sysfs on /sys type sysfs (rw,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,relatime,size=479304k,nr_inodes=119826,mode=755)
none on /dev/pts type devpts (rw,relatime,mode=600,ptmxmode=000)
none on /sys/kernel/config type configfs (rw,relatime)
adb on /dev/usb-ffs/adb type functionfs (rw,relatime)
/ #

b. 如果没有挂载, 则可以通过以下的命令对此文件系统进行挂载操作:

/ # mount -t debugfs none /sys/kernel/debug/
/ # ls /sys/kernel/debug
asoc                fault_around_bytes  pm_qos
bdi                 gpio                pwm
block               hid                 regmap
clear_warn_once     iommu               regulator
clk                 ion                 sleep_time
debug_enabled       memblock            suspend_stats
device_component    mmc0                swiotlb
devices_deferred    mmc1                tracing
dma_buf             mmc2                usb
dynamic_debug       opp                 wakeup_sources
enc_test            pinctrl
extfrag             pm_genpd
/ #

使用debugfs查看当前系统预留内存

/ # cat /sys/kernel/debug/memblock/reserved
   0: 0x0000000040080000..0x0000000041d97fff  size:29792Kb
   1: 0x0000000041f00000..0x0000000041f2ffff  size:192Kb
   2: 0x0000000048000000..0x0000000048ffffff  size:16384Kb
   3: 0x000000007da44000..0x000000007e3fffff  size:9968Kb
   4: 0x000000007e57f000..0x000000007f3fffff  size:14852Kb
   5: 0x000000007f456bc0..0x000000007f55efc7  size:1057Kb
   6: 0x000000007f561140..0x000000007f56171f  size:1Kb
   7: 0x000000007f561740..0x000000007f5618c7  size:0Kb
   8: 0x000000007f561900..0x000000007f561c5f  size:0Kb
   9: 0x000000007f561c80..0x000000007f561c8f  size:0Kb
  10: 0x000000007f561cc0..0x000000007f561cc7  size:0Kb
  11: 0x000000007f561d00..0x000000007f561f16  size:0Kb
  12: 0x000000007f561f40..0x000000007f562156  size:0Kb
  13: 0x000000007f562180..0x000000007f562396  size:0Kb
  14: 0x000000007f5623c0..0x000000007f56241f  size:0Kb
  15: 0x000000007f562440..0x000000007f565507  size:12Kb
  16: 0x000000007f565520..0x000000007f56554b  size:0Kb
  17: 0x000000007f565550..0x000000007f56557b  size:0Kb
  18: 0x000000007f565580..0x000000007f5655b4  size:0Kb
  19: 0x000000007f5655b8..0x000000007f5655e3  size:0Kb
  20: 0x000000007f5655e8..0x000000007f565613  size:0Kb
  21: 0x000000007f565618..0x000000007f565643  size:0Kb
  22: 0x000000007f565648..0x000000007f565673  size:0Kb
  23: 0x000000007f565678..0x000000007f5656a3  size:0Kb
  24: 0x000000007f5656a8..0x000000007f5656d3  size:0Kb
  25: 0x000000007f5656d8..0x000000007f565703  size:0Kb
  26: 0x000000007f565708..0x000000007f565733  size:0Kb
  27: 0x000000007f565738..0x000000007f565763  size:0Kb
  28: 0x000000007f565768..0x000000007f565793  size:0Kb
  29: 0x000000007f565798..0x000000007f5657c3  size:0Kb
  30: 0x000000007f5657c8..0x000000007f5657f3  size:0Kb
  31: 0x000000007f5657f8..0x000000007f565823  size:0Kb
  32: 0x000000007f565828..0x000000007f565853  size:0Kb
  33: 0x000000007f565858..0x000000007f565883  size:0Kb
  34: 0x000000007f565888..0x000000007f5658b3  size:0Kb
  35: 0x000000007f5658b8..0x000000007f5658e3  size:0Kb
  36: 0x000000007f5658e8..0x000000007f565913  size:0Kb
  37: 0x000000007f565918..0x000000007f565943  size:0Kb
  38: 0x000000007f565948..0x000000007f565973  size:0Kb
  39: 0x000000007f565978..0x000000007f5659a3  size:0Kb
  40: 0x000000007f5659a8..0x000000007f5659d4  size:0Kb
  41: 0x000000007f5659d8..0x000000007f565a03  size:0Kb
  42: 0x000000007f565a08..0x000000007f565a33  size:0Kb
  43: 0x000000007f565a38..0x000000007f565a63  size:0Kb
  44: 0x000000007f565a68..0x000000007f565a93  size:0Kb
  45: 0x000000007f565a98..0x000000007f565ac6  size:0Kb
  46: 0x000000007f565ac8..0x000000007f565af6  size:0Kb
  47: 0x000000007f565af8..0x000000007f565b26  size:0Kb
  48: 0x000000007f565b28..0x000000007f565b56  size:0Kb
  49: 0x000000007f565b58..0x000000007f565b86  size:0Kb
  50: 0x000000007f565b88..0x000000007f565bb6  size:0Kb
  51: 0x000000007f565bb8..0x000000007f565be6  size:0Kb
  52: 0x000000007f565be8..0x000000007f565c16  size:0Kb
  53: 0x000000007f565c18..0x000000007fffffff  size:10856Kb
/ #

procfs应用介绍

  • procfs: 此文件系统用于内核反映当前系统硬件及正在运行进程的信息,挂载于/proc目录, 有些文件结点是可以进行写操作, 用于改变当前系统的运行状态.可以用于调试.

进程相关目录与文件, 结构如下:

/ # ls /proc/251
autogroup        fd               ns               stack
auxv             fdinfo           oom_adj          stat
cgroup           limits           oom_score        statm
clear_refs       map_files        oom_score_adj    status
cmdline          maps             pagemap          syscall
comm             mem              personality      task
coredump_filter  mountinfo        reclaim          time_in_state
cwd              mounts           root             timerslack_ns
environ          mountstats       smaps            wchan
exe              net              smaps_rollup
/ #

一般进程目录都是数字, 代表着其进程号. 例如上面的/proc/251/,
其中251就是它的进程号, 里面有丰富的文件内容. 以下为简要说明,

  • a.1 cwd: 指向当前进程运行目录的一个符号链接

  • a.2 environ: 当前进程环境变量列表, 通常可以用调试一个进程执行时,环境变量传递是否正确.

  • a.3 exe: 指向启动当前进程的可执行文件的符号链接, 通常可以于拷贝一个当进进程.

  • a.4 fd: 包含当前进程打开的每一个文件描述符(包括网络相关的socket),

  • 可以用于调试进程使用了哪些资源, 在调试dma-buf时十分有用.

  • a.5 maps: 当前进程关联到的每个可执行文件和库文件在内存中所对应的映射区域,以及其访问权限所组成的列表. 一般我们用于查看一个程序使用了哪些库,这些库所占的内存.

  • a.6 status: 当进程的状态信息, 包含信号表, 进程组号, 状态等.

  • a.7 sched: 进程的相关调度信息, 包括iowait, 调度虚拟时间vruntime,执行的开始时间, 迁移次数等等. 我们可以用这些信息来了解一些异常进程的状态,比如有些进程是否调度不及时.

  • a.8 stack: 进程的调用栈, 可以通过cat读取时当前的调用栈信息,对调试进程执行情况十分有用.

查看系统内存情况:

/ # cat /proc/meminfo
MemTotal:         976612 kB
MemFree:          873808 kB
MemAvailable:     883424 kB
Buffers:             812 kB
Cached:             4152 kB
SwapCached:            0 kB
Active:             4168 kB
Inactive:           1844 kB
Active(anon):       1104 kB
Inactive(anon):       24 kB
Active(file):       3064 kB
Inactive(file):     1820 kB
Unevictable:           8 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:          1120 kB
Mapped:             3116 kB
Shmem:                72 kB
KReclaimable:      17976 kB
Slab:              73888 kB
SReclaimable:      17976 kB
SUnreclaim:        55912 kB
KernelStack:        1216 kB
PageTables:          188 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:      488304 kB
Committed_AS:      28820 kB
VmallocTotal:   262930368 kB
VmallocUsed:        1536 kB
VmallocChunk:          0 kB
Percpu:             7968 kB
CmaTotal:           8192 kB
CmaFree:            6972 kB
/ #
  • Memtotal代表系统的总内存情况
  • MemFree则代表当前系统内存的空闲情况
  • Cached代表当前文件系统缓存情况

查看当前系统中断活动情况:

/ # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
  4:     145552     144465     149458     264511     GIC-0  27 Level     arch_timer
  6:          0          0          0          0     GIC-0  83 Level     timer@3009000
  7:          0          0          0          0  wakeupgen 140 Level     arm-pmu
  8:          0          0          0          0  wakeupgen 141 Level     arm-pmu
  9:          0          0          0          0  wakeupgen 142 Level     arm-pmu
 10:          0          0          0          0  wakeupgen 143 Level     arm-pmu
 11:          0          0          0          0  wakeupgen 119 Level     ppu_interrupt
 13:          0          0          0          0  wakeupgen  66 Level     30f0000.iommu
 14:          0          0          0          0  wakeupgen  45 Level     3002000.dma-controller
 15:          0          0          0          0  wakeupgen 108 Level     7000000.rtc
 19:       2289          0          0          0  wakeupgen   0 Level     uart0
 21:        457          0          0          0  wakeupgen  41 Level     mmc2
 22:          0          0          0          0  wakeupgen  39 Level     mmc0
 23:         13          0          0          0  wakeupgen  40 Level     mmc1
 24:          0          0          0          0  wakeupgen   7 Level     twi0
 25:          0          0          0          0  wakeupgen   8 Level     twi1
 26:          0          0          0          0  wakeupgen   9 Level     twi2
 27:          0          0          0          0  wakeupgen  10 Level     twi3
 28:      36654          0          0          0  wakeupgen 113 Level     twi6
 29:          0          0          0          0  wakeupgen 114 Level     twi7
 40:          0          0          0          0  wakeupgen  22 Edge      sunxikbd
 44:          0          0          0          0  wakeupgen  33 Level     ehci_hcd:usb1
 45:          0          0          0          0  wakeupgen  34 Level     ohci_hcd:usb2
 48:     606390          0          0          0  wakeupgen  68 Level     dispaly
 51:          0          0          0          0  wakeupgen  94 Level     cedar_dev
 53:          0          0          0          0  wakeupgen  25 Level     audio jack irq
232:          1          0          0          0  sunxi_pio_edge 134 Edge      4020000.sdmmc cd
386:          1          0          0          0  sunxi-8i-nmi   0 Level     axp803_irq_chip
389:          0          0          0          0  axp803_irq_chip   2 Edge      usb out
390:          0          0          0          0  axp803_irq_chip   3 Edge      usb in
392:          0          0          0          0  axp803_irq_chip   5 Edge      ac out
393:          0          0          0          0  axp803_irq_chip   6 Edge      ac in
397:          0          0          0          0  axp803_irq_chip  10 Edge      charge over
398:          0          0          0          0  axp803_irq_chip  11 Edge      charging
401:          0          0          0          0  axp803_irq_chip  14 Edge      bat out
402:          0          0          0          0  axp803_irq_chip  15 Edge      bat in
404:          0          0          0          0  axp803_irq_chip  17 Edge      bat untemp work
406:          0          0          0          0  axp803_irq_chip  19 Edge      bat ovtemp work
408:          0          0          0          0  axp803_irq_chip  21 Edge      bat untemp chg
410:          0          0          0          0  axp803_irq_chip  23 Edge      bat ovtemp chg
411:          0          0          0          0  axp803_irq_chip  24 Edge      low warning2
412:          0          0          0          0  axp803_irq_chip  25 Edge      low warning1
424:          0          0          0          0  axp803_irq_chip  37 Edge      axp20x-pek-dbf
425:          0          0          0          0  axp803_irq_chip  38 Edge      axp20x-pek-dbr
IPI0:     35254      34501      34249      57347       Rescheduling interrupts
IPI1:         1          1          1          0       Function call interrupts
IPI2:         0          0          0          0       CPU stop interrupts
IPI3:         0          0          0          0       CPU stop (for crash dump) interrupts
IPI4:         0          0          0          0       Timer broadcast interrupts
IPI5:         1          0          0          0       IRQ work interrupts
IPI6:         0          0          0          0       CPU wake-up interrupts
Err:          0
/ #
  • 比如哪个CPU响应, 响应的次数是多少,中断在是什么, 同时也包括了中断类型.

  • 如果针对某个中断需要更详细的了解, 例如想了解94号中断, 那么可以通过以下命令查看:

/ # ls /proc/irq/51
affinity_hint            effective_affinity_list  smp_affinity_list
cedar_dev                node                     spurious
effective_affinity       smp_affinity
/ #
  • 在/proc/irq/目录下, 有当前所有中断号的文件目录结点, 当需要针对特定的某个中断进行了解时, 可以直接访问该目录.其中smp_affinity是该中断对CPU的亲和属性, 即此中断可以被哪个CPU响应, 例如:
/ # echo 1 > smp_affinity

查看当前系统内存页情况:

/ # cat /proc/pagetypeinfo
Page block order: 10
Pages per block:  1024

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone    DMA32, type    Unmovable     23      3      0      0      2      3     10      7      1      0      0
Node    0, zone    DMA32, type      Movable      1      1      0      2      1      1      1      1      5      5    206
Node    0, zone    DMA32, type  Reclaimable      0      0      0      0      1      0      1      1      0      1      0
Node    0, zone    DMA32, type          CMA      1      0      1      1      1      1      0      1      0      1      1
Node    0, zone    DMA32, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable      Movable  Reclaimable          CMA   HighAtomic      Isolate
Node 0, zone    DMA32           20          229            5            2            0            0
/ #
  • 其中, Node代表当前内存所在结点, 由于嵌入式只有一个物理内存结点, 所以都是0, 而后面有内存页的属性分类, 比如可移动, 可回收等等.

当前系统timer运行情况:

/ # cat /proc/timer_list
Timer List Version: v0.8
HRTIMER_MAX_CLOCK_BASES: 8
now at 137224686398 nsecs

cpu: 0
 clock 0:
  .base:       (____ptrval____)
  .index:      0
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 #0: <(____ptrval____)>, tick_sched_timer, S:01
 # expires at 137244000000-137244000000 nsecs [in 19313602 to 19313602 nsecs]
 #1: <(____ptrval____)>, hrtimer_wakeup, S:01
 # expires at 137400195147-137400245147 nsecs [in 175508749 to 175558749 nsecs]
 #2: sched_clock_timer, sched_clock_poll, S:01
 # expires at 4398094511097-4398094511097 nsecs [in 4260869824699 to 4260869824699 nsecs]
 clock 1:
  .base:       (____ptrval____)
  .index:      1
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     5876752124 nsecs
active timers:
 clock 2:
  .base:       (____ptrval____)
  .index:      2
  .resolution: 1 nsecs
  .get_time:   ktime_get_boottime
  .offset:     0 nsecs
active timers:
 clock 3:
  .base:       (____ptrval____)
  .index:      3
  .resolution: 1 nsecs
  .get_time:   ktime_get_clocktai
  .offset:     5876752124 nsecs
active timers:
 clock 4:
  .base:       (____ptrval____)
  .index:      4
  .resolution: 1 nsecs
  .get_time:   ktime_get
  .offset:     0 nsecs
active timers:
 clock 5:
  .base:       (____ptrval____)
  .index:      5
  .resolution: 1 nsecs
  .get_time:   ktime_get_real
  .offset:     5876752124 nsecs
active timers:
 clock 6:
  .base:       (____ptrval____)
  .index:      6
  .resolution: 1 nsecs
  .get_time:   ktime_get_boottime
  .offset:     0 nsecs
active timers:
 clock 7:
  .base:       (____ptrval____)
  .index:      7
  .resolution: 1 nsecs
  .get_time:   ktime_get_clocktai
  .offset:     5876752124 nsecs
active timers:
  .expires_next   : 137244000000 nsecs
  .hres_active    : 1
  .nr_events      : 4985
  .nr_retries     : 2
  .nr_hangs       : 0
  .max_hang_time  : 0
  .nohz_mode      : 2
  .last_tick      : 136996000000 nsecs
  .tick_stopped   : 1
  .idle_jiffies   : 4294926546
  .idle_calls     : 11949
  .idle_sleeps    : 9952
  .idle_entrytime : 137224080856 nsecs
  .idle_waketime  : 137224049023 nsecs
  .idle_exittime  : 136992043314 nsecs
  .idle_sleeptime : 126234109476 nsecs
  .iowait_sleeptime: 6143086 nsecs
  .last_jiffies   : 4294926602
  .next_timer     : 137244000000
  .idle_expires   : 137244000000 nsecs
jiffies: 4294926602
  • 了解timer的运行情况, 对系统调度十分重要, tick是系统调度可以正常运行重要条件, 同时通过对于timer的了解可以知道有哪些周期任务正在运行, 每个timer的详细信息都可以在timer_list里查看到.

立即重启系统:

/ # echo b > /proc/sysrq-trigger
[  398.288437] sysrq: Resetting
[  398.293138] SMP: stopping secondary CPUs

系统内存分配信息:

/ # echo m > /proc/sysrq-trigger
[   63.840358] sysrq: Show Memory
[   63.843904] Mem-Info:
[   63.846503] active_anon:283 inactive_anon:6 isolated_anon:0
[   63.846503]  active_file:733 inactive_file:464 isolated_file:0
[   63.846503]  unevictable:2 dirty:4 writeback:0 unstable:0
[   63.846503]  slab_reclaimable:4418 slab_unreclaimable:13166
[   63.846503]  mapped:742 shmem:18 pagetables:68 bounce:0
[   63.846503]  free:219398 free_pcp:1059 free_cma:1755
[   63.882996] Node 0 active_anon:1056kB inactive_anon:24kB active_file:2932kB inactive_file:1856kB unevictable:8kB isolated(anon):0kB isolated(file):0kB mapped:2968kB dirty:16kB writeback:0kB shmem:72kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[   63.908248] DMA32 free:877284kB min:3932kB low:4912kB high:5892kB active_anon:1056kB inactive_anon:24kB active_file:2932kB inactive_file:1856kB unevictable:8kB writepending:16kB present:1048576kB managed:976612kB mlocked:0kB kernel_stack:1280kB pagetables:164kB bounce:0kB free_pcp:4368kB local_pcp:1348kB free_cma:6840kB
[   63.940084] lowmem_reserve[]: 0 0 0
[   63.944048] DMA32: 70*4kB (UM) 30*8kB (UME) 9*16kB (UME) 12*32kB (UMEC) 6*64kB (UMEC) 3*128kB (MEC) 2*256kB (U) 9*512kB (UMEC) 8*1024kB (UM) 7*2048kB (MEC) 207*4096kB (MC) = 877336kB
[   63.962311] 1217 total pagecache pages
[   63.966529] 0 pages in swap cache
[   63.970280] Swap cache stats: add 0, delete 0, find 0/0
[   63.976174] Free swap  = 0kB
[   63.979428] Total swap = 0kB
[   63.982691] 262144 pages RAM
[   63.985947] 0 pages HighMem/MovableOnly
[   63.990279] 17991 pages reserved
[   63.993922] 2048 pages cma reserved
/ #
  • 更详细的内容, 请参考Documentation/admin-guide/sysrq.rst文件.

查看当前系统内核的符号表

/ # cat /proc/kallsyms |more
ffffffc010080000 t _head
ffffffc010080000 T _text
ffffffc010080800 T do_undefinstr
ffffffc010080800 T _stext
ffffffc010080800 T __exception_text_start
ffffffc010080a14 T do_cp15instr
ffffffc010080afc T do_sysinstr
ffffffc010080b68 T do_mem_abort
ffffffc010080c10 T do_el0_irq_bp_hardening
ffffffc010080c2c T do_el0_ia_bp_hardening
ffffffc010080c8c T do_sp_pc_abort
ffffffc010080d08 T do_debug_exception
ffffffc010080eb8 t sun4i_handle_irq
ffffffc010080eb8 T __exception_text_end
ffffffc010080eb8 T __irqentry_text_start
ffffffc010080f6c t gic_handle_irq
ffffffc010081034 t gic_handle_irq
ffffffc0100811c4 T __irqentry_text_end
ffffffc0100811c8 T __do_softirq
ffffffc0100811c8 T __softirqentry_text_start
ffffffc010081610 T __entry_text_start
ffffffc010081610 T __softirqentry_text_end
ffffffc010081800 T vectors
ffffffc010081fd8 t __bad_stack
ffffffc0100820a4 t el0_sync_invalid
ffffffc010082214 t el0_irq_invalid
--More--
  • 当需要知道一个内核中函数的所对应的地址, 可以通过cat /proc/kallsyms命令查看, 例如, 在老化实验中, 如果一台机器已经挂死, 需要使用csat导出log_buf, 而当前手上只有其他同样固件的机器, 则可以通过下面的命令找对应的地址, 再自行转换成物理地址即可.

常见Linux Kernel异常类型

内核oops/缺页异常分析

Oops一词本意为惊讶, 一般内核出现Oops信息, 表明当前系统遇到重大的问题, 系统当前状态的一些信息就会被内核通oops抛出, 方便定位问题. 通过了解oops信息结构格式, 可以找到异常的问题点, 或者解决的方向.

普通的Oops主要包含信息有:

  • Oops类型

  • 全局页表(pdg)地址

  • 当前CPU编号及寄存器状态

  • 正常函数执行调用栈

  • 中断异常栈

  • 系统当前加载的模块

  • 进程信息

  • 下面写一个简单的内核模块,来验证如何分析一个内核oops错误。

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

static void creat_oops(void)
{
	printk("******start*****\n");
	*(int *)0 = 0;  //人为制造一个空指针
	printk("******end*****\n");
	
}

static int __init regulator_pmc_voltage_init(void)
{
	printk("oops module init\n");
	creat_oops();
	return 0;
}
subsys_initcall(regulator_pmc_voltage_init);

static void __exit regulator_pmc_voltage_exit(void)
{
	printk("regulator_pmc_voltage_exit goodbye\n");
}
module_exit(regulator_pmc_voltage_exit);

MODULE_LICENSE("GPL");
  • 编译,打包,烧写固件后,会出现如下log:
[    1.079767] oops module init
[    1.079834] ******start*****
[    1.079919] Unable to handle kernel write to read-only memory at virtual address 0000000000000000
[    1.088829] Mem abort info:
[    1.091521]   ESR = 0x96000045
[    1.094653]   EC = 0x25: DABT (current EL), IL = 32 bits
[    1.099899]   SET = 0, FnV = 0
[    1.102874]   EA = 0, S1PTW = 0
[    1.106050] Data abort info:
[    1.108854]   ISV = 0, ISS = 0x00000045
[    1.112724]   CM = 0, WnR = 1
[    1.115615] [0000000000000000] user address but active_mm is swapper
[    1.122028] Internal error: Oops: 96000045 [#1] PREEMPT SMP
[    1.127478] Modules linked in:
[    1.130515] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.99-00006-g02e5b77f5cd8-dirty #5
[    1.138656] Hardware name: sun50iw10 (DT)
[    1.142646] pstate: 80400005 (Nzcv daif +PAN -UAO)
[    1.147420] pc : regulator_pmc_voltage_init+0x2c/0x48
[    1.152439] lr : regulator_pmc_voltage_init+0x28/0x48
[    1.157462] sp : ffffffc01002bd90
[    1.160756] x29: ffffffc01002bd90 x28: 0000000000000000
[    1.166043] x27: ffffffc011293000 x26: ffffffc011267860
[    1.171329] x25: 0000000000000000 x24: ffffffc011267888
[    1.176616] x23: ffffffc010a2ee1c x22: ffffffc011266e50
[    1.181903] x21: ffffffc0110e0000 x20: ffffffc010d1b000
[    1.187189] x19: ffffffc011267000 x18: 000000000000000a
[    1.192476] x17: 000000004aeaa4e0 x16: 00000000b1cadfb9
[    1.197763] x15: 00000000000137da x14: ffffffc09002bab7
[    1.203049] x13: ffffffffffffffff x12: 0000000000000030
[    1.208336] x11: 00000000fffffffe x10: ffffffc01002bac5
[    1.213622] x9 : 0000000005f5e0ff x8 : ffffff803f5586e4
[    1.218909] x7 : 0000000000000038 x6 : 0000000000000004
[    1.224196] x5 : 0000000000000000 x4 : 0000000000000001
[    1.229483] x3 : ffffffc010d1b018 x2 : 0411eecd9ba58e00
[    1.234769] x1 : 0000000000000000 x0 : 0000000000000000
[    1.240056] Call trace:
[    1.242486]  regulator_pmc_voltage_init+0x2c/0x48
[    1.247168]  do_one_initcall+0x110/0x2c8
[    1.251067]  kernel_init_freeable+0x158/0x1f8
[    1.255400]  kernel_init+0x18/0x108
[    1.258866]  ret_from_fork+0x10/0x18
[    1.262428] Code: f0fff900 9126e800 97f3ffbc d2800000 (b900001f)
[    1.268509] ---[ end trace 06967c0c0dd91e52 ]---
[    1.273118] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.280712] SMP: stopping secondary CPUs
[    1.284637] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
  • pc指针指向出错指向的地址,另外Call trace展示了出错时程序的调用关系。首先观察出错函数regulator_pmc_voltage_init+0x2c/0x48,其中0x2c表示指令指针在该函数第0x2c字节处,该函数本身共0x48个字节。

当有符号表时可以使用gdb进行分析:

  • 使用gdb即可以看到出错附近的代码也会给出具体出错函数的路径。
$ ./out/gcc-linaro-5.3.1-2016.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gdb ./out/kernel/build/vmlinux
GNU gdb (Linaro_GDB-2016.05) 7.11.1.20160702-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-unknown-linux-gnu --target=aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./out/kernel/build/vmlinux...done.
(gdb) list *regulator_pmc_voltage_init+0x2c
0xffffffc010a2ee48 is in regulator_pmc_voltage_init (/home1/weidonghui/workspace/bsp/longan/kernel/linux-5.4/drivers/regulator/pmc.c:8).
3       #include <linux/kernel.h>
4
5       static void creat_oops(void)
6       {
7               printk("******start*****\n");
8               *(int *)0 = 0;
9               printk("******end*****\n");
10
11      }
12
(gdb)

当无符号表时可以使用decodecode进行分析:

  • 对于没有编译符号表的二进制文件,内核提供了一个非常好用的脚本,可以帮助快速定位问题,该脚本位于linux内核代码目录的scripts/decodecode,首先把出错log保存到一个txt文件中。
$ ARCH=arm64 CROSS_COMPILE=~/longan/out/gcc-linaro-5.3.1-2016.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu- ./scripts/decodecode < ./temp.txt
[ 1.262428] Code: f0fff900 9126e800 97f3ffbc d2800000 (b900001f)
All code
========
   0:   f0fff900        adrp    x0, 0xfffffffffff23000
   4:   9126e800        add     x0, x0, #0x9ba
   8:   97f3ffbc        bl      0xffffffffffcffef8
   c:   d2800000        mov     x0, #0x0                        // #0
  10:*  b900001f        str     wzr, [x0]               <-- trapping instruction

Code starting with the faulting instruction
===========================================
   0:   b900001f        str     wzr, [x0]
$
  • decodecode脚本把出错的oops日志信息转换成直观有用的汇编代码,并且告知出错具体是在哪个汇编语句中,这对于分析没有源代码的oops错误非常有用。

死锁问题

  • 死锁(deadlock)是指两个或多个进程因争夺资源而造成的互相等待的现象。例如进程A需要资源X,进程B需要资源Y,而双方都掌握有对方所需要的资源,且都不释放,这时会导致死锁。在内核开发中,时常要考虑并发设计,即使采用正确的编程思路,也不可避免发生死锁。在Linux内核中,常见的死锁有如下两种:

    • 递归死锁:例如在中断等延迟操作中使用了锁,和外面的锁构成了递归死锁。
    • AB-BA死锁:多个锁因处理不当而引发死锁,多个内核路径上的锁处理顺序不一致也会导致死锁。
  • Linux内核在2006年引入了死锁调试模块Lockdep,经过多年发展,Lockdep为内核开发者和驱动开发者提前发现死锁提供了方便。Lockdep跟踪每个锁的自身状态和各个锁之间的依赖关系,经过一系列的验证规则来确保锁之间依赖关系是正确的。

开启Lockdep

  • 要在Linux内核中使用Lockdep功能,需要打开以下选项:
    • CONFIG_LOCK_STAT=y
    • CONFIG_PROVE_LOCKING=y
    • CONFIG_DEBUG_LOCKDEP=y

AB-BA死锁例子(基于linux-5.4)

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

static DEFINE_SPINLOCK(hack_spinA);
static DEFINE_SPINLOCK(hack_spinB);


void hack_spinAB(void)
{
	printk("\nhack_lockdep:A-->B\n");
	spin_lock(&hack_spinA);
	spin_lock(&hack_spinB);
}


void hack_spinBA(void)
{
	printk("\nhack_lockdep:B-->A\n");
	spin_lock(&hack_spinB);
}


static int __init lockdep_test_init(void)
{
	printk("\nfigo: my lockdep module init\n");
	hack_spinAB();
	hack_spinBA();
	return 0;
}

static void __exit lockdep_test_exit(void)
{
	printk("\ngoodbye\n");
}

MODULE_LICENSE("GPL");
module_init(lockdep_test_init);
module_exit(lockdep_test_exit);
  • 上述死锁例子初始化两个spinlock,其中hack_spinAB()函数分别申请了hack_spinA锁和hack_spinB锁,hack_spinBA()函数要去申请hack_spinB锁。因为刚才锁hack_spinB已经被成功获取了且还没有释放,它会一直等待,而且它也被锁在hack_spinA的临界区里。

  • 重新编译内核后,在proc目录下会有lockdep,lockdep_chains和lockdep_stats上个文件节点,说明lockdep模块已经生效。运行产生如下log:

[    3.010121] sunxi_sid_init()485 - insmod ok
[    3.010191]
[    3.010191] figo: my lockdep module init
[    3.014149]
[    3.014149] hack_lockdep:A-->B
[    3.018919]
[    3.018919] hack_lockdep:B-->A
[    3.023168]
[    3.024541] ============================================
[    3.029827] WARNING: possible recursive locking detected
[    3.035117] 5.4.99-00006-g02e5b77f5cd8-dirty #7 Not tainted
[    3.040661] --------------------------------------------
[    3.045948] swapper/0/1 is trying to acquire lock:
[    3.050715] ffffffc01120e668 (hack_spinB){+.+.}, at: hack_spinBA+0x2c/0x34
[    3.057561]
[    3.057561] but task is already holding lock:
[    3.063367] ffffffc01120e668 (hack_spinB){+.+.}, at: hack_spinAB+0x38/0x44
[    3.070214]
[    3.070214] other info that might help us debug this:
[    3.076714]  Possible unsafe locking scenario:
[    3.076714]
[    3.082607]        CPU0
[    3.085034]        ----
[    3.087460]   lock(hack_spinB);
[    3.090580]   lock(hack_spinB);
[    3.093700]
[    3.093700]  *** DEADLOCK ***
[    3.093700]
[    3.099594]  May be due to missing lock nesting notation
[    3.099594]
[    3.106356] 2 locks held by swapper/0/1:
[    3.110254]  #0: ffffffc01120e620 (hack_spinA){+.+.}, at: hack_spinAB+0x30/0x44
[    3.117534]  #1: ffffffc01120e668 (hack_spinB){+.+.}, at: hack_spinAB+0x38/0x44
[    3.124814]
[    3.124814] stack backtrace:
[    3.129153] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.99-00006-g02e5b77f5cd8-dirty #7
[    3.137294] Hardware name: sun50iw10 (DT)
[    3.141281] Call trace:
[    3.143713]  dump_backtrace+0x0/0x16c
[    3.147350]  show_stack+0x24/0x30
[    3.150645]  dump_stack+0xd0/0x118
[    3.154025]  __lock_acquire+0xe04/0xe9c
[    3.157836]  lock_acquire+0x14c/0x1b8
[    3.161478]  _raw_spin_lock+0x4c/0x88
[    3.165116]  hack_spinBA+0x2c/0x34
[    3.168499]  lockdep_test_init+0x24/0x30
[    3.172398]  do_one_initcall+0x110/0x2c8
[    3.176299]  kernel_init_freeable+0x158/0x1f8
[    3.180629]  kernel_init+0x18/0x108
[    3.184097]  ret_from_fork+0x10/0x18
  • lockdep已经很清晰地显示了死锁发生的路径和发生时的函数的栈信息。结合gdb可以快速定位和解决问题。
$ ./out/gcc-linaro-5.3.1-2016.05-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gdb ./out/kernel/build/vmlinux
GNU gdb (Linaro_GDB-2016.05) 7.11.1.20160702-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-unknown-linux-gnu --target=aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./out/kernel/build/vmlinux...done.
(gdb) list *hack_spinAB+0x30
0xffffffc010402908 is in hack_spinAB (/home1/weidonghui/workspace/bsp/longan/kernel/linux-5.4/include/linux/spinlock.h:338).
333             raw_spin_lock_init(&(_lock)->rlock);            \
334     } while (0)
335
336     static __always_inline void spin_lock(spinlock_t *lock)
337     {
338             raw_spin_lock(&lock->rlock);
339     }
340
341     static __always_inline void spin_lock_bh(spinlock_t *lock)
342     {
(gdb) list *hack_spinAB+0x38
0xffffffc010402910 is in hack_spinAB (/home1/weidonghui/workspace/bsp/longan/kernel/linux-5.4/drivers/regulator/lock_test_1.c:14).
9       void hack_spinAB(void)
10      {
11              printk("\nhack_lockdep:A-->B\n");
12              spin_lock(&hack_spinA);
13              spin_lock(&hack_spinB);
14      }
15
16
17      void hack_spinBA(void)
18      {
(gdb)

递归死锁问题例子(基于linux-4.9)

#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/freezer.h>
#include <linux/mutex.h>
#include <linux/delay.h>


static DEFINE_MUTEX(mutex_a);
static struct delayed_work delay_task;
static void lockdep_timefunc(unsigned long);
static DEFINE_TIMER(lockdep_timer, lockdep_timefunc, 0, 0);

static void lockdep_timefunc(unsigned long dummy)
{
	schedule_delayed_work(&delay_task, 10);
	mod_timer(&lockdep_timer, jiffies + msecs_to_jiffies(100));
}


static void lockdep_test_worker(struct work_struct *work)
{
	mutex_lock(&mutex_a);
	mdelay(300);		//处理一些事情,这里用mdelay代替
	mutex_unlock(&mutex_a);
}


static int lockdep_thread(void *nothing)
{
	set_freezable();
	set_user_nice(current, 0);
	while (!kthread_should_stop()) {
		mdelay(500);	//处理一些事情,这里用mdelay代替
		
		//遇到某些特殊情况,需要取消delay_task
		mutex_lock(&mutex_a);
		cancel_delayed_work_sync(&delay_task);
		mutex_unlock(&mutex_a);
	}
	return 0;
}


static int __init lockdep_test_init(void)
{
	struct task_struct *lock_thread;
	printk("figo: my lockdep module init\n");

	/*创建一个线程来处理某些事情*/
	lock_thread = kthread_run(lockdep_thread, NULL, "lockdep_test");

	/*创建一个delay worker*/
	INIT_DELAYED_WORK(&delay_task, lockdep_test_worker);

	/*创建一个定时器来模拟某些异步事件,比如中断等*/
	lockdep_timer.expires = jiffies + msecs_to_jiffies(500);
	add_timer(&lockdep_timer);
	return 0;
}

static void __exit lockdep_test_exit(void)
{
	printk("goodbye\n");
}

MODULE_LICENSE("GPL");
module_init(lockdep_test_init);
module_exit(lockdep_test_exit);
  • 首先创建一个内核线程lockdep_thread,用于周期性地处理某些事情,创建一个kworker来处理一些类似中断下半部的延迟操作,最后使用一个定时器来模拟异步事件(例如中断)。在lockdep_thread内核线程中,某些特殊情况下常常需要取消kworker。代码中首先申请了一个mutex_a互斥体锁,然后调用cancel_delayed_work_sync()函数取消kworker。另一方面,定时器在定时地调用kworker,并在回调函数lockdep_test_worker()函数中申请mutex_a互斥体锁。log如下:
[    3.229887] ======================================================
[    3.229889] [ INFO: possible circular locking dependency detected ]
[    3.229893] 4.9.191 #2 Not tainted
[    3.229895] -------------------------------------------------------
[    3.229898] kworker/0:1/32 is trying to acquire lock:
[    3.229919]  (mutex_a){+.+...}, at: [<ffffff80083040f4>] lockdep_test_worker+0x30/0x60
[    3.229921]
[    3.229921] but task is already holding lock:
[    3.229933]  ((&(&delay_task)->work)){+.+...}, at: [<ffffff80080b4858>] process_one_work+0x1a4/0x354
[    3.229935]
[    3.229935] which lock already depends on the new lock.
[    3.229935]
[    3.229937]
[    3.229937] the existing dependency chain (in reverse order) is:
[    3.229945]
[    3.229945] -> #1 ((&(&delay_task)->work)){+.+...}:
[    3.229953]        __lock_acquire+0x13a0/0x16f4
[    3.229958]        lock_acquire+0x8c/0xb4
[    3.229962]        flush_work+0x5c/0x224
[    3.229967]        __cancel_work_timer+0x128/0x1a0
[    3.229973]        cancel_delayed_work_sync+0x10/0x18
[    3.229977]        lockdep_thread+0x70/0x90
[    3.229982]        kthread+0xd4/0xdc
[    3.229988]        ret_from_fork+0x10/0x50
[    3.229995]
[    3.229995] -> #0 (mutex_a){+.+...}:
[    3.230000]        print_circular_bug+0x60/0x2b8
[    3.230004]        __lock_acquire+0x1098/0x16f4
[    3.230008]        lock_acquire+0x8c/0xb4
[    3.230016]        mutex_lock_nested+0x5c/0x390
[    3.230020]        lockdep_test_worker+0x30/0x60
[    3.230025]        process_one_work+0x210/0x354
[    3.230030]        worker_thread+0x288/0x3a8
[    3.230034]        kthread+0xd4/0xdc
[    3.230038]        ret_from_fork+0x10/0x50
[    3.230040]
[    3.230040] other info that might help us debug this:
[    3.230040]
[    3.230043]  Possible unsafe locking scenario:
[    3.230043]
[    3.230045]        CPU0                    CPU1
[    3.230046]        ----                    ----
[    3.230051]   lock((&(&delay_task)->work));
[    3.230056]                                lock(mutex_a);
[    3.230060]                                lock((&(&delay_task)->work));
[    3.230064]   lock(mutex_a);
[    3.230066]
[    3.230066]  *** DEADLOCK ***
[    3.230066]
[    3.230069] 2 locks held by kworker/0:1/32:
[    3.230081]  #0:  ("events"){.+.+..}, at: [<ffffff80080b4858>] process_one_work+0x1a4/0x354
[    3.230093]  #1:  ((&(&delay_task)->work)){+.+...}, at: [<ffffff80080b4858>] process_one_work+0x1a4/0x354
[    3.230095]
[    3.230095] stack backtrace:
[    3.230100] CPU: 0 PID: 32 Comm: kworker/0:1 Not tainted 4.9.191 #2
[    3.230103] Hardware name: sun50iw10 (DT)
[    3.230112] Workqueue: events lockdep_test_worker
[    3.230115] Call trace:
[    3.230122] [<ffffff8008089004>] dump_backtrace+0x0/0x240
[    3.230128] [<ffffff8008089258>] show_stack+0x14/0x1c
[    3.230134] [<ffffff80082b257c>] dump_stack+0xb0/0xe8
[    3.230139] [<ffffff80080e2ad0>] print_circular_bug+0x29c/0x2b8
[    3.230144] [<ffffff80080e50dc>] __lock_acquire+0x1098/0x16f4
[    3.230148] [<ffffff80080e5c00>] lock_acquire+0x8c/0xb4
[    3.230154] [<ffffff80085539cc>] mutex_lock_nested+0x5c/0x390
[    3.230159] [<ffffff80083040f4>] lockdep_test_worker+0x30/0x60
[    3.230165] [<ffffff80080b48c4>] process_one_work+0x210/0x354
[    3.230170] [<ffffff80080b5858>] worker_thread+0x288/0x3a8
[    3.230175] [<ffffff80080bae78>] kthread+0xd4/0xdc
[    3.230179] [<ffffff8008083180>] ret_from_fork+0x10/0x50
  • lockdep信息首先提示可能出现递归死锁(INFO: possible circular locking dependency detected),接下来提示“kworker/0:1/32”线程尝试去获取mutex_a互斥体锁,但是该锁已经被其他进程持有,持有该锁的进程是在&delay_task->work里。
  • 接下来的函数调用堆栈显示上述两个尝试去获取mutex_a锁的调用路径。
    (1)内核线程lockdep_thread首先成功获取了mutex_a互斥体锁,然后调用cancel_delayed_work_sync()函数取消kworker。注意cancel_delayed_work_sync()函数中会去调用flush操作和等待所有的kworker回调函数执行完成,然后才会调用mutex_unlock(&mutex_a)释放该锁。
[    3.229945] -> #1 ((&(&delay_task)->work)){+.+...}:
[    3.229953]        __lock_acquire+0x13a0/0x16f4
[    3.229958]        lock_acquire+0x8c/0xb4
[    3.229962]        flush_work+0x5c/0x224
[    3.229967]        __cancel_work_timer+0x128/0x1a0
[    3.229973]        cancel_delayed_work_sync+0x10/0x18
[    3.229977]        lockdep_thread+0x70/0x90
[    3.229982]        kthread+0xd4/0xdc
[    3.229988]        ret_from_fork+0x10/0x50

(2)kworker回调函数lockdep_test_worker()首先会尝试获取mutex_a互斥体锁,注意刚才内核线程lockdep_thread已经获取了mutex_a互斥体锁,并且一直在等待当前kworker回调函数执行完成,所以死锁发生了。

[    3.229995] -> #0 (mutex_a){+.+...}:
[    3.230000]        print_circular_bug+0x60/0x2b8
[    3.230004]        __lock_acquire+0x1098/0x16f4
[    3.230008]        lock_acquire+0x8c/0xb4
[    3.230016]        mutex_lock_nested+0x5c/0x390
[    3.230020]        lockdep_test_worker+0x30/0x60
[    3.230025]        process_one_work+0x210/0x354
[    3.230030]        worker_thread+0x288/0x3a8
[    3.230034]        kthread+0xd4/0xdc
[    3.230038]        ret_from_fork+0x10/0x50
  • 下面画出该死锁场景的CPU调用关系图,一目了然
CPU0CPU1
内核线程lockdep_thread
lock(mutex_a)
cancel_delayed_work_sync()
等待worker执行完成
delay worker回调函数
lock(mutex_a)尝试去获取锁
  • 0
    点赞
  • 15
    收藏
    觉得还不错? 一键收藏
  • 1
    评论
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值