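D-state detection
The core idea is that the kernel runs a monitoring thread (khungtaskd) that periodically checks every process (task) in the D state, i.e. uninterruptible sleep, and reports any task that has stayed blocked without being scheduled for longer than the configured timeout.
Kernel config option: CONFIG_DETECT_HUNG_TASK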
Kernel hacking --->
[*] Detect Hung Tasks
(120) Default timeout for hung task detection (in seconds) (NEW)
[ ] Panic (Reboot) On Hung Tasks (NEW)
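To confirm what a built kernel actually has, you can inspect its .config (for example /boot/config-<version>, if your distribution installs one). The sketch below parses .config text and reports the hung-task settings. It assumes the mainline symbol names CONFIG_DETECT_HUNG_TASK, CONFIG_DEFAULT_HUNG_TASK_TIMEOUT and CONFIG_BOOTPARAM_HUNG_TASK_PANIC (the symbol behind "Panic (Reboot) On Hung Tasks"). The helper names are hypothetical.

```python
import re

def parse_kconfig(text):
    """Parse .config text into {symbol: value}; '# CONFIG_X is not set' maps to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            options[m.group(1)] = "n"
            continue
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            options[m.group(1)] = m.group(2)
    return options

def hung_task_settings(text):
    """Summarize hung-task options (assumed mainline symbol names)."""
    opts = parse_kconfig(text)
    timeout = opts.get("CONFIG_DEFAULT_HUNG_TASK_TIMEOUT")
    return {
        "enabled": opts.get("CONFIG_DETECT_HUNG_TASK") == "y",
        "timeout": int(timeout) if timeout is not None else None,
        "panic": opts.get("CONFIG_BOOTPARAM_HUNG_TASK_PANIC") == "y",
    }

sample = """CONFIG_DETECT_HUNG_TASK=y
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=120
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
"""
print(hung_task_settings(sample))
# {'enabled': True, 'timeout': 120, 'panic': False}
```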
When a process stays in the D state for more than 120 seconds, the kernel prints:
INFO: task sync:16015 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sync D c0512378 0 16015 1807 0x00000000
[] (__schedule+0x1d0/0x414) from [] (io_schedule+0x64/0x8c)
[] (io_schedule+0x64/0x8c) from [] (sleep_on_page+0x8/0x10)
[] (sleep_on_page+0x8/0x10) from [] (__wait_on_bit+0x78/0xb0)
[] (__wait_on_bit+0x78/0xb0) from [] (wait_on_page_bit+0xb4/0xbc)
[] (wait_on_page_bit+0xb4/0xbc) from [] (filemap_fdatawait_range+0xd4/0x130)
[] (filemap_fdatawait_range+0xd4/0x130) from [] (filemap_fdatawait+0x38/0x40)
[] (filemap_fdatawait+0x38/0x40) from [] (sync_inodes_sb+0x108/0x13c)
[] (sync_inodes_sb+0x108/0x13c) from [] (iterate_supers+0xa4/0xec)
[] (iterate_supers+0xa4/0xec) from [] (sys_sync+0x34/0x9c)
[] (sys_sync+0x34/0x9c) from [] (ret_fast_syscall+0x0/0x30)
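If you collect dmesg or console logs, reports like the one above can be picked out automatically. This is a minimal sketch that assumes the "INFO: task <comm>:<pid> blocked for more than <N> seconds." wording shown above; other kernel versions may phrase it differently. The function name is hypothetical.

```python
import re

# Matches the first line of a hung-task report, e.g.
# "INFO: task sync:16015 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)

def find_hung_tasks(log_text):
    """Return (comm, pid, seconds) tuples for every hung-task report in log_text."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(log_text)
    ]

print(find_hung_tasks("INFO: task sync:16015 blocked for more than 120 seconds."))
# [('sync', 16015, 120)]
```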
To disable this message: echo 0 > /proc/sys/kernel/hung_task_timeout_secs
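The same sysctl can be read or changed from a script. This sketch is the equivalent of the echo command above; writing the real file normally requires root, and the path parameter is there only so the functions can be pointed at a test file. The function names are hypothetical.

```python
from pathlib import Path

HUNG_TASK_TIMEOUT = Path("/proc/sys/kernel/hung_task_timeout_secs")

def get_hung_task_timeout(path=HUNG_TASK_TIMEOUT):
    """Return the current hung-task timeout in seconds (0 means disabled)."""
    return int(Path(path).read_text().strip())

def set_hung_task_timeout(seconds, path=HUNG_TASK_TIMEOUT):
    """Like `echo <seconds> > path`; 0 disables the hung-task warning."""
    if seconds < 0:
        raise ValueError("timeout must be >= 0")
    Path(path).write_text(f"{seconds}\n")
```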
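D-state tasks can also be found manually: use top or ps to check process states, then run cat /proc/<pid>/status and look for State: D (disk sleep). To view the task's kernel stack, run cat /proc/<pid>/stack.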
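The manual check above can be scripted. This sketch scans /proc/<pid>/status for State: D and reads /proc/<pid>/stack for a given PID. It only looks at top-level PIDs, not individual threads under /proc/<pid>/task/, and reading the stack usually requires root. The proc_root parameter and function names are hypothetical, added so the code can be tested against a fake tree.

```python
import os
from pathlib import Path

def task_state(status_text):
    """Return the state letter from the 'State:' line of a status file, e.g. 'D'."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            return line.split()[1]
    return None

def d_state_pids(proc_root="/proc"):
    """List PIDs whose status file reports State: D (disk sleep)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            text = Path(proc_root, entry, "status").read_text()
        except OSError:
            continue  # the process exited (or is unreadable) while scanning
        if task_state(text) == "D":
            pids.append(int(entry))
    return sorted(pids)

def kernel_stack(pid, proc_root="/proc"):
    """Return the contents of /proc/<pid>/stack (usually requires root)."""
    return Path(proc_root, str(pid), "stack").read_text()
```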
R-state detection
Kernel hacking --->
-*- Kernel debugging
[*] Detect Hard and Soft Lockups
[ ] Panic (Reboot) On Soft Lockups
CONFIG_LOCKUP_DETECTOR=y
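At run time the lockup detector is tuned through /proc/sys/kernel/watchdog_thresh (default 10 seconds, 0 disables lockup detection). In mainline kernels the soft-lockup threshold is twice watchdog_thresh, and the hard-lockup (NMI watchdog) threshold is roughly watchdog_thresh itself. This sketch, with hypothetical function names, computes both from the sysctl; treat the factor of two as an assumption to verify against your kernel version.

```python
from pathlib import Path

WATCHDOG_THRESH = Path("/proc/sys/kernel/watchdog_thresh")

def lockup_thresholds(watchdog_thresh):
    """Return (soft_lockup_secs, hard_lockup_secs), or None if detection is disabled.

    Assumes mainline behaviour: soft lockup = 2 * watchdog_thresh,
    hard lockup ~= watchdog_thresh.
    """
    if watchdog_thresh <= 0:
        return None  # watchdog_thresh = 0 disables lockup detection
    return 2 * watchdog_thresh, watchdog_thresh

def current_lockup_thresholds(path=WATCHDOG_THRESH):
    """Read watchdog_thresh from procfs and derive both thresholds."""
    return lockup_thresholds(int(Path(path).read_text().strip()))

print(lockup_thresholds(10))
# (20, 10)
```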
I have not yet been able to reproduce a case where a task gets stuck in the R state.
Further options
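CONFIG_DEBUG_SPINLOCK=y: detects spinlock misuse such as using an uninitialized spinlock. Combined with the NMI watchdog, it can also expose spinlock deadlocks.
CONFIG_DEBUG_MUTEXES=y: detects and reports mutex usage errors.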