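References
- Linux kernel debugging techniques: detecting deadlocked D-state processes
- Summary of Linux kernel debugging methods: analyzing deadlock problems
- Soft and hard lockups in the Linux kernel
- Linux deadlock debugging: hardlockup
- Linux deadlock debugging: softlockup
- Linux soft lockup analysis
- How softlockup detection (watchdog) works (checking whether system scheduling is healthy)
- Why do soft lockups happen in the Linux kernel?
- A brief analysis of the Linux kernel lockup mechanism
- 朴英敏: A hands-on case of analyzing a Linux kernel deadlock with the crash tool
- [Seven steps to learning embedded Linux, part 5: Linux kernel and driver programming] Analysis of the Linux kernel preemption mechanism
- Understanding the Linux kernel preemption model (the most thorough write-up)
- 宋宝华: Who disabled Linux preemption, and what did disabling preemption turn off?
- Softlockup detector and hardlockup detector (aka nmi_watchdog)
- linux sysctl files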
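I. Hung task
1.1 How it works
The core idea is a kernel monitoring thread, khungtaskd, that wakes up periodically (every CONFIG_DEFAULT_HUNG_TASK_TIMEOUT seconds by default) and scans every task in the D state, comparing each task's context-switch count with the value recorded at the previous scan. If a task has not been scheduled at all between two scans, it has been stuck in the D state the whole time and is very likely deadlocked. The kernel then logs a warning with the task's basic information, a stack backtrace, and the saved register state so that kernel developers can locate the problem.
D state: TASK_UNINTERRUPTIBLE, a waiting state in which the task does not respond to signals and can only be woken explicitly, for example by wake_up().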
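To make the D state concrete, here is a minimal userspace sketch that extracts the state character from one line of /proc/<pid>/stat. The helper names stat_state() and is_d_state() are hypothetical and exist only for this illustration; the only assumption about the file is its documented layout, "pid (comm) state ...".

```c
#include <string.h>

/*
 * Return the state character from one /proc/<pid>/stat line, or '\0' if the
 * line is malformed. The comm field is wrapped in parentheses and may itself
 * contain spaces or ')', so the search starts from the last ')'.
 * (Hypothetical helper, not a kernel or libc API.)
 */
static char stat_state(const char *line)
{
    const char *p = strrchr(line, ')');

    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '\0';
    return p[2];
}

/* 'D' is how procfs reports TASK_UNINTERRUPTIBLE. */
static int is_d_state(const char *line)
{
    return stat_state(line) == 'D';
}
```

Feeding every /proc/<pid>/stat line through is_d_state() gives a quick list of tasks that are candidates for a hung task report.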
Kernel source: kernel/hung_task.c
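The per-task check described above can be sketched in userspace as follows. This is a simplified model, not the code in kernel/hung_task.c: struct task_sample and looks_hung() are hypothetical names, and the real check handles more corner cases (frozen tasks, warning limits, and so on).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few per-task fields involved. */
struct task_sample {
    bool uninterruptible;            /* task is in D state right now */
    unsigned long nvcsw;             /* voluntary context switches so far */
    unsigned long nivcsw;            /* involuntary context switches so far */
    unsigned long last_switch_count; /* total seen at the previous scan */
};

/*
 * One scan of one task. Returns true when the task is in D state and its
 * context-switch total has not moved since the previous scan, i.e. it was
 * never scheduled in between.
 */
static bool looks_hung(struct task_sample *t)
{
    unsigned long switch_count = t->nvcsw + t->nivcsw;

    if (!t->uninterruptible)
        return false;

    if (switch_count != t->last_switch_count) {
        /* The task ran since the last scan: remember the new total. */
        t->last_switch_count = switch_count;
        return false;
    }
    return true;
}
```

Calling looks_hung() on the same task at two consecutive scans flags it only on the second call, and only if no context switch happened in between.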
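1.2 Possible causes
The routine or kernel thread involved is waiting on a synchronization primitive such as a completion, a mutex, or a wait event, and has not been woken within CONFIG_DEFAULT_HUNG_TASK_TIMEOUT seconds.
1. Completions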
/**
 * wait_for_completion: - waits for completion of a task
 * @x: holds the state of this particular completion
 *
 * This waits to be signaled for completion of a specific task. It is NOT
 * interruptible and there is no timeout.
 *
 * See also similar routines (i.e. wait_for_completion_timeout()) with timeout
 * and interrupt capability. Also see complete().
 */
void __sched wait_for_completion(struct completion *x)
{
	wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
}
EXPORT_SYMBOL(wait_for_completion);
2. Mutexes
/**
 * mutex_lock - acquire the mutex
 * @lock: the mutex to be acquired
 *
 * Lock the mutex exclusively for this task. If the mutex is not
 * available right now, it will sleep until it can get it.
 *
 * The mutex must later on be released by the same task that
 * acquired it. Recursive locking is not allowed. The task
 * may not exit without first unlocking the mutex. Also, kernel
 * memory where the mutex resides must not be freed with
 * the mutex still locked. The mutex must first be initialized
 * (or statically defined) before it can be locked. memset()-ing
 * the mutex to 0 is not allowed.
 *
 * (The CONFIG_DEBUG_MUTEXES .config option turns on debugging
 * checks that will enforce the restrictions and will also do
 * deadlock debugging)
 *
 * This function is similar to (but not equivalent to) down().
 */
void __sched mutex_lock(struct mutex *lock)
{
	might_sleep();

	if (!__mutex_trylock_fast(lock))
		__mutex_lock_slowpath(lock);
}
EXPORT_SYMBOL(mutex_lock);

static noinline void __sched
__mutex_lock_slowpath(struct mutex *lock)
{
	__mutex_lock(lock, TASK_UNINTERRUPTIBLE, 0, NULL, _RET_IP_);
}
3. Wait events
#define __wait_event(wq_head, condition)					\
	(void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0,	\
			    schedule())

/**
 * wait_event - sleep until a condition gets true
 * @wq_head: the waitqueue to wait on
 * @condition: a C expression for the event to wait for
 *
 * The process is put to sleep (TASK_UNINTERRUPTIBLE) until the
 * @condition evaluates to true. The @condition is checked each time
 * the waitqueue @wq_head is woken up.
 *
 * wake_up() has to be called after changing any variable that could
 * change the result of the wait condition.
 */
#define wait_event(wq_head, condition)						\
do {										\
	might_sleep();								\
	if (condition)								\
		break;								\
	__wait_event(wq_head, condition);					\
} while (0)
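1.3 Related sysctl settings
root@spc:/proc# sysctl -a | grep hung
kernel.hung_task_check_count = 4194304 // maximum number of tasks khungtaskd checks in one scan
kernel.hung_task_check_interval_secs = 0 // scan interval; 0 means use hung_task_timeout_secs
kernel.hung_task_panic = 0 // whether to panic when a hung task is detected
kernel.hung_task_timeout_secs = 120 // how long a task may stay in the D state without being scheduled before it is reported
kernel.hung_task_warnings = 10 // maximum number of hung task warnings to print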
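How hung_task_timeout_secs and hung_task_check_interval_secs combine into the actual scan interval can be sketched as below. The function name scan_interval_secs() is hypothetical; the logic is a simplified reading of the khungtaskd main loop, which falls back to the timeout when the interval is 0 and never waits longer than the timeout.

```c
/*
 * Effective time between two khungtaskd scans, in seconds.
 * (Hypothetical helper that mirrors the two sysctls above.)
 */
static unsigned long scan_interval_secs(unsigned long timeout_secs,
                                        unsigned long check_interval_secs)
{
    unsigned long interval = check_interval_secs;

    /* 0 means "not set": scan once per timeout period. */
    if (interval == 0)
        interval = timeout_secs;

    /* Never scan less often than the timeout itself. */
    return interval < timeout_secs ? interval : timeout_secs;
}
```

With the values shown above (timeout 120, interval 0) the scan runs every 120 seconds; setting the interval to 30 makes khungtaskd scan four times as often while the reporting threshold stays at 120 seconds.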
II. Soft lockup
A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than 20 seconds (see "Implementation" below for
details), without giving other tasks a chance to run. The current
stack trace is displayed upon detection and, by default, the system
will stay locked up. Alternatively, the kernel can be configured to
panic; a sysctl, "kernel.softlockup_panic", a kernel parameter,
"softlockup_panic" (see "Documentation/admin-guide/kernel-parameters.rst" for
details), and a compile option, "BOOTPARAM_SOFTLOCKUP_PANIC", are
provided for this.
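2.1 How it works
The softlockup detector (watchdog) checks whether scheduling is still working. During a soft lockup the kernel cannot schedule tasks but can still service interrupts, so a typical symptom is a machine that answers ping but cannot be logged in to or operated normally.
The basic mechanism: each CPU gets a kernel thread (watchdog/x) that runs as the highest-priority real-time task. Every time this thread is scheduled it updates a per-CPU timestamp. A per-CPU timer fires periodically and checks that timestamp; if it has not been updated for longer than the threshold (20 s by default), nothing could have been scheduled on that CPU in the meantime (the watchdog thread has the highest priority), so the kernel prints a warning or, depending on configuration, panics. Recent kernels queue the timestamp update as CPU-stopper work from the timer instead of using a dedicated thread; the principle is unchanged.
- A soft lockup is per CPU, not system-wide.
- A soft lockup means that no context switch has happened on the affected CPU for 20 seconds (the default).
Kernel source: kernel/watchdog.c (the detection thread is created with kthread_create())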
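The timestamp check described above can be modeled in a few lines of userspace C. This is a sketch under simplifying assumptions: time is a plain counter in seconds, and struct cpu_watchdog, watchdog_thread_ran() and timer_check() are hypothetical names, not kernel symbols.

```c
/* Hypothetical per-CPU state: when the watchdog thread last ran. */
struct cpu_watchdog {
    unsigned long touch_ts; /* seconds */
};

/* The soft lockup threshold is twice watchdog_thresh. */
static unsigned long softlockup_thresh(unsigned long watchdog_thresh)
{
    return watchdog_thresh * 2;
}

/* Called whenever the watchdog thread gets scheduled on this CPU. */
static void watchdog_thread_ran(struct cpu_watchdog *wd, unsigned long now)
{
    wd->touch_ts = now;
}

/*
 * Called from the periodic timer. Returns the stall duration in seconds
 * when the timestamp is older than the threshold, or 0 when all is well.
 */
static unsigned long timer_check(const struct cpu_watchdog *wd,
                                 unsigned long now,
                                 unsigned long watchdog_thresh)
{
    if (now > wd->touch_ts + softlockup_thresh(watchdog_thresh))
        return now - wd->touch_ts;
    return 0;
}
```

The timer keeps running in interrupt context even when the scheduler is starved, which is why a stale timestamp is a reliable sign that no task switch happened on that CPU.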
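2.2 Possible causes
1. A kernel thread spins in an infinite loop on a kernel built without kernel preemption (CONFIG_PREEMPT not set).
==> Fix: have the thread give up the CPU at suitable points so that other tasks can be scheduled, using kernel APIs such as msleep() or cond_resched().
2. An infinite loop in softirq context.
3. Expired timers pile up on one CPU until the combined run time of their callbacks exceeds 20 seconds.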
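2.3 Related sysctl settings
root@spc:/proc# sysctl -a | grep softlock
kernel.softlockup_all_cpu_backtrace = 0
kernel.softlockup_panic = 0 // whether to panic when a soft lockup is detected
kernel.watchdog_thresh = 10 // watchdog threshold in seconds; a soft lockup is reported when a single CPU core goes 2x this long without a task switch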
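The timings that follow from watchdog_thresh can be worked out with a small helper. The struct and function names are hypothetical; the relations themselves (soft lockup threshold of 2 x watchdog_thresh, hrtimer period of 2 x watchdog_thresh / 5, hard lockup threshold of watchdog_thresh) are the ones given in the kernel's lockup watchdog documentation.

```c
/* Hypothetical container for the values derived from watchdog_thresh. */
struct watchdog_timing {
    unsigned int softlockup_thresh_secs; /* 2 * watchdog_thresh */
    unsigned int hardlockup_thresh_secs; /* watchdog_thresh */
    unsigned int hrtimer_period_ms;      /* 2 * watchdog_thresh / 5 */
};

static struct watchdog_timing derive_timing(unsigned int watchdog_thresh)
{
    struct watchdog_timing t;

    t.softlockup_thresh_secs = 2 * watchdog_thresh;
    t.hardlockup_thresh_secs = watchdog_thresh;
    /* Milliseconds, so small thresholds do not round down to zero. */
    t.hrtimer_period_ms = 2 * watchdog_thresh * 1000 / 5;
    return t;
}
```

For the default watchdog_thresh of 10 this gives a 20 s soft lockup threshold, a 10 s hard lockup threshold, and an hrtimer that fires every 4 s, so the timer gets two or three chances to run before the hard lockup check triggers.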
III. Hard lockup
A 'hardlockup' is defined as a bug that causes the CPU to loop in
kernel mode for more than 10 seconds (see "Implementation" below for
details), without letting other interrupts have a chance to run.
Similarly to the softlockup case, the current stack trace is displayed
upon detection and the system will stay locked up unless the default
behavior is changed, which can be done through a sysctl,
'hardlockup_panic', a compile time knob, "BOOTPARAM_HARDLOCKUP_PANIC",
and a kernel parameter, "nmi_watchdog"
(see "Documentation/admin-guide/kernel-parameters.rst" for details).
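3.1 How it works
The hardlockup detector relies on the fact that an NMI (non-maskable interrupt) cannot be masked, so it still fires even when a CPU is stuck with interrupts disabled. While interrupts are working, an hrtimer fires periodically and increments the counter hrtimer_interrupts. The NMI handler also runs periodically and checks whether that counter has changed since its previous run; if it has not, normal interrupts are no longer being serviced on that CPU.
Kernel source: kernel/watchdog.c -- watchdog_overflow_callback (NMI handler)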
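The counter comparison can be sketched in userspace like this. The field names follow the hrtimer_interrupts counter mentioned above plus a saved copy; struct cpu_hld, hrtimer_tick() and nmi_check() are hypothetical names for this illustration, not the kernel's own functions.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for the hard lockup check. */
struct cpu_hld {
    unsigned long hrtimer_interrupts;       /* bumped by the hrtimer */
    unsigned long hrtimer_interrupts_saved; /* value seen by the last NMI */
};

/* Runs in normal interrupt context, so it stops when IRQs are stuck off. */
static void hrtimer_tick(struct cpu_hld *c)
{
    c->hrtimer_interrupts++;
}

/*
 * Runs from the NMI handler. Returns true when the hrtimer counter has not
 * advanced since the previous NMI, meaning interrupts are no longer served.
 */
static bool nmi_check(struct cpu_hld *c)
{
    if (c->hrtimer_interrupts_saved == c->hrtimer_interrupts)
        return true;

    c->hrtimer_interrupts_saved = c->hrtimer_interrupts;
    return false;
}
```

Two NMIs in a row with no hrtimer tick in between make nmi_check() return true, which is the point at which the kernel would print the hard lockup report.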
3.2 Possible causes
1. A deadlock involving spin_lock_irq()/spin_lock_irqsave().
2. Code that runs for too long with local interrupts disabled.
3.3 Related sysctl settings
root@spc:/proc# sysctl -a | grep hardlock
kernel.hardlockup_all_cpu_backtrace = 0
kernel.hardlockup_panic = 0