Previous article: ARM Linux System Stability Analysis, from Introduction to Advanced, Part 4 - (RT-Thread) Stack Types
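1.1 How Hung Task Detection Works
The kernel creates a monitoring thread, khungtaskd, that runs in a loop (once every CONFIG_DEFAULT_HUNG_TASK_TIMEOUT seconds by default) and examines every task in the D state. For each task it compares the current context-switch count with the count recorded at the previous check. If a task was not scheduled at all between two consecutive checks, it has been in the D state the whole time and is quite possibly deadlocked. The kernel then prints a warning containing the task's basic information, a stack backtrace, and saved register state so that kernel developers can locate the problem. If hung_task_panic is set (through /proc or a kernel boot parameter), the kernel calls panic() directly.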
D state: the TASK_UNINTERRUPTIBLE wait state. A task in this state does not receive signals and can only be woken by an explicit wake_up.
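From user space, a task's state is visible as a single letter in /proc/<pid>/stat, with D standing for uninterruptible sleep. The following is a minimal sketch of picking that letter out of one stat line. The helper names task_state_from_stat and is_uninterruptible are hypothetical and are not part of any kernel or libc API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: extract the state character from one line in
 * /proc/<pid>/stat format, "pid (comm) state ...". The comm field may
 * itself contain spaces or parentheses, so search for the LAST ')'.
 * Returns the state character, or '\0' if the line is malformed.
 */
static char task_state_from_stat(const char *line)
{
    const char *close;

    assert(line != NULL);
    close = strrchr(line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}

/* Hypothetical helper: true if the stat line reports state 'D'. */
static int is_uninterruptible(const char *line)
{
    return task_state_from_stat(line) == 'D';
}
```

Calling is_uninterruptible() on each /proc/<pid>/stat line gives a rough user-space view of which tasks khungtaskd might be looking at. It says nothing about how long a task has been in that state.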
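Possible causes
A kernel thread waits on a synchronization primitive such as a completion, a mutex, or wait_event and is not woken within CONFIG_DEFAULT_HUNG_TASK_TIMEOUT seconds. For example, taking mutexes in a nested fashion can leave a kernel process or thread in the D state for a long time with nothing able to wake it.
Notes:
- The blocking must happen in kernel context; code running purely in user mode cannot trigger the detector. A user-space task that enters the kernel and then blocks there in the D state does trigger it.
- By default a hung task does not cause a panic or bring the system down. The kernel only prints the stack of the hung task, because it cannot tell whether the long wait is intentional or a fault, so it does not treat the condition as fatal.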
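To see how a running system is configured, the detector's tunables can be read from procfs. The sketch below assumes the usual /proc/sys/kernel/hung_task_* files are present, which requires a kernel built with hung task detection. The helper names read_tunable and print_hung_task_tunables are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one integer tunable from a procfs file.
 * Returns 0 and stores the value in *out on success, -1 if the file
 * cannot be opened or does not start with an integer. *out is left
 * untouched when the file cannot be opened.
 */
static int read_tunable(const char *path, long *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Print the tunables named in this article; returns how many were readable. */
static int print_hung_task_tunables(void)
{
    static const char *const paths[] = {
        "/proc/sys/kernel/hung_task_timeout_secs",
        "/proc/sys/kernel/hung_task_panic",
        "/proc/sys/kernel/hung_task_check_count",
        "/proc/sys/kernel/hung_task_warnings",
    };
    int readable = 0;
    size_t i;

    for (i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
        long value;

        if (read_tunable(paths[i], &value) == 0) {
            printf("%s = %ld\n", paths[i], value);
            readable++;
        } else {
            printf("%s: not available\n", paths[i]);
        }
    }
    return readable;
}
```

A hung_task_panic value of 0 corresponds to the default behaviour described above, where the kernel only logs the hung task.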
1.2 Code Analysis
linux/kernel/hung_task.c
static int __init hung_task_init(void)
{
    atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
    /* disable hung task detector on suspend */
    pm_notifier(hungtask_pm_notify, 0);
    watchdog_task = kthread_run(watchdog, NULL, "khungtaskd");
    return 0;
}
subsys_initcall(hung_task_init);
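- Register a panic notifier so that the detector can react when a panic occurs;
- Register a PM (suspend) notifier so that no checks are performed while the system is suspended;
- Start the kernel thread khungtaskd, whose thread function is watchdog, to detect tasks that have stayed in the D state longer than the timeout (120 s by default).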
1.2.1 Kernel thread function: watchdog
/*
 * kthread which checks for tasks stuck in D state
 */
static int watchdog(void *dummy)
{
    unsigned long hung_last_checked = jiffies;
    set_user_nice(current, 0);                                  --->(1)
    for ( ; ; ) {                                               --->(2)
        unsigned long timeout = sysctl_hung_task_timeout_secs;
        long t = hung_timeout_jiffies(hung_last_checked, timeout);
        if (t <= 0) {                                           --->(3)
            if (!atomic_xchg(&reset_hung_task, 0) &&            --->(4)
                !hung_detector_suspended)
                check_hung_uninterruptible_tasks(timeout);      --->(5)
            hung_last_checked = jiffies;
            continue;
        }
        schedule_timeout_interruptible(t);                      --->(6)
    }
    return 0;
}
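(1) Set the nice value of the current khungtaskd kernel thread to 0 (normal priority) so that it does not interfere with regular workloads;
(2) Loop forever, checking periodically;
(3) Test whether the timeout (120 s by default) has elapsed since the last check. t is measured in jiffies, not seconds: t > 0 means khungtaskd still needs to sleep for t jiffies, while t <= 0 means the full timeout has already passed since the last check;
(4) Atomically write 0 to reset_hung_task and return the old value. If the old value was non-zero, or the detector is suspended, this round of checking is skipped;
(5) Check the tasks in the D state;
(6) Put the khungtaskd kernel thread into the TASK_INTERRUPTIBLE state and wake it up after t jiffies.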
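The sign test in step (3) is easier to follow with the arithmetic written out. The listing does not show the body of hung_timeout_jiffies, so the following user-space model is an assumption about what it computes: the ticks remaining until last_checked plus the timeout, with a timeout of 0 treated as "sleep indefinitely". The names model_timeout_jiffies and MODEL_MAX_TIMEOUT are hypothetical, and the tick rate and current tick count are passed in as parameters instead of coming from HZ and jiffies.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for an "infinite" sleep length. */
#define MODEL_MAX_TIMEOUT LONG_MAX

/*
 * User-space model (an assumption, not the kernel source) of the
 * remaining-sleep computation. All tick values are in jiffies;
 * hz is the number of jiffies per second.
 */
static long model_timeout_jiffies(unsigned long last_checked,
                                  unsigned long now,
                                  unsigned long timeout_secs,
                                  unsigned long hz)
{
    assert(hz > 0);
    if (timeout_secs == 0)      /* assumed: 0 means the check is disabled */
        return MODEL_MAX_TIMEOUT;
    /* Unsigned subtraction stays correct if the tick counter wraps. */
    return (long)(last_checked - now + timeout_secs * hz);
}
```

With hz = 100 and a 120 s timeout, the result starts at 12000 right after a check, falls as now advances, and reaches 0 or below once 120 s have passed, which is the condition under which the loop runs the next check.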
/*
 * Check whether a TASK_UNINTERRUPTIBLE does not get woken up for
 * a really long time (120 seconds). If that happens, print out
 * a warning.
 */
static void check_hung_uninterruptible_tasks(unsigned long timeout)
{
    int max_count = sysctl_hung_task_check_count;               --->(1)
    unsigned long last_break = jiffies;
    struct task_struct *g, *t;
    /*
     * If the system crashed already then all bets are off,
     * do not report extra hung tasks:
     */
    if (test_taint(TAINT_DIE) || did_panic)                     --->(2)
        return;
    hung_task_show_lock = false;
    rcu_read_lock();
    for_each_process_thread(g, t) {
        if (!max_count--)
            goto unlock;
        if (time_after(jiffies, last_break + HUNG_TASK_LOCK_BREAK)) {
            if (!rcu_lock_break(g, t))
                goto unlock;
            last_break = jiffies;
        }
        /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
        if (t->state == TASK_UNINTERRUPTIBLE)
            check_hung_task(t, timeout);                        --->(3)
    }
unlock:
    rcu_read_unlock();
    if (hung_task_show_lock)
        debug_show_all_locks();
    if (hung_task_call_panic) {
        trigger_all_cpu_backtrace();
        panic("hung_task: blocked tasks");
    }
}
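(1) The maximum number of tasks examined per pass. The default is the upper limit on PIDs: int __read_mostly sysctl_hung_task_check_count = PID_MAX_LIMIT;
(2) If the system has already crashed, do not report any more hung tasks;
(3) Check whether this D-state task is actually hung.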
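Two details of the scan loop are easy to miss: the max_count budget limits how many tasks are visited, and the state test uses "==" so that TASK_KILLABLE tasks are skipped. The user-space model below illustrates both. The M_TASK_* bit values are illustrative assumptions (the real definitions live in the kernel headers and vary between versions), and count_checked_tasks is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative state bits, assumed for this sketch only. */
#define M_TASK_RUNNING          0x0000
#define M_TASK_INTERRUPTIBLE    0x0001
#define M_TASK_UNINTERRUPTIBLE  0x0002
#define M_TASK_WAKEKILL         0x0100
#define M_TASK_KILLABLE         (M_TASK_WAKEKILL | M_TASK_UNINTERRUPTIBLE)

/*
 * Model of the scan loop: visit at most max_count tasks and count how
 * many would be handed on to the per-task hung check.
 */
static size_t count_checked_tasks(const long *states, size_t n, int max_count)
{
    size_t i, checked = 0;

    assert(states != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!max_count--)       /* scan budget exhausted */
            break;
        /*
         * "==" rather than "&": a killable task also carries the
         * uninterruptible bit, but it must not be checked.
         */
        if (states[i] == M_TASK_UNINTERRUPTIBLE)
            checked++;
    }
    return checked;
}
```

If the comparison used a bitwise AND instead, the killable task in the model would be counted as well, which is exactly what the comment in the kernel listing warns against.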
static void check_hung_task(struct task_struct *t, unsigned long timeout)
{
    /* total context switches so far: voluntary + involuntary */
    unsigned long switch_count = t->nvcsw + t->nivcsw;
    /*
     * Ensure the task is not frozen.
     * Also, skip vfork and any other user process that freezer should skip.
     */
    if (unlikely(t->flags & (PF_FROZEN | PF_FREEZER_SKIP)))
        return;
    /*
     * When a freshly created task is scheduled once, changes its state to
     * TASK_UNINTERRUPTIBLE without having ever been switched out once, it
     * mustn't be checked.
     */
    if (unlikely(!switch_count))
        return;
    /* the task was scheduled since the last check: record and move on */
    if (switch_count != t->last_switch_count) {
        t->last_switch_count = switch_count;
        return;
    }
    trace_sched_process_hang(t);
    if (sysctl_hung_task_panic) {
        console_verbose();
        hung_task_show_lock = true;
        hung_task_call_panic = true;
    }
    /*
     * Ok, the task did not get scheduled for more than 2 minutes,
     * complain:
     */
    if (sysctl_hung_task_warnings) {
        if (sysctl_hung_task_warnings > 0)
            sysctl_hung_task_warnings--;
        pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
               t->comm, t->pid, timeout);
        pr_err(" %s %s %.*s\n",
               print_tainted(), init_utsname()->release,
               (int)strcspn(init_utsname()->version, " "),
               init_utsname()->version);
        pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
               " disables this message.\n");
        sched_show_task(t);
        hung_task_show_lock = true;
    }
    touch_nmi_watchdog();
}
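The core of check_hung_task is the comparison of the context-switch count against the value stored at the previous check, followed by the warnings budget. The following user-space model is a minimal sketch of that logic only; struct model_task, enum verdict, model_check and consume_warning are hypothetical names, and the freezer checks, tracing and printing are left out.

```c
#include <assert.h>
#include <stddef.h>

struct model_task {
    unsigned long nvcsw;             /* voluntary context switches */
    unsigned long nivcsw;            /* involuntary context switches */
    unsigned long last_switch_count; /* total seen at the previous check */
};

enum verdict { SKIP_NEVER_RAN, MADE_PROGRESS, HUNG };

/* Model of the switch-count test applied to one D-state task. */
static enum verdict model_check(struct model_task *t)
{
    unsigned long switch_count;

    assert(t != NULL);
    switch_count = t->nvcsw + t->nivcsw;
    if (switch_count == 0)          /* never switched out: not checked */
        return SKIP_NEVER_RAN;
    if (switch_count != t->last_switch_count) {
        t->last_switch_count = switch_count;    /* remember for next pass */
        return MADE_PROGRESS;
    }
    return HUNG;                    /* no switch since the previous check */
}

/*
 * Model of the warnings budget: 0 suppresses reports, a positive value
 * is decremented per report, a negative value is never decremented.
 * Returns 1 if a report should be printed.
 */
static int consume_warning(int *warnings)
{
    assert(warnings != NULL);
    if (*warnings == 0)
        return 0;
    if (*warnings > 0)
        (*warnings)--;
    return 1;
}
```

One consequence visible in the model: the first check after a task blocks only records its switch count, so a task is reported no earlier than the second consecutive check that finds it unchanged.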
TODO
Recommended reading:
https://www.cnblogs.com/wuchanming/p/4907562.html
https://blog.csdn.net/weixin_28949049/article/details/116691319