Table of Contents
1. Goal: using crash to analyze a spinlock deadlock
4.3. Use ps | grep RU to find runnable threads, then use bt to inspect their backtraces and identify the deadlocked threads
Environment: arm64, Linux version 5.10.66
1. Goal: using crash to analyze a spinlock deadlock
The test program is a classic AB-BA deadlock. After it is built into a .ko and loaded on the device with insmod, writing echo kdump-4 > /proc/dbug/kdump on the serial console starts the test threads. About 20 seconds later the soft lockup detector notices that a CPU has gone 20s without scheduling (the spinlock-deadlock case) and calls panic, which triggers kdump. Once kdump has written the vmcore file, we use crash to analyze the spinlock deadlock.
2. Experiment steps
Prerequisite: the device's kernel supports kdump; see the other articles in this series for how to enable it.
a. Build the test module: compile the source above into a kdump.ko driver module and load it on the device with insmod from the serial console. This creates the /proc/dbug/kdump debug file.
b. Trigger the panic: run echo kdump-4 > /proc/dbug/kdump to start the test program. It panics, which fires the kdump mechanism, boots the capture kernel, and exposes the /proc/vmcore file there.
c. Save the vmcore: run cd /proc;tar -czf /tmp/3588/vmcore.tar.gz ./vmcore to compress the vmcore in the capture kernel, then copy it to a USB drive or an NFS-mounted directory.
d. Analyze with crash: run crash vmlinux vmcore to open the vmcore in crash.
e. The vmcore only retains the text of kdump.ko, so load the module's debug and symbol information in crash with mod; only then will crash> dis -l kdump_proc_write show the correct source line numbers alongside the disassembly:
mod -s kdump /kdump/demo/stack/kdump.ko
3. A brief look at the soft lockup mechanism
The soft lockup mechanism checks whether scheduling is still happening on each CPU; if a CPU goes 20 seconds without a scheduling event, something is considered wrong.
a. Core idea
A kernel thread (watchdog/N) is created for every CPU at the highest real-time priority. Each time the thread gets scheduled it refreshes a per-CPU timestamp, and a timer is armed alongside it. When the timer fires, it checks that timestamp: if it has not been refreshed within the threshold (20s by default), nothing can have been scheduled on that CPU in that window (the watchdog has the highest priority, so it would have run otherwise). The kernel then prints a warning or, depending on configuration, panics.
b. Possible causes of a soft lockup
(1) On a kernel without preemption (CONFIG_PREEMPT not set), a kernel thread enters an infinite loop.
(2) A thread spins on a spinlock for more than 20 seconds.
(3) A thread loops so long that it runs for more than 20 seconds without yielding.
c. proc interface of the soft lockup mechanism
/proc/sys/kernel/soft_watchdog                 /* soft lockup on/off switch */
/proc/sys/kernel/softlockup_all_cpu_backtrace  /* also dump backtraces of all CPUs */
/proc/sys/kernel/softlockup_panic              /* panic when a soft lockup is detected */
4. Analyzing the spinlock deadlock
To resolve a spinlock deadlock we only need to find the threads that hold and wait for the locks. Since a spinlock waiter busy-loops until the lock is released, both the holder and the waiter must be in the runnable state (RU, i.e. TASK_RUNNING) and occupying a CPU. Note that TASK_RUNNING does not necessarily mean the task is executing at this instant; it may simply be sitting on a runqueue. Three ways to find the deadlocked threads follow.
4.1. Find the deadlocked threads among the tasks currently running on each CPU
A spinlock waiter busy-loops checking the lock, so it keeps occupying its CPU. We can therefore use the per-CPU variable runqueues to see what thread each CPU is currently running.
a. Definition of the per-CPU variable runqueues
(1) Definition of struct rq runqueues
X:\work\linux\linux-6.0\kernel\sched\core.c
DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
struct rq {
	/* three runqueues, one for each scheduler class */
	struct cfs_rq cfs;
	struct rt_rq rt;
	struct dl_rq dl;
	/* the task currently running on this cpu */
	struct task_struct __rcu *curr;
}
(2) Assignment of the struct task_struct __rcu *curr member
rq->curr is updated by the scheduler core in __schedule() when it switches to the next task; the helpers below merely test whether a given task is the current one on a runqueue:
static inline int task_current(struct rq *rq, struct task_struct *p)
{
return rq->curr == p;
}
static inline int task_running(struct rq *rq, struct task_struct *p)
{
#ifdef CONFIG_SMP
return p->on_cpu;
#else
return task_current(rq, p);
#endif
}
(3) A short note on the curr member of struct cfs_rq
struct cfs_rq {
	/* cfs ready queue: runnable entities are linked on this red-black tree */
	struct rb_root_cached tasks_timeline;
	/* scheduling entity of the currently running task (task_struct->se) */
	struct sched_entity *curr;
};
/* insert a scheduling entity into the red-black tree rooted at cfs_rq->tasks_timeline */
static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
rb_add_cached(&se->run_node, &cfs_rq->tasks_timeline, __entity_less);
}
b. Use the global per-CPU variable runqueues to work out what each CPU is running
In struct rq, the curr member holds the address of the task currently running on that CPU, i.e. a struct task_struct pointer. The p command with a :N suffix prints runqueue N. In each dump below, the last curr line (the pointer value) is the task_struct address of the thread running on that CPU; the other lines grep matches come from nested members whose names also contain curr.
crash> p runqueues:0 |grep curr
curr = 0x0,
curr = 100,
curr = 0,
curr = 0xffffffc011da24c0 <init_task>,
crash>
crash> p runqueues:1 |grep curr
curr = 0x0,
curr = 100,
curr = 0,
curr = 0xffffff8100856c00,
crash>
crash> p runqueues:2 |grep curr
curr = 0xffffff81011792c0,
curr = 100,
curr = 0,
curr = 0xffffff8101179200,
crash> p runqueues:3 |grep curr
curr = 0xffffff810117a4c0,
curr = 100,
curr = 0,
curr = 0xffffff810117a400,
crash>
crash> p runqueues:4 |grep curr
curr = 0x0,
curr = 100,
curr = 0,
curr = 0xffffff810089a400,
crash> p runqueues:5 |grep curr
curr = 0x0,
curr = 100,
curr = 0,
curr = 0xffffff810089b600,
crash> p runqueues:6 |grep curr
curr = 0x0,
curr = 100,
curr = 0,
curr = 0xffffff810089c800,
crash> p runqueues:7 |grep curr
curr = 0x0,
curr = 100,
curr = 0,
curr = 0xffffff810089da00,
Use the struct command to print the name, pid, and other details of the task running on each CPU.
crash> struct task_struct 0xffffffc011da24c0 |grep -E " comm|pid ="
pid = 0,
thread_pid = 0xffffffc011db0f10 <init_struct_pid>,
comm = "swapper/0\000\000\000\000\000\000",
crash>
crash> struct task_struct 0xffffff8100856c00 |grep -E " comm|pid ="
pid = 0,
thread_pid = 0xffffffc011db0f10 <init_struct_pid>,
comm = "swapper/1\000\000\000\000\000\000",
crash>
crash> struct task_struct 0xffffff8101179200 |grep -E " comm|pid ="
pid = 1004,
thread_pid = 0xffffff8105260b40,
comm = "spin thread1\000\000\000",
crash>
crash> struct task_struct 0xffffff810117a400 |grep -E " comm|pid ="
pid = 1005,
thread_pid = 0xffffff8105260c00,
comm = "spin thread2\000\000\000",
crash>
crash> struct task_struct 0xffffff810089a400 |grep -E " comm|pid ="
pid = 0,
thread_pid = 0xffffffc011db0f10 <init_struct_pid>,
comm = "swapper/4\000\000\000\000\000\000",
crash>
crash> struct task_struct 0xffffff810089b600 |grep -E " comm|pid ="
pid = 0,
thread_pid = 0xffffffc011db0f10 <init_struct_pid>,
comm = "swapper/5\000\000\000\000\000\000",
crash>
crash> struct task_struct 0xffffff810089c800 |grep -E " comm|pid ="
pid = 0,
thread_pid = 0xffffffc011db0f10 <init_struct_pid>,
comm = "swapper/6\000\000\000\000\000\000",
crash>
crash> struct task_struct 0xffffff810089da00 |grep -E " comm|pid ="
pid = 0,
thread_pid = 0xffffffc011db0f10 <init_struct_pid>,
comm = "swapper/7\000\000\000\000\000\000",
crash>
c. Use bt to print the backtrace of each running task and confirm the deadlock
/* the do_raw_spin_lock/_raw_spin_lock frames show spinlock_thread1 occupying its CPU spinning on a lock */
crash> bt 1004
PID: 1004 TASK: ffffff8101179200 CPU: 2 COMMAND: "spin thread1"
#0 [ffffffc01217bb50] __crash_kexec at ffffffc0100d9244
#1 [ffffffc01217bce0] panic at ffffffc010e847f4
#2 [ffffffc01217bdc0] watchdog_timer_fn at ffffffc0100efe24
#3 [ffffffc01217be10] __hrtimer_run_queues at ffffffc0100b97ac
#4 [ffffffc01217be90] hrtimer_run_queues at ffffffc0100ba110
#5 [ffffffc01217bec0] run_local_timers at ffffffc0100b8b14
#6 [ffffffc01217bee0] update_process_times at ffffffc0100b8bb8
#7 [ffffffc01217bf10] tick_nohz_handler at ffffffc0100c9954
#8 [ffffffc01217bf40] arch_timer_handler_phys at ffffffc010af2a50
#9 [ffffffc01217bf50] handle_percpu_devid_irq at ffffffc01009a9c0
#10 [ffffffc01217bf90] generic_handle_irq_desc at ffffffc010093fe4
#11 [ffffffc01217bfa0] __handle_domain_irq at ffffffc010094a28
#12 [ffffffc01217bfd0] gic_handle_irq at ffffffc010000994
--- <IRQ stack> ---
#13 [ffffffc013c0bd90] el1_irq at ffffffc010002548
#14 [ffffffc013c0bdb0] __cmpwait_case_32 at ffffffc01008809c
#15 [ffffffc013c0be00] do_raw_spin_lock at ffffffc010089a04
#16 [ffffffc013c0be30] _raw_spin_lock at ffffffc010eb34e0
#17 [ffffffc013c0be40] spinlock_thread1 at ffffffc008f0b100 [kdump]
#18 [ffffffc013c0be60] kthread at ffffffc010058270
crash>
/* a normally scheduled idle thread, unrelated to the spinlock */
crash> bt 0
PID: 0      TASK: ffffffc011da24c0  CPU: 0   COMMAND: "swapper/0"
bt: WARNING: cannot determine starting stack frame for task ffffffc011da24c0

PID: 0      TASK: ffffff8100856c00  CPU: 1   COMMAND: "swapper/1"
bt: WARNING: cannot determine starting stack frame for task ffffff8100856c00

PID: 0      TASK: ffffff8100898000  CPU: 2   COMMAND: "swapper/2"
#0 [ffffffc01236be80] __switch_to at ffffffc010007e88
#1 [ffffffc01236beb0] __schedule at ffffffc010eafa54
#2 [ffffffc01236bf30] schedule_idle at ffffffc010eaff5c
#3 [ffffffc01236bf50] do_idle at ffffffc01006bea4
#4 [ffffffc01236bfb0] cpu_startup_entry at ffffffc01006c210
#5 [ffffffc01236bfd0] secondary_start_kernel at ffffffc010015934
4.2. Use the soft lockup report to identify the deadlocked threads
The log below supports the following conclusions:
spin thread1 on CPU#3 has occupied its CPU for 22 seconds, and
spin thread2 on CPU#5 has occupied its CPU for 22 seconds, so these two threads are most likely deadlocked with each other.
/mnt/mtd # echo kdump-4 > /proc/dbug/kdump
/mnt/mtd # 23] open embedsky board device!
[ 728.437843] printk kbuf kdump-4
[ 728.437843]
[ 728.437926] spinlock_t lock name spin thread1 time = 1
[ 728.437980] spinlock_t lock name spin thread2 time = 1
[ 754.280911] watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [spin thread1:6737]
[ 754.280919] Modules linked in: kdump(O) r8168(O) r8125(O) i2c_common(O) avs(O) 8811au(O) 8812au(O) netctrl(O) GobiNet(O) driverbox(O) dgs(O) license(O) osa(O) fwcrypto(PO) crypto(O)
[ 754.280940] CPU: 3 PID: 6737 Comm: spin thread1 Kdump: loaded Tainted: P O 5.10.66 #2
[ 754.280942] Hardware name: Rockchip RK3588 NVR DEMO LP4 V10 Board (DT)
[ 754.280946] pstate: 00c00009 (nzcv daif +PAN +UAO -TCO BTYPE=--)
[ 754.280953] pc : __cmpwait_case_32+0x1c/0x24
[ 754.281016] Call trace:
[ 754.281020] __cmpwait_case_32+0x1c/0x24
[ 754.281022] do_raw_spin_lock+0xdc/0xe0
[ 754.281027] _raw_spin_lock+0x1c/0x28
[ 754.281032] spinlock_thread1+0x60/0x88 [kdump]
[ 754.281036] kthread+0xf8/0x108
[ 754.281040] ret_from_fork+0x10/0x18
[ 754.284281] Kernel panic - not syncing: softlockup: hung tasks
[ 754.300914] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [spin thread2:6738]
[ 757.696838] CPU: 3 PID: 6737 Comm: spin thread1 Kdump: loaded Tainted: P O L 5.10.66 #2
[ 757.850248] CPU: 5 PID: 6738 Comm: spin thread2 Kdump: loaded Tainted: P O L 5.10.66 #2
[ 757.859369] Hardware name: Rockchip RK3588 NVR DEMO LP4 V10 Board (DT)
[ 757.865891] pstate: 00c00009 (nzcv daif +PAN +UAO -TCO BTYPE=--)
[ 757.964261] Call trace:
[ 757.966711] __cmpwait_case_32+0x1c/0x24
[ 757.970635] do_raw_spin_lock+0xdc/0xe0
[ 757.974468] _raw_spin_lock+0x1c/0x28
[ 757.978128] spinlock_thread2+0x5c/0x84 [kdump]
[ 757.982657] kthread+0xf8/0x108
[ 757.985793] ret_from_fork+0x10/0x18
4.3. Use ps | grep RU to find runnable threads, then use bt on each to identify the deadlocked ones
This approach is relatively inefficient, so only a brief illustration:
crash> ps |grep RU
> 0 0 0 ffffffc011da24c0 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffffff8100856c00 RU 0.0 0 0 [swapper/1]
0 0 2 ffffff8100898000 RU 0.0 0 0 [swapper/2]
0 0 3 ffffff8100899200 RU 0.0 0 0 [swapper/3]
> 0 0 4 ffffff810089a400 RU 0.0 0 0 [swapper/4]
> 0 0 5 ffffff810089b600 RU 0.0 0 0 [swapper/5]
> 0 0 6 ffffff810089c800 RU 0.0 0 0 [swapper/6]
> 0 0 7 ffffff810089da00 RU 0.0 0 0 [swapper/7]
21 2 2 ffffff81008aec00 RU 0.0 0 0 [migration/2]
22 2 2 ffffff81008f0000 RU 0.0 0 0 [ksoftirqd/2]
26 2 3 ffffff81008f4800 RU 0.0 0 0 [migration/3]
54 2 3 ffffff81009b4800 RU 0.0 0 0 [kworker/3:1]
225 2 2 ffffff810326da00 RU 0.0 0 0 [kworker/2:3]
> 1004 2 2 ffffff8101179200 RU 0.0 0 0 [spin thread1]
> 1005 2 3 ffffff810117a400 RU 0.0 0 0 [spin thread2]
/* spinning in wait for the lock */
crash> bt 1004
PID: 1004 TASK: ffffff8101179200 CPU: 2 COMMAND: "spin thread1"
#0 [ffffffc01217bb50] __crash_kexec at ffffffc0100d9244
#1 [ffffffc01217bce0] panic at ffffffc010e847f4
#2 [ffffffc01217bdc0] watchdog_timer_fn at ffffffc0100efe24
#3 [ffffffc01217be10] __hrtimer_run_queues at ffffffc0100b97ac
#4 [ffffffc01217be90] hrtimer_run_queues at ffffffc0100ba110
#5 [ffffffc01217bec0] run_local_timers at ffffffc0100b8b14
#6 [ffffffc01217bee0] update_process_times at ffffffc0100b8bb8
#7 [ffffffc01217bf10] tick_nohz_handler at ffffffc0100c9954
#8 [ffffffc01217bf40] arch_timer_handler_phys at ffffffc010af2a50
#9 [ffffffc01217bf50] handle_percpu_devid_irq at ffffffc01009a9c0
#10 [ffffffc01217bf90] generic_handle_irq_desc at ffffffc010093fe4
#11 [ffffffc01217bfa0] __handle_domain_irq at ffffffc010094a28
#12 [ffffffc01217bfd0] gic_handle_irq at ffffffc010000994
--- <IRQ stack> ---
#13 [ffffffc013c0bd90] el1_irq at ffffffc010002548
#14 [ffffffc013c0bdb0] __cmpwait_case_32 at ffffffc01008809c
#15 [ffffffc013c0be00] do_raw_spin_lock at ffffffc010089a04
#16 [ffffffc013c0be30] _raw_spin_lock at ffffffc010eb34e0
#17 [ffffffc013c0be40] spinlock_thread1 at ffffffc008f0b100 [kdump]
#18 [ffffffc013c0be60] kthread at ffffffc010058270
/* scheduled normally */
crash> bt 22
PID: 22 TASK: ffffff81008f0000 CPU: 2 COMMAND: "ksoftirqd/2"
#0 [ffffffc01243bd30] __switch_to at ffffffc010007e88
#1 [ffffffc01243bd60] __schedule at ffffffc010eafa54
#2 [ffffffc01243bde0] schedule at ffffffc010eafd20
#3 [ffffffc01243be00] smpboot_thread_fn at ffffffc01005e8f4
#4 [ffffffc01243be60] kthread at ffffffc010058270
crash>