lockdep是内核提供协助发现死锁问题的feature
以如下ABBA类型circular locking为例
分析一下打印信息
1[ 13.007000] WARNING: possible circular locking dependency detected
2[ 13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
3[ 13.008124] ------------------------------------------------------
4[ 13.008624] test_progs/246 is trying to acquire lock:
5[ 13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
6[ 13.009770]
7[ 13.009770] but task is already holding lock:
8[ 13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
9[ 13.010877]
10[ 13.010877] which lock already depends on the new lock.
11[ 13.010877]
12[ 13.011532]
13[ 13.011532] the existing dependency chain (in reverse order) is:
14[ 13.012129]
15[ 13.012129] -> #4 (bpf_event_mutex){+.+.}:
16[ 13.012582] perf_event_query_prog_array+0x9b/0x130
17[ 13.013016] _perf_ioctl+0x3aa/0x830
18[ 13.013354] perf_ioctl+0x2e/0x50
19[ 13.013668] do_vfs_ioctl+0x8f/0x6a0
20[ 13.014003] ksys_ioctl+0x70/0x80
21[ 13.014320] __x64_sys_ioctl+0x16/0x20
22[ 13.014668] do_syscall_64+0x4a/0x180
23[ 13.015007] entry_SYSCALL_64_after_hwframe+0x49/0xbe
24[ 13.015469]
25[ 13.015469] -> #3 (&cpuctx_mutex){+.+.}:
26[ 13.015910] perf_event_init_cpu+0x5a/0x90
27[ 13.016291] perf_event_init+0x1b2/0x1de
28[ 13.016654] start_kernel+0x2b8/0x42a
29[ 13.016995] secondary_startup_64+0xa4/0xb0
30[ 13.017382]
31[ 13.017382] -> #2 (pmus_lock){+.+.}:
32[ 13.017794] perf_event_init_cpu+0x21/0x90
33[ 13.018172] cpuhp_invoke_callback+0xb3/0x960
34[ 13.018573] _cpu_up+0xa7/0x140
35[ 13.018871] do_cpu_up+0xa4/0xc0
36[ 13.019178] smp_init+0xcd/0xd2
37[ 13.019483] kernel_init_freeable+0x123/0x24f
38[ 13.019878] kernel_init+0xa/0x110
39[ 13.020201] ret_from_fork+0x24/0x30
40[ 13.020541]
41[ 13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
42[ 13.021051] static_key_slow_inc+0xe/0x20
43[ 13.021424] tracepoint_probe_register_prio+0x28c/0x300
44[ 13.021891] perf_trace_event_init+0x11f/0x250
45[ 13.022297] perf_trace_init+0x6b/0xa0
46[ 13.022644] perf_tp_event_init+0x25/0x40
47[ 13.023011] perf_try_init_event+0x6b/0x90
48[ 13.023386] perf_event_alloc+0x9a8/0xc40
49[ 13.023754] __do_sys_perf_event_open+0x1dd/0xd30
50[ 13.024173] do_syscall_64+0x4a/0x180
51[ 13.024519] entry_SYSCALL_64_after_hwframe+0x49/0xbe
52[ 13.024968]
53[ 13.024968] -> #0 (tracepoints_mutex){+.+.}:
54[ 13.025434] __mutex_lock+0x86/0x970
55[ 13.025764] tracepoint_probe_register_prio+0x2d/0x300
56[ 13.026215] bpf_probe_register+0x40/0x60
57[ 13.026584] bpf_raw_tracepoint_open.isra.34+0xa4/0x130
58[ 13.027042] __do_sys_bpf+0x94f/0x1a90
59[ 13.027389] do_syscall_64+0x4a/0x180
60[ 13.027727] entry_SYSCALL_64_after_hwframe+0x49/0xbe
61[ 13.028171]
62[ 13.028171] other info that might help us debug this:
63[ 13.028171]
64[ 13.028807] Chain exists of:
65[ 13.028807] tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
66[ 13.028807]
67[ 13.029666] Possible unsafe locking scenario:
68[ 13.029666]
69[ 13.030140] CPU0 CPU1
70[ 13.030510] ---- ----
71[ 13.030875] lock(bpf_event_mutex);
72[ 13.031166] lock(&cpuctx_mutex);
73[ 13.031645] lock(bpf_event_mutex);
74[ 13.032135] lock(tracepoints_mutex);
75[ 13.032441]
76[ 13.032441] *** DEADLOCK ***
77[ 13.032441]
78[ 13.032911] 1 lock held by test_progs/246:
79[ 13.033239] #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
80[ 13.033909]
81[ 13.033909] stack backtrace:
82[ 13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
83[ 13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
84[ 13.035657] Call Trace:
85[ 13.035859] dump_stack+0x5f/0x8b
86[ 13.036130] print_circular_bug.isra.37+0x1ce/0x1db
87[ 13.036526] __lock_acquire+0x1158/0x1350
88[ 13.036852] ? lock_acquire+0x98/0x190
89[ 13.037154] lock_acquire+0x98/0x190
90[ 13.037447] ? tracepoint_probe_register_prio+0x2d/0x300
91[ 13.037876] __mutex_lock+0x86/0x970
92[ 13.038167] ? tracepoint_probe_register_prio+0x2d/0x300
93[ 13.038600] ? tracepoint_probe_register_prio+0x2d/0x300
94[ 13.039028] ? __mutex_lock+0x86/0x970
95[ 13.039337] ? __mutex_lock+0x24a/0x970
96[ 13.039649] ? bpf_probe_register+0x1d/0x60
97[ 13.039992] ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
98[ 13.040478] ? tracepoint_probe_register_prio+0x2d/0x300
99[ 13.040906] tracepoint_probe_register_prio+0x2d/0x300
100[ 13.041325] bpf_probe_register+0x40/0x60
101[ 13.041649] bpf_raw_tracepoint_open.isra.34+0xa4/0x130
102[ 13.042068] ? __might_fault+0x3e/0x90
103[ 13.042374] __do_sys_bpf+0x94f/0x1a90
104[ 13.042678] do_syscall_64+0x4a/0x180
105[ 13.042975] entry_SYSCALL_64_after_hwframe+0x49/0xbe
106[ 13.043382] RIP: 0033:0x7f23b10a07f9
107[ 13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
108[ 13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
109[ 13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
110[ 13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
111[ 13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
112[ 13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000
113
第5行到第8行的意思是当前线程正要对tracepoints_mutex进行上锁,但是发现bpf_event_mutex已经被本线程锁了。
第10行的意思是bpf_event_mutex曾经依赖过tracepoints_mutex
{+.+.}不容易理解,在linux内核的lockdep-design.txt有介绍如下:
'.' acquired while irqs disabled and not in irq context
'-' acquired in irq context
'+' acquired with irqs enabled
'?' acquired in irq context with irqs enabled.
再来看64行和65行,打印所在代码如下:
1kernel/locking/lockdep.c
2
3if (parent != source) {
4 printk("Chain exists of:\n ");
5 __print_lock_name(source);
6 printk(KERN_CONT " --> ");
7 __print_lock_name(parent);
8 printk(KERN_CONT " --> ");
9 __print_lock_name(target);
10 printk(KERN_CONT "\n\n");
11}
source的意思是当前线程正要上的锁
parent是target曾经的dependency chain里面上一个拿着的锁
target是在当前线程中锁住了但是曾经依赖过source的锁
15行到60行描述的是一个dependency chain
这其实是ABBA类型的扩展ABBCDEA,这种死锁不用工具是很难发现的。
通过log打印我们可以看到
A是(bpf_event_mutex)
B是(tracepoints_mutex)
C是(cpu_hotplug_lock.rw_sem)
D是(pmus_lock)
E是(&cpuctx_mutex)
纵观kernel的commit历史
hotplug的地方往往容易locking order reversed