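A classic problem: two threads both increment the same variable. The increment is not one indivisible operation. The compiled code first loads the variable from memory into a register, increments it there, and then writes it back, so there is a window between the read and the write. The result depends on how the two threads interleave, and the variable may end up increased by 1 or by 2. (Even a single add instruction operating directly on memory is not atomic on a multi-core machine without the lock prefix.) The verification code is as follows: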
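#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int a;

/* Empty function whose only purpose is to give gdb a breakpoint location. */
void func() {
}

void* thread_func1(void* arg) {
    func();
    a++;          /* compiled as load / add / store: not atomic */
    return NULL;
}

void* thread_func2(void* arg) {
    a++;
    return NULL;
}

int main() {
    a = 0;
    pthread_t pid[2];
    pthread_create(&pid[0], NULL, thread_func1, NULL);
    pthread_create(&pid[1], NULL, thread_func2, NULL);
    pthread_join(pid[0], NULL);
    pthread_join(pid[1], NULL);
    printf("a = %d\n", a);
    exit(0);
}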
Here func is an empty function; its only job is to provide a place for the breakpoint.
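Compile (-lpthread is placed after the source file, because the linker resolves libraries in command-line order):
cc -g test.c -lpthread
Debug with gdb: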
(gdb) break func
Breakpoint 1 at 0x400764: file test.c, line 7.
(gdb) r
Starting program: /root/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/sbin/../lib/libthread_db.so.1".
[New Thread 0x7ffff6e0f700 (LWP 2704)]
[New Thread 0x7ffff7810700 (LWP 2703)]
[Switching to Thread 0x7ffff7810700 (LWP 2703)]
Breakpoint 1, func () at lala.c:7
7 }
(gdb) info threads
Id Target Id Frame
* 3 Thread 0x7ffff7810700 (LWP 3600) func () at lala.c:7
2 Thread 0x7ffff6e0f700 (LWP 3601) 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
1 Thread 0x7ffff7ff8700 (LWP 3598) 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
(gdb) set scheduler-locking on
(gdb) ni
0x0000000000400765 7 }
(gdb) ni
thread_func1 (arg=0x0) at lala.c:11
11 a++;
(gdb) disas
Dump of assembler code for function thread_func1:
0x0000000000400766 <+0>: push %rbp
0x0000000000400767 <+1>: mov %rsp,%rbp
0x000000000040076a <+4>: sub $0x8,%rsp
0x000000000040076e <+8>: mov %rdi,-0x8(%rbp)
0x0000000000400772 <+12>: mov $0x0,%eax
0x0000000000400777 <+17>: callq 0x400760 <func>
=> 0x000000000040077c <+22>: mov 0x20058e(%rip),%eax # 0x600d10 <a>
0x0000000000400782 <+28>: add $0x1,%eax
0x0000000000400785 <+31>: mov %eax,0x200585(%rip) # 0x600d10 <a>
0x000000000040078b <+37>: leaveq
0x000000000040078c <+38>: retq
End of assembler dump.
(gdb) ni
0x0000000000400782 11 a++;
(gdb) disas
Dump of assembler code for function thread_func1:
0x0000000000400766 <+0>: push %rbp
0x0000000000400767 <+1>: mov %rsp,%rbp
0x000000000040076a <+4>: sub $0x8,%rsp
0x000000000040076e <+8>: mov %rdi,-0x8(%rbp)
0x0000000000400772 <+12>: mov $0x0,%eax
0x0000000000400777 <+17>: callq 0x400760 <func>
0x000000000040077c <+22>: mov 0x20058e(%rip),%eax # 0x600d10 <a>
=> 0x0000000000400782 <+28>: add $0x1,%eax
0x0000000000400785 <+31>: mov %eax,0x200585(%rip) # 0x600d10 <a>
0x000000000040078b <+37>: leaveq
0x000000000040078c <+38>: retq
End of assembler dump.
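The disassembly shows how a++ is implemented: the global variable a (in the data area; strictly the .bss section, since it has no initializer) is located with RIP-relative addressing, the addressing mode that position-independent code relies on. Its value is loaded from memory into eax, incremented by 1, and then written back to memory. The process has three threads: the main thread and the two created with pthread_create. The pthread library is built on clone, which makes the threads share one process address space, so thread 2 and thread 3 both access the same virtual address of a.
The breakpoint was hit in thread 3. We stay in thread 3, turn on scheduler locking so that only thread 3 runs, and single-step with ni until the load into eax has executed and execution stops at the add instruction, before the incremented value is written back. Then we switch to thread 2 and run it with step: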
(gdb) info threads
Id Target Id Frame
* 3 Thread 0x7ffff7810700 (LWP 3600) 0x0000000000400782 in thread_func1 (arg=0x0) at lala.c:11
2 Thread 0x7ffff6e0f700 (LWP 3601) 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
1 Thread 0x7ffff7ff8700 (LWP 3598) 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff6e0f700 (LWP 3601))]
#0 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
(gdb) set scheduler-locking on
(gdb) ni
0x00007ffff78f80f4 in clone () from /root/../lib/libc.so.6
(gdb) step
Single stepping until exit from function clone,
which has no line number information.
0x00007ffff7bc60f0 in start_thread () from /root/../lib/libpthread.so.0
(gdb) step
Single stepping until exit from function start_thread,
which has no line number information.
thread_func2 (arg=0x0) at lala.c:15
15 a++;
(gdb) step
16 }
(gdb) p a
$1 = 1
(gdb) step
0x00007ffff7bc61c3 in start_thread () from /root/../lib/libpthread.so.0
(gdb) step
Single stepping until exit from function start_thread,
which has no line number information.
[Thread 0x7ffff6e0f700 (LWP 3601) exited]
No unwaited-for children left.
(gdb) info threads
Id Target Id Frame
3 Thread 0x7ffff7810700 (LWP 3600) 0x0000000000400782 in thread_func1 (arg=0x0) at lala.c:11
1 Thread 0x7ffff7ff8700 (LWP 3598) (Exiting) 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
The current thread <Thread ID 2> has terminated. See `help thread'.
Thread 2 has now run to completion and exited; its increment has raised the value of a in memory to 1:
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff7810700 (LWP 3600))]
#0 0x0000000000400782 in thread_func1 (arg=0x0) at lala.c:11
11 a++;
(gdb) info threads
Id Target Id Frame
* 3 Thread 0x7ffff7810700 (LWP 3600) 0x0000000000400782 in thread_func1 (arg=0x0) at lala.c:11
1 Thread 0x7ffff7ff8700 (LWP 3598) (Exiting) 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
(gdb) disassemble
Dump of assembler code for function thread_func1:
0x0000000000400766 <+0>: push %rbp
0x0000000000400767 <+1>: mov %rsp,%rbp
0x000000000040076a <+4>: sub $0x8,%rsp
0x000000000040076e <+8>: mov %rdi,-0x8(%rbp)
0x0000000000400772 <+12>: mov $0x0,%eax
0x0000000000400777 <+17>: callq 0x400760 <func>
0x000000000040077c <+22>: mov 0x20058e(%rip),%eax # 0x600d10 <a>
=> 0x0000000000400782 <+28>: add $0x1,%eax
0x0000000000400785 <+31>: mov %eax,0x200585(%rip) # 0x600d10 <a>
0x000000000040078b <+37>: leaveq
0x000000000040078c <+38>: retq
End of assembler dump.
(gdb) p a
$2 = 1
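Now a is 1 in memory, but thread 3's eax still holds the stale value 0 that it loaded before thread 2 ran. Thread 3 adds 1 to get 1 and writes that back, overwriting thread 2's update. This is the concurrency bug: the final value of a is 1 instead of 2, and the main thread prints a = 1: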
(gdb) step
25 pthread_join(pid[1], NULL);
(gdb) step
27 printf("a = %d\n", a);
(gdb) step
a = 1
29 exit(0);
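A problem with debugging threads in gdb: the first time I ran the experiment above, gdb could not show the threads' stack frames (they appear as ??):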
(gdb) info threads
Id Target Id Frame
3 LWP 20548 0x0000003f0ae073b4 in ?? ()
* 2 LWP 20547 func () at test.c:23
1 LWP 20546 0x0000003f0b0c5ea4 in ?? ()
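The cause was a mismatch between the gdb build and the pthread shared library, so gdb could not load its multithreading debug support. The errors it reported were: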
Error while mapping shared library sections:
Could not open `target:/lib64/tls/libpthread.so.0' as an executable file: Unknown error 18446744073709551615
Error while mapping shared library sections:
Printing the target stack that gdb has loaded confirms this: there is no multi-thread layer:
(gdb) maintenance print target-stack
The current target stack is:
- native (Native process)
- exec (Local exec file)
- None (None)
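Switching to a gdb that matches the installed pthread library fixes the problem. After the change, the same query shows: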
(gdb) maintenance print target-stack
The current target stack is:
- multi-thread (multi-threaded child process.)
- native (Native process)
- exec (Local exec file)
- None (None)
The multi-thread layer is now loaded, so gdb can debug the threads.