Using gdb thread debugging to verify a multithreaded concurrency problem

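A classic problem: two threads each increment the same variable. The increment is implemented by reading the variable into a register, adding 1, and writing the result back to memory, which takes three instructions rather than one. That leaves a window between the read and the write, so the final result depends on the execution order: the variable may end up increased by 1 or by 2. The code used to verify this: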

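#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
int a;                  /* shared by both threads */

void func() {           /* empty: exists only as a breakpoint location */
}

void* thread_func1(void* arg) {
    func();
    a++;                /* load, add, store: not atomic */
}                       /* real code should return NULL; omitted to match the disassembly */

void* thread_func2(void* arg) {
    a++;                /* the same non-atomic read-modify-write */
}                       /* real code should return NULL here as well */

int main() {
    a = 0;
    pthread_t pid[2];   /* thread handles, despite the name */
    pthread_create(&pid[0], NULL, thread_func1, NULL);
    pthread_create(&pid[1], NULL, thread_func2, NULL);

    pthread_join(pid[0], NULL);
    pthread_join(pid[1], NULL);

    printf("a = %d\n", a);

    exit(0);
}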

func exists only to provide a convenient place to set a breakpoint.

Compile

cc -g test.c -lpthread

Debug with gdb

(gdb) break func
Breakpoint 1 at 0x400764: file test.c, line 7.
(gdb) r
Starting program: /root/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/sbin/../lib/libthread_db.so.1".
[New Thread 0x7ffff6e0f700 (LWP 2704)]
[New Thread 0x7ffff7810700 (LWP 2703)]
[Switching to Thread 0x7ffff7810700 (LWP 2703)]

Breakpoint 1, func () at lala.c:7
7	}
(gdb) info threads
  Id   Target Id         Frame
* 3    Thread 0x7ffff7810700 (LWP 3600) func () at lala.c:7
  2    Thread 0x7ffff6e0f700 (LWP 3601) 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
  1    Thread 0x7ffff7ff8700 (LWP 3598) 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
(gdb) set scheduler-locking on
(gdb) ni
0x0000000000400765	7	}
(gdb) ni
thread_func1 (arg=0x0) at lala.c:11
11	    a++;
(gdb) disas
Dump of assembler code for function thread_func1:
   0x0000000000400766 <+0>:	push   %rbp
   0x0000000000400767 <+1>:	mov    %rsp,%rbp
   0x000000000040076a <+4>:	sub    $0x8,%rsp
   0x000000000040076e <+8>:	mov    %rdi,-0x8(%rbp)
   0x0000000000400772 <+12>:	mov    $0x0,%eax
   0x0000000000400777 <+17>:	callq  0x400760 <func>
=> 0x000000000040077c <+22>:	mov    0x20058e(%rip),%eax        # 0x600d10 <a>
   0x0000000000400782 <+28>:	add    $0x1,%eax
   0x0000000000400785 <+31>:	mov    %eax,0x200585(%rip)        # 0x600d10 <a>
   0x000000000040078b <+37>:	leaveq
   0x000000000040078c <+38>:	retq
End of assembler dump.
(gdb) ni
0x0000000000400782	11	    a++;
(gdb) disas
Dump of assembler code for function thread_func1:
   0x0000000000400766 <+0>:	push   %rbp
   0x0000000000400767 <+1>:	mov    %rsp,%rbp
   0x000000000040076a <+4>:	sub    $0x8,%rsp
   0x000000000040076e <+8>:	mov    %rdi,-0x8(%rbp)
   0x0000000000400772 <+12>:	mov    $0x0,%eax
   0x0000000000400777 <+17>:	callq  0x400760 <func>
   0x000000000040077c <+22>:	mov    0x20058e(%rip),%eax        # 0x600d10 <a>
=> 0x0000000000400782 <+28>:	add    $0x1,%eax
   0x0000000000400785 <+31>:	mov    %eax,0x200585(%rip)        # 0x600d10 <a>
   0x000000000040078b <+37>:	leaveq
   0x000000000040078c <+38>:	retq
End of assembler dump.

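The disassembly shows how a++ is implemented: the variable a in the data segment is located through RIP-relative addressing (an offset from the rip instruction pointer, the same mechanism that position-independent code relies on), loaded from memory into the eax register, incremented by 1, and then written back to memory. The process has three threads: the main thread and the two created with pthread_create. Under the hood the pthread library uses clone so that the threads share one process address space, which is why thread 2 and thread 3 can both access a at the same virtual address. The breakpoint was hit in thread 3, so gdb is already in thread 3. set scheduler-locking on restricts execution to that thread, and ni single-steps it to just past the load: a has been read into eax but has been neither incremented nor written back. Next, switch to thread 2 and run it with step: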

(gdb) info threads
  Id   Target Id         Frame
* 3    Thread 0x7ffff7810700 (LWP 3600) 0x0000000000400782 in thread_func1 (arg=0x0) at lala.c:11
  2    Thread 0x7ffff6e0f700 (LWP 3601) 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
  1    Thread 0x7ffff7ff8700 (LWP 3598) 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff6e0f700 (LWP 3601))]
#0  0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
(gdb) set scheduler-locking on
(gdb) ni
0x00007ffff78f80f4 in clone () from /root/../lib/libc.so.6
(gdb) step
Single stepping until exit from function clone,
which has no line number information.
0x00007ffff7bc60f0 in start_thread () from /root/../lib/libpthread.so.0
(gdb) step
Single stepping until exit from function start_thread,
which has no line number information.
thread_func2 (arg=0x0) at lala.c:15
15	    a++;
(gdb) step
16	}
(gdb) p a
$1 = 1
(gdb) step
0x00007ffff7bc61c3 in start_thread () from /root/../lib/libpthread.so.0
(gdb) step
Single stepping until exit from function start_thread,
which has no line number information.
[Thread 0x7ffff6e0f700 (LWP 3601) exited]
No unwaited-for children left.
(gdb) info threads
  Id   Target Id         Frame
  3    Thread 0x7ffff7810700 (LWP 3600) 0x0000000000400782 in thread_func1 (arg=0x0) at lala.c:11
  1    Thread 0x7ffff7ff8700 (LWP 3598) (Exiting) 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6

The current thread <Thread ID 2> has terminated.  See `help thread'.

Thread 2 has now finished and exited. Its increment changed the value of a in memory to 1:

(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff7810700 (LWP 3600))]
#0  0x0000000000400782 in thread_func1 (arg=0x0) at lala.c:11
11	    a++;
(gdb) info threads
  Id   Target Id         Frame
* 3    Thread 0x7ffff7810700 (LWP 3600) 0x0000000000400782 in thread_func1 (arg=0x0) at lala.c:11
  1    Thread 0x7ffff7ff8700 (LWP 3598) (Exiting) 0x00007ffff78f80f1 in clone () from /root/../lib/libc.so.6
(gdb) disassemble
Dump of assembler code for function thread_func1:
   0x0000000000400766 <+0>:	push   %rbp
   0x0000000000400767 <+1>:	mov    %rsp,%rbp
   0x000000000040076a <+4>:	sub    $0x8,%rsp
   0x000000000040076e <+8>:	mov    %rdi,-0x8(%rbp)
   0x0000000000400772 <+12>:	mov    $0x0,%eax
   0x0000000000400777 <+17>:	callq  0x400760 <func>
   0x000000000040077c <+22>:	mov    0x20058e(%rip),%eax        # 0x600d10 <a>
=> 0x0000000000400782 <+28>:	add    $0x1,%eax
   0x0000000000400785 <+31>:	mov    %eax,0x200585(%rip)        # 0x600d10 <a>
   0x000000000040078b <+37>:	leaveq
   0x000000000040078c <+38>:	retq
End of assembler dump.
(gdb) p a
$2 = 1

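At this point a is 1 in memory, but thread 3 loaded a before thread 2 ran, so its eax register still holds the stale value 0. When thread 3 resumes, it adds 1 to eax and writes 1 back to memory, overwriting thread 2's update. This is the concurrency problem: the final value of a is 1 rather than 2, and the main thread prints a = 1: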

(gdb) step
25	    pthread_join(pid[1], NULL);
(gdb) step
27	    printf("a = %d\n", a);
(gdb) step
a = 1
29	    exit(0);

A problem encountered while debugging threads with gdb: the first time I tried the verification above, gdb could not show the thread stacks:

(gdb) info threads
  Id   Target Id         Frame
  3    LWP 20548         0x0000003f0ae073b4 in ?? ()
* 2    LWP 20547         func () at test.c:23
  1    LWP 20546         0x0000003f0b0c5ea4 in ?? ()

The cause was that the gdb version in use did not match the pthread shared library, so gdb could not load the multithreading debug information:

Error while mapping shared library sections:
Could not open `target:/lib64/tls/libpthread.so.0' as an executable file: Unknown error 18446744073709551615
Error while mapping shared library sections:

Querying the loaded target stack in gdb confirms this. It contains nothing related to multithreading:

(gdb) maintenance print target-stack
The current target stack is:
  - native (Native process)
  - exec (Local exec file)
  - None (None)

Switching to a matching pair of versions fixes it. The same query afterwards shows:

(gdb) maintenance print target-stack
The current target stack is:
  - multi-thread (multi-threaded child process.)
  - native (Native process)
  - exec (Local exec file)
  - None (None)

The multi-thread layer is now loaded, so gdb has the thread debugging information it needs.
