How the Compiler Handles External Variables

weixin_34236869

Original link: http://blog.51cto.com/10495461/1720701

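I have long been interested in locks. In multi-core programming, a lock is something you both hate and love. You hate it because using a lock reduces concurrency, and therefore performance. You love it because sometimes a lock simply cannot be avoided. How to implement a high-performance lock is therefore an interesting question in its own right. When I get the chance, I will share some thoughts on lock implementation.

Today's question occurred to me while I was reading about spinlocks. It is not specific to spinlocks, so the examples below use a mutex instead.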

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

extern int counter;
extern pthread_mutex_t counter_mutex;

void add_counter(void)
{
    pthread_mutex_lock(&counter_mutex);
    ++counter;
    pthread_mutex_unlock(&counter_mutex);
}

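counter is protected by counter_mutex. Before updating counter, a thread must hold counter_mutex; only then is the update guaranteed to be correct. In addition, a lock implementation generally needs memory barrier instructions to keep the CPU from reordering memory accesses across the lock boundary. Without a barrier, the update to counter could well take effect after the unlock as the CPU executes the instructions. These issues are not today's focus either.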
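
As a minimal sketch of how the snippet above might be exercised, the following supplies definitions for counter and counter_mutex (the article only declares them extern) and adds two hypothetical helpers, worker and run_two_workers, which are not part of the original code. It assumes a POSIX system and, on older toolchains, linking with -pthread.

```c
#include <pthread.h>
#include <stddef.h>

/* Definitions for the objects the article only declares with extern. */
int counter = 0;
pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Same function as in the article. */
void add_counter(void)
{
    pthread_mutex_lock(&counter_mutex);
    ++counter;
    pthread_mutex_unlock(&counter_mutex);
}

/* Hypothetical worker: arg points to the number of increments to perform. */
static void *worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; ++i)
        add_counter();
    return NULL;
}

/* Hypothetical driver: resets counter, runs two workers, and returns the
 * final value of counter, or -1 if a thread could not be created. */
int run_two_workers(int increments_per_thread)
{
    pthread_t a, b;

    counter = 0;
    if (pthread_create(&a, NULL, worker, &increments_per_thread) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, &increments_per_thread) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter; /* safe to read: both workers have been joined */
}
```

Calling run_two_workers(100000) from main should return 200000. Without the mutex, increments from the two threads could be lost.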

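A memory barrier instruction only guarantees that the memory operations the CPU issued before the barrier have completed. But what if the compiler has placed counter in a register? Suppose counter is read before counter_mutex is taken. The compiler might well load counter into a register before the lock. Then, once the lock is held, will ++counter operate directly on that register, given that counter was already loaded into it? Consider the following code: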

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

extern int counter;
extern pthread_mutex_t counter_mutex;

#define ASM_SEPERATOR __asm__ __volatile__ ("nop")

void add_counter(void)
{
    /* The code below reads counter, so counter will be loaded into a register. */
    int t = counter;
    ASM_SEPERATOR;
    printf("counter is %d\n", counter);
    ASM_SEPERATOR;

    /* counter was already loaded into a register above; will the update below operate directly on that register? */
    pthread_mutex_lock(&counter_mutex);
    ASM_SEPERATOR;
    ++counter;
    ASM_SEPERATOR;
    pthread_mutex_unlock(&counter_mutex);
}

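When this question occurred to me, a chill ran down my spine, because code like this certainly exists in our projects. Reading a protected resource before taking the lock is perfectly ordinary. If that earlier read caused the resource to be kept in a register, wouldn't the lock be rendered useless? Would even reads need lock protection in this situation? If that were true, there would be far too much buggy code, and many problems would have been reported long ago. So this usage pattern should be fine.

Let's look at the disassembly: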

00000000 <add_counter>:
extern pthread_mutex_t counter_mutex;
#define ASM_SEPERATOR __asm__ __volatile__ ("nop")
void add_counter(void)
{
   0: 55                      push %ebp
   1: 89 e5                   mov %esp,%ebp
   3: 83 ec 28                sub $0x28,%esp
int t = counter;
   6: a1 00 00 00 00          mov 0x0,%eax
   b: 89 45 f4                mov %eax,-0xc(%ebp)
ASM_SEPERATOR;
   e: 90                      nop
printf("counter is %d\n", counter);
   f: 8b 15 00 00 00 00       mov 0x0,%edx
  15: b8 00 00 00 00          mov $0x0,%eax
  1a: 89 54 24 04             mov %edx,0x4(%esp)
  1e: 89 04 24                mov %eax,(%esp)
  21: e8 fc ff ff ff          call 22
ASM_SEPERATOR;
  26: 90                      nop
pthread_mutex_lock(&counter_mutex);
  27: c7 04 24 00 00 00 00    movl $0x0,(%esp)
  2e: e8 fc ff ff ff          call 2f
ASM_SEPERATOR;
  33: 90                      nop
++counter;
  34: a1 00 00 00 00          mov 0x0,%eax
  39: 83 c0 01                add $0x1,%eax
  3c: a3 00 00 00 00          mov %eax,0x0
ASM_SEPERATOR;
  41: 90                      nop
pthread_mutex_unlock(&counter_mutex);
  42: c7 04 24 00 00 00 00    movl $0x0,(%esp)
  49: e8 fc ff ff ff          call 4a
  4e: c9                      leave
  4f: c3                      ret

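The instructions at offsets 6 and b assign counter to t; at that point counter has been loaded into eax. The instructions at offsets 34 to 3c implement ++counter. They show that when counter is incremented, it is re-read from memory into a register and then incremented; the earlier copy in eax is not reused.

The assembly above was produced without optimization. Below is the result with -O2: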

Disassembly of section .text:

00000000 <add_counter>:
   0: 55                      push %ebp
   1: 89 e5                   mov %esp,%ebp
   3: 83 ec 18                sub $0x18,%esp
   6: 90                      nop
   7: a1 00 00 00 00          mov 0x0,%eax
   c: c7 04 24 00 00 00 00    movl $0x0,(%esp)
  13: 89 44 24 04             mov %eax,0x4(%esp)
  17: e8 fc ff ff ff          call 18
  1c: 90                      nop
  1d: c7 04 24 00 00 00 00    movl $0x0,(%esp)
  24: e8 fc ff ff ff          call 25
  29: 90                      nop
  2a: 83 05 00 00 00 00 01    addl $0x1,0x0
  31: 90                      nop
  32: c7 04 24 00 00 00 00    movl $0x0,(%esp)
  39: e8 fc ff ff ff          call 3a
  3e: c9                      leave
  3f: c3                      ret

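With -O2, t is never used, so the assignment t = counter has been optimized away: the first nop (offset 6) appears before any load of counter. The load at offset 7 fetches counter into eax only to pass it to printf. For ++counter, no register is used at all; the addl at offset 2a adds 1 directly to memory (x86 supports add with a memory operand).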

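Judging from the assembly, my earlier worry was unfounded. Even if counter was loaded into a register before the lock, the increment re-reads it from memory rather than using that register. Why does the compiler generate code like this? Is it because of the lock, for example some instruction inside the lock API that makes the compiler do so? I don't think that is possible, because it would place an unreasonable demand on the compiler: at compile time the compiler does not inspect the body of the function being called. In this example the function is a pthread library function, but often its definition is not even available when the caller is compiled. So that guess cannot be right. That leaves one reasonable explanation: counter is an external variable (not defined inside this function). Precisely because the compiler cannot see into printf or pthread_mutex_lock, it has to assume that such a call may modify counter, so after the call the value must be re-read from memory into a register before it is used.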
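
To make the role of an intervening call concrete, here is a minimal sketch. All names in it (shared_counter, bump_shared_counter, read_twice_no_call, read_around_call) are hypothetical and do not come from the original code. It only illustrates what the language requires of the compiler; what a particular compiler actually emits can be checked with objdump as above.

```c
/* Hypothetical global standing in for the article's counter. */
int shared_counter = 0;

/* Hypothetical callee. When it is reached through a function pointer,
 * the caller generally cannot know what memory it modifies. */
void bump_shared_counter(void)
{
    ++shared_counter;
}

/* No call between the two reads: the compiler is allowed to load
 * shared_counter once and reuse the register for the second read. */
int read_twice_no_call(void)
{
    int first = shared_counter;
    int second = shared_counter;
    return first + second;
}

/* A call sits between the two reads: the second read must observe
 * whatever the callee did, so a stale register copy cannot be used. */
int read_around_call(void (*callee)(void))
{
    int first = shared_counter;
    callee();
    int second = shared_counter;
    return second - first;
}
```

With shared_counter starting at 5, read_around_call(bump_shared_counter) returns 1, which is only possible if the second read sees the callee's update.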

This time, instead of a global variable, let's use a parameter passed in:

#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

#define ASM_SEPERATOR __asm__ __volatile__ ("nop")

void add_counter(int *counter)
{
    int t = *counter;
    ASM_SEPERATOR;
    printf("counter is %d %d\n", t, *counter);
    ASM_SEPERATOR;
    ++*counter;
    ASM_SEPERATOR;
    printf("counter is %d\n", *counter);
}

Disassembly output:

00000000 <add_counter>:
   0: 55                      push %ebp
   1: 89 e5                   mov %esp,%ebp
   3: 53                      push %ebx
   4: 83 ec 14                sub $0x14,%esp
   7: 8b 5d 08                mov 0x8(%ebp),%ebx
   a: 8b 03                   mov (%ebx),%eax
   c: 90                      nop
   d: 89 44 24 08             mov %eax,0x8(%esp)
  11: 89 44 24 04             mov %eax,0x4(%esp)
  15: c7 04 24 00 00 00 00    movl $0x0,(%esp)
  1c: e8 fc ff ff ff          call 1d
  21: 90                      nop
  22: 90                      nop
  23: 8b 03                   mov (%ebx),%eax
  25: 83 c0 01                add $0x1,%eax
  28: 89 03                   mov %eax,(%ebx)
  2a: 90                      nop
  2b: 89 44 24 04             mov %eax,0x4(%esp)
  2f: c7 04 24 12 00 00 00    movl $0x12,(%esp)
  36: e8 fc ff ff ff          call 37
  3b: 83 c4 14                add $0x14,%esp
  3e: 5b                      pop %ebx
  3f: 5d                      pop %ebp
  40: c3                      ret

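The instructions at offsets 23 to 28 are again the increment, ++*counter. As with the global counter, the value is re-read from memory into a register after the call to printf, incremented, and then stored back to memory. Where no call intervenes, however, the register copy is reused: at offsets d and 11 the same eax supplies both t and *counter to the first printf, and at offset 2b the incremented eax is passed to the second printf without another load.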

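From this we can conclude that when the compiler handles an external variable, it re-reads the variable from memory into a register after any call it cannot see into, such as printf or pthread_mutex_lock, instead of reusing a copy loaded before the call. Between two accesses with no such call in between, as the last listing shows, it may still reuse the register.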
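
For code such as a spin-wait loop, where no function call sits between successive reads, the re-read has to be requested explicitly. The following is only a sketch under stated assumptions: it relies on the GCC/Clang extended asm syntax already used by ASM_SEPERATOR, and the names ready_flag, COMPILER_BARRIER, read_flag_fresh and spin_until_ready are hypothetical. Note that the article's ASM_SEPERATOR has no "memory" clobber, so it does not play this role. The sketch concerns the compiler only; real cross-thread code would still need a lock or atomic operations.

```c
/* Hypothetical shared flag, standing in for a lock word polled by a spin loop. */
int ready_flag = 0;

/* GCC/Clang compiler barrier: emits no instruction, but the "memory" clobber
 * tells the compiler that memory may have changed, so values it has cached
 * in registers must be reloaded afterwards. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Read through a volatile-qualified pointer so that every call performs
 * a real load from memory. */
static int read_flag_fresh(void)
{
    return *(volatile int *)&ready_flag;
}

/* Spin until ready_flag becomes nonzero or max_spins iterations have passed.
 * Returns the number of iterations spent waiting. */
int spin_until_ready(int max_spins)
{
    int spins = 0;

    while (!read_flag_fresh() && spins < max_spins) {
        COMPILER_BARRIER();
        ++spins;
    }
    return spins;
}
```

The volatile read alone is enough to force the reload of ready_flag; the barrier is shown because it additionally covers any other shared data the loop body might touch.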

Reposted from: https://blog.51cto.com/10495461/1720701
