More on the Memory Model: use gcc's __sync_synchronize with caution


喵喵d喵喵


More on Memory

29 JANUARY 2012 on C/C++

In multithreaded programming we run into a problem known as memory ordering.

For example:

int x=0,y=0;
x=4;
y=3;

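Which of these two assignments actually executes first? Could there be a moment at which some CPU sees y greater than x?

The answer is complicated. This post looks at the question from a strictly practical angle.

First, there are two levels to consider. To the compiler, x and y are two unrelated variables, so it is free to reorder these two lines whenever it likes.

Second, the CPU is entitled to do the same.

If I insist on a strict order, I have to insert a memory barrier: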

int x=0,y=0;
x=4;
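// insert the memory barrier instruction here
y=3;

The rest of this post is about how to write that middle line. Please bear with me, because most people get this wrong.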

The gcc manual has a section called "Built-in functions for atomic memory access", which lists this function:

__sync_synchronize (...)
This builtin issues a full memory barrier.

Let's write some code and try it:

int main(){
  __sync_synchronize();
  return 0;
}

Compile it with gcc 4.2:

# gcc -S -c test.c

Then look at the generated assembly:

main:
  pushq %rbp
  movq %rsp, %rbp
  movl $0, %eax
  leave
  ret

Huh? Nothing at all! Try it yourself if you don't believe me. My build environment is FreeBSD 9.0-RELEASE with gcc (GCC) 4.2.1 20070831 patched [FreeBSD]. Now let's try a newer compiler, gcc46 (FreeBSD Ports Collection) 4.6.3 20120113 (prerelease):

main:
  pushq %rbp
  movq %rsp, %rbp
  mfence
  movl $0, %eax
  popq %rbp
  ret

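Look, there is one extra line: mfence. What happened? This was a gcc bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793 . It was found in 2008 and then fixed. The root of the bug was that the gcc developers themselves were unsure whether mfence does anything on x86 CPUs and, if so, what. This gives us a first conclusion: avoid gcc's __sync_synchronize() where you can, because with older gcc versions your code will be silently broken. As of this writing, most people are on a gcc older than 4.4; CentOS 6 is the first release to ship gcc 4.4 as its default compiler.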
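Where a C++11 compiler is available, one way to avoid depending on how a particular gcc version expands __sync_synchronize() is the standard std::atomic_thread_fence. The following is only a sketch under that assumption, not something the original post used; the globals and function names are hypothetical and simply mirror the x/y example.

```cpp
#include <atomic>

// Hypothetical shared variables mirroring the x/y example.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Writer: store x, then y, with a full fence in between.
void write_in_order() {
    x.store(4, std::memory_order_relaxed);
    // Standard full fence: constrains the compiler and, where the
    // target needs one, makes it emit a hardware barrier.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    y.store(3, std::memory_order_relaxed);
}

// Reader: if it observes y == 3, then after its own fence it must
// also observe x == 4. Returns false only if that rule is violated.
bool reader_sees_consistent_state() {
    int seen_y = y.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    int seen_x = x.load(std::memory_order_relaxed);
    return seen_y != 3 || seen_x == 4;
}
```

The variables are declared std::atomic because concurrent access to plain int from several threads is a data race in C++11, fence or no fence.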

So does mfence actually give us what we want? Intel's manuals used to be vague on this point.

The latest manual describes mfence as follows:

"Serializes all store and load operations that occurred prior to the MFENCE instruction in the
program instruction stream"

It also stresses that this instruction affects the data memory subsystem, not the instruction execution stream.

For a single CPU:

"Reads cannot pass earlier MFENCE instructions"
"Writes cannot pass earlier MFENCE instructions. "
"MFENCE instructions cannot pass earlier reads or writes"

For multiple CPUs:

Individual processors use the same ordering principles as in a single-processor system.
Writes by a single processor are observed in the same order by all processors.
Writes from an individual processor are NOT ordered with respect to the writes from other processors.
Memory ordering obeys causality (memory ordering respects transitive visibility).
Any two stores are seen in a consistent order by processors other than those performing the stores

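Put simply, on x86 the stores issued by a single CPU become visible to other CPUs in program order even without mfence. This is a hardware guarantee only; the compiler may still reorder the stores before they ever reach the CPU.

Suppose that in C++ you write: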

std::string* str=new std::string();

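Then, at the hardware level on x86, no other CPU will see the str pointer assigned while the object it points to is still uninitialized. This holds only if the compiler emits the stores in that order, which plain C++ code does not promise.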
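A portable way to get the same publication guarantee without relying on x86 store ordering or on the compiler's goodwill is a release store paired with an acquire load. This is a minimal sketch assuming C++11; g_str, publish and consume are hypothetical names introduced only for illustration.

```cpp
#include <atomic>
#include <string>

// Hypothetical shared pointer, null until the object is published.
std::atomic<std::string*> g_str{nullptr};

// Producer: finish constructing the object, then publish the pointer.
// The release store keeps the construction from moving after it.
void publish() {
    std::string* p = new std::string("ready");
    g_str.store(p, std::memory_order_release);
}

// Consumer: returns the string's length once it has been published,
// or -1 if the pointer is still null.
long consume() {
    std::string* p = g_str.load(std::memory_order_acquire);
    return p ? static_cast<long>(p->size()) : -1;
}
```

A consumer that sees a non-null pointer is then guaranteed to see a fully constructed string. The sketch never frees the object; real code would need an ownership policy.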

Another interesting point: gcc has an inline-assembly idiom for controlling memory ordering. See this passage from its documentation:

Accesses to non-volatile objects are not ordered with respect to volatile accesses. You cannot use a volatile object as a memory barrier to order a sequence of writes to non-volatile memory. For instance:

int *ptr = something;
volatile int vobj;
*ptr = something;     
vobj = 1;

Unless *ptr and vobj can be aliased, it is not guaranteed that the write to *ptr occurs by the time the update of vobj happens. If you need this guarantee, you must use a stronger memory barrier such as:

int *ptr = something;     
volatile int vobj;      
*ptr = something;     
asm volatile ("" : : : "memory");    
vobj = 1;

In my tests,

asm volatile ("" : : : "memory");

generates no assembly instructions at all. In other words, it is a barrier for the compiler only.
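For comparison, C++11 offers std::atomic_signal_fence, which likewise restricts compiler reordering and typically emits no instruction. The sketch below puts the two side by side; the helper names and the g_data/g_flag globals are hypothetical, and the inline-assembly variant assumes a GCC-compatible compiler.

```cpp
#include <atomic>

// GCC/Clang-specific compiler barrier, the form quoted from the gcc docs.
inline void compiler_barrier_gcc() {
#if defined(__GNUC__)
    asm volatile("" : : : "memory");
#endif
}

// Standard C++11 counterpart: constrains the compiler, but is not
// specified to order memory between different threads or CPUs.
inline void compiler_barrier_std() {
    std::atomic_signal_fence(std::memory_order_seq_cst);
}

// Hypothetical globals mirroring the *ptr / vobj example.
int g_data = 0;
volatile int g_flag = 0;

// The compiler may not move the write to g_data below the barrier,
// so it is emitted before the volatile write to g_flag.
void write_data_then_flag(int value) {
    g_data = value;
    compiler_barrier_gcc();
    g_flag = 1;
}
```

Neither helper stops the CPU from reordering; they only pin down the order of instructions in the generated code.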

To further support the point that this is a compiler-only barrier, here is code taken from Intel's Threading Building Blocks library:

#define __TBB_compiler_fence() __asm__ __volatile__("": : :"memory")
#define __TBB_control_consistency_helper() __TBB_compiler_fence()
#define __TBB_acquire_consistency_helper() __TBB_compiler_fence()
#define __TBB_release_consistency_helper() __TBB_compiler_fence()

#ifndef __TBB_full_memory_fence
#define __TBB_full_memory_fence() __asm__ __volatile__("mfence": : :"memory")
#endif

The form that acts as both a compiler barrier and a hardware memory barrier is

__asm__ __volatile__("mfence": : :"memory")

Note the mfence!
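The one reordering x86 does allow is a load being satisfied ahead of an earlier store to a different location, and that is the case where a full fence matters. The sketch below wraps the mfence form in a helper and uses it in a store-then-load sequence. The names full_memory_fence and store_then_load are hypothetical, and the fallback to std::atomic_thread_fence on non-x86-64 targets is an assumption of this sketch, not something taken from TBB.

```cpp
#include <atomic>

// Hypothetical helper: compiler barrier and hardware fence in one call.
inline void full_memory_fence() {
#if defined(__GNUC__) && defined(__x86_64__)
    // Same form as the TBB macro: mfence plus a "memory" clobber.
    __asm__ __volatile__("mfence" : : : "memory");
#else
    // Assumption: elsewhere, fall back to the standard C++11 full fence.
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// Announce ourselves by setting our own flag, then check the other
// side's flag. The fence keeps the load from being performed before
// the store has become visible to other CPUs.
inline int store_then_load(std::atomic<int>& mine, std::atomic<int>& theirs) {
    mine.store(1, std::memory_order_relaxed);
    full_memory_fence();
    return theirs.load(std::memory_order_relaxed);
}
```

If two threads each call store_then_load with the arguments swapped, the fence is what rules out both of them reading 0.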

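Also, the CAS instructions we use on Intel CPUs all carry the lock prefix, and on x86 a lock-prefixed instruction already acts as a full memory barrier. So when using CAS there is no need to add a separate fence around it.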
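As an illustration, here is a sketch of two CAS-based primitives written with std::atomic, which gcc typically compiles to a lock-prefixed cmpxchg on x86. The names g_locked, try_acquire, release and increment are hypothetical; the default sequentially consistent ordering is used throughout, so no explicit fence appears.

```cpp
#include <atomic>

// Hypothetical lock flag: false = free, true = held.
std::atomic<bool> g_locked{false};

// Try to take the lock with a single CAS; returns true on success.
bool try_acquire() {
    bool expected = false;
    return g_locked.compare_exchange_strong(expected, true);
}

// Release the lock.
void release() {
    g_locked.store(false);
}

// Lock-free increment built from a CAS retry loop; returns the new value.
int increment(std::atomic<int>& counter) {
    int old_value = counter.load();
    while (!counter.compare_exchange_weak(old_value, old_value + 1)) {
        // On failure old_value now holds the current value; retry.
    }
    return old_value + 1;
}
```

On architectures with weaker ordering than x86 the same source still works, because the compiler inserts whatever barriers the requested memory order needs.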

Finally, a recommended paper: Mathematizing C++ Concurrency, whose first author is a PhD student at Cambridge.

changming
