Memory barrier: asm volatile("" ::: "memory")


License: CC 4.0 BY-SA


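This article looks at how memory barriers are used on different architectures, x86 and x64 in particular. It explains how to use the GCC inline assembler to prevent instruction reordering and shows how to implement a memory barrier in code. It also explains what the volatile keyword does and how to keep data consistent when an operation is not atomic.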

Compiler memory barrier

These barriers prevent the compiler from reordering instructions; they do not prevent reordering by the CPU.

The GNU inline assembler statement

asm volatile("" ::: "memory");

or even

__asm__ __volatile__ ("" ::: "memory");

forbids the GCC compiler from reordering memory reads and writes across it.

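  /* mb(): compiler-only barrier; the empty asm emits no instruction */
  #define mb() __asm__ __volatile__ ("" : : : "memory")
  /* set_mb(): store value into var, then issue the barrier */
  #define set_mb(var, value) do { (var) = (value); mb(); } while (0)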
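As a rough illustration of what "not reordering around it" means in practice, here is a minimal sketch. The names barrier(), payload, ready and publish() are made up for this example and do not come from any particular code base. It only constrains the compiler, so on its own it is not a complete synchronization scheme between threads.

```c
/* Compiler-only barrier: emits no machine instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical shared variables, named for illustration only. */
static int payload;
static int ready;

/* Publish a value: write the payload first, then raise the flag. */
static void publish(int value)
{
    payload = value;
    barrier();   /* the compiler may not sink the payload store below this point */
    ready = 1;   /* nor hoist this store above it */
}
```

Without the barrier() line the compiler would be free to emit the two stores in either order, since they go to unrelated locations. Calling publish(42) from any function leaves payload at 42 and ready at 1.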

1) If you trace set_mb(), mb(), and barrier() all the way down, they end up as __asm__ __volatile__("" ::: "memory"), and that line is the memory barrier.

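2) __asm__ tells the compiler to insert an assembly statement here.
3) __volatile__ tells the compiler not to optimize this assembly statement away or merge it with other statements; the assembly is kept exactly as written.
4) memory forces GCC to assume that the assembly statement may have modified any memory location. Values the compiler is holding in registers are therefore treated as stale: pending stores must be written out before the statement, and later reads must be reloaded from memory. This stops the compiler from reusing register copies to avoid memory accesses. It is a constraint on the compiler only; it does not flush or invalidate the CPU cache.
5) "" ::: means the instruction string is empty, so barrier() does not insert a serializing assembly instruction here. What a serializing instruction is will be discussed later.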

6) __asm__, __volatile__, and memory have been explained above.
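Point 4 is easiest to see in a polling loop. The sketch below is only an illustration; stop_requested and spin_until_stopped() are hypothetical names. Without the barrier, an optimizing compiler may read stop_requested once, keep it in a register, and never look at memory again.

```c
/* Compiler-only barrier: emits no machine instruction. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical flag that some other context is assumed to set. */
static int stop_requested;

/* Poll the flag at most max_spins times; return how many times we spun. */
static unsigned spin_until_stopped(unsigned max_spins)
{
    unsigned spins = 0;

    while (!stop_requested && spins < max_spins) {
        barrier();   /* forces stop_requested to be reloaded on the next test */
        spins++;
    }
    return spins;
}
```

With the flag already set the function returns 0 at once; with the flag clear it spins max_spins times and gives up. In portable multi-threaded code, C11 atomics would be the more conventional choice for such a flag.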

One forum thread discusses exactly this question.

An excerpt:

Well, a memory barrier is only needed on architectures that have weak memory ordering. x86 and x64 don't have weak memory ordering. On x86/x64 all stores have a release fence and all loads have an acquire fence, so you should only really need asm volatile ("" : : : "memory").

For a good overview of both Intel and AMD as well as references to the relevant manufacturer specs, see http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/

Generally things like "volatile" are used on a per-field basis where loads and stores to that field are natively atomic. Where loads and stores to a field are already atomic (i.e. the "operation" in question is a load or a store to a single field and thus the entire operation is atomic), volatile or memory barriers are not needed on x86/x64. Portable code notwithstanding.

When it comes to "operations" that are not atomic (e.g. loads or stores to a field that is larger than a native word, or loads or stores to multiple fields within an "operation"), a means by which the operation can be viewed as atomic is required regardless of CPU architecture. Generally this is done by means of a synchronization primitive like a mutex. Mutexes (the ones I've used) include memory barriers to avoid issues like processor reordering, so you don't have to add extra memory barrier instructions. I generally consider not using synchronization primitives a premature optimization; but, the nature of premature optimization is, of course, 97% of the time :)
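To make the mutex suggestion concrete, here is a small sketch using POSIX threads. The struct range type, its lo <= hi invariant, and the helpers range_set() and range_get() are invented for this illustration; the excerpt does not name any specific mutex API.

```c
#include <pthread.h>

/* Hypothetical two-field value with the invariant lo <= hi. */
struct range {
    int lo;
    int hi;
};

static struct range shared_range = {0, 0};
static pthread_mutex_t range_lock = PTHREAD_MUTEX_INITIALIZER;

/* Update both fields as one unit; readers never see a half-written pair. */
static void range_set(int lo, int hi)
{
    pthread_mutex_lock(&range_lock);
    shared_range.lo = lo;
    shared_range.hi = hi;
    pthread_mutex_unlock(&range_lock);
}

/* Take a consistent snapshot of both fields. */
static struct range range_get(void)
{
    pthread_mutex_lock(&range_lock);
    struct range copy = shared_range;
    pthread_mutex_unlock(&range_lock);
    return copy;
}
```

The lock and unlock calls supply the ordering, so no explicit barrier appears in the code. Depending on the toolchain, building this may need the -pthread option.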

Where you don't use a synchronization primitive and you're dealing with a multi-field invariant, memory barriers that ensure the processor does not reorder stores and loads to different memory locations are important.

Now, as for using "memory" in the clobber list of asm volatile without issuing an "mfence" instruction, here is what I've been able to read in the GCC documentation:

If your assembler instructions access memory in an unpredictable fashion, add `memory' to the list of clobbered registers. This will cause GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory.

When they say "GCC" and don't mention anything about the CPU, this means it applies only to the compiler. The lack of "mfence" means there is no CPU memory barrier. You can verify this by disassembling the resulting binary. If no "mfence" instruction is issued (depending on the target platform), then it's clear the CPU is not being told to issue a memory fence.
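One way to try the disassembly check is to compile two nearly identical functions and compare their output, for example with gcc -O2 -S. This is a sketch under assumptions: x, y and both function names are hypothetical, and __atomic_thread_fence is a GCC builtin used here in place of hand-written "mfence" so that the code is not tied to x86.

```c
/* Compiler-only barrier: emits no machine instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Full hardware fence via a GCC builtin; also acts as a compiler barrier. */
#define full_fence() __atomic_thread_fence(__ATOMIC_SEQ_CST)

/* Hypothetical shared variables, named for illustration only. */
static int x;
static int y;

/* Store to x, then load y. Only the compiler is constrained here. */
static int store_then_load_compiler_only(void)
{
    x = 1;
    compiler_barrier();
    return y;
}

/* Same sequence, but the CPU is also told to order the store before the load. */
static int store_then_load_full_fence(void)
{
    x = 1;
    full_fence();
    return y;
}
```

On x86-64 the first function should contain only the store and the load, while the second typically gains an mfence or a locked instruction, depending on the GCC version. A store followed by a load from a different address is the one case that x86 hardware may still reorder, which is why the second form exists.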

Depending on the platform you're on and what you're trying to do, there may be something "better" or more clear... portability notwithstanding.