Memory Model

1. C++ Memory Model Synchronization Modes

1.1 std::memory_order_seq_cst

This is the default mode used when none is specified, and it is the most restrictive. 1

From a practical point of view, this amounts to all atomic operations acting as optimization barriers: 1

  • It's OK to reorder things between atomic operations, but not across such an operation.

  • Thread-local data is also unaffected, since it is not visible to other threads.

1.2 std::memory_order_relaxed

This model allows for much less synchronization by removing the happens-before restrictions. 1

Without any happens-before edges:

  • no thread can count on a specific ordering from another thread.

  • The only ordering imposed is that once a value for a variable from thread 1 is observed in thread 2, thread 2 can not see an “earlier” value for that variable from thread 1. 1

There is also the presumption that relaxed stores from one thread are seen by relaxed loads in another thread within a reasonable amount of time. 1

  • That means that on non-cache-coherent architectures, relaxed operations need to flush the cache (although these flushes can be merged across several relaxed operations)

The relaxed mode is most commonly used when the programmer simply wants a variable to be atomic in nature, rather than using it to synchronize threads with respect to other shared-memory data. 1

1.3 std::memory_order_acquire/release/acq_rel

The third mode is a hybrid between the other two. The acquire/release mode is similar to the sequentially consistent mode, except it only applies a happens-before relationship to dependent variables. This allows for a relaxing of the synchronization required between independent reads of independent writes. 1

The interactions of non-atomic variables are still the same. Any store before an atomic operation must be seen in other threads that synchronize. 1

1.4 std::memory_order_consume

std::memory_order_consume is a further subtle refinement of the release/acquire memory model that relaxes the requirements slightly by removing the happens-before ordering on non-dependent shared variables as well. 1

2. atomic

2.1 Atomic Variable

Atomic variables are primarily used to synchronize shared memory accesses between threads. 1

2.2 Atomic Operation

A memory operation can be non-atomic even when performed by a single CPU instruction. 2

2.3 Operations on The Same Atomic Variable

Hence the memory model was designed to disallow visible reordering of operations on the same atomic variable:

  • All changes to a single atomic variable appear to occur in a single total modification order, specific to that variable. This is introduced in 1.10p5, and the last non-note sentence of 1.10p10 states that loads of that variable must be consistent with this modification order. 3

3. Sequentially Consistent

Sequential consistency means that all threads agree on the order in which memory operations occurred, and that order is consistent with the order of operations in the program source code. 4

4. Sequentially Consistent Memory Model

In a sequentially consistent memory model, there is no memory reordering. It’s as if the entire program execution is reduced to a sequential interleaving of instructions from each thread. In particular, the result r1 = r2 = 0 from Memory Reordering Caught in the Act becomes impossible. 5

In any case, sequential consistency only really becomes interesting as a software memory model, when working in higher-level programming languages. In Java 5 and higher, you can declare shared variables as volatile. In C++11, you can use the default ordering constraint, memory_order_seq_cst, when performing operations on atomic library types. If you do those things, the toolchain will restrict compiler reordering and emit CPU-specific instructions which act as the appropriate memory barrier types. In this way, a sequentially consistent memory model can be “emulated” even on weakly-ordered multicore devices. 5

5. Sequenced Before

If a and b are performed by the same thread, and a “comes first”, we say that a is sequenced before b. 3

C++ allows a number of different evaluation orders for each thread, notably as a result of varying argument evaluation order, and this choice may vary each time an expression is evaluated. Here we assume that each thread has already chosen its argument evaluation orders in some way, and we simply define which multi-threaded executions are consistent with this choice. Even then, there may be evaluations in the same thread, neither one of which is sequenced before the other. Thus sequenced-before is only a partial order, even when only the evaluations of a single thread are considered. But for the purposes of this discussion all of this can generally be ignored. 3

6. Happens-Before

The multi-threaded version of the sequenced-before relation. 3

If a happens before b, then b must see the effect of a, or the effects of a later action that hides the effects of a. 3

An evaluation a can happen before b either because they are executed in that order by a single thread, i.e., a is sequenced before b, or because there is an intervening communication between the two threads that enforces ordering. 3

6.1 The Common Definition of The Happens-Before Relation

Let A and B represent operations performed by a multithreaded process. If A happens-before B, then the memory effects of A effectively become visible to the thread performing B before B is performed. 6

If operations A and B are performed by the same thread, and A’s statement comes before B’s statement in program order, then A happens-before B. 6

6.2 Happens-Before Does Not Imply Happening Before

See the referenced article for the example. 6

Note: in that example, the mismatch between happens-before and happening before arises from instruction reordering: after reordering, the order of the two assignments in the compiled instructions differs from the order in the source code, and the semantics of the instructions actually executed also differ from the source semantics, yet the final observable effect still satisfies the happens-before relation of the source. The two assignments in the generated assembly are in fact decoupled, so even if runtime out-of-order execution occurs, the happens-before relation within that thread is unaffected.

6.3 Happening Before Does Not Imply Happens-Before

The happens-before relationship only exists where the language standards say it exists. 6

7. Synchronizes With Relation

7.1 Communication Between Threads

A thread T1 normally communicates with a thread T2 by assigning to some shared variable x and then synchronizing with T2. Most commonly, this synchronization would involve T1 acquiring a lock while it updates x, and then T2 acquiring the same lock while it reads x. Certainly any assignment performed prior to releasing a lock should be visible to another thread when acquiring the lock. 3

We describe this in several stages: 3

  1. Any side effect such as the assignment to x performed by a thread before it releases the lock, is sequenced before the lock release, and hence happens before it.

  2. The lock release operation synchronizes with the next acquisition of the same lock. The synchronizes with relation expresses the actual ordering constraints imposed by synchronization operations.

  3. The lock acquire operation is again sequenced before value computations such as the one that reads x.

In general, an evaluation a happens before an evaluation b if they are ordered by a chain of synchronizes with and sequenced-before relationships. 3

Atomic variables are another, less common, way to communicate between threads. Experience has shown that such variables are most useful if they have at least the same kind of acquire-release semantics as locks. In particular a store to an atomic variable synchronizes with a load that sees the written value. 3

7.2 Synchronizes-With

A release fence A synchronizes with an acquire fence B if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M, Y is sequenced before B, and Y reads the value written by X or a value written by any side effect in the hypothetical release sequence X would head if it were a release operation. 7

We just need a way to safely propagate modifications from one thread to another once they’re complete. That’s where the synchronizes-with relation comes in. 8

In every synchronizes-with relationship, you should be able to identify two key ingredients, which I like to call the guard variable and the payload. The payload is the set of data being propagated between threads, while the guard variable protects access to the payload. 8

An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A. 8

As for the condition that the read-acquire must “take its value from any side effect” – let’s just say it’s sufficient for the read-acquire to read the value written by the write-release. If that happens, the synchronizes-with relationship is complete, and we’ve achieved the coveted happens-before relationship between threads. Some people like to call this a synchronizes-with or happens-before “edge”. 8

Just as synchronizes-with is not the only way to achieve a happens-before relationship, a pair of write-release/read-acquire operations is not the only way to achieve synchronizes-with; nor are C++11 atomics the only way to achieve acquire and release semantics. 8

Unlocking a mutex always synchronizes-with a subsequent lock of that mutex. 8

8. Acquire/Release Semantics

  • Acquire semantics is a property that can only apply to operations that read from shared memory, whether they are read-modify-write operations or plain loads. The operation is then considered a read-acquire. Acquire semantics prevent memory reordering of the read-acquire with any read or write operation that follows it in program order. 9

  • Release semantics is a property that can only apply to operations that write to shared memory, whether they are read-modify-write operations or plain stores. The operation is then considered a write-release. Release semantics prevent memory reordering of the write-release with any read or write operation that precedes it in program order. 9

Acquire and release semantics can be achieved using simple combinations of the memory barrier types as follows: 9

  • Acquire semantics

    #LoadLoad + #LoadStore placed after the read-acquire operation

  • Release semantics

    #LoadStore + #StoreStore placed before the write-release operation

Note: it is not immediately obvious why #LoadStore is needed here. The reason is that acquire and release semantics order reads against writes too: a read-acquire must not be reordered with a store that follows it (e.g. a write into the payload), and a load that precedes a write-release (e.g. a read of the payload) must not be reordered past the publishing store. #LoadLoad and #StoreStore alone only order loads against loads and stores against stores.

The barriers must (somehow) be placed after the read-acquire operation, but before the write-release. [Update: Please note that these barriers are technically more strict than what’s required for acquire and release semantics on a single memory operation, but they do achieve the desired effect.] 9

These semantics are particularly suitable in cases when there’s a producer/consumer relationship, where one thread publishes some information and the other reads it. 4

8.1 C++ Specific Acquire Release Semantics

  • In C++11, a Release Fence Is Not Considered a “Release Operation”:

    • You might reasonably expect a release fence to be considered a “release operation”, but if you comb through the C++11 standard, you’ll find that it’s actually very careful not to call it that. 10

    • In the language of C++11, only a store can be a release operation, and only a load can be an acquire operation. (See §29.3.1 of working draft N3337.) A memory fence is neither a load nor a store, so obviously, it can’t be an acquire or release operation. Furthermore, if we accept that acquire and release semantics apply only to acquire and release operations, it’s clear that Raymond Chen’s definition does not apply to acquire and release fences. 10

  • Nor Can a Release Operation Take the Place of a Release Fence

    • A release operation only needs to prevent preceding memory operations from being reordered past itself, but a release fence must prevent preceding memory operations from being reordered past all subsequent writes. Because of this difference, a release operation can never take the place of a release fence. 10

9. Memory Reordering

Changes to memory ordering are made both by the compiler (at compile time) and by the processor (at run time), all in the name of making your code run faster. 11

Memory reordering goes largely unnoticed by programmers writing single-threaded code. It often goes unnoticed in multithreaded programming, too, since mutexes, semaphores and events are all designed to prevent memory reordering around their call sites. It’s only when lock-free techniques are used – when memory is shared between threads without any kind of mutual exclusion – that the cat is finally out of the bag, and the effects of memory reordering can be plainly observed. 11

9.1 Compiler Instruction Reordering

The compiler is free to reorder instructions only when doing so does not change single-threaded program behavior. Such instruction reordering typically happens only when compiler optimizations are enabled. 11

The majority of function calls act as compiler barriers, whether they contain their own compiler barrier or not. 11

9.2 Processor Reordering

Like compiler reordering, processor reordering is invisible to a single-threaded program. It only becomes apparent when lock-free techniques are used – that is, when shared memory is manipulated without any mutual exclusion between threads. However, unlike compiler reordering, the effects of processor reordering are only visible in multicore and multiprocessor systems. 12

9.3 Four Types of Memory Barrier

Each type of memory barrier is named after the type of memory reordering it’s designed to prevent: for example, #StoreLoad is designed to prevent the reordering of a store followed by a load. 12

9.3.1 #LoadLoad

A LoadLoad barrier effectively prevents reordering of loads performed before the barrier with loads performed after the barrier. 12

9.3.2 #StoreStore

A StoreStore barrier effectively prevents reordering of stores performed before the barrier with stores performed after the barrier. 12

9.3.3 #LoadStore

A LoadStore barrier effectively prevents reordering of loads performed before the barrier with stores performed after the barrier.

On a real CPU, instructions which act as a #LoadStore barrier typically also act as at least one of the #LoadLoad or #StoreStore barrier types. 12

9.3.4 #StoreLoad

A StoreLoad barrier ensures that all stores performed before the barrier are visible to other processors, and that all loads performed after the barrier receive the latest value that is visible at the time of the barrier. In other words, it effectively prevents reordering of all stores before the barrier against all loads after the barrier, respecting the way a sequentially consistent multiprocessor would perform those operations. 12

On most processors, instructions that act as a #StoreLoad barrier tend to be more expensive than instructions acting as the other barrier types. 12

As Doug Lea also points out, it just so happens that on all current processors, every instruction which acts as a #StoreLoad barrier also acts as a full memory fence. 12

10. Strong Hardware Memory Model

Strong Hardware Memory Model Definition:

  • A strong hardware memory model is one in which every machine instruction comes implicitly with acquire and release semantics. As a result, when one CPU core performs a sequence of writes, every other CPU core sees those values change in the same order that they were written. 5

Under the above definition, the x86/64 family of processors is usually strongly-ordered. There are certain cases in which some of x86/64’s strong ordering guarantees are lost, but for the most part, as application programmers, we can ignore those cases. 5

11. Summary

11.1 Memory Ordering

During the execution of a multithreaded program, every thread may operate on global memory; all of these operations form a set S.

  • For a given program execution, each thread can define a partial order on S. This partial-order relation arises from two sources:

    1. The actual execution order of the memory operations performed by this thread.

      This order may differ from the instruction order produced by the compiler, and it may change from one run to the next. Memory barriers placed between memory operations can be used to constrain the execution order.

    2. The order of memory operations in other threads, as inferred from the results of this thread's own memory operations. This inferred order may differ from the actual order within those threads.

  • For memory operations executed in different threads, a partial order can be established through inter-thread synchronization (the synchronizes-with relation).

In a sequentially consistent memory model, for every program execution, the partial orders defined by all threads are free of contradictions.

11.2 Acquire/Release Semantics

  • Acquire semantics can be obtained by reading an atomic variable in acquire mode.

  • Release semantics can be obtained by writing an atomic variable in release mode.

  • Acquire/release semantics can also be obtained by combining relaxed-mode operations on atomic variables with memory barriers.

  • Acquire/release semantics can establish synchronization between two threads (the synchronizes-with relation).

  • The synchronizes-with relation yields a happens-before relation between two memory operations in different threads.

  • The sequenced-before relation yields a happens-before relation between two memory operations executed by the same thread.


  1. Memory model synchronization modes

  2. Atomic vs. Non-Atomic Operations

  3. N2480: A Less Formal Explanation of the Proposed C++ Concurrency Memory Model

  4. An Introduction to Lock-Free Programming

  5. Weak vs. Strong Memory Models

  6. The Happens-Before Relation

  7. Acquire and Release Fences

  8. The Synchronizes-With Relation

  9. Acquire and Release Semantics

  10. Acquire and Release Fences Don’t Work the Way You’d Expect

  11. Memory Ordering at Compile Time

  12. Memory Barriers Are Like Source Control Operations
