《ARMv8-A Memory systems》

1 ARMv8-A Memory systems

You should understand the operation of the memory system and access ordering in cases where your code interacts directly either with the hardware or with code executing on other cores, or if it directly loads or writes instructions to be executed, or modifies translation tables.

If you are an application developer, hardware interaction on an OS such as Linux is probably through a device driver, interaction with other cores is through Pthreads or another multithreading API, and interaction with a paged memory system is through the operating system. In these cases, the relevant code takes care of the memory ordering issues. This is not true of every operating system, however, so you must check whether it holds for the OS you work with.

However, if you are, for example, writing an operating system kernel or device drivers, or implementing a hypervisor, you must have a good understanding of the memory ordering rules of the ARM architecture.

Some explicit ordering, typically through barriers, might be required when your code needs its memory accesses to be seen in a particular order by cores or devices in the system.

2 The memory model

Compilers give you a wide range of options that aim to increase the speed, or reduce the size, of the executable files they generate. For each line in the source code, there are many possible choices of assembly instructions that could be used.


The ARMv8-A architecture employs a weakly ordered model of memory. This means that the order of memory accesses is not necessarily required to be the same as the program order for load and store operations.


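1. On ARM, as long as there is no dependency, there is no requirement on the order in which instructions execute: load instructions ("L") and store instructions ("S") can be freely reordered with respect to each other. This is a relaxed model, commonly called weak ordering.
2. On x86, for instructions executed by the same CPU, a load followed by a load (L-L), a store followed by a store (S-S), and a load followed by a store (L-S) cannot be reordered. Only a store followed by a load (S-L) can be reordered [note 1]. This memory ordering is called TSO (Total Store Order), commonly called strong ordering.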
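The two rules above can be written down as a small predicate. This is only a sketch for comparing the models: the type and function names are hypothetical, and it describes independent accesses from one CPU, ignoring dependencies and barriers.

```c
#include <stdbool.h>

/* Hypothetical names, used for illustration only. */
typedef enum { ACCESS_LOAD, ACCESS_STORE } access_t;
typedef enum { MODEL_TSO, MODEL_WEAK } model_t;

/*
 * Returns true if the model allows two independent accesses from one CPU,
 * "first" then "second" in program order, to be observed in the opposite order.
 */
static bool may_reorder(model_t model, access_t first, access_t second)
{
    if (model == MODEL_WEAK) {
        /* Weak ordering (ARM): L-L, L-S, S-S and S-L may all be reordered. */
        return true;
    }
    /* TSO (x86): only a store followed by a load (S-L) may be reordered. */
    return first == ACCESS_STORE && second == ACCESS_LOAD;
}
```

For example, may_reorder(MODEL_TSO, ACCESS_LOAD, ACCESS_STORE) is false, while the same query with MODEL_WEAK is true.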


During the optimization process, the processor and system elements can reorder memory read operations with respect to each other to improve data throughput. Writes can also be reordered. This means that the required bandwidth between the processor and external memory can be reduced and the long latencies that are associated with such external memory accesses are hidden.


To ensure that reordering can take place, there must be memory types that allow such optimizations to take place in them.


Hardware can reorder reads and writes to Normal memory. Such reordering is constrained only by address and data dependencies and by explicit barrier instructions, including the one-way (half) barriers. Certain situations require stronger ordering rules. You can tell the core about these through the memory type attribute of the translation table entry that describes that memory.

High-performance systems can support techniques such as speculative memory reads, multiple issuing of instructions, or out-of-order execution and these, along with other techniques, offer further possibilities for hardware reordering of memory access:


Multiple issue of instructions

Processors can issue and execute multiple instructions per cycle. Some instructions can reach the execution stage of the pipeline in parallel, as a result they may execute in a different order to their order in the program.


Out-of-order execution

Many processors support out-of-order execution of non-dependent instructions. Because of the multiple issue of instructions, some instructions can stall in the execution stage while they wait for others to complete, but these do not stop non-dependent instructions from completing. This can also change the order of execution relative to program order.



Speculation

When the processor encounters a conditional instruction, such as a branch, it can speculatively begin to execute instructions before it knows for certain whether that particular instruction is executed or not.


The result is therefore available sooner if conditions prove that the speculation was correct.


Instruction fetch speculation is the fetch of instructions that are not defined by the program execution order.


Speculative loads

If a load instruction that reads from a Cacheable location is speculatively executed, this can result in a cache linefill and potentially the eviction of an existing cache line.


Load and store optimizations

As reads and writes to external memory can have a long latency, processors can reduce the number of transfers, for example by merging several stores into one larger transaction.


External memory systems

In many System on Chip (SoC) devices, there are several agents capable of initiating transfers and multiple routes to the slave devices that are read or written.


Some of these devices, such as a DRAM controller, are capable of accepting simultaneous requests from different masters. Transactions can be buffered, or reordered.

This means that accesses from different masters can take varying numbers of cycles to complete and might overtake each other.


Cache coherent multi-core processing

In a cluster, hardware cache coherency can migrate cache lines between cores.


Different cores might see updates to cached memory locations in a different order to each other.


Also, these might not be coherent with external memory.


Optimizing compilers

An optimizing compiler can reorder instructions to hide latencies or make best use of hardware features.


It can often move a memory access forward, to make it earlier, and give it more time to complete before the value is required.


They can also have instruction scheduling that can take advantage of specific core multi-issue pipelines.


In a single core system, the effects of such reordering are transparent to the programmer, because the individual processor can check for hazards and ensure that data dependencies are respected. However, in cases where you have multiple cores that communicate through shared memory, or share data in other ways, memory ordering considerations become more important.


3 Memory types

The ARMv8-A architecture defines two mutually exclusive memory types, Normal and Device. All regions of memory are configured as one or the other of these two types.

3.1 Normal memory

Normal memory is used for all code and for most data regions in memory. Examples of Normal memory include areas of RAM, Flash, or ROM in physical memory. This kind of memory provides the highest processor performance as it is weakly ordered and the compiler can perform more optimizations. The processor can reorder, repeat, and merge accesses to Normal memory.


The processor can speculatively access address locations that are marked as Normal, so that data or instructions can be read from memory without being explicitly referenced in the program, or before the actual execution of an explicit reference. Such speculative accesses can occur as a result of branch prediction, speculative cache linefills, out-of-order data loads, or other hardware optimizations.


For best performance, always mark application code and data as Normal. In circumstances where enforced memory ordering is required, do this by using explicit barrier operations. Normal memory accepts weakly ordered memory accesses without any issues. There is no requirement for Normal accesses to complete in order with respect to either other Normal accesses or to Device accesses.


However, the processor must always handle hazards that are caused by address dependencies. For example, consider the following simple code sequence:


STR X0, [X2]
LDR X1, [X2]

A single processor running a single thread always ensures that the value that is placed in X1 is the value that was written from register X0 through to the address stored in X2.


This applies to more complex dependencies. Consider the following code:


ADD X4, X3, #3
ADD X5, X3, #2
...
STR X0, [X3]
STRB W1, [X4]
LDRH W2, [X5]

In this case, the accesses take place to addresses that overlap each other. The processor must ensure that the memory is updated as if the STR and STRB occurred in order, so that the LDRH returns the most up-to-date value. It would still be valid for the processor to merge the STR and STRB into a single access that contained the latest, correct data written.
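The result that the core must produce for the overlapping sequence can be modelled in plain C. The sketch below is not how the hardware works internally. It only computes the value that LDRH must return, assuming little-endian data accesses, and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Models the architecturally required result of:
 *   STR  X0, [X3]        8-byte store at the base address
 *   STRB W1, [X4]        1-byte store at X3 + 3, overlapping the STR
 *   LDRH W2, [X5]        2-byte load at X3 + 2, overlapping both stores
 * Assumes little-endian data accesses.
 */
static uint16_t overlapping_accesses(uint64_t x0, uint8_t w1)
{
    uint8_t mem[8];

    /* STR X0, [X3]: lowest byte goes to the lowest address. */
    for (int i = 0; i < 8; i++) {
        mem[i] = (uint8_t)(x0 >> (8 * i));
    }

    /* STRB W1, [X4]: overwrites byte 3 of the doubleword. */
    mem[3] = w1;

    /* LDRH W2, [X5]: reads bytes 2 and 3, so it must see the STRB data. */
    return (uint16_t)(mem[2] | ((uint16_t)mem[3] << 8));
}
```

With X0 = 0x1122334455667788 and W1 = 0xAB, the halfword combines byte 2 of the STR data (0x66) with the byte written by STRB (0xAB).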


3.2 Device memory

The Device memory type is used with memory-mapped peripherals and all memory regions where an access might have a side effect. For example, a read to a timer is not repeatable, as it returns different values for each read. A write to a control register can trigger an interrupt. The Device memory type imposes more restrictions on the core.


Accesses to these types of memory must occur exactly the number of times that executing the program suggests they should. Two writes to the same location must be performed as two writes, and two reads from the same location must both take place. This is important when you are accessing peripheral control registers.
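In C, the compiler must also be told not to merge or drop such accesses. A common approach, sketched below, is to access registers through volatile pointers. The helper names are hypothetical, and volatile only constrains the compiler: the hardware guarantees still come from mapping the register as Device memory.

```c
#include <stdint.h>

/*
 * Hypothetical helpers. A real register address would come from the SoC
 * memory map and must be mapped as Device memory in the translation tables.
 */
static inline void mmio_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;   /* volatile: the compiler emits one store per call */
}

static inline uint32_t mmio_read32(const volatile uint32_t *reg)
{
    return *reg;    /* volatile: the compiler emits one load per call */
}

/* Two writes to the same register stay two separate stores in the object code. */
static void send_two_commands(volatile uint32_t *cmd_reg,
                              uint32_t first, uint32_t second)
{
    mmio_write32(cmd_reg, first);
    mmio_write32(cmd_reg, second);
}
```

Without volatile, an optimizing compiler would be free to drop the first store in send_two_commands because the second one overwrites it.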


There is however no guarantee about ordering between memory accesses to different devices, or usually between accesses of different memory types.


Speculative data accesses cannot be performed to regions of memory that are marked as Device.


Trying to execute code from a region marked as Device is UNPREDICTABLE.


When an instruction can result in UNPREDICTABLE behavior, the ARM architecture can specify a narrow range of permitted behaviors. This is defined as a number of CONSTRAINED UNPREDICTABLE behaviors. The implementation can either handle the instruction fetch as if it were to a memory location with the normal Non-cacheable attribute, or it can take a permission fault.


There are four different types of Device memory, each defining the rules that memory accesses must obey. The rules are relaxed as the memory type weakens:

Device-nGnRnE is the most restrictive.

Device-nGnRE

Device-nGRE

Device-GRE is the least restrictive.

The letter suffixes refer to the following three properties:


Gathering or non-Gathering (G or nG)

This determines whether multiple accesses can be merged into a single transaction for this memory region. If the address is marked as non-Gathering (nG), then the number and size of accesses that are performed to that location must exactly match the number and size of explicit accesses in the code. If the address is marked as Gathering (G), then the processor can, for example, merge two byte writes into a single halfword write.


Reordering (R or nR)

This determines whether accesses to the same device can be reordered with respect to each other. If the address is marked as non-Reordering (nR), then accesses within the same block always appear on the bus in program order. The size of this block is IMPLEMENTATION DEFINED. Where the size of this block is large, it could span several table entries. In this case, the ordering rule is observed with respect to any other accesses also marked as nR.


Early Write Acknowledgement (E or nE)

This determines whether an intermediate write buffer between the processor and the device being accessed is allowed to send an acknowledgement of a write completion.


If the address is marked as non-Early Write Acknowledgement (nE), then the write response must come from the peripheral. If the address is marked as Early Write Acknowledgement (E), then a buffer in the interconnect logic is allowed to signal write acceptance before the write is actually received by the end device. This is essentially a message to the external memory system.
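The three properties can be summarized per Device type in a small table. The sketch below also records the 8-bit attribute encoding that each type has in a MAIR_ELx register. The text above does not mention MAIR_ELx, so treat that column, and all struct and function names, as additions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor type summarizing the G, R and E properties. */
typedef struct {
    const char *name;
    bool gathering;        /* G: accesses may be merged */
    bool reordering;       /* R: accesses may be reordered */
    bool early_write_ack;  /* E: an intermediate buffer may acknowledge writes */
    uint8_t mair_attr;     /* encoding of this type in a MAIR_ELx Attr<n> field */
} device_type_t;

static const device_type_t device_types[] = {
    { "Device-nGnRnE", false, false, false, 0x00 },
    { "Device-nGnRE",  false, false, true,  0x04 },
    { "Device-nGRE",   false, true,  true,  0x08 },
    { "Device-GRE",    true,  true,  true,  0x0C },
};

/* Places an 8-bit attribute into slot "index" (0 to 7) of a MAIR_ELx value. */
static uint64_t mair_set_attr(uint64_t mair, unsigned index, uint8_t attr)
{
    unsigned shift = 8u * (index & 7u);
    return (mair & ~(UINT64_C(0xFF) << shift)) | ((uint64_t)attr << shift);
}
```

A translation table entry then selects one of the eight slots by index, which is how a page ends up with, for example, the Device-nGnRE rules.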

4 Memory attributes

The memory map of a system can be divided into several regions. Each region can have different memory attributes, such as access permissions that include read and write permissions for different privilege levels, memory type, and cache policies.


The following figure shows an example system memory map:

Functional pieces of code and data are grouped in the memory map and the attributes for each of these areas are controlled separately by the Memory Management Unit.


In addition to the memory type, memory attributes also provide control over cacheability, shareability, access, and execution permissions. Shareable and cache properties apply only to Normal memory. Device regions are always Non-cacheable and Outer-shareable. For Cacheable locations, you can use attributes to indicate cache allocation policy to the processor.


4.1 Cacheable and shareable memory attributes

Regions of memory that are marked as Normal can be specified as either cached or non-cached. Memory caching can be separately controlled through inner and outer attributes, for multiple levels of cache. The division between inner and outer is IMPLEMENTATION DEFINED, but typically the inner attributes are used by caches in the processor. The outer attributes are used by external memory where they can be used by caches external to the core or cluster.


The shareable attribute defines whether a location is shared between multiple cores. Marking a region as Non-shareable means that it is only used by a particular core. Marking it as Inner Shareable or Outer Shareable, or both, means that the location is shared with other observers. For example, a GPU or DMA device might be considered another observer.

The division between inner and outer is also IMPLEMENTATION DEFINED. These attributes define sets of observers for which the shareability attributes make the caches transparent for data accesses. This also means that the system must provide hardware coherency management so that cores in the Inner Shareable domain see a coherent copy of locations that are marked as Inner Shareable.


If a processor or other master in the system does not support coherency, then it must treat the shareable regions as Non-cacheable.


4.2 Domains

Data memory accesses can take longer and consume more power with cache coherency hardware than they otherwise would do. This overhead can be minimized by maintaining coherency between a smaller number of masters while ensuring that they are physically close in the processor. For this reason, the architecture splits the system into domains, and makes it possible to limit the overhead to those locations where the coherency is required.


Shareability is assigned to each memory transaction in the system, based on:

Memory attributes for the region accessed (determined by MMU translation tables).

Core configuration (can differ between cores in a cluster).

Implementation of interconnect.

Integration between interconnect and the masters that are connected to it.

But there are also specific operations that can be performed with a domain defining their scope.


The following shareability domain options are available:


Non-shareable

A domain consisting only of the local agent. Accesses to this domain never require synchronization with other cores, processors, or devices. This domain is not typically used in Symmetric Multi-Processing (SMP) systems.

Note: SMP is a software architecture that dynamically determines the roles of individual cores. Each core in the cluster has the same view of memory and of shared hardware. Any application, process, or task can run on any core and the operating system scheduler can migrate tasks between cores to achieve optimal system load.

Inner Shareable

Outer Shareable

Full system

An operation on the full system affects all observers in the system.
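For memory regions, the shareability attribute is carried in the translation table entry. As a minimal sketch, assuming the AArch64 stage 1 block and page descriptor layout in which the SH field occupies bits [9:8], the helpers below set and read that field. The encodings come from the architecture's descriptor format rather than from the text above, and the helper names are hypothetical. The full system scope applies to operations such as barriers and has no SH encoding.

```c
#include <stdint.h>

/* SH[1:0] encodings in a block or page descriptor (0x1 is reserved). */
typedef enum {
    SH_NON_SHAREABLE   = 0x0,
    SH_OUTER_SHAREABLE = 0x2,
    SH_INNER_SHAREABLE = 0x3,
} shareability_t;

#define DESC_SH_SHIFT 8
#define DESC_SH_MASK  (UINT64_C(0x3) << DESC_SH_SHIFT)

/* Returns a copy of "desc" with its shareability field replaced. */
static uint64_t desc_set_shareability(uint64_t desc, shareability_t sh)
{
    return (desc & ~DESC_SH_MASK) | ((uint64_t)sh << DESC_SH_SHIFT);
}

/* Extracts the shareability field from a descriptor. */
static shareability_t desc_get_shareability(uint64_t desc)
{
    return (shareability_t)((desc & DESC_SH_MASK) >> DESC_SH_SHIFT);
}
```

An SMP operating system would typically mark its ordinary Normal memory as Inner Shareable so that all cores it runs on see coherent data.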

5 Barriers

The ARM architecture includes barrier instructions to force access ordering and access completion at a specific point.


Barriers are used to prevent unsafe optimizations from occurring and to enforce a specific memory ordering. Use of unnecessary barrier instructions can therefore reduce software performance. Consider carefully whether a barrier is necessary in a specific situation, and if so, which is the correct barrier to use.


There are three types of barrier instruction.


5.1 Instruction Synchronization Barrier (ISB)

This is used to guarantee that any subsequent instructions are fetched again, so that privilege and access are checked with the current MMU configuration. It is used to ensure any previously executed context-changing operations, such as writes to system control registers, have completed by the time the ISB completes.

In hardware terms, for example, this might mean that the instruction pipeline is flushed. Typical uses of this would be in memory management, cache control, and context switching code, or where code is being moved about in memory.


The following example shows how to enable the floating-point unit and SIMD, which you can do in AArch64 by setting bits [21:20] (the FPEN field) of the CPACR_EL1 register. The ISB is a context synchronization event that guarantees that the enable is complete before any subsequent FPU or NEON instructions are executed.

MRS X1, CPACR_EL1           // Copy contents of CPACR_EL1 to X1
ORR X1, X1, #(0x3 << 20)    // Set bits [21:20] of X1 (enable FPU and SIMD)
MSR CPACR_EL1, X1           // Write contents of X1 to CPACR_EL1
ISB                         // Ensure the enable is complete before later instructions

An ISB flushes the pipeline and ensures that the effects of any completed context-changing operation before the ISB are visible to any instruction after the ISB. Instructions from the cache or memory are refetched.


It also ensures that any context-changing operations after the ISB instruction only take effect after the ISB has completed and are not seen by instructions before the ISB.


This does not mean that an ISB is required after each instruction that modifies a processor register. For example, reads or writes to PSTATE fields, ELRs, SPs, and SPSRs always occur in program order relative to other instructions.


5.2 Data Memory Barrier (DMB)

This prevents reordering of data access instructions across the DMB instruction. All data accesses, that is, loads or stores, but not instruction fetches, performed by this processor before the DMB are visible to all other masters within the specified shareability domain before any of the data accesses after the DMB.

For example:

LDR X0, [X1]      // Must be seen by the memory system before the
                  // STR below.
DMB ISHLD
ADD X2, X2, #1    // May be executed before or after the memory
                  // system sees LDR.
STR X3, [X4]      // Must be seen by the memory system after the
                  // LDR above.

It also ensures that any explicit preceding data or unified cache maintenance operations have completed before any subsequent data accesses are executed.


For example:

DC CSW, X5        // Data clean by Set/way
LDR X0, [X1]      // Effect of data cache clean might not be seen by
                  // this instruction
DMB ISH
LDR X2, [X3]      // Effect of data cache clean is seen by this
                  // instruction
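The ordering role of a DMB between two data accesses can also be expressed in portable C11 with atomic_thread_fence. The sketch below passes a value between two POSIX threads. It assumes a pthreads environment, all names are hypothetical, and the instruction a compiler emits for each fence on AArch64 (commonly a DMB variant) is a compiler choice, not something the C standard specifies.

```c
#include <pthread.h>
#include <stdatomic.h>

static int payload;         /* plain data published by the producer */
static atomic_int ready;    /* flag that announces the payload */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release fence: the payload store is ordered before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg)
{
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0) {
        /* spin until the producer sets the flag */
    }
    /* Acquire fence: the payload load is ordered after the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *(int *)arg = payload;
    return NULL;
}

/* Runs both threads once and returns the payload value the consumer read. */
static int run_fence_message_passing(void)
{
    pthread_t prod, cons;
    int seen = 0;

    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&cons, NULL, consumer, &seen);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}
```

Without the two fences, a weakly ordered core would be allowed to make the flag visible before the payload, and the consumer could read a stale value.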

5.3 Data Synchronization Barrier (DSB)

DSB enforces the same ordering as the Data Memory Barrier, but it also blocks execution of any further instructions, not just loads or stores, until synchronization is complete. This can be used to prevent execution of a SEV instruction, for instance, that would signal to other cores that an event occurred. It waits until all cache, TLB, and branch predictor maintenance operations that are issued by this processor have completed for the specified shareability domain.


For example:

DC ISW, X5        // Operation must have completed before DSB can
                  // complete
STR X0, [X1]      // Access must have completed before DSB can complete
DSB ISH
ADD X2, X2, #3    // Cannot be executed until DSB completes

5.4 Using barriers

The DMB and DSB instructions take a parameter which specifies the types of access to which the barrier operates, before or after, and a shareability domain to which it applies.


The available options are listed in the following table.



Option   Ordered accesses (before - after)   Shareability domain   Description
OSHLD    Load - Load, Load - Store           Outer Shareable       Waits only for loads to complete, and only to the Outer Shareable domain.
OSHST    Store - Store                       Outer Shareable       Waits only for stores to complete, and only to the Outer Shareable domain.
OSH      Any - Any                           Outer Shareable       Operation only to the Outer Shareable domain.
NSHLD    Load - Load, Load - Store           Non-shareable         Waits only for loads to complete, and only out to the point of unification.
NSHST    Store - Store                       Non-shareable         Waits only for stores to complete, and only out to the point of unification.
NSH      Any - Any                           Non-shareable         Operation only out to the point of unification.
ISHLD    Load - Load, Load - Store           Inner Shareable       Waits only for loads to complete, and only to the Inner Shareable domain.
ISHST    Store - Store                       Inner Shareable       Waits only for stores to complete, and only to the Inner Shareable domain.
ISH      Any - Any                           Inner Shareable       Operation only to the Inner Shareable domain.
LD       Load - Load, Load - Store           Full system           Waits only for loads to complete.
ST       Store - Store                       Full system           Waits only for stores to complete.
SY       Any - Any                           Full system           Full system operation. This is the default and can be omitted.

The ordered access field specifies which classes of accesses the barrier operates on. There are three options.

Load - Load/Store

This means that the barrier requires all loads to complete before the barrier but does not require stores to complete. Both loads and stores that appear after the barrier in program order must wait for the barrier to complete.

Store - Store

This means that the barrier only affects store accesses and that loads can still be freely reordered around the barrier.

Any - Any

This means that both loads and stores must complete before the barrier. Both loads and stores that appear after the barrier in program order must wait for the barrier to complete.
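The two fields of the barrier parameter map directly onto the instruction encoding. As a sketch, assuming the A64 encoding in which the option is a 4-bit CRm value (bits [3:2] select the shareability domain and bits [1:0] select the ordered accesses), the helpers below translate between the assembler option names, such as ISHST, and those fields. The option names and encodings come from the instruction set rather than from the text above, and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns the 4-bit CRm encoding for a barrier option name, or -1 if unknown. */
static int barrier_option_encoding(const char *option)
{
    static const struct { const char *name; int crm; } table[] = {
        { "OSHLD", 0x1 }, { "OSHST", 0x2 }, { "OSH", 0x3 },
        { "NSHLD", 0x5 }, { "NSHST", 0x6 }, { "NSH", 0x7 },
        { "ISHLD", 0x9 }, { "ISHST", 0xA }, { "ISH", 0xB },
        { "LD",    0xD }, { "ST",    0xE }, { "SY",  0xF },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(option, table[i].name) == 0) {
            return table[i].crm;
        }
    }
    return -1;
}

/* CRm[3:2]: shareability domain to which the barrier applies. */
static const char *barrier_domain(int crm)
{
    static const char *const names[] = {
        "Outer Shareable", "Non-shareable", "Inner Shareable", "Full system",
    };
    return names[(crm >> 2) & 0x3];
}

/* CRm[1:0]: classes of access ordered by the barrier (before - after). */
static const char *barrier_accesses(int crm)
{
    switch (crm & 0x3) {
    case 0x1: return "Load - Load, Load - Store";
    case 0x2: return "Store - Store";
    case 0x3: return "Any - Any";
    default:  return "not one of the options listed here";
    }
}
```

For example, DSB ISHST encodes CRm as 0xA: the Inner Shareable domain, ordering stores against stores only.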

A more subtle effect of the ordering rules is that the instruction interface, data interface, and MMU table walker of a core are considered as separate observers. This means that you might need, for example, to use DSB instructions to ensure that an access to one interface is guaranteed to be observable on a different interface.


If you execute a data cache clean instruction, DC CVAU, X0 for example, you must insert a DSB instruction after it to be sure that subsequent translation table walks, modifications to translation table entries, instruction fetches, or updates to instructions in memory can all see the new values.

For example, consider an update of the translation tables:

STR X0, [X1]        // Update a translation table entry
DSB ISHST           // Ensure the write has completed
TLBI VAE1IS, X2     // Invalidate the TLB entry for the entry that changes
DSB ISH             // Ensure that the TLB invalidation is complete
ISB                 // Synchronize context on this processor

A DSB is required to ensure that the maintenance operations complete and an ISB is required to ensure that the effects of those operations are seen by the instructions that follow.


The processor might speculatively access an address that is marked as Normal at any time. So when considering whether barriers are required, consider more than just explicit accesses that are generated by load or store instructions.


5.5 One-way barriers

A64 adds new load and store instructions with implicit barrier semantics. The instructions are less restrictive than either DMB or DSB instructions. They also require that all loads and stores before or after the implicit barrier are observed in program order.


Load-Acquire (LDAR)

All loads and stores that are after an LDAR in program order, and that match the shareability domain of the target address, must be observed after the LDAR.

Store-Release (STLR)

All loads and stores preceding an STLR that match the shareability domain of the target address must be observed before the STLR.

There are also exclusive versions of the above, LDAXR and STLXR, available.

Unlike the data barrier instructions, which take a qualifier to control which shareability domains see the effect of the barrier, the LDAR and STLR instructions use the attribute of the address accessed.

An LDAR instruction guarantees that any memory access instructions after the LDAR are only visible after the load-acquire. A store-release guarantees that all earlier memory accesses are visible before the store-release becomes visible and that the store is visible to all parts of the system capable of storing cached data at the same time.

The following figure shows how accesses can cross a one-way barrier in one direction but not in the other.
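In C11, the same one-way semantics are available as memory_order_acquire loads and memory_order_release stores. The sketch below is a minimal message-passing example between two POSIX threads. All names are hypothetical, and the claim that AArch64 compilers typically emit a load-acquire and a store-release instruction for these operations is an expectation about common toolchains, not a guarantee.

```c
#include <pthread.h>
#include <stdatomic.h>

static int message;                 /* plain data written before the release */
static atomic_int message_ready;    /* flag accessed with acquire and release */

static void *sender(void *arg)
{
    (void)arg;
    message = 7;
    /* Store-release: the write to message cannot move below this store. */
    atomic_store_explicit(&message_ready, 1, memory_order_release);
    return NULL;
}

static void *receiver(void *arg)
{
    /* Load-acquire: the read of message cannot move above this load. */
    while (!atomic_load_explicit(&message_ready, memory_order_acquire)) {
        /* spin until the sender publishes the message */
    }
    *(int *)arg = message;
    return NULL;
}

/* Runs both threads once and returns the value the receiver read. */
static int run_release_acquire(void)
{
    pthread_t tx, rx;
    int seen = 0;

    message = 0;
    atomic_store(&message_ready, 0);
    pthread_create(&rx, NULL, receiver, &seen);
    pthread_create(&tx, NULL, sender, NULL);
    pthread_join(tx, NULL);
    pthread_join(rx, NULL);
    return seen;
}
```

Accesses after the release store and before the acquire load remain free to move across them in the permitted direction, which is why this is cheaper than a full barrier.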

5.6 Use of barriers in C code

The C11 and C++11 languages have a good platform-independent memory model that is preferable to intrinsics.

All versions of C and C++ have sequence points, but C11 and C++11 also provide memory models. Sequence points only constrain how the compiler orders the side effects of the source code. Nothing stops the processor from reordering instructions in the generated object code, or read and write buffers from reordering the sequence in which data transfers are sent to the cache. In other words, sequence points are only relevant for single-threaded code. For multi-threaded code, use either the memory model features of C11 / C++11 or other synchronization mechanisms, such as mutexes, that the operating system provides. Examples of sequence points in code include function calls and accesses to volatile variables.

The C language specification defines sequence points as follows:

“At certain specified points in the execution sequence, called sequence points, all side effects of previous evaluations shall be complete, and no side effects of subsequent evaluations shall have taken place.”
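As an illustration of using the C11 memory model instead of relying on sequence points, the sketch below builds a simple spinlock from atomic_flag and uses it to protect a counter shared by two POSIX threads. The names and the iteration count are hypothetical, and a production lock would normally come from the operating system or a threading library.

```c
#include <pthread.h>
#include <stdatomic.h>

#define INCREMENTS 50000

static atomic_flag lock = ATOMIC_FLAG_INIT;
static long counter;    /* protected by lock */

static void spin_lock(void)
{
    /* Acquire: accesses in the critical section cannot move above this point. */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire)) {
        /* spin until the flag was previously clear */
    }
}

static void spin_unlock(void)
{
    /* Release: accesses in the critical section cannot move below this point. */
    atomic_flag_clear_explicit(&lock, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        spin_lock();
        counter++;
        spin_unlock();
    }
    return NULL;
}

/* Runs two workers to completion and returns the final counter value. */
static long run_two_workers(void)
{
    pthread_t a, b;

    counter = 0;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

Because every increment happens between an acquire and the matching release, no update is lost, whatever reordering the compiler or processor performs elsewhere.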

5.7 Barriers in Linux

The Linux kernel includes several platform-independent barrier functions. See the Linux kernel documentation in the memory-barriers.txt file at: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/ for more details.









