OpenCL™ Specification

3.3.7. Memory Ordering Rules

Fundamentally, the issue in a memory model is to understand the orderings in time of modifications to objects in memory. Modifying an object or calling a function that modifies an object are side effects, i.e. changes in the state of the execution environment. Evaluation of an expression in general includes both value computations and initiation of side effects. Value computation for an lvalue expression includes determining the identity of the designated object. [C11 standard, Section, paragraph 2, modified.]


We assume that the OpenCL kernel language and host programming languages have a sequenced-before relation between the evaluations executed by a single unit of execution. This sequenced-before relation is an asymmetric, transitive, pair-wise relation between those evaluations, which induces a partial order among them. Given any two evaluations A and B, if A is sequenced-before B, then the execution of A shall precede the execution of B. (Conversely, if A is sequenced-before B, then B is sequenced-after A.) If A is not sequenced-before or sequenced-after B, then A and B are unsequenced. Evaluations A and B are indeterminately sequenced when A is either sequenced-before or sequenced-after B, but it is unspecified which. [C11 standard, Section, paragraph 3, modified.]


Sequenced-before is a partial order of the operations executed by a single unit of execution (e.g. a host thread or work-item). It generally corresponds to the source program order of those operations, and is partial because of the undefined argument evaluation order of the OpenCL C kernel language.
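As an illustration only, the sequenced-before relation of one unit of execution can be modeled as a set of ordered pairs. The sketch below is not part of the specification: the evaluation names (`init`, `arg0`, `arg1`, `call`) and the helpers `transitive_closure`, `classify`, and `is_asymmetric` are hypothetical. It models a statement such as `x = 1; h(x + 1, x * 2);`, where both argument value computations follow the initialization and precede the call, but are not ordered relative to each other.

```python
def transitive_closure(pairs):
    """Return the smallest transitive relation that contains `pairs`."""
    closure = set(pairs)
    while True:
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def classify(sequenced_before, a, b):
    """Describe how evaluation `a` relates to evaluation `b`."""
    if (a, b) in sequenced_before:
        return "sequenced-before"
    if (b, a) in sequenced_before:
        return "sequenced-after"
    return "unsequenced"


def is_asymmetric(relation):
    """True if the relation never orders a pair in both directions."""
    return all((b, a) not in relation for (a, b) in relation)


# Hypothetical evaluations of a single work-item: x = 1; h(x + 1, x * 2);
direct_order = {
    ("init", "arg0"), ("init", "arg1"),  # both arguments read x after it is set
    ("arg0", "call"), ("arg1", "call"),  # both arguments are computed before the call
}
sequenced_before = transitive_closure(direct_order)

print(classify(sequenced_before, "init", "call"))  # ordered through transitivity
print(classify(sequenced_before, "arg0", "arg1"))  # no order between the arguments
```

Because `arg0` and `arg1` are related in neither direction, the modeled relation is a partial order rather than a total one.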

In an OpenCL kernel language, the value of an object visible to a work-item W at a particular point is the initial value of the object, a value stored in the object by W, or a value stored in the object by another work-item or host thread, according to the rules below. Depending on details of the host programming language, the value of an object visible to a host thread may also be the value stored in that object by another work-item or host thread. [C11 standard, Section, paragraph 2, modified.]


Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location. [C11 standard, Section, paragraph 4.]


All modifications to a particular atomic object M occur in some particular total order, called the modification order of M. If A and B are modifications of an atomic object M, and A happens-before B, then A shall precede B in the modification order of M, which is defined below. Note that the modification order of an atomic object M is independent of whether M is in local or global memory. [C11 standard, Section, paragraph 7, modified.]


A release sequence begins with a release operation A on an atomic object M and is the maximal contiguous sub-sequence of side effects in the modification order of M, where the first operation is A and every subsequent operation either is performed by the same work-item or host thread that performed the release or is an atomic read-modify-write operation. [C11 standard, Section, paragraph 10, modified.]
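The definition of a release sequence can be sketched as a scan over the modification order of M. This is a minimal model under stated assumptions, not an implementation of any OpenCL API: the `SideEffect` record, its fields, and the function `release_sequence` are hypothetical, and the caller is assumed to pass the index of an operation that really is a release operation.

```python
from typing import NamedTuple


class SideEffect(NamedTuple):
    name: str
    unit: str     # work-item or host thread that performed the operation
    is_rmw: bool  # True for an atomic read-modify-write operation


def release_sequence(modification_order, head_index):
    """Return the release sequence headed by modification_order[head_index].

    Assumes the head is a release operation. The sequence continues while
    each following side effect is by the same unit as the head or is an
    atomic read-modify-write, and stops at the first one that is neither.
    """
    head = modification_order[head_index]
    sequence = [head]
    for effect in modification_order[head_index + 1:]:
        if effect.unit == head.unit or effect.is_rmw:
            sequence.append(effect)
        else:
            break  # contiguity is broken; later effects are excluded
    return sequence


# Hypothetical modification order of one atomic object M.
modification_order = [
    SideEffect("A", "W0", False),  # release store by work-item W0 (the head)
    SideEffect("B", "W1", True),   # read-modify-write by another work-item
    SideEffect("C", "W0", False),  # store by the releasing work-item
    SideEffect("D", "W2", False),  # plain store by a third work-item
    SideEffect("E", "W0", False),  # by W0 again, but after the break
]
names = [e.name for e in release_sequence(modification_order, 0)]
```

In this example `D` ends the sequence, so `E` is excluded even though it is performed by the releasing work-item: the sub-sequence must be contiguous.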


OpenCL’s local and global memories are disjoint. Kernels may access both kinds of memory while host threads may only access global memory. Furthermore, the flags argument of OpenCL’s work_group_barrier function specifies which memory operations the function will make visible: these memory operations can be, for example, just the ones to local memory, or the ones to global memory, or both. Since the visibility of memory operations can be specified for local memory separately from global memory, we define two related but independent relations, global-synchronizes-with and local-synchronizes-with. Certain operations on global memory may global-synchronize-with other operations performed by another work-item or host thread. An example is a release atomic operation in one work-item that global-synchronizes-with an acquire atomic operation in a second work-item. Similarly, certain atomic operations on local objects in kernels can local-synchronize-with other atomic operations on those local objects. [C11 standard, Section, paragraph 11, modified.]


We define two separate happens-before relations: global-happens-before and local-happens-before.


A global memory action A global-happens-before a global memory action B if


  • A is sequenced before B, or

  • A global-synchronizes-with B, or

  • For some global memory action C, A global-happens-before C and C global-happens-before B.

A local memory action A local-happens-before a local memory action B if


  • A is sequenced before B, or

  • A local-synchronizes-with B, or

  • For some local memory action C, A local-happens-before C and C local-happens-before B.

An OpenCL 2.x implementation shall ensure that no program execution demonstrates a cycle in either the local-happens-before relation or the global-happens-before relation.

The global- and local-happens-before relations are critical to defining what values are read and when data races occur. The global-happens-before relation, for example, defines what global memory operations definitely happen before what other global memory operations. If an operation A global-happens-before operation B then A must occur before B; in particular, any write done by A will be visible to B. The local-happens-before relation has similar properties for local memory. Programmers can use the local- and global-happens-before relations to reason about the order of program actions.
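To make this kind of reasoning concrete, the three-clause definition of global-happens-before can be read as the transitive closure of sequenced-before combined with global-synchronizes-with. The sketch below models that reading; it is an illustration under assumptions, not normative text. The action labels and the helpers `transitive_closure`, `happens_before`, and `has_cycle` are hypothetical, and every action in the example is assumed to be a global memory action. The same code models local-happens-before if it is given local actions and the local-synchronizes-with pairs.

```python
def transitive_closure(pairs):
    """Return the smallest transitive relation that contains `pairs`."""
    closure = set(pairs)
    while True:
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


def happens_before(sequenced_before, synchronizes_with):
    """Close the union of the two relations under transitivity."""
    return transitive_closure(set(sequenced_before) | set(synchronizes_with))


def has_cycle(closed_relation):
    """A transitively closed relation has a cycle iff some action precedes itself."""
    return any(a == b for (a, b) in closed_relation)


# Hypothetical message-passing pattern between work-items W0 and W1.
sequenced_before = {
    ("W0: store data", "W0: release flag"),
    ("W1: acquire flag", "W1: load data"),
}
global_synchronizes_with = {
    ("W0: release flag", "W1: acquire flag"),
}
global_hb = happens_before(sequenced_before, global_synchronizes_with)

# The store to data is ordered before the load of data in the other work-item.
print(("W0: store data", "W1: load data") in global_hb)
print(has_cycle(global_hb))
```

The pair linking the store in W0 to the load in W1 appears only through the third clause of the definition, which is why the closure step is needed.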


A visible side effect A on a global object M with respect to a value computation B of M satisfies the conditions:


  • A global-happens-before B, and

  • there is no other side effect X to M such that A global-happens-before X and X global-happens-before B.

We define visible side effects for local objects M similarly. The value of a non-atomic scalar object M, as determined by evaluation B, shall be the value stored by the visible side effect A. [C11 standard, Section, paragraph 19, modified.]
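The two conditions for a visible side effect translate directly into a filter over the side effects on M. The following is a minimal sketch, assuming the happens-before relation is supplied as an already transitively closed set of pairs; the function name `visible_side_effects` and the action names are hypothetical.

```python
def visible_side_effects(side_effects, happens_before, read):
    """Return the side effects on M that are visible to the value computation `read`.

    `happens_before` must be a transitively closed set of (earlier, later) pairs.
    """
    visible = []
    for a in side_effects:
        if (a, read) not in happens_before:
            continue  # condition 1 fails: a does not happen before the read
        hidden = any(
            (a, x) in happens_before and (x, read) in happens_before
            for x in side_effects
            if x != a
        )
        if not hidden:  # condition 2: no other side effect lies in between
            visible.append(a)
    return visible


# Hypothetical execution: w1 happens before w2, which happens before read r.
side_effects = ["w1", "w2"]
happens_before = {("w1", "w2"), ("w2", "r"), ("w1", "r")}
result = visible_side_effects(side_effects, happens_before, "r")
```

Here `w1` is hidden by `w2`, so the read can only observe the value stored by `w2`.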


The execution of a program contains a data race if it contains two conflicting actions A and B in different units of execution, and


  • (1) at least one of A or B is not atomic, or A and B do not have inclusive memory scope, and

  • (2) the actions are global actions unordered by the global-happens-before relation or are local actions unordered by the local-happens-before relation.

Any such data race results in undefined behavior. [C11 standard, Section, paragraph 25, modified.]
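The data race definition above can be expressed as a predicate over a pair of actions. This sketch is illustrative only: the `Action` record and the functions `conflict` and `is_data_race` are hypothetical, inclusive memory scope is reduced to a boolean supplied by the caller, and `happens_before` is assumed to be the transitively closed relation for the memory region (global or local) that both actions access.

```python
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    unit: str        # work-item or host thread performing the action
    location: str    # memory location accessed
    is_write: bool
    is_atomic: bool


def conflict(a, b):
    """Two actions conflict if they touch the same location and one modifies it."""
    return a.location == b.location and (a.is_write or b.is_write)


def is_data_race(a, b, happens_before, inclusive_scope=True):
    """Apply conditions (1) and (2) to two actions in different units of execution."""
    if a.unit == b.unit or not conflict(a, b):
        return False
    # (1) at least one action is non-atomic, or the scopes are not inclusive
    condition_1 = not (a.is_atomic and b.is_atomic) or not inclusive_scope
    # (2) the actions are unordered by the relevant happens-before relation
    ordered = (a.name, b.name) in happens_before or (b.name, a.name) in happens_before
    return condition_1 and not ordered


# Hypothetical non-atomic write and read of the same location in two work-items.
w = Action("w", "W0", "data", True, False)
r = Action("r", "W1", "data", False, False)

print(is_data_race(w, r, happens_before=set()))         # unordered accesses
print(is_data_race(w, r, happens_before={("w", "r")}))  # ordered accesses
```

Two atomic actions with inclusive scope never satisfy condition (1), so in this model they do not race even when unordered.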


We also define the visible sequence of side effects on local and global atomic objects. The remaining paragraphs of this subsection define this sequence for a global atomic object M; the visible sequence of side effects for a local atomic object is defined similarly by using the local-happens-before relation.


The visible sequence of side effects on a global atomic object M, with respect to a value computation B of M, is a maximal contiguous sub-sequence of side effects in the modification order of M, where the first side effect is visible with respect to B, and for every side effect, it is not the case that B global-happens-before it. The value of M, as determined by evaluation B, shall be the value stored by some operation in the visible sequence of M with respect to B. [C11 standard, Section, paragraph 22, modified.]
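A small model may help when reading this definition. The sketch below computes a visible sequence from a modification order and a transitively closed happens-before relation. The function name `visible_sequence` and all action names are hypothetical. As a simplifying choice of this sketch, if several side effects are visible it starts from the latest one in the modification order, and it assumes at least one visible side effect exists.

```python
def visible_sequence(modification_order, happens_before, read):
    """Return the side effects whose values the read may observe, in modification order.

    `happens_before` must be a transitively closed set of (earlier, later) pairs.
    """
    def is_visible(a):
        if (a, read) not in happens_before:
            return False
        return not any(
            (a, x) in happens_before and (x, read) in happens_before
            for x in modification_order
            if x != a
        )

    starts = [i for i, a in enumerate(modification_order) if is_visible(a)]
    start = starts[-1]  # sketch assumption: begin at the latest visible side effect
    sequence = []
    for x in modification_order[start:]:
        if (read, x) in happens_before:
            break  # the read happens before x, so x and all later effects are excluded
        sequence.append(x)
    return sequence


# Hypothetical execution: w0 -> w1 -> r -> w3 are ordered; w2 is unordered with r.
modification_order = ["w0", "w1", "w2", "w3"]
happens_before = {
    ("w0", "w1"), ("w1", "r"), ("w0", "r"),
    ("r", "w3"), ("w0", "w3"), ("w1", "w3"),
}
sequence = visible_sequence(modification_order, happens_before, "r")
```

In this example the read may return the value stored by `w1` or by `w2`: `w0` is hidden by `w1`, and `w3` is excluded because the read happens before it.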


If an operation A that modifies an atomic object M global-happens-before an operation B that modifies M, then A shall be earlier than B in the modification order of M. This requirement is known as write-write coherence.


If a value computation A of an atomic object M global-happens-before a value computation B of M, and A takes its value from a side effect X on M, then the value computed by B shall either equal the value stored by X, or be the value stored by a side effect Y on M, where Y follows X in the modification order of M. This requirement is known as read-read coherence. [C11 standard, Section, paragraph 22, modified.]


If a value computation A of an atomic object M global-happens-before an operation B on M, then A shall take its value from a side effect X on M, where X precedes B in the modification order of M. This requirement is known as read-write coherence.


If a side effect X on an atomic object M global-happens-before a value computation B of M, then the evaluation B shall take its value from X or from a side effect Y that follows X in the modification order of M. This requirement is known as write-read coherence.
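The four coherence requirements can be checked mechanically on a small candidate execution. The sketch below is an illustrative model, not a conformance test: the function `coherence_violations`, the `reads_from` mapping (each value computation mapped to the side effect it takes its value from), and the action names are all hypothetical, and `happens_before` is assumed to contain only pairs of operations on the same atomic object M.

```python
def coherence_violations(modification_order, reads_from, happens_before):
    """Return the names of the coherence requirements this execution violates."""
    position = {w: i for i, w in enumerate(modification_order)}
    violations = set()
    for a, b in happens_before:
        a_writes, b_writes = a in position, b in position
        a_reads, b_reads = a in reads_from, b in reads_from
        # write-write: A must be earlier than B in the modification order
        if a_writes and b_writes and not position[a] < position[b]:
            violations.add("write-write")
        # read-read: B must not read from an earlier side effect than A did
        if a_reads and b_reads and position[reads_from[b]] < position[reads_from[a]]:
            violations.add("read-read")
        # read-write: A must read from a side effect that precedes B
        if a_reads and b_writes and not position[reads_from[a]] < position[b]:
            violations.add("read-write")
        # write-read: B must read from X or from a later side effect
        if a_writes and b_reads and position[reads_from[b]] < position[a]:
            violations.add("write-read")
    return violations


# Hypothetical execution: two writes to M and two ordered reads of M.
modification_order = ["w1", "w2"]
happens_before = {("r1", "r2")}

allowed = coherence_violations(modification_order, {"r1": "w1", "r2": "w2"}, happens_before)
forbidden = coherence_violations(modification_order, {"r1": "w2", "r2": "w1"}, happens_before)
```

The second candidate is rejected because the later read `r2` would observe an older value than the earlier read `r1`, which read-read coherence forbids.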






