OpenCL™ Specification: Atomic Operations

This and following sections describe how different program actions in kernel C code and the host program contribute to the local- and global-happens-before relations. This section discusses ordering rules for OpenCL 2.x atomic operations.


Device-side enqueue defines the enumerated type memory_order.


  • For memory_order_relaxed, no operation orders memory.

  • For memory_order_release, memory_order_acq_rel, and memory_order_seq_cst, a store operation performs a release operation on the affected memory location.

  • For memory_order_acquire, memory_order_acq_rel, and memory_order_seq_cst, a load operation performs an acquire operation on the affected memory location. [C11 standard, Section 7.17.3, paragraphs 2-4, modified.]

Certain built-in functions synchronize with other built-in functions performed by another unit of execution. This is true for pairs of release and acquire operations under specific circumstances. An atomic operation A that performs a release operation on a global object M global-synchronizes-with an atomic operation B that performs an acquire operation on M and reads a value written by any side effect in the release sequence headed by A. A similar rule holds for atomic operations on objects in local memory: an atomic operation A that performs a release operation on a local object M local-synchronizes-with an atomic operation B that performs an acquire operation on M and reads a value written by any side effect in the release sequence headed by A. [C11 standard, Section, paragraph 11, modified.]
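The release/acquire pairing described above can be sketched on the host side with C11 atomics, which the OpenCL rules are modeled on. This is a minimal illustration rather than OpenCL C kernel code: it assumes POSIX threads stand in for units of execution, and the names `mp_payload`, `mp_ready`, and `run_message_passing` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int mp_payload;        /* ordinary, non-atomic data */
static atomic_int mp_ready;   /* flag published with release, read with acquire */

static void *mp_producer(void *arg) {
    (void)arg;
    mp_payload = 42;                                            /* plain write */
    atomic_store_explicit(&mp_ready, 1, memory_order_release);  /* operation A */
    return NULL;
}

static void *mp_consumer(void *arg) {
    int *out = arg;
    /* Operation B: spin until the acquire load reads the value written by A. */
    while (atomic_load_explicit(&mp_ready, memory_order_acquire) == 0) {
    }
    *out = mp_payload;   /* A synchronizes-with B, so the plain write is visible */
    return NULL;
}

/* Hypothetical helper: runs one producer and one consumer and
 * returns the payload value the consumer observed. */
int run_message_passing(void) {
    pthread_t producer, consumer;
    int seen = -1;

    mp_payload = 0;
    atomic_store(&mp_ready, 0);

    int rc1 = pthread_create(&consumer, NULL, mp_consumer, &seen);
    int rc2 = pthread_create(&producer, NULL, mp_producer, NULL);
    assert(rc1 == 0 && rc2 == 0);
    (void)rc1;
    (void)rc2;

    pthread_join(producer, NULL);
    pthread_join(consumer, NULL);
    return seen;
}
```

If either the store or the load were weakened to memory_order_relaxed, nothing would order the plain write to `mp_payload` before the consumer's read, and the read would become a data race.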


Atomic operations specifying memory_order_relaxed are relaxed only with respect to memory ordering. Implementations must still guarantee that any given atomic access to a particular atomic object be indivisible with respect to all other atomic accesses to that object.
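As a rough illustration of this indivisibility guarantee, the following C11 sketch has several host threads increment one counter with memory_order_relaxed. It assumes POSIX threads; `NUM_WORKERS`, `INCREMENTS`, and `run_relaxed_counter` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { NUM_WORKERS = 4, INCREMENTS = 100000 };

static atomic_int rc_counter;

static void *rc_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; ++i) {
        /* Relaxed: no ordering with other memory, but the increment itself
         * is still one indivisible read-modify-write. */
        atomic_fetch_add_explicit(&rc_counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Hypothetical helper: returns the final counter value after all workers finish. */
int run_relaxed_counter(void) {
    pthread_t workers[NUM_WORKERS];

    atomic_store(&rc_counter, 0);
    for (int i = 0; i < NUM_WORKERS; ++i) {
        int rc = pthread_create(&workers[i], NULL, rc_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_WORKERS; ++i) {
        pthread_join(workers[i], NULL);   /* join orders the final load after all increments */
    }
    return atomic_load(&rc_counter);
}
```

No increment is lost even though no ordering is requested. With a plain `int` and `counter++` the same loop would be a data race and could lose updates.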


There shall exist a single total order S for all memory_order_seq_cst operations that is consistent with the modification orders for all affected locations, as well as the appropriate global-happens-before and local-happens-before orders for those locations, such that each memory_order_seq_cst operation B that loads a value from an atomic object M in global or local memory observes one of the following values:

  • the result of the last modification A of M that precedes B in S, if it exists, or

  • if A exists, the result of some modification of M in the visible sequence of side effects with respect to B that is not memory_order_seq_cst and that does not happen before A, or

  • if A does not exist, the result of some modification of M in the visible sequence of side effects with respect to B that is not memory_order_seq_cst. [C11 standard, Section 7.17.3, paragraph 6, modified.]

Let X and Y be two memory_order_seq_cst operations. If X local-synchronizes-with or global-synchronizes-with Y then X both local-synchronizes-with Y and global-synchronizes-with Y.


If the total order S exists, the following rules hold:


  • For an atomic operation B that reads the value of an atomic object M, if there is a memory_order_seq_cst fence X sequenced-before B, then B observes either the last memory_order_seq_cst modification of M preceding X in the total order S or a later modification of M in its modification order. [C11 standard, Section 7.17.3, paragraph 9.]

  • For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there is a memory_order_seq_cst fence X such that A is sequenced-before X and B follows X in S, then B observes either the effects of A or a later modification of M in its modification order. [C11 standard, Section 7.17.3, paragraph 10.]

  • For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there are memory_order_seq_cst fences X and Y such that A is sequenced-before X, Y is sequenced-before B, and X precedes Y in S, then B observes either the effects of A or a later modification of M in its modification order. [C11 standard, Section 7.17.3, paragraph 11.]

  • For atomic operations A and B on an atomic object M, if there are memory_order_seq_cst fences X and Y such that A is sequenced-before X, Y is sequenced-before B, and X precedes Y in S, then B occurs later than A in the modification order of M.
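The two-fence rule (fences X and Y, with A sequenced-before X and Y sequenced-before B) can be illustrated with the classic store-buffering test, sketched here in host-side C11 with POSIX threads. All names (`sb_x`, `sb_y`, `sb_saw_both_zero`) are hypothetical. The fences are totally ordered in S, so whichever fence comes second forces the load after it to observe the store made before the other fence. Both loads reading 0 is therefore not allowed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int sb_x, sb_y;   /* the two atomic objects */
static int sb_r1, sb_r2;        /* value each thread observed */

static void *sb_thread_1(void *arg) {
    (void)arg;
    atomic_store_explicit(&sb_x, 1, memory_order_relaxed);      /* A: modifies x */
    atomic_thread_fence(memory_order_seq_cst);                  /* fence X */
    sb_r1 = atomic_load_explicit(&sb_y, memory_order_relaxed);
    return NULL;
}

static void *sb_thread_2(void *arg) {
    (void)arg;
    atomic_store_explicit(&sb_y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);                  /* fence Y */
    sb_r2 = atomic_load_explicit(&sb_x, memory_order_relaxed);  /* B: reads x */
    return NULL;
}

/* Hypothetical helper: returns 1 if any trial saw both loads read 0
 * (the outcome the fence rule forbids), otherwise 0. */
int sb_saw_both_zero(int trials) {
    for (int i = 0; i < trials; ++i) {
        pthread_t t1, t2;

        atomic_store(&sb_x, 0);
        atomic_store(&sb_y, 0);

        int rc1 = pthread_create(&t1, NULL, sb_thread_1, NULL);
        int rc2 = pthread_create(&t2, NULL, sb_thread_2, NULL);
        assert(rc1 == 0 && rc2 == 0);
        (void)rc1;
        (void)rc2;

        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (sb_r1 == 0 && sb_r2 == 0) {
            return 1;
        }
    }
    return 0;
}
```

A finite number of trials cannot prove the rule; it can only fail to find a counterexample. Removing the two fences makes the both-zero outcome permitted, and it is commonly observed on hardware with store buffers.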

memory_order_seq_cst ensures sequential consistency only for a program that is (1) free of data races, and (2) exclusively uses memory_order_seq_cst synchronization operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences ensure a total order only for the fences themselves. Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications.


Atomic read-modify-write operations should always read the last value (in the modification order) stored before the write associated with the read-modify-write operation. [C11 standard, Section 7.17.3, paragraph 12.]
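A compare-and-swap loop relies on exactly this property: a successful exchange means the value it compared against was the one immediately preceding its own write in the modification order. The sketch below is host-side C11, and `atomic_max_relaxed` is a hypothetical helper written for this example, not a library or OpenCL built-in.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: atomically raises *obj to at least `value`
 * and returns the value that *obj held just before this call took effect. */
int atomic_max_relaxed(atomic_int *obj, int value) {
    assert(obj != NULL);

    int old = atomic_load_explicit(obj, memory_order_relaxed);
    while (old < value &&
           !atomic_compare_exchange_weak_explicit(obj, &old, value,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* On failure (including spurious failure of the weak form),
         * `old` now holds the current value; the loop re-checks and retries. */
    }
    return old;
}
```

A short usage example: after `atomic_int m; atomic_init(&m, 5);`, calling `atomic_max_relaxed(&m, 9)` returns 5 and leaves 9 in `m`, while a following `atomic_max_relaxed(&m, 3)` returns 9 and changes nothing.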


Implementations should ensure that no "out-of-thin-air" values are computed that circularly depend on their own computation.


Note: Under the rules described above, and independent of the previously footnoted C++ issue, it is known that x == y == 42 is a valid final state in the following problematic example:


global atomic_int x = ATOMIC_VAR_INIT(0);
local atomic_int y = ATOMIC_VAR_INIT(0);

unit_of_execution_1:
... [execution not reading or writing x or y, leading up to:]
int t = atomic_load_explicit(&y, memory_order_acquire);
atomic_store_explicit(&x, t, memory_order_release);

unit_of_execution_2:
... [execution not reading or writing x or y, leading up to:]
int t = atomic_load_explicit(&x, memory_order_acquire);
atomic_store_explicit(&y, t, memory_order_release);

This is not useful behavior and implementations should not exploit this phenomenon. It should be expected that in the future this may be disallowed by appropriate updates to the memory model description by the OpenCL committee.


Implementations should make atomic stores visible to atomic loads within a reasonable amount of time. [C11 standard, Section 7.17.3, paragraph 16.]


As long as the following conditions are met, a host program sharing SVM memory with a kernel executing on one or more OpenCL 2.x devices may use atomic and synchronization operations to ensure that its assignments, and those of the kernel, are visible to each other:

1. Either fine-grained buffer or fine-grained system SVM must be used to share memory. While coarse-grained buffer SVM allocations may support atomic operations, visibility on these allocations is not guaranteed except at map and unmap operations.


2. The optional OpenCL 2.x SVM atomic-controlled visibility specified by provision of the CL_MEM_SVM_ATOMICS flag must be supported by the device and the flag provided to the SVM buffer on allocation.

3. The host atomic and synchronization operations must be compatible with those of an OpenCL kernel language. This requires that the size and representation of the data types that the host atomic operations act on be consistent with the OpenCL kernel language atomic types.


If these conditions are met, the host operations will apply at all_svm_devices scope.
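The third condition can be partly checked on the host before any memory is shared. The following C11 sketch compares the host `atomic_int` against the 32-bit `int` that OpenCL C defines. The function names are hypothetical, and the lock-free check is an extra assumption of this example rather than a requirement stated above: a host atomic implemented with a hidden lock keeps state that a device could not see.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the host atomic_int has the same size and
 * alignment as a 32-bit integer (the OpenCL C int) and is always lock-free. */
int host_atomic_int_matches_opencl(void) {
    return sizeof(atomic_int) == sizeof(int32_t)
        && _Alignof(atomic_int) == _Alignof(int32_t)
        && ATOMIC_INT_LOCK_FREE == 2;   /* 2 means "always lock-free" */
}

/* Hypothetical guard a host program could call once at start-up,
 * before placing atomic_int objects in an SVM allocation. */
void require_compatible_host_atomics(void) {
    assert(host_atomic_int_matches_opencl());
}
```

This only covers size, alignment, and lock-freedom. It says nothing about whether the device supports fine-grained SVM or CL_MEM_SVM_ATOMICS, which must be queried through the OpenCL API.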




