OpenCL™ Specification 3.3.7.1. Atomic Operations

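This article explains the atomic operations of OpenCL 2.x and their memory ordering rules in detail, including the different memory_order levels, global and local synchronization, and the sequential consistency guarantee of memory_order_seq_cst. It also discusses how to ensure data consistency and the visibility requirements between the host program and kernels.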
3.3.7.1. Atomic Operations

This and following sections describe how different program actions in kernel C code and the host program contribute to the local- and global-happens-before relations. This section discusses ordering rules for OpenCL 2.x atomic operations.

Device-side enqueue defines the enumerated type memory_order.

  • For memory_order_relaxed, no operation orders memory.

  • For memory_order_release, memory_order_acq_rel, and memory_order_seq_cst, a store operation performs a release operation on the affected memory location.

  • For memory_order_acquire, memory_order_acq_rel, and memory_order_seq_cst, a load operation performs an acquire operation on the affected memory location. [C11 standard, Section 7.17.3, paragraphs 2-4, modified.]
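
The three bullets above can be restated as a small lookup. The sketch below uses the C11 memory_order enumeration from <stdatomic.h>, which the cited C11 paragraphs describe, as a host-side stand-in for the OpenCL enumeration. The helper names store_is_release and load_is_acquire are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: does the store half of an operation with this
 * order perform a release operation on the affected location? */
static bool store_is_release(memory_order order)
{
    return order == memory_order_release ||
           order == memory_order_acq_rel ||
           order == memory_order_seq_cst;
}

/* Hypothetical helper: does the load half of an operation with this
 * order perform an acquire operation on the affected location? */
static bool load_is_acquire(memory_order order)
{
    return order == memory_order_acquire ||
           order == memory_order_acq_rel ||
           order == memory_order_seq_cst;
}
```

In C11, memory_order_acq_rel is only meaningful for read-modify-write operations, which contain both a load and a store, which is why it appears in both helpers.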

Certain built-in functions synchronize with other built-in functions performed by another unit of execution. This is true for pairs of release and acquire operations under specific circumstances. An atomic operation A that performs a release operation on a global object M global-synchronizes-with an atomic operation B that performs an acquire operation on M and reads a value written by any side effect in the release sequence headed by A. A similar rule holds for atomic operations on objects in local memory: an atomic operation A that performs a release operation on a local object M local-synchronizes-with an atomic operation B that performs an acquire operation on M and reads a value written by any side effect in the release sequence headed by A. [C11 standard, Section 5.1.2.4, paragraph 11, modified.]
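
A minimal message-passing sketch of this release/acquire pairing follows. It is a host-side C11 analog using pthreads, not OpenCL kernel code: plain C11 has a single synchronizes-with relation, whereas OpenCL splits it into global and local variants. All names (payload, ready, run_message_passing) are illustrative assumptions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* plain, non-atomic data */
static atomic_int ready;   /* plays the role of the atomic object M */

/* Operation A: publish the payload with a release store on M. */
static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Operation B: acquire loads on M until one reads the value written by A. */
static void *consumer(void *arg)
{
    int *out = arg;
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        /* spin until the release store becomes visible */
    }
    *out = payload;   /* ordered after the producer's write to payload */
    return NULL;
}

/* Runs one producer and one consumer; returns what the consumer read. */
static int run_message_passing(void)
{
    pthread_t p, c;
    int seen = 0;
    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Because the acquire load reads the value written by the release store, the write to payload happens before the consumer's read, so the consumer cannot observe the stale value 0.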

Atomic operations specifying memory_order_relaxed are relaxed only with respect to memory ordering. Implementations must still guarantee that any given atomic access to a particular atomic object be indivisible with respect to all other atomic accesses to that object.
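
The indivisibility guarantee can be illustrated with a shared counter. The sketch below is a host-side C11 analog with pthreads; the thread count, iteration count, and function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

static atomic_long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; ++i) {
        /* Relaxed: imposes no ordering on other memory accesses,
         * but the increment itself is still indivisible. */
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final count. */
static long run_relaxed_counter(void)
{
    pthread_t threads[NUM_THREADS];
    atomic_store(&counter, 0);
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);
    return atomic_load(&counter);
}
```

No increment is lost even though every operation is relaxed. What relaxed ordering gives up is any guarantee about when other, non-atomic writes made by the workers become visible.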

There shall exist a single total order S for all memory_order_seq_cst operations that is consistent with the modification orders for all affected locations, as well as the appropriate global-happens-before and local-happens-before orders for those locations, such that each memory_order_seq_cst operation B that loads a value from an atomic object M in global or local memory observes one of the following values:

  • the result of the last modification A of M that precedes B in S, if it exists, or

  • if A exists, the result of some modification of M in the visible sequence of side effects with respect to B that is not memory_order_seq_cst and that does not happen before A, or

  • if A does not exist, the result of some modification of M in the visible sequence of side effects with respect to B that is not memory_order_seq_cst. [C11 standard, Section 7.17.3, paragraph 6, modified.]
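
One way to see what the single total order S provides is the store-buffering litmus test, sketched here as a host-side C11 analog with pthreads (not OpenCL kernel code; all names are illustrative). Each thread stores to one variable and then loads the other. Since all four operations are memory_order_seq_cst, whichever store comes first in S must be observed by the other thread's load, so both loads returning 0 is ruled out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;
static int r1, r2;   /* each written by one thread, read after join */

static void *thread_1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    r1 = atomic_load_explicit(&y, memory_order_seq_cst);
    return NULL;
}

static void *thread_2(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_seq_cst);
    return NULL;
}

/* Returns the number of trials in which both loads returned 0. */
static int count_forbidden_outcomes(int trials)
{
    int forbidden = 0;
    for (int i = 0; i < trials; ++i) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_create(&a, NULL, thread_1, NULL);
        pthread_create(&b, NULL, thread_2, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            ++forbidden;
    }
    return forbidden;
}
```

A run that reports zero forbidden outcomes does not prove anything by itself; the guarantee comes from the ordering rules, and the loop only gives a violation a chance to show up.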

Let X and Y be two memory_order_seq_cst operations. If X local-synchronizes-with or global-synchronizes-with Y then X both local-synchronizes-with Y and global-synchronizes-with Y.

If the total order S exists, the following rules hold:

  • For an atomic operation B that reads the value of an atomic object M, if there is a memory_order_seq_cst fence X sequenced-before B, then B observes either the last memory_order_seq_cst modification of M preceding X in the total order S or a later modification of M in its modification order. [C11 standard, Section 7.17.3, paragraph 9.]

  • For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there is a memory_order_seq_cst fence X such that A is sequenced-before X and B follows X in S, then B observes either the effects of A or a later modification of M in its modification order. [C11 standard, Section 7.17.3, paragraph 10.]

  • For atomic operations A and B on an atomic object M, where A modifies M and B takes its value, if there are memory_order_seq_cst fences X and Y such that A is sequenced-before X, Y is sequenced-before B, and X precedes Y in S, then B observes either the effects of A or a later modification of M in its modification order. [C11 standard, Section 7.17.3, paragraph 11.]

  • For atomic operations A and B on an atomic object M, if there are memory_order_seq_cst fences X and Y such that A is sequenced-before X, Y is sequenced-before B, and X precedes Y in S, then B occurs later than A in the modification order of M.
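
The two-fence rule (paragraph 11 above) can be sketched with relaxed accesses separated by memory_order_seq_cst fences. This is a host-side C11 analog using atomic_thread_fence and pthreads; in OpenCL C the corresponding fence would be atomic_work_item_fence, which takes additional flags and scope arguments. The struct and function names are hypothetical. Each thread performs a store A, then a fence, then a load B of the other thread's flag. The two fences are ordered in S, so the thread whose fence comes second must observe the other thread's store.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* One side of the test: the flag it sets, the flag it reads, what it saw. */
struct side {
    atomic_int *mine;
    atomic_int *other;
    int seen;
};

static void *store_fence_load(void *arg)
{
    struct side *s = arg;
    atomic_store_explicit(s->mine, 1, memory_order_relaxed);         /* A */
    atomic_thread_fence(memory_order_seq_cst);                       /* X or Y */
    s->seen = atomic_load_explicit(s->other, memory_order_relaxed);  /* B */
    return NULL;
}

/* Returns the number of trials in which neither side saw the other's store. */
static int count_both_zero(int trials)
{
    static atomic_int flag_a, flag_b;
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        struct side left = { &flag_a, &flag_b, -1 };
        struct side right = { &flag_b, &flag_a, -1 };
        pthread_t t1, t2;
        atomic_store(&flag_a, 0);
        atomic_store(&flag_b, 0);
        pthread_create(&t1, NULL, store_fence_load, &left);
        pthread_create(&t2, NULL, store_fence_load, &right);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        if (left.seen == 0 && right.seen == 0)
            ++both_zero;
    }
    return both_zero;
}
```

The fences order only this specific store-then-load pattern; they do not make the relaxed accesses sequentially consistent in general.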

memory_order_seq_cst ensures sequential consistency only for a program that is (1) free of data races, and (2) exclusively uses memory_order_seq_cst synchronization operations. Any use of weaker ordering will invalidate this guarantee unless extreme care is used. In particular, memory_order_seq_cst fences ensure a total order only for the fences themselves. Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering specifications.

Atomic read-modify-write operations should always read the last value (in the modification order) stored before the write associated with the read-modify-write operation. [C11 standard, Section 7.17.3, paragraph 12.]
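
The read half of a read-modify-write operation can be made visible with two small helpers. This is a C11 sketch with hypothetical function names. The compare-exchange loop is written out only to show the read and the write as separate steps; OpenCL C also provides atomic_fetch_max for the same purpose.

```c
#include <assert.h>
#include <stdatomic.h>

/* Adds delta and returns the value that the read-modify-write read. */
static int add_and_return_previous(atomic_int *obj, int delta)
{
    return atomic_fetch_add_explicit(obj, delta, memory_order_relaxed);
}

/* Hypothetical helper: raise *obj to at least candidate.
 * Returns the last value observed: either the value that was replaced,
 * or a value that was already greater than or equal to candidate. */
static int atomic_max_sketch(atomic_int *obj, int candidate)
{
    int observed = atomic_load_explicit(obj, memory_order_relaxed);
    while (observed < candidate &&
           !atomic_compare_exchange_weak_explicit(obj, &observed, candidate,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* On failure, observed now holds the current value; retry. */
    }
    return observed;
}
```

The compare-exchange succeeds only if the object still holds the value that was read, which is how the loop ensures the write is applied to the latest value in the modification order.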

Implementations should ensure that no "out-of-thin-air" values are computed that circularly depend on their own computation.

Note: Under the rules described above, and independently of the previously footnoted C++ issue, it is known that x == y == 42 is a valid final state in the following problematic example:

global atomic_int x = ATOMIC_VAR_INIT(0);
local atomic_int y = ATOMIC_VAR_INIT(0);

unit_of_execution_1:
... [execution not reading or writing x or y, leading up to:]
int t = atomic_load_explicit(&y, memory_order_acquire);
atomic_store_explicit(&x, t, memory_order_release);

unit_of_execution_2:
... [execution not reading or writing x or y, leading up to:]
int t = atomic_load_explicit(&x, memory_order_acquire);
atomic_store_explicit(&y, t, memory_order_release);

This is not useful behavior and implementations should not exploit this phenomenon. It should be expected that in the future this may be disallowed by appropriate updates to the memory model description by the OpenCL committee.

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time. [C11 standard, Section 7.17.3, paragraph 16.]

As long as the following conditions are met, a host program sharing SVM memory with a kernel executing on one or more OpenCL 2.x devices may use atomic and synchronization operations to ensure that its assignments, and those of the kernel, are visible to each other:

1. Either fine-grained buffer or fine-grained system SVM must be used to share memory. While coarse-grained buffer SVM allocations may support atomic operations, visibility on these allocations is not guaranteed except at map and unmap operations.

2. The optional OpenCL 2.x SVM atomic-controlled visibility specified by provision of the CL_MEM_SVM_ATOMICS flag must be supported by the device and the flag provided to the SVM buffer on allocation.

3. The host atomic and synchronization operations must be compatible with those of an OpenCL kernel language. This requires that the size and representation of the data types that the host atomic operations act on be consistent with the OpenCL kernel language atomic types.

If these conditions are met, the host operations will apply at all_svm_devices scope.
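
Condition 3 can be partly checked on the host before any memory is shared. The sketch below uses only C11 and rests on two labeled assumptions. First, the kernel side uses atomic_int, whose underlying OpenCL C int is 32 bits wide. Second, the host atomic must be lock-free, because a lock-based implementation would keep state the device knows nothing about. The function name is hypothetical. Conditions 1 and 2 concern the allocation itself, which would pass CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS to clSVMAlloc, and they are not covered here.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check that the host's atomic_int can alias a kernel-side
 * atomic_int placed in shared SVM memory. */
static bool host_atomic_int_matches_kernel(void)
{
    return sizeof(atomic_int) == sizeof(int32_t)  /* no padding or embedded lock */
        && sizeof(int) * CHAR_BIT == 32           /* host int is 32 bits wide */
        && ATOMIC_INT_LOCK_FREE == 2;             /* always lock-free (assumption) */
}
```

A host program could call this once at startup and fall back to explicit map and unmap synchronization if it returns false.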
