OpenCL™ Specification 3.3.6. Overview of Atomic and Fence Operations

最新推荐文章于 2024-02-19 10:48:01 发布

꧁白杨树下꧂

3.3.6. Overview of atomic and fence operations

3.3.6.原子和栅栏操作概述

OpenCL 2.x has a number of synchronization operations that are used to define memory order constraints in a program. They play a special role in controlling how memory operations in one unit of execution (such as work-items or, when using SVM, a host thread) are made visible to another. There are two types of synchronization operations in OpenCL: atomic operations and fences.

OpenCL 2.x有许多同步操作，用于定义程序中的内存顺序约束。它们在控制一个执行单元（如工作项，或使用SVM时的主机线程）中的内存操作如何对另一个执行单元可见方面发挥着特殊作用。OpenCL中有两种类型的同步操作：原子操作和栅栏。

Atomic operations are indivisible. They either occur completely or not at all. These operations are used to order memory operations between units of execution and hence they are parameterized with the memory_order and memory_scope parameters defined by the OpenCL memory consistency model. The atomic operations for OpenCL kernel languages are similar to the corresponding operations defined by the C11 standard.

原子操作是不可分割的。它们要么完全发生，要么根本不发生。这些操作用于对执行单元之间的内存操作进行排序，因此使用OpenCL内存一致性模型定义的memory_order和memory_scope参数对它们进行参数化。OpenCL内核语言的原子操作类似于C11标准定义的相应操作。
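
To make "indivisible" concrete, here is a minimal sketch in host-side C11 rather than OpenCL C, relying on the statement above that the OpenCL atomics resemble their C11 counterparts. The names `counter`, `add_many`, and `run_counter_demo` are hypothetical, and POSIX threads stand in for work-items. C11 has no `memory_scope` parameter, so that part of the OpenCL interface is not represented here.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 10000

/* Shared counter; static storage means it starts at zero. */
static atomic_int counter;

/* Each thread increments the shared counter. Relaxed order is enough here
   because only the final total matters, not ordering against other memory. */
static void *add_many(void *unused)
{
    (void)unused;
    for (int i = 0; i < INCREMENTS_PER_THREAD; ++i)
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter value. */
int run_counter_demo(void)
{
    pthread_t threads[NUM_THREADS];

    atomic_store_explicit(&counter, 0, memory_order_relaxed);
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_create(&threads[i], NULL, add_many, NULL);
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);
    return atomic_load_explicit(&counter, memory_order_relaxed);
}
```

Because every increment either happens completely or not at all, no update is lost and the total is always the number of threads times the increments per thread. With a plain `int` and `counter++`, the same program would contain a data race.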

The OpenCL 2.x atomic operations apply to variables of an atomic type (a subset of those in the C11 standard) including atomic versions of the int, uint, long, ulong, float, double, half, intptr_t, uintptr_t, size_t, and ptrdiff_t types. However, support for some of these atomic types depends on support for the corresponding regular types.

OpenCL 2.x原子操作适用于原子类型的变量（C11标准中原子类型的子集），包括int、uint、long、ulong、float、double、half、intptr_t、uintptr_t、size_t和ptrdiff_t类型的原子版本。然而，对其中一些原子类型的支持取决于对相应的常规类型的支持。

An atomic operation on one or more memory locations is either an acquire operation, a release operation, or both an acquire and release operation. An atomic operation without an associated memory location is a fence and can be either an acquire fence, a release fence, or both an acquire and release fence. In addition, there are relaxed atomic operations, which do not have synchronization properties, and atomic read-modify-write operations, which have special characteristics. [C11 standard, Section 5.1.2.4, paragraph 5, modified.]

对一个或多个内存位置的原子操作是acquire操作、release操作，或者同时是acquire和release操作。没有关联内存位置的原子操作是一个栅栏，可以是acquire栅栏、release栅栏，也可以同时是acquire和release栅栏。此外，还有不具有同步属性的松弛（relaxed）原子操作，以及具有特殊特性的原子读-修改-写操作。[C11标准，第5.1.2.4节，第5段，有修改。]

The orders memory_order_acquire (used for reads), memory_order_release (used for writes), and memory_order_acq_rel (used for read-modify-write operations) are used for simple communication between units of execution using shared variables. Informally, executing a memory_order_release on an atomic object A makes all previous side effects visible to any unit of execution that later executes a memory_order_acquire on A. The orders memory_order_acquire, memory_order_release, and memory_order_acq_rel do not provide sequential consistency for race-free programs because they will not ensure that atomic stores followed by atomic loads become visible to other threads in that order.

顺序memory_order_acquire（用于读取）、memory_order_release（用于写入）和memory_order_acq_rel（用于读-修改-写操作）用于在执行单元之间通过共享变量进行简单通信。非正式地说，在原子对象A上执行memory_order_release，会使此前的所有副作用对随后在A上执行memory_order_acquire的任何执行单元可见。顺序memory_order_acquire、memory_order_release和memory_order_acq_rel不能为无竞争程序提供顺序一致性，因为它们不能确保原子存储及其后的原子加载按该顺序对其他线程可见。
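
The release/acquire pairing described above can be sketched as a message-passing pattern. This is again host-side C11 used as an analogue, not OpenCL kernel code. The names `payload`, `ready`, `producer`, `consumer`, and `run_message_passing` are hypothetical, and POSIX threads play the role of the units of execution.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* ordinary, non-atomic data */
static atomic_int ready;   /* atomic object A that guards the payload */

/* Writes the payload, then publishes it with a release store on `ready`. */
static void *producer(void *unused)
{
    (void)unused;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Spins until an acquire load observes the release store, then reads
   the payload, which is guaranteed to be visible at that point. */
static void *consumer(void *result)
{
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;  /* busy-wait */
    *(int *)result = payload;
    return NULL;
}

/* Runs one producer and one consumer and returns what the consumer saw. */
int run_message_passing(void)
{
    pthread_t prod, cons;
    int seen = -1;

    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    pthread_create(&cons, NULL, consumer, &seen);
    pthread_create(&prod, NULL, producer, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}
```

The write to `payload` is a "previous side effect" of the release store, so the consumer can read it without a data race once its acquire load returns 1.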

The fence operation is atomic_work_item_fence, which includes a memory_order argument as well as the memory_scope and cl_mem_fence_flags arguments. Depending on the memory_order argument, this operation:

栅栏操作是atomic_work_item_fence，它包括memory_order参数以及memory_scope和cl_mem_fence_flags参数。根据memory_order参数，此操作：

has no effects, if memory_order_relaxed;
如果memory_order_relaxed，则没有任何效果；
is an acquire fence, if memory_order_acquire;
是一个acquire栅栏，如果memory_order_acquire；
is a release fence, if memory_order_release;
是release栅栏，如果memory_order_release；
is both an acquire fence and a release fence, if memory_order_acq_rel;
既是acquire栅栏又是release栅栏，如果memory_order_acq_rel；
is a sequentially-consistent fence with both acquire and release semantics, if memory_order_seq_cst.
是一个具有acquire和release语义的顺序一致的栅栏，如果是memory_order_seq_cst。
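
As a rough illustration of acquire and release fences, the sketch below uses the C11 function `atomic_thread_fence` as a stand-in for `atomic_work_item_fence`. This is an assumption made so the example can run on a host: the C11 fence takes only a `memory_order`, whereas the OpenCL fence also takes the flags and scope arguments described above. The names `data`, `flag`, `writer`, `reader`, and `run_fence_demo` are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int data;          /* ordinary, non-atomic data */
static atomic_int flag;   /* accessed only with relaxed atomics */

/* The release fence orders the write to `data` before the relaxed store. */
static void *writer(void *unused)
{
    (void)unused;
    data = 7;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
    return NULL;
}

/* The acquire fence orders the relaxed load before the read of `data`. */
static void *reader(void *result)
{
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ;  /* busy-wait */
    atomic_thread_fence(memory_order_acquire);
    *(int *)result = data;
    return NULL;
}

/* Runs one writer and one reader and returns what the reader saw. */
int run_fence_demo(void)
{
    pthread_t w, r;
    int seen = -1;

    data = 0;
    atomic_store_explicit(&flag, 0, memory_order_relaxed);
    pthread_create(&r, NULL, reader, &seen);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Here the atomic accesses themselves are relaxed and carry no synchronization; the ordering comes entirely from the two fences.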

If specified, the cl_mem_fence_flags argument must be CLK_IMAGE_MEM_FENCE, CLK_GLOBAL_MEM_FENCE, CLK_LOCAL_MEM_FENCE, or CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE.

如果指定了cl_mem_fence_flags参数，则该参数必须是CLK_IMAGE_MEM_FENCE、CLK_GLOBAL_MEM_FENCE、CLK_LOCAL_MEM_FENCE或CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE。
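
The rule on permitted flag values can be written as a small predicate. This is only a sketch: the `DEMO_*` constants are stand-ins chosen for illustration (real kernels use the `CLK_*` constants provided by the OpenCL C implementation), and `fence_flags_valid` is a hypothetical helper, not an OpenCL function.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the CLK_*_MEM_FENCE constants. */
enum {
    DEMO_LOCAL_MEM_FENCE  = 0x1,
    DEMO_GLOBAL_MEM_FENCE = 0x2,
    DEMO_IMAGE_MEM_FENCE  = 0x4
};

/* Returns true only for the four combinations listed in the text above. */
bool fence_flags_valid(unsigned flags)
{
    return flags == DEMO_IMAGE_MEM_FENCE
        || flags == DEMO_GLOBAL_MEM_FENCE
        || flags == DEMO_LOCAL_MEM_FENCE
        || flags == (DEMO_GLOBAL_MEM_FENCE | DEMO_LOCAL_MEM_FENCE);
}
```

Under this reading, combining the image flag with either of the other two is rejected, as is an empty flag set.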

The atomic_work_item_fence(CLK_IMAGE_MEM_FENCE, …) built-in function must be used to make sure that sampler-less writes are visible to later reads by the same work-item. Without use of the atomic_work_item_fence function, write-read coherence on image objects is not guaranteed: if a work-item reads from an image to which it has previously written without an intervening atomic_work_item_fence, it is not guaranteed that those previous writes are visible to the work-item.

必须使用内置函数atomic_work_item_fence(CLK_IMAGE_MEM_FENCE, …)来确保无采样器写入对同一工作项之后的读取可见。如果不使用atomic_work_item_fence函数，就不能保证图像对象上的写-读一致性：如果工作项从它先前写入过的图像中读取，而中间没有atomic_work_item_fence，则不能保证先前的写入对该工作项可见。

The synchronization operations in OpenCL 2.x can be parameterized by a memory_scope. Memory scopes control the extent to which an atomic operation or fence is visible with respect to the memory model. These memory scopes may be used when performing atomic operations and fences on global memory and local memory. When used on global memory, visibility is bounded by the capabilities of that memory. When used on a fine-grained non-atomic SVM buffer, a coarse-grained SVM buffer, or a non-SVM buffer, operations parameterized with memory_scope_all_svm_devices will behave as if they were parameterized with memory_scope_device. When used on local memory, visibility is bounded by the work-group and, as a result, memory_scope with wider visibility than memory_scope_work_group will be reduced to memory_scope_work_group.

OpenCL 2.x中的同步操作可以通过memory_scope参数化。内存作用域控制原子操作或栅栏相对于内存模型可见的范围。当对全局内存和本地内存执行原子操作和栅栏时，可以使用这些内存作用域。在全局内存上使用时，可见性受该内存的能力限制。当在细粒度的非原子SVM缓冲区、粗粒度的SVM缓冲区或非SVM缓冲区上使用时，用memory_scope_all_svm_devices参数化的操作将表现得就像用memory_scope_device参数化的一样。在本地内存上使用时，可见性受工作组的限制，因此，可见性比memory_scope_work_group更宽的memory_scope将缩减为memory_scope_work_group。
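
The scope-reduction rules in this paragraph can be modelled as a small function. This is a host-side sketch only: the `demo_scope` and `demo_memory` enums and the `effective_scope` helper are hypothetical names introduced for illustration, and the model covers just the two reductions stated above.

```c
/* Scopes named in this section, ordered from narrowest to widest. */
typedef enum {
    DEMO_SCOPE_SUB_GROUP,
    DEMO_SCOPE_WORK_GROUP,
    DEMO_SCOPE_DEVICE,
    DEMO_SCOPE_ALL_SVM_DEVICES
} demo_scope;

/* Kinds of memory distinguished by the paragraph above. */
typedef enum {
    DEMO_MEM_LOCAL,
    DEMO_MEM_GLOBAL_NON_SVM,
    DEMO_MEM_SVM_COARSE,
    DEMO_MEM_SVM_FINE_NON_ATOMIC,
    DEMO_MEM_SVM_FINE_ATOMIC
} demo_memory;

/* Returns the scope an operation effectively has on the given memory. */
demo_scope effective_scope(demo_scope requested, demo_memory memory)
{
    if (memory == DEMO_MEM_LOCAL) {
        /* Local memory: anything wider than work-group is reduced. */
        return requested > DEMO_SCOPE_WORK_GROUP ? DEMO_SCOPE_WORK_GROUP
                                                 : requested;
    }
    if (memory != DEMO_MEM_SVM_FINE_ATOMIC &&
        requested == DEMO_SCOPE_ALL_SVM_DEVICES) {
        /* Without fine-grained atomic SVM, all_svm_devices acts as device. */
        return DEMO_SCOPE_DEVICE;
    }
    return requested;
}
```

For example, a request for the all-SVM-devices scope on a coarse-grained SVM buffer yields device scope, while the same request on local memory yields work-group scope.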

Two actions A and B are defined to have an inclusive scope if they have the same scope P such that:

如果两个动作A和B具有相同的作用域P，并且满足以下条件之一，则它们被定义为具有包含性作用域（inclusive scope）：

P is memory_scope_sub_group and A and B are executed by work-items within the same sub-group.
P是memory_scope_sub_group，A和B由同一子组内的工作项执行。
P is memory_scope_work_group and A and B are executed by work-items within the same work-group.
P是memory_scope_work_group，A和B由同一工作组内的工作项执行。
P is memory_scope_device and A and B are executed by work-items on the same device when A and B apply to an SVM allocation or A and B are executed by work-items in the same kernel or one of its children when A and B apply to a cl_mem buffer.
P是memory_scope_device，并且当A和B应用于SVM分配时，A和B由同一设备上的工作项执行；或者当A和B应用于cl_mem缓冲区时，A和B由同一内核或其某个子内核中的工作项执行。
P is memory_scope_all_svm_devices if A and B are executed by host threads or by work-items on one or more devices that can share SVM memory with each other and the host process.
P是memory_scope_all_svm_devices，如果A和B是由主机线程或由一个或多个设备上的工作项执行的，这些设备可以彼此和主机进程共享SVM存储器。
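
The four cases above can be read as a predicate over two actions. The sketch below is one possible model, not part of the specification: the `demo_action` record, its integer ids, and `inclusive_scope` are hypothetical. It assumes a child kernel shares its parent's `root_kernel_id`, and that every participant in the all-SVM-devices case can share SVM memory with the others and with the host.

```c
#include <stdbool.h>

typedef enum {
    DEMO_SCOPE_SUB_GROUP,
    DEMO_SCOPE_WORK_GROUP,
    DEMO_SCOPE_DEVICE,
    DEMO_SCOPE_ALL_SVM_DEVICES
} demo_scope;

/* Where an action was executed; all ids are illustrative. */
typedef struct {
    demo_scope scope;
    int device_id;
    int root_kernel_id;  /* shared by a kernel and its child kernels */
    int kernel_id;       /* the specific kernel instance */
    int work_group_id;   /* unique within one kernel instance */
    int sub_group_id;    /* unique within one work-group */
} demo_action;

/* True if a and b have an inclusive scope under the rules listed above.
   svm_allocation says whether both actions apply to an SVM allocation
   (true) or to a cl_mem buffer (false); it matters only for device scope. */
bool inclusive_scope(const demo_action *a, const demo_action *b,
                     bool svm_allocation)
{
    if (a->scope != b->scope)
        return false;  /* both actions must use the same scope P */

    bool same_device = a->device_id == b->device_id;
    bool same_family = same_device && a->root_kernel_id == b->root_kernel_id;
    bool same_group  = same_device && a->kernel_id == b->kernel_id &&
                       a->work_group_id == b->work_group_id;
    bool same_sub    = same_group && a->sub_group_id == b->sub_group_id;

    switch (a->scope) {
    case DEMO_SCOPE_SUB_GROUP:       return same_sub;
    case DEMO_SCOPE_WORK_GROUP:      return same_group;
    case DEMO_SCOPE_DEVICE:          return svm_allocation ? same_device
                                                           : same_family;
    case DEMO_SCOPE_ALL_SVM_DEVICES: return true;  /* see assumption above */
    }
    return false;
}
```

Two device-scope actions from unrelated kernels on the same device are therefore inclusive when they target an SVM allocation, but not when they target a cl_mem buffer.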