OpenCL™ Specification 3.3.5. Memory Consistency Model for OpenCL 2.x

꧁白杨树下꧂

Published 2023-12-28 16:57:24

3.3.5. Memory Consistency Model for OpenCL 2.x

This memory consistency model is missing before version 2.0.

The OpenCL 2.x memory model tells programmers what they can expect from an OpenCL 2.x implementation; which memory operations are guaranteed to happen in which order and which memory values each read operation will return. The memory model tells compiler writers which restrictions they must follow when implementing compiler optimizations; which variables they can cache in registers and when they can move reads or writes around a barrier or atomic operation. The memory model also tells hardware designers about limitations on hardware optimizations; for example, when they must flush or invalidate hardware caches.

The memory consistency model in OpenCL 2.x is based on the memory model from the ISO C11 programming language. To help make the presentation more precise and self-contained, we include modified paragraphs taken verbatim from the ISO C11 international standard. When a paragraph is taken or modified from the C11 standard, it is identified as such along with its original location in the C11 standard.

For programmers, the most intuitive model is the sequential consistency memory model. Sequential consistency interleaves the steps executed by each of the units of execution. Each access to a memory location sees the last assignment to that location in that interleaving. While sequential consistency is relatively straightforward for a programmer to reason about, implementing sequential consistency is expensive. Therefore, OpenCL 2.x implements a relaxed memory consistency model; i.e. it is possible to write programs where the loads from memory violate sequential consistency. Fortunately, if a program does not contain any races and if the program only uses atomic operations that utilize the sequentially consistent memory order (the default memory ordering for OpenCL 2.x), OpenCL programs appear to execute with sequential consistency.
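
The guarantee in the last sentence can be checked with the classic "store buffering" litmus test. The sketch below is not OpenCL C: it uses host-side C11 atomics and POSIX threads as a stand-in, on the assumption that the C11 model the text refers to behaves the same way for this case. The names `thread_a`, `thread_b`, and `count_forbidden_outcomes` are made up for illustration. With sequentially consistent atomics, no interleaving lets both threads read 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;   /* shared atomic locations */
static int r1, r2;        /* value each thread observed */

static void *thread_a(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    r1 = atomic_load_explicit(&y, memory_order_seq_cst);
    return NULL;
}

static void *thread_b(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    r2 = atomic_load_explicit(&x, memory_order_seq_cst);
    return NULL;
}

/* Runs the litmus test `rounds` times and counts the outcome
   r1 == 0 && r2 == 0, which no simple interleaving can produce. */
int count_forbidden_outcomes(int rounds) {
    int forbidden = 0;
    for (int i = 0; i < rounds; ++i) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        int rc = pthread_create(&a, NULL, thread_a, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread_b, NULL);
        assert(rc == 0);
        (void)rc;
        /* Joining makes r1 and r2 safely readable here. */
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            ++forbidden;
    }
    return forbidden;
}
```

If the four operations were weakened to `memory_order_relaxed`, the C11 model would permit the outcome where both loads return 0, which is the kind of violation of sequential consistency the paragraph describes.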

Programmers can to some degree control how the memory model is relaxed by choosing the memory order for synchronization operations. The precise semantics of synchronization and the memory orders are formally defined in Memory Ordering Rules. Here, we give a high level description of how these memory orders apply to atomic operations on atomic objects shared between units of execution. OpenCL 2.x memory_order choices are based on those from the ISO C11 standard memory model. They are specified in certain OpenCL functions through the following enumeration constants:

memory_order_relaxed: implies no order constraints. This memory order can be used safely to increment counters that are concurrently incremented, but it doesn’t guarantee anything about the ordering with respect to operations to other memory locations. It can also be used, for example, to do ticket allocation and by expert programmers implementing lock-free algorithms.
memory_order_acquire: A synchronization operation (fence or atomic) that has acquire semantics "acquires" side-effects from a release operation that synchronises with it: if an acquire synchronises with a release, the acquiring unit of execution will see all side-effects preceding that release (and possibly subsequent side-effects.) As part of carefully-designed protocols, programmers can use an "acquire" to safely observe the work of another unit of execution.
memory_order_release: A synchronization operation (fence or atomic operation) that has release semantics "releases" side effects to an acquire operation that synchronises with it. All side effects that precede the release are included in the release. As part of carefully-designed protocols, programmers can use a "release" to make changes made in one unit of execution visible to other units of execution.

In general, no acquire must always synchronise with any particular release. However, synchronisation can be forced by certain executions. See the description of Fence Operations for detailed rules for when synchronisation must occur.

memory_order_acq_rel: A synchronization operation with acquire-release semantics has the properties of both the acquire and release memory orders. It is typically used to order read-modify-write operations.
memory_order_seq_cst: The loads and stores of each unit of execution appear to execute in program (i.e., sequenced-before) order, and the loads and stores from different units of execution appear to be simply interleaved.
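
Two of the uses named in the list can be sketched in a few lines: a counter incremented concurrently with `memory_order_relaxed`, and a release/acquire pair that publishes plain data from one unit of execution to another. This is a host-side C11 analogue using POSIX threads, not OpenCL C. In OpenCL C the `_explicit` atomic functions have the same names but take an additional trailing `memory_scope` argument. The function and variable names are assumptions chosen for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 4
#define INCREMENTS 10000

static atomic_int counter;  /* incremented concurrently, relaxed order */
static int payload;         /* plain, non-atomic data to publish */
static atomic_int ready;    /* flag that publishes payload */

static void *count_worker(void *arg) {
    (void)arg;
    /* Relaxed is enough: only the counter's own value matters. */
    for (int i = 0; i < INCREMENTS; ++i)
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return NULL;
}

int run_relaxed_counter(void) {
    pthread_t threads[NUM_THREADS];
    atomic_store(&counter, 0);
    for (int i = 0; i < NUM_THREADS; ++i) {
        int rc = pthread_create(&threads[i], NULL, count_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; ++i)
        pthread_join(threads[i], NULL);
    return atomic_load(&counter);
}

static void *producer(void *arg) {
    (void)arg;
    payload = 42;  /* side effect that precedes the release */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    int *seen = arg;
    /* Spin until the acquire load reads the value the release stored. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
    }
    *seen = payload;  /* now guaranteed to observe 42 */
    return NULL;
}

int run_message_passing(void) {
    pthread_t p, c;
    int seen = -1;
    payload = 0;
    atomic_store(&ready, 0);
    int rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

The relaxed counter always reaches `NUM_THREADS * INCREMENTS` because each increment is atomic, even though nothing is promised about ordering against other locations. The consumer is only allowed to read `payload` because its acquire load synchronises with the producer's release store.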

Regardless of which memory_order is specified, resolving constraints on memory operations across a heterogeneous platform adds considerable overhead to the execution of a program. An OpenCL platform may be able to optimize certain operations that depend on the features of the memory consistency model by restricting the scope of the memory operations. Distinct memory scopes are defined by the values of the memory_scope enumeration constant:

memory_scope_work_item: memory-ordering constraints only apply within the work-item [1].
memory_scope_sub_group: memory-ordering constraints only apply within the sub-group.
memory_scope_work_group: memory-ordering constraints only apply to work-items executing within a single work-group.
memory_scope_device: memory-ordering constraints only apply to work-items executing on a single device.
memory_scope_all_svm_devices: memory-ordering constraints apply to work-items executing across multiple devices and (when using SVM) the host. A release performed with memory_scope_all_svm_devices to a buffer that does not have the CL_MEM_SVM_ATOMICS flag set will commit to at least memory_scope_device visibility, with full synchronization of the buffer at a queue synchronization point (e.g. an OpenCL event).
memory_scope_all_devices: an alias for memory_scope_all_svm_devices.

These memory scopes define a hierarchy of visibilities when analyzing the ordering constraints of memory operations. For example if a programmer knows that a sequence of memory operations will only be associated with a collection of work-items from a single work-group (and hence will run on a single device), the implementation is spared the overhead of managing the memory orders across other devices within the same context. This can substantially reduce overhead in a program. All memory scopes are valid when used on global memory or local memory. For local memory, all visibility is constrained to within a given work-group and scopes wider than memory_scope_work_group carry no additional meaning.
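
One way to picture the hierarchy is as a toy model in plain C. Everything here is hypothetical and for illustration only: the `scope_t` enum merely mirrors the `memory_scope` constants, `work_item_id` is an invented coordinate tuple, and `scope_covers` and `effective_scope` are not part of any OpenCL API. The sketch answers two questions from the text: do two work-items fall under a given scope, and what does a scope amount to when applied to local memory?

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the memory_scope constants, narrowest first. */
typedef enum {
    SCOPE_WORK_ITEM,
    SCOPE_SUB_GROUP,
    SCOPE_WORK_GROUP,
    SCOPE_DEVICE,
    SCOPE_ALL_SVM_DEVICES
} scope_t;

/* Hypothetical coordinates locating one work-item on the platform. */
typedef struct {
    int device;
    int work_group;
    int sub_group;
    int item;
} work_item_id;

/* True if ordering constraints at `scope` apply between a and b. */
bool scope_covers(scope_t scope, work_item_id a, work_item_id b) {
    bool same_device = a.device == b.device;
    bool same_group = same_device && a.work_group == b.work_group;
    bool same_sub = same_group && a.sub_group == b.sub_group;
    bool same_item = same_sub && a.item == b.item;
    switch (scope) {
    case SCOPE_WORK_ITEM:       return same_item;
    case SCOPE_SUB_GROUP:       return same_sub;
    case SCOPE_WORK_GROUP:      return same_group;
    case SCOPE_DEVICE:          return same_device;
    case SCOPE_ALL_SVM_DEVICES: return true;
    }
    assert(!"unknown scope");
    return false;
}

/* For local memory, scopes wider than the work-group add nothing. */
scope_t effective_scope(scope_t requested, bool is_local_memory) {
    if (is_local_memory && requested > SCOPE_WORK_GROUP)
        return SCOPE_WORK_GROUP;
    return requested;
}
```

In an actual OpenCL C kernel the scope is simply passed as the last argument of an explicit atomic, for example `atomic_fetch_add_explicit(&c, 1, memory_order_relaxed, memory_scope_work_group)`.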

In the following subsections (leading up to OpenCL Framework), we will explain the synchronization constructs and detailed rules needed to use OpenCL’s 2.x relaxed memory models. It is important to appreciate, however, that many programs do not benefit from relaxed memory models. Even expert programmers have a difficult time using atomics and fences to write correct programs with relaxed memory models. A large number of OpenCL programs can be written using a simplified memory model. This is accomplished by following these guidelines.

Write programs that manage safe sharing of global memory objects through the synchronization points defined by the command-queues.
Restrict low level synchronization inside work-groups to the work-group functions such as barrier.
If you want sequential consistency behavior with system allocations or fine-grain SVM buffers with atomics support, use only memory_order_seq_cst operations with the scope memory_scope_all_svm_devices.
If you want sequential consistency behavior when not using system allocations or fine-grain SVM buffers with atomics support, use only memory_order_seq_cst operations with the scope memory_scope_device or memory_scope_all_svm_devices.
Ensure your program has no races.
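
The guideline about restricting work-group synchronization to a barrier can be illustrated with a host-side analogue. The sketch assumes a POSIX system with `pthread_barrier_t` and models each work-item of one work-group as a thread. `local_mem`, `work_item`, and `run_work_group` are invented names. In an OpenCL C kernel the corresponding call would be `barrier(CLK_LOCAL_MEM_FENCE)` on a `__local` array.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define GROUP_SIZE 8

static int local_mem[GROUP_SIZE];   /* stands in for __local memory */
static int result[GROUP_SIZE];
static pthread_barrier_t group_barrier;

static void *work_item(void *arg) {
    int lid = *(int *)arg;          /* local id within the group */
    local_mem[lid] = lid * lid;     /* phase 1: each item writes its slot */
    /* All writes above complete before any item proceeds past here. */
    pthread_barrier_wait(&group_barrier);
    /* Phase 2: reading a neighbour's slot is race-free after the barrier. */
    result[lid] = local_mem[(lid + 1) % GROUP_SIZE];
    return NULL;
}

/* Runs one simulated work-group and returns the sum of all results. */
int run_work_group(void) {
    pthread_t threads[GROUP_SIZE];
    int ids[GROUP_SIZE];
    int rc = pthread_barrier_init(&group_barrier, NULL, GROUP_SIZE);
    assert(rc == 0);
    for (int i = 0; i < GROUP_SIZE; ++i) {
        ids[i] = i;
        rc = pthread_create(&threads[i], NULL, work_item, &ids[i]);
        assert(rc == 0);
    }
    (void)rc;
    int sum = 0;
    for (int i = 0; i < GROUP_SIZE; ++i) {
        pthread_join(threads[i], NULL);
        sum += result[i];
    }
    pthread_barrier_destroy(&group_barrier);
    return sum;
}
```

No atomics or explicit memory orders appear here; the barrier alone separates the write phase from the read phase, which is the simplification the guidelines aim for.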

If these guidelines are followed in your OpenCL programs, you can skip the detailed rules behind the relaxed memory models and go directly to OpenCL Framework.
