OpenCL™ Specification 3.3.7.5. Host-side and Device-side Commands

꧁白杨树下꧂

Published 2024-02-25 17:15:48

Original source: https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_API.html#_host_side_and_device_side_commands

3.3.7.5. Host-side and Device-side Commands

This section describes how the OpenCL API functions associated with command-queues contribute to happens-before relations. There are two types of command-queues and associated API functions in OpenCL 2.x: host command-queues and device command-queues. The interaction of these command-queues with the memory model is for the most part equivalent. In a few cases, the rules apply only to the host command-queue. We will indicate these special cases by specifically denoting the host command-queue in the memory ordering rule. SVM memory consistency in such instances is implied only with respect to synchronizing host commands.

Memory ordering rules in this section apply to all memory objects (buffers, images and pipes) as well as to SVM allocations where no earlier, and more fine-grained, rules apply.

In the remainder of this section, we assume that each command C enqueued onto a command-queue has an associated event object E that signals its execution status, regardless of whether E was returned to the unit of execution that enqueued C. We also distinguish between the API function call that enqueues a command C and creates an event E, the execution of C, and the completion of C (which marks the event E as complete).

The ordering and synchronization rules for API commands are defined as follows:

1. If an API function call X enqueues a command C, then X global-synchronizes-with C. For example, a host API function to enqueue a kernel global-synchronizes-with the start of that kernel-instance's execution, so that memory updates sequenced-before the enqueue kernel function call will global-happen-before any kernel reads or writes to those same memory locations. For a device-side enqueue, global memory updates sequenced before X happen-before C reads or writes to those memory locations only in the case of fine-grained SVM.

2. If E is an event upon which a command C waits, then E global-synchronizes-with C. In particular, if C waits on an event E that is tracking the execution status of the command C1, then memory operations done by C1 will global-happen-before memory operations done by C. As an example, assume we have an OpenCL program using coarse-grain SVM sharing that enqueues a kernel to a host command-queue to manipulate the contents of a region of a buffer that the host thread then accesses after the kernel completes. To do this, the host thread can call clEnqueueMapBuffer to enqueue a blocking-mode map command to map that buffer region, specifying that the map command must wait on an event signaling the kernel's completion. When clEnqueueMapBuffer returns, any memory operations performed by the kernel to that buffer region will global-happen-before subsequent memory operations made by the host thread.

3. If a command C has an event E that signals its completion, then C global-synchronizes-with E.

4. For a command C enqueued to a host-side command-queue, if C has an event E that signals its completion, then E global-synchronizes-with an API call X that waits on E. For example, if a host thread or kernel-instance calls the wait-for-events function on E (e.g. the clWaitForEvents function called from a host thread), then E global-synchronizes-with that wait-for-events function call.

5. If commands C and C1 are enqueued in that sequence onto an in-order command-queue, then the event (including the event implied between C and C1 due to the in-order queue) signaling C's completion global-synchronizes-with C1. Note that in OpenCL 2.x, only a host command-queue can be configured as an in-order queue.

6. If an API call enqueues a marker command C with an empty list of events upon which C should wait, then the events of all commands enqueued prior to C in the command-queue global-synchronize-with C.

7. If a host API call enqueues a command-queue barrier command C with an empty list of events on which C should wait, then the events of all commands enqueued prior to C in the command-queue global-synchronize-with C. In addition, the event signaling the completion of C global-synchronizes-with all commands enqueued after C in the command-queue.

8. If a host thread executes a clFinish call X, then the events of all commands enqueued prior to X in the command-queue global-synchronize-with X.

9. The start of a kernel-instance K global-synchronizes-with all operations in the work-items of K. Note that this includes the execution of any atomic operations by the work-items in a program using fine-grain SVM.

10. All operations of all work-items of a kernel-instance K global-synchronize-with the event signaling the completion of K. Note that this also includes the execution of any atomic operations by the work-items in a program using fine-grain SVM.

11. If a callback procedure P is registered on an event E, then E global-synchronizes-with all operations of P. Note that callback procedures are only defined for commands within host command-queues.

12. If C is a command that waits for an event E's completion, and API function call X sets the status of a user event E to CL_COMPLETE (for example, from a host thread using a clSetUserEventStatus function), then X global-synchronizes-with C.

13. If a device enqueues a command C with the CLK_ENQUEUE_FLAGS_WAIT_KERNEL flag, then the end state of the parent kernel instance global-synchronizes-with C.

14. If a work-group enqueues a command C with the CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP flag, then the end state of the work-group global-synchronizes-with C.
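The rules above can be read as edges in a directed graph, with happens-before as reachability along those edges. The following is a minimal sketch of that reading, not an OpenCL program: the node names (`host_write`, `enqueue_K`, and so on) are hypothetical labels for one host thread that writes a buffer, enqueues a kernel K, waits on K's event, and then reads the result. It assumes happens-before is simply the transitive closure of sequenced-before and global-synchronizes-with, which ignores the scope and memory-order details of the full model.

```python
from collections import defaultdict


def happens_before(edges, start, goal):
    """Return True if `goal` is reachable from `start` along ordering edges."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


# Hypothetical trace; every node name is an illustrative label, not an API name.
edges = [
    ("host_write", "enqueue_K"),         # sequenced-before in the host thread
    ("enqueue_K", "wait_for_events"),    # sequenced-before in the host thread
    ("wait_for_events", "host_read"),    # sequenced-before in the host thread
    ("enqueue_K", "K_start"),            # rule 1: enqueue call -> command
    ("K_start", "K_work_items"),         # rule 9: kernel start -> work-item operations
    ("K_work_items", "event_K"),         # rule 10: work-item operations -> completion event
    ("event_K", "wait_for_events"),      # rule 4: completion event -> waiting API call
]

# Same trace, but the host never waits on K's event.
no_wait = [e for e in edges if e != ("event_K", "wait_for_events")]

print(happens_before(edges, "host_write", "K_work_items"))
print(happens_before(edges, "K_work_items", "host_read"))
print(happens_before(no_wait, "K_work_items", "host_read"))
```

With the wait in place, the kernel's operations reach the host read through the event; without the rule 4 edge, nothing orders them in this model.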

When using an out-of-order command-queue, a wait on an event or a marker or command-queue barrier command can be used to ensure the correct ordering of dependent commands. In those cases, the wait for the event or the marker or barrier command will provide the necessary global-synchronizes-with relation.

In this situation:

- Access to shared locations or disjoint locations in a single cl_mem object when using atomic operations from different kernel instances enqueued from the host such that one or more of the atomic operations is a write is implementation-defined and correct behavior is not guaranteed except at synchronization points.
- Access to shared locations or disjoint locations in a single cl_mem object when using atomic operations from different kernel instances consisting of a parent kernel and any number of child kernels enqueued by that kernel is guaranteed under the memory ordering rules described earlier in this section.
- Access to shared locations or disjoint locations in a single program scope global variable, coarse-grained SVM allocation or fine-grained SVM allocation when using atomic operations from different kernel instances enqueued from the host to a single device is guaranteed under the memory ordering rules described earlier in this section.
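Returning to how an out-of-order command-queue gets its ordering from event waits, markers, and barriers, here is a small sketch that derives ordering edges for a queue under rules 2, 5, 6, and 7. It is a toy model under stated assumptions: each command and its completion event are collapsed into one node, and the tuple layout `(name, kind, wait_list)` and the function names are made up for illustration.

```python
def queue_edges(commands, in_order):
    """Build ordering edges for one command-queue.

    commands: list of (name, kind, wait_list) in enqueue order, where kind is
    "cmd", "marker", or "barrier" and wait_list names earlier commands.
    """
    edges = set()
    for i, (name, kind, waits) in enumerate(commands):
        earlier = [c[0] for c in commands[:i]]
        later = [c[0] for c in commands[i + 1:]]
        for dep in waits:
            edges.add((dep, name))                    # rule 2: explicit event wait
        if in_order and earlier:
            edges.add((earlier[-1], name))            # rule 5: implied event, in-order queue
        if kind in ("marker", "barrier") and not waits:
            edges.update((e, name) for e in earlier)  # rules 6 and 7: all prior commands
        if kind == "barrier" and not waits:
            edges.update((name, l) for l in later)    # rule 7: barrier orders later commands
    return edges


def ordered(edges, a, b):
    """Return True if b is reachable from a along the edges."""
    frontier, seen = [a], set()
    while frontier:
        node = frontier.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            frontier.extend(dst for src, dst in edges if src == node)
    return False


plain = [("A", "cmd", []), ("B", "cmd", [])]
with_wait = [("A", "cmd", []), ("B", "cmd", ["A"])]
with_marker = [("A", "cmd", []), ("M", "marker", []), ("B", "cmd", [])]
with_barrier = [("A", "cmd", []), ("BAR", "barrier", []), ("B", "cmd", [])]

for label, cmds in [("plain", plain), ("wait", with_wait),
                    ("marker", with_marker), ("barrier", with_barrier)]:
    print(label, ordered(queue_edges(cmds, in_order=False), "A", "B"))
```

In this model a marker with an empty wait list is ordered after everything before it but does not order later commands, whereas a barrier does both, which is why only the event wait and the barrier order A before B in the out-of-order case.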

If fine-grain SVM is used but without support for the OpenCL 2.x atomic operations, then the host and devices can concurrently read the same memory locations and can concurrently update non-overlapping memory regions, but attempts to update the same memory locations are undefined. Memory consistency is guaranteed at the OpenCL synchronization points without the need for calls to clEnqueueMapBuffer and clEnqueueUnmapMemObject. For fine-grained SVM buffers it is guaranteed that at synchronization points only values written by the kernel will be updated. No writes to fine-grained SVM buffers can be introduced that were not in the original program.
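The non-overlapping condition for concurrent updates can be checked mechanically. The sketch below is only an illustration of that condition: it takes hypothetical byte ranges written concurrently (with no synchronization point in between) by different agents and reports the pairs that touch the same locations. The tuple layout `(agent, start, length)` and the function name are assumptions, and the check covers write/write overlap only, as that is the case the paragraph above calls undefined.

```python
from itertools import combinations


def conflicting_updates(writes):
    """Return pairs of agents whose concurrent updates overlap.

    writes: list of (agent, start, length) byte ranges updated concurrently
    in one fine-grained SVM allocation, without atomics.
    """
    conflicts = []
    for (a1, s1, n1), (a2, s2, n2) in combinations(writes, 2):
        # Two half-open ranges [s, s + n) overlap when each starts before the other ends.
        if a1 != a2 and s1 < s2 + n2 and s2 < s1 + n1:
            conflicts.append((a1, a2))
    return conflicts


disjoint = [("host", 0, 64), ("device", 64, 64)]
overlapping = [("host", 0, 64), ("device", 32, 64)]

print(conflicting_updates(disjoint))
print(conflicting_updates(overlapping))
```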

In the remainder of this section, we discuss a few points regarding the ordering rules for commands with a host command-queue.

In an OpenCL 1.x implementation a synchronization point is a kernel-instance or host program location where the contents of memory visible to different work-items or command-queue commands are the same. The OpenCL 1.x specification also states that waiting on an event and a command-queue barrier are synchronization points between commands in command-queues. Four of the rules listed above (2, 4, 7, and 8) cover these OpenCL synchronization points.

A map operation (clEnqueueMapBuffer or clEnqueueMapImage) performed on a non-SVM buffer or a coarse-grained SVM buffer is allowed to overwrite the entire target region with the latest runtime view of the data as seen by the command with which the map operation synchronizes, whether the values were written by the executing kernels or not. Any values that were changed within this region by another kernel or host thread while the kernel synchronizing with the map operation was executing may be overwritten by the map operation.
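To make the consequence concrete, here is a toy simulation of one outcome the paragraph above permits: the map copies the synchronizing command's whole view over the mapped region, including bytes the kernel never wrote. The two byte arrays, the byte values, and `map_region` are hypothetical stand-ins, not the OpenCL API, and an implementation is allowed to behave this way rather than required to.

```python
def map_region(host_view, device_view, offset, size):
    """Model a map that overwrites the whole mapped region with the device view."""
    host_view[offset:offset + size] = device_view[offset:offset + size]
    return host_view


device_view = bytearray(8)   # runtime view seen by the kernel the map synchronizes with
host_view = bytearray(8)     # what the host thread currently sees

device_view[0] = 7           # the kernel wrote only byte 0
host_view[5] = 9             # hypothetical concurrent host change inside the same region

map_region(host_view, device_view, 0, 8)
print(list(host_view))
```

After the map, byte 0 carries the kernel's value and the concurrent change at byte 5 is gone, even though the kernel never touched that byte.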

Access to non-SVM cl_mem buffers and coarse-grained SVM allocations is ordered at synchronization points between host commands. In the presence of an out-of-order command-queue or a set of command-queues mapped to the same device, multiple kernel instances may execute concurrently on the same device.
