OpenCL™ Specification: 3.2.2 Execution of kernel-instances

3.2.2. Execution of kernel-instances

The work carried out by an OpenCL program occurs through the execution of kernel-instances on compute devices. To understand the details of OpenCL’s execution model, we need to consider how a kernel object moves from the kernel-enqueue command into a command-queue, executes on a device, and completes.


A kernel object is defined as a function within the program object and a collection of arguments connecting the kernel to a set of argument values. The host program enqueues a kernel object to the command queue along with the NDRange and the work-group decomposition. These define a kernel-instance. In addition, an optional set of events may be defined when the kernel is enqueued. The events associated with a particular kernel-instance are used to constrain when the kernel-instance is launched with respect to other commands in the queue or to commands in other queues within the same context.

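The following is a minimal Python model, not the OpenCL API, of what a kernel-instance bundles together. It holds a kernel object (a function plus bound argument values), an NDRange, a work-group decomposition, and an optional event wait list. All class and field names are hypothetical. The sketch assumes uniform work-groups, meaning each global extent is a multiple of the local extent. OpenCL 2.0 and later also allow non-uniform work-groups, which this model does not cover.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Callable, Sequence


@dataclass
class KernelObject:
    """Hypothetical model: a kernel function plus its bound argument values."""
    function: Callable
    args: tuple


@dataclass
class KernelInstance:
    """Hypothetical model: kernel object + NDRange + work-group decomposition + events."""
    kernel: KernelObject
    global_size: Sequence[int]   # NDRange extent in each dimension
    local_size: Sequence[int]    # work-group extent in each dimension
    wait_list: list = field(default_factory=list)  # optional events to wait on

    def __post_init__(self):
        if len(self.global_size) != len(self.local_size):
            raise ValueError("global and local sizes must have the same dimensionality")
        for g, loc in zip(self.global_size, self.local_size):
            # Uniform work-groups only (an assumption of this sketch).
            if loc <= 0 or g % loc != 0:
                raise ValueError("global size must be a multiple of local size")

    def num_work_groups(self) -> int:
        """Total number of work-groups in the decomposition."""
        return prod(g // loc for g, loc in zip(self.global_size, self.local_size))

    def work_group_ids(self):
        """Enumerate work-group IDs as tuples, first dimension outermost."""
        counts = [g // loc for g, loc in zip(self.global_size, self.local_size)]
        ids = [()]
        for c in counts:
            ids = [i + (d,) for i in ids for d in range(c)]
        return ids
```

For example, a 1-D NDRange of 1024 work-items split into work-groups of 64 yields 16 work-groups. Each of these is what later gets placed into the work-pool.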

A kernel-instance is submitted to a device. For an in-order command-queue, the kernel-instances appear to launch and then execute in the order in which they were enqueued. We use the term "appear" to emphasize that when there are no dependencies between commands, and hence differences in the order in which commands execute cannot be observed in a program, an implementation can reorder commands even in an in-order command-queue. For an out-of-order command-queue, kernel-instances wait to be launched until:


  • Synchronization commands enqueued prior to the kernel-instance are satisfied.


  • Each of the events in the optional event list defined when the kernel-instance was enqueued is set to CL_COMPLETE.

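The two launch conditions for an out-of-order queue can be modelled as a small scheduler loop. In this hypothetical sketch, each command's event carries the command's own name. A command with `is_barrier=True` stands in for a synchronization command. It is satisfied only after every command enqueued before it has completed, and no later command may launch until then. Commands are assumed to complete the moment they launch, which no real device does. The sketch only illustrates the launch rule.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    name: str                                   # also the name of its event
    wait_for: set = field(default_factory=set)  # events that must be CL_COMPLETE
    is_barrier: bool = False                    # models a synchronization command


def launch_order(commands, pick=min):
    """Return the order in which commands launch in a simulated out-of-order queue.

    A command may launch when (1) every synchronization command enqueued before it
    is satisfied and (2) every event in its wait list is complete. `pick` chooses
    among ready commands (by enqueue index), since any ready command may go first.
    """
    complete = set()
    order = []
    pending = list(range(len(commands)))  # indices in enqueue order
    while pending:
        ready = []
        for i in pending:
            cmd = commands[i]
            earlier = commands[:i]
            barriers_ok = all(c.name in complete for c in earlier if c.is_barrier)
            # A barrier additionally requires all earlier commands to be complete.
            deps_ok = all(c.name in complete for c in earlier) if cmd.is_barrier else True
            if barriers_ok and deps_ok and cmd.wait_for <= complete:
                ready.append(i)
        if not ready:
            raise RuntimeError("deadlock: no command can launch")
        i = pick(ready)
        order.append(commands[i].name)
        complete.add(commands[i].name)  # instant completion (simplification)
        pending.remove(i)
    return order
```

Passing `pick=max` instead of `min` yields a different, equally valid order. This mirrors the freedom an out-of-order queue has among commands whose conditions are met.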

Once these conditions are met, the kernel-instance is launched and the work-groups associated with the kernel-instance are placed into a pool of ready-to-execute work-groups. This pool is called a work-pool. The work-pool may be implemented in any manner as long as it ensures that work-groups placed in the pool will eventually execute. The device schedules work-groups from the work-pool for execution on the compute units of the device. The kernel-enqueue command is complete when all work-groups associated with the kernel-instance have finished executing, updates to global memory associated with the command are visible globally, and the device signals successful completion by setting the event associated with the kernel-enqueue command to CL_COMPLETE.

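The work-pool behaviour can be sketched as a simulation. At launch, every work-group enters a pool. The "device" then pulls them out in an arbitrary order, here random, and the command's event becomes complete only after the last one ends. The function name, the dictionary standing in for an event, and the list standing in for global memory are all hypothetical. The status strings reuse the OpenCL constant names only as labels.

```python
import random


def run_kernel_instance(num_work_groups, work_group_body, global_mem, seed=None):
    """Simulate a launched kernel-instance draining its work-groups from a work-pool.

    Returns (event, executed) where `executed` is the order the work-groups ran in.
    """
    rng = random.Random(seed)
    work_pool = list(range(num_work_groups))  # all work-groups enter the pool at launch
    event = {"status": "CL_RUNNING"}
    executed = []
    while work_pool:
        # The device may pick any work-group from the pool; random stands in for "any".
        group_id = work_pool.pop(rng.randrange(len(work_pool)))
        work_group_body(group_id, global_mem)
        executed.append(group_id)
    # Only after every work-group has ended (and its global writes are done)
    # is the event set to complete.
    event["status"] = "CL_COMPLETE"
    return event, executed
```

Because the pool may hand out work-groups in any order, `executed` differs between seeds. The contents of global memory do not, provided each work-group writes only its own data.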

While a command-queue is associated with only one device, a single device may be associated with multiple command-queues, all feeding into the single work-pool. A device may also be associated with command-queues belonging to different contexts within the same platform, again all feeding into the single work-pool. The device pulls work-groups from the work-pool and executes them on one or several compute units in any order, possibly interleaving the execution of work-groups from multiple commands. A conforming implementation may choose to serialize the work-groups, so a correct algorithm cannot assume that work-groups will execute in parallel. There is no safe and portable way to synchronize across the independent execution of work-groups, since once in the work-pool they can execute in any order.

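The following sketch shows why portable kernels must not depend on work-group ordering. Work-groups from several commands (hypothetically named after the queues they came from) share one pool and are shuffled into one of many possible interleavings. A kernel in which each work-group writes only its own output slot gives the same result for every interleaving. By contrast, a kernel in which one work-group waits for another's result has no such guarantee.

```python
import random


def shared_work_pool_schedule(commands, seed=None):
    """Return one possible execution order for work-groups from several commands.

    `commands` maps a command name to its number of work-groups. All work-groups
    share a single pool and may be pulled in any order, so commands interleave.
    """
    rng = random.Random(seed)
    pool = [(name, wg) for name, count in commands.items() for wg in range(count)]
    rng.shuffle(pool)  # stands in for "any order"
    return pool


def run_order_independent(commands, seed=None):
    """Run a kernel in which each work-group writes only its own output slot."""
    out = {name: [None] * count for name, count in commands.items()}
    for name, wg in shared_work_pool_schedule(commands, seed):
        out[name][wg] = wg * wg  # no work-group reads another work-group's result
    return out
```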

The work-items within a single sub-group execute concurrently but not necessarily in parallel (i.e. they are not guaranteed to make independent forward progress). Therefore, only high-level synchronization constructs (e.g. sub-group functions such as barriers) that apply to all the work-items in a sub-group are well defined and included in OpenCL.

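Sub-group functions are collective: every work-item in the sub-group takes part, and each one receives a result computed from all of them. The Python emulation below is not OpenCL C. Its semantics are modelled on the OpenCL C built-ins `sub_group_reduce_add` and `sub_group_broadcast`. Representing a sub-group as a list of per-work-item arguments, indexed by sub-group local ID, is an assumption of this sketch.

```python
def sub_group_reduce_add(values):
    """Emulate a sub-group reduction: every work-item receives the sum over the sub-group.

    `values[i]` is the argument passed by the work-item with sub-group local ID i.
    """
    total = sum(values)
    return [total] * len(values)


def sub_group_broadcast(values, sub_group_local_id):
    """Emulate a broadcast: every work-item receives the value of one chosen work-item."""
    if not 0 <= sub_group_local_id < len(values):
        raise IndexError("sub_group_local_id out of range")
    return [values[sub_group_local_id]] * len(values)
```

Each function takes the whole sub-group's inputs at once, so a subset of work-items cannot call it alone. This mirrors the rule that such constructs apply to all work-items in the sub-group.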

Sub-groups are not available in OpenCL versions earlier than 2.1.


Sub-groups execute concurrently within a given work-group and, with appropriate device support (see Querying Devices), may make independent forward progress with respect to each other, with respect to host threads, and with respect to any entities external to the OpenCL system but running on an OpenCL device, even in the absence of work-group barrier operations. In this situation, sub-groups are able to synchronize internally using barrier operations without synchronizing with each other, and may perform operations that have runtime dependencies on operations performed by other sub-groups.

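The next sketch illustrates a runtime dependency between sub-groups that relies on independent forward progress. Two OS threads stand in for two sub-groups of one work-group. The "consumer" starts first and spins on a flag that only the "producer" sets, with no work-group barrier involved. OS threads do make independent forward progress, so the spin eventually ends. This models a device that reports the capability, not any particular OpenCL implementation. On a real device the flag would need atomic operations with appropriate memory ordering; Python's interpreter lock hides that detail here.

```python
import threading
import time


def producer_consumer_subgroups(timeout=5.0):
    """Two threads model two sub-groups; the consumer waits on the producer's write.

    Returns the value the consumer observed, or None if it gave up after `timeout`.
    """
    shared = {"data": None, "ready": False}
    result = {}

    def producer():
        shared["data"] = 42
        shared["ready"] = True  # publish only after the data is written

    def consumer():
        deadline = time.monotonic() + timeout
        while not shared["ready"]:  # spin-wait on another "sub-group"
            if time.monotonic() > deadline:
                result["value"] = None
                return
            time.sleep(0)  # yield so other threads can run
        result["value"] = shared["data"]

    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["value"]
```

Without independent forward progress, the same pattern could spin forever if the producer never gets scheduled. The timeout exists only to keep the sketch from hanging.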

The work-items within a single work-group execute concurrently but are only guaranteed to make independent progress in the presence of sub-groups and device support. In the absence of this capability, only high-level synchronization constructs (e.g. work-group functions such as barriers) that apply to all the work-items in a work-group are well defined and included in OpenCL for synchronization within the work-group.

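The following sketch shows the kind of high-level construct that is well defined within a work-group. Each thread stands in for one work-item. It writes its value to a shared "local memory" list and waits at a barrier that every work-item must reach. It then reads its neighbour's slot. `threading.Barrier` plays the role of an OpenCL C `work_group_barrier` here; this is an analogy, not an equivalence.

```python
import threading


def rotate_through_local_memory(values):
    """Each work-item reads its neighbour's value after a group-wide barrier."""
    n = len(values)
    local_mem = [None] * n
    out = [None] * n
    barrier = threading.Barrier(n)  # analogous to a work-group barrier

    def work_item(lid):
        local_mem[lid] = values[lid]
        barrier.wait()                       # every work-item must reach this point
        out[lid] = local_mem[(lid + 1) % n]  # safe: all writes happened before the barrier

    threads = [threading.Thread(target=work_item, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Removing `barrier.wait()` would make the read race with the neighbour's write. That race is why only constructs that involve the whole group are defined for intra-group synchronization.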

In the absence of synchronization functions (e.g. a barrier), work-items within a sub-group may be serialized. In the presence of sub-group functions, work-items within a sub-group may be serialized before any given sub-group function, between dynamically encountered pairs of sub-group functions, and between a work-group function and the end of the kernel.

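Serialization between synchronization points can be made concrete with Python generators. In this sketch each work-item is a generator, and every `yield` marks a group function such as a barrier. The scheduler runs work-items strictly one at a time, advancing each one to its next barrier before moving to the next. This is one legal serialization, and a correct kernel still produces the right answer under it. The scheduler and kernel factory are hypothetical helpers.

```python
def run_serialized(kernel, n):
    """Run n work-items one at a time, switching only at barrier points (`yield`)."""
    items = [kernel(lid) for lid in range(n)]
    while True:
        reached_barrier = 0
        for it in items:
            try:
                next(it)  # run this work-item up to its next barrier
                reached_barrier += 1
            except StopIteration:
                pass      # this work-item reached the end of the kernel
        if reached_barrier == 0:
            return
        if reached_barrier != n:
            # Barriers must be encountered by all work-items of the group.
            raise RuntimeError("not all work-items reached the same barrier")


def make_rotate_kernel(values, local_mem, out):
    """Hypothetical kernel: write own value, barrier, then read the neighbour's value."""
    n = len(values)

    def kernel(lid):
        local_mem[lid] = values[lid]
        yield  # barrier
        out[lid] = local_mem[(lid + 1) % n]

    return kernel
```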

In the absence of independent forward progress of constituent sub-groups, work-items within a work-group may be serialized before, after or between work-group synchronization functions.

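The final sketch shows the flip side: ad-hoc waiting between work-items breaks under serialization when there is no independent forward progress. Here each `yield` is just one execution step, not a barrier. The hypothetical scheduler runs each work-item to completion before starting the next, with a step budget so the demonstration terminates. Work-item 0 spins waiting for a flag that work-item 1 sets. Because work-item 1 never gets to run, work-item 0 never finishes.

```python
def run_one_at_a_time(kernel, n, max_steps=1000):
    """Serialize a group with no barriers: run each work-item to completion in turn.

    Returns True if every work-item finished within `max_steps` steps.
    """
    for lid in range(n):
        it = kernel(lid)
        for _ in range(max_steps):
            try:
                next(it)
            except StopIteration:
                break
        else:
            return False  # this work-item never finished; later ones never ran
    return True


def make_spin_kernel(flag):
    """Hypothetical kernel: work-item 0 waits for a flag set by work-item 1."""
    def kernel(lid):
        if lid == 0:
            while not flag[0]:  # depends on work-item 1 running first or concurrently
                yield
        else:
            flag[0] = True
            yield

    return kernel
```

The same kernel might appear to work on hardware that happens to run both work-items in parallel. It is still not portable, because serialization is a conforming schedule.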
