OpenCL™ Specification: 3.2.3 Device-side enqueue

3.2.3. Device-side enqueue

Device-side enqueue is not available before version 2.0.

Algorithms may need to generate additional work as they execute. In many cases, this additional work cannot be determined statically, so the work associated with a kernel only emerges at runtime as the kernel-instance executes. This capability could be implemented in logic running within the host program, but involving the host may add significant overhead and/or complexity to the application control flow. A more efficient approach is to nest kernel-enqueue commands inside other kernels. This nested parallelism can be realized by supporting the enqueuing of kernels on a device without direct involvement by the host program, which is known as device-side enqueue.

Device-side kernel-enqueue commands are similar to host-side kernel-enqueue commands. The kernel executing on a device (the parent kernel) enqueues a kernel-instance (the child kernel) to a device-side command-queue. This is an out-of-order command-queue and follows the same behavior as the out-of-order command-queues exposed to the host program. Commands enqueued to a device-side command-queue generate and use events to enforce order constraints, just as for the command-queue on the host. These events, however, are only visible to the parent kernel running on the device. When these prerequisite events take on the value CL_COMPLETE, the work-groups associated with the child kernel are launched into the device's work pool. The device then schedules them for execution on the compute units of the device.

Child and parent kernels execute asynchronously. However, a parent will not indicate that it is complete by setting its event to CL_COMPLETE until all child kernels have ended execution and have signaled completion by setting any associated events to the value CL_COMPLETE. Should any child kernel complete with an event status set to a negative value (i.e. abnormally terminate), the parent kernel will abnormally terminate and propagate the child's negative event value as the value of the parent's event. If there are multiple children that have an event status set to a negative value, the selection of which child's negative event value is propagated is implementation-defined.
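
The completion and error-propagation rules described above can be illustrated with a small toy model. This is a Python sketch of the semantics only, not OpenCL API code: the function `resolve_parent_event` and its parameters are hypothetical names introduced for illustration. The numeric status values follow the OpenCL headers (`CL_COMPLETE` is 0, `CL_RUNNING` is 1, errors are negative). Two modelling choices are assumptions rather than spec requirements: the model reports a child's error as soon as it is seen, and it picks the first negative value when several children fail (the specification leaves that choice implementation-defined).

```python
from typing import List

# Execution status values as defined in the OpenCL headers.
CL_COMPLETE = 0
CL_RUNNING = 1
CL_OUT_OF_RESOURCES = -5  # one example of a negative (error) status


def resolve_parent_event(parent_body_finished: bool,
                         child_statuses: List[int]) -> int:
    """Toy model of the status a parent kernel's event would report.

    parent_body_finished: whether the parent's own work-items are done.
    child_statuses: current event status of each child kernel.
    Both names are hypothetical; they are not part of any OpenCL API.
    """
    # A child that terminated abnormally makes the parent terminate
    # abnormally with that child's negative value. Which child is
    # chosen is implementation-defined; this model takes the first.
    errors = [s for s in child_statuses if s < 0]
    if errors:
        return errors[0]

    # The parent may not report CL_COMPLETE while its own body or any
    # child is still outstanding.
    if not parent_body_finished:
        return CL_RUNNING
    if any(s != CL_COMPLETE for s in child_statuses):
        return CL_RUNNING

    return CL_COMPLETE
```

For example, under this model `resolve_parent_event(True, [CL_COMPLETE, CL_RUNNING])` still yields `CL_RUNNING`, because one child has not finished, while `resolve_parent_event(True, [CL_COMPLETE, CL_OUT_OF_RESOURCES])` yields the child's negative value.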
