OpenCL™ Specification: 3.2.4 Synchronization

3.2.4. Synchronization

Synchronization refers to mechanisms that constrain the order of execution between two or more units of execution. Consider the following three domains of synchronization in OpenCL:

  • Work-group synchronization: Constraints on the order of execution for work-items in a single work-group

  • Sub-group synchronization: Constraints on the order of execution for work-items in a single sub-group. Note: Sub-groups are not available in OpenCL versions prior to 2.1.

  • Command synchronization: Constraints on the order of commands launched for execution

Synchronization across all work-items within a single work-group is carried out using a work-group function. These functions carry out collective operations across all the work-items in a work-group. Available collective operations are: barrier, reduction, broadcast, prefix sum, and evaluation of a predicate. A work-group function must occur within converged control flow; that is, all work-items in the work-group must encounter precisely the same work-group function. For example, if a work-group function occurs within a loop, the work-items must encounter the same work-group function in the same loop iterations. All the work-items of a work-group must execute the work-group function and complete their reads and writes to memory before any of them is allowed to continue execution beyond the work-group function. OpenCL provides no work-group functions that apply between work-groups: because OpenCL does not define forward-progress or ordering relations between work-groups, such collective synchronization operations would not be well defined.


Synchronization across all work-items within a single sub-group is carried out using a sub-group function. These functions carry out collective operations across all the work-items in a sub-group. Available collective operations are: barrier, reduction, broadcast, prefix sum, and evaluation of a predicate. A sub-group function must occur within converged control flow; that is, all work-items in the sub-group must encounter precisely the same sub-group function. For example, if a sub-group function occurs within a loop, the work-items must encounter the same sub-group function in the same loop iterations. All the work-items of a sub-group must execute the sub-group function and complete their reads and writes to memory before any of them is allowed to continue execution beyond the sub-group function. Synchronization between sub-groups must be performed either with work-group functions or through memory operations. Memory operations should be used for sub-group synchronization with care, because forward progress of sub-groups relative to each other is only optionally supported by OpenCL implementations.


Command synchronization is defined in terms of distinct synchronization points. The synchronization points occur between commands in host command-queues and between commands in device-side command-queues. The synchronization points defined in OpenCL include:

  • Launching a command: A kernel-instance is launched onto a device after all events that the kernel is waiting on have been set to CL_COMPLETE.

  • Ending a command: Child kernels may be enqueued such that they wait for the parent kernel to reach the end state before they can be launched. In this case, the ending of the parent command defines a synchronization point.

  • Completion of a command: A kernel-instance is complete after all of the work-groups in the kernel and all of its child kernels have completed. This is signaled to the host, a parent kernel, or other kernels within command queues by setting the execution status of the event associated with the kernel to CL_COMPLETE.

  • Blocking Commands: A blocking command defines a synchronization point between the unit of execution that calls the blocking API function and the enqueued command reaching the complete state.

  • Command-queue barrier: The command-queue barrier ensures that all previously enqueued commands have completed before subsequently enqueued commands can be launched.

  • clFinish: This function blocks until all previously enqueued commands in the command queue have completed, at which point clFinish defines a synchronization point and returns.


A synchronization point between a pair of commands (A and B) ensures that the results of command A happen-before command B is launched. This requires that any updates to memory from command A complete and are made available to other commands before the synchronization point completes. Likewise, it requires that command B wait until after the synchronization point before loading values from global memory. The concept of a synchronization point works in a similar fashion for commands, such as a barrier, that apply to two sets of commands: all the commands prior to the barrier must complete and make their results available to the commands that follow, and any command following the barrier must wait for the commands prior to the barrier before loading values and continuing its execution.


These happens-before relationships are a fundamental part of the OpenCL 2.x memory model. At the level of commands, they are straightforward to define at the language level as ordering relationships between different commands. Ordering memory operations inside different commands, however, requires rules more complex than the high-level concept of a synchronization point can capture. These rules are described in detail in Memory Ordering Rules.

