OpenCL™ Specification 3.3.7.3. Work-group Functions

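This article explains the work-group functions in OpenCL, in particular the mechanisms for synchronizing work-items such as barrier, together with the memory ordering and synchronization rules involved. It also covers the other work-group functions (scans, reductions, and broadcasts) and the ordering dependencies they imply during kernel execution.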
3.3.7.3. Work-group Functions

The OpenCL kernel execution model includes collective operations across the work-items within a single work-group. These are called work-group functions, and include functions such as barriers, scans, reductions, and broadcasts. We will first discuss the work-group barrier function. Other work-group functions are discussed afterwards.

The barrier function provides a mechanism for a kernel to synchronize the work-items within a single work-group: informally, each work-item of the work-group must execute the barrier before any are allowed to proceed. It also orders memory operations to a specified combination of one or more address spaces such as local memory or global memory, in a similar manner to a fence.
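
As an illustration of the paragraph above, here is a minimal OpenCL C sketch, not taken from the specification. The kernel name `reverse_in_group` and its arguments are hypothetical, and it assumes a 1-D NDRange whose host code supplies a `__local` buffer holding one float per work-item. Each work-item writes one element to local memory, and the barrier is what makes it safe to read an element written by a different work-item afterwards.

```c
// Hypothetical kernel: reverses the elements handled by each work-group.
// Assumes the host sets arg 2 with clSetKernelArg(k, 2, local_size * sizeof(float), NULL).
__kernel void reverse_in_group(__global const float *in,
                               __global float *out,
                               __local float *tile)
{
    const size_t lid = get_local_id(0);
    const size_t gid = get_global_id(0);
    const size_t n   = get_local_size(0);

    // Each work-item stores one element into local memory.
    tile[lid] = in[gid];

    // Every work-item of the work-group must execute this call before any
    // of them proceeds. The flag requests ordering of local-memory operations.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Reading an element written by another work-item is now well ordered.
    out[gid] = tile[n - 1 - lid];
}
```

In OpenCL C 2.0 and later the same operation is also available as `work_group_barrier`; `barrier` is used here because it is the name the text uses.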

To precisely specify the memory ordering semantics for barrier, we need to distinguish between a dynamic and a static instance of the call to a barrier. A call to a barrier can appear in a loop, for example, and each execution of the same static barrier call results in a new dynamic instance of the barrier that will independently synchronize a work-group's work-items.
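
The static/dynamic distinction can be seen in a tree reduction, sketched below in OpenCL C. This is an illustrative example under stated assumptions, not code from the specification: the kernel name `group_sum` and its arguments are hypothetical, the NDRange is 1-D, and the work-group size is assumed to be a power of two. The source contains two static barrier calls; the one inside the loop yields a new dynamic instance on every iteration.

```c
// Hypothetical kernel: sums the elements handled by each work-group.
// Assumes get_local_size(0) is a power of two and scratch holds that many floats.
__kernel void group_sum(__global const float *in,
                        __global float *partial,
                        __local float *scratch)
{
    const size_t lid = get_local_id(0);
    const size_t n   = get_local_size(0);

    scratch[lid] = in[get_global_id(0)];

    // Static call site A: executed once, so exactly one dynamic instance.
    barrier(CLK_LOCAL_MEM_FENCE);

    for (size_t stride = n / 2; stride > 0; stride >>= 1) {
        if (lid < stride)
            scratch[lid] += scratch[lid + stride];

        // Static call site B: one new dynamic instance per loop iteration.
        // It sits outside the if, so every work-item executes it each time.
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    // Work-item 0 publishes the work-group's result.
    if (lid == 0)
        partial[get_group_id(0)] = scratch[0];
}
```

Because the loop bounds depend only on the work-group size, every work-item runs the same number of iterations and therefore takes part in the same sequence of dynamic barrier instances.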

A work-item executing a dynamic instance of a barrier results in two operations, both fences, that are called the entry and exit fences. These fences obey all the rules for fences specified elsewhere in this chapter as well as the following:

  • The entry fence is a release fence with the same flags and scope as requested for the barrier.

  • The exit fence is an acquire fence with the same flags and scope as requested for the barrier.

  • For each work-item the entry fence is sequenced before the exit fence.

  • If the flags have CLK_GLOBAL_MEM_FENCE set then for each work-item the entry fence global-synchronizes-with the exit fence of all other work-items in the same work-group.

  • If the flags have CLK_LOCAL_MEM_FENCE set then for each work-item the entry fence local-synchronizes-with the exit fence of all other work-items in the same work-group.
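
The rules above can be read off a small OpenCL C sketch. It is illustrative only: the kernel name `rotate_in_group` and its buffers are hypothetical, a 1-D NDRange with uniform work-group sizes is assumed, and the comments describe the conceptual entry and exit fences rather than how any implementation realizes them. Each work-item writes to global memory, then reads the slot written by its neighbor in the same work-group.

```c
// Hypothetical kernel: each work-item reads the value written by the next
// work-item of its own work-group (wrapping around at the end).
__kernel void rotate_in_group(__global int *buf, __global int *out)
{
    const size_t gid  = get_global_id(0);
    const size_t lid  = get_local_id(0);
    const size_t n    = get_local_size(0);
    const size_t base = gid - lid;  // global id of this work-group's first work-item

    // Write to global memory, sequenced before the barrier's entry fence.
    buf[gid] = (int)gid * 2;

    // Conceptually: an entry (release) fence, the rendezvous, then an exit
    // (acquire) fence. With CLK_GLOBAL_MEM_FENCE set, this work-item's entry
    // fence global-synchronizes-with the exit fences of the other work-items
    // in the work-group.
    barrier(CLK_GLOBAL_MEM_FENCE);

    // Sequenced after the exit fence, so the neighbor's write is visible.
    out[gid] = buf[base + (lid + 1) % n];
}
```

If only `CLK_LOCAL_MEM_FENCE` were passed, the rules listed above would establish local-synchronizes-with but not global-synchronizes-with, so the read of `buf` would not be ordered after the neighbor's write. The flags can be combined as `CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE` when both address spaces are involved. The ordering applies only among work-items of the same work-group.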

Other work-group functions include such functions as scans, reductions, and broadcasts, and are described in the kernel language and IL specifications. The use of these work-group functions implies sequenced-before relationships between statements within the execution of a single work-item in order to satisfy data dependencies. For example, a work-item that provides a value to a work-group function must behave as if it generates that value before beginning execution of that work-group function. Furthermore, the programmer must ensure that all work-items in a work-group execute the same work-group function call site, or dynamic work-group function instance.
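
A short OpenCL C sketch of these functions follows. It is an illustration under assumptions rather than code from the specification: the kernel name `group_stats` and its buffers are hypothetical, a 1-D NDRange is assumed, and the built-ins used require OpenCL C 2.0, or OpenCL C 3.0 with the `__opencl_c_work_group_collective_functions` feature.

```c
// Hypothetical kernel: per-work-group sum, inclusive prefix sum, and broadcast.
__kernel void group_stats(__global const int *in,
                          __global int *prefix,
                          __global int *total,
                          __global int *first)
{
    const size_t gid = get_global_id(0);

    // The value is produced before the work-group function is entered.
    const int x = in[gid];

    // All work-items of the work-group reach each of these call sites;
    // none of the calls is placed inside a divergent branch.
    const int sum  = work_group_reduce_add(x);          // same result in every work-item
    const int scan = work_group_scan_inclusive_add(x);  // running sum up to this work-item
    const int head = work_group_broadcast(x, 0);        // value held by local id 0

    prefix[gid] = scan;

    // Divergent control flow is fine here because no work-group function
    // is called inside the branch.
    if (get_local_id(0) == 0) {
        total[get_group_id(0)] = sum;
        first[get_group_id(0)] = head;
    }
}
```

Moving any of the three calls inside the `if (get_local_id(0) == 0)` branch would mean only one work-item executes that call site, which is exactly what the last sentence of the paragraph above tells the programmer to avoid.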
