3.2.1. Mapping work-items onto an NDRange

The index space supported by OpenCL is called an NDRange. An NDRange is an N-dimensional index space, where N is one, two or three. The NDRange is decomposed into work-groups, which form blocks that cover the index space. An NDRange is defined by three integer arrays of length N:


  • The extent of the index space (or global size) in each dimension.


  • An offset index F indicating the initial value of the indices in each dimension (zero by default).


  • The size of a work-group (local size) in each dimension.


Each work-item’s global ID is an N-dimensional tuple. In each dimension, the global ID component takes values in the range from F to F plus the number of elements in that dimension minus one.
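As a rough illustration of the extent and offset arrays and the resulting global ID range, the following Python sketch models the index arithmetic on its own. The function names `global_id_ranges` and `all_global_ids` are hypothetical and chosen only for this example; they are not part of the OpenCL API.

```python
from itertools import product


def global_id_ranges(global_size, offset=None):
    """Return, per dimension, the inclusive (first, last) global ID component.

    global_size and offset are sequences of length N (1, 2 or 3).
    A missing offset is treated as all zeros, matching the default.
    """
    n = len(global_size)
    if n not in (1, 2, 3):
        raise ValueError("an NDRange has one, two or three dimensions")
    if offset is None:
        offset = [0] * n
    if len(offset) != n:
        raise ValueError("offset must have the same length as global_size")
    # Each component runs from F to F + G - 1 in its dimension.
    return [(f, f + g - 1) for g, f in zip(global_size, offset)]


def all_global_ids(global_size, offset=None):
    """Enumerate every global ID tuple in the NDRange."""
    ranges = global_id_ranges(global_size, offset)
    return list(product(*(range(lo, hi + 1) for lo, hi in ranges)))
```

For a global size of (8, 4) and an offset of (2, 10), the components range over 2 to 9 in the first dimension and 10 to 13 in the second.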


Unless a kernel comes from a source that disallows it (for example, OpenCL C 1.x, or a program built with -cl-uniform-work-group-size), the size of work-groups in an NDRange (the local size) need not be the same for all work-groups. In this case, any single dimension for which the global size is not divisible by the local size is partitioned into two regions. One region has work-groups with the same number of work-items as the programmer specified for that dimension (the local size). The other region has work-groups with fewer work-items than the local size specifies in that dimension (the remainder work-groups). Work-group sizes can be non-uniform in multiple dimensions, potentially producing work-groups of up to 4 different sizes in a 2D range and 8 different sizes in a 3D range.


Non-uniform work-group sizes are not available before OpenCL 2.0.
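The partitioning into full and remainder work-groups can be sketched with plain integer arithmetic. The Python below is only an illustration, assuming non-uniform work-groups are permitted; the helper names `work_group_sizes_1d` and `distinct_work_group_shapes` are hypothetical.

```python
from itertools import product


def work_group_sizes_1d(global_size, local_size):
    """Split one dimension into full work-groups plus an optional remainder.

    Returns (full_count, remainder), where remainder is 0 when the local
    size divides the global size evenly.
    """
    if global_size <= 0 or local_size <= 0:
        raise ValueError("sizes must be positive")
    return global_size // local_size, global_size % local_size


def distinct_work_group_shapes(global_size, local_size):
    """Return the set of distinct work-group shapes in an N-dimensional range."""
    per_dim = []
    for g, s in zip(global_size, local_size):
        full, rem = work_group_sizes_1d(g, s)
        sizes = set()
        if full > 0:
            sizes.add(s)      # region of full-size work-groups
        if rem > 0:
            sizes.add(rem)    # region of remainder work-groups
        per_dim.append(sorted(sizes))
    # Every combination of per-dimension sizes is a possible shape.
    return set(product(*per_dim))
```

With a global size of (10, 10) and a local size of (4, 4), each dimension holds two full work-groups and a remainder of 2, so the sketch yields the four shapes (4, 4), (4, 2), (2, 4) and (2, 2).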


Each work-item is assigned to a work-group and given a local ID to represent its position within the work-group. A work-item’s local ID is an N-dimensional tuple with components in the range from zero to the size of the work-group in that dimension minus one.


Work-groups are assigned IDs similarly. The number of work-groups in each dimension is not directly defined but is inferred from the local and global NDRanges provided when a kernel-instance is enqueued. A work-group’s ID is an N-dimensional tuple with components in the range from zero to the ceiling of the global size in that dimension divided by the local size in the same dimension, minus one. As a result, the combination of a work-group ID and the local ID within a work-group uniquely defines a work-item. Each work-item is identifiable in two ways: in terms of a global index, or in terms of a work-group index plus a local index within a work-group.


For example, consider the 2-dimensional index space shown below. The inputs are the index space for the work-items (Gx, Gy), the size of each work-group (Sx, Sy) and the global ID offset (Fx, Fy). The global indices define a Gx by Gy index space where the total number of work-items is the product of Gx and Gy. The local indices define an Sx by Sy index space where the number of work-items in a single work-group is the product of Sx and Sy. Given the size of each work-group and the total number of work-items, we can compute the number of work-groups. A 2-dimensional index space is used to uniquely identify a work-group. Each work-item is identified by its global ID (gx, gy) or by the combination of the work-group ID (wx, wy), the size of each work-group (Sx, Sy) and the local ID (sx, sy) inside the work-group such that


  • (gx, gy) = (wx × Sx + sx + Fx, wy × Sy + sy + Fy)

The number of work-groups can be computed as:


  • (Wx, Wy) = (ceil(Gx / Sx), ceil(Gy / Sy))
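The work-group count is a per-dimension ceiling division. A minimal Python sketch, using a hypothetical helper name `num_work_groups`, could look like this:

```python
def num_work_groups(global_size, local_size):
    """Return (Wx, Wy, ...) = ceil(G / S) in each dimension.

    Integer ceiling division is used instead of math.ceil on a float
    quotient, so large sizes are not affected by floating-point rounding.
    """
    if len(global_size) != len(local_size):
        raise ValueError("global_size and local_size must have the same length")
    return tuple((g + s - 1) // s for g, s in zip(global_size, local_size))
```

For example, a global size of (10, 10) with a local size of (4, 4) gives (3, 3) work-groups, the last one in each dimension being a remainder work-group.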

Given a work-item’s global ID, its local ID, the global ID offset and the work-group size, the work-group ID is computed as:


  • (wx, wy) = ( (gx - sx - Fx) / Sx, (gy - sy - Fy) / Sy )
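The two directions of this mapping can be checked against each other with a small round-trip sketch. The Python below is illustrative only; the function names are hypothetical, and `local_size` is assumed to be the local size specified at enqueue time (Sx, Sy), not the size of a remainder work-group.

```python
def to_global_id(group_id, local_id, local_size, offset):
    """Compute g = w * S + s + F in each dimension."""
    return tuple(w * S + s + F
                 for w, s, S, F in zip(group_id, local_id, local_size, offset))


def to_group_and_local_id(global_id, local_size, offset):
    """Recover (work-group ID, local ID) from a global ID.

    divmod on (g - F) gives w = (g - F) // S and s = (g - F) % S, which
    matches w = (g - s - F) / S because that division is exact.
    """
    pairs = [divmod(g - F, S)
             for g, S, F in zip(global_id, local_size, offset)]
    group_id = tuple(w for w, _ in pairs)
    local_id = tuple(s for _, s in pairs)
    return group_id, local_id
```

With work-group ID (1, 2), local ID (3, 0), local size (4, 2) and a zero offset, the global ID is (7, 4), and decomposing (7, 4) returns the same work-group and local IDs.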

Figure 3. An example of an NDRange index space showing work-items, their global IDs and their mapping onto the pair of work-group and local IDs. In this case, we assume that in each dimension, the size of the work-group evenly divides the global NDRange size (i.e. all work-groups have the same size) and that the offset is equal to zero.


Within a work-group, work-items may be divided into sub-groups. The mapping of work-items to sub-groups is implementation-defined and may be queried at runtime. While sub-groups may be used in multi-dimensional work-groups, each sub-group is 1-dimensional, and any given work-item may query which sub-group it is a member of.


Sub-groups are not available before OpenCL 2.1.


Work-items are mapped into sub-groups through a combination of compile-time decisions and the parameters of the dispatch. The mapping to sub-groups is invariant for the duration of a kernel’s execution, across dispatches of a given kernel with the same work-group dimensions, between dispatches and query operations consistent with the dispatch parameterization, and from one work-group to another within the dispatch (excluding the trailing edge work-groups in the presence of non-uniform work-group sizes). In addition, all sub-groups within a work-group will be the same size, apart from the sub-group with the maximum index, which may be smaller if the size of the work-group is not evenly divisible by the size of the sub-groups.
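The size rule in the last sentence can be illustrated with a short Python sketch. It models only the sizes of the sub-groups; which work-items land in which sub-group remains implementation-defined. The function name `sub_group_sizes` is hypothetical, and the sub-group size is assumed to be known, for example from a runtime query.

```python
def sub_group_sizes(work_group_size, sub_group_size):
    """List the size of each sub-group in one work-group, by sub-group index.

    work_group_size is the total number of work-items in the work-group
    (the product of its dimensions).
    """
    if work_group_size <= 0 or sub_group_size <= 0:
        raise ValueError("sizes must be positive")
    full, rem = divmod(work_group_size, sub_group_size)
    sizes = [sub_group_size] * full
    if rem:
        # Only the sub-group with the maximum index may be smaller.
        sizes.append(rem)
    return sizes
```

A work-group of 70 work-items with a sub-group size of 32 would therefore contain sub-groups of 32, 32 and 6 work-items.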


In the degenerate case, a single sub-group must be supported for each work-group. In this situation all sub-group scope functions are equivalent to their work-group level equivalents.






