source:
https://blog.csdn.net/zhouxuanyuye/article/details/80445076
Global ID: imagine all the work items (AMD) or threads (NV) placed in a one-dimensional array; the global ID of a given work item or thread is its index in that array.
In a GPU, the one-dimensional array mentioned above is divided into blocks (NV) or work groups (AMD), and each block or work group is further divided into threads (NV) or work items (AMD). From the hardware point of view, the threads or work items in each block or work group share memory resources.
This post is my notepad while figuring out how OpenCL handles assigning work item ids.
The basics:
- A kernel is invoked once for each work item. Each work item has private memory.
- Work items are grouped into work groups. Each work group shares local memory.
- The total number of work items is specified by the global work size. Global and constant memory is shared across all work items of all work groups.
In OpenCL, the size of the index space is specified by an 'NDRange', which can be 1-, 2-, or 3-dimensional.
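The term "global id" is used here because it describes the global, complete work space. Work items can also be organized into work groups. As we learned in the platform model, an OpenCL device may consist of several compute units, and each compute unit consists of several processing elements; a work item is an instance of the kernel running on one processing element. It is therefore natural for the work items running on the processing elements of the same compute unit to form a work group. Once work groups are defined, each one needs a work-group ID to identify it. Inside each work group, every work item can also be given a local id that identifies it within the group. Each work item can thus be identified in two ways: first, by its global id; second, by the group id of its work group together with its local id inside that group. It follows that these IDs can be converted into one another.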
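The conversion between the two ways of identifying a work item can be sketched in plain Python for the one-dimensional case. This only emulates the arithmetic and is not OpenCL code. The function names `split_global_id` and `join_ids` are hypothetical, and the sketch assumes no global offset and a global size that is a multiple of the work-group size.

```python
def split_global_id(global_id, local_size):
    """Return (group_id, local_id) for a 1-D global ID."""
    return divmod(global_id, local_size)


def join_ids(group_id, local_id, local_size):
    """Rebuild the 1-D global ID from a group ID and a local ID."""
    return group_id * local_size + local_id


if __name__ == "__main__":
    local_size = 4  # work items per work group (illustrative value)
    for global_id in range(12):
        group_id, local_id = split_global_id(global_id, local_size)
        # Converting back must give the ID we started from.
        assert join_ids(group_id, local_id, local_size) == global_id
        print(global_id, "->", (group_id, local_id))
```

With a work-group size of 4, for example, global id 9 lands in group 2 at local position 1.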
Here's the standard picture for a two-dimensional work space. Each rectangle represents a work item, and each cluster of grouped rectangles represents a work group. All the groups together form the grid. Note that the capital S and the lowercase s are used with different meanings.
OpenCL works with the notion of dimensions, which means you declare the number of work items by giving a size along each dimension. In the above example, the size of a work group is Sx=4 and Sy=4. How many dimensions you use is up to you; however, there is a physical limit on the maximum number of work items per group as well as globally. Inside a kernel, you can query the position of the work item that this kernel instance is executing, either relative to its group or relative to the global index space.
Querying the global position is done using get_global_id(dim), where dim is the dimension index (0 for the first dimension, 1 for the second, and so on). Assuming no global work offset is used, this call is equivalent to get_local_size(dim)*get_group_id(dim) + get_local_id(dim). Here get_local_size(dim) is the group size in dim, get_group_id(dim) is the position of the group in dim relative to all other groups (globally), and get_local_id(dim) is the position of the work item relative to its group. The annotated figure in the source post shows these quantities.
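The relation between the three IDs can also be checked numerically. Below is a minimal Python sketch that emulates a two-dimensional index space on the host; it is not OpenCL code. The helper names `global_id` and `enumerate_ndrange` are hypothetical, the group size Sx=4, Sy=4 follows the example above, and the number of groups (3 by 2) is an assumed value chosen only for illustration.

```python
from itertools import product


def global_id(group_id, local_id, local_size):
    """Apply group_id * local_size + local_id in every dimension."""
    return tuple(g * s + l for g, l, s in zip(group_id, local_id, local_size))


def enumerate_ndrange(num_groups, local_size):
    """Yield (group_id, local_id, global_id) for every emulated work item."""
    for group in product(*(range(n) for n in num_groups)):
        for local in product(*(range(s) for s in local_size)):
            yield group, local, global_id(group, local, local_size)


if __name__ == "__main__":
    local_size = (4, 4)  # Sx = 4, Sy = 4
    num_groups = (3, 2)  # assumed number of groups per dimension
    ids = [gid for _, _, gid in enumerate_ndrange(num_groups, local_size)]
    # Every point of the 12 x 8 global index space appears exactly once.
    assert sorted(ids) == sorted(product(range(12), range(8)))
    print(global_id((2, 1), (3, 0), local_size))  # (11, 4)
```

The assertion confirms that the formula maps each (group id, local id) pair to a distinct global position and that together they cover the whole index space.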
Since the OpenCL APIs only require you to specify the global size (the total number of work items in a dimension) and the local size (the number of work items per group in that dimension), the number of groups is inferred from that data.
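The inference is a per-dimension division. As far as I know, OpenCL 1.x requires the global size to be evenly divisible by the local size when a local size is given, so a common workaround is to round the global size up and have the kernel skip the out-of-range IDs. The Python sketch below illustrates both steps on the host; the names `num_groups` and `round_up_global_size` are hypothetical helpers, not part of any OpenCL API.

```python
def num_groups(global_size, local_size):
    """Number of work groups per dimension, assuming exact divisibility."""
    if any(g % l != 0 for g, l in zip(global_size, local_size)):
        raise ValueError("global size must be a multiple of local size")
    return tuple(g // l for g, l in zip(global_size, local_size))


def round_up_global_size(problem_size, local_size):
    """Smallest multiple of local_size that covers problem_size, per dimension."""
    return tuple(-(-p // l) * l for p, l in zip(problem_size, local_size))


if __name__ == "__main__":
    print(num_groups((12, 8), (4, 4)))            # (3, 2)
    print(round_up_global_size((10, 7), (4, 4)))  # (12, 8)
```

For a 10 x 7 problem with 4 x 4 groups, the padded global size is 12 x 8, which gives 3 x 2 groups.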