OpenCL Programming Model
To understand in more detail how to program with OpenCL, let's consider the Platform, Execution, and Memory Models. The three models interact and together define how OpenCL operates.
Platform Model
The OpenCL Platform Model describes how OpenCL views the topology of the compute resources in a system.
A host is connected to one or more OpenCL compute devices. Each compute device is a collection of one or more compute units, and each compute unit is composed of one or more processing elements. Processing elements execute code with SIMD (Single Instruction, Multiple Data) or SPMD (Single Program, Multiple Data) parallelism.

Figure: OpenCL Platform Model
For example, a compute device could be a GPU. Compute units would then correspond to the streaming multiprocessors (SMs) inside the GPU, and processing elements would correspond to the individual streaming processors (SPs) inside each SM. Processors typically group processing elements into compute units for implementation efficiency: the grouped elements share instruction dispatch and memory resources, and they can communicate locally with one another more easily.
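The hierarchy described above (host, compute devices, compute units, processing elements) can be sketched as plain C data structures. This is only an illustration of the model: the type and function names (`Host`, `ComputeDevice`, `ComputeUnit`, `host_processing_elements`) are hypothetical and are not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types mirroring the Platform Model hierarchy.
   They are illustrative only and do not come from the OpenCL headers. */
typedef struct {
    size_t processing_elements;    /* processing elements in this compute unit */
} ComputeUnit;

typedef struct {
    const ComputeUnit *units;      /* compute units inside this device */
    size_t unit_count;
} ComputeDevice;

typedef struct {
    const ComputeDevice *devices;  /* compute devices connected to the host */
    size_t device_count;
} Host;

/* Total number of processing elements in one compute device. */
size_t device_processing_elements(const ComputeDevice *dev)
{
    assert(dev != NULL);
    size_t total = 0;
    for (size_t u = 0; u < dev->unit_count; ++u)
        total += dev->units[u].processing_elements;
    return total;
}

/* Total number of processing elements reachable from the host. */
size_t host_processing_elements(const Host *host)
{
    assert(host != NULL);
    size_t total = 0;
    for (size_t d = 0; d < host->device_count; ++d)
        total += device_processing_elements(&host->devices[d]);
    return total;
}
```

In a real program the number of compute units would be queried from the runtime, for example with clGetDeviceInfo and CL_DEVICE_MAX_COMPUTE_UNITS, rather than filled in by hand as this sketch assumes.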
Execution Model
OpenCL's clEnqueueNDRangeKernel command launches a single kernel program to run in parallel across an N-dimensional data structure. Using a two-dimensional image as an example, the size of the image would be the NDRange, and each pixel corresponds to a work-item: one instance of the kernel, running on a single processing element, operates on it.
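To make the image example concrete, here is a serial C emulation of a 2D NDRange. It is a sketch of the idea, not OpenCL code: the names `invert_pixel` and `run_ndrange_2d` are hypothetical, the image is assumed to be 8-bit grayscale in row-major order, and the nested loops stand in for work-items that a real device would run in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "kernel" body: inverts one 8-bit pixel.
   In OpenCL C, gx and gy would come from get_global_id(0) and
   get_global_id(1) instead of being passed as arguments. */
void invert_pixel(unsigned char *image, size_t width, size_t gx, size_t gy)
{
    size_t i = gy * width + gx;             /* row-major pixel index */
    image[i] = (unsigned char)(255 - image[i]);
}

/* Serial emulation of a width x height NDRange: one call per work-item.
   On a device these calls would be spread across processing elements. */
void run_ndrange_2d(unsigned char *image, size_t width, size_t height)
{
    assert(image != NULL);
    for (size_t gy = 0; gy < height; ++gy)
        for (size_t gx = 0; gx < width; ++gx)
            invert_pixel(image, width, gx, gy);
}
```

Because each call touches only its own pixel, the order of the loop iterations does not matter, which is the property that lets the work-items run in parallel.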
As we saw in the Platform Model section above, processors commonly group processing elements into compute units for execution efficiency. Therefore, when using the clEnqueueNDRangeKernel command, the program specifies a work-group size: the dimensions of a group of work-items in the NDRange that can be accommodated on one compute unit. Work-items in the same work-group can share local memory, synchronize more easily using work-group barriers, and cooperate more efficiently using work-group functions such as async_work_group_copy, none of which are available between work-items in separate work-groups.
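The relationship between the NDRange, the work-group size, and each work-item's indices can be sketched with a little index arithmetic. The helper names below are hypothetical, the sketch covers one dimension, and it assumes a zero global offset. The round-up helper reflects a common practice on OpenCL versions that require the global size to be a multiple of the work-group size; a kernel launched that way needs a bounds check for the padding work-items.

```c
#include <assert.h>
#include <stddef.h>

/* Round a global size up to the next multiple of the work-group size. */
size_t round_up_global_size(size_t global_size, size_t local_size)
{
    assert(local_size > 0);
    return ((global_size + local_size - 1) / local_size) * local_size;
}

/* Number of work-groups along one dimension.
   Assumes global_size is already a multiple of local_size. */
size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Global index of a work-item from its work-group index and its index
   inside the group, mirroring (with zero offset) the relation
   get_global_id = get_group_id * get_local_size + get_local_id. */
size_t global_id(size_t group_id, size_t local_size, size_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}
```

For instance, an image row of 1000 pixels with a work-group size of 16 would be padded to 1008 work-items, giving 63 work-groups along that dimension.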

Figure: A 2D image as an example NDRange
Memory Model
OpenCL has a hierarchy of memory types:
- Host memory: available to the host CPU
- Global/Constant memory: available to all compute units in a compute device
- Local memory: available to all the processing elements in a compute unit
- Private memory: available to a single processing element

Figure: OpenCL Memory Model
OpenCL memory management is explicit. None of the memories above is automatically synchronized, so the application must explicitly move data between memory types as needed.
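The explicit data movement described above can be illustrated with a serial C emulation in which ordinary buffers stand in for the memory types. This is a sketch under assumptions, not OpenCL code: `staged_sum` and `TILE` are hypothetical names, and each memcpy marks a transfer that a real application would request itself (for example, clEnqueueWriteBuffer for host to global memory, or async_work_group_copy for global to local memory).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 4  /* hypothetical work-group size and local tile length */

/* Sums n floats while staging the data through buffers that stand in for
   host, global, local, and private memory. n must be a multiple of TILE. */
float staged_sum(const float *host_data, float *global_buf, size_t n)
{
    assert(host_data != NULL && global_buf != NULL && n % TILE == 0);

    /* Host -> global: an explicit copy requested by the application. */
    memcpy(global_buf, host_data, n * sizeof(float));

    float total = 0.0f;
    for (size_t g = 0; g < n / TILE; ++g) {       /* one pass per work-group */
        float local_tile[TILE];                   /* stands in for local memory */

        /* Global -> local: another explicit copy, done once per work-group. */
        memcpy(local_tile, global_buf + g * TILE, sizeof local_tile);

        for (size_t l = 0; l < TILE; ++l) {       /* one pass per work-item */
            float private_value = local_tile[l];  /* stands in for private memory */
            total += private_value;
        }
    }
    return total;
}
```

Nothing in the sketch reaches a closer memory level unless a copy puts it there, which is the sense in which the memories are not automatically synchronized.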



