OpenCL Programming Model

To understand how to program OpenCL in more detail, let's consider the Platform, Execution, and Memory Models. These three models interact and together define how OpenCL operates.

Platform Model

The OpenCL Platform Model describes how OpenCL views the compute resources in a system and how they are connected to one another.

A host is connected to one or more OpenCL compute devices. Each compute device is a collection of one or more compute units, and each compute unit is composed of one or more processing elements. Processing elements execute code with SIMD (Single Instruction, Multiple Data) or SPMD (Single Program, Multiple Data) parallelism.



OpenCL Platform Model

For example, a compute device could be a GPU. Compute units would then correspond to the streaming multiprocessors (SMs) inside the GPU, and processing elements correspond to individual streaming processors (SPs) inside each SM. Processors typically group processing elements into compute units for implementation efficiency through sharing instruction dispatch and memory resources, and increasing local inter-processor communication.
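
As a rough illustration of the hierarchy described above, the following C sketch models a host, its compute devices, their compute units, and the processing elements inside them as plain structs. The type and function names (ComputeDevice, Host, total_processing_elements) are hypothetical and are not part of the OpenCL API; a real program would query device properties with calls such as clGetDeviceInfo (for example, with CL_DEVICE_MAX_COMPUTE_UNITS).

```c
#include <stddef.h>

/* Hypothetical model of the Platform Model hierarchy; not an OpenCL API type. */
typedef struct {
    const char *name;
    unsigned compute_units;       /* e.g. streaming multiprocessors on a GPU */
    unsigned elements_per_unit;   /* e.g. streaming processors per SM */
} ComputeDevice;

typedef struct {
    const ComputeDevice *devices; /* compute devices attached to the host */
    size_t device_count;
} Host;

/* Processing elements in one device: compute units times elements per unit. */
unsigned device_processing_elements(const ComputeDevice *d) {
    return d->compute_units * d->elements_per_unit;
}

/* Processing elements across every device connected to the host. */
unsigned long total_processing_elements(const Host *h) {
    unsigned long total = 0;
    for (size_t i = 0; i < h->device_count; ++i)
        total += device_processing_elements(&h->devices[i]);
    return total;
}
```

With made-up numbers, a host with one device of 4 compute units and 32 processing elements each, plus a second device of 8 compute units with 1 processing element each, would report 136 processing elements in total.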

Execution Model

OpenCL's clEnqueueNDRangeKernel command launches a single kernel program to operate in parallel across an N-dimensional data structure. Using a two-dimensional image as an example, the size of the image would be the NDRange, and each pixel would be a work-item, processed by one instance of the kernel running on a single processing element.
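
To make this concrete, the sketch below emulates in plain C what a two-dimensional NDRange launch does conceptually: one kernel invocation per work-item, here one per pixel. It is a serial stand-in and not OpenCL code; invert_pixel and run_ndrange_2d are hypothetical names, and a real device would run the work-items in parallel rather than in nested loops.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "kernel": handles the single work-item at (gx, gy).
   In OpenCL C the coordinates would come from get_global_id(0) and
   get_global_id(1) instead of function arguments. */
static void invert_pixel(uint8_t *image, size_t width, size_t gx, size_t gy) {
    size_t i = gy * width + gx;          /* row-major pixel index */
    image[i] = (uint8_t)(255 - image[i]);
}

/* Serial emulation of a 2D NDRange of size width x height:
   the kernel body runs once for every work-item (pixel). */
void run_ndrange_2d(uint8_t *image, size_t width, size_t height) {
    for (size_t gy = 0; gy < height; ++gy)
        for (size_t gx = 0; gx < width; ++gx)
            invert_pixel(image, width, gx, gy);
}
```

The point of the sketch is that the kernel body is written for one work-item only; the NDRange decides how many times, and for which coordinates, it runs.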

As we saw in the Platform Model section above, it is common for processors to group processing elements into compute units for execution efficiency. Therefore, when using the clEnqueueNDRangeKernel command, the program specifies a work-group size that represents groups of individual work-items in an NDRange that can be accommodated on a compute unit. Work-items in the same work-group are able to share local memory, synchronize more easily using work-group barriers, and cooperate more efficiently using work-group functions such as async_work_group_copy that are not available between work-items in separate work-groups.



A 2D Image as an Example NDRange
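
The relationship between work-items and work-groups can be sketched with simple index arithmetic. Assuming a zero global offset and a global size that is a multiple of the work-group size in each dimension, a work-item's global ID equals its group ID times the work-group size plus its local ID. In kernel code these values come from the built-in functions get_global_id, get_group_id, get_local_id, and get_local_size; the C helper names below are hypothetical.

```c
#include <stddef.h>

/* Number of work-groups along one dimension.
   Assumes global_size is a multiple of local_size. */
size_t num_groups(size_t global_size, size_t local_size) {
    return global_size / local_size;
}

/* Which work-group a work-item belongs to along one dimension. */
size_t group_id_of(size_t global_id, size_t local_size) {
    return global_id / local_size;
}

/* Position of a work-item inside its work-group along one dimension. */
size_t local_id_of(size_t global_id, size_t local_size) {
    return global_id % local_size;
}

/* Inverse mapping: rebuild the global ID from group and local IDs. */
size_t global_id_from(size_t group_id, size_t local_id, size_t local_size) {
    return group_id * local_size + local_id;
}
```

For example, a 1024 x 768 image processed with 16 x 16 work-groups yields 64 x 48 = 3072 work-groups of 256 work-items each.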

Memory Model

OpenCL has a hierarchy of memory types:

  • Host memory - available to the host CPU
  • Global/Constant memory - available to all compute units in a compute device
  • Local memory - available to all the processing elements in a compute unit
  • Private memory - available to a single processing element



OpenCL Memory Model

OpenCL memory management is explicit. None of the above memories is automatically synchronized, so the application must explicitly move data between memory types as needed.
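
The explicit data movement described above can be illustrated with a plain C stand-in, in which separate arrays play the roles of host, global, and local memory, and memcpy plays the role of the transfer commands. This is only a simulation under those assumptions: the function name sum_via_explicit_copies, the tile size, and the 64-element capacity are hypothetical, and in a real program the host-to-device and device-to-host steps would be commands such as clEnqueueWriteBuffer and clEnqueueReadBuffer.

```c
#include <stddef.h>
#include <string.h>

#define TILE 4      /* hypothetical work-group size */
#define CAPACITY 64 /* hypothetical size of the simulated device buffer */

/* Sums n floats by moving data explicitly through each simulated memory
   level. Returns 0 if n exceeds CAPACITY or is not a multiple of TILE. */
float sum_via_explicit_copies(const float *host_in, size_t n) {
    float global_mem[CAPACITY];   /* stands in for a device (global) buffer */
    float global_out = 0.0f;      /* result location in simulated global memory */
    float host_out;

    if (n > CAPACITY || n % TILE != 0)
        return 0.0f;

    /* Host -> global: clEnqueueWriteBuffer in a real OpenCL program. */
    memcpy(global_mem, host_in, n * sizeof(float));

    for (size_t g = 0; g < n / TILE; ++g) {   /* one iteration per work-group */
        float local_mem[TILE];                /* shared within one work-group */

        /* Global -> local: each work-group stages its own tile. */
        memcpy(local_mem, &global_mem[g * TILE], sizeof(local_mem));

        for (size_t l = 0; l < TILE; ++l) {   /* one iteration per work-item */
            float priv = local_mem[l];        /* local -> private */
            global_out += priv;               /* result accumulated in global */
        }
    }

    /* Global -> host: clEnqueueReadBuffer in a real OpenCL program. */
    memcpy(&host_out, &global_out, sizeof(float));
    return host_out;
}
```

Every arrow between memory levels is a copy that the program performs itself; nothing becomes visible at another level until it is moved there.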
