OpenCL Study Notes (1)

小瘦马

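Copyright notice: This is an original article by the author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.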

Article link: https://blog.csdn.net/fuxingwe/article/details/9719621

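I am currently interning at 美国多核, and every project I touch requires OpenCL, so I am studying it full time. Below are some scattered notes on the points I consider most important.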

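In GPU parallel computing, we usually implement two kinds of parallelism:

Task parallelism: a problem is decomposed into multiple tasks that can execute at the same time.
Data parallelism: within a single task, its parts execute at the same time, typically the same operation applied to different data elements.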

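OpenCL platform API: the platform API defines the functions a host program uses to discover OpenCL devices and query their capabilities, as well as the functions that create a context for an OpenCL application.

OpenCL runtime API: this API works with a context to create command queues and handles the other operations that happen at run time. For example, the functions that submit commands to a command queue come from the OpenCL runtime API.
OpenCL programming language: the language used to write kernel code. It is based on an extended subset of the ISO C99 standard, so it is usually called the OpenCL C programming language.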

OpenMP; TBB (Threading Building Blocks)

Multicore CPUs are well suited to task-based parallel programming, while GPUs are better suited to data-parallel programming.

• In distributed systems, we use the Message Passing Interface (MPI) to implement SPMD (Single Program, Multiple Data).

• In shared-memory parallel systems, we use POSIX threads to implement SPMD.

• On GPUs, we use kernels to implement SPMD.

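On a modern CPU, creating a thread is still expensive. To implement SPMD on a CPU, each thread should therefore process as large a block of data as possible and do more work, so that the per-thread overhead is amortized. On a GPU the threads are lightweight, and creating and scheduling them is cheap, so the loop can be fully unrolled: one thread processes one data element.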
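The contrast between the two decompositions can be sketched in plain C. This is only a host-side emulation under stated assumptions: the "threads" and "work-items" are ordinary loop iterations, and the names `add_chunk`, `add_item`, `run_chunked`, and `run_per_item` are hypothetical helpers, not OpenCL or pthread APIs.

```c
#include <stddef.h>

/* CPU-style SPMD: each of num_threads workers handles one contiguous chunk,
   so the cost of starting a worker is spread over many elements. */
void add_chunk(const float *a, const float *b, float *c, size_t n,
               size_t thread_id, size_t num_threads)
{
    size_t chunk = (n + num_threads - 1) / num_threads; /* ceil(n / num_threads) */
    size_t begin = thread_id * chunk;
    size_t end = (begin + chunk < n) ? begin + chunk : n;
    for (size_t i = begin; i < end; ++i)
        c[i] = a[i] + b[i];
}

/* GPU-style SPMD: the loop body becomes the kernel, one work-item per element.
   gid plays the role of get_global_id(0) in an OpenCL C kernel. */
void add_item(const float *a, const float *b, float *c, size_t gid)
{
    c[gid] = a[gid] + b[gid];
}

/* Emulated launch of num_threads CPU workers (run one after another here). */
void run_chunked(const float *a, const float *b, float *c, size_t n,
                 size_t num_threads)
{
    for (size_t t = 0; t < num_threads; ++t)
        add_chunk(a, b, c, n, t, num_threads);
}

/* Emulated launch of n work-items (run one after another here). */
void run_per_item(const float *a, const float *b, float *c, size_t n)
{
    for (size_t gid = 0; gid < n; ++gid)
        add_item(a, b, c, gid);
}
```

Both launches compute the same result; what changes is how much work each unit of execution carries, which is the trade-off the paragraph above describes.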

OpenCL supports parallel computing across heterogeneous devices, including CPUs, GPUs, and other processors such as the Cell processor and DSPs.

Step 1. Get platform.

Query the available platforms, and choose an appropriate one. For information about the platforms, use clGetPlatformIDs and clGetPlatformInfo.

Step 2. Query devices.

Use clGetDeviceIDs to query the platform, and choose the first GPU device. If there is no GPU, use the CPU.

Step 3. Create context.

Use clCreateContext to create a context using the first device. This can be a GPU or CPU, depending on the available devices on the system.

Step 4. Create command queue.

Use clCreateCommandQueue to create a command queue on the context for the device.

Step 5. Create program.

Use clCreateProgramWithSource to create the program that uses the kernel file.

Step 6. Build program.

Use clBuildProgram to build the program.

Step 7. Create memory objects.

Define the initial input and output buffers for the host, and create memory objects for the kernel. Use clCreateBuffer to create cl_mem objects.

Step 8. Create kernel object.

Use clCreateKernel to create a kernel for the device.

Step 9. Set kernel arguments.

Use clSetKernelArg to set arguments for the kernel.

Step 10. Run the kernel.

Use clEnqueueNDRangeKernel to run the kernel.

Step 11. Read the output back to host memory.

Use clEnqueueReadBuffer to read the results of the executed kernel back to host buffer.

Step 12. Release the resources used by OpenCL.

a. Use clReleaseKernel to release the kernel.

b. Use clReleaseProgram to release the program.

c. Use clReleaseMemObject to release the buffers.

d. Use clReleaseCommandQueue to release the command queue.

e. Use clReleaseContext to release the context.

Use free or delete to free the resources used by the host.

If successful, errcode_ret is set to CL_SUCCESS; otherwise, a different error code is returned.

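Everything from the program entry point up to obtaining the platform's context belongs to platform initialization, which we will cover in detail in the platform initialization layer.

Everything after platform initialization and before clEnqueueReadBuffer is collectively called the runtime.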

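The kernels in a command queue all run on an OpenCL device, so how can we control the execution order of related commands in an out-of-order command queue? OpenCL provides events to solve this synchronization problem. The event_wait_list parameter of clEnqueueNDRangeKernel points to an array of cl_event objects. If it is NULL (and num_events_in_wait_list is 0), the command waits on no events and may start as soon as the device can run it. If the caller passes a valid array, the command does not start until all num_events_in_wait_list events in that array have completed. The event parameter returns an event object for the kernel command itself; once the kernel has finished executing on the OpenCL device, that is, once the command has completed, this event is set to complete.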
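The wait-list rule can be illustrated with a small plain-C emulation. This is a sketch under loud assumptions: the `command` struct and `run_out_of_order` are hypothetical and do not exist in OpenCL; only the field names echo the clEnqueueNDRangeKernel parameters. The scheduler deliberately scans from the last enqueued command to the first, to show that enqueue order alone does not fix execution order.

```c
#include <stddef.h>

#define MAX_WAIT 4

/* Hypothetical stand-in for an enqueued command and its event. */
typedef struct {
    size_t num_events_in_wait_list;
    int event_wait_list[MAX_WAIT]; /* indices of commands to wait for */
    int complete;                  /* 1 once this command's event has fired */
} command;

/* Runs every command whose wait list is fully complete, in no particular
   enqueue order. Records the execution order (as command indices) in
   `order` and returns how many commands ran. */
size_t run_out_of_order(command *cmds, size_t n, int *order)
{
    size_t executed = 0;
    int progress = 1;
    while (executed < n && progress) {
        progress = 0;
        for (size_t k = n; k-- > 0; ) {          /* scan back to front */
            command *c = &cmds[k];
            if (c->complete)
                continue;
            int ready = 1;
            for (size_t w = 0; w < c->num_events_in_wait_list; ++w)
                if (!cmds[c->event_wait_list[w]].complete)
                    ready = 0;
            if (ready) {
                c->complete = 1;                 /* the command's event fires */
                order[executed++] = (int)k;
                progress = 1;
            }
        }
    }
    return executed;
}
```

With three commands where only command 1 waits on command 0, the emulation may run command 2 first, but it never runs command 1 before command 0 has completed.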

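OpenCL memory objects are objects that are created on the host and can be used inside OpenCL kernels. By dimensionality they fall into two kinds: buffer objects and image objects. Buffer objects are one-dimensional, while image objects can be two- or three-dimensional textures.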

get_global_id(uint dimindx) returns the position of the current work-item in dimension dimindx. Both the dimension index and the position start at 0 and increase by 1. OpenCL supports at most a three-dimensional work space, so dimindx can be 0, 1, or 2.
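As an illustration, here is a plain-C emulation of a two-dimensional launch. It is a sketch, not OpenCL code: `work_item`, `global_id`, `scale_kernel`, and `launch_2d` are hypothetical host-side stand-ins, and the doubling operation is an arbitrary example. The comment shows what the corresponding OpenCL C kernel might look like.

```c
#include <stddef.h>

/* Hypothetical stand-in for one work-item's coordinates in the NDRange. */
typedef struct { size_t id[3]; } work_item;

/* Mirrors get_global_id(dimindx): valid dimensions are 0, 1, and 2;
   any other value yields 0. */
size_t global_id(const work_item *wi, unsigned dimindx)
{
    return dimindx < 3 ? wi->id[dimindx] : 0;
}

/* Kernel body: each work-item doubles one element of a row-major 2-D array.
   For comparison, an OpenCL C version (not compiled here) could read:
     __kernel void scale(__global const int *in, __global int *out, uint width) {
         size_t x = get_global_id(0), y = get_global_id(1);
         out[y * width + x] = 2 * in[y * width + x];
     }                                                                        */
void scale_kernel(const work_item *wi, const int *in, int *out, size_t width)
{
    size_t x = global_id(wi, 0); /* position along dimension 0 */
    size_t y = global_id(wi, 1); /* position along dimension 1 */
    out[y * width + x] = 2 * in[y * width + x];
}

/* Emulates a width x height global work size by visiting every work-item. */
void launch_2d(const int *in, int *out, size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x) {
            work_item wi = {{x, y, 0}};
            scale_kernel(&wi, in, out, width);
        }
}
```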

OpenCL optimization covers data transfer optimization, memory access optimization, and computation and control-flow optimization.

If data needs to be shared within a work-group, it is best to use local memory wherever possible, because local memory is faster to access than global memory.
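A typical use of this sharing is a work-group sum. The following plain-C sketch emulates one work-group reducing its elements through a scratch array that stands in for `__local` memory. The function name `group_sum` and the work-group size of 4 are assumptions for illustration; in a real kernel each inner loop would be executed by parallel work-items, with a `barrier(CLK_LOCAL_MEM_FENCE)` where the comments indicate.

```c
#include <stddef.h>

#define GROUP_SIZE 4 /* assumed work-group size; must be a power of two */

/* Emulates one work-group summing GROUP_SIZE consecutive elements. */
int group_sum(const int *global_in, size_t group_id)
{
    int local_mem[GROUP_SIZE]; /* stands in for a __local array */

    /* Phase 1: each work-item copies one element from global to local memory. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid)
        local_mem[lid] = global_in[group_id * GROUP_SIZE + lid];
    /* A real kernel needs a barrier here before reading neighbours' data. */

    /* Phase 2: tree reduction; every pass halves the number of active items. */
    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid)
            local_mem[lid] += local_mem[lid + stride];
        /* A real kernel needs a barrier here after every pass. */
    }
    return local_mem[0]; /* work-item 0 would write this back to global memory */
}
```

After the initial copy, all repeated reads and writes go to the scratch array, so global memory is read only once per element.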
