OpenCL Study Notes (Chapter 2), with Platform Setup and Program Execution Steps


Author: Snail_Walker


Category: Embedded System. Tags: study notes, opencl

Article link: https://blog.csdn.net/c602273091/article/details/43898003


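The OpenCL standard:

The OpenCL API is a C API, with a C++ wrapper API defined on top of it. Third-party bindings exist for other languages, including Java, Python, and .NET. Device code is written in OpenCL C, which is a restricted subset of C99 with extensions for data-parallel execution (it is not full C99).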

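The OpenCL specification:

The OpenCL specification is made up of four parts (models):

1. Platform model: one host plus one or more devices (the devices are what execute OpenCL C code). The functions that run on a device are called kernels.

2. Execution model: defines how the OpenCL environment is configured on the host and how kernels are executed on the device. This includes creating an OpenCL context on the host, providing mechanisms for host-device interaction, and defining the concurrency model used when kernels execute on devices.

3. Memory model: defines the abstract memory hierarchy that kernels use, regardless of the actual underlying memory architecture. The memory model closely resembles a GPU memory hierarchy.

4. Programming model: defines how the concurrency model is mapped onto physical hardware.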

The OpenCL execution model and kernels:

An ordinary serial vector addition:

// Perform an element-wise addition of A and B and store in C.
// There are N elements per array.
void vecadd(int *C, int *A, int *B, int N) {
    for (int i = 0; i < N; i++) {
        C[i] = A[i] + B[i];
    }
}

On a multi-core processor, the loop is split into chunks, one chunk per thread (strip mining):

// Perform an element-wise addition of A and B and store in C.
// There are N elements per array and NP CPU cores.
// Assumes N is a multiple of NP; tid is this thread's index in [0, NP).
void vecadd(int *C, int *A, int *B, int N, int NP, int tid) {
    int ept = N / NP; // elements per thread
    for (int i = tid * ept; i < (tid + 1) * ept; i++) {
        C[i] = A[i] + B[i];
    }
}
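As written, the strip-mined loop leaves the last N % NP elements untouched when N is not a multiple of NP. Below is a minimal sketch of one way to cover them, assuming the last thread simply takes the leftovers. The names `vecadd_chunk` and `vecadd_all_threads` are made up for illustration, and the "threads" are simulated with a plain serial loop rather than a real threading API.

```c
#include <assert.h>

/* Add the slice of A and B owned by thread `tid` and store it in C.
 * The last thread (tid == NP - 1) also takes the N % NP leftover elements. */
void vecadd_chunk(int *C, const int *A, const int *B, int N, int NP, int tid) {
    assert(NP > 0 && tid >= 0 && tid < NP);
    int ept = N / NP;                            /* elements per thread */
    int begin = tid * ept;
    int end = (tid == NP - 1) ? N : begin + ept; /* last thread runs to N */
    for (int i = begin; i < end; i++) {
        C[i] = A[i] + B[i];
    }
}

/* Hypothetical stand-in for launching NP threads: run each chunk in turn. */
void vecadd_all_threads(int *C, const int *A, const int *B, int N, int NP) {
    for (int tid = 0; tid < NP; tid++) {
        vecadd_chunk(C, A, B, N, NP, tid);
    }
}
```

Handing the remainder to the last thread keeps the code short but unbalances the load slightly; spreading the extra elements across the first N % NP threads would be another option.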

The OpenCL approach: one work-item per element, with the work-items arranged in an n-dimensional index space:

// Perform an element-wise addition of A and B and store in C.
// N work-items will be created to execute this kernel.
__kernel
void vecadd(__global int *C, __global int *A, __global int *B) {
    int tid = get_global_id(0); // OpenCL intrinsic function
    C[tid] = A[tid] + B[tid];
}

size_t indexSpaceSize[3] = {1024, 1, 1}; // number of work-items in each dimension
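Conceptually, the kernel body runs once for every index in the index space, and get_global_id(0) returns that index. Below is a rough plain-C emulation of that idea, meant only to show how the loop version and the kernel relate. `current_global_id`, `fake_get_global_id`, `vecadd_body`, and `run_ndrange_1d` are hypothetical names, not part of OpenCL, and a real device runs the work-items in parallel with no guaranteed order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the work-item's position in the index space. */
static size_t current_global_id;

static size_t fake_get_global_id(unsigned dim) {
    assert(dim == 0); /* this sketch only models dimension 0 */
    return current_global_id;
}

/* Same body as the vecadd kernel, without the OpenCL qualifiers. */
static void vecadd_body(int *C, const int *A, const int *B) {
    size_t tid = fake_get_global_id(0);
    C[tid] = A[tid] + B[tid];
}

/* Serial emulation of a 1-D launch: one call of the body per work-item. */
void run_ndrange_1d(size_t globalSize, int *C, const int *A, const int *B) {
    for (size_t gid = 0; gid < globalSize; gid++) {
        current_global_id = gid;
        vecadd_body(C, A, B);
    }
}
```

Calling `run_ndrange_1d(1024, C, A, B)` corresponds to the index space {1024, 1, 1} above: the loop that used to live inside vecadd has moved out of the kernel and into the launch.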

Work-items within a work-group have a special relationship with one another: they can perform barrier operations to synchronize, and they have access to a shared memory address space.

size_t workGroupSize[3] = {64, 1, 1};

With 1024 elements per array and 64 work-items per work-group, there are 1024/64 = 16 work-groups.
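The arithmetic behind that split can be sketched in plain C. In one dimension, with no global offset, a work-item's global ID breaks down into a work-group ID and a local ID inside that group. The helper names `num_groups`, `group_id_of`, and `local_id_of` are invented for this sketch; inside a real kernel the corresponding values come from get_num_groups, get_group_id, and get_local_id.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups in one dimension. This sketch assumes the global
 * size is a multiple of the work-group size, as OpenCL 1.x requires. */
size_t num_groups(size_t globalSize, size_t groupSize) {
    assert(groupSize > 0 && globalSize % groupSize == 0);
    return globalSize / groupSize;
}

/* Which work-group a global ID falls in (no global offset assumed). */
size_t group_id_of(size_t gid, size_t groupSize) {
    return gid / groupSize;
}

/* Position of a global ID inside its work-group. */
size_t local_id_of(size_t gid, size_t groupSize) {
    return gid % groupSize;
}
```

For example, with a work-group size of 64, global ID 130 belongs to work-group 2 with local ID 2, since 2 * 64 + 2 = 130.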
