OpenCL 2.0 SVM

1. What problem does SVM solve?

Fig. 1 Host and device address spaces in OpenCL 1.2

Fig. 2 Host and device address spaces in OpenCL 2.0

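As Fig. 1 shows, in OpenCL 1.2 the host and the device have separate address spaces. Each manages its own memory, and neither can access the other's address space directly. To share data, a program must either copy it back and forth between host and device or map/unmap device memory into the host address space. Under this model, pointer-based host data structures such as linked lists and trees are effectively out of the device's reach, because a pointer that is valid on the host means nothing in the device's address space. Is there really no convenient way to handle linked lists and similar structures on a heterogeneous platform? Technology keeps moving, and where there is a need, a solution tends to follow.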
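Without SVM, the usual workaround is to serialize a pointer-based structure before copying it. Each `next` pointer is replaced by an array index, which stays meaningful in any address space. The sketch below shows only that flattening step in plain C. The `Node`/`FlatNode` types and the `flatten_list` function are hypothetical names chosen for illustration. The resulting array is what one would then copy into an ordinary device buffer, for example with clEnqueueWriteBuffer.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side linked list node: `next` is a host virtual address,
   which is meaningless inside a separate device address space. */
typedef struct Node {
    float value;
    struct Node *next;
} Node;

/* Position-independent layout (hypothetical): `next` is the index of the
   successor in the same array, or -1 for the end of the list. */
typedef struct {
    float value;
    int next;
} FlatNode;

/* Copies the list starting at `head` into `out` (capacity `cap`) in list
   order and returns the number of nodes written. If the list is longer
   than `cap`, the copy is truncated and still properly terminated. */
size_t flatten_list(const Node *head, FlatNode *out, size_t cap) {
    assert(out != NULL || cap == 0);
    size_t n = 0;
    for (const Node *cur = head; cur != NULL && n < cap; cur = cur->next) {
        out[n].value = cur->value;
        /* Nodes are written in list order, so the successor lands at n + 1;
           the last node written (end of list or end of `out`) gets -1. */
        out[n].next = (cur->next != NULL && n + 1 < cap) ? (int)(n + 1) : -1;
        n++;
    }
    return n;
}
```

On the device side, a kernel would then follow `next` indices instead of pointers. This extra serialization step is exactly the work that SVM is meant to remove.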

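Starting with CUDA 6, NVIDIA GPUs and CPUs support Unified Memory. Data allocated with cudaMallocManaged can be accessed through the same pointer from both CPU and GPU code, and the runtime migrates it as needed, so the programmer no longer copies it by hand. This took heterogeneous computing to a new level, since pointer-based data such as linked lists can now be processed on the GPU. With CUDA already offering this, OpenCL could not afford to fall behind: OpenCL 2.0 added shared virtual memory (SVM), which lets the host and the device share one virtual address space (Fig. 2).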

2. SVM types and how to use them

OpenCL 2.0 defines three types of SVM:
1. Coarse-grained buffer SVM: sharing occurs at the granularity of regions of OpenCL buffer memory objects. Cross-device atomics are not supported.
2. Fine-grained buffer SVM: sharing occurs at the granularity of individual loads and stores within OpenCL buffer memory objects. Cross-device atomics are optional.
3. Fine-grained system SVM: sharing occurs at the granularity of individual loads and stores anywhere within host memory. Cross-device atomics are optional.
The first two types require SVM buffers to be allocated explicitly with clSVMAlloc. A pointer to such a buffer must also be passed to a kernel explicitly as an SVM pointer, using clSetKernelArgSVMPointer rather than clSetKernelArg. Fine-grained system SVM, by contrast, offers the highest level of abstraction: kernels on the device can access any host pointer, whether it was returned by clSVMAlloc or by ordinary malloc/new.
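With buffer SVM (the first two types), a pointer-based structure is only visible to the device if every node lives in SVM memory. One simple way to guarantee that is to carve all nodes out of a single large allocation with a bump allocator. The kernel then receives one SVM pointer, and every `next` pointer points inside that allocation. Pointers into other SVM allocations would additionally have to be registered with clSetKernelExecInfo (CL_KERNEL_EXEC_INFO_SVM_PTRS).

The sketch below rests on several assumptions:
- In real code the backing block would come from clSVMAlloc.
- For coarse-grained SVM, that block would be mapped with clEnqueueSVMMap while the host builds the list.
- Here an ordinary array stands in for the block, so the code runs without an OpenCL runtime.
- `Arena`, `arena_alloc` and `push_front` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Node {
    float value;
    struct Node *next; /* under SVM, the same virtual address on host and device */
} Node;

/* Bump allocator over one contiguous block. In an OpenCL 2.0 program the
   block would be the pointer returned by clSVMAlloc (assumption); here it
   is any caller-provided memory. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} Arena;

/* Returns `bytes` bytes aligned to `align` (a power of two), or NULL when full. */
void *arena_alloc(Arena *a, size_t bytes, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t start = (uintptr_t)(a->base + a->used);
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(aligned - (uintptr_t)a->base);
    if (offset > a->size || bytes > a->size - offset)
        return NULL;
    a->used = offset + bytes;
    return a->base + offset;
}

/* Prepends a node allocated inside the arena; returns the new head, or NULL
   if the arena is exhausted. */
Node *push_front(Arena *a, Node *head, float value) {
    Node *node = arena_alloc(a, sizeof *node, _Alignof(Node));
    if (node == NULL)
        return NULL;
    node->value = value;
    node->next = head;
    return node;
}
```

Because every node sits inside one allocation, passing that single pointer with clSetKernelArgSVMPointer is enough for a kernel to walk the list. With fine-grained system SVM this arena is unnecessary, since nodes from plain malloc are already visible to the device.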

Coarse-grained SVM buffer example (map/unmap required):

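float* p = (float*)clSVMAlloc(…);

// Map for writing before the host touches the buffer
clEnqueueSVMMap(…,
    CL_TRUE,       // block until map is done
    CL_MAP_WRITE,  // the host is about to write
    p, …);

// Initialize SVM buffer
p[i] = …;

// Unmap to hand the buffer back to the device
clEnqueueSVMUnmap(…, p, …);

// p is passed to the kernel with clSetKernelArgSVMPointer(…)
clEnqueueNDRangeKernel(…);

clEnqueueSVMMap(…,
    CL_TRUE,      // block until map is done
    CL_MAP_READ,  // the host is about to read
    p, …);

// Read the data produced by the kernel
… = p[i];

clEnqueueSVMUnmap(…, p, …);

clSVMFree(…, p);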

Fine-grained SVM buffer example (no map/unmap needed):

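// Allocation flags include CL_MEM_SVM_FINE_GRAIN_BUFFER
float* p = (float*)clSVMAlloc(…);

// Initialize SVM buffer directly, no map needed
p[i] = …;

// p is passed to the kernel with clSetKernelArgSVMPointer(…)
clEnqueueNDRangeKernel(…);

// Wait for the kernel to finish before reading its results
clFinish(…);

// Read the data produced by the kernel
… = p[i];

clSVMFree(…, p);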

Fine-grained system SVM example (ordinary malloc memory):

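// Ordinary heap memory; no clSVMAlloc needed with fine-grained system SVM
float* p = (float*)malloc(…);

// Initialize the buffer directly, no map needed
p[i] = …;

// p is passed to the kernel with clSetKernelArgSVMPointer(…)
clEnqueueNDRangeKernel(…);

// Wait for the kernel to finish before reading its results
clFinish(…);

// Read the data produced by the kernel
… = p[i];

free(p);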

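3. Querying SVM capabilities

A device's SVM support is reported by clGetDeviceInfo with CL_DEVICE_SVM_CAPABILITIES: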

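cl_device_svm_capabilities svm = 0;
// device points to a cl_device_id; the query may fail on devices older than OpenCL 2.0
cl_int err = clGetDeviceInfo(*device, CL_DEVICE_SVM_CAPABILITIES, sizeof(svm), &svm, NULL);
if (err != CL_SUCCESS)
    printf("SVM capabilities not available (error %d)\n", err);
if (svm & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM)
    printf("CL_DEVICE_SVM_FINE_GRAIN_SYSTEM\n");
if (svm & CL_DEVICE_SVM_FINE_GRAIN_BUFFER)
    printf("CL_DEVICE_SVM_FINE_GRAIN_BUFFER\n");
if (svm & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER)
    printf("CL_DEVICE_SVM_COARSE_GRAIN_BUFFER\n");
if (svm & CL_DEVICE_SVM_ATOMICS)
    printf("CL_DEVICE_SVM_ATOMICS\n");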
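Once the capability bits are known, a program typically picks the finest-grained mode the device offers and otherwise falls back to coarse-grained buffers. The OpenCL 2.0 specification requires at least coarse-grained buffer SVM on 2.0 devices. The sketch below expresses that decision as a pure function, under these assumptions:
- The `SVM_CAP_*` constants mirror the values of CL_DEVICE_SVM_COARSE_GRAIN_BUFFER, CL_DEVICE_SVM_FINE_GRAIN_BUFFER, CL_DEVICE_SVM_FINE_GRAIN_SYSTEM and CL_DEVICE_SVM_ATOMICS in CL/cl.h, so the sketch compiles without the OpenCL headers.
- `SvmMode`, `choose_svm_mode` and `svm_atomics_supported` are hypothetical names.

```c
#include <stdint.h>

/* Mirrors of the cl_device_svm_capabilities bits from CL/cl.h
   (cl_device_svm_capabilities is a 64-bit cl_bitfield). */
#define SVM_CAP_COARSE_GRAIN_BUFFER ((uint64_t)1 << 0)
#define SVM_CAP_FINE_GRAIN_BUFFER   ((uint64_t)1 << 1)
#define SVM_CAP_FINE_GRAIN_SYSTEM   ((uint64_t)1 << 2)
#define SVM_CAP_ATOMICS             ((uint64_t)1 << 3)

typedef enum {
    SVM_NONE,          /* no SVM: fall back to clCreateBuffer plus explicit copies */
    SVM_COARSE_BUFFER, /* clSVMAlloc; clEnqueueSVMMap/Unmap around host access */
    SVM_FINE_BUFFER,   /* clSVMAlloc; host accesses directly, synchronize before reading */
    SVM_FINE_SYSTEM    /* plain malloc/new memory is usable by kernels */
} SvmMode;

/* Picks the finest-grained SVM mode whose bit is set in `caps`. */
SvmMode choose_svm_mode(uint64_t caps) {
    if (caps & SVM_CAP_FINE_GRAIN_SYSTEM)
        return SVM_FINE_SYSTEM;
    if (caps & SVM_CAP_FINE_GRAIN_BUFFER)
        return SVM_FINE_BUFFER;
    if (caps & SVM_CAP_COARSE_GRAIN_BUFFER)
        return SVM_COARSE_BUFFER;
    return SVM_NONE;
}

/* Cross-device atomics are a separate optional bit, relevant only to the
   fine-grained SVM types. */
int svm_atomics_supported(uint64_t caps) {
    return (caps & SVM_CAP_ATOMICS) != 0;
}
```

In a real program one would pass the `svm` value from the query above and use the CL_ constants directly instead of the mirrors.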


 
