Programming: CUDA Notes

lealzhan

Tags: cuda c++

Article link: https://blog.csdn.net/lealzhan/article/details/115609905

詹令
lealzhan@126.com
2015-8-17

On GPU Speedup

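According to this article, the GPU speedup for the preconditioned Conjugate Gradient method, which is commonly used to solve linear systems, is less than one order of magnitude, and the CPU used for the comparison was only a dual-core. Once programming complexity is taken into account as well, the GPU offers no revolutionary breakthrough over the CPU for solving linear systems; at best it is an NVIDIA marketing gimmick. A figure from the CUDA 6.5 Performance Report (September 2014) shows a speedup of only 5x.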

Double Precision for GPU

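I am not yet sure how the GPU handles double precision in hardware (to be updated). The CUDA 6.5 Performance Report shows that current double-precision throughput is only about half that of single precision (and lower on earlier hardware). Scientific computing, however, needs double precision.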
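The single-to-double throughput ratio depends on the GPU model, and the CUDA runtime can report it. The following is a minimal sketch, assuming a toolkit recent enough to provide the `cudaDevAttrSingleToDoublePrecisionPerfRatio` attribute (it was added after the CUDA 6.5 release used in these notes). The helper name `single_to_double_ratio` is made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the single-to-double precision throughput ratio that the runtime
// reports for the given device, or -1 if the query fails.
// Assumption: the toolkit defines cudaDevAttrSingleToDoublePrecisionPerfRatio.
int single_to_double_ratio(int device) {
    int ratio = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &ratio, cudaDevAttrSingleToDoublePrecisionPerfRatio, device);
    return (err == cudaSuccess) ? ratio : -1;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    std::printf("%s (compute capability %d.%d): FP32/FP64 ratio = %d\n",
                prop.name, prop.major, prop.minor, single_to_double_ratio(0));
    return 0;
}
```

A ratio of 2 would match the "about half" figure quoted above; larger values mean double precision is proportionally slower on that device.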

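One paper proposes a mixed single/double-precision conjugate gradient method so that the GPU can be used as efficiently as possible for scientific computing such as finite element analysis [Zhang, Jianfei, 2013].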
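To make the mixed-precision idea concrete, here is a rough sketch of one common scheme, iterative refinement: the residual and the solution are kept in double, while each correction is solved with CG in float. This is not the algorithm of the cited paper, only a generic illustration. The test matrix (1D Poisson, tridiag(-1, 2, -1)), the right-hand side of all ones, and every name (`apply_A`, `axpy_op`, `scale_to_float`, `cg`, `refine`) are assumptions made for the example.

```cuda
#include <cmath>
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/fill.h>
#include <thrust/inner_product.h>
#include <thrust/transform.h>

// y = A x for the 1D Poisson matrix tridiag(-1, 2, -1), applied matrix-free.
template <typename T>
__global__ void apply_A(const T* x, T* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    T left  = (i > 0)     ? x[i - 1] : T(0);
    T right = (i < n - 1) ? x[i + 1] : T(0);
    y[i] = T(2) * x[i] - left - right;
}

template <typename T>
void matvec(const thrust::device_vector<T>& x, thrust::device_vector<T>& y) {
    int n = static_cast<int>(x.size());
    apply_A<T><<<(n + 255) / 256, 256>>>(thrust::raw_pointer_cast(x.data()),
                                         thrust::raw_pointer_cast(y.data()), n);
}

// Binary functor: a * x + y in precision T.
template <typename T>
struct axpy_op {
    T a;
    __host__ __device__ explicit axpy_op(T a_) : a(a_) {}
    __host__ __device__ T operator()(T x, T y) const { return a * x + y; }
};

// Unary functor: scales a double and demotes the result to float.
struct scale_to_float {
    double s;
    __host__ __device__ explicit scale_to_float(double s_) : s(s_) {}
    __host__ __device__ float operator()(double v) const { return static_cast<float>(s * v); }
};

template <typename T>
T dot(const thrust::device_vector<T>& u, const thrust::device_vector<T>& v) {
    return thrust::inner_product(u.begin(), u.end(), v.begin(), T(0));
}

// Unpreconditioned CG in precision T for A x = b, starting from x = 0.
template <typename T>
void cg(const thrust::device_vector<T>& b, thrust::device_vector<T>& x, int max_iters, T rel_tol) {
    thrust::device_vector<T> r = b, p = b, Ap(b.size());
    thrust::fill(x.begin(), x.end(), T(0));
    T rr = dot(r, r);
    const T stop = rel_tol * rel_tol * rr;
    for (int k = 0; k < max_iters && rr > stop; ++k) {
        matvec(p, Ap);
        T alpha = rr / dot(p, Ap);
        thrust::transform(p.begin(), p.end(), x.begin(), x.begin(), axpy_op<T>(alpha));       // x += alpha p
        thrust::transform(Ap.begin(), Ap.end(), r.begin(), r.begin(), axpy_op<T>(-alpha));    // r -= alpha Ap
        T rr_new = dot(r, r);
        thrust::transform(p.begin(), p.end(), r.begin(), p.begin(), axpy_op<T>(rr_new / rr)); // p = r + beta p
        rr = rr_new;
    }
}

// Returns ||b - A x|| (in double) after `passes` float correction solves, b = 1.
double refine(int n, int passes) {
    thrust::device_vector<double> b(n, 1.0), x(n, 0.0), r(n), Ax(n);
    thrust::device_vector<float> rf(n), df(n);
    double res = 0.0;
    for (int pass = 0; ; ++pass) {
        matvec(x, Ax);
        thrust::transform(Ax.begin(), Ax.end(), b.begin(), r.begin(), axpy_op<double>(-1.0)); // r = b - A x
        res = std::sqrt(dot(r, r));
        if (pass == passes || res == 0.0) break;
        thrust::transform(r.begin(), r.end(), rf.begin(), scale_to_float(1.0 / res));  // rf = r / ||r||
        cg(rf, df, n, 1e-6f);                                                          // float solve A d = rf
        thrust::transform(df.begin(), df.end(), x.begin(), x.begin(), axpy_op<double>(res)); // x += ||r|| d
    }
    return res;
}

int main() {
    for (int passes = 0; passes <= 3; ++passes)
        std::printf("passes = %d  residual = %.3e\n", passes, refine(64, passes));
    return 0;
}
```

The residual is normalized before it is demoted so that the float solve never works with values near the float underflow range. Each pass should shrink the double-precision residual further, even though all the CG arithmetic is single precision.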

Thrust

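Functors (function objects) are very useful in Thrust. Algorithms such as transform and transform_reduce take a functor that defines the operation applied in parallel to a large number of elements.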
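As a small illustration of the functor pattern, the sketch below computes the Euclidean norm of a device vector with `thrust::transform_reduce`. The functor `square` and the helper `l2_norm` are names chosen for this example only.

```cuda
#include <cmath>
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform_reduce.h>

// Functor that squares one element. The __host__ __device__ qualifiers let
// Thrust call operator() from GPU code.
struct square {
    __host__ __device__ float operator()(float x) const { return x * x; }
};

// Euclidean norm: transform each element with square(), then reduce with plus.
float l2_norm(const thrust::device_vector<float>& v) {
    float sum_sq = thrust::transform_reduce(v.begin(), v.end(), square(),
                                            0.0f, thrust::plus<float>());
    return std::sqrt(sum_sq);
}

int main() {
    thrust::device_vector<float> v(2);
    v[0] = 3.0f;
    v[1] = 4.0f;
    std::printf("norm = %f\n", l2_norm(v));  // sqrt(9 + 16) = 5
    return 0;
}
```

The transform and the reduction are fused into one pass over the data, so no temporary vector of squared values is needed.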

cuSPARSE

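cuSPARSE is used for solving linear systems with sparse matrices. Unlike CUSP, which is built on Thrust, cuSPARSE is built directly on the CUDA runtime, so it is more self-contained and is not tied to a particular Thrust version, but it is more cumbersome to use.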

cuBLAS

To use the cuBLAS API, the application must allocate the required matrices and vectors in the GPU memory space, fill them with data, call the sequence of desired cuBLAS functions, and then upload the results from the GPU memory space back to the host.
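The four steps above (allocate, fill, call, copy back) can be sketched with a single SAXPY call through the `cublas_v2.h` API. The wrapper name `saxpy_on_gpu` and the test values are assumptions for illustration; error handling is reduced to a single success flag.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Computes y = alpha * x + y on the GPU. Returns false if any call fails.
bool saxpy_on_gpu(float alpha, const std::vector<float>& x, std::vector<float>& y) {
    if (x.size() != y.size() || x.empty()) return false;
    const int n = static_cast<int>(x.size());

    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) return false;

    // 1. Allocate the vectors in GPU memory.
    float* d_x = 0;
    float* d_y = 0;
    bool ok = cudaMalloc((void**)&d_x, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc((void**)&d_y, n * sizeof(float)) == cudaSuccess;

    // 2. Fill them with data from the host.
    ok = ok && cublasSetVector(n, sizeof(float), &x[0], 1, d_x, 1) == CUBLAS_STATUS_SUCCESS
            && cublasSetVector(n, sizeof(float), &y[0], 1, d_y, 1) == CUBLAS_STATUS_SUCCESS;

    // 3. Call the cuBLAS routine (alpha is read from host memory by default).
    ok = ok && cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1) == CUBLAS_STATUS_SUCCESS;

    // 4. Copy the result from GPU memory back to the host.
    ok = ok && cublasGetVector(n, sizeof(float), d_y, 1, &y[0], 1) == CUBLAS_STATUS_SUCCESS;

    cudaFree(d_x);
    cudaFree(d_y);
    cublasDestroy(handle);
    return ok;
}

int main() {
    std::vector<float> x(3, 1.0f), y(3, 10.0f);
    if (!saxpy_on_gpu(2.0f, x, y)) {
        std::printf("saxpy failed\n");
        return 1;
    }
    std::printf("y[0] = %f\n", y[0]);  // 2 * 1 + 10 = 12
    return 0;
}
```

For a real solver the handle and the device buffers would normally be created once and reused across many calls, since the transfers in steps 2 and 4 are usually the expensive part.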

CUSP (out of date)

CUSP is very easy to use: algorithms such as conjugate gradient are already implemented and can be invoked with a single line.
CUSP is based on Thrust, which means you have to include a compatible version of the Thrust library. However, CUSP is no longer supported, which can cause many compilation problems.
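The "one line" mentioned above is the call to `cusp::krylov::cg`. The sketch below follows the pattern of CUSP's own conjugate gradient example, using a 5-point Poisson test matrix from `cusp::gallery`. The wrapper `solve_poisson` is a made-up name, and because CUSP is unmaintained the headers may need adjusting for whichever CUSP/Thrust versions are installed.

```cuda
#include <cstdio>
#include <cusp/array1d.h>
#include <cusp/csr_matrix.h>
#include <cusp/gallery/poisson.h>
#include <cusp/krylov/cg.h>

// Solves A x = b on the GPU for a 2D 5-point Poisson matrix on an nx-by-ny
// grid with b = 1, and returns the solution copied to host memory.
cusp::array1d<float, cusp::host_memory> solve_poisson(int nx, int ny) {
    cusp::csr_matrix<int, float, cusp::device_memory> A;
    cusp::gallery::poisson5pt(A, nx, ny);

    cusp::array1d<float, cusp::device_memory> x(A.num_rows, 0.0f);  // initial guess
    cusp::array1d<float, cusp::device_memory> b(A.num_rows, 1.0f);  // right-hand side

    cusp::krylov::cg(A, x, b);  // the one-line solve, with default stopping criteria

    return cusp::array1d<float, cusp::host_memory>(x);
}

int main() {
    cusp::array1d<float, cusp::host_memory> x = solve_poisson(10, 10);
    std::printf("unknowns = %d, x[0] = %f\n", static_cast<int>(x.size()), x[0]);
    return 0;
}
```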

MAGMA

CUDA VS 2010 32 bit Setup

Win7, Quadro K1000M, VS2010, CUDA 6.5

Build Customizations
CUDA runtime
Linker, Additional Library Directories: $(CudaToolkitLibDir);
Additional Dependencies:
cublas.lib
cusparse.lib
cudart_static.lib
kernel32.lib
user32.lib
You may also need to add the following path to the VC++ Include Directories so that helper_functions.h can be found:
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v6.5\common\inc
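One way to check that the project settings above are correct is to build a small program that touches each linked library. The sketch below is an assumption about how such a check could look, not part of the original setup: it creates a cuBLAS and a cuSPARSE handle and prints the versions reported by the three libraries. The function name `check_toolchain` is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>
#include <cusparse.h>

// Queries the versions of the CUDA runtime, cuBLAS and cuSPARSE.
// Returns false if a handle cannot be created or a query fails.
bool check_toolchain(int* runtime_ver, int* cublas_ver, int* cusparse_ver) {
    if (cudaRuntimeGetVersion(runtime_ver) != cudaSuccess) return false;

    cublasHandle_t blas;
    if (cublasCreate(&blas) != CUBLAS_STATUS_SUCCESS) return false;
    bool ok = cublasGetVersion(blas, cublas_ver) == CUBLAS_STATUS_SUCCESS;
    cublasDestroy(blas);

    cusparseHandle_t sparse;
    if (cusparseCreate(&sparse) != CUSPARSE_STATUS_SUCCESS) return false;
    ok = ok && cusparseGetVersion(sparse, cusparse_ver) == CUSPARSE_STATUS_SUCCESS;
    cusparseDestroy(sparse);

    return ok;
}

int main() {
    int rt = 0, blas = 0, sparse = 0;
    if (!check_toolchain(&rt, &blas, &sparse)) {
        std::printf("CUDA toolchain check failed\n");
        return 1;
    }
    std::printf("runtime %d, cuBLAS %d, cuSPARSE %d\n", rt, blas, sparse);
    return 0;
}
```

If this compiles, links, and runs, the include paths and the cublas.lib, cusparse.lib and cudart_static.lib entries are set up correctly.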