Reposted from http://soft.zdnet.com.cn/software_zone/2009/1127/1527418.shtml
1. Software requirements:
cudadriver_2.3_winvista_64_190.38_general
cudatoolkit_2.3_win_64
cudasdk_2.3_win_64
VS2008
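Before installing, uninstall any previously installed SDK, toolkit, and driver, then install the packages above in the order listed. If the development machine has no CUDA-capable graphics card, cudadriver_2.3_winvista_64_190.38_general does not need to be installed.
2. Installation check
Run nvcc -V in cmd to see the installed version: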
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2009 NVIDIA Corporation
Built on Mon_Aug__3_19:43:55_PDT_2009
Cuda compilation tools, release 2.3, V0.2.1221
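When this check has to be repeated on several machines, the release number can be pulled out of the text instead of being read by eye. The following is a minimal Python sketch, not part of the original article; the function name parse_nvcc_release is hypothetical, and the sample text is copied from the output above.

```python
import re

def parse_nvcc_release(version_output):
    """Return the toolkit release (e.g. "2.3") found in `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+(?:\.\d+)*)", version_output)
    return match.group(1) if match else None

# Sample text copied from the nvcc -V output shown above.
sample = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2009 NVIDIA Corporation
Built on Mon_Aug__3_19:43:55_PDT_2009
Cuda compilation tools, release 2.3, V0.2.1221"""

print(parse_nvcc_release(sample))
```

On a real machine the text could be captured with subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout, provided nvcc is on PATH.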
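Run bandwidthTest to check that the configuration works.
Change to the C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\bin\win64\Release directory and run
.\bandwidthTest.exe --memory=pinned --mode=range --start=10240000 --end=10240000 --increment=10240000
If everything is working, output similar to the following appears: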
Running on......
device 0:Quadro FX 580
Range Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
10240000 5101.1
Range Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
10240000 4650.8
Range Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
10240000 14812.5
&&&& Test PASSED
Press ENTER to exit...
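A report like the one above can also be checked by a script, for example to compare the three transfer directions. The sketch below is only an illustration that assumes the output layout shown here (a section title containing "Bandwidth", a column header, then rows of size and MB/s); parse_bandwidth_report is a hypothetical helper, not part of the SDK.

```python
import re

def parse_bandwidth_report(report):
    """Return ({section title: [(bytes, MB/s), ...]}, passed) for a bandwidthTest report."""
    results = {}
    current = None
    for line in report.splitlines():
        line = line.strip()
        if "Bandwidth" in line and not line.startswith("Transfer Size"):
            # A section title such as "Device to Device Bandwidth".
            current = line
            results[current] = []
        else:
            # A data row: transfer size in bytes, then bandwidth in MB/s.
            row = re.fullmatch(r"(\d+)\s+(\d+(?:\.\d+)?)", line)
            if row and current is not None:
                results[current].append((int(row.group(1)), float(row.group(2))))
    return results, "Test PASSED" in report

# Sample text copied from the output shown above.
sample = """Range Mode
Host to Device Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
10240000 5101.1
Range Mode
Device to Host Bandwidth for Pinned memory
Transfer Size (Bytes) Bandwidth(MB/s)
10240000 4650.8
Range Mode
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
10240000 14812.5
&&&& Test PASSED"""

results, passed = parse_bandwidth_report(sample)
for title, rows in results.items():
    print(title, rows)
print("passed:", passed)
```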
Run deviceQuery.exe to see the exact graphics card model:
.\deviceQuery.exe
If everything is working, output similar to the following appears:
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "Quadro FX 580"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 536870912 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.13 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)
Test PASSED
Press ENTER to exit...
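From this output, the card's peak single-precision floating-point performance can be estimated as 3*32*1.13=108.48 GFLOPS: 3 floating-point operations per core per clock (a multiply-add plus a multiply, the usual way peak throughput is counted for GPUs of this generation), 32 cores, and a 1.13 GHz clock.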
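The same estimate can be written as a small function so that it is easy to redo for another card. This is a sketch under stated assumptions: peak_gflops is a hypothetical name, and the default of 3 operations per core per clock is the factor used in the estimate above, which counts a multiply-add plus a multiply; passing 2 counts only the multiply-add.

```python
def peak_gflops(cores, clock_ghz, flops_per_core_per_cycle=3):
    """Theoretical peak in GFLOPS: operations per cycle x cores x clock (GHz)."""
    return flops_per_core_per_cycle * cores * clock_ghz

# "Number of cores" and "Clock rate" as reported by deviceQuery above.
estimate = peak_gflops(cores=32, clock_ghz=1.13)
print(f"{estimate:.2f} GFLOPS")
```

This is a theoretical upper bound; real kernels normally reach only a fraction of it.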
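3. Setting the system environment variables
Add the CUDA SDK paths to the system environment variables.
For example, under C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\bin\win64
there are these directories:
├─Debug
├─EmuDebug
├─EmuRelease
└─Release
Add all of them to the PATH system environment variable so that the required DLLs can be found when programs run. (One way to do this: define a system variable, here named CUDARelease, that holds the directory, and add %CUDARelease% to PATH.)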
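Whether the four directories actually ended up on PATH can be verified with a short script. The following is a minimal sketch, assuming the SDK location used in this article; missing_from_path is a hypothetical helper, and the sample PATH string is made up for illustration.

```python
import ntpath

def missing_from_path(required_dirs, path_value):
    """Return the entries of required_dirs that are absent from a Windows PATH string."""
    def normalize(p):
        # normpath drops trailing separators; normcase lowercases and unifies slashes,
        # which matches how Windows compares paths (case-insensitively).
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))
    on_path = {normalize(entry) for entry in path_value.split(";") if entry.strip()}
    return [d for d in required_dirs if normalize(d) not in on_path]

base = r"C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\bin\win64"
required = [base + "\\" + name for name in ("Debug", "EmuDebug", "EmuRelease", "Release")]

# Made-up PATH value: only Release and Debug are present, in mixed case.
path_value = r"C:\Windows\system32;" + base.lower() + "\\release\\;" + base + r"\Debug"

for directory in missing_from_path(required, path_value):
    print("not on PATH:", directory)
```

On the development machine, pass os.environ["PATH"] as path_value; a newly opened cmd window is needed before changes to the system variables become visible.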
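Make the header files needed for compilation available to VS2008:
Copy the C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common directory to C:\Users\dawning\Documents\Visual Studio 2008.
4. Creating a simple CUDA project in VS2008
Copy the template project C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\src\template to the VS2008 projects directory C:\Users\dawning\Documents\Visual Studio 2008\Projects.
Open VS2008 and open the template project template_vc90.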