(Repost) Basic configuration of CUDA

Reposted from http://soft.zdnet.com.cn/software_zone/2009/1127/1527418.shtml

 

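1. Software requirements:

  cudadriver_2.3_winvista_64_190.38_general

  cudatoolkit_2.3_win_64

  cudasdk_2.3_win_64

  VS2008

  Before installing, uninstall any previously installed SDK, toolkit and driver, then install the packages above in the order listed. If the development machine has no CUDA-capable graphics card, cudadriver_2.3_winvista_64_190.38_general does not need to be installed.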

  2. Verifying the installation

  Run nvcc -V in cmd to check the installed version:

  nvcc: NVIDIA (R) Cuda compiler driver

  Copyright (c) 2005-2009 NVIDIA Corporation

  Built on Mon_Aug__3_19:43:55_PDT_2009

  Cuda compilation tools, release 2.3, V0.2.1221
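  If a build script needs to check the toolkit version instead of reading it by eye, the release number can be taken from the last line printed by nvcc -V. A minimal C sketch, assuming the "release X.Y" wording shown above; parse_nvcc_release is a hypothetical helper, not part of the toolkit:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extracts X and Y from a line such as
   "Cuda compilation tools, release 2.3, V0.2.1221".
   Returns 1 on success, 0 if "release " is missing or not followed by X.Y. */
int parse_nvcc_release(const char *line, int *rel_major, int *rel_minor)
{
    const char *p = strstr(line, "release ");
    if (p == NULL)
        return 0;
    /* sscanf returns the number of fields converted; both must succeed. */
    return sscanf(p + strlen("release "), "%d.%d", rel_major, rel_minor) == 2;
}
```

  Given the sample line above, it stores 2 and 3; lines without the "release " marker return 0.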

  Run bandwidthTest to check that the setup works.

  Change to the C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\bin\win64\Release directory and run

  .\bandwidthTest.exe --memory=pinned --mode=range --start=10240000 --end=10240000 --increment=10240000

  If everything works, you should see output similar to:

  Running on......

  device 0:Quadro FX 580

  Range Mode

  Host to Device Bandwidth for Pinned memory

  Transfer Size (Bytes) Bandwidth(MB/s)

  10240000 5101.1

  Range Mode

  Device to Host Bandwidth for Pinned memory

  Transfer Size (Bytes) Bandwidth(MB/s)

  10240000 4650.8

  Range Mode

  Device to Device Bandwidth

  Transfer Size (Bytes) Bandwidth(MB/s)

  10240000 14812.5

  &&&& Test PASSED

  Press ENTER to exit...
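  Each bandwidth figure above is a transfer size divided by a measured copy time. The C sketch below reproduces that arithmetic under stated assumptions: the helper names are hypothetical, the copy is assumed to be repeated a number of times inside one timed interval, and 1 MB is taken as 2^20 bytes (check the sample's source for its exact convention). It also shows why the command above prints one row per direction: with --start equal to --end, range mode visits a single size.

```c
/* Hypothetical helper: number of transfer sizes range mode visits,
   assuming it steps start, start + increment, ... up to end inclusive. */
int range_steps(long start, long end, long increment)
{
    if (increment <= 0 || end < start)
        return 0;
    return (int)((end - start) / increment) + 1;
}

/* Bandwidth in MB/s for `iterations` copies of `bytes` bytes that took
   `elapsed_ms` milliseconds in total; assumes 1 MB = 2^20 bytes. */
double bandwidth_mb_per_s(double bytes, int iterations, double elapsed_ms)
{
    return (bytes * iterations * 1000.0) / (elapsed_ms * 1048576.0);
}
```

  For example, range_steps(10240000, 10240000, 10240000) returns 1.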

  Run deviceQuery.exe to see the exact graphics card model:

  .\deviceQuery.exe

  If everything works, you should see output similar to:

  CUDA Device Query (Runtime API) version (CUDART static linking)

  There is 1 device supporting CUDA

  Device 0: "Quadro FX 580"

  CUDA Driver Version: 2.30

  CUDA Runtime Version: 2.30

  CUDA Capability Major revision number: 1

  CUDA Capability Minor revision number: 1

  Total amount of global memory: 536870912 bytes

  Number of multiprocessors: 4

  Number of cores: 32

  Total amount of constant memory: 65536 bytes

  Total amount of shared memory per block: 16384 bytes

  Total number of registers available per block: 8192

  Warp size: 32

  Maximum number of threads per block: 512

  Maximum sizes of each dimension of a block: 512 x 512 x 64

  Maximum sizes of each dimension of a grid: 65535 x 65535 x 1

  Maximum memory pitch: 262144 bytes

  Texture alignment: 256 bytes

  Clock rate: 1.13 GHz

  Concurrent copy and execution: Yes

  Run time limit on kernels: No

  Integrated: No

  Support host page-locked memory mapping: No

  Compute mode: Default (multiple host threads can use this device simultaneously)

  Test PASSED

  Press ENTER to exit...
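  The "Number of cores" line is computed rather than read from the hardware: on compute capability 1.x devices each multiprocessor has 8 scalar cores, so 4 multiprocessors give 32 cores. A small C sketch of that arithmetic, plus converting the global memory figure to MiB; the function names are hypothetical, and cores per multiprocessor is left as unknown (0) for other architectures rather than guessed.

```c
/* Hypothetical helper: scalar cores per multiprocessor for a given
   compute capability major revision. Only 1.x (8 cores) is encoded;
   other generations return 0 ("unknown") instead of a guess. */
int cores_per_multiprocessor(int cc_major)
{
    return cc_major == 1 ? 8 : 0;
}

int total_cores(int cc_major, int multiprocessors)
{
    return cores_per_multiprocessor(cc_major) * multiprocessors;
}

/* Converts a byte count to MiB (1 MiB = 1024 * 1024 bytes). */
double bytes_to_mib(unsigned long long bytes)
{
    return (double)bytes / (1024.0 * 1024.0);
}
```

  With the values above, total_cores(1, 4) gives 32 and bytes_to_mib(536870912) gives 512, i.e. the Quadro FX 580 has 512 MiB of global memory.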

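  From the deviceQuery output above, the card's theoretical peak single-precision performance can be estimated as 3*32*1.13 = 108.48 GFLOPS: 32 cores at 1.13 GHz, each able to issue up to 3 floating-point operations per clock (a multiply-add plus a multiply).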
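  The estimate is cores times clock rate times floating-point operations per core per clock. As a C sketch (the function name is hypothetical; the factor of 3 assumes the compute capability 1.x dual issue of a multiply-add, counted as 2 FLOPs, plus a multiply, counted as 1; real kernels rarely sustain this peak):

```c
/* Hypothetical helper: theoretical peak single-precision GFLOPS.
   clock_ghz is the clock rate reported by deviceQuery, in GHz;
   flops_per_clock = 3 assumes a multiply-add plus a multiply per clock. */
double peak_sp_gflops(int cores, double clock_ghz, int flops_per_clock)
{
    return cores * clock_ghz * flops_per_clock;
}
```

  For the Quadro FX 580, peak_sp_gflops(32, 1.13, 3) is about 108.48.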

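  3. Setting system environment variables

  Add the installed CUDA SDK binary paths to the system environment variables.

  For example, the following directories under C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\bin\win64

  ├─Debug

  ├─EmuDebug

  ├─EmuRelease

  └─Release

  must all be added to the system PATH variable so that the corresponding DLLs can be found when programs run. (How: add %CUDARelease% to PATH, where CUDARelease is the name of a system variable you have defined to hold the directory.)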
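  To confirm the PATH change took effect, a program can check whether a directory appears as one of PATH's entries. A minimal C sketch; path_contains and equal_nocase are hypothetical helpers, the match is a whole-entry, case-insensitive comparison, and it neither strips trailing backslashes nor expands %...% references itself. Note that a running cmd window does not see environment changes made afterwards, so test from a newly opened one.

```c
#include <ctype.h>
#include <string.h>

/* Case-insensitive comparison of the first n characters,
   since Windows paths are not case-sensitive. */
static int equal_nocase(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: returns 1 if dir is one complete ';'-separated
   entry of path_value, 0 otherwise (including when path_value is NULL). */
int path_contains(const char *path_value, const char *dir)
{
    size_t dir_len = strlen(dir);
    const char *entry = path_value;
    while (entry != NULL) {
        const char *sep = strchr(entry, ';');
        size_t len = sep ? (size_t)(sep - entry) : strlen(entry);
        if (len == dir_len && equal_nocase(entry, dir, len))
            return 1;
        entry = sep ? sep + 1 : NULL;   /* advance to the next entry */
    }
    return 0;
}
```

  A typical call would be path_contains(getenv("PATH"), "C:\\ProgramData\\NVIDIA Corporation\\NVIDIA GPU Computing SDK\\C\\bin\\win64\\Release"), with getenv from <stdlib.h>.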

 

  Make the header files needed for compilation available to VS2008:

  Copy the C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common directory to C:\Users\dawning\Documents\Visual Studio 2008.

 

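  4. Creating a simple CUDA project in VS2008

  Copy the template project C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\src\template to the VS2008 project directory C:\Users\dawning\Documents\Visual Studio 2008\Projects.

  Open VS2008 and open the template project template_vc90.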
