Installing CUDA 9.1 and PyCUDA on Windows 10


liangjiubujiu


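Before setting up this environment I searched online for related blog posts, but the Chinese-language material I found was disappointing, mainly because it covered outdated versions. I therefore drew on many other people's experience to put together this guide to comparatively recent versions, for reference and discussion.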

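1. Install Visual Studio 2013

Installing and activating VS2013 is fairly simple; the installer and a key are easy to find through cloud-drive search engines.

After installation, remember to add the following two entries to the system PATH variable: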

C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\;C:\Program Files (x86)\Microsoft Visual Studio 12.0\Common7\IDE

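(PS: adjust the paths above to match your own VS installation. If you only need this for Theano, you do not actually have to specify the location of cl.exe here, because it is defined again later in the .theanorc file. Still, it is best to add it so that other applications can find it.)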
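To confirm that the PATH change took effect, a small script can look for cl.exe in each PATH directory. This is only a minimal sketch: the helper name `find_on_path` is hypothetical and is not part of Visual Studio or Theano. Open a new command prompt before running it so that the updated PATH is picked up.

```python
import os


def find_on_path(filename, path_value=None):
    """Return the first full path to `filename` found in PATH, or None.

    `find_on_path` is an illustrative helper, not a Visual Studio tool.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        directory = directory.strip('"')  # PATH entries are sometimes quoted
        if not directory:
            continue
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    location = find_on_path("cl.exe")
    if location:
        print("cl.exe found at " + location)
    else:
        print("cl.exe is not on PATH; re-check the two entries above")
```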

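2. Install the CUDA toolkit

Go to the NVIDIA website, download the CUDA toolkit, and run the installer, which needs little more than clicking through. Before installing, though, check that your machine's graphics card is an NVIDIA product. For detailed installation steps, see: http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html#axzz410A2xbq6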
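After the installer finishes, one way to verify the toolkit is to run `nvcc --version` and read the release number from its output. The sketch below assumes that `nvcc` is on PATH and that its output contains a phrase of the form "release X.Y"; the function names are hypothetical helpers, not part of the CUDA toolkit.

```python
import re
import subprocess


def parse_nvcc_release(version_text):
    """Extract the release number (e.g. '9.1') from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", version_text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc --version`; return the release string, or None if unavailable."""
    try:
        output = subprocess.check_output(["nvcc", "--version"])
    except (OSError, subprocess.CalledProcessError):
        # nvcc is missing from PATH or exited with an error
        return None
    return parse_nvcc_release(output.decode("utf-8", "replace"))


if __name__ == "__main__":
    release = installed_cuda_release()
    print("CUDA release: %s" % (release or "nvcc not found on PATH"))
```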

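3. Install Python

You can try Anaconda (download: https://www.continuum.io/downloads), which automatically installs a basic Python environment and preconfigures common scientific computing packages such as numpy and scipy. Installing other packages is also easy: in the Anaconda Prompt, type "conda install [package name]".

After installing Anaconda, install the dependencies: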

 
conda install mingw libpython  
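A quick way to confirm that the Anaconda environment is usable is to try importing the packages mentioned above. This is a minimal sketch; `missing_modules` is a hypothetical helper name, and the list of modules to check is only an example.

```python
import importlib


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # numpy and scipy are the packages named in the text; extend as needed.
    absent = missing_modules(["numpy", "scipy"])
    if absent:
        print("Not importable: " + ", ".join(absent))
    else:
        print("All listed packages import correctly")
```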

5. Install PyCUDA

Download: http://www.lfd.uci.edu/~gohlke/pythonlibs/#pycuda

My Python version is 2.7, so I downloaded pycuda-2015.1.3+cuda7518-cp27-none-win_amd64.whl

Install it:

 
pip install pycuda-2015.1.3+cuda7518-cp27-none-win_amd64.whl
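Choosing the right wheel depends on reading its file name, which follows the standard wheel naming scheme: distribution, version, an optional build tag, then the Python, ABI, and platform tags. The sketch below splits a file name into those fields; `parse_wheel_filename` is a hypothetical helper, not a pip function. In the file used here, `cp27` means CPython 2.7 and `win_amd64` means 64-bit Windows; the `+cuda7518` suffix on the version presumably names the CUDA build the wheel was compiled against, so it is worth checking against the installed toolkit.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its naming-scheme fields.

    Illustrative helper only; pip performs its own, stricter parsing.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel file name: %r" % filename)
    return {
        "name": name,
        "version": version,
        "build_tag": build_tag,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


if __name__ == "__main__":
    info = parse_wheel_filename(
        "pycuda-2015.1.3+cuda7518-cp27-none-win_amd64.whl")
    for key in sorted(info):
        print("%s: %s" % (key, info[key]))
```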

6. Test Theano and PyCUDA

The simplest test is an import:

 
import theano  

Output:

Using gpu device 0: GeForce GT 640M (CNMeM is disabled)

Test with the example snippet from the Theano documentation (http://deeplearning.net/software/theano/tutorial/using_gpu.html):

 
# Python 2 syntax, matching the Python 2.7 setup used in this guide
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print f.maker.fgraph.toposort()
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print 'Looping %d times took' % iters, t1 - t0, 'seconds'
print 'Result is', r
# A plain Elemwise op in the graph means the computation stayed on the CPU
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print 'Used the cpu'
else:
    print 'Used the gpu'

The result is as follows:

Using gpu device 0: GeForce GT 630M (CNMeM is disabled)

[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32,vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]

Looping 1000 times took 1.42199993134 seconds

Result is [1.23178029 1.61879349 1.52278066 ..., 2.20771813 2.29967761 1.62323296]

Used the gpu
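The GPU timing is easier to judge next to a CPU reference. The sketch below repeats the same workload (the exponential of a random float32 vector of the same length, 1000 times) with NumPy alone. It is not part of the Theano documentation example; `time_numpy_exp` is a hypothetical helper, and the timing will vary with the machine.

```python
import time

import numpy


def time_numpy_exp(vlen=10 * 30 * 768, iters=1000, seed=22):
    """Time `iters` evaluations of numpy.exp on a random float32 vector.

    Returns (elapsed_seconds, last_result). CPU-only reference sketch.
    """
    rng = numpy.random.RandomState(seed)
    x = numpy.asarray(rng.rand(vlen), dtype=numpy.float32)
    result = None
    t0 = time.time()
    for _ in range(iters):
        result = numpy.exp(x)
    t1 = time.time()
    return t1 - t0, result


if __name__ == "__main__":
    elapsed, result = time_numpy_exp()
    print("NumPy CPU baseline: 1000 iterations took %.3f seconds" % elapsed)
    print("Result is %s" % result)
```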

Test PyCUDA (documentation: http://documen.tician.de/pycuda/index.html):

 
import pycuda.autoinit  
import pycuda.driver as drv  
import numpy  
   
from pycuda.compiler import SourceModule  
mod = SourceModule(""" 
__global__  void multiply_them(float *dest, float *a, float *b) 
{ 
  const int i = threadIdx.x; 
  dest[i] = a[i] * b[i]; 
} 
""")  
   
multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)

dest = numpy.zeros_like(a)
multiply_them(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400, 1, 1), grid=(1, 1))

# Every entry should be 0 if the GPU result matches the NumPy result
print(dest - a * b)
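The kernel above indexes with `threadIdx.x` only, so it covers exactly one block of 400 threads. For arrays that do not fit in a single block, the usual CUDA pattern is a global index `i = blockIdx.x * blockDim.x + threadIdx.x` with a bounds check `i < n`, launched with enough blocks to cover the array. The sketch below emulates that index arithmetic on the CPU in plain Python, so it runs without a GPU. The helper names are hypothetical, and 256 threads per block is an arbitrary example value.

```python
def blocks_needed(n_elements, threads_per_block):
    """Number of blocks required to cover n_elements (ceiling division)."""
    return (n_elements + threads_per_block - 1) // threads_per_block


def global_thread_indices(n_elements, threads_per_block):
    """Emulate i = blockIdx.x * blockDim.x + threadIdx.x on the CPU.

    Only indices passing the bounds check i < n_elements are kept,
    mirroring the guard a real kernel would need.
    """
    indices = []
    for block_idx in range(blocks_needed(n_elements, threads_per_block)):
        for thread_idx in range(threads_per_block):
            i = block_idx * threads_per_block + thread_idx
            if i < n_elements:
                indices.append(i)
    return indices


if __name__ == "__main__":
    n = 1000
    threads = 256  # example block size
    print("blocks: %d" % blocks_needed(n, threads))
    covered = global_thread_indices(n, threads)
    print("every element covered once: %s" % (covered == list(range(n))))
```

With PyCUDA, the corresponding launch would pass `block=(threads, 1, 1)` and `grid=(blocks_needed(n, threads), 1)`, and the kernel would take the array length as an extra argument for the bounds check.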