Configuring GPU acceleration for Theano on Win10/CUDA 10.2, and the problems along the way


i-LucAs


Article link: https://blog.csdn.net/qq_40344897/article/details/105368607


Already installed and configured: Theano 1.0.4, pygpu 0.7.6, CUDA 10.2, cuDNN, etc.

1. Test whether Theano is currently using the CPU or the GPU

from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
# Compile a function that applies exp() elementwise to a shared vector
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())  # the ops the compiled graph actually runs
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
# If any elementwise op in the compiled graph is not a Gpu* op, the CPU was used
if numpy.any([isinstance(node.op, tensor.Elemwise) and
              ('Gpu' not in type(node.op).__name__)
              for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

Result: (screenshot not preserved)
Cause: Theano's default configuration runs on the CPU, not the GPU.

2. Configure the .theanorc.txt file (in the user's home directory, e.g. C:\Users\<username>)

[global]
device = cuda
floatX=float32

[nvcc]
flags=--machine=64

[cuda]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2

[lib]
cnmem=100

[dnn]
enabled = True
include_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include
library_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64
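If you prefer to generate this file rather than type it by hand, the sketch below builds the same settings with Python's standard `configparser` and writes them to `.theanorc.txt` in the home directory. It is a minimal sketch under stated assumptions: the helper names `build_theanorc` and `write_theanorc` are hypothetical, and `CUDA_ROOT` is the CUDA path used in this post, so change it if your toolkit lives elsewhere.

```python
import configparser
import io
from pathlib import Path
from typing import Optional

# Assumed CUDA install location (the one used in this post); adjust as needed.
CUDA_ROOT = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2"


def build_theanorc(cuda_root: str = CUDA_ROOT) -> str:
    """Hypothetical helper: return .theanorc.txt text mirroring the settings above."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep option names case-sensitive (e.g. floatX)
    cfg["global"] = {"device": "cuda", "floatX": "float32"}
    cfg["nvcc"] = {"flags": "--machine=64"}
    cfg["cuda"] = {"root": cuda_root}
    cfg["lib"] = {"cnmem": "100"}
    cfg["dnn"] = {
        "enabled": "True",
        "include_path": cuda_root + r"\include",
        "library_path": cuda_root + r"\lib\x64",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


def write_theanorc(path: Optional[Path] = None) -> Path:
    """Hypothetical helper: write the config (default: ~/.theanorc.txt) and return its path."""
    target = path if path is not None else Path.home() / ".theanorc.txt"
    target.write_text(build_theanorc())
    return target


if __name__ == "__main__":
    print(build_theanorc())  # preview only; call write_theanorc() to actually save it
```

Printing first and writing only on an explicit `write_theanorc()` call avoids silently overwriting an existing config.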

Result: (screenshot not preserved)

3. Upgrade pygpu

conda upgrade pygpu

Result: (screenshot not preserved)
Cause: pygpu 0.7.6 does not support CUDA 10.2.
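To confirm which Theano and pygpu versions are actually installed after the upgrade, a quick check with the standard library might look like this. It is a sketch that assumes Python 3.8+ (for `importlib.metadata`); the helper name `installed_version` is hypothetical. If a conda package was installed without Python metadata it will show as "not installed" here, and `conda list pygpu` is the more reliable check.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Hypothetical helper: installed version of `package`, or None if not found."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("theano", "pygpu"):
        print(name, installed_version(name) or "not installed")
```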

4. Build manually

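Compile gpuarray.vcxproj in Visual Studio to regenerate gpuarray.dll (a precompiled copy was offered via a download link), and overwrite the file of the same name in C:/Anaconda3/Library/bin.
Copy nvrtc64_102_0.dll from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\ to D:/Anaconda3/Library/bin and rename it to nvrtc64_102.dll.
Copy cublas64_10.dll from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\ to D:/Anaconda3/Library/bin and rename it to cublas64_102.dll.
Copy nvrtc-builtins64_102.dll from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\ to D:/Anaconda3/Library/bin (keep the name).
The two directories are the CUDA install path and the Anaconda3 install path, respectively; adjust both to your own installation (the Anaconda3 drive letter, C: or D:, depends on where it is installed).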
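The copy-and-rename steps above can be scripted so they are easy to repeat. This is a minimal sketch, not the author's procedure: `CUDA_BIN` and `CONDA_BIN` are assumed paths taken from this post, and the helper `copy_dlls` and the `DLL_RENAMES` table are hypothetical names. Missing source files are skipped rather than treated as errors.

```python
import shutil
from pathlib import Path
from typing import Dict, List

# Assumed locations; change them to match your own CUDA and Anaconda3 installs.
CUDA_BIN = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin")
CONDA_BIN = Path(r"D:\Anaconda3\Library\bin")

# File name in CUDA_BIN -> file name it should have in CONDA_BIN
DLL_RENAMES: Dict[str, str] = {
    "nvrtc64_102_0.dll": "nvrtc64_102.dll",
    "cublas64_10.dll": "cublas64_102.dll",
    "nvrtc-builtins64_102.dll": "nvrtc-builtins64_102.dll",
}


def copy_dlls(src_dir: Path, dst_dir: Path, renames: Dict[str, str]) -> List[Path]:
    """Copy each listed DLL from src_dir into dst_dir under its new name."""
    copied: List[Path] = []
    for src_name, dst_name in renames.items():
        src = src_dir / src_name
        if not src.is_file():
            print(f"skip: {src} not found")
            continue
        dst = dst_dir / dst_name
        shutil.copy2(src, dst)  # overwrites dst if it already exists
        copied.append(dst)
    return copied


if __name__ == "__main__":
    for path in copy_dlls(CUDA_BIN, CONDA_BIN, DLL_RENAMES):
        print("copied", path)
```

Keeping the renames in one table makes it simple to add another DLL when a new error message points to one.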
Result: (screenshot not preserved)

In this example the GPU run is more than ten times faster than the CPU run.
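To put a number on the speedup, the "Looping ... times took ... seconds" line printed by the step 1 script can be captured once from a CPU run and once from a GPU run and compared. A minimal sketch, assuming that exact output format; `parse_seconds` and `speedup` are hypothetical helper names.

```python
import re

# Matches the timing line printed by the step 1 test script.
TIMING_RE = re.compile(r"Looping (\d+) times took ([0-9.]+) seconds")


def parse_seconds(output: str) -> float:
    """Extract the elapsed seconds from the step 1 script's output."""
    match = TIMING_RE.search(output)
    if match is None:
        raise ValueError("no timing line found")
    return float(match.group(2))


def speedup(cpu_output: str, gpu_output: str) -> float:
    """How many times faster the GPU run was than the CPU run."""
    return parse_seconds(cpu_output) / parse_seconds(gpu_output)
```

Both runs should use the same `iters` value, otherwise the ratio of total times is not meaningful.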
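Step 4 essentially copies files from the installed CUDA toolkit into Anaconda. The exact files you need may differ, so it is best to rerun the step 1 code after each copy and let the error message tell you what to do next.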
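That trial-and-error loop can be partly automated: pull the DLL names out of an error message and look for similarly named files in the CUDA bin directory. This is a heuristic sketch, assuming the error text contains the missing DLL's file name (the exact wording of the errors is not reproduced here); `missing_dlls` and `find_candidates` are hypothetical names. Matching on the part before the first underscore is what lets `cublas64_102.dll` find `cublas64_10.dll`, as in step 4.

```python
import re
from pathlib import Path
from typing import List

DLL_NAME_RE = re.compile(r"[\w.-]+\.dll", re.IGNORECASE)


def missing_dlls(error_text: str) -> List[str]:
    """DLL file names mentioned in an error message, in order, without duplicates."""
    names: List[str] = []
    for name in DLL_NAME_RE.findall(error_text):
        if name not in names:
            names.append(name)
    return names


def find_candidates(dll_name: str, search_dir: Path) -> List[Path]:
    """DLLs in search_dir whose names share the prefix before the first underscore."""
    prefix = Path(dll_name).stem.split("_")[0]  # "nvrtc64_102.dll" -> "nvrtc64"
    return sorted(p for p in search_dir.glob("*.dll") if p.stem.startswith(prefix))
```

A candidate found this way still has to be copied and, if needed, renamed to the exact name in the error message.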