Python and CUDA interop: installing PyCUDA (pitfalls and fixes)

First make sure CUDA and cuDNN are installed.
My machine:
CUDA 9.0
cuDNN 7.3

Check the CUDA version (nvcc -V also works):

cat /usr/local/cuda/version.txt
Check the cuDNN version:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
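
The grep above prints the three version #define lines from the header. As a rough sketch, the same values can be extracted in Python. The function name cudnn_version and the sample header text are illustrative assumptions (the patch level in particular is made up), and newer cuDNN releases keep these macros in cudnn_version.h rather than cudnn.h.

```python
import re

def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from cudnn.h-style #define lines."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(r"^\s*#define\s+%s\s+(\d+)" % name, header_text, re.MULTILINE)
        if m is None:
            raise ValueError("%s not found" % name)
        parts.append(int(m.group(1)))
    return tuple(parts)

# Illustrative header fragment; the values are assumptions for the example.
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 3
#define CUDNN_PATCHLEVEL 1
"""
print(cudnn_version(sample))
```

To check a real installation, pass the contents of the header instead, for example cudnn_version(open("/usr/local/cuda/include/cudnn.h").read()).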

Note: if possible, install directly with pip:

pip install pycuda==2017.1.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

If that fails and you are stuck (typical causes: a cuda.h error or a root-permission problem), use the manual route below.

Download the source package and install it manually:

https://pypi.tuna.tsinghua.edu.cn/packages/b3/30/9e1c0a4c10e90b4c59ca7aa3c518e96f37aabcac73ffe6b5d9658f6ef843/pycuda-2017.1.1.tar.gz

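Extract the archive, then run the following from the extracted directory.

Configure with explicit paths (adjust them to your system):

./configure.py --python-exe=/usr/bin/python3 --cuda-root=/usr/local/cuda-9.0 --cudadrv-lib-dir=/usr/lib/x86_64-linux-gnu --boost-inc-dir=/usr/include --boost-lib-dir=/usr/lib --boost-python-libname=boost_python-py35 --boost-thread-libname=boost_thread --no-use-shipped-boost

or, more simply, pass only the CUDA root:

python3 configure.py --cuda-root=/usr/local/cuda-9.0

Then build and install:

make -j 8
sudo python3 setup.py install

(sudo pip3 install . also works for the install step.)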

Test the installation:

import pycuda.autoinit
import pycuda.driver as drv
import numpy
 
from pycuda.compiler import SourceModule
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")
 
multiply_them = mod.get_function("multiply_them")
 
a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
 
dest = numpy.zeros_like(a)
multiply_them(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400,1,1), grid=(1,1))
 
print ( dest-a*b )

At this point you may get an error like this:

[command: nvcc --cubin -arch sm_75 -I/home/user/anaconda3/lib/python3.6/site-packages/pycuda/cuda kernel.cu]

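The likely cause is that the architecture PyCUDA requests does not match what the installed nvcc supports. My GPU is a GTX 1660 Ti, which reports compute capability 7.5, and the nvcc shipped with CUDA 9.0 does not know sm_75 (support arrived in CUDA 10.0). Using sm_70 instead worked for me.

Try the command by hand:

nvcc --cubin -arch sm_70 -I(path)
where (path) is the include path shown in the error message.
If the same error does not appear, this step should be fine.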

Then edit the source file /usr/local/lib/python3.5/dist-packages/pycuda/compiler.py:


#arch = "sm_%d%d" % Context.get_device().compute_capability()
arch = 'sm_70'
Comment out the first line and use the direct assignment on the second, provided the number after sm_ is suitable for your machine.
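
To make the choice of the sm_ number less of a guess, here is a minimal sketch of the selection rule: a cubin runs on devices with the same major compute capability and an equal or higher minor one, so pick the highest target nvcc accepts that satisfies this. The helper pick_arch and the list CUDA_9_0_TARGETS are hypothetical names introduced for illustration, and the list is a partial, hand-written assumption rather than something queried from nvcc.

```python
def pick_arch(compute_capability, supported):
    """Pick the highest nvcc-supported sm_XY that the device can run.

    compute_capability: (major, minor) tuple reported for the device.
    supported: iterable of (major, minor) tuples the installed nvcc accepts
               (supplied by the caller; this sketch does not query nvcc).
    """
    major, minor = compute_capability
    # Same major version, minor not above the device's minor.
    candidates = [(ma, mi) for ma, mi in supported if ma == major and mi <= minor]
    if not candidates:
        raise ValueError("no supported architecture for sm_%d%d" % (major, minor))
    return "sm_%d%d" % max(candidates)

# Assumed, partial list of targets for a CUDA 9.0-era nvcc.
CUDA_9_0_TARGETS = [(3, 0), (3, 5), (5, 0), (5, 2), (6, 0), (6, 1), (7, 0)]
print(pick_arch((7, 5), CUDA_9_0_TARGETS))  # sm_70
```

SourceModule also accepts an arch keyword, so a call such as SourceModule(source, arch="sm_70") may achieve the same effect without patching compiler.py; treat that as something to verify on your own setup.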

Run the example above again. If it passes, you will see the following:

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Now run a second example:

import pycuda.autoinit
import pycuda.driver as drv
import numpy as np
from timeit import default_timer as timer
 
from pycuda.compiler import SourceModule
mod = SourceModule("""
__global__ void func(float *a, float *b, int N)  // int matches the np.int32 passed from Python
{
 const int i = blockIdx.x * blockDim.x + threadIdx.x;
 if (i >= N)
 {
  return;
 }
 float temp_a = a[i];
 float temp_b = b[i];
 a[i] = (temp_a * 10 + 2 ) * ((temp_b + 2) * 10 - 5 ) * 5;
 // a[i] = a[i] + b[i];
}
""")
 
func = mod.get_function("func")
 
def test(N):
  # N = 1024 * 1024 * 90  # 1024 * 1024 floats = 4 MB

  print("N = %d" % N)

  a = np.random.randn(N).astype(np.float32)
  b = np.random.randn(N).astype(np.float32)
  # keep a copy of a for the CPU run; the kernel overwrites a in place
  aa = a.copy()
  # GPU run (the timing includes the host/device copies done by InOut and In)
  nThreads = 256
  nBlocks = (N + nThreads - 1) // nThreads  # ceiling division
  start = timer()
  func(
      drv.InOut(a), drv.In(b), np.int32(N),
      block=(nThreads, 1, 1), grid=(nBlocks, 1))
  run_time = timer() - start
  print("gpu run time %f seconds " % run_time)
  # CPU run
  start = timer()
  aa = (aa * 10 + 2) * ((b + 2) * 10 - 5) * 5
  run_time = timer() - start

  print("cpu run time %f seconds " % run_time)

  # check result: the difference should be float32 rounding noise only
  r = a - aa
  print(r.min(), r.max())
 
def main():
  for n in range(1, 10):
    N = 1024 * 1024 * (n * 10)
    print("------------%d---------------" % n)
    test(N)

if __name__ == '__main__':
  main()
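
The block count in the example above is a ceiling division: enough blocks of a fixed size to cover all N elements, with the surplus threads in the last block filtered out by the if (i >= N) guard in the kernel. A small sketch of that arithmetic, runnable without a GPU (launch_config is a hypothetical helper name, not part of PyCUDA):

```python
def launch_config(n, threads_per_block=256):
    """Return (block, grid) tuples that cover n elements with a 1-D launch."""
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    return (threads_per_block, 1, 1), (blocks, 1)

for n in (1, 256, 257, 1024 * 1024 * 10):
    block, grid = launch_config(n)
    # The launch must provide at least one thread per element.
    assert block[0] * grid[0] >= n
    print(n, block, grid)
```

Integer division keeps the result an exact Python int, which is what the block and grid tuples expect.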

Running this example may produce:

[command: nvcc --cubin -arch sm_70 -I/usr/local/lib/python3.5/dist-packages/pycuda/cuda kernel.cu]
[stderr:
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
]

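The likely cause is that the gcc and g++ versions do not match. Switch them to the same version; changing one of the two is enough.

Check with gcc -v (install with sudo apt-get install gcc-4.9)

Check with g++ -v (install with sudo apt-get install g++-4.9)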

cd /usr/bin
sudo  ln  -sf  g++-4.9  g++
sudo  ln  -sf  g++-4.9  x86_64-linux-gnu-g++
sudo  ln  -sf  gcc-4.9  gcc
sudo  ln  -sf  gcov-4.9 gcov
sudo  ln  -sf  gcc  x86_64-linux-gnu-gcc
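
After switching the symlinks, it can help to confirm that both compilers now report the same version. A minimal sketch, assuming both tools understand the -dumpversion flag; tool_version and versions_match are hypothetical helper names:

```python
import shutil
import subprocess

def tool_version(tool):
    """Return the output of `<tool> -dumpversion`, or None if the tool is missing."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool, "-dumpversion"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout.strip()

def versions_match(gcc_ver, gxx_ver):
    """True only when both versions were found and are identical strings."""
    return gcc_ver is not None and gcc_ver == gxx_ver

gcc_ver = tool_version("gcc")
gxx_ver = tool_version("g++")
print("gcc:", gcc_ver, "g++:", gxx_ver, "match:", versions_match(gcc_ver, gxx_ver))
```

A result of match: False, or a missing g++, points to the same situation as the cc1plus error above.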

Run it again:

 

------------1---------------
N = 10485760
gpu run time 0.026409 seconds 
cpu run time 0.069521 seconds 
-0.0014648438 0.0014648438
------------2---------------
N = 20971520
gpu run time 0.042998 seconds 
cpu run time 0.113894 seconds 
-0.0009765625 0.0014648438
------------3---------------
N = 31457280
gpu run time 0.063510 seconds 
cpu run time 0.164649 seconds 
-0.0014648438 0.0014648438
------------4---------------
N = 41943040
gpu run time 0.092244 seconds 
cpu run time 0.215194 seconds 
-0.0014648438 0.0014648438
------------5---------------
N = 52428800
gpu run time 0.107248 seconds 
cpu run time 0.267497 seconds 
-0.0014648438 0.0014648438
------------6---------------
N = 62914560
gpu run time 0.132429 seconds 
cpu run time 0.316675 seconds 
-0.0014648438 0.001953125
------------7---------------
N = 73400320
gpu run time 0.200182 seconds 
cpu run time 0.451194 seconds 
-0.0014648438 0.0014648438
------------8---------------
N = 83886080
gpu run time 0.251288 seconds 
cpu run time 0.701949 seconds 
-0.0014648438 0.0014648438
------------9---------------
N = 94371840
gpu run time 0.209741 seconds 
cpu run time 0.492387 seconds 
-0.0014648438 0.0014648438

At this point everything should work.
