First, confirm that CUDA and cuDNN are installed.
My machine:
CUDA 9.0
cuDNN 7.3
Check the CUDA version (nvcc -V also works):
cat /usr/local/cuda/version.txt
Check the cuDNN version:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
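The same check can be scripted. The following is a minimal sketch, assuming the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integer macros, which is what the grep above relies on. The function name parse_cudnn_version and the sample header text (including its patch level) are made up for illustration. On cuDNN 8 and later these macros live in cudnn_version.h rather than cudnn.h.

```python
import re

def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of a cuDNN header, or None."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+%s\s+(\d+)" % name,
                          header_text, re.MULTILINE)
        if match is None:
            return None  # macro not found in this header
        values.append(int(match.group(1)))
    return tuple(values)

# Illustrative header text, not copied from a real installation.
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 3
#define CUDNN_PATCHLEVEL 1
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""
print(parse_cudnn_version(sample))
```

To use it on a real system, read /usr/local/cuda/include/cudnn.h with open(...).read() and pass the text in.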
Note: if possible, install directly with pip:
pip install pycuda==2017.1.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
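If that fails and you cannot resolve it (for example cuda.h is not found, or there are root permission problems), build from source instead.
Download and install:
Download and extract the source, then in the extracted directory: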
./configure.py --python-exe=/usr/bin/python3 --cuda-root=/usr/local/cuda-9.0 --cudadrv-lib-dir=/usr/lib/x86_64-linux-gnu --boost-inc-dir=/usr/include --boost-lib-dir=/usr/lib --boost-python-libname=boost_python-py35 --boost-thread-libname=boost_thread --no-use-shipped-boost
python3 configure.py --cuda-root=/usr/local/cuda-9.0
make -j 8
sudo python3 setup.py install
sudo pip3 install .
Test the installation:
import pycuda.autoinit
import pycuda.driver as drv
import numpy
from pycuda.compiler import SourceModule
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")
multiply_them = mod.get_function("multiply_them")
a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)
multiply_them(
    drv.Out(dest), drv.In(a), drv.In(b),
    block=(400,1,1), grid=(1,1))
print(dest - a*b)
At this point the following error may appear:
command: nvcc --cubin -arch sm_75 -I/home/user/anaconda3/lib/python3.6/site-packages/pycuda/cuda kernel.cu]
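A likely cause is that the installed nvcc does not support the architecture PyCUDA detected for your GPU. (Mine is a GTX 1660 Ti, which reports sm_75; CUDA 9.0 predates that architecture. I tried sm_70 and it worked.)
Try the command: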
nvcc --cubin -arch sm_70 -I(path)
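where (path) is the include path from the error message.
If the same error does not appear, nvcc accepts that architecture.
Then edit the source file /usr/local/lib/python3.5/dist-packages/pycuda/compiler.py: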
#arch = "sm_%d%d" % Context.get_device().compute_capability()
arch = 'sm_70'
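Comment out the first line and replace it with the direct assignment on the second line, provided the number after sm_ suits your machine.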
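An alternative that avoids editing the installed package: SourceModule accepts an arch keyword argument, so the architecture can be passed per call. The helper below is a sketch of how that string might be chosen. The name choose_arch and the "highest architecture the toolkit supports" parameter are assumptions for illustration; the fallback relies on a cubin built for sm_70 running on an sm_75 device, which is what the experiment above showed.

```python
def choose_arch(device_cc, toolkit_max_cc):
    """Pick an nvcc -arch string the toolkit can build and the GPU can run.

    device_cc      -- (major, minor) reported by the GPU, e.g. (7, 5)
    toolkit_max_cc -- newest (major, minor) the installed nvcc accepts
                      (hypothetical input; look it up for your toolkit)
    """
    device_cc = tuple(device_cc)
    toolkit_max_cc = tuple(toolkit_max_cc)
    if device_cc <= toolkit_max_cc:
        # The toolkit knows this GPU: use its native architecture.
        return "sm_%d%d" % device_cc
    if device_cc[0] != toolkit_max_cc[0]:
        # A cubin for an older major version will not load on this GPU.
        raise ValueError("toolkit too old for this GPU; upgrade CUDA instead")
    # Same major version, older minor: fall back to what the toolkit supports.
    return "sm_%d%d" % toolkit_max_cc

print(choose_arch((7, 5), (7, 0)))  # the 1660 Ti with CUDA 9.0 case

# Usage with PyCUDA (needs a GPU, so shown as comments):
#   import pycuda.autoinit
#   from pycuda.compiler import SourceModule
#   cc = pycuda.autoinit.device.compute_capability()
#   mod = SourceModule(kernel_source, arch=choose_arch(cc, (7, 0)))
```

Passing arch explicitly keeps the workaround in your own script, so it survives a reinstall or upgrade of PyCUDA.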
Run the first example again; if it passes, it prints:
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Then run a second example:
import pycuda.autoinit
import pycuda.driver as drv
import numpy as np
from timeit import default_timer as timer
from pycuda.compiler import SourceModule

# N is declared as int so that it matches the np.int32 passed from Python.
mod = SourceModule("""
__global__ void func(float *a, float *b, int N)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N)
    {
        return;
    }
    float temp_a = a[i];
    float temp_b = b[i];
    a[i] = (temp_a * 10 + 2) * ((temp_b + 2) * 10 - 5) * 5;
    // a[i] = a[i] + b[i];
}
""")
func = mod.get_function("func")

def test(N):
    # N = 1024 * 1024 * 90   # 1024 * 1024 floats take 4 MB
    print("N = %d" % N)
    a = np.random.randn(N).astype(np.float32)
    b = np.random.randn(N).astype(np.float32)
    # copy a to aa for the CPU reference
    aa = np.empty_like(a)
    aa[:] = a
    # GPU run (the time includes the host/device copies done by InOut and In)
    nThreads = 256
    nBlocks = (N + nThreads - 1) // nThreads
    start = timer()
    func(
        drv.InOut(a), drv.In(b), np.int32(N),
        block=(nThreads, 1, 1), grid=(nBlocks, 1))
    run_time = timer() - start
    print("gpu run time %f seconds " % run_time)
    # CPU run
    start = timer()
    aa = (aa * 10 + 2) * ((b + 2) * 10 - 5) * 5
    run_time = timer() - start
    print("cpu run time %f seconds " % run_time)
    # check result
    r = a - aa
    print(r.min(), r.max())

def main():
    for n in range(1, 10):
        N = 1024 * 1024 * (n * 10)
        print("------------%d---------------" % n)
        test(N)

if __name__ == '__main__':
    main()
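The block count in the second example is a ceiling division: enough blocks of 256 threads to cover N elements, with the last block possibly only partly used. That is why the kernel needs the "if (i >= N) return;" guard. A small sketch of the arithmetic follows (the helper name blocks_needed is made up for illustration):

```python
def blocks_needed(n_elements, threads_per_block=256):
    """Smallest number of blocks whose threads cover n_elements."""
    return (n_elements + threads_per_block - 1) // threads_per_block

for n in (1024 * 1024 * 10, 1000):
    blocks = blocks_needed(n)
    idle = blocks * 256 - n  # threads in the last block with no element
    print("N=%d blocks=%d idle threads=%d" % (n, blocks, idle))
```

For the sizes used in the example, N is a multiple of 256, so no thread is idle; for N = 1000 the last block has 24 threads that must return early.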
Running the second example may produce:
[command: nvcc --cubin -arch sm_70 -I/usr/local/lib/python3.5/dist-packages/pycuda/cuda kernel.cu]
[stderr:
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
]
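The likely cause is that the gcc and g++ versions do not match (nvcc calls gcc, which needs the cc1plus that ships with the same version of g++). Switch both to the same version; changing one of them is enough.
Check with gcc -v (install with: sudo apt-get install gcc-4.9)
Check with g++ -v (install with: sudo apt-get install g++-4.9)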
cd /usr/bin
sudo ln -sf g++-4.9 g++
sudo ln -sf g++-4.9 x86_64-linux-gnu-g++
sudo ln -sf gcc-4.9 gcc
sudo ln -sf gcov-4.9 gcov
sudo ln -sf gcc x86_64-linux-gnu-gcc
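To confirm that the symlinks took effect, the two compilers can be compared from Python. This is a rough sketch: it assumes gcc and g++ are on PATH and uses their -dumpversion option; the helper names are hypothetical.

```python
import re
import shutil
import subprocess

def major_minor(version_text):
    """Reduce a version string such as '4.9.3' or '7' to a comparable tuple."""
    return tuple(int(part) for part in re.findall(r"\d+", version_text)[:2])

def compiler_version(command):
    """Return the output of '<command> -dumpversion', or None if not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run([command, "-dumpversion"],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout.strip()

gcc_version = compiler_version("gcc")
gxx_version = compiler_version("g++")
if gcc_version is None or gxx_version is None:
    print("gcc or g++ is not on PATH")
elif major_minor(gcc_version) == major_minor(gxx_version):
    print("match: %s" % gcc_version)
else:
    print("mismatch: gcc %s vs g++ %s" % (gcc_version, gxx_version))
```

Only the major and minor numbers are compared, since newer gcc releases may print just the major version.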
Run the second example again:
------------1---------------
N = 10485760
gpu run time 0.026409 seconds
cpu run time 0.069521 seconds
-0.0014648438 0.0014648438
------------2---------------
N = 20971520
gpu run time 0.042998 seconds
cpu run time 0.113894 seconds
-0.0009765625 0.0014648438
------------3---------------
N = 31457280
gpu run time 0.063510 seconds
cpu run time 0.164649 seconds
-0.0014648438 0.0014648438
------------4---------------
N = 41943040
gpu run time 0.092244 seconds
cpu run time 0.215194 seconds
-0.0014648438 0.0014648438
------------5---------------
N = 52428800
gpu run time 0.107248 seconds
cpu run time 0.267497 seconds
-0.0014648438 0.0014648438
------------6---------------
N = 62914560
gpu run time 0.132429 seconds
cpu run time 0.316675 seconds
-0.0014648438 0.001953125
------------7---------------
N = 73400320
gpu run time 0.200182 seconds
cpu run time 0.451194 seconds
-0.0014648438 0.0014648438
------------8---------------
N = 83886080
gpu run time 0.251288 seconds
cpu run time 0.701949 seconds
-0.0014648438 0.0014648438
------------9---------------
N = 94371840
gpu run time 0.209741 seconds
cpu run time 0.492387 seconds
-0.0014648438 0.0014648438
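The last line of each block is the smallest and largest difference between the GPU and CPU results. They are not exactly zero, but they are the size of float32 rounding steps for values in the thousands. One plausible explanation, an assumption not verified in these notes, is that the GPU compiler fuses multiply and add into one rounding step while NumPy rounds after each operation. The sketch below computes the float32 spacing at a few magnitudes using only the standard library; float32_spacing is a made-up helper name.

```python
import struct

def float32_spacing(x):
    """Gap between x (rounded to float32) and the next larger float32.

    Valid for positive finite x.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    current = struct.unpack("<f", struct.pack("<I", bits))[0]
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - current

for magnitude in (1024.0, 4096.0, 8192.0):
    print(magnitude, float32_spacing(magnitude))
```

At magnitude 4096 the spacing is 2^-11, and three such steps equal 0.00146484375, which agrees with the 0.0014648438 printed in the runs above.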
At this point everything should work.
References:
1. https://www.cnblogs.com/demo-deng/p/10470734.html
2. https://blog.csdn.net/u014365862/article/details/85338619
3. https://www.linuxidc.com/Linux/2016-08/134546.htm