Environment:
Ubuntu 18.04 (Docker base image)
tensorflow-gpu 1.15 (installed with pip3)
CUDA 10.0 (installer downloaded from the official site)
cuDNN 7.6.5 (installer downloaded from the official site)
Error 1: the CUDA library files cannot be loaded
Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
Reference blog post: "Fixing the Could not load dynamic library 'libcudart.so.10.0' problem"
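To confirm whether a fix has taken effect without starting TensorFlow, one option is to ask the dynamic loader directly. The following is a minimal sketch, not part of the referenced post: it uses Python's standard `ctypes.CDLL` to try opening each library named in the log messages above, and the helper name `find_missing_libs` is made up for illustration.

```python
import ctypes

# Library names copied from the "Could not load dynamic library" messages.
CUDA_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
]


def find_missing_libs(names, loader=ctypes.CDLL):
    """Return the names that the dynamic loader cannot open.

    `loader` is a hypothetical hook for testing; by default the real
    ctypes.CDLL is used, which raises OSError when a library is not found.
    """
    missing = []
    for name in names:
        try:
            loader(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing_libs(CUDA_LIBS):
        print("cannot load:", name)
```

If the script prints nothing, every listed library was found on the loader's search path. The same function can be called with any other shared-library name that appears in a similar message.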
Error 2: the cuDNN library file cannot be loaded
Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
Reference blog post: "Installing CUDA 10.0 and cuDNN 7.4 on Ubuntu 16.04"
Error 3: cuDNN failed to initialize
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node encoder/conv2d/Conv2D (defined at /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
This is a version issue: using cuDNN 7.6.5 instead of 7.4 resolved it here.
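Since the fix depends on which cuDNN version is actually installed, it can help to read the version from the header file. The sketch below assumes that cuDNN 7.x defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` in `cudnn.h`, and that the header lives at `/usr/local/cuda/include/cudnn.h`; that path is an assumption and may differ on your system. The function name `parse_cudnn_version` is made up for illustration.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h, or None."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+" + key + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None  # the header does not define this macro
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Assumed header location; adjust to wherever cuDNN was installed.
    path = "/usr/local/cuda/include/cudnn.h"
    try:
        with open(path) as f:
            print(parse_cudnn_version(f.read()))
    except OSError as exc:
        print("could not read", path, "-", exc)
```

A result whose first two components are 7 and 4 would indicate that the older version is still the one being picked up.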
For the overall environment setup procedure, see the blog post: "Tutorial: installing CUDA 10.0 + cuDNN 7.6.5 + TensorFlow 1.15 on Ubuntu 18.04"