Environment:
- Ubuntu 18.04
- anaconda3
- cuda 10.1
- cudnn 7.6.5
- tensorflow-gpu 2.3.0
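Problem:
I first followed the official guide and ran the commands below to install the required tooling. I assumed these commands were magic and the environment would just work once they finished. After a long install, it turned out I was naive.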
# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
# Install NVIDIA driver; adding '--fix-missing' is recommended
sudo apt-get install --no-install-recommends nvidia-driver-450 --fix-missing
# Reboot. Check that GPUs are visible using the command: nvidia-smi
# Install development and runtime libraries (~4GB); adding '--fix-missing' is recommended
sudo apt-get install --no-install-recommends \
cuda-10-1 \
libcudnn7=7.6.5.32-1+cuda10.1 \
libcudnn7-dev=7.6.5.32-1+cuda10.1 --fix-missing
# Install TensorRT. Requires that libcudnn7 is installed above; adding '--fix-missing' is recommended.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
libnvinfer-dev=6.0.1-1+cuda10.1 \
libnvinfer-plugin6=6.0.1-1+cuda10.1 --fix-missing
With the install done, I fired up a notebook straight away to test it. Following the official guide, that of course means running:
import tensorflow as tf
tf.config.list_physical_devices()
or the older, now-deprecated:
tf.test.is_gpu_available()
Then the trouble started: TensorFlow complained that it could not find "libcublas.so.10", with this error:
Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory;
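The error above can be reproduced without TensorFlow, which makes it quicker to check whether a fix has worked. The following is a minimal sketch, assuming that asking the dynamic loader for the same library name through Python's `ctypes` is a fair stand-in for what TensorFlow does; `can_load` is a hypothetical helper name, not part of any library.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)  # asks the loader to find and open the library
        return True
    except OSError:
        # Raised when the library cannot be found or opened
        return False


if __name__ == "__main__":
    # libcublas.so.10 is the name taken from the error message above
    name = "libcublas.so.10"
    print(name, "OK" if can_load(name) else "NOT FOUND")
```

Note that the loader reads LD_LIBRARY_PATH when the process starts, so this has to be run from a fresh shell (or a restarted notebook kernel) after any change to that variable.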
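I dug through a pile of posts and tried reinstalling and rebooting, none of which helped in the slightest. In the end I found a solution from one very helpful person. Thanks, mate!
Original thread: https://github.com/tensorflow/tensorflow/issues/44312
Solution:
libcublas.so.10 is hiding in a cuda 10.2 folder that gets installed alongside cuda 10.1. The following command shows where it is: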
$ dpkg -L libcublas10
/.
/usr
/usr/local
/usr/local/cuda-10.2
/usr/local/cuda-10.2/targets
/usr/local/cuda-10.2/targets/x86_64-linux
/usr/local/cuda-10.2/targets/x86_64-linux/lib
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so.10.2.2.214
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublasLt.so.10.2.2.214
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvblas.so.10.2.2.214
/usr/share
/usr/share/doc
/usr/share/doc/libcublas10
/usr/share/doc/libcublas10/changelog.Debian.gz
/usr/share/doc/libcublas10/copyright
/usr/local/cuda-10.2/lib64
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublasLt.so.10
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libnvblas.so.10
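Picking the right directory out of that listing by eye is easy to get wrong, so here is a small sketch that does it programmatically. It assumes the listing has one path per line, as in the output above; `dirs_containing` is a hypothetical helper, and `sample` is a shortened copy of the listing.

```python
import posixpath


def dirs_containing(listing: str, soname: str) -> list:
    """Return the unique directories in a `dpkg -L` listing that hold `soname`."""
    dirs = []
    for line in listing.splitlines():
        path = line.strip()
        # Match on the exact file name so libcublas.so.10.2.2.214 is not counted
        if posixpath.basename(path) == soname:
            directory = posixpath.dirname(path)
            if directory not in dirs:
                dirs.append(directory)
    return dirs


# Shortened copy of the listing above
sample = """\
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so.10.2.2.214
/usr/local/cuda-10.2/lib64
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcublas.so.10
"""

print(dirs_containing(sample, "libcublas.so.10"))
# → ['/usr/local/cuda-10.2/targets/x86_64-linux/lib']
```

On a real machine the listing could be captured with `subprocess.run(["dpkg", "-L", "libcublas10"], capture_output=True, text=True).stdout` and passed in instead of `sample`.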
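So what we need to do is add the directory that contains libcublas.so.10 to LD_LIBRARY_PATH in .bashrc. After the change, my environment variable looks like this: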
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda-10.1/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH:/usr/local/cuda-10.2/targets/x86_64-linux/lib
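The crucial part is the appended path: /usr/local/cuda-10.2/targets/x86_64-linux/lib. Open a new shell (or run source ~/.bashrc) and restart the notebook so the change takes effect.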
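If .bashrc gets sourced more than once, the same directory can end up in LD_LIBRARY_PATH several times. The sketch below shows one way to build the value without duplicates; `append_lib_dir` is a hypothetical helper, and the starting value is a shortened version of my own setting. It also drops empty entries, which the loader would otherwise treat as the current directory.

```python
def append_lib_dir(current: str, new_dir: str) -> str:
    """Append new_dir to a colon-separated path list unless it is already there."""
    # Skip empty entries (e.g. from a leading, trailing, or doubled colon)
    parts = [p for p in current.split(":") if p]
    if new_dir not in parts:
        parts.append(new_dir)
    return ":".join(parts)


cublas_dir = "/usr/local/cuda-10.2/targets/x86_64-linux/lib"
current = "/usr/local/cuda/lib64:/usr/local/cuda-10.1/lib64"

updated = append_lib_dir(current, cublas_dir)
print("export LD_LIBRARY_PATH=" + updated)

# Applying it a second time changes nothing
assert append_lib_dir(updated, cublas_dir) == updated
```

The printed line can be pasted into .bashrc in place of the export above.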
After retesting, everything finally works. Mission accomplished!