tensorflow is_gpu_available always returns False
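Writing this down to prevent hair loss. Oh, and also because otherwise, the next time some update breaks everything again, the stress will give me a prostate flare-up.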
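Problem description:
I needed TensorRT, so I had no choice but to upgrade to the latest CUDA 11.0. After that, TensorFlow's function for checking whether a GPU is usable: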
import tensorflow as tf
tf.test.is_gpu_available()
always returns False, and it also prints the following warnings:
2020-08-31 17:17:04.380095: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2020-08-31 17:17:04.380368: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2020-08-31 17:17:04.398758: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2020-08-31 17:17:04.399120: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2020-08-31 17:17:04.399388: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2020-08-31 17:17:04.399645: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/ros/melodic/lib:/usr/local/cuda-11.0/lib64:/usr/local/cuda-11.0/lib64
2020-08-31 17:17:05.132994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-31 17:17:05.133087: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
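Causes:
- CUDA or cuDNN is not installed correctly
- The environment variables are not set up properly
- The TensorFlow version does not support CUDA 11.0 + cuDNN v8 (this is the cause in this post: the warnings show TensorFlow asking for the 10.0 libraries, such as libcudart.so.10.0, while LD_LIBRARY_PATH only contains /usr/local/cuda-11.0/lib64)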
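To tell the third cause apart from the first two, it helps to pull the library names out of the warnings and look at the version suffix they carry. Below is a minimal sketch of that idea; the function names `missing_libraries` and `wanted_versions` are made up for illustration, and the regular expression only assumes the "Could not load dynamic library '...'" wording seen in the log above.

```python
import re

# Matches the wording of the dso_loader warning shown in the log above.
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the library file names reported as missing, in order, without duplicates."""
    seen = []
    for name in MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


def wanted_versions(library_names):
    """Collect the suffix after '.so.', e.g. 'libcudart.so.10.0' gives '10.0'."""
    versions = set()
    for name in library_names:
        _, sep, version = name.partition(".so.")
        if sep and version:
            versions.add(version)
    return versions


# Two shortened lines in the same shape as the warnings above.
sample_log = (
    "W dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; "
    "dlerror: libcudart.so.10.0: cannot open shared object file\n"
    "W dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; "
    "dlerror: libcublas.so.10.0: cannot open shared object file\n"
)

libs = missing_libraries(sample_log)
print(libs)
print(wanted_versions(libs))
```

If every missing library carries the same suffix (10.0 here) and that suffix differs from the CUDA version you installed, the install itself is probably fine and the mismatch is on the TensorFlow side.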
No patience for the rambling? The fix:
conda install cudatoolkit
conda install cudnn
Done.
Got the patience to read the rambling? The fix, step by step:
For cause 1
Check CUDA from the command line:
nvcc --version
nvidia-smi
If both run without problems, you are basically OK.
Checking cuDNN is a bit more of a hassle. From the command line:
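If you would rather compare the toolkit version in a script than read it by eye, the release number can be pulled out of the `nvcc --version` text. This is only a sketch: `parse_nvcc_release` is a hypothetical helper, and the sample line is an illustrative one in the shape nvcc prints, not output captured from my machine.

```python
import re


def parse_nvcc_release(output):
    """Return (major, minor) from nvcc's 'release X.Y' text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Illustrative sample in the shape of the last line of `nvcc --version`.
sample_output = "Cuda compilation tools, release 11.0, V11.0.194"

print(parse_nvcc_release(sample_output))
```

To use it on a real machine, pass in the text returned by something like `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`.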
cd /usr/local/cuda/include/
If there is a `cudnn.h` file in there, you are basically OK. If not, try looking here:
ls /usr/include/cudnn*
You will usually see files like these:
cudnn_adv_infer.h cudnn_adv_train.h cudnn_backend.h cudnn_cnn_infer.h cudnn_cnn_train.h cudnn.h cudnn_ops_infer.h cudnn_ops_train.h cudnn_version.h
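If it is still not there, look in /usr/include/x86_64-linux-gnu as well, because you have probably installed several versions and the installer has quietly overwritten things for you: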
ls /usr/include/x86_64-linux-gnu/cudnn*
If they are there, you will see files like the following:
cudnn_adv_infer_v8.h cudnn_adv_train_v8.h cudnn_backend_v8.h cudnn_cnn_infer_v8.h cudnn_cnn_train_v8.h cudnn_ops_infer_v8.h cudnn_ops_train_v8.h cudnn_v7.h cudnn_v8.h cudnn_version_v8.h
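In my case I had installed cuDNN three times in a row: v7, v8, v8. In these locations, cudnn_version_v8.h tells you the version. Pick one of the cudnn.h files, copy it into /usr/local/cuda/include/, and that is it.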
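Reading the version out of the header can also be scripted. The sketch below assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` as plain integer macros, which is how the cuDNN headers I have seen do it; `cudnn_version_from_header` is a name made up for this example, and the sample text is a hand-written stand-in for a real header.

```python
import re


def cudnn_version_from_header(header_text):
    """Return (major, minor, patch) from the #define lines, or None if one is missing."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Only match lines of the form "#define MACRO <digits>".
        pattern = r"^\s*#define\s+" + macro + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Hand-written stand-in for the relevant lines of a version header.
sample_header = """\
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 2
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""

print(cudnn_version_from_header(sample_header))
```

On a real system, read the file first, for example `open("/usr/include/x86_64-linux-gnu/cudnn_version_v8.h").read()`, and pass the text in.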
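Cause 2: environment variables
From the command line:
gedit ~/.bashrc
Check whether these lines are there, and add them yourself if not. Swap cuda-11.0 for whichever CUDA directory under /usr/local you want to use.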
export PATH=$PATH:/snap/bin
#export PATH=/usr/local/cuda-10.2/bin${PATH:+:${PATH}}
export PATH=$PATH:~/.local/bin
export PATH=/usr/local/cuda-11.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Then
sudo gedit /etc/profile
and add these lines:
export PATH=/usr/local/cuda-11.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Finally
source /etc/profile
source ~/.bashrc
sudo ldconfig
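After reloading the shell, a quick way to confirm the variables did what you wanted is to walk the search path and see where a given library file actually lives. This is a rough sketch, not what TensorFlow itself does: `find_library` and `suspicious_entries` are hypothetical helpers, and the check only looks at the directories listed in the colon-separated path you pass in.

```python
import os


def find_library(file_name, search_path):
    """Return every existing '<dir>/<file_name>' for the dirs in a colon-separated path."""
    hits = []
    for directory in search_path.split(":"):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = os.path.join(directory, file_name)
        if os.path.exists(candidate):
            hits.append(candidate)
    return hits


def suspicious_entries(search_path):
    """Return entries containing a brace, which usually means a mistyped export line."""
    return [d for d in search_path.split(":") if "{" in d or "}" in d]


current = os.environ.get("LD_LIBRARY_PATH", "")
print(find_library("libcudart.so.10.0", current))
print(suspicious_entries(os.environ.get("PATH", "")))
```

An empty list from `find_library` for the exact file name in the warning means no directory on that path provides it, no matter how correct the exports look.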
Cause 3: the case in this post
Here is how I checked, in a virtual environment that has PyTorch installed:
conda activate <env-with-pytorch>
python
import torch
torch.cuda.is_available()
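It returns True, so none of the earlier causes are to blame. This is purely TensorFlow's own damn problem.
An unsupported version is no big deal: just install one that is supported. conda will even pick it for you: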
conda install cudatoolkit
conda install cudnn
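What gets loaded is then the matching version, and the dependency files it needs come along with it. If conda picks a version other than the one named in the warnings, pinning it explicitly (for example conda install cudatoolkit=10.0) is worth a try.
References
Of all places, it was in … can you believe it.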
https://blog.csdn.net/u012388993/article/details/102573117
https://www.cnblogs.com/sddai/p/11135941.html
https://blog.csdn.net/roxxo/article/details/105138007
https://github.com/tensorflow/tensorflow/issues/26182