Installing tensorflow-gpu on CUDA 9.1

ltshan139

Article link: https://blog.csdn.net/avideointerfaces/article/details/108230396

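Preface

Before walking through the installation of the GPU build of TensorFlow, let me vent a little: converting a PyTorch DeepLabv3+ model to ONNX was a real ordeal. First I was told that PyTorch 1.1.0 had problems; then, after finally switching to PyTorch 1.0.1 or 1.2.0, I found that ONNX did not support some of the operators used in the model. After several days of going back and forth, I gave up and decided to try training the DeepLabv3+ model with TensorFlow instead. PS: if anyone has experience converting PyTorch DeepLabv3+ to ONNX, please teach me :)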

Installation

My platform is Ubuntu 18.04 + CUDA 9.1.

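Running the command pip3 install /work/xxx/tensorflow_gpu-1.8.0-cp36-cp36m-manylinux1_x86_64.whl installs the GPU build of TensorFlow without any trouble, but import tensorflow then fails because libcublas.so.9.0/libcudart.so.9.0 cannot be found. The reason is that the tensorflow_gpu 1.8.0 wheel only works with CUDA 9.0: at run time it looks for the CUDA 9.0 libraries.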

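My experiments showed that tensorflow_gpu 1.6 through 1.8 all run only on CUDA 9.0. Many posts online suggest uninstalling CUDA 9.x and reinstalling CUDA 9.0, but that is quite a hassle and also involves things like downgrading gcc.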
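Before installing a wheel, it can help to check which CUDA runtime libraries the dynamic loader can actually find. The following is a minimal sketch, not part of the original procedure: the helper name probe_libraries and the list CUDA_LIBS are made up for illustration, and the sonames are simply the ones from the error message above.

```python
import ctypes

# Sonames taken from the import error above; change the suffix
# (9.0, 9.1, 10.0, ...) to whatever your TensorFlow wheel asks for.
CUDA_LIBS = ["libcudart.so.9.0", "libcublas.so.9.0"]


def probe_libraries(names, loader=ctypes.CDLL):
    """Return {soname: bool}, True when the loader can open the library."""
    result = {}
    for name in names:
        try:
            loader(name)  # ctypes.CDLL raises OSError if dlopen fails
            result[name] = True
        except OSError:
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in probe_libraries(CUDA_LIBS).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

If a library is reported as missing even though CUDA is installed, the installed version probably ships a different soname (for example libcudart.so.9.1), which is exactly the mismatch described above.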

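I found that tensorflow_gpu-1.14.0-cp36-cp36m-manylinux1_x86_64.whl installs and imports on CUDA 9.1 (although, as the follow-up further down shows, it cannot actually use the GPU there). The wheel has to be downloaded first from https://files.pythonhosted.org/packages/76/04/43153bfdfcf6c9a4c38ecdb971ca9a75b9a791bb69a764d652c359aca504/tensorflow_gpu-1.14.0-cp36-cp36m-manylinux1_x86_64.whl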

After installation, import tensorflow succeeds, but it prints warnings like the ones below.

  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/bc311/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/bc311/.local/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])

Checking the NumPy version on my platform shows 1.19.1.

bcxxx@bcxxx-ai1:~$ pip3 show numpy
Name: numpy
Version: 1.19.1
Summary: NumPy is the fundamental package for array computing with Python.

The NumPy version is probably too new, so I reinstalled it at version 1.14.0. After that, tensorflow loads cleanly, without the warnings.

 pip3 install numpy==1.14.0
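The messages above are FutureWarning notices rather than errors, so if downgrading NumPy is not an option, one alternative (not what was done in this post) is to hide them during the import. The sketch below uses only the standard warnings and importlib modules; the helper name quiet_import is hypothetical.

```python
import importlib
import warnings


def quiet_import(module_name):
    """Import a module while ignoring FutureWarning raised during the import."""
    with warnings.catch_warnings():
        # The filter is active only inside this block and is restored on exit.
        warnings.simplefilter("ignore", category=FutureWarning)
        return importlib.import_module(module_name)


if __name__ == "__main__":
    try:
        tf = quiet_import("tensorflow")
        print(tf.__version__)
    except ImportError:
        print("tensorflow is not installed in this environment")
```

This only silences the messages; it does not change which NumPy version is installed.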

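Follow-up: although import tensorflow now works, it turns out that the tensorflow-gpu 1.14 build actually loads the CUDA 10.0 libraries. To check whether TensorFlow can really use the GPU device, run the following command: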

tensorflow.test.gpu_device_name()

On my platform the GPU device fails to load, which means training can only run on the CPU.

tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7

W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
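The log above already states which CUDA version this wheel expects. As a rough sketch, the missing library names and their version suffix can be pulled out of such a log with a regular expression; the function names here are invented for illustration, and the pattern assumes the exact "Could not dlopen library '...'" wording shown above.

```python
import re

# Matches the "Could not dlopen library '<name>'" lines from dso_loader.cc.
_MISSING = re.compile(r"Could not dlopen library '([^']+)'")


def missing_libraries(log_text):
    """Return the library names TensorFlow failed to load, in log order."""
    return _MISSING.findall(log_text)


def required_cuda_versions(lib_names):
    """Extract version suffixes such as '10.0' from names like libcudart.so.10.0."""
    versions = set()
    for name in lib_names:
        match = re.search(r"\.so\.(\d+(?:\.\d+)*)$", name)
        if match:
            versions.add(match.group(1))
    return sorted(versions)
```

Applied to the log above, every missing library carries the suffix 10.0, which is why this build cannot register the GPU on a CUDA 9.1 installation.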

Second follow-up: installing TensorFlow 1.7.0 does seem to work with CUDA 9.1. The install command is:

pip3 install ./tensorflow-1.7.0-cp36-cp36m-linux_x86_64.whl

The CUDA test commands and their output are:

>>> import tensorflow
>>> tensorflow.test.gpu_device_name()
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:65:00.0
totalMemory: 10.92GiB freeMemory: 10.65GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/device:GPU:0 with 10310 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:65:00.0, compute capability: 6.1)
'/device:GPU:0'
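The same check can be wrapped in a small script. This is a sketch only: describe_gpu is a hypothetical helper, and it relies on tf.test.gpu_device_name() returning an empty string when no GPU device is registered.

```python
def describe_gpu(device_name):
    """Turn the string returned by tf.test.gpu_device_name() into a short verdict."""
    if device_name:
        return "GPU available: " + device_name
    return "No GPU device registered; TensorFlow will fall back to the CPU"


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("tensorflow is not installed in this environment")
    else:
        print(describe_gpu(tf.test.gpu_device_name()))
```

With the 1.7.0 build above the script would report '/device:GPU:0', while with the 1.14 build on CUDA 9.1 it would report the CPU fallback.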
