The problem of running tensorflow-gpu on a CPU-only host

A tensorflow-gpu image is of course meant to run on a GPU host, but what happens if the container gets scheduled onto a host that has no GPU?

# Import tensorflow
# python -c "import tensorflow"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

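A tensorflow-gpu image would normally be expected to need a GPU, but what if a user wants to run it on a CPU? The requirement is not very reasonable: if you chose tensorflow-gpu you should be running on a GPU, otherwise why run it on a CPU at all?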

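Under the current scheduling logic, tasks like this get scheduled onto CPU-only machines. Those machines neither have the CUDA libraries installed nor use nvidia-docker, so when import tensorflow runs, a GPU image like this inevitably fails to find the CUDA library (libcuda.so.1 in the traceback above) and raises an error.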
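As a rough illustration of the failure described above, the following sketch asks the dynamic loader for libcuda.so.1 before TensorFlow is imported, so the missing library can be reported in one line instead of a long traceback. The helper name cuda_driver_available is made up for this example; it uses only ctypes from the Python standard library, and it checks libcuda.so.1 only, so other CUDA libraries could still be missing.

```python
import ctypes


def cuda_driver_available(libname="libcuda.so.1"):
    """Return True if the dynamic loader can find and load `libname`.

    Hypothetical helper for illustration; it is not part of TensorFlow.
    """
    try:
        ctypes.CDLL(libname)
    except OSError:
        # The loader raises OSError when the shared object cannot be opened.
        return False
    return True


if __name__ == "__main__":
    if cuda_driver_available():
        print("libcuda.so.1 can be loaded on this host")
    else:
        print("libcuda.so.1 not found; a tensorflow-gpu build will likely fail to import here")
```

Running this inside the container on a CPU-only host should take the second branch, which matches the ImportError shown earlier.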

# Run this command
# LD_DEBUG=libs python -c "import tensorflow"
ib/x86_64:/usr/lib		(system search path)
       475:	  trying file=/lib/x86_64-linux-gnu/tls/x86_64/libcuda.so.1
       475:	  trying file=/lib/x86_64-linux-gnu/tls/libcuda.so.1
       475:	  trying file=/lib/x86_64-linux-gnu/x86_64/libcuda.so.1
       475:	  trying file=/lib/x86_64-linux-gnu/libcuda.so.1
       475:	  trying file=/usr/lib/x86_64-linux-gnu/tls/x86_64/libcuda.so.1
       475:	  trying file=/usr/lib/x86_64-linux-gnu/tls/libcuda.so.1
       475:	  trying file=/usr/lib/x86_64-linux-gnu/x86_64/libcuda.so.1
       475:	  trying file=/usr/lib/x86_64-linux-gnu/libcuda.so.1
       475:	  trying file=/lib/tls/x86_64/libcuda.so.1
       475:	  trying file=/lib/tls/libcuda.so.1
       475:	  trying file=/lib/x86_64/libcuda.so.1
       475:	  trying file=/lib/libcuda.so.1
       475:	  trying file=/usr/lib/tls/x86_64/libcuda.so.1
       475:	  trying file=/usr/lib/tls/libcuda.so.1
       475:	  trying file=/usr/lib/x86_64/libcuda.so.1
       475:	  trying file=/usr/lib/libcuda.so.1
       475:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 22, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
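The LD_DEBUG=libs output above is long, so a small script can help pull out just the paths the loader tried for one library. This is a sketch that assumes the "trying file=" line format shown in the output above; the function name searched_paths and the sample text are illustrative only.

```python
import re

# Matches loader debug lines such as "475:  trying file=/usr/lib/libcuda.so.1".
TRYING = re.compile(r"trying file=(\S+)")


def searched_paths(ld_debug_output, libname="libcuda.so.1"):
    """Return every path the loader tried for `libname`, in order."""
    paths = []
    for line in ld_debug_output.splitlines():
        match = TRYING.search(line)
        if match and match.group(1).endswith("/" + libname):
            paths.append(match.group(1))
    return paths


# Two lines copied from the output above, used as sample input.
sample = (
    "       475:   trying file=/usr/lib/x86_64-linux-gnu/libcuda.so.1\n"
    "       475:   trying file=/usr/lib/libcuda.so.1\n"
)

if __name__ == "__main__":
    for path in searched_paths(sample):
        print(path)
```

If none of the listed paths exists in the container, the library was never installed or mounted, which is the situation on the CPU-only hosts described here.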

A more reasonable approach is probably to prevent users from choosing a GPU tensorflow image for a task that is going to run on a CPU machine.
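One way to picture such a guard is a check at submission time that rejects the combination of a GPU image and a CPU-only node. The sketch below is purely hypothetical: the function name placement_allowed, the rule of looking for "gpu" in the image name, and the example image tags are all assumptions made for illustration, not the actual scheduler's logic.

```python
def placement_allowed(image, node_has_gpu):
    """Return False when a GPU image would land on a node without a GPU.

    Assumption: GPU images can be recognised by "gpu" appearing in the
    image name or tag. A real scheduler would need a more reliable signal,
    such as an explicit GPU resource request.
    """
    needs_gpu = "gpu" in image.lower()
    return node_has_gpu or not needs_gpu


if __name__ == "__main__":
    # Example image names, chosen only for illustration.
    print(placement_allowed("tensorflow/tensorflow:latest-gpu", node_has_gpu=False))
    print(placement_allowed("tensorflow/tensorflow:latest", node_has_gpu=False))
```

Rejecting the task up front gives the user a clear message, instead of letting the container start and fail on import tensorflow.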
