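Installing TensorFlow 0.12 from Source on Ubuntu 14.04
Last night I got itchy fingers and used pip to upgrade TensorFlow to 0.12, and then ran into a problem: the latest pip package of TensorFlow supports only CUDA 8.0 and cuDNN v5 by default. My machine has CUDA 7.5 and cuDNN v4, so I had to build from source. The steps are recorded below in the hope that they help.
I. Preparation
1. Download the TensorFlow source: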
$ git clone https://github.com/tensorflow/tensorflow
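2. Prepare the Linux build environment: install Bazel, the other dependencies, CUDA, cuDNN, and so on. See the official TensorFlow website for the detailed steps; I won't repeat them here.
II. Configure the installation
Taking my machine as an example: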
$ cd tensorflow
$ ./configure
Do you wish to build TensorFlow with OpenCL support? [y/N] y
OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] y
CUDA support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 7.5
Please specify the location where CUDA 7.5 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-7.5
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 4
Please specify the location where cuDNN 4 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-7.5]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 5.2
Please specify which C++ compiler should be used as the host C++ compiler. [Default is ]: /usr/bin/g++
Please specify which C compiler should be used as the host C compiler. [Default is ]: /usr/bin/g++
Please specify the location where ComputeCpp for SYCL 1.2 is installed. [Default is /usr/local/computecpp]:
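The compute capability of your GPU can be looked up at https://developer.nvidia.com/cuda-gpus (the address shown in the prompt above); my GTX 980 Ti is 5.2.
If you answer y to OpenCL support during configuration, the last prompt may report that computecpp cannot be found. In that case you need to install ComputeCpp from its official website.
III. Pitfalls
Once the options above are set, the configure script automatically downloads some files. The download ran into the firewall and printed the following message: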
Timeout connecting to https://cdnjs.cloudflare.com/ajax/libs/numeric/1.2.6/numeric.min.js
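How to get around this? Option 1: use a proxy or VPN. Option 2: modify the TensorFlow configuration file. Only option 2 is covered here.
Option 2: modify the TensorFlow configuration file
The site above is blocked, so the only choice is to find an alternative site. First locate the configuration file in the source directory: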
$ grep numeric.min.js *
The setting turns out to be in the WORKSPACE file. Open WORKSPACE in vim and change the url value to the address below:
http_file(
name = "numericjs_numeric_min_js",
url = "http://www.numericjs.com/lib/numeric-1.2.6.min.js",
)
Run ./configure again; it should work now.
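The manual edit to WORKSPACE can also be scripted. The following is a minimal sketch, not part of the original steps: the function names patch_workspace_text and patch_workspace_file are made up for illustration, and it assumes the blocked address appears in WORKSPACE as a quoted url value ending in numeric.min.js.

```python
import re

# Replacement address taken from the WORKSPACE snippet above.
NEW_URL = "http://www.numericjs.com/lib/numeric-1.2.6.min.js"

# Matches: url = "<anything>numeric.min.js"
_URL_PATTERN = re.compile(r'(url\s*=\s*")[^"]*numeric\.min\.js(")')


def patch_workspace_text(text, new_url=NEW_URL):
    """Return (patched_text, number_of_replacements)."""
    return _URL_PATTERN.subn(
        lambda m: m.group(1) + new_url + m.group(2), text
    )


def patch_workspace_file(path="WORKSPACE"):
    """Rewrite the file in place; return how many url lines were changed."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    patched, count = patch_workspace_text(text)
    if count:
        with open(path, "w", encoding="utf-8") as f:
            f.write(patched)
    return count
```

Calling patch_workspace_file() from the tensorflow source directory would apply the change. Running it a second time changes nothing, because the new address no longer matches the pattern.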
IV. numpy and six version problems
With the configuration done, the installation itself went through without problems, but importing tensorflow failed:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
RuntimeError: module compiled against API version 0xa but this version of numpy is 0x9
---------------------------------------------------------------------------
SystemError Traceback (most recent call last)
<ipython-input-1-a649b509054f> in <module>()
----> 1 import tensorflow
/usr/local/lib/python3.4/dist-packages/tensorflow/__init__.py in <module>()
22
23 # pylint: disable=wildcard-import
---> 24 from tensorflow.python import *
25 # pylint: enable=wildcard-import
26
/usr/local/lib/python3.4/dist-packages/tensorflow/python/__init__.py in <module>()
59 _default_dlopen_flags = sys.getdlopenflags()
60 sys.setdlopenflags(_default_dlopen_flags | ctypes.RTLD_GLOBAL)
---> 61 from tensorflow.python import pywrap_tensorflow
62 sys.setdlopenflags(_default_dlopen_flags)
63 else:
/usr/local/lib/python3.4/dist-packages/tensorflow/python/pywrap_tensorflow.py in <module>()
26 fp.close()
27 return _mod
---> 28 _pywrap_tensorflow = swig_import_helper()
29 del swig_import_helper
30 else:
/usr/local/lib/python3.4/dist-packages/tensorflow/python/pywrap_tensorflow.py in swig_import_helper()
22 if fp is not None:
23 try:
---> 24 _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
25 finally:
26 fp.close()
/usr/lib/python3.4/imp.py in load_module(name, file, filename, details)
241 return load_dynamic(name, filename, opened_file)
242 else:
--> 243 return load_dynamic(name, filename, file)
244 elif type_ == PKG_DIRECTORY:
245 return load_package(name, filename)
SystemError: initialization of _pywrap_tensorflow raised unreported exception
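I guessed it was a numpy version problem, so I uninstalled numpy and installed it again (I tried a plain upgrade first, but it made no difference):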
$ sudo pip3 uninstall numpy
$ sudo pip3 install numpy
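Importing tf again failed with a different error, shown below. After some searching it turned out to be caused by the six package, so I uninstalled and reinstalled six as well, and that fixed it.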
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-a649b509054f> in <module>()
----> 1 import tensorflow
/usr/local/lib/python3.4/dist-packages/tensorflow/__init__.py in <module>()
22
23 # pylint: disable=wildcard-import
---> 24 from tensorflow.python import *
25 # pylint: enable=wildcard-import
26
/usr/local/lib/python3.4/dist-packages/tensorflow/python/__init__.py in <module>()
122 from tensorflow.python.platform import resource_loader
123 from tensorflow.python.platform import sysconfig
--> 124 from tensorflow.python.platform import test
125
126 from tensorflow.python.util.all_util import remove_undocumented
/usr/local/lib/python3.4/dist-packages/tensorflow/python/platform/test.py in <module>()
67 # pylint: disable=g-bad-import-order
68 from tensorflow.python.client import device_lib as _device_lib
---> 69 from tensorflow.python.framework import test_util as _test_util
70 from tensorflow.python.platform import googletest as _googletest
71 from tensorflow.python.util.all_util import remove_undocumented
/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/test_util.py in <module>()
41 from tensorflow.python.framework import random_seed
42 from tensorflow.python.framework import versions
---> 43 from tensorflow.python.platform import googletest
44 from tensorflow.python.platform import tf_logging as logging
45 from tensorflow.python.util import compat
/usr/local/lib/python3.4/dist-packages/tensorflow/python/platform/googletest.py in <module>()
31
32 from tensorflow.python.platform import app
---> 33 from tensorflow.python.platform import benchmark # pylint: disable=unused-import
34
35 Benchmark = benchmark.TensorFlowBenchmark # pylint: disable=invalid-name
/usr/local/lib/python3.4/dist-packages/tensorflow/python/platform/benchmark.py in <module>()
115
116
--> 117 class Benchmark(six.with_metaclass(_BenchmarkRegistrar, object)):
118 """Abstract class that provides helper functions for running benchmarks.
119
/usr/lib/python3/dist-packages/six.py in with_metaclass(meta, *bases)
615 def with_metaclass(meta, *bases):
616 """Create a base class with a metaclass."""
--> 617 return meta("NewBase", bases, {})
618
619 def add_metaclass(metaclass):
/usr/local/lib/python3.4/dist-packages/tensorflow/python/platform/benchmark.py in __new__(mcs, clsname, base, attrs)
110 newclass = super(mcs, _BenchmarkRegistrar).__new__(
111 mcs, clsname, base, attrs)
--> 112 if not newclass.is_abstract():
113 GLOBAL_BENCHMARK_REGISTRY.add(newclass)
114 return newclass
AttributeError: type object 'NewBase' has no attribute 'is_abstract'
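Both tracebacks concern packages that Python loads from a particular location; the second one, for example, shows six being imported from /usr/lib/python3/dist-packages/six.py. A quick way to see which copy and version Python actually picks up is sketched below. This is an illustrative helper, not something from the original steps, and the name module_info is made up.

```python
import importlib


def module_info(name):
    """Return the version and file path of an importable module, or None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None  # the module is not installed for this interpreter
    return {
        "version": getattr(module, "__version__", "unknown"),
        "path": getattr(module, "__file__", "built-in"),
    }


# Print what this interpreter sees for the two packages discussed above.
for name in ("numpy", "six"):
    print(name, module_info(name))
```

Run it with the same interpreter that imports tensorflow (python3 here). If the path printed for a package is not the copy you just reinstalled with pip3, an older copy elsewhere is probably shadowing it.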
After that, importing tensorflow works without errors and programs run normally. If you still run into problems, you can refer here.
In [1]: import tensorflow
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.4 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.7.5 locally
V. Summary
After a whole morning of tinkering, I finally got TensorFlow upgraded to 0.12. I hope this helps anyone who runs into the same problems.