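Environment: Ubuntu 16.04 + CUDA 9.2 + cuDNN 7.4
I first installed TensorFlow with
pip install tensorflow-gpu==1.10
but import tensorflow always failed with the following error:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
The prebuilt package is looking for libcublas.so.9.0 (it was built against CUDA 9.0), while the corresponding path only contains libcublas.so.9.2. I could not find a workaround, so the only option was to build TensorFlow from source. The steps are as follows.
I won't cover installing CUDA and cuDNN here.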
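Before rebuilding, it can help to confirm which cuBLAS versions are actually on disk. The following is a minimal sketch, not part of the original procedure; the function name cublas_versions and the directory /usr/local/cuda-9.2/lib64 are assumptions, so adjust the path to your own CUDA install.

```python
import glob
import os
import re


def cublas_versions(lib_dir):
    """Return the sorted major.minor versions of libcublas.so.* files in lib_dir."""
    versions = set()
    for path in glob.glob(os.path.join(lib_dir, "libcublas.so.*")):
        match = re.search(r"libcublas\.so\.(\d+\.\d+)", os.path.basename(path))
        if match:
            versions.add(match.group(1))
    return sorted(versions)


# Assumed install location; an empty list means no versioned libcublas was found there.
print(cublas_versions("/usr/local/cuda-9.2/lib64"))
```

If the list does not contain 9.0, the prebuilt tensorflow-gpu 1.10 wheel cannot load cuBLAS from that directory, which matches the ImportError above.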
1. Install bazel
sudo apt-get install openjdk-8-jdk
wget https://github.com/bazelbuild/bazel/releases/download/0.15.0/bazel_0.15.0-linux-x86_64.deb
sudo dpkg -i bazel_0.15.0-linux-x86_64.deb
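After installing the package, it is worth checking that the bazel on your PATH really is 0.15.0. This is a small sketch under the assumption that `bazel version` prints a line starting with "Build label:"; the helper names parse_bazel_version and installed_bazel_version are made up for illustration.

```python
import re
import subprocess


def parse_bazel_version(output):
    """Extract (major, minor, patch) from the text printed by `bazel version`."""
    match = re.search(r"Build label:\s*(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_bazel_version():
    """Run `bazel version` and parse it; returns None if bazel is not on PATH."""
    try:
        result = subprocess.run(
            ["bazel", "version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except FileNotFoundError:
        return None
    return parse_bazel_version(result.stdout)
```

Calling installed_bazel_version() and comparing the result with (0, 15, 0) gives a quick sanity check before starting the long build.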
2. Download and build tensorflow
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git pull
git checkout r1.10
Next, configure the build. In the tensorflow directory, run ./configure and answer the prompts:
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: Y
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: Y
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: Y
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: Y
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: Y
Do you wish to build TensorFlow with XLA JIT support? [y/N]: N
Do you wish to build TensorFlow with GDR support? [y/N]: N
Do you wish to build TensorFlow with VERBS support? [y/N]: N
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.2
Please specify the location where CUDA 9.2 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-9.2
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.4
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-9.2]: /usr/local/cuda
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
Please specify the NCCL version you want to use. [Leave empty to default to NCCL 2.2]: 1.3 # Note: if you choose anything other than 1.3 here, you have to install NCCL yourself
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.5]
Do you want to use clang as CUDA compiler? [y/N]: N
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc
Do you wish to build TensorFlow with MPI support? [y/N]: N
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -march=native
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
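The CUDA-related answers above can also be supplied up front through environment variables, so that ./configure does not ask for them. The sketch below is an assumption on my part rather than what I actually ran: the variable names (TF_NEED_CUDA, TF_CUDA_VERSION, CUDA_TOOLKIT_PATH, TF_CUDNN_VERSION, CUDNN_INSTALL_PATH, TF_NCCL_VERSION) are the ones I believe the r1.10 configure script reads, and cuda_configure_env is a hypothetical helper. Verify the names against configure.py in your checkout before relying on them.

```python
import os


def cuda_configure_env(base_env):
    """Return a copy of base_env with the CUDA answers from the transcript preset."""
    env = dict(base_env)
    env.update({
        # Variable names are assumptions; check configure.py in your checkout.
        "TF_NEED_CUDA": "1",
        "TF_CUDA_VERSION": "9.2",
        "CUDA_TOOLKIT_PATH": "/usr/local/cuda-9.2",
        "TF_CUDNN_VERSION": "7.4",
        "CUDNN_INSTALL_PATH": "/usr/local/cuda",
        "TF_NCCL_VERSION": "1.3",
    })
    return env


# Usage (run from the tensorflow directory), left commented out on purpose:
# import subprocess
# subprocess.run(["./configure"], env=cuda_configure_env(os.environ))
```

Any prompt that is not covered by a preset variable should still be asked interactively.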
Then start the build, which took about half an hour for me:
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
Package the result as a pip-installable .whl file:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Then install it:
pip3 install /tmp/tensorflow_pkg/*.whl
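Finally, test whether the installation succeeded by running import tensorflow.
If it fails with ImportError: cannot import name 'build_info', the fix is to leave the tensorflow source directory. Running it from the root directory works without the error.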
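My understanding of the build_info error is that, when Python is started inside the source checkout, import tensorflow picks up the local tensorflow/ source folder instead of the installed package. The sketch below checks for that situation; the helper name shadows_installed_tensorflow and the two marker files it looks for (WORKSPACE and tensorflow/__init__.py) are my own assumptions about what identifies a checkout.

```python
import os


def shadows_installed_tensorflow(directory):
    """True if directory looks like a TensorFlow source checkout.

    Starting Python from such a directory can make `import tensorflow`
    resolve to the source tree rather than the installed wheel.
    """
    has_package_dir = os.path.isfile(
        os.path.join(directory, "tensorflow", "__init__.py"))
    has_workspace = os.path.isfile(os.path.join(directory, "WORKSPACE"))
    return has_package_dir and has_workspace


if shadows_installed_tensorflow(os.getcwd()):
    print("You are inside the source tree; cd elsewhere before importing tensorflow.")
```

Running this from the directory where you plan to start Python tells you whether you need to change directories first.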