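Installing CUDA 7.5 and TensorFlow on Ubuntu 14 (enabling CUDA compute capability 3.0)
This article describes how to install TensorFlow (version 0.7.1) with support for NVIDIA GPUs of compute capability 3.0 on Ubuntu 14.04 (64-bit). If your GPU's compute capability is in the range 3.5 to 5.2, simply follow the official tutorial.
I have uploaded two packages to Baidu Cloud: the official 0.7.1 whl package (password: dfmg), and a package I built myself that supports GPU compute capability 3.0 (password: h4y9).
1. Set up the dependencies
This article uses Python 2 as the example.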
# Ubuntu/Linux 64-bit
$ sudo apt-get install build-essential python-pip python-dev
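2. Install CUDA 7.5 (download CUDA first)
This article installs from the .run file. Installing CUDA this way can leave Ubuntu unable to reach the desktop. My senior labmate Wang Yunfei found that skipping OpenGL during the CUDA installation avoids the problem, and so far skipping OpenGL has not caused any issues. The CUDA package bundles the NVIDIA display driver, so there is no need to install the driver separately.
- Disable nouveau (see the official guide)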
$ sudo gedit /etc/modprobe.d/blacklist-nouveau.conf
# Enter the following two lines in the editor, then save and close
blacklist nouveau
options nouveau modeset=0
# Regenerate the kernel initramfs
$ sudo update-initramfs -u
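The blacklist only applies the next time the kernel boots, so it is worth checking after a reboot that nouveau is really gone. Below is a minimal sketch of such a check. The helper name `module_loaded` and its optional second argument are my own illustration, not part of any tool; it simply reads the module list that the kernel exposes in /proc/modules.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named kernel module appears in the
# module list. The list defaults to /proc/modules; the second argument
# exists only so the function can be tried against a saved copy.
module_loaded() {
    name=$1
    list=${2:-/proc/modules}
    # Each line of /proc/modules starts with the module name and a space.
    grep -q "^${name} " "$list" 2>/dev/null
}

if module_loaded nouveau; then
    echo "nouveau is still loaded; reboot and check the blacklist file"
else
    echo "nouveau is not loaded"
fi
```

If the first message is printed after a reboot, the blacklist file is probably in a directory that modprobe does not read, or the initramfs was not regenerated.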
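- Start the installation
Note: when the installer asks whether to install OpenGL, choose no. This avoids the problem of lightdm failing to start after the installation.
Press Ctrl+Alt+F1 to switch to tty1.
# Stop lightdm
$ sudo service lightdm stop
# Enter the directory where CUDA was downloaded (Downloads in this article)
$ cd Downloads/
# Update the package lists
$ sudo apt-get update
# Install (Space or Page Down scrolls through the license agreement)
# When asked about installing OpenGL, choose no; accept the defaults for everything else
$ sudo sh cuda_7.5.18_linux.run
# Once the installation finishes, start lightdm
$ sudo service lightdm start
# After it starts, press Ctrl+Alt+F7 to return to tty7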
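Before moving on, it can help to confirm that the driver bundled with the .run file is actually working. The sketch below is one way to do that. The helper `have_tool` is a name I made up for illustration; `nvidia-smi -L` is the driver utility's option for listing the GPUs it can see.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool nvidia-smi; then
    # Lists the GPUs the driver can see; fails if the driver is not loaded.
    nvidia-smi -L || echo "nvidia-smi is installed but cannot reach the driver"
else
    echo "nvidia-smi not found; the driver from the .run file may not be installed"
fi
```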
3. Set the environment variables
This follows section 6, Post-installation Actions, of the official guide.
The PATH variable needs to include /usr/local/cuda-7.5/bin.
The LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-7.5/lib64 on a 64-bit system, and /usr/local/cuda-7.5/lib on a 32-bit system.
# To change the environment variables for 64-bit operating systems:
$ export PATH=/usr/local/cuda-7.5/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
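The two export commands above only affect the current shell session. To make them permanent, the same lines can be added to a shell startup file. The sketch below assumes bash is the login shell and that it reads ~/.bashrc; the helper `append_once` is a hypothetical name of mine, written so that running the script twice does not duplicate the entries.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file unless that exact line is
# already present, so rerunning the script does not duplicate entries.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x matches the whole line, -F treats the text literally.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Assumption: bash is the login shell and reads ~/.bashrc.
rc="$HOME/.bashrc"
append_once 'export PATH=/usr/local/cuda-7.5/bin:$PATH' "$rc"
append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH' "$rc"
```

Open a new terminal afterwards (or run `source ~/.bashrc`) and `nvcc --version` should report release 7.5.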
4. Install cuDNN
Download cuDNN. This article uses cuDNN v4.
# Enter the directory where cuDNN was downloaded (Downloads in this article)
$ cd Downloads/
$ tar xvzf cudnn-7.0-linux-x64-v4.0-prod.tgz
$ cd cuda
$ sudo cp include/cudnn.h /usr/local/cuda/include/
$ sudo cp lib64/libcudnn* /usr/local/cuda/lib64/
$ sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
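A quick way to confirm that the right cuDNN header was copied is to read the version macros out of cudnn.h. The sketch below assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL on `#define` lines; the helper names `cudnn_define` and `cudnn_version` are my own and purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one "#define NAME value" line.
# $1 = macro name, $2 = path to the header
cudnn_define() {
    awk -v key="$1" '$1 == "#define" && $2 == key { print $3; exit }' "$2"
}

# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL for a cudnn.h.
cudnn_version() {
    header=${1:-/usr/local/cuda/include/cudnn.h}
    major=$(cudnn_define CUDNN_MAJOR "$header")
    minor=$(cudnn_define CUDNN_MINOR "$header")
    patch=$(cudnn_define CUDNN_PATCHLEVEL "$header")
    echo "$major.$minor.$patch"
}

if [ -r /usr/local/cuda/include/cudnn.h ]; then
    cudnn_version
else
    echo "cudnn.h not found under /usr/local/cuda/include"
fi
```

For the archive used in this article the major version should come out as 4, which is the value to enter at the cuDNN prompt of ./configure.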
5. Build TensorFlow with support for compute capability 3.0
- Install Bazel (download the installer first)
$ cd Downloads
$ chmod +x bazel-0.2.1-installer-linux-x86_64.sh
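$ ./bazel-0.2.1-installer-linux-x86_64.sh --user
# --user installs Bazel under $HOME/bin; add it to PATH if it is not there already
$ export PATH="$PATH:$HOME/bin"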
- Download TensorFlow from GitHub
$ git clone --recurse-submodules https://github.com/tensorflow/tensorflow --depth=1
--recurse-submodules is required to fetch the protobuf library that TensorFlow depends on. Note that these instructions will install the latest master branch of tensorflow. If you want to install a specific branch (such as a release branch), pass -b <branchname> to the git clone command.
If you run into problems here, try the following command:
$ git submodule update --init --recursive
- configure
$ cd tensorflow
$ TF_UNOFFICIAL_SETTING=1 ./configure
WARNING: You are configuring unofficial settings in TensorFlow. Because some
external libraries are not backward compatible, these settings are largely
untested and unsupported.
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave
empty to use system default]: 7.5
Please specify the location where CUDA 7.5 toolkit is installed. Refer to
README.md for more details. [default is: /usr/local/cuda]: /usr/local/cuda
Please specify the Cudnn version you want to use. [Leave empty to use system
default]: 4
Please specify the location where the cuDNN 4 library is installed. Refer to
README.md for more details. [default is: /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to
build with. You can find the compute capability of your device at:
https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases
your build time and binary size. [Default is: "3.5,5.2"]: 3.0
Setting up Cuda include
Setting up Cuda lib64
Setting up Cuda bin
Setting up Cuda nvvm
Configuration finished
- Build TensorFlow
$ bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
# To build with GPU support
# To build a version without GPU support, drop --config=cuda
$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# Install. The name of the .whl file will depend on your platform.
$ pip install /tmp/tensorflow_pkg/tensorflow-0.7.1-py2-none-linux_x86_64.whl
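Since the wheel's file name depends on the platform, it may be easier to look it up than to type it. The sketch below shows one way; `latest_wheel` is a hypothetical helper of mine that prints the most recently modified tensorflow wheel in a directory, defaulting to the /tmp/tensorflow_pkg output directory used above.

```shell
#!/bin/sh
# Hypothetical helper: print the newest tensorflow wheel in a directory.
latest_wheel() {
    dir=${1:-/tmp/tensorflow_pkg}
    # ls -t sorts newest first; errors are silenced when nothing matches.
    ls -t "$dir"/tensorflow-*.whl 2>/dev/null | head -n 1
}

wheel=$(latest_wheel)
if [ -n "$wheel" ]; then
    echo "Install with: pip install $wheel"
else
    echo "No wheel found in /tmp/tensorflow_pkg; did build_pip_package run?"
fi
```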
6. Test
$ python -m tensorflow.models.image.mnist.convolutional
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so.4.0.7 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so.7.5 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so.7.5 locally
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:900] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 1.085
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.98GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
Initialized!
Step 0 (epoch 0.00), 8.1 ms
Minibatch loss: 12.054, learning rate: 0.010000
Minibatch error: 90.6%
Validation error: 84.6%
Step 100 (epoch 0.12), 67.6 ms
Minibatch loss: 3.278, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 7.0%
......
......
Step 8500 (epoch 9.89), 67.6 ms
Minibatch loss: 1.618, learning rate: 0.006302
Minibatch error: 3.1%
Validation error: 0.8%
Test error: 0.8%
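The line to look for in the output above is "Creating TensorFlow device (/gpu:0)", which shows that the GPU was picked up. If the run is saved to a file, that check can be scripted. The sketch below is only an illustration; `used_gpu` and the file name mnist.log are names I chose.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a saved log shows TensorFlow creating a
# GPU device. -F makes grep treat the pattern as a literal string.
used_gpu() {
    grep -qF 'Creating TensorFlow device (/gpu:' "$1"
}

# Usage sketch (mnist.log is an arbitrary file name; the log lines are
# written to stderr, hence 2>&1):
#   python -m tensorflow.models.image.mnist.convolutional 2>&1 | tee mnist.log
#   used_gpu mnist.log && echo "GPU in use" || echo "running on CPU only"
```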
7. Common Problems
https://www.tensorflow.org/versions/r0.7/get_started/os_setup.html#common-problems