Environment: Ubuntu 16.04 LTS + GeForce GTX 1070 + TensorFlow
1. Preparation
Hardware: CPU: Intel Core i7-7700K; GPU: GTX 1070; RAM: 16 GB; storage: 256 GB SSD + 2 TB HDD
Software: Ubuntu 16.04 + CUDA 8.0 + cuDNN 6.0 (the build for CUDA 8.0) + Anaconda2 + TensorFlow
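2. Install the NVIDIA GTX 1070 graphics driver (the driver can also be downloaded in advance and installed afterwards)
If a driver was installed on this Ubuntu system before, remove it first and then continue with the steps below. If not, go straight to the steps below.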
1) sudo add-apt-repository ppa:graphics-drivers/ppa
2) sudo apt-get update
3) sudo apt-get install nvidia-384
4) sudo apt-get install mesa-common-dev
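Problem: "E: Unable to locate package mesa-commmon-dev"
Fix: typing mesa and pressing Tab to auto-complete the package name worked for me. Note that the name in the error message above is misspelled (three m's in "commmon"), which is the most likely reason the hand-typed name was not found. If the correctly spelled name still cannot be located, the mirror sources may need to be updated.
Problem 2: E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
Fix: first make sure no other apt or dpkg process is still running, then remove the stale lock files:
sudo rm /var/cache/apt/archives/lock
sudo rm /var/lib/dpkg/lock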
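5) sudo apt-get install freeglut3-dev
Note: the driver can of course also be downloaded first and installed afterwards.
6) Check that the graphics driver was installed successfully
Command: nvidia-smi
Problem: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Fix: rebooting the system solved it (cause unknown). If running the command again prints the table with the driver version and GPU status, the driver is installed correctly.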
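The driver check above can also be scripted. The following is only a minimal sketch, assuming nvidia-smi is on PATH and exits with a non-zero status when it cannot talk to the driver; the function name driver_responds is made up for this example. It is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function
import subprocess


def driver_responds(command=("nvidia-smi",)):
    """Return (ok, text); ok is True if the command started and exited with status 0."""
    try:
        proc = subprocess.Popen(list(command), stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
    except OSError as exc:
        # Raised when the binary is not installed or not on PATH.
        return False, "could not start {0}: {1}".format(command[0], exc)
    output, _ = proc.communicate()
    return proc.returncode == 0, output.decode("utf-8", "replace")


if __name__ == "__main__":
    ok, text = driver_responds()
    print("driver OK" if ok else "driver NOT reachable (try rebooting)")
    print(text)
```

If this reports that the driver is not reachable right after installation, a reboot is the first thing to try, as described above.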
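3. Install CUDA
Download link: https://developer.nvidia.com/cuda-downloads
Official installation guide: http://docs.nvidia.com/cuda/cuda-quick-start-guide/index.html
1) sudo sh cuda_8.0.44_linux.run
Note: be sure to answer 'n' to the prompt below; otherwise the installer replaces the driver installed in the previous step.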
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
-------------------------------------------------------------
Do you accept the previously read EULA?
accept/decline/quit: accept
Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 384.81?
(y)es/(n)o/(q)uit: n
Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y
Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]:
Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y
Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y
Enter CUDA Samples Location
[ default is /home/xw ]:
Installing the CUDA Toolkit in /usr/local/cuda-8.0 ...
Missing recommended library: libXi.so
Missing recommended library: libXmu.so
Installing the CUDA Samples in /home/xw ...
Copying samples to /home/xw/NVIDIA_CUDA-8.0_Samples now...
Finished copying samples.
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-8.0
Samples: Installed in /home/xw, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-8.0/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin
Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run -silent -driver
Logfile is /tmp/cuda_install_4749.log
2) Configure the environment variables
2.1) Command: vim ~/.bashrc
Add the following lines:
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
2.2) Apply the changes to the current shell: source ~/.bashrc
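To confirm that the exports took effect in a new shell, a small check like the one below may help. It is a sketch only: the two directories are taken from the export lines above, and the names REQUIRED and missing_entries are invented for this example.

```python
from __future__ import print_function
import os

# Directories taken from the export lines in ~/.bashrc.
REQUIRED = {
    "PATH": ["/usr/local/cuda-8.0/bin"],
    "LD_LIBRARY_PATH": ["/usr/local/cuda/lib64"],
}


def missing_entries(environ, required=REQUIRED):
    """Return a list of (variable, directory) pairs that are absent from environ."""
    missing = []
    for name, directories in sorted(required.items()):
        # Drop trailing slashes so "/a/b/" and "/a/b" compare equal.
        present = [p.rstrip("/") for p in environ.get(name, "").split(":") if p]
        for directory in directories:
            if directory.rstrip("/") not in present:
                missing.append((name, directory))
    return missing


if __name__ == "__main__":
    problems = missing_entries(os.environ)
    for name, directory in problems:
        print("{0} is missing {1}".format(name, directory))
    if not problems:
        print("CUDA directories found in PATH and LD_LIBRARY_PATH")
```

An empty result means both directories are visible to the current shell; otherwise, recheck ~/.bashrc and open a new terminal.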
3) Build the samples to verify that CUDA was installed successfully
$ cd NVIDIA_CUDA-8.0_Samples/
$ make
$ cd bin/x86_64/linux/release/
$ ./deviceQuery
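deviceQuery ends its output with a "Result = PASS" line when it can reach the GPU. The sketch below, run from the same release directory, looks for that line; the helper name device_query_passed is hypothetical, and the script assumes the deviceQuery binary built above is in the current directory.

```python
from __future__ import print_function
import subprocess


def device_query_passed(output):
    """Return True if the deviceQuery output contains the final 'Result = PASS' line."""
    return any(line.strip() == "Result = PASS" for line in output.splitlines())


if __name__ == "__main__":
    try:
        raw = subprocess.check_output(["./deviceQuery"])
        text = raw.decode("utf-8", "replace")
    except (OSError, subprocess.CalledProcessError) as exc:
        # Binary missing, not executable, or it exited with an error status.
        text = ""
        print("could not run deviceQuery:", exc)
    print("CUDA OK" if device_query_passed(text) else "CUDA check FAILED")
```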
4. Install cuDNN (v6.0 for CUDA 8.0)
Install from the tar file
$ tar -zxvf cudnn-8.0-linux-x64-v6.0.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
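After copying the files, it can be worth checking which cuDNN version actually ended up in /usr/local/cuda/include. The sketch below reads the version macros from cudnn.h; it assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, and the function name parse_cudnn_version is invented for this example.

```python
from __future__ import print_function
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the contents of cudnn.h, or None if not found."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+" + macro + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    path = "/usr/local/cuda/include/cudnn.h"  # destination used in the cp command
    try:
        with open(path) as handle:
            version = parse_cudnn_version(handle.read())
    except (IOError, OSError) as exc:
        version = None
        print("could not read {0}: {1}".format(path, exc))
    print("cuDNN version:", version)
```

For the v6.0 tarball used here, the first two numbers should come out as 6 and 0.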
5. Install Anaconda2
$ bash Anaconda2-5.1.0-Linux-x86_64.sh
The installer then shows the following prompts:
Do you approve the license terms? [yes|no]
>>> yes
...
Do you wish the installer to prepend the Anaconda2 install location
to PATH in your /home/xw/.bashrc ? [yes|no]
[no] >>> yes
$ python
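Once the interpreter starts, a couple of lines are enough to confirm that it is the Anaconda build rather than the system Python. This is a rough sketch: it assumes the Anaconda build mentions "Anaconda" in sys.version or lives under an anaconda2 directory, and the helper looks_like_anaconda is made up for this example.

```python
from __future__ import print_function
import sys


def looks_like_anaconda(version_string, executable):
    """Heuristic: Anaconda builds usually name themselves in sys.version or in the path."""
    return ("anaconda" in version_string.lower()
            or "anaconda" in executable.lower())


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    print("Anaconda build:", looks_like_anaconda(sys.version, sys.executable))
```

If this prints False, open a new terminal so the PATH change that the installer wrote to .bashrc is picked up.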
6. Install TensorFlow in Anaconda2
$ conda create -n tensorflow
$ source activate tensorflow
Activate the tensorflow environment: source activate tensorflow
Deactivate the tensorflow environment: source deactivate tensorflow
Install TensorFlow:
(tensorflow) xw@xw-System-Product-Name:~$ pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.6.0-cp36-cp36m-linux_x86_64.whl
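The download above tends to time out, so I do not recommend it. Note also that this particular wheel (cp36) only fits a Python 3.6 environment.
Use the USTC mirror instead: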
(tensorflow) xw@xw-System-Product-Name:~$ pip install tensorflow-gpu -i https://pypi.mirrors.ustc.edu.cn/simple
Test whether TensorFlow was installed successfully
$ python
>>> import tensorflow
The following error appears:
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory
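This error means the installed tensorflow-gpu release does not match the CUDA toolkit on the system; in my case the tensorflow-gpu version was too new. I had installed the latest tensorflow-gpu 1.6, so I switched to 1.4, whose prebuilt binaries target CUDA 8.0 and cuDNN 6: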
(tensorflow) xw@xw-System-Product-Name:~$ pip install tensorflow-gpu==1.4 -i https://pypi.mirrors.ustc.edu.cn/simple
If that still does not work, lower the tensorflow-gpu version further.
After reinstalling, run the import test again; it should now complete without errors.
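The import test can be stretched a little so that it also reports which GPU devices TensorFlow sees and, when the import fails, which shared library it was looking for. The following is only a sketch: check_tensorflow and missing_shared_library are names invented for this example, while device_lib.list_local_devices() is the TensorFlow call used to list devices.

```python
from __future__ import print_function
import re


def missing_shared_library(message):
    """Extract e.g. ('libcublas', '8.0') from a 'cannot open shared object file' error."""
    match = re.search(
        r"(lib\w+)\.so\.([\d.]+): cannot open shared object file", message)
    return (match.group(1), match.group(2)) if match else None


def check_tensorflow():
    """Try to import TensorFlow and return a one-line status string."""
    try:
        import tensorflow as tf
        from tensorflow.python.client import device_lib
    except ImportError as exc:
        missing = missing_shared_library(str(exc))
        if missing:
            # The version suffix tells which CUDA release this build expects.
            return "import failed: {0} version {1} not found".format(*missing)
        return "import failed: {0}".format(exc)
    gpus = [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]
    return "TensorFlow {0}, GPU devices: {1}".format(tf.__version__, gpus or "none")


if __name__ == "__main__":
    print(check_tensorflow())
```

When the reported library version differs from the installed CUDA version, either the tensorflow-gpu release or the CUDA toolkit has to change so that the two match.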
Uninstall commands:
Uninstall CUDA: $ sudo /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl
Uninstall the driver:
Option 1: $ sudo /usr/bin/nvidia-uninstall
Option 2: $ sudo apt-get remove --purge 'nvidia-*' (recommended)
Uninstall TensorFlow:
Option 1: sudo rm -rf tensorflow/
Option 2: conda remove -n tensorflow --all
Option 3: pip uninstall tensorflow-gpu (recommended)
Reference: http://blog.csdn.net/swiftfake/article/details/79429539
Official installation documentation: https://www.tensorflow.org/install/install_linux