Installing CUDA 9.1 + cuDNN 7.1 + TensorFlow on Ubuntu 17.10

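A sudden power outage recently broke the GPU driver on my machine, so I had to reinstall CUDA. I am using the occasion to write the whole procedure down and to install cuDNN and TensorFlow along the way.

Platform: AMD Ryzen 1800X + Ubuntu 17.10 + Nvidia GTX 1060 6GB

Before installing, one reminder: read Nvidia's official installation guide carefully (https://docs.nvidia.com/cuda/). It is more useful than any tutorial found online.

I. Install the Nvidia GPU driver

1. Remove the Nouveau driver that ships with Ubuntu

1) Check for the nouveau driver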

lsmod | grep nouveau

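If this prints anything, the driver is loaded and has to be disabled.

2) Disable the nouveau driver

sudo emacs /etc/modprobe.d/blacklist-nouveau.conf  # create blacklist-nouveau.conf

# put the following two lines in the file
blacklist nouveau
options nouveau modeset=0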
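The same two-line file can also be written without opening an editor. The sketch below is only an illustration: `write_nouveau_blacklist` is a hypothetical helper name, not part of any Nvidia or Ubuntu tool, and it writes to whatever path it is given so that it can be tried on a scratch file first.

```shell
# Hypothetical helper: writes the two blacklist lines to the file named in $1.
write_nouveau_blacklist() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$1"
}

# Try it on a scratch file; nothing under /etc is touched here.
scratch="$(mktemp)"
write_nouveau_blacklist "$scratch"
cat "$scratch"
rm -f "$scratch"
```

Writing the real file needs root, for example by piping the same printf output into sudo tee /etc/modprobe.d/blacklist-nouveau.conf.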

Then run in a terminal

sudo update-initramfs -u

Reboot, then run lsmod | grep nouveau again to check that the driver is disabled. No output means it worked.
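A plain grep also matches lines where nouveau only appears in the "Used by" column of another module. As a slightly stricter check, the sketch below looks at the first column only; `nouveau_listed` is a hypothetical helper name introduced for this example.

```shell
# Reads lsmod-style text on stdin and succeeds (exit 0) only if a module
# named exactly "nouveau" appears in the first column.
nouveau_listed() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Only query the running system when lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_listed; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is not loaded"
    fi
fi
```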

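3) Stop the X server. The command below assumes lightdm is the display manager; a stock Ubuntu 17.10 desktop uses gdm3, in which case stop gdm3 instead.

sudo service lightdm stop

2. Find and download the driver version that matches the GPU

Look up the driver version for your GPU on the official site (http://www.nvidia.com/Download/index.aspx?lang=en-us).

I downloaded NVIDIA-Linux-x86_64-390.25.run.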

chmod +x NVIDIA-Linux-x86_64-390.25.run
sudo ./NVIDIA-Linux-x86_64-390.25.run
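# Note: on a dual-GPU machine (Intel integrated + Nvidia discrete), the install command should instead be
sudo ./NVIDIA-Linux-x86_64-390.25.run --no-opengl-files
# The point is to let only the integrated GPU drive the display while the discrete GPU produces no output, so the monitor must be connected to the integrated GPU

3. Install the CUDA Toolkit

Find the CUDA version that suits your GPU on the official site (https://developer.nvidia.com/cuda-downloads?target_os=Linux). I downloaded CUDA 9.1. Many online articles use the .deb (network) installer, but I never got that to work, so I chose the local .run file (cuda_9.1.85_387.26_linux.run) instead.

Also note that CUDA 9.1 does not support gcc 7, which is the default gcc on Ubuntu 17.10. Install an older gcc first (I installed gcc-5) and then point the gcc and g++ commands at it: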

sudo apt-get install gcc-5 g++-5
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
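Before starting the CUDA installer it can help to confirm which gcc is now the default. The following is a minimal sketch; `gcc_ok_for_cuda91` is a hypothetical helper, and the upper limit of major version 6 is my reading of the CUDA 9.1 requirements, stated here as an assumption rather than taken from the text above.

```shell
# Succeeds if the gcc version string in $1 (e.g. "5.4.0") has a major
# version of 6 or lower. The limit of 6 is an assumption about CUDA 9.1.
gcc_ok_for_cuda91() {
    major="${1%%.*}"
    [ "$major" -le 6 ] 2>/dev/null
}

# Report on the gcc that is currently first in PATH, if there is one.
if command -v gcc >/dev/null 2>&1; then
    v="$(gcc -dumpversion)"
    if gcc_ok_for_cuda91 "$v"; then
        echo "gcc $v should be accepted"
    else
        echo "gcc $v is too new; switch the alternative to gcc-5"
    fi
fi
```

If several alternatives are registered, sudo update-alternatives --config gcc lets you pick one interactively.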
Then run the .run installer:
chmod +x cuda_9.1.85_387.26_linux.run
sudo ./cuda_9.1.85_387.26_linux.run

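The installer then asks a series of questions. Note in particular that the CUDA package bundles its own Nvidia driver; since a driver was already installed in step 2, answer no when asked whether to install the driver. Once the installation is done, add the CUDA paths to the environment (put these lines in ~/.bashrc if they should survive the reboot below)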

export PATH=/usr/local/cuda-9.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
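If these lines end up in a shell startup file that is sourced more than once, the paths get duplicated, and an empty LD_LIBRARY_PATH leaves a trailing ":" behind, which the dynamic linker treats as the current directory. The sketch below shows one way to avoid both; `path_prepend` is a hypothetical helper written for this example.

```shell
# Prints $1 (a colon-separated list) with directory $2 put in front,
# unless $2 is already in the list. An empty list yields just $2, so no
# stray ":" is left behind.
path_prepend() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "$2${1:+:$1}" ;;
    esac
}

PATH="$(path_prepend "$PATH" /usr/local/cuda-9.1/bin)"
LD_LIBRARY_PATH="$(path_prepend "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64)"
export PATH LD_LIBRARY_PATH
```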

After rebooting, check that the driver is installed correctly

nvidia-smi
cat /proc/driver/nvidia/version
Check the installed CUDA version
nvcc -V
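For scripts it is handy to reduce the nvcc banner to just the release number. A small sketch, with `nvcc_release` as a hypothetical helper name; it assumes the banner contains a line of the form "Cuda compilation tools, release 9.1, V9.1.85".

```shell
# Reads `nvcc -V` output on stdin and prints the release number (e.g. 9.1).
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Only call nvcc when it is actually on PATH.
if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA release: $(nvcc -V | nvcc_release)"
fi
```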

Restart the graphical interface

sudo service lightdm start

The CUDA Toolkit is now installed.


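II. Install cuDNN

Before installing TensorFlow, the cuDNN library has to be installed.

Download cuDNN from the official site (https://developer.nvidia.com/rdp/cudnn-download).

Following this article (https://blog.csdn.net/lengconglin/article/details/77506386), download the three .deb files and install them in order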

sudo dpkg -i libcudnn7_7.1.3.16-1+cuda9.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.1.3.16-1+cuda9.1_amd64.deb
sudo dpkg -i libcudnn7-doc_7.1.3.16-1+cuda9.1_amd64.deb


Alternatively, download the matching .tgz archive, extract it, and copy the files into place:

sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

cuDNN is now installed.
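A quick way to confirm which cuDNN version ended up on disk is to read the version macros from the header. This is a sketch under the assumption that the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the cuDNN 7 header does; `cudnn_version` is a hypothetical helper name.

```shell
# Prints MAJOR.MINOR.PATCHLEVEL as defined in the cuDNN header given as $1.
cudnn_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { print major "." minor "." patch }' "$1"
}

# Path used by the .tgz route above; adjust it if your header lives elsewhere.
header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    echo "cuDNN $(cudnn_version "$header")"
fi
```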

Afterwards, test it by following the official guide (https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#axzz4qYJp45J2).

cp -r /usr/src/cudnn_samples_v7/ $HOME
cd  $HOME/cudnn_samples_v7/mnistCUDNN
make clean && make
./mnistCUDNN

The result shown is

Test Passed!

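III. Install TensorFlow (TF)

1. Preparation. Following the official installation guide (https://www.tensorflow.org/install/install_linux?hl=zh-cn#InstallingNativePip), I want the GPU-enabled build of TF. CUDA and cuDNN are already in place; what remains is the libcupti-dev library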

sudo apt-get install cuda-command-line-tools-9-1

Then add its path to the dynamic library search path

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64

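2. Install TF with native pip

Following the guide, I chose the native pip installation.

First install Python 3.4+ and pip3. I had already installed Python 3.6 myself, so only pip3 was missing.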

sudo apt-get install python3-pip python3-dev 

Next, install TF with pip3

pip3 install tensorflow-gpu

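This step is very slow.

Alternatively, install from a tfBinaryURL, which is the URL of the TF package built for your Python version:

pip3 install --upgrade tfBinaryURL  # see the official site for the TensorFlow package matching your Python version

After installing, verify the setup by following the official guide.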
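As a rough stand-in for that verification, the sketch below imports TensorFlow and asks whether it can see a GPU. It is not the official test; `tf_gpu_check` is a hypothetical helper, and it assumes a TF 1.x build where tf.test.is_gpu_available() is present.

```shell
# Runs a short import test with the interpreter named in $1 (default
# python3). Returns non-zero if TensorFlow cannot be imported.
tf_gpu_check() {
    "${1:-python3}" - <<'EOF'
import tensorflow as tf
print("TensorFlow", tf.__version__)
print("GPU available:", tf.test.is_gpu_available())
EOF
}

# Run the check only when python3 exists; a failed import is reported, not fatal.
if command -v python3 >/dev/null 2>&1; then
    tf_gpu_check || echo "TensorFlow import failed"
fi
```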


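This completes the installation of CUDA 9.1 + cuDNN + TensorFlow.


Update 2018-05-12: CUDA 9.1 appears to be incompatible with TensorFlow 1.7

Because the GPU build of TF downloads so slowly, the installation only finished this morning. When I ran the TF test, it failed with: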

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

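After some searching, the most likely cause is that CUDA 9.1 is incompatible with the TF version installed by pip3 (https://github.com/tensorflow/tensorflow/issues/15604). I will look further into how to solve this.

The problem is now solved:

The cause is that this TF version does not support CUDA 9.1. According to the official installation guide it supports CUDA only up to 9.0, so CUDA 9.0 has to be installed as well, together with the matching cuDNN 7.0. This article (https://blog.csdn.net/tunhuzhuang1836/article/details/79545625) shows how to install several versions of CUDA and cuDNN side by side.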
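The library name in the ImportError already says which CUDA release the TF binary was built against. The sketch below pulls that number out of the message and lists the libcublas versions present on disk; `cublas_version_wanted` is a hypothetical helper, and the listing assumes the default /usr/local/cuda-X.Y install prefixes.

```shell
# Reads an ImportError message on stdin and prints the version suffix of the
# missing libcublas (for the error above: 9.0).
cublas_version_wanted() {
    sed -n 's/.*libcublas\.so\.\([0-9][0-9.]*\):.*/\1/p'
}

# List the libcublas versions that are actually installed.
ls /usr/local/cuda-*/lib64/libcublas.so.* 2>/dev/null \
    || echo "no libcublas found under /usr/local/cuda-*"
```

If the wanted version is missing from the listing, the matching toolkit still has to be installed.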

Then uninstall TF first

pip3 uninstall tensorflow-gpu

Then reinstall the GPU build of TF following the steps in the article above. In my tests it works without problems.
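With two toolkits installed side by side, one common way to choose between them is to repoint the /usr/local/cuda symlink. This is a sketch of that general technique and not necessarily what the linked article does; `switch_cuda` is a hypothetical helper, and it assumes the toolkits live in directories named cuda-X.Y under a common prefix.

```shell
# Points <prefix>/cuda at <prefix>/cuda-<version>.
# $1 = install prefix (normally /usr/local), $2 = version such as 9.0.
switch_cuda() {
    if [ ! -d "$1/cuda-$2" ]; then
        echo "no such toolkit: $1/cuda-$2" >&2
        return 1
    fi
    ln -sfn "$1/cuda-$2" "$1/cuda"
}

# Dry run in a scratch directory so nothing under /usr/local is touched.
demo="$(mktemp -d)"
mkdir "$demo/cuda-9.0" "$demo/cuda-9.1"
switch_cuda "$demo" 9.1
switch_cuda "$demo" 9.0
readlink "$demo/cuda"
rm -rf "$demo"
```

On the real prefix the call needs root. The symlink only takes effect for variables that refer to /usr/local/cuda, so a PATH entry that hard-codes /usr/local/cuda-9.1/bin would have to be changed as well.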


Building TensorFlow from source:
1. Python 3.5, TensorFlow 1.12;
2. with support for CUDA 10.0, cuDNN 7.3.1, TensorRT-5.0.2.6-cuda10.0-cudnn7.3;
3. with MKL support, without MPI.

Hardware and software environment: Ubuntu 16.04, GeForce GTX 1080

Configuration:

hp@dla:~/work/ts_compile/tensorflow$ ./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.1 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3

Found possible Python library paths:
  /usr/local/lib/python3.5/dist-packages
  /usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]

Do you wish to build TensorFlow with XLA JIT support? [Y/n]:
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]:

Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.0

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.3.1

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]:

Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.

Please specify the location where TensorRT is installed. [Default is /usr/lib/x86_64-linux-gnu]:/home/hp/bin/TensorRT-5.0.2.6-cuda10.0-cudnn7.3/targets/x86_64-linux-gnu

Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1,6.1,6.1]:

Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=" to your build command. See .bazelrc for more details.
  --config=mkl              # Build with MKL support.
  --config=monolithic       # Config for mostly static monolithic build.
  --config=gdr              # Build with GDR support.
  --config=verbs            # Build with libverbs support.
  --config=ngraph           # Build with Intel nGraph support.
  --config=dynamic_kernels  # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
  --config=noaws            # Disable AWS S3 filesystem support.
  --config=nogcp            # Disable GCP support.
  --config=nohdfs           # Disable HDFS support.
  --config=noignite         # Disable Apacha Ignite support.
  --config=nokafka          # Disable Apache Kafka support.
  --config=nonccl           # Disable NVIDIA NCCL support.
Configuration finished

Build:

hp@dla:~/work/ts_compile/tensorflow$ bazel build --config=opt --config=mkl --verbose_failures //tensorflow/tools/pip_package:build_pip_package

Uninstall the existing tensorflow:

hp@dla:~/temp$ sudo pip3 uninstall tensorflow

Install the self-built package:

hp@dla:~/temp$ sudo pip3 install tensorflow-1.12.0-cp35-cp35m-linux_x86_64.whl