Installing the GPU driver, CUDA, and cuDNN on Ubuntu 16.04

aijava1

Article link: https://blog.csdn.net/aijava1/article/details/103109787

1. Install NVIDIA Graphics Driver via runfile

Reference: https://gist.github.com/wangruohui/df039f0dc434d6486f5d4d098aa52d07#install-nvidia-graphics-driver-via-runfile

1.1 Remove any previously installed driver versions:

zutnlp@YQ2:/opt/nvidia$ sudo apt-get purge 'nvidia*'

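1.2 Download the NVIDIA driver

https://www.nvidia.com/Download/index.aspx?lang=en-us
Find the driver version that matches your GPU.
Another machine had already downloaded it, so it only needed to be copied over with scp: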

zutnlp@YQ1:~/.ssh$ sudo scp zutnlp@10.63.3.31:~/Downloads/NVIDIA-Linux-x86_64-418.87.01.run zutnlp@10.63.3.32:/opt/nvidia

If scp fails with a permission error, set the permissions of the directory named in the error to 777:

zutnlp@YQ2:/opt$ sudo chmod  777 nvidia/
[sudo] password for zutnlp:

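1.3 Disable the Nouveau driver

This machine had been configured earlier, so almost nothing needed to be done here. If your file does not look like the following, edit it with vim and add these lines.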

zutnlp@YQ2:~$ cat  /etc/modprobe.d/blacklist-nouveau.conf 
blacklist nouveau
options nouveau modeset=0
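
The two lines expected in the blacklist file can also be written non-interactively instead of with vim. The following is a minimal sketch, not the author's exact procedure: the function name write_nouveau_blacklist is hypothetical, and the update-initramfs step mentioned in the comment is the usual follow-up on Ubuntu rather than something this post shows. Note that the kernel option is spelled modeset.

```shell
# Hypothetical helper: write the two nouveau blacklist lines to a file.
# The default target is the path used in this post; pass another path
# (for example a temp file) to try it without touching the system.
write_nouveau_blacklist() {
    target="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

# Typical use on the real machine (needs a root shell):
#   write_nouveau_blacklist
#   update-initramfs -u    # assumption: rebuild the initramfs, then reboot
```

Taking the target path as an argument keeps the helper testable against a scratch file before it is pointed at /etc/modprobe.d.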

1.4 Stop the X server services

sudo service lightdm stop

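1.5 Run the runfile

The first attempt with --dkms failed with the errors below. Dropping --dkms is not a good workaround: without it, every kernel update breaks the driver and it has to be reinstalled. The cause turned out to be missing dependencies: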

zutnlp@YQ2:/opt/nvidia$ sudo sh  NVIDIA-Linux-x86_64-418.87.01.run -s --dkms   --no-opengl-files
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 418.87.01..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

ERROR: Failed to find dkms on the system!
ERROR: Failed to install the kernel module through DKMS. No kernel module was
       installed; please try installing again without DKMS, or check the DKMS
       logs for more information.


ERROR: Installation has failed.  Please see the file
       '/var/log/nvidia-installer.log' for details.  You may find suggestions
       on fixing installation problems in the README available on the Linux
       driver download page at www.nvidia.com.

The failure was caused by missing dependencies (dkms itself was not installed), so to be safe install the following packages:

sudo apt-get update
sudo apt-get install dkms build-essential linux-headers-generic
sudo apt-get install gcc-multilib xorg-dev
sudo apt-get install freeglut3-dev libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev
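
Before re-running the installer, it can help to confirm that the prerequisites are actually in place. This is a minimal sketch under stated assumptions: the helper names have_command and module_listed are hypothetical, and the checks only look at whether dkms is on the PATH and whether nouveau appears in lsmod output.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: read lsmod-style text on stdin and succeed if the
# module named in $1 is listed. The first line of lsmod output is a header,
# so it is skipped.
module_listed() {
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Typical use on the target machine:
#   have_command dkms || echo "dkms is missing: sudo apt-get install dkms"
#   lsmod | module_listed nouveau && echo "nouveau is still loaded"
```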

After that, re-running the installer succeeded:

zutnlp@YQ2:/opt/nvidia$ sudo sh  NVIDIA-Linux-x86_64-418.87.01.run -s --dkms   --no-opengl-files

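Option notes:
--no-opengl-files: install only the driver files, not the OpenGL files. Do not omit this option; otherwise you can end up in an endless loop at the login screen, usually called a "login loop" or "stuck in login". Why it is required: the NVIDIA installer installs its own OpenGL libraries by default, while Ubuntu already ships OpenGL libraries that the GUI depends on. If the NVIDIA installer overwrites them, problems arise when the GUI dynamically links against OpenGL.
--no-x-check: do not check for a running X server during installation. Optional here, because the graphical interface has already been stopped.
--no-nouveau-check: do not check for nouveau during installation. Optional here, because nouveau has already been disabled.
-Z, --disable-nouveau: disable nouveau. Optional here, because nouveau was disabled manually earlier.
-A: show the advanced options.
--dkms (recommended): register the kernel module with DKMS, so that when the kernel is updated DKMS rebuilds the driver module for the new kernel and the driver does not have to be reinstalled.
-s: silent installation, intended for batch installs. When installing on a single computer, leave this option off to see more installation information.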

The following output confirms that the driver was installed successfully:

zutnlp@YQ2:/opt/nvidia$ nvidia-smi
Sun Nov 17 18:47:07 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:04:00.0 Off |                    0 |
| N/A   34C    P0    29W / 250W |      0MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:82:00.0 Off |                    0 |
| N/A   35C    P0    24W / 250W |      0MiB / 12198MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
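
For scripted checks, the driver version can be pulled out of the banner line rather than read by eye. This is a minimal sketch, assuming the banner keeps the "Driver Version: X.Y.Z" layout shown above; driver_version_from_smi is a hypothetical helper name.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the
# driver version found in the banner line (prints nothing if absent).
driver_version_from_smi() {
    sed -n 's/.*Driver Version: *\([0-9.][0-9.]*\).*/\1/p' | head -n 1
}

# Typical use on a machine with the driver installed:
#   nvidia-smi | driver_version_from_smi
```

nvidia-smi also has a machine-readable query mode (nvidia-smi --query-gpu=driver_version --format=csv,noheader), which is likely the more robust choice when it is available; the parser above is only meant for banner text that has already been captured.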

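2. Install CUDA

2.1 Installation steps

https://developer.nvidia.com/cuda-downloads
Installing CUDA is fairly simple: just run the runfile.
Choose the installation method on the download page, then download and run the installer: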

wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
sudo sh cuda_10.1.243_418.87.00_linux.run

Because the file is large, it is faster to copy it from YQ1 to this machine:

zutnlp@YQ1:~/.ssh$ scp /usr/local/cuda_10.1.243_418.87.00_linux.run  zutnlp@10.63.3.32:/opt/nvidia
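
Because the runfile is large, it is worth confirming that the copy arrived intact before running it. This is a minimal sketch, assuming sha256sum from GNU coreutils is available on both machines; file_sha256 and verify_sha256 are hypothetical helper names, and the digest placeholder in the comment must be filled in from the source machine.

```shell
# Hypothetical helper: print only the SHA-256 digest of a file.
file_sha256() {
    sha256sum "$1" | awk '{ print $1 }'
}

# Hypothetical helper: succeed if the file's digest equals the expected one.
verify_sha256() {
    [ "$(file_sha256 "$1")" = "$2" ]
}

# Typical use: run file_sha256 on the source machine, then on the destination:
#   verify_sha256 /opt/nvidia/cuda_10.1.243_418.87.00_linux.run "<digest from source>" \
#       && echo "copy is intact"
```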

The installation proceeds as follows:

────────────────────────────────────────────────────────────────────────────┐
│  End User License Agreement                                                  │
│  --------------------------                                                  │
│                                                                              │
│                                                                              │
│  Preface                                                                     │
│  -------                                                                     │
│                                                                              │
│  The Software License Agreement in Chapter 1 and the Supplement              │
│  in Chapter 2 contain license terms and conditions that govern               │
│  the use of NVIDIA software. By accepting this agreement, you                │
│  agree to comply with all the terms and conditions applicable                │
│  to the product(s) included herein.                                          │
│                                                                              │
│                                                                              │
│  NVIDIA Driver                                                               │
│                                                                              │
│                                                                              │
│  Description                                                                 │
│                                                                              │
│  This package contains the operating system driver and                       │
│──────────────────────────────────────────────────────────────────────────────│
│ Do you accept the above EULA? (accept/decline/quit):                         │
│ accept                                                                       │

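Press the space bar to toggle an item; an X in the brackets means it is selected. Select only the CUDA Toolkit here, since the driver is already installed: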

│ CUDA Installer                                                               │
│ - [ ] Driver                                                                 │
│      [ ] 418.87.00                                                           │
│ + [X] CUDA Toolkit 10.1                                                      │
│   [ ] CUDA Samples 10.1                                                      │
│   [ ] CUDA Demo Suite 10.1                                                   │
│   [ ] CUDA Documentation 10.1                                                │
│   Options                                                                    │
│   Install

Select Yes:

──────────────────────────────────────────────────────────────────────────────┐
│ A symlink already exists at /usr/local/cuda. Update to this installation?    │
│ Yes                                                                          │
│ No

After a short wait, the following summary shows that the installation has finished:

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-10.1/
Samples:  Not Selected

Please make sure that
 -   PATH includes /usr/local/cuda-10.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.1/lib64, or, add /usr/local/cuda-10.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.1/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.1/doc/pdf for detailed information on setting up CUDA.
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 418.00 is required for CUDA 10.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
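
The warning in the summary says CUDA 10.1 needs a driver of at least version 418.00; the driver installed earlier (418.87.01) satisfies that. A minimal sketch of the comparison, assuming GNU sort with the -V (version sort) option is available; version_ge is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to
# version $2, using GNU sort's version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Typical use:
#   version_ge 418.87.01 418.00 && echo "driver is new enough for CUDA 10.1"
```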

Run the following command to verify the installation:

zutnlp@YQ2:/opt/nvidia$ cat /usr/local/cuda/version.txt 
CUDA Version 10.1.243

2.2 Configure the runtime libraries

sudo bash -c "echo /usr/local/cuda/lib64/ > /etc/ld.so.conf.d/cuda.conf"
sudo ldconfig
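
After ldconfig has run, the linker cache can be checked for the CUDA runtime library. This is a minimal sketch, assuming the usual ldconfig -p listing format (one library per line, with the library name as the first field); lib_in_cache is a hypothetical helper that reads that listing on stdin.

```shell
# Hypothetical helper: read `ldconfig -p` output on stdin and succeed if
# any listed library name starts with the prefix given in $1.
lib_in_cache() {
    awk -v lib="$1" 'index($1, lib) == 1 { found = 1 } END { exit !found }'
}

# Typical use:
#   ldconfig -p | lib_in_cache libcudart.so && echo "CUDA runtime is in the linker cache"
```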

Add /usr/local/cuda/bin to the system PATH by editing /etc/environment (root privileges are required to save the file):

zutnlp@YQ2:/opt/nvidia$ sudo vim /etc/environment
zutnlp@YQ2:/opt/nvidia$ source /etc/environment

2.3 Fix nvcc -V still reporting the old version

CUDA had been upgraded, but nvcc still reported the old CUDA 9.0:

zutnlp@YQ2:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

To fix this, append the following lines to the end of /etc/profile:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
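
The ${PATH:+:${PATH}} form appends ":$PATH" only when PATH is already non-empty, which avoids a stray trailing colon, and putting /usr/local/cuda/bin first makes the new nvcc win over any older one later in the list. The following is a minimal sketch of the same idea that also skips the entry if it is already present; prepend_path is a hypothetical helper, not part of the original setup.

```shell
# Hypothetical helper: print the path list $2 with directory $1 prepended,
# unless $1 is already one of its entries.
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# Typical use:
#   PATH=$(prepend_path /usr/local/cuda/bin "$PATH"); export PATH
```

Guarding against duplicates matters mostly when /etc/profile is sourced repeatedly in the same session, as it is later in this post.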

After the edit, the file looks like this:

zutnlp@YQ2:~$ cat  /etc/profile
# /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
# and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).

if [ "$PS1" ]; then
  if [ "$BASH" ] && [ "$BASH" != "/bin/sh" ]; then
    # The file bash.bashrc already sets the default PS1.
    # PS1='\h:\w\$ '
    if [ -f /etc/bash.bashrc ]; then
      . /etc/bash.bashrc
    fi
  else
    if [ "`id -u`" -eq 0 ]; then
      PS1='# '
    else
      PS1='$ '
    fi
  fi
fi

if [ -d /etc/profile.d ]; then
  for i in /etc/profile.d/*.sh; do
    if [ -r $i ]; then
      . $i
    fi
  done
  unset i
fi

xrandr --newmode 1920x1080 173.00  1920 2048 2248 2576  1080 1083 1088 1120 -hsync +vsync
xrandr --addmode VGA-1 1920x1080
xrandr --output VGA-1 --mode 1920x1080

export PYTHONPATH=/home/yuqing/data/tf/models:/home/zutnlp/quanyou.chang/models/tf/slim
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

The steps were as follows:

zutnlp@YQ2:~$ vim /etc/profile
zutnlp@YQ2:~$ sudo vim /etc/profile
[sudo] password for zutnlp: 
zutnlp@YQ2:~$ source /etc/profile
Can't open display 
Can't open display 
Can't open display

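The three "Can't open display" messages come from the xrandr lines in /etc/profile, which have no X display to talk to in this session; they do not affect the CUDA settings. Now test whether the configuration took effect: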

zutnlp@YQ2:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
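
To check the toolkit release from a script, the "release" field can be extracted from the nvcc banner. This is a minimal sketch, assuming the "release X.Y," wording shown in the output above; nvcc_release is a hypothetical helper name.

```shell
# Hypothetical helper: read `nvcc -V` output on stdin and print the
# release number (prints nothing if the line is absent).
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Typical use:
#   nvcc -V | nvcc_release
#   command -v nvcc    # shows which nvcc binary the PATH resolves to
```

If the release printed is still the old one, command -v nvcc usually reveals that an older CUDA bin directory comes earlier in PATH.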

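3. Install cuDNN

This follows the "install cuDNN" section of this article: https://medium.com/repro-repo/install-cuda-and-cudnn-for-tensorflow-gpu-on-ubuntu-79306e4ac04e

3.1 Download the matching version from the official site; note that there are three packages: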

https://developer.nvidia.com/rdp/cudnn-download

zutnlp@YQ1:~/wueryong/nvidia/cuDNNN$ ll
total 336524
drwxrwxr-x 2 zutnlp zutnlp      4096 11月 17 15:50 ./
drwxrwxr-x 3 zutnlp zutnlp      4096 11月 17 15:50 ../
-rw-r--r-- 1 zutnlp zutnlp 180962466 11月 16 21:19 libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
-rw-r--r-- 1 zutnlp zutnlp 159185716 11月 16 21:18 libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
-rw-r--r-- 1 zutnlp zutnlp   4428908 11月 16 21:03 libcudnn7-doc_7.6.5.32-1+cuda10.1_amd64.deb

3.2 Run the install commands

zutnlp@YQ1:~/wueryong/nvidia/cuDNNN$ sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb 
Selecting previously unselected package libcudnn7.
(Reading database ... 351236 files and directories currently installed.)
Preparing to unpack libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb ...
Unpacking libcudnn7 (7.6.5.32-1+cuda10.1) ...
Setting up libcudnn7 (7.6.5.32-1+cuda10.1) ...
Processing triggers for libc-bin (2.23-0ubuntu11) ...
zutnlp@YQ1:~/wueryong/nvidia/cuDNNN$ sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb 
Selecting previously unselected package libcudnn7-dev.
(Reading database ... 351242 files and directories currently installed.)
Preparing to unpack libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb ...
Unpacking libcudnn7-dev (7.6.5.32-1+cuda10.1) ...
Setting up libcudnn7-dev (7.6.5.32-1+cuda10.1) ...
update-alternatives: using /usr/include/x86_64-linux-gnu/cudnn_v7.h to provide /usr/include/cudnn.h (libcudnn) in auto mode
zutnlp@YQ1:~/wueryong/nvidia/cuDNNN$ sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.1_amd64.deb 
Selecting previously unselected package libcudnn7-doc.
(Reading database ... 351248 files and directories currently installed.)
Preparing to unpack libcudnn7-doc_7.6.5.32-1+cuda10.1_amd64.deb ...
Unpacking libcudnn7-doc (7.6.5.32-1+cuda10.1) ...
Setting up libcudnn7-doc (7.6.5.32-1+cuda10.1) ...
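
A quick way to confirm which cuDNN version the dev package installed is to read the version macros from the header. This is a minimal sketch, assuming the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain numeric macros (as cuDNN 7 does) and that it is reachable at /usr/include/cudnn.h, the path named in the update-alternatives line above; cudnn_version_from_header is a hypothetical helper name.

```shell
# Hypothetical helper: print MAJOR.MINOR.PATCHLEVEL from a cudnn.h-style
# header file; prints nothing if CUDNN_MAJOR is not defined there.
cudnn_version_from_header() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major != "") print major "." minor "." patch }' "$1"
}

# Typical use:
#   cudnn_version_from_header /usr/include/cudnn.h
```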

3.3 Compile and run the test sample:

zutnlp@YQ1:/usr/src/nvidia-418.87.01$ cp -r /usr/src/cudnn_samples_v7/ ~
zutnlp@YQ1:/usr/src/nvidia-418.87.01$ cd ~/cudnn_samples_v7/mnistCUDNN
zutnlp@YQ1:~/cudnn_samples_v7/mnistCUDNN$ make clean && make
rm -rf *o
rm -rf mnistCUDNN
Linking agains cublasLt = true
CUDA VERSION: 10010
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 30 35 50 53 60 61 62 70 72 75
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include   -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include   -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include  -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcublasLt -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm

zutnlp@YQ1:~/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN
cudnnGetVersion() : 7605 , CUDNN_VERSION from cudnn.h : 7605 (7.6.5)
Host compiler version : GCC 4.9.4
There are 2 CUDA capable devices on your machine :
device 0 : sms 56  Capabilities 6.0, SmClock 1328.5 Mhz, MemSize (Mb) 12198, MemClock 715.0 Mhz, Ecc=1, boardGroupID=0
device 1 : sms 56  Capabilities 6.0, SmClock 1328.5 Mhz, MemSize (Mb) 12198, MemClock 715.0 Mhz, Ecc=1, boardGroupID=1
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.031264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.068768 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.079424 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.095104 time requiring 203008 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.101568 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.024544 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.059488 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.068256 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.076192 time requiring 203008 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.090816 time requiring 2057744 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!
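
The sample prints "Test passed!" once for single precision and once for half precision, so a scripted check can simply count those lines. This is a minimal sketch; sample_passed is a hypothetical helper name, and the expected count of 2 is taken from the output above.

```shell
# Hypothetical helper: read the sample's output on stdin and succeed only
# if "Test passed!" appears exactly $1 times.
sample_passed() {
    [ "$(grep -c 'Test passed!')" -eq "$1" ]
}

# Typical use from the mnistCUDNN directory:
#   ./mnistCUDNN | sample_passed 2 && echo "cuDNN sample OK"
```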

4. Official reference documentation

Official documentation: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/
