Installing the NVIDIA GPU Driver, CUDA, and cuDNN on Ubuntu 20.04, Removing the NVENC Concurrent Session Limit, and Switching Between CUDA Versions

BetterJason

Last modified 2025-02-26 15:02:04


First published 2022-08-24 16:17:40

Article link: https://blog.csdn.net/m0_38101947/article/details/126499811

1. Check the current system version

cat /proc/version

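The output shows the running kernel version and the GCC version it was built with.

2. Check the GPU model:

sudo lshw -numeric -C display

My GPU is reported as GM107M [GeForce GTX 950M].

3. Install GCC, G++, make, and related build tools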

sudo apt-get install build-essential
gcc --version   # verify that gcc was installed

sudo apt-get install g++
sudo apt-get install make

4. Uninstall the old driver

# For drivers installed via apt

sudo apt-get remove --purge '^nvidia-.*'

# For drivers installed from a .run file

sudo ./NVIDIA-Linux-x86_64-495.46.run --uninstall

Here NVIDIA-Linux-x86_64-495.46.run is the installer file of the driver being removed.

# A catch-all approach:

sudo apt purge 'nvidia-*'
sudo apt purge xserver-xorg-video-nouveau
sudo apt autoremove
sudo nvidia-uninstall

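5. Download the driver that matches your GPU model

Driver download page: Official Drivers | NVIDIA

To pick a specific version, go to Official Advanced Driver Search | NVIDIA

Then search for the drivers available for your GPU model; I chose 510.68.02.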

sudo wget https://us.download.nvidia.cn/XFree86/Linux-x86_64/510.68.02/NVIDIA-Linux-x86_64-510.68.02.run

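6. Before installing the driver, disable nouveau (an open-source driver for NVIDIA GPUs built by reverse-engineering NVIDIA's proprietary driver). Ubuntu installs nouveau by default, and if it is left enabled it blocks the installation of the official NVIDIA driver.

Disable nouveau:

sudo vim /etc/modprobe.d/blacklist.conf

Append the following to the end of the file: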

blacklist nouveau
options nouveau modeset=0
####
blacklist vga16fb
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
####
blacklist lbm-nouveau
alias nouveau off
alias lbm-nouveau off

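Save the file.

Regenerate the initramfs so the change takes effect (choose whichever of the three commands below fits your setup):

# Update the existing initramfs (for the newest installed kernel by default)
sudo update-initramfs -u

# Create an initramfs for every installed kernel
sudo update-initramfs -c -k all

# Create an initramfs for one specific kernel version
sudo update-initramfs -c -k <kernel-version>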

Reboot the computer, then check whether nouveau was disabled:

sudo reboot

lsmod | grep nouveau

If this command prints nothing, nouveau has been disabled.
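The check above can be wrapped in a small helper that gives an explicit answer instead of relying on empty output. This is a minimal sketch; the function name nouveau_status is made up for illustration, and it only assumes that lsmod prints the module name in the first column.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsmod` output on stdin and reports whether
# the nouveau kernel module is loaded.
nouveau_status() {
    # Match the module name exactly in the first column, so that modules
    # which merely mention nouveau elsewhere on the line do not count.
    if awk '$1 == "nouveau" { found = 1 } END { exit !found }'; then
        echo "nouveau is still loaded"
        return 1
    fi
    echo "nouveau is disabled"
}

# Usage on a real machine:
#   lsmod | nouveau_status
```

The non-zero return code in the "still loaded" case makes the helper usable in scripts that should stop before running the NVIDIA installer.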

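Edit /etc/default/grub to avoid a black screen after the driver is installed:

sudo vim /etc/default/grub

Replace GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" with GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_osi=linux"

Save and quit with :wq, then run sudo update-grub so that the new kernel command line is written to the boot configuration.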

sudo reboot

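7. Stop the graphical session

Switch to a TTY:

Ctrl+Alt+F1, Ctrl+Alt+F2, or Ctrl+Alt+F3, depending on your system

Then stop the display manager (lightdm in my case; a default Ubuntu 20.04 desktop uses gdm3 instead):

sudo service lightdm stop

8. Change to the directory that contains the driver, make the file executable, then install it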
sudo chmod +x NVIDIA-Linux-x86_64-510.68.02.run
sudo sh NVIDIA-Linux-x86_64-510.68.02.run -no-opengl-files -no-x-check -no-nouveau-check

Options:

-no-x-check: do not check whether an X server is running
-no-nouveau-check: do not check whether the nouveau driver is loaded
-no-opengl-files: install only the driver files, not the OpenGL files

Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. Choose No to continue.

Install NVIDIA's 32-bit compatibility libraries? Choose No to continue.

Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up. Choose Yes to continue.

9. Load the NVIDIA kernel module

sudo modprobe nvidia

10. Verify that the driver is installed

nvidia-smi


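--------------------------------------------------------------------------------

Installing CUDA

1. Check the installed driver version

Running nvidia-smi shows that the highest CUDA version this driver supports is 11.6, so the toolkit must not be newer than that.

The CUDA versions supported by each driver release are listed on the official site: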

Release Notes :: CUDA Toolkit Documentation
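Once the minimum driver version for a toolkit is known from the release notes, the comparison can be scripted. The sketch below is only an illustration: the function names are made up, and the 510.47.03 value in the usage example is the driver version embedded in the CUDA 11.6.2 runfile name used in this article, taken here as the required minimum by assumption. Confirm the real requirement in the release notes.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_driver_for_cuda DRIVER REQUIRED: report whether DRIVER meets REQUIRED.
check_driver_for_cuda() {
    driver="$1"
    required="$2"
    if version_ge "$driver" "$required"; then
        echo "driver $driver is new enough (needs >= $required)"
    else
        echo "driver $driver is too old (needs >= $required)"
        return 1
    fi
}

# Usage on a real machine (510.47.03 is an assumed minimum, see above):
#   driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
#   check_driver_for_cuda "$driver" 510.47.03
```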

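2. Download CUDA from the official site:

CUDA Toolkit Archive | NVIDIA Developer

Because my driver is 510.68.02, I downloaded CUDA Toolkit 11.6.2.

Choose the runfile (local) installation method.

Note: some CUDA versions have no Ubuntu 20.04 build; in that case choose the highest Ubuntu version offered, such as 18.04.

Run: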

sudo sh cuda_11.6.2_510.47.03_linux.run

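When the license agreement appears, type accept.

Because the driver is already installed, press Space to deselect the Driver item, then select Install and press Enter to start the installation.

Wait for the installation to finish.

When it completes, the installer prints a summary that reminds you to add the CUDA directories to PATH and LD_LIBRARY_PATH.

Configure the environment variables:

sudo vim /etc/profile

Append to the end of the file: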

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.6/lib64
export PATH=/usr/local/cuda-11.6/bin:$PATH

Save the file.

Reload the profile:

source /etc/profile

Check that CUDA is installed correctly:

nvcc -V
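If nvcc is not found or reports an unexpected version, the usual cause is the PATH entry from /etc/profile. The two helpers below are a minimal sketch for checking both things; their names are hypothetical, and the parser only assumes that the nvcc -V output contains the text "release X.Y".

```shell
#!/bin/sh
# Reads `nvcc -V` output on stdin and prints the toolkit release, e.g. 11.6.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# path_has_dir DIR [LIST]: succeed when DIR is an entry of the colon-separated
# LIST (defaults to the current $PATH).
path_has_dir() {
    dir="$1"
    list="${2:-$PATH}"
    case ":$list:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a real machine:
#   nvcc -V | nvcc_release
#   path_has_dir /usr/local/cuda-11.6/bin || echo "CUDA bin directory is not in PATH"
```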


--------------------------------------------------------------------------------

Installing cuDNN

1. Download cuDNN (login required)

cuDNN 9.7.1 Downloads | NVIDIA Developer

Because my CUDA version is 11.6, I downloaded the build for CUDA 11.x.

2. Install cuDNN

The installation instructions are in Installation Guide :: NVIDIA Deep Learning cuDNN Documentation.

Installation steps:

Download the Debian local repository installation package. Before issuing the following commands, you must replace X.Y and 8.x.x.x with your specific CUDA and cuDNN versions.

Procedure

1. Navigate to your <cudnnpath> directory containing the cuDNN Debian local installer file.

2. Enable the local repository (use the package that matches your architecture).

sudo dpkg -i cudnn-local-repo-${OS}-8.x.x.x_1.0-1_amd64.deb

or, on ARM64:

sudo dpkg -i cudnn-local-repo-${OS}-8.x.x.x_1.0-1_arm64.deb

3. Import the CUDA GPG key.

sudo cp /var/cudnn-local-repo-*/cudnn-local-*-keyring.gpg /usr/share/keyrings/

4. Refresh the repository metadata.
```
sudo apt-get update
```

5. Install the runtime library.

sudo apt-get install libcudnn8=8.x.x.x-1+cudaX.Y

6. Install the developer library.

sudo apt-get install libcudnn8-dev=8.x.x.x-1+cudaX.Y

7. Install the code samples and the cuDNN library documentation.
```
sudo apt-get install libcudnn8-samples=8.x.x.x-1+cudaX.Y
```

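Notes:

1. When I installed cuDNN, steps 5, 6, and 7 above had no cuda11.6 variant for the cudaX.Y part of libcudnn8=8.x.x.x-1+cudaX.Y.

Only cuda11.7 was available, so the command in step 5, for example, should be:

sudo apt-get install libcudnn8=8.5.0.96-1+cuda11.7

rather than:

sudo apt-get install libcudnn8=8.5.0.96-1+cuda11.6

even though my CUDA toolkit is 11.6.

At the time of writing, the official documentation lists only two values for cudaX.Y: cuda10.2 or cuda11.7.

When I recently installed cuDNN 9.7, I found that the procedure above no longer applies. Follow the instructions shown on the official download page instead.

Verify the installation: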

To verify that cuDNN is installed and is running properly, compile the mnistCUDNN sample located in the /usr/src/cudnn_samples_v8 directory in the Debian file.

Procedure

Copy the cuDNN samples to a writable path.
```
$ cp -r /usr/src/cudnn_samples_v8/ $HOME
```
Go to the writable path.
```
$ cd  $HOME/cudnn_samples_v8/mnistCUDNN
```
Compile the mnistCUDNN sample.
```
$ make clean && make
```

Run the mnistCUDNN sample.

$ ./mnistCUDNN

If cuDNN is properly installed and running on your Linux system, you will see a message similar to the following:

Test passed!

If make fails to compile and complains that FreeImage.h and the FreeImage library are missing,
install the FreeImage packages with the following command:

sudo apt-get install libfreeimage3 libfreeimage-dev

--------------------------------------------------------------------------------

Removing the NVENC concurrent session limit:

NVIDIA limits the number of concurrent sessions on its encoder; see the official support matrix: https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new#Encoder

The workaround is to patch the driver (unofficial):

GitHub - keylase/nvidia-patch: This patch removes restriction on maximum number of simultaneous NVENC video encoding sessions imposed by Nvidia to consumer-grade GPUs.

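Pick the patch that matches your driver version.

My driver version is 510.68.02.

Download the corresponding patch and apply it:

1. git clone https://github.com/keylase/nvidia-patch.git && cd nvidia-patch

2. sudo bash ./patch.sh

3. Roll back (if something goes wrong, this restores the original driver files):

sudo bash ./patch.sh -r

After patching, I tested again: the earlier error was gone and multiple videos could be encoded at the same time.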


================================================================================

The NVIDIA driver, which had been working fine, suddenly could not be found.

Running nvidia-smi prints:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

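Running nvcc -V in a terminal shows that CUDA is still installed.

Fix:

1. sudo apt-get install dkms

2. sudo dkms install -m nvidia -v 510.68.02

Here 510.68.02 is the version of the driver that was installed. If you do not know it, look in /usr/src, where the driver sources sit in a directory named nvidia-<version>:

cd /usr/src
ls

Then run nvidia-smi again.

The driver is back!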
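Looking up the version in /usr/src can also be done by a script. The helper below is a minimal sketch with a hypothetical name; it assumes the driver sources live in directories named nvidia-<version>, as described above.

```shell
#!/bin/sh
# Prints the version suffix of every nvidia-<version> directory under the
# given source root (default: /usr/src), one per line.
nvidia_src_versions() {
    src_root="${1:-/usr/src}"
    for dir in "$src_root"/nvidia-*/; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -d "$dir" ] || continue
        name="$(basename "$dir")"
        echo "${name#nvidia-}"
    done
}

# Usage on a real machine:
#   nvidia_src_versions
#   sudo dkms install -m nvidia -v "$(nvidia_src_versions | tail -n 1)"
```

If several versions are listed, pick the one that matches the driver you actually installed rather than blindly taking the last line.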

================================================================================

Encoding with h264_nvenc fails with the message below.

To see this message, do not lower ffmpeg's log level; leave it at the default.

[h264_nvenc @ 0x187e640] Driver does not support the required nvenc API version. Required: 12.0 Found: 11.1

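Cause: the nv-codec-headers version is too new for the installed NVIDIA driver. The README in the nv-codec-headers directory states the minimum NVIDIA driver version required on Linux.

Fixes:

1. Upgrade the NVIDIA driver.

2. Find an nv-codec-headers version that matches your NVIDIA driver, then rebuild ffmpeg.

================================================================================

Switching between multiple CUDA versions

1. Check the CUDA version currently in use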

nvcc -V

2. Find where the current CUDA version is installed

cd /usr/local

ls -lah

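You can see a symlink named cuda that points to cuda-12.1.

3. Install the new CUDA version; here I install CUDA 11.6

After cuda-11.6 is installed, the cuda symlink points to the cuda-11.6 directory.

4. Update the environment variables

sudo vim /etc/profile

If the paths there already use the cuda directory (the symlink) rather than a versioned directory, nothing needs to change.

5. Reboot the machine

sudo reboot

6. To switch back to CUDA 12.1 later, delete the cuda symlink and create a new one:

sudo rm /usr/local/cuda

sudo ln -s /usr/local/cuda-12.1 /usr/local/cuda

7. After the reboot in step 5, checking the CUDA version again shows that the system has switched to CUDA 11.6.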
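The delete-and-recreate sequence can be collapsed into a single ln call. The function below is a minimal sketch with a made-up name (switch_cuda); it assumes each toolkit lives in a cuda-<version> directory under one prefix, as in this article, and takes the prefix as an optional argument so it can be tried out in a scratch directory first.

```shell
#!/bin/sh
# switch_cuda VERSION [PREFIX]: repoint PREFIX/cuda at PREFIX/cuda-VERSION.
# PREFIX defaults to /usr/local.
switch_cuda() {
    version="$1"
    prefix="${2:-/usr/local}"
    target="$prefix/cuda-$version"
    if [ ! -d "$target" ]; then
        echo "error: $target does not exist" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat the existing cuda link as a file instead of following it.
    ln -sfn "$target" "$prefix/cuda"
    echo "cuda -> $(readlink "$prefix/cuda")"
}

# Usage (needs a root shell when PREFIX is /usr/local):
#   switch_cuda 12.1
#   switch_cuda 11.6
```

Because the function refuses to run when the target directory is missing, a typo in the version number cannot leave a dangling cuda link behind.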

================================================================================

The driver installation fails with:

ERROR: An NVIDIA kernel module 'nvidia-uvm' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occurred that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.

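List the loaded NVIDIA kernel modules:

lsmod | grep nvidia

Try to unload the nvidia module:

sudo rmmod nvidia

It reports that the module is in use.

Find out which processes are using it:

lsof /dev/nvidia*

Kill those processes; the driver can then be reinstalled.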
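Even with no process holding the device files, rmmod nvidia can still fail while other NVIDIA modules that depend on it are loaded. The helper below is a sketch with a hypothetical name: it reads lsmod output and prints, without executing anything, the rmmod commands in an order that unloads the dependent modules first. The module list (nvidia_uvm, nvidia_drm, nvidia_modeset, nvidia) is the usual set shipped with the driver; verify it against your own lsmod output.

```shell
#!/bin/sh
# Reads `lsmod` output on stdin and prints the rmmod commands for the NVIDIA
# modules that are loaded: dependents first, the core nvidia module last.
nvidia_unload_plan() {
    # First column of every line after the header is the module name.
    loaded="$(awk 'NR > 1 { print $1 }')"
    for mod in nvidia_uvm nvidia_drm nvidia_modeset nvidia; do
        if printf '%s\n' "$loaded" | grep -qx "$mod"; then
            echo "sudo rmmod $mod"
        fi
    done
}

# Usage on a real machine (prints the plan only):
#   lsmod | nvidia_unload_plan
```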

================================================================================

If a kernel upgrade leaves the driver unusable, there are two fixes:

1. Upgrade the GPU driver.

2. Downgrade the kernel; see Ubuntu 内核版本降级-CSDN博客.

================================================================================

Fixing high fan speed and loud noise on a Dell R730 after installing a GPU (mine is an RTX2080TI-A)

Reference: https://www.dell.com/community/zh/conversations/poweredge%E6%9C%8D%E5%8A%A1%E5%99%A8/dell-r730xd-%E5%8A%A0%E8%A3%85pcie%E5%9B%BA%E6%80%81%E7%A1%AC%E7%9B%98-%E9%A3%8E%E6%89%87%E9%97%AE%E9%A2%98/647f6f17f4ccf8a8dec81a25

Disable third-party PCIe card detection (lowers the fan speed)

Steps:

1. sudo apt-get install ipmitool

2. Disable or enable the third-party PCIe card check (mainly to lower the server's fan speed)

Disable (lowers the fan speed):

sudo ipmitool raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x01 0x00 0x00

Enable (fans run at full speed again):

sudo ipmitool raw 0x30 0xce 0x00 0x16 0x05 0x00 0x00 0x00 0x05 0x00 0x00 0x00 0x00

Finer-grained fan speed settings:

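# Disable automatic fan control
sudo ipmitool raw 0x30 0x30 0x01 0x00

# Enable automatic fan control
sudo ipmitool raw 0x30 0x30 0x00 0x00

# Set fan speed to 10%
sudo ipmitool  raw 0x30 0x30 0x02 0xff 0x0a
# Set fan speed to 15%
sudo ipmitool  raw 0x30 0x30 0x02 0xff 0x0f
# Set fan speed to 20%
sudo ipmitool  raw 0x30 0x30 0x02 0xff 0x14
# Set fan speed to 25%
sudo ipmitool  raw 0x30 0x30 0x02 0xff 0x19
# Set fan speed to 30%
sudo ipmitool  raw 0x30 0x30 0x02 0xff 0x1e
# Set fan speed to 35%
sudo ipmitool  raw 0x30 0x30 0x02 0xff 0x23
# Set fan speed to 40%
sudo ipmitool  raw 0x30 0x30 0x02 0xff 0x28
# Set fan speed to 45%
sudo ipmitool  raw 0x30 0x30 0x02 0xff 0x2d
# Set fan speed to 50%
sudo ipmitool  raw 0x30 0x30 0x02 0xff 0x32

# The final byte is the fan speed percentage in hexadecimal: 0x0a is 10%, 0x0f is 15%.
# To set a different percentage, change only that last byte, e.g. 0x0f.
# 0x01-0x09 correspond to 1%-9%.
# For a higher speed, convert the percentage to hexadecimal and use it as the last byte.
# For example, 58% is 0x3a.
# Any decimal-to-hexadecimal converter will do the calculation.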
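The decimal-to-hexadecimal step can be done by the shell itself with printf. The two functions below are a minimal sketch with hypothetical names; they only print the ipmitool command for a given percentage so it can be reviewed before running, and they reuse the raw byte sequence from the commands above unchanged.

```shell
#!/bin/sh
# pct_to_hex PERCENT: print the hexadecimal byte for a fan speed of 0-100%.
pct_to_hex() {
    pct="$1"
    case "$pct" in
        ''|*[!0-9]*) echo "error: not a whole number: $pct" >&2; return 1 ;;
    esac
    # Strip leading zeros so printf does not read the value as octal.
    while [ "${#pct}" -gt 1 ] && [ "${pct#0}" != "$pct" ]; do
        pct="${pct#0}"
    done
    if [ "$pct" -gt 100 ]; then
        echo "error: percentage above 100: $pct" >&2
        return 1
    fi
    printf '0x%02x\n' "$pct"
}

# fan_speed_command PERCENT: print (do not run) the matching ipmitool command.
fan_speed_command() {
    hex="$(pct_to_hex "$1")" || return 1
    echo "sudo ipmitool raw 0x30 0x30 0x02 0xff $hex"
}

# Usage:
#   fan_speed_command 58
```

Printing instead of executing keeps the helper safe to experiment with on a machine that has no BMC.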