Installing and configuring CUDA, cuDNN, and related components on Ubuntu 22.04

Latest recommended article published 2024-05-20 18:41:21

peihexian

Tags: ubuntu, linux, operations

Article link: https://blog.csdn.net/peihexian/article/details/138714384

1. Install Ubuntu 22.04 as usual. After installation, run sudo apt update and sudo apt upgrade to bring all packages up to date.

2. Install CUDA

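Download the CUDA local (offline) installer from the page below, choosing the options that match your CPU architecture and operating system: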

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local

The page then shows the following installation commands:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4

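The first wget fetches only a small file and can simply be run as is. The second wget downloads an installer of about 3.6 GB, so a download accelerator such as Xunlei (Thunder) helps. Once the download finishes, run the commands above in order, then install the NVIDIA driver: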

sudo apt-get install -y cuda-drivers

3. Install cuDNN

Installing cuDNN works much like installing CUDA. Open the page below and choose the options that match your system:

https://developer.nvidia.com/cudnn-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local

wget https://developer.download.nvidia.com/compute/cudnn/9.1.1/local_installers/cudnn-local-repo-ubuntu2204-9.1.1_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2204-9.1.1_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-9.1.1/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn

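4. After installation, configure the system environment variables so that the CUDA and cuDNN header and library directories are on the compiler, linker, and runtime loader search paths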

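sudo vi /etc/profile

# Append the following lines to the end of /etc/profile.
# ${VAR:+${VAR}:} expands to nothing when VAR is unset, which avoids a leading ':'
# (an empty entry in these search paths means "current directory").
export PATH=/usr/local/cuda/bin:$PATH
export CPATH=${CPATH:+${CPATH}:}/usr/include:/usr/local/cuda/include
export LIBRARY_PATH=${LIBRARY_PATH:+${LIBRARY_PATH}:}/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/lib/x86_64-linux-gnu:/usr/local/cuda/lib64

# Reload the profile in the current shell (new login shells read it automatically).
source /etc/profile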

5. Verification

Run nvidia-smi to check that the graphics driver is installed correctly. If it is, the output should look similar to this:

Sat May 11 15:27:56 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 9999 ...    Off |   00000000:01:00.0  On |                  N/A |
| 30%   35C    P8             20W /  250W |      64MiB / 102400MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1035      G   /usr/lib/xorg/Xorg                             54MiB |
|    0   N/A  N/A      1106      G   /usr/bin/gnome-shell                            7MiB |
+-----------------------------------------------------------------------------------------+

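If there is a problem with the installed driver, you may see an error like the following, which means the NVIDIA kernel module currently loaded does not match the version of the user-space NVML library: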

root@server:~# nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 550.54

In that case, the fix is:

sudo apt-get remove --purge '^nvidia-.*'
sudo rm -f /etc/modprobe.d/blacklist-nvidia.conf
sudo rm -f /lib/modprobe.d/blacklist-nvidia.conf
sudo apt-get update
sudo apt-get install nvidia-driver-550

After installing this specific driver version, reboot the system and the problem should be resolved.

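Run nvcc --version to check the CUDA toolkit version (no sudo needed; sudo resets PATH and may not find nvcc under /usr/local/cuda/bin). Normally the output is:

nvcc --version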
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

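Checking the cuDNN version is less convenient. One way is to write a small program, for example a file named main.cpp with the following contents: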

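#include <cudnn.h>
#include <iostream>

int main() {
    // cudnnGetVersion() reports the version of the cuDNN library loaded at run time,
    // encoded as a single integer.
    std::cout << "cuDNN Version: " << cudnnGetVersion() << std::endl;
    return 0;
}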

g++ main.cpp -o cudnn_test -I/usr/include -I/usr/local/cuda/include -L/usr/lib/x86_64-linux-gnu -L/usr/local/cuda/lib64 -lcudnn -lcudart

This should compile into an executable named cudnn_test. Running it prints the cuDNN version:

./cudnn_test
cuDNN Version: 90101

6. Install GPU support for Docker

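Install Docker by following the official Docker documentation; that process is not covered here. The steps below add GPU support to Docker so that containers can use the host's GPU. Run the following commands in order: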

# Set up the stable repository and its GPG key
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the nvidia-docker2 package and restart the Docker service
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

Once installed, pull an official image to test access to the GPU:

docker pull nvidia/cuda:12.4.1-cudnn-runtime-ubuntu20.04

After the image has been pulled, try running it:

docker run --gpus all nvidia/cuda:12.4.1-cudnn-runtime-ubuntu20.04 nvidia-smi

Normally the output should be:


==========
== CUDA ==
==========

CUDA Version 12.4.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Sat May 11 07:46:25 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX xxxx ...    Off |   00000000:01:00.0  On |                  N/A |
| 30%   36C    P8             21W /  250W |      64MiB /   xxxxMiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

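That completes the installation. One last word: to those who steal this article, stop it. What's worse, some people have even messaged me accusing me of copying someone else's article. This article was first published at http://blog.csdn.net/peihexian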