[WSL2 Notes 2] Setting up a deep learning development environment, pitfalls and fixes: Ubuntu+CUDA+cuDNN+PyTorch+Tensorflow+ONNX

1. Anaconda installation and environment setup (system level: manages all environments)

Anaconda official installer archive
https://repo.anaconda.com/archive/

1.1 Create a download directory

cd ~
mkdir download
cd download

Download the Anaconda installer
wget https://repo.anaconda.com/archive/Anaconda3-2023.03-Linux-x86_64.sh

1.2 Install Anaconda

bash Anaconda3-2023.03-Linux-x86_64.sh

Create a Python virtual environment
conda create -n <name> python=<version>

Activate the environment
conda activate <name>

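1.3 A mistake: gilding the lily

The steps in this subsection are what NOT to do. They are recorded as a warning.

Setting the Anaconda path by hand

$ vim ~/.bashrc

Add the install path

 # Anaconda3
export PATH="/home/XXXX/anaconda3/bin:$PATH"
source activate

or append the lines from the shell:

echo 'export PATH="~/anaconda3/bin:$PATH"' >> ~/.bashrc
echo 'source activate' >> ~/.bashrc

Reload the configuration
source ~/.bashrc
The result of this mistake: every virtual environment runs with the Python version of base. Environments can no longer use different Python versions, which defeats the purpose of having virtual environments.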
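One way to confirm whether an environment really uses its own interpreter is to look at what `python` resolves to after `conda activate`. The sketch below is only a diagnostic: `report_python` is a made-up helper name, and it relies on `CONDA_DEFAULT_ENV`, the variable that `conda activate` sets to the active environment's name.

```shell
# Sketch: report which interpreter the current shell would run.
# report_python is a hypothetical helper name, not part of conda.
report_python() {
    # CONDA_DEFAULT_ENV is set by "conda activate"; empty means no env is active.
    printf 'active env: %s\n' "${CONDA_DEFAULT_ENV:-<none>}"
    # command -v prints the first "python" found on PATH.
    printf 'python:     %s\n' "$(command -v python || echo '<not found>')"
}
```

With a default install location, a healthy environment should report a path under `~/anaconda3/envs/<name>/bin`. If the path is `~/anaconda3/bin/python` while an environment is active, the environment is being shadowed by base.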

1.4 Disk cleanup

Clean caches and unneeded dependency packages regularly to free disk space.

  • Before cleanup
$ sudo du -sh /home/gpu/anaconda3/pkgs/
[sudo] password for gpu: 
174G    /home/gpu/anaconda3/pkgs/
  • After cleanup
$ sudo du -sh /home/gpu/anaconda3/pkgs/
84G     /home/gpu/anaconda3/pkgs/

1.4.1 Check disk space

df -hl

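1.4.2 apt-get cleanup

  • Clear the download cache
    sudo apt-get clean
  • Remove dependency packages that are no longer needed
    sudo apt-get autoremove
  • Remove cached package files that can no longer be downloaded
    sudo apt-get autoclean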

1.4.3 Anaconda3 cleanup

  • Measure how much space conda uses
    sudo du -sh ~/anaconda3/*
  • Clear the index cache and unused cached packages; existing environments are not affected
    conda clean -a
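The before/after measurement shown at the start of this section can be wrapped around any cleanup command. This is a minimal sketch; `measure_freed` is a hypothetical helper name, and the size is whatever `du -sk` reports, so it is approximate.

```shell
# Sketch: run a cleanup command and report how much a directory shrank.
# measure_freed is a hypothetical helper name.
measure_freed() {
    # $1: directory to measure; remaining arguments: the cleanup command
    dir=$1
    shift
    before=$(du -sk "$dir" | cut -f1)   # size in KiB before cleanup
    "$@"                                # run the cleanup command as given
    after=$(du -sk "$dir" | cut -f1)    # size in KiB after cleanup
    echo "freed $((before - after)) KiB in $dir"
}
```

For example, `measure_freed ~/anaconda3/pkgs conda clean -a -y` would run the conda cleanup non-interactively and print the difference.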

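2. NVIDIA Driver (system level: shared by all environments)

2.1 Official site

https://www.nvidia.com/download/index.aspx?lang=en-us

2.2 Install the Windows 10 NVIDIA driver

2.3 Check the driver and CUDA version

nvidia-smi

Do not install any Linux GPU driver inside WSL; the Windows driver installed above is the one WSL uses.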

https://docs.nvidia.cn/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl-2

2.4 Driver drops out on an Ubuntu production server: Failed to initialize NVML: Driver/library version mismatch

2.4.1 nvidia-smi

Production environment: 4x V100
OS: Ubuntu 22.04
In the early hours, watch was still showing normal GPU usage:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-16GB           Off | 00000000:00:08.0 Off |                    0 |
| N/A   47C    P0             184W / 300W |   6945MiB / 16384MiB |     75%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-16GB           Off | 00000000:00:09.0 Off |                    0 |
| N/A   45C    P0             249W / 300W |   7863MiB / 16384MiB |     91%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-16GB           Off | 00000000:00:0A.0 Off |                    0 |
| N/A   45C    P0             194W / 300W |   7983MiB / 16384MiB |     75%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-16GB           Off | 00000000:00:0B.0 Off |                    0 |
| N/A   35C    P0              41W / 300W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A   1548534      C   python                                     6942MiB |
|    1   N/A  N/A   1548535      C   python                                     7860MiB |
|    2   N/A  N/A   1548536      C   python                                     7980MiB |
+---------------------------------------------------------------------------------------+

By noon it looked like this:

$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.104

nvtop, nvitop and gpustat all failed in the same way.

2.4.2 Investigating

  • Check the hardware
$ lspci | grep -i nvidia
00:08.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
00:09.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
00:0a.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
00:0b.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
  • Check the version of the loaded kernel module
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.86.10  Wed Jul 26 23:20:03 UTC 2023
GCC version:  gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04.1) 
  • Check the installed driver packages
$ dpkg -l | grep nvidia
ii  gpustat                               0.6.0-1                                     all          pretty nvidia device monitor
iU  libnvidia-cfg1-535:amd64              535.104.05-0ubuntu0.22.04.4                 amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-535                  535.86.10-0ubuntu1                          all          Shared files used by the NVIDIA libraries
iU  libnvidia-compute-535:amd64           535.104.05-0ubuntu0.22.04.4                 amd64        NVIDIA libcompute package
iU  libnvidia-decode-535:amd64            535.104.05-0ubuntu0.22.04.4                 amd64        NVIDIA Video Decoding runtime libraries
iU  libnvidia-encode-535:amd64            535.104.05-0ubuntu0.22.04.4                 amd64        NVENC Video Encoding runtime library
iU  libnvidia-extra-535:amd64             535.104.05-0ubuntu0.22.04.4                 amd64        Extra libraries for the NVIDIA driver
iU  libnvidia-fbc1-535:amd64              535.104.05-0ubuntu0.22.04.4                 amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-535:amd64                535.86.10-0ubuntu1                          amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
iU  nvidia-compute-utils-535              535.104.05-0ubuntu0.22.04.4                 amd64        NVIDIA compute utilities
iU  nvidia-dkms-535                       535.104.05-0ubuntu0.22.04.4                 amd64        NVIDIA DKMS package
iU  nvidia-driver-535                     535.104.05-0ubuntu0.22.04.4                 amd64        NVIDIA driver metapackage
iU  nvidia-firmware-535-535.104.05        535.104.05-0ubuntu0.22.04.4                 amd64        Firmware files used by the kernel module
ii  nvidia-kernel-common-535              535.86.10-0ubuntu1                          amd64        Shared files used with the kernel module
iU  nvidia-kernel-source-535              535.104.05-0ubuntu0.22.04.4                 amd64        NVIDIA kernel source package
ii  nvidia-modprobe                       535.86.10-0ubuntu1                          amd64        Load the NVIDIA kernel driver and create device files
ii  nvidia-prime                          0.8.17.1                                    all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                       535.86.10-0ubuntu1                          amd64        Tool for configuring the NVIDIA graphics driver
iU  nvidia-utils-535                      535.104.05-0ubuntu0.22.04.4                 amd64        NVIDIA driver support binaries
ii  screen-resolution-extra               0.18.2                                      all          Extension for the nvidia-settings control panel
iU  xserver-xorg-video-nvidia-535         535.104.05-0ubuntu0.22.04.4                 amd64        NVIDIA binary Xorg driver
  • Check the package log for driver changes
$ grep -i nvidia /var/log/dpkg.log
2023-09-27 06:18:38 upgrade nvidia-driver-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 status half-configured nvidia-driver-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked nvidia-driver-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status half-installed nvidia-driver-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked nvidia-driver-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 upgrade libnvidia-gl-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 status half-configured libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status half-installed libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status installed libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 upgrade nvidia-dkms-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 status half-configured nvidia-dkms-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-dkms-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status half-installed nvidia-dkms-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-dkms-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 upgrade nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 status half-configured nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status half-installed nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-kernel-source-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 install nvidia-firmware-535-535.104.05:amd64 <none> 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 status half-installed nvidia-firmware-535-535.104.05:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status unpacked nvidia-firmware-535-535.104.05:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 upgrade nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status half-configured nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status half-installed nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status installed nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 upgrade libnvidia-decode-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status half-configured libnvidia-decode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked libnvidia-decode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status half-installed libnvidia-decode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked libnvidia-decode-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 upgrade libnvidia-compute-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status half-configured libnvidia-compute-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked libnvidia-compute-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status half-installed libnvidia-compute-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-compute-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-extra-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-extra-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-extra-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-extra-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-extra-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-compute-utils-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-encode-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-encode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-encode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-encode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-encode-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade nvidia-utils-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured nvidia-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed nvidia-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-utils-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked xserver-xorg-video-nvidia-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-fbc1-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-cfg1-535:amd64 535.104.05-0ubuntu0.22.04.4

2023-09-27 06:18:38 upgrade nvidia-driver-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
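So the driver packages had been upgraded silently from 535.86.10 to 535.104.05. The kernel module still loaded in memory is 535.86.10, while the user-space driver libraries on disk are now 535.104.05, hence the version mismatch.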
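The comparison done by hand above can be scripted. The sketch below assumes the `/proc/driver/nvidia/version` format shown earlier (version number right after "Kernel Module") and uses `dpkg-query` to read the installed package version; the function names are made up for illustration.

```shell
# Sketch: compare the loaded kernel module version with an installed
# user-space package version. All function names are hypothetical.

kernel_module_version() {
    # Reads the text of /proc/driver/nvidia/version on stdin and prints
    # the version that follows "Kernel Module" (format as shown above).
    sed -n 's/.*Kernel Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

upstream_version() {
    # Strips the Debian revision: 535.104.05-0ubuntu0.22.04.4 -> 535.104.05
    printf '%s\n' "${1%%-*}"
}

check_driver_mismatch() {
    # $1: package to inspect, e.g. libnvidia-compute-535
    kmod=$(kernel_module_version < /proc/driver/nvidia/version)
    lib=$(upstream_version "$(dpkg-query -W -f='${Version}' "$1")")
    if [ "$kmod" = "$lib" ]; then
        echo "OK: kernel module and $1 are both $kmod"
    else
        echo "MISMATCH: kernel module $kmod, $1 $lib"
        return 1
    fi
}
```

On the machine above, `check_driver_mismatch libnvidia-compute-535` would be expected to report the 535.86.10 / 535.104.05 split seen in the listings.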

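2.4.3 Hold NVIDIA updates so production does not suddenly lose the driver

sudo apt-mark hold nvidia-driver-<version>

$ sudo apt-mark hold  nvidia-driver-535
nvidia-driver-535 set on hold.

2.4.4 Turn off automatic updates for all packages

To keep the software and environment stable in production, turn off automatic package updates:
sudo dpkg-reconfigure unattended-upgrades

$ sudo dpkg-reconfigure unattended-upgrades
Replacing config file /etc/apt/apt.conf.d/20auto-upgrades with new version

In the dialog, choose No to decline automatic download and installation of stable updates.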
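Both safeguards can be checked afterwards. This is a sketch with hypothetical function names; it assumes `/etc/apt/apt.conf.d/20auto-upgrades` uses the usual `APT::Periodic::Unattended-Upgrade "0";` or `"1";` form, and it uses `apt-mark showhold`, which lists held packages one per line.

```shell
# Sketch: confirm that the hold and the auto-upgrade switch took effect.
# Function names are hypothetical.

is_held() {
    # $1: package name. True when apt-mark lists it as held.
    apt-mark showhold | grep -qxF -e "$1"
}

auto_upgrade_enabled() {
    # $1: path to the periodic config, normally
    #     /etc/apt/apt.conf.d/20auto-upgrades
    # True only when unattended upgrades are switched on ("1").
    grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"1";' "$1"
}
```

For example, `is_held nvidia-driver-535 && echo held`, and `auto_upgrade_enabled /etc/apt/apt.conf.d/20auto-upgrades || echo "auto-upgrade is off"`.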

3. CUDA Toolkit (system level: shared by all environments)

3.1 CUDA Toolkit official site

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_local

Previous versions
https://developer.nvidia.com/cuda-toolkit-archive

CUDA on WSL User Guide
https://docs.nvidia.cn/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl-2

3.2 Basic installation

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

3.3 GPG key error

W: GPG error: file:/var/cuda-repo-wsl-ubuntu-12-1-local  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY CDD5140FF7B46061
E: The repository 'file:/var/cuda-repo-wsl-ubuntu-12-1-local  InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

Delete the GPG key
sudo apt-key del 7fa2af80
Install the GPG key
sudo cp /var/cuda-repo-wsl-ubuntu-12-1-local/cuda-F7B46061-keyring.gpg /usr/share/keyrings/

3.4 Check the CUDA installation

nvcc -V

3.5 Command 'nvcc' not found

Edit the shell configuration
vim ~/.bashrc
Add the CUDA paths

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export PATH=$PATH:/usr/local/cuda/bin
export CUDA_HOME=/usr/local/cuda

or append them from the shell:

echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"' >> ~/.bashrc
echo 'export PATH="$PATH:/usr/local/cuda/bin"' >> ~/.bashrc
echo 'export CUDA_HOME="/usr/local/cuda"'>> ~/.bashrc

Reload the configuration
source ~/.bashrc
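Running `echo ... >> ~/.bashrc` more than once leaves duplicate lines in the file. A small guard avoids that. This is a sketch; `append_once` is a hypothetical helper name.

```shell
# Sketch: append a line to a file only if that exact line is not there yet.
# append_once is a hypothetical helper name.
append_once() {
    # $1: file to modify; $2: exact line to add
    # grep -x matches whole lines, -F treats the text literally.
    grep -qxF -e "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}
```

For example, `append_once ~/.bashrc 'export PATH="$PATH:/usr/local/cuda/bin"'` can be run repeatedly and adds the line only the first time.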

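3.6 How the official CUDA version relates to, and differs from, the cudatoolkit version inside a virtual environment

3.6.1 Different installation methods

  • The official CUDA Toolkit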
    wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
    sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
    sudo dpkg -i cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
    sudo cp /var/cuda-repo-wsl-ubuntu-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get -y install cuda
  • cudatoolkit, installed by conda into a single environment
    conda install cudatoolkit=10.0 -c pytorch

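3.6.2 Running development environments with different CUDA versions

  • Install the official CUDA Toolkit, choosing the newest version that matches the GPU driver; it is backward compatible.
    It provides a complete development environment for building high-performance GPU-accelerated applications: GPU-accelerated libraries, debugging and optimization tools, a C/C++ compiler, and the runtime libraries for deploying applications.
  • The CUDA Toolkit version installed in a virtual environment must not be higher than the official CUDA version in the main environment.
    The cudatoolkit installed inside an environment is chosen to match the other packages there. It consists of runtime libraries and other dynamic libraries used to call CUDA functionality.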
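The version rule above can be checked with a small comparison. The sketch below parses the release number from `nvcc -V` output (the line of the form `Cuda compilation tools, release 12.1, V12.1.66`) and compares dotted versions field by field; `nvcc_release` and `version_le` are hypothetical helper names.

```shell
# Sketch: check that an environment's cudatoolkit version does not exceed
# the system toolkit reported by nvcc. Function names are hypothetical.

nvcc_release() {
    # Reads "nvcc -V" output on stdin and prints the release, e.g. 12.1
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

version_le() {
    # True (exit 0) when dotted version $1 <= $2, comparing each
    # dot-separated field as an integer; missing fields count as 0.
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 0
    }'
}
```

For example, `version_le 10.0 "$(nvcc -V | nvcc_release)" && echo "cudatoolkit 10.0 fits"`. The environment's own version can be read with `conda list cudatoolkit`.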

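4. cuDNN: GPU-accelerated library of deep neural network primitives (system level: shared by all environments)

4.1 Official site

https://developer.nvidia.com/rdp/cudnn-archive
Downloading requires logging in with a registered account.

4.2 Copy the cuDNN package to WSL over SSH

To install the SSH service on WSL2, see the separate guide.

4.3 Install zlib1g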

sudo apt-get install zlib1g

(base) fb@VP01:~/download$ conda activate modelscope
(modelscope) fb@VP01:~/download$ sudo apt-get install zlib1g
[sudo] password for fb:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
zlib1g is already the newest version (1:1.2.11.dfsg-2ubuntu9.2).
zlib1g set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 53 not upgraded.

4.4 Install cuDNN

4.4.1 Enable the local repository

sudo dpkg -i cudnn-local-repo-ubuntu2204-8.8.1.3_1.0-1_amd64.deb

[sudo] password for fb:
Selecting previously unselected package cudnn-local-repo-ubuntu2204-8.8.1.3.
(Reading database ... 40179 files and directories currently installed.)
Preparing to unpack cudnn-local-repo-ubuntu2204-8.8.1.3_1.0-1_amd64.deb ...
Unpacking cudnn-local-repo-ubuntu2204-8.8.1.3 (1.0-1) ...
Setting up cudnn-local-repo-ubuntu2204-8.8.1.3 (1.0-1) ...

The public cudnn-local-repo-ubuntu2204-8.8.1.3 GPG key does not appear to be installed.
To install the key, run this command:
sudo cp /var/cudnn-local-repo-ubuntu2204-8.8.1.3/cudnn-local-DB35EEEE-keyring.gpg /usr/share/keyrings/

4.4.2 Import the CUDA GPG key

sudo cp /var/cudnn-local-repo-ubuntu2204-8.8.1.3/cudnn-local-DB35EEEE-keyring.gpg /usr/share/keyrings/
Note: take the exact key import command from the last line of the previous step's output:

The public cudnn-local-repo-ubuntu2204-8.8.1.3 GPG key does not appear to be installed.
To install the key, run this command:
sudo cp /var/cudnn-local-repo-ubuntu2204-8.8.1.3/cudnn-local-DB35EEEE-keyring.gpg /usr/share/keyrings/

4.4.3 Refresh the repository metadata

sudo apt-get update

4.4.4 Install the runtime library

sudo apt-get install libcudnn8=8.8.1.3-1+cuda12.1

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package libcudnn8 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Version '8.8.1.3-1+cuda12.1' for 'libcudnn8' was not found

No package exists with the version string 8.8.1.3-1+cuda12.1. List the local repository to see which builds it actually contains:

ll /var/cudnn-local-repo-ubuntu2204-8.8.1.3/

(modelscope) fb@VP01:~/download$ ll /var
total 68
drwxr-xr-x 15 root root    4096 Apr  7 00:43 ./
drwxr-xr-x 19 root root    4096 Apr  6 22:50 ../
drwxr-xr-x  2 root root    4096 Apr 18  2022 backups/
drwxr-xr-x 11 root root    4096 Apr  6 23:06 cache/
drwxrwxrwt  2 root root    4096 Feb 11 05:36 crash/
drwxr-xr-x  2 root root   12288 Apr  5 11:52 cuda-repo-wsl-ubuntu-12-1-local/
drwxr-xr-x  2 root root    4096 Apr  7 00:43 cudnn-local-repo-ubuntu2204-8.8.1.3/
drwxr-xr-x 28 root root    4096 Feb 11 05:36 lib/
drwxrwsr-x  2 root staff   4096 Apr 18  2022 local/
lrwxrwxrwx  1 root root       9 Feb 11 05:35 lock -> /run/lock/
drwxrwxr-x  7 root syslog  4096 Apr  5 11:08 log/
drwxrwsr-x  2 root backup  4096 Feb 11 05:35 mail/
drwxr-xr-x  2 root root    4096 Feb 11 05:35 opt/
lrwxrwxrwx  1 root root       4 Feb 11 05:35 run -> /run/
drwxr-xr-x  7 root root    4096 Feb 11 05:36 snap/
drwxr-xr-x  4 root root    4096 Feb 11 05:35 spool/
drwxrwxrwt  2 root root    4096 Apr  5 22:48 tmp/
(modelscope) fb@VP01:~/download$ ll /var/cudnn-local-repo-ubuntu2204-8.8.1.3/
total 872792
drwxr-xr-x  2 root root      4096 Apr  7 00:43 ./
drwxr-xr-x 15 root root      4096 Apr  7 00:43 ../
-rw-r--r--  1 root root      1662 Mar  2 04:21 DB35EEEE.pub
-rw-r--r--  1 root root      1575 Mar  2 04:21 InRelease
-rw-r--r--  1 root root      1930 Mar  2 04:21 Local.md5
-rw-r--r--  1 root root       836 Mar  2 04:21 Local.md5.gpg
-rw-r--r--  1 root root      2114 Mar  2 04:21 Packages
-rw-r--r--  1 root root       947 Mar  2 04:21 Packages.gz
-rw-r--r--  1 root root       690 Mar  2 04:21 Release
-rw-r--r--  1 root root       836 Mar  2 04:21 Release.gpg
-rw-r--r--  1 root root      1141 Mar  2 04:21 cudnn-local-DB35EEEE-keyring.gpg
-rw-r--r--  1 root root 440032208 Mar  2 04:21 libcudnn8-dev_8.8.1.3-1+cuda12.0_amd64.deb
-rw-r--r--  1 root root   1664314 Mar  2 04:21 libcudnn8-samples_8.8.1.3-1+cuda12.0_amd64.deb
-rw-r--r--  1 root root 451984894 Mar  2 04:21 libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb

The repository ships the cuda12.0 build. Using that version string resolves the 'libcudnn8 was not found' error:
sudo apt-get install libcudnn8=8.8.1.3-1+cuda12.0

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  libcudnn8
0 upgraded, 1 newly installed, 0 to remove and 53 not upgraded.
Need to get 0 B/452 MB of archives.
After this operation, 1152 MB of additional disk space will be used.
Get:1 file:/var/cudnn-local-repo-ubuntu2204-8.8.1.3  libcudnn8 8.8.1.3-1+cuda12.0 [452 MB]
Selecting previously unselected package libcudnn8.
(Reading database ... 40195 files and directories currently installed.)
Preparing to unpack .../libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb ...
Unpacking libcudnn8 (8.8.1.3-1+cuda12.0) ...
Setting up libcudnn8 (8.8.1.3-1+cuda12.0) ...
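Instead of reading the directory listing by eye, the exact `name=version` strings can be pulled from the local repository's `Packages` index, which appears in the listing above. This is a sketch; `list_repo_versions` is a hypothetical helper name, and it assumes the standard Debian index layout with `Package:` and `Version:` fields.

```shell
# Sketch: print "package=version" for every entry in a Debian Packages index.
# list_repo_versions is a hypothetical helper name.
list_repo_versions() {
    # $1: path to a Packages file, e.g.
    #     /var/cudnn-local-repo-ubuntu2204-8.8.1.3/Packages
    # Remember the latest Package: name, print it when its Version: appears.
    awk '/^Package: / { pkg = $2 }
         /^Version: / { print pkg "=" $2 }' "$1"
}
```

Each printed line can be passed directly to `sudo apt-get install`. After `sudo apt-get update`, `apt-cache policy libcudnn8` shows the same version information from apt's point of view.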

4.4.5 Install the developer library

sudo apt-get install libcudnn8-dev=8.8.1.3-1+cuda12.0

4.4.6 Install the code samples and cuDNN library documentation

sudo apt-get install libcudnn8-samples=8.8.1.3-1+cuda12.0

4.5 Verify cuDNN

cp -r /usr/src/cudnn_samples_v8/ $HOME
cd  $HOME/cudnn_samples_v8/mnistCUDNN
make clean && make
./mnistCUDNN

4.5.1 test.c:1:10: fatal error: FreeImage.h: No such file or directory

The build fails with:

rm -rf *o
rm -rf mnistCUDNN
CUDA_VERSION is 12010
Linking agains cublasLt = true
CUDA VERSION: 12010
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 50 53 60 61 62 70 72 75 80 86 87 90
test.c:1:10: fatal error: FreeImage.h: No such file or directory
    1 | #include "FreeImage.h"
      |          ^~~~~~~~~~~~~
compilation terminated.

Install the missing FreeImage library:

sudo apt-get install libfreeimage3 libfreeimage-dev

4.5.2 nvcc fatal : Unsupported gpu architecture 'compute_35' (compute capability not supported)

rm -rf *o
rm -rf mnistCUDNN
CUDA_VERSION is 12030
Linking agains cublasLt = true
CUDA VERSION: 12030
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 35 50 53 60 61 62 70 72 75 80 86 87
/usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include  -ccbin g++ -m64    -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_87,code=compute_87 -o fp16_dev.o -c fp16_dev.cu
nvcc fatal   : Unsupported gpu architecture 'compute_35'
make: *** [Makefile:241: fp16_dev.o] Error 1

Edit the Makefile to disable architecture 35 (drop it from the SMS list printed in the build log):
sudo vi Makefile
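An alternative to editing the file may be to override the list on the make command line. This rests on two assumptions that are not confirmed here: that the sample Makefile builds its `-gencode` flags from a variable named `SMS` (the build log suggests so), and that it does not protect that variable with `override`. `drop_sm` is a hypothetical helper name.

```shell
# Sketch: remove one architecture from a space-separated SMS list.
# drop_sm is a hypothetical helper name.
drop_sm() {
    # $1: architecture to drop (e.g. 35); $2: space-separated SMS list
    out=""
    for sm in $2; do
        # Keep every entry except the one being dropped.
        [ "$sm" = "$1" ] || out="${out:+$out }$sm"
    done
    printf '%s\n' "$out"
}
```

Under those assumptions, the build could then be run as `make clean && make SMS="$(drop_sm 35 "35 50 53 60 61 62 70 72 75 80 86 87")"`.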
