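WSL2 Notes 2: Setting Up a Deep Learning Development Environment (Ubuntu + CUDA + cuDNN + PyTorch + TensorFlow + ONNX)
- 1. Anaconda installation and environment setup (system level, manages all environments)
- 2. NVIDIA Driver (system level, shared by all environments)
- 3. CUDA Toolkit (system level, shared by all environments)
- 4. cuDNN, the GPU-accelerated library of deep neural network primitives (system level, shared by all environments)
- 5. Deep learning frameworks
- 5.1 PyTorch (environment level, independent per environment)
- 5.2 TensorFlow (environment level, independent per environment)
- 5.2.1 Official site
- 5.2.2 Installing TensorFlow 2.x on Python 3.7
- 5.2.3 Installing legacy TensorFlow 1.15 on Python 3.7
- 5.2.4 Verifying the installation
- 5.2.5 Runtime errors
- 5.2.5.1 Could not load dynamic library 'libcudart.so.10.0'
- 5.2.5.2 Could not load dynamic library 'libcudnn.so.7'
- 5.2.5.3 /usr/local/cuda/targets/x86_64-linux/lib/libcudnn.so.8 is not a symbolic link
- 5.2.5.4 error code is libcuda.so: cannot open shared object file
- 5.2.5.5 Runtime error: protobuf version mismatch
- 5.2.5.6 This TensorFlow binary is optimized with Intel(R) MKL-DNN
- 5.2.5.6 WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
- 5.3 ONNX Runtime (ORT) (environment level, independent per environment)
- 5.4 Xformers (environment level, independent per environment)
- 6. Tested version combinations that coexist in one environment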
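1. Anaconda installation and environment setup (system level, manages all environments)
Anaconda official installer archive
https://repo.anaconda.com/archive/
1.1 Create a download directory
cd ~
mkdir download
cd download
Download the Anaconda installer
wget https://repo.anaconda.com/archive/Anaconda3-2023.03-Linux-x86_64.sh
1.2 Install Anaconda
bash Anaconda3-2023.03-Linux-x86_64.sh
Create a Python virtual environment
conda create -n <name> python=<version>
Activate the environment
conda activate <name>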
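1.3 A counterproductive extra step (do not do this)
Setting the Anaconda path by hand
$ vim ~/.bashrc
Add the install path
# Anaconda3
export PATH="/home/XXXX/anaconda3/bin:$PATH"
source activate
or
echo 'export PATH="~/anaconda3/bin:$PATH"' >> ~/.bashrc
echo 'source activate' >> ~/.bashrc
Reload the configuration
source ~/.bashrc
The result of this mistake was that every virtual environment ran with the Python version of base. Environments could no longer use different Python versions, which defeats the purpose of having virtual environments.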
1.4 Disk cleanup
Clean up caches and dependency packages regularly to free disk space.
- Before cleanup
$ sudo du -sh /home/gpu/anaconda3/pkgs/
[sudo] password for gpu:
174G /home/gpu/anaconda3/pkgs/
- After cleanup
$ sudo du -sh /home/gpu/anaconda3/pkgs/
84G /home/gpu/anaconda3/pkgs/
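1.4.1 Check disk space
df -hl
1.4.2 apt-get cleanup
- Clear the download cache
sudo apt-get clean
- Remove dependency packages that are no longer needed
sudo apt-get autoremove
- Remove cached package files that can no longer be downloaded
sudo apt-get autoclean
1.4.3 Anaconda3 cleanup
- Show conda disk usage
sudo du -sh ~/anaconda3/*
- Remove index caches and unused cached packages; existing environments are not affected
conda clean -a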
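2. NVIDIA Driver (system level, shared by all environments)
2.1 Official site
https://www.nvidia.com/download/index.aspx?lang=en-us
2.2 Install the Windows 10 NVIDIA driver on the Windows host
2.3 Check the driver and CUDA version
nvidia-smi
Do not install any Linux display driver inside WSL; WSL 2 uses the driver installed on the Windows host.
https://docs.nvidia.cn/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl-2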
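2.4 Driver dropping out on a production Ubuntu server: Failed to initialize NVML: Driver/library version mismatch
2.4.1 nvidia-smi
Production environment: 4x V100
OS version: Ubuntu 22.04
In the early hours of the morning, watch was still showing normal usage: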
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla V100-SXM2-16GB Off | 00000000:00:08.0 Off | 0 |
| N/A 47C P0 184W / 300W | 6945MiB / 16384MiB | 75% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2-16GB Off | 00000000:00:09.0 Off | 0 |
| N/A 45C P0 249W / 300W | 7863MiB / 16384MiB | 91% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2-16GB Off | 00000000:00:0A.0 Off | 0 |
| N/A 45C P0 194W / 300W | 7983MiB / 16384MiB | 75% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2-16GB Off | 00000000:00:0B.0 Off | 0 |
| N/A 35C P0 41W / 300W | 0MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1548534 C python 6942MiB |
| 1 N/A N/A 1548535 C python 7860MiB |
| 2 N/A N/A 1548536 C python 7980MiB |
+---------------------------------------------------------------------------------------+
By noon it looked like this:
$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.104
nvtop, nvitop, and gpustat all failed as well.
2.4.2 Investigation
- Check the hardware
$ lspci | grep -i nvidia
00:08.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
00:09.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
00:0a.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
00:0b.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev a1)
- Check the version of the loaded kernel module
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.86.10 Wed Jul 26 23:20:03 UTC 2023
GCC version: gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04.1)
- Check the installed driver packages
$ dpkg -l | grep nvidia
ii gpustat 0.6.0-1 all pretty nvidia device monitor
iU libnvidia-cfg1-535:amd64 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-common-535 535.86.10-0ubuntu1 all Shared files used by the NVIDIA libraries
iU libnvidia-compute-535:amd64 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA libcompute package
iU libnvidia-decode-535:amd64 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA Video Decoding runtime libraries
iU libnvidia-encode-535:amd64 535.104.05-0ubuntu0.22.04.4 amd64 NVENC Video Encoding runtime library
iU libnvidia-extra-535:amd64 535.104.05-0ubuntu0.22.04.4 amd64 Extra libraries for the NVIDIA driver
iU libnvidia-fbc1-535:amd64 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-gl-535:amd64 535.86.10-0ubuntu1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
iU nvidia-compute-utils-535 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA compute utilities
iU nvidia-dkms-535 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA DKMS package
iU nvidia-driver-535 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA driver metapackage
iU nvidia-firmware-535-535.104.05 535.104.05-0ubuntu0.22.04.4 amd64 Firmware files used by the kernel module
ii nvidia-kernel-common-535 535.86.10-0ubuntu1 amd64 Shared files used with the kernel module
iU nvidia-kernel-source-535 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA kernel source package
ii nvidia-modprobe 535.86.10-0ubuntu1 amd64 Load the NVIDIA kernel driver and create device files
ii nvidia-prime 0.8.17.1 all Tools to enable NVIDIA's Prime
ii nvidia-settings 535.86.10-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
iU nvidia-utils-535 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA driver support binaries
ii screen-resolution-extra 0.18.2 all Extension for the nvidia-settings control panel
iU xserver-xorg-video-nvidia-535 535.104.05-0ubuntu0.22.04.4 amd64 NVIDIA binary Xorg driver
- Check the driver upgrade log (dpkg log)
$ grep nvidia /var/log/dpkg.log
2023-09-27 06:18:38 upgrade nvidia-driver-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 status half-configured nvidia-driver-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked nvidia-driver-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status half-installed nvidia-driver-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked nvidia-driver-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 upgrade libnvidia-gl-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 status half-configured libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status half-installed libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status unpacked libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 status installed libnvidia-gl-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:38 upgrade nvidia-dkms-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 status half-configured nvidia-dkms-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-dkms-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status half-installed nvidia-dkms-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-dkms-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 upgrade nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 status half-configured nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status half-installed nvidia-kernel-source-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:46 status unpacked nvidia-kernel-source-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 install nvidia-firmware-535-535.104.05:amd64 <none> 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:46 status half-installed nvidia-firmware-535-535.104.05:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status unpacked nvidia-firmware-535-535.104.05:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 upgrade nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status half-configured nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status half-installed nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status installed nvidia-kernel-common-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 upgrade libnvidia-decode-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status half-configured libnvidia-decode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked libnvidia-decode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status half-installed libnvidia-decode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked libnvidia-decode-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 upgrade libnvidia-compute-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:47 status half-configured libnvidia-compute-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status unpacked libnvidia-compute-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:47 status half-installed libnvidia-compute-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-compute-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-extra-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-extra-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-extra-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-extra-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-extra-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed nvidia-compute-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-compute-utils-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-encode-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-encode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-encode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-encode-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-encode-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade nvidia-utils-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured nvidia-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed nvidia-utils-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked nvidia-utils-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed xserver-xorg-video-nvidia-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked xserver-xorg-video-nvidia-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-fbc1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-fbc1-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 upgrade libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:48 status half-configured libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status half-installed libnvidia-cfg1-535:amd64 535.86.10-0ubuntu1
2023-09-27 06:18:48 status unpacked libnvidia-cfg1-535:amd64 535.104.05-0ubuntu0.22.04.4
2023-09-27 06:18:38 upgrade nvidia-driver-535:amd64 535.86.10-0ubuntu1 535.104.05-0ubuntu0.22.04.4
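It turned out the driver packages had been silently upgraded from 535.86.10 to 535.104.05, so the loaded NVIDIA kernel module (535.86.10) no longer matched the user-space driver libraries (535.104.05).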
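The comparison made by hand above can be sketched as a small shell check. This is a minimal sketch, not the author's procedure: nvrm_version and check_nvidia_mismatch are hypothetical helper names, and using the libnvidia-compute-535 package version as the user-space version is an assumption.

```shell
# Read the text of /proc/driver/nvidia/version on stdin and print
# the kernel module version (for example 535.86.10).
nvrm_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Compare a kernel module version ($1) with a Debian package version ($2).
# The Debian revision after the first "-" is ignored.
check_nvidia_mismatch() {
    kernel_ver=$1
    pkg_ver=${2%%-*}
    if [ "$kernel_ver" = "$pkg_ver" ]; then
        echo "match: $kernel_ver"
    else
        echo "mismatch: kernel module $kernel_ver, user-space $pkg_ver"
        return 1
    fi
}

# Usage on a real machine (assumes the 535 driver series is installed):
# kernel_ver=$(nvrm_version < /proc/driver/nvidia/version)
# pkg_ver=$(dpkg-query -W -f='${Version}' libnvidia-compute-535)
# check_nvidia_mismatch "$kernel_ver" "$pkg_ver"
```

Run from cron or a monitoring hook, a non-zero exit status would flag the mismatch before a training job fails.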
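2.4.3 Hold NVIDIA updates so the production driver does not suddenly drop out
sudo apt-mark hold nvidia-driver-<version>
$ sudo apt-mark hold nvidia-driver-535
nvidia-driver-535 set on hold.
2.4.4 Disable automatic updates for all packages
To keep the software and environment stable in production, turn off automatic package updates.
sudo dpkg-reconfigure unattended-upgrades
$ sudo dpkg-reconfigure unattended-upgrades
Replacing config file /etc/apt/apt.conf.d/20auto-upgrades with new version
Select No to decline automatically downloading and installing stable updates.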
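Both settings can be verified afterwards. The sketch below assumes that apt-mark showhold prints one package name per line and that 20auto-upgrades uses the usual APT::Periodic key format; unattended_upgrades_disabled and is_held are hypothetical helper names.

```shell
# Succeeds when the apt periodic configuration read from stdin
# sets APT::Periodic::Unattended-Upgrade to "0".
unattended_upgrades_disabled() {
    grep -Eq '^[[:space:]]*APT::Periodic::Unattended-Upgrade[[:space:]]+"0";'
}

# Succeeds when package name $1 appears in the list of held
# packages read from stdin (one name per line).
is_held() {
    grep -Fxq "$1"
}

# Usage on a real machine:
# apt-mark showhold | is_held nvidia-driver-535 && echo "driver is held"
# unattended_upgrades_disabled < /etc/apt/apt.conf.d/20auto-upgrades \
#     && echo "unattended upgrades are off"
```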
3. CUDA Toolkit (system level, shared by all environments)
3.1 CUDA Toolkit official site
Release archive
https://developer.nvidia.com/cuda-toolkit-archive
CUDA on WSL User Guide
https://docs.nvidia.cn/cuda/wsl-user-guide/index.html#getting-started-with-cuda-on-wsl-2
3.2 Basic installation
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
3.3 GPG key error
W: GPG error: file:/var/cuda-repo-wsl-ubuntu-12-1-local InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY CDD5140FF7B46061
E: The repository 'file:/var/cuda-repo-wsl-ubuntu-12-1-local InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
Remove the old GPG key
sudo apt-key del 7fa2af80
Install the new GPG key
sudo cp /var/cuda-repo-wsl-ubuntu-12-1-local/cuda-F7B46061-keyring.gpg /usr/share/keyrings/
3.4 Check the CUDA status
nvcc -V
3.5 Command 'nvcc' not found
Edit the shell configuration
vim ~/.bashrc
Add the CUDA paths
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export PATH=$PATH:/usr/local/cuda/bin
export CUDA_HOME=/usr/local/cuda
or
echo 'export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"' >> ~/.bashrc
echo 'export PATH="$PATH:/usr/local/cuda/bin"' >> ~/.bashrc
echo 'export CUDA_HOME="/usr/local/cuda"' >> ~/.bashrc
Reload the configuration
source ~/.bashrc
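Re-running the echo commands or re-sourcing ~/.bashrc appends the same directories again, and an empty LD_LIBRARY_PATH leaves a leading colon, which the dynamic loader treats as the current directory. A minimal sketch of one way to avoid both, assuming the default /usr/local/cuda install location; append_path is a hypothetical helper, not part of CUDA or conda.

```shell
# Print list $1 with directory $2 appended, unless $2 is already in it.
# $1 is a colon-separated list and may be empty.
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;          # already present
        *)
            if [ -n "$1" ]; then
                printf '%s\n' "$1:$2"            # append to a non-empty list
            else
                printf '%s\n' "$2"               # first entry, no stray colon
            fi
            ;;
    esac
}

PATH=$(append_path "$PATH" /usr/local/cuda/bin)
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /usr/local/cuda/lib64)
CUDA_HOME=/usr/local/cuda   # a single directory, not a list
export PATH LD_LIBRARY_PATH CUDA_HOME
```

Placed in ~/.bashrc, this block can be sourced repeatedly without growing the variables.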
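3.6 How the official CUDA version relates to, and differs from, the cudatoolkit version in a virtual environment
3.6.1 Different installation methods
- The official CUDA Toolkit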
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-1-local_12.1.0-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
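- The cudatoolkit package that conda installs into an environment
conda install cudatoolkit=10.0 -c pytorch
3.6.2 Using different CUDA versions for development
- Install the official CUDA Toolkit, choosing the newest release that matches the GPU driver; it is backward compatible.
It provides a complete development environment for building high-performance GPU-accelerated applications, including GPU-accelerated libraries, debugging and optimization tools, a C/C++ compiler, and a runtime library for deploying applications.
- The CUDA Toolkit version installed in a virtual environment must not be higher than the official CUDA version in the main environment.
To match the versions of other software in an environment, a different cudatoolkit version can be installed there. That package consists of runtime libraries and other shared libraries used to call CUDA functionality.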
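The rule stated above (environment cudatoolkit version not higher than the system CUDA version) can be checked with a version comparison. This is a sketch under assumptions: it relies on GNU sort -V, nvcc_release and version_le are hypothetical helper names, and the cudatoolkit version is passed in by hand.

```shell
# Read `nvcc -V` output on stdin and print the release number,
# for example 12.1 from "... release 12.1, V12.1.66".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Succeeds when version $1 is less than or equal to version $2.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage on a real machine:
# system_cuda=$(nvcc -V | nvcc_release)
# if version_le 10.0 "$system_cuda"; then
#     echo "cudatoolkit 10.0 does not exceed system CUDA $system_cuda"
# fi
```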
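4. cuDNN, the GPU-accelerated library of deep neural network primitives (system level, shared by all environments)
4.1 Official site
https://developer.nvidia.com/rdp/cudnn-archive
Downloading requires logging in with a registered account.
4.2 Transfer the cuDNN installer to WSL over SSH
To install an SSH service on WSL2, see here
4.3 Install zlib1g
sudo apt-get install zlib1g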
(base) fb@VP01:~/download$ conda activate modelscope
(modelscope) fb@VP01:~/download$ sudo apt-get install zlib1g
[sudo] password for fb:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
zlib1g is already the newest version (1:1.2.11.dfsg-2ubuntu9.2).
zlib1g set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 53 not upgraded.
4.4 Install cuDNN
4.4.1 Enable the local repository
sudo dpkg -i cudnn-local-repo-ubuntu2204-8.8.1.3_1.0-1_amd64.deb
[sudo] password for fb:
Selecting previously unselected package cudnn-local-repo-ubuntu2204-8.8.1.3.
(Reading database ... 40179 files and directories currently installed.)
Preparing to unpack cudnn-local-repo-ubuntu2204-8.8.1.3_1.0-1_amd64.deb ...
Unpacking cudnn-local-repo-ubuntu2204-8.8.1.3 (1.0-1) ...
Setting up cudnn-local-repo-ubuntu2204-8.8.1.3 (1.0-1) ...
The public cudnn-local-repo-ubuntu2204-8.8.1.3 GPG key does not appear to be installed.
To install the key, run this command:
sudo cp /var/cudnn-local-repo-ubuntu2204-8.8.1.3/cudnn-local-DB35EEEE-keyring.gpg /usr/share/keyrings/
4.4.2 Import the CUDA GPG key
sudo cp /var/cudnn-local-repo-ubuntu2204-8.8.1.3/cudnn-local-DB35EEEE-keyring.gpg /usr/share/keyrings/
Note: copy the key import command from the last line of the previous step's output:
The public cudnn-local-repo-ubuntu2204-8.8.1.3 GPG key does not appear to be installed.
To install the key, run this command:
sudo cp /var/cudnn-local-repo-ubuntu2204-8.8.1.3/cudnn-local-DB35EEEE-keyring.gpg /usr/share/keyrings/
4.4.3 Refresh the repository metadata
sudo apt-get update
4.4.4 Install the runtime library
sudo apt-get install libcudnn8=8.8.1.3-1+cuda12.1
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package libcudnn8 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Version '8.8.1.3-1+cuda12.1' for 'libcudnn8' was not found
No package with the version string 8.8.1.3-1+cuda12.1 exists. List the local repository to see which packages it actually contains:
ll /var/cudnn-local-repo-ubuntu2204-8.8.1.3/
(modelscope) fb@VP01:~/download$ ll /var
total 68
drwxr-xr-x 15 root root 4096 Apr 7 00:43 ./
drwxr-xr-x 19 root root 4096 Apr 6 22:50 ../
drwxr-xr-x 2 root root 4096 Apr 18 2022 backups/
drwxr-xr-x 11 root root 4096 Apr 6 23:06 cache/
drwxrwxrwt 2 root root 4096 Feb 11 05:36 crash/
drwxr-xr-x 2 root root 12288 Apr 5 11:52 cuda-repo-wsl-ubuntu-12-1-local/
drwxr-xr-x 2 root root 4096 Apr 7 00:43 cudnn-local-repo-ubuntu2204-8.8.1.3/
drwxr-xr-x 28 root root 4096 Feb 11 05:36 lib/
drwxrwsr-x 2 root staff 4096 Apr 18 2022 local/
lrwxrwxrwx 1 root root 9 Feb 11 05:35 lock -> /run/lock/
drwxrwxr-x 7 root syslog 4096 Apr 5 11:08 log/
drwxrwsr-x 2 root backup 4096 Feb 11 05:35 mail/
drwxr-xr-x 2 root root 4096 Feb 11 05:35 opt/
lrwxrwxrwx 1 root root 4 Feb 11 05:35 run -> /run/
drwxr-xr-x 7 root root 4096 Feb 11 05:36 snap/
drwxr-xr-x 4 root root 4096 Feb 11 05:35 spool/
drwxrwxrwt 2 root root 4096 Apr 5 22:48 tmp/
(modelscope) fb@VP01:~/download$ ll /var/cudnn-local-repo-ubuntu2204-8.8.1.3/
total 872792
drwxr-xr-x 2 root root 4096 Apr 7 00:43 ./
drwxr-xr-x 15 root root 4096 Apr 7 00:43 ../
-rw-r--r-- 1 root root 1662 Mar 2 04:21 DB35EEEE.pub
-rw-r--r-- 1 root root 1575 Mar 2 04:21 InRelease
-rw-r--r-- 1 root root 1930 Mar 2 04:21 Local.md5
-rw-r--r-- 1 root root 836 Mar 2 04:21 Local.md5.gpg
-rw-r--r-- 1 root root 2114 Mar 2 04:21 Packages
-rw-r--r-- 1 root root 947 Mar 2 04:21 Packages.gz
-rw-r--r-- 1 root root 690 Mar 2 04:21 Release
-rw-r--r-- 1 root root 836 Mar 2 04:21 Release.gpg
-rw-r--r-- 1 root root 1141 Mar 2 04:21 cudnn-local-DB35EEEE-keyring.gpg
-rw-r--r-- 1 root root 440032208 Mar 2 04:21 libcudnn8-dev_8.8.1.3-1+cuda12.0_amd64.deb
-rw-r--r-- 1 root root 1664314 Mar 2 04:21 libcudnn8-samples_8.8.1.3-1+cuda12.0_amd64.deb
-rw-r--r-- 1 root root 451984894 Mar 2 04:21 libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb
The file names show the correct version string (cuda12.0, not cuda12.1). Using it resolves the 'libcudnn8' was not found error:
sudo apt-get install libcudnn8=8.8.1.3-1+cuda12.0
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
libcudnn8
0 upgraded, 1 newly installed, 0 to remove and 53 not upgraded.
Need to get 0 B/452 MB of archives.
After this operation, 1152 MB of additional disk space will be used.
Get:1 file:/var/cudnn-local-repo-ubuntu2204-8.8.1.3 libcudnn8 8.8.1.3-1+cuda12.0 [452 MB]
Selecting previously unselected package libcudnn8.
(Reading database ... 40195 files and directories currently installed.)
Preparing to unpack .../libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb ...
Unpacking libcudnn8 (8.8.1.3-1+cuda12.0) ...
Setting up libcudnn8 (8.8.1.3-1+cuda12.0) ...
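Instead of guessing the version suffix, it can be read from the .deb file names in the local repository. A minimal sketch, assuming the standard <name>_<version>_<arch>.deb naming; deb_version_from_filename is a hypothetical helper. On a real system, apt-cache madison libcudnn8 or dpkg-deb -f <file> Version should report the same string.

```shell
# Print the version embedded in a Debian package file name of the
# form <name>_<version>_<arch>.deb.
deb_version_from_filename() {
    deb_base=${1##*/}            # drop any leading directory
    deb_base=${deb_base%.deb}    # drop the extension
    deb_rest=${deb_base#*_}      # drop "<name>_"
    printf '%s\n' "${deb_rest%_*}"   # drop "_<arch>"
}

# Usage on a real machine:
# f=/var/cudnn-local-repo-ubuntu2204-8.8.1.3/libcudnn8_8.8.1.3-1+cuda12.0_amd64.deb
# sudo apt-get install "libcudnn8=$(deb_version_from_filename "$f")"
```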
4.4.5 Install the developer library
sudo apt-get install libcudnn8-dev=8.8.1.3-1+cuda12.0
4.4.6 Install the code samples and cuDNN library documentation
sudo apt-get install libcudnn8-samples=8.8.1.3-1+cuda12.0
4.5 Verify cuDNN
cp -r /usr/src/cudnn_samples_v8/ $HOME
cd $HOME/cudnn_samples_v8/mnistCUDNN
make clean && make
./mnistCUDNN
4.5.1 test.c:1:10: fatal error: FreeImage.h: No such file or directory
The build fails with:
rm -rf *o
rm -rf mnistCUDNN
CUDA_VERSION is 12010
Linking agains cublasLt = true
CUDA VERSION: 12010
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 50 53 60 61 62 70 72 75 80 86 87 90
test.c:1:10: fatal error: FreeImage.h: No such file or directory
1 | #include "FreeImage.h"
| ^~~~~~~~~~~~~
compilation terminated.
Install the missing FreeImage library
sudo apt-get install libfreeimage3 libfreeimage-dev
4.5.2 nvcc fatal : Unsupported gpu architecture 'compute_35' (compute capability not supported)
rm -rf *o
rm -rf mnistCUDNN
CUDA_VERSION is 12030
Linking agains cublasLt = true
CUDA VERSION: 12030
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 35 50 53 60 61 62 70 72 75 80 86 87
/usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -I/usr/local/cuda/include -IFreeImage/include -ccbin g++ -m64 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_87,code=compute_87 -o fp16_dev.o -c fp16_dev.cu
nvcc fatal : Unsupported gpu architecture 'compute_35'
make: *** [Makefile:241: fp16_dev.o] Error 1
Edit the Makefile to exclude 35 (CUDA 12 no longer supports compute capability 3.5)
sudo vi Makefile
Change $(SMS) to $(filter-out 35, $(SMS)):
ifeq ($(GENCODE_FLAGS),)
# Generate SASS code for each SM architecture listed in $(SMS)
$(foreach sm,$(filter-out 35, $(SMS)),$(eval GENCODE_FLAGS += -gencode arch=compute_$(sm),code=sm_$(sm)))
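The effect of $(filter-out 35, $(SMS)) can be mimicked in shell to see which architectures remain. This is only a sketch: filter_out is a hypothetical helper, and passing SMS on the make command line instead of editing the Makefile is an untested assumption that relies on make letting command-line variables override Makefile assignments.

```shell
# Print the space-separated list $2 with every word equal to $1 removed,
# similar to make's $(filter-out $1, $2) for a single literal word.
filter_out() {
    fo_result=""
    for fo_item in $2; do
        [ "$fo_item" = "$1" ] || fo_result="${fo_result:+$fo_result }$fo_item"
    done
    printf '%s\n' "$fo_result"
}

# Usage (assumption: the samples Makefile accepts SMS from the command line):
# make clean && make SMS="$(filter_out 35 "35 50 53 60 61 62 70 72 75 80 86 87")"
```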
Look up GPU compute capability
https://developer.nvidia.com/cuda-gpus
4.5.3 Successful build and run
make clean && make
./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 8801 , CUDNN_VERSION from cudnn.h : 8801 (8.8.1)
Host compiler version : GCC 11.3.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 128 Capabilities 8.9, SmClock 2535.0 Mhz, MemSize (Mb) 24563, MemClock 10501.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.015360 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.018432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.022528 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.192608 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 9.770848 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 76.177406 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.101376 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.140416 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.149504 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.186304 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 19.718143 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 20.643841 time requiring 2450080 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000562 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.018432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.020480 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.025600 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.069536 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.128000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.137216 time requiring 178432 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.067584 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.105472 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for