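I have set up CUDA/cuDNN runtime environments on Linux hosts many times, so I am recording the shell commands I used here for reuse.
A word of caution: first find out exactly which GPU model you have, then check which releases of the driver, CUDA, and cuDNN are compatible with the host's Linux kernel.
Run all bash commands in this post as root.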
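The two preconditions above (root privileges, and knowing the kernel and GPU model) can be checked before anything is installed. The following is only a sketch: the function names `is_root_uid` and `preflight` are made up for illustration, and `lspci` is assumed to come from the pciutils package, which may not be installed on every host.

```shell
# Succeed only when the given numeric UID is 0 (root).
is_root_uid() {
    [ "$1" -eq 0 ]
}

# Print the facts needed to pick compatible driver/CUDA/cuDNN releases.
preflight() {
    if ! is_root_uid "$(id -u)"; then
        echo "please run as root" >&2
        return 1
    fi
    echo "kernel: $(uname -r)"
    # lspci is optional here; skip the GPU listing when it is missing.
    if command -v lspci >/dev/null 2>&1; then
        lspci | grep -i nvidia || echo "no NVIDIA device listed by lspci"
    fi
}
```

Calling `preflight` at the top of an install script stops early when the script is not run as root.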
1. Install the NVIDIA driver
# WIKI: https://download.nvidia.com/XFree86/Linux-x86_64/375.20/README/installdriver.html
wget http://us.download.nvidia.com/tesla/384.145/NVIDIA-Linux-x86_64-384.145.run && \
chmod u+x NVIDIA-Linux-x86_64-384.145.run && \
./NVIDIA-Linux-x86_64-384.145.run --silent --dkms --accept-license
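Because the installer runs with --silent, it can be worth confirming afterwards that the loaded driver matches the runfile that was installed. A minimal sketch, assuming the runfile name always ends in `-<version>.run`; the helper names `runfile_version` and `check_driver` are hypothetical.

```shell
# Extract the version from a runfile name,
# e.g. NVIDIA-Linux-x86_64-384.145.run -> 384.145
runfile_version() {
    local name
    name=$(basename "$1" .run)   # strip directory and the .run suffix
    echo "${name##*-}"           # keep what follows the last dash
}

# Compare the runfile version with what the loaded driver reports.
check_driver() {
    local expected actual
    expected=$(runfile_version "$1")
    actual=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if [ "$expected" = "$actual" ]; then
        echo "driver $actual is loaded"
    else
        echo "expected driver $expected, found '$actual'" >&2
        return 1
    fi
}
```

Usage on a host with the driver installed: `check_driver NVIDIA-Linux-x86_64-384.145.run`.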
2. Enable persistence mode
nvidia-smi -pm ENABLED # WIKI https://docs.nvidia.com/deploy/driver-persistence/index.html
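To confirm the setting took effect on every card, the persistence state can be queried per GPU. This is a sketch: `all_persistent` is a hypothetical helper, and it assumes that each line printed by the query is exactly `Enabled` or `Disabled`.

```shell
# Read one persistence state per line on stdin.
# Succeed only if there is at least one line and every line is "Enabled".
all_persistent() {
    local state seen=0
    while IFS= read -r state; do
        seen=1
        [ "$state" = "Enabled" ] || return 1
    done
    [ "$seen" -eq 1 ]
}

# Usage (needs the driver installed):
#   nvidia-smi --query-gpu=persistence_mode --format=csv,noheader | all_persistent \
#       && echo "persistence mode is on for all GPUs"
```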
3. View GPU device information
nvidia-smi
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 384.145                Driver Version: 384.145                   |
# |-------------------------------+----------------------+----------------------+
# | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
# | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
# |===============================+======================+======================|
# |   0  Tesla V100-PCIE...  Off  | 00000000:1A:00.0 Off |                    0 |
# | N/A   34C    P0    37W / 250W |      0MiB / 16152MiB |      0%      Default |
# +-------------------------------+----------------------+----------------------+
# |   1  Tesla V100-PCIE...  Off  | 00000000:1F:00.0 Off |                    0 |
# | N/A   36C    P0    36W / 250W |      0MiB / 16152MiB |      0%      Default |
# +-------------------------------+----------------------+----------------------+

nvidia-smi topo --matrix  # show topology information
#         GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  mlx5_1  mlx5_0  CPU Affinity
# GPU0    X     PIX   PIX   PIX   SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
# GPU1    PIX   X     PIX   PIX   SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
# GPU2    PIX   PIX   X     PIX   SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
# GPU3    PIX   PIX   PIX   X     SYS   SYS   SYS   SYS   SYS     SYS     0-15,32-47
# GPU4    SYS   SYS   SYS   SYS   X     PIX   PIX   PIX   NODE    NODE    16-31,48-63
# GPU5    SYS   SYS   SYS   SYS   PIX   X     PIX   PIX   NODE    NODE    16-31,48-63
# GPU6    SYS   SYS   SYS   SYS   PIX   PIX   X     PIX   NODE    NODE    16-31,48-63
# GPU7    SYS   SYS   SYS   SYS   PIX   PIX   PIX   X     NODE    NODE    16-31,48-63
# mlx5_1  SYS   SYS   SYS   SYS   NODE  NODE  NODE  NODE  X       PIX
# mlx5_0  SYS   SYS   SYS   SYS   NODE  NODE  NODE  NODE  PIX     X

nvidia-smi --id=0 --format=csv --query-gpu=utilization.gpu,memory.used
# utilization.gpu [%], memory.used [MiB]
# 0 %, 0 MiB
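The CSV query form shown last is convenient for scripting. As a sketch, the hypothetical helper `idle_gpus` below lists GPUs that look free; it assumes the query is run with `--format=csv,noheader,nounits` so that each line has the shape `index, utilization, memory_used`.

```shell
# stdin:  lines of the form "index, utilization, memory_used"
# stdout: indices of GPUs at 0% utilization using at most $1 MiB (default 0)
idle_gpus() {
    awk -F', *' -v max_mem="${1:-0}" '$2 == 0 && $3 <= max_mem { print $1 }'
}

# Usage (needs the driver installed):
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
#              --format=csv,noheader,nounits | idle_gpus 100
```

The memory threshold is a parameter because some memory can stay allocated even when no compute job is running.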
4. Install the CUDA Toolkit
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run && \
chmod u+x cuda_9.0.176_384.81_linux-run && \
./cuda_9.0.176_384.81_linux-run --toolkit --silent --verbose

# Register the CUDA library directories with the dynamic linker, one directory per line
cat << EOF >> /etc/ld.so.conf.d/cuda.conf
/usr/local/cuda/lib64
/usr/local/cuda/extras/CUPTI/lib64
EOF
ldconfig

# Put the CUDA binaries (nvcc etc.) on PATH for login shells
cat << EOF >> /etc/profile.d/cuda.sh
export PATH=/usr/local/cuda/bin:\$PATH
EOF
source /etc/profile
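Once PATH and the linker cache are updated, the installed toolkit release can be read back from `nvcc --version`. A rough sketch: `cuda_release` is a hypothetical helper, and it assumes the output contains a line of the form `Cuda compilation tools, release 9.0, V9.0.176`.

```shell
# Read `nvcc --version` output on stdin and print the release number, e.g. 9.0
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Usage (needs the toolkit installed):
#   nvcc --version | cuda_release
#   ldconfig -p | grep libcudart    # confirm the runtime library is in the linker cache
```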
5. Install cuDNN
# Downloading cuDNN requires an NVIDIA account. Visiting the URL below directly redirects to the login page.
# https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v7.0.5/prod/9.0_20171129/Ubuntu16_04-x64/libcudnn7_7.0.5.15-1+cuda9.0_amd64
dpkg -i libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb  # installs into /usr/lib/x86_64-linux-gnu
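The package name encodes both the cuDNN version and the CUDA release it was built for, which makes a quick compatibility check possible before running dpkg. This is a sketch only: `cudnn_deb_versions` and `check_cudnn_matches_cuda` are hypothetical helpers, and the name pattern `libcudnnN_<cudnn>-<rev>+cuda<cuda>_<arch>.deb` is inferred from the single file name used above.

```shell
# libcudnn7_7.0.5.15-1+cuda9.0_amd64.deb -> "7.0.5.15 9.0"
cudnn_deb_versions() {
    basename "$1" |
        sed -n 's/^libcudnn[0-9]*_\([0-9.]*\)-[0-9]*+cuda\([0-9.]*\)_.*$/\1 \2/p'
}

# Succeed when the package targets the CUDA release that nvcc reports.
check_cudnn_matches_cuda() {
    local want have
    want=$(cudnn_deb_versions "$1" | cut -d' ' -f2)
    have=$(nvcc --version | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
    [ -n "$want" ] && [ "$want" = "$have" ]
}

# After installation, the library should show up in the linker cache:
#   ldconfig -p | grep libcudnn
```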