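Multi-GPU training with Baidu's Paddle depends on NCCL, so NCCL has to be installed first. This article describes how to install NCCL from the archive package; I have tested it and it works.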
Network Installer for Ubuntu 20.04
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
$ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
$ sudo apt-get update
Network Installer for Ubuntu 18.04
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
$ sudo apt-get update
Network Installer for RedHat/CentOS 8
$ sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
Network Installer for RedHat/CentOS 7
$ sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
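The four installers above differ only in the distribution directory of the repository URL (ubuntu2004, ubuntu1804, rhel8, rhel7). As a minimal sketch, the helper below maps the ID and VERSION_ID values from /etc/os-release to that directory name. The function name cuda_repo_distro is hypothetical, and it covers only the four systems listed in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an os-release ID and VERSION_ID to the directory
# name used under .../compute/cuda/repos/. Only the four systems listed
# above are covered; anything else returns a non-zero status.
cuda_repo_distro() {
    local id="$1" version="$2"
    case "$id" in
        ubuntu)
            case "$version" in
                20.04) echo "ubuntu2004" ;;
                18.04) echo "ubuntu1804" ;;
                *) return 1 ;;
            esac
            ;;
        rhel|centos)
            # Only the major version matters for RHEL/CentOS (8.4 -> 8).
            case "${version%%.*}" in
                8) echo "rhel8" ;;
                7) echo "rhel7" ;;
                *) return 1 ;;
            esac
            ;;
        *) return 1 ;;
    esac
}

# Example on the current machine:
# . /etc/os-release && cuda_repo_distro "$ID" "$VERSION_ID"
```

If the function returns a non-zero status, the system is not one of the four covered here and the repository path has to be looked up manually.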
Then run the following command to install NCCL:
For Ubuntu: sudo apt install libnccl2=2.8.4-1+cuda11.1 libnccl-dev=2.8.4-1+cuda11.1
For RHEL/CentOS: sudo yum install libnccl-2.8.4-1+cuda11.1 libnccl-devel-2.8.4-1+cuda11.1 libnccl-static-2.8.4-1+cuda11.1
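The version pin in these commands combines the NCCL version with the CUDA version it was built for (here 2.8.4 and 11.1). As a rough sketch, the helper below prints, without running it, the Ubuntu command for another pair of versions. The function name nccl_apt_command is hypothetical, and the fixed "-1" package revision is an assumption taken from the pin shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the Ubuntu install command for a given
# NCCL version and CUDA version, using the <nccl>-1+cuda<cuda> pin format
# seen in the commands above. The "-1" revision is assumed, not guaranteed.
nccl_apt_command() {
    local nccl_ver="$1" cuda_ver="$2"
    local pin="${nccl_ver}-1+cuda${cuda_ver}"
    echo "sudo apt install libnccl2=${pin} libnccl-dev=${pin}"
}

# Reproduces the command used in this article.
nccl_apt_command 2.8.4 11.1
```

On Ubuntu, apt-cache madison libnccl2 lists the versions the configured repository actually offers, which is worth checking before pinning.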
Install the repository.
- For a local NCCL repository:
sudo dpkg -i nccl-repo-<version>.deb
Note: The local repository installation will prompt you to install the local key it embeds and with which packages are signed. Make sure to follow the instructions to install the local key, or the install phase will fail later.
- For the network repository:
wget https://developer.download.nvidia.com/compute/cuda/repos/<distro>/<architecture>/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
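The <distro> and <architecture> placeholders in the network repository URL have to be filled in by hand. As a small sketch, the helper below builds the URL from the two values. The function name cuda_keyring_url is hypothetical, and the example values ubuntu2004 and x86_64 are taken from the Ubuntu 20.04 repository path used earlier in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the <distro> and <architecture> placeholders of
# the cuda-keyring download URL. It only prints the URL; it downloads nothing.
cuda_keyring_url() {
    local distro="$1" arch="$2"
    echo "https://developer.download.nvidia.com/compute/cuda/repos/${distro}/${arch}/cuda-keyring_1.0-1_all.deb"
}

# Example for Ubuntu 20.04 on x86_64.
cuda_keyring_url ubuntu2004 x86_64
```

The printed URL can then be passed to wget, followed by the dpkg command shown above.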
References:
NCCL installation tutorial
Installing NCCL on Ubuntu 16.04
https://docs.nvidia.com/deeplearning/nccl/install-guide/#debian
All NCCL versions