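Error: running which nvfortran shows that the nvfortran compiler cannot be found.
The modulefiles are located in the /opt/nvidia/hpc_sdk/modulefiles/nvhpc directory, which contains one modulefile named after the installed SDK version (22.5 in my case). Change into that directory and run module load ./22.5; after that, which nvfortran finds the nvfortran compiler.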
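The steps above can be sketched as a small script. This is only an illustration: NVHPC_MODULE_DIR, NVHPC_VERSION and report_compiler are names made up for this sketch, and the path and version are the ones from my install, so adjust them to yours.

```shell
# Sketch only: NVHPC_MODULE_DIR / NVHPC_VERSION are illustrative variables,
# defaulting to the directory and version mentioned in these notes.
NVHPC_MODULE_DIR="${NVHPC_MODULE_DIR:-/opt/nvidia/hpc_sdk/modulefiles/nvhpc}"
NVHPC_VERSION="${NVHPC_VERSION:-22.5}"

# report_compiler NAME: print where NAME resolves on PATH, or say it is missing.
# (Hypothetical helper, not part of the HPC SDK.)
report_compiler() {
    if compiler_path="$(command -v "$1")"; then
        echo "$1 found at $compiler_path"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# Load the modulefile only if the module command and the modulefile both exist.
if command -v module >/dev/null 2>&1 && [ -f "$NVHPC_MODULE_DIR/$NVHPC_VERSION" ]; then
    previous_dir="$PWD"
    cd "$NVHPC_MODULE_DIR" && module load "./$NVHPC_VERSION"
    cd "$previous_dir"
fi

# Verify the result; do not abort if the compiler is still missing.
report_compiler nvfortran || true
```

Source the file rather than executing it, so that the environment set up by module load stays in the current shell.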
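Also, module is the command provided by the Environment Modules package for loading and managing modulefiles on Linux. Most distributions do not install it by default, so loading may fail with module: command not found. Installing it requires root privileges: sudo yum install environment-modules on CentOS, or sudo apt install environment-modules on Ubuntu. After installation, open a new terminal or run source /etc/profile.d/modules.sh to enable the module command; for csh, use source /etc/profile.d/modules.csh instead.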
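A rough sketch of the same check, assuming /etc/os-release is present to identify the distribution. install_cmd_for is a hypothetical helper written for these notes, and treating rhel like centos and debian like ubuntu is my assumption. It only prints the command to run; it does not install anything.

```shell
# install_cmd_for DISTRO_ID: print the install command for Environment Modules.
# Hypothetical helper; rhel and debian are assumed to match centos and ubuntu.
install_cmd_for() {
    case "$1" in
        centos|rhel)   echo "sudo yum install environment-modules" ;;
        ubuntu|debian) echo "sudo apt install environment-modules" ;;
        *)             echo "unknown distribution: $1" >&2; return 1 ;;
    esac
}

if command -v module >/dev/null 2>&1; then
    echo "module command is available"
else
    # Read ID from /etc/os-release in a subshell so it does not leak variables.
    distro_id="$( [ -r /etc/os-release ] && . /etc/os-release; echo "${ID:-unknown}" )"
    echo "module: command not found; try:"
    install_cmd_for "$distro_id" || true
fi
```

If the package is installed but module is still missing in the current shell, sourcing the init script under /etc/profile.d is usually enough.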
Error: make[3]: mpiifort: No such file or directory
NCCL installation:
Network Installer for Ubuntu 22.04
- $ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
- $ sudo dpkg -i cuda-keyring_1.0-1_all.deb
- $ sudo apt-get update
Network Installer for Ubuntu 20.04
- $ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
- $ sudo dpkg -i cuda-keyring_1.0-1_all.deb
- $ sudo apt-get update
Network Installer for RedHat/CentOS 9
- $ sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
Network Installer for RedHat/CentOS 8
- $ sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
Network Installer for RedHat/CentOS 7
- $ sudo yum-config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
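To avoid picking the wrong installer by hand, the mapping from distribution to repository URL can be sketched as a lookup. cuda_repo_for is a hypothetical helper for these notes; it only knows the five URLs listed above and assumes ID and VERSION_ID from /etc/os-release identify the system.

```shell
# cuda_repo_for DISTRO_ID VERSION_ID: print the keyring or repo URL from the
# notes above. Hypothetical helper; it covers only the releases listed there.
cuda_repo_for() {
    base="https://developer.download.nvidia.com/compute/cuda/repos"
    case "$1" in
        ubuntu)
            case "$2" in
                22.04) echo "$base/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb" ;;
                20.04) echo "$base/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb" ;;
                *)     return 1 ;;
            esac ;;
        rhel|centos)
            major="${2%%.*}"   # keep only the major version, e.g. 9.2 -> 9
            case "$major" in
                7|8|9) echo "$base/rhel$major/x86_64/cuda-rhel$major.repo" ;;
                *)     return 1 ;;
            esac ;;
        *) return 1 ;;
    esac
}

# Print the URL for the current system, if it is one of the listed releases.
if [ -r /etc/os-release ]; then
    (
        . /etc/os-release
        cuda_repo_for "${ID:-}" "${VERSION_ID:-}" ||
            echo "no repository listed for ${ID:-unknown} ${VERSION_ID:-}"
    )
fi
```

On Ubuntu the printed URL is the keyring package to download and install with dpkg; on RedHat/CentOS it is the .repo file to pass to --add-repo.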
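NCCL installation error: error: dpkg frontend is locked by another process
Fix: find the process holding the lock, kill it (PID is the process ID reported by lsof), remove the lock file, and reconfigure dpkg:
lsof /var/lib/dpkg/lock-frontend
sudo kill -9 PID
sudo rm /var/lib/dpkg/lock-frontend
sudo dpkg --configure -a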
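Before killing anything, it may be worth looking at what actually holds the lock, since it is often another apt or dpkg run that will finish on its own. A minimal sketch, assuming lsof is installed and the script runs as root (otherwise lsof may not see processes owned by other users). LOCK_FILE and lock_holder_pids are names invented for this sketch.

```shell
# Sketch only: LOCK_FILE defaults to the lock path from the error above.
LOCK_FILE="${LOCK_FILE:-/var/lib/dpkg/lock-frontend}"

# lock_holder_pids FILE: print the PIDs that have FILE open, one per line.
# Returns 2 if lsof is not installed. (Hypothetical helper.)
lock_holder_pids() {
    command -v lsof >/dev/null 2>&1 || return 2
    lsof -t "$1" 2>/dev/null
}

pids="$(lock_holder_pids "$LOCK_FILE" || true)"
if [ -n "$pids" ]; then
    echo "Lock $LOCK_FILE is held by:"
    for pid in $pids; do
        # Show PID and command name so you can decide whether to wait or kill.
        ps -o pid=,comm= -p "$pid"
    done
else
    echo "No process found holding $LOCK_FILE"
fi
```

If the holder is a package manager that is still working, waiting for it is the gentler option; kill -9 followed by dpkg --configure -a is the fallback when the process is stuck.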