Install the NVIDIA driver
lspci | grep -i nvidia # check for an NVIDIA GPU
gcc --version # check that gcc is installed
sudo apt-get install linux-headers-$(uname -r) # install the headers for the running kernel
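The three checks above can be collected into one pre-flight function. This is a minimal sketch, and the function names are hypothetical. It assumes that `/lib/modules/$(uname -r)/build` resolves to the kernel headers once they are installed, which is the usual Debian/Ubuntu and RHEL layout. It counts only devices that `lspci` lists as a "VGA compatible controller" or "3D controller", so the HDMI audio function of the same card is not counted twice.

```shell
# Count NVIDIA display/compute controllers in `lspci` output read from stdin.
# HDMI audio or USB functions on the same card are deliberately not counted.
count_nvidia_gpus() {
  # grep -c prints 0 and exits 1 on no match; `|| true` keeps the status at 0.
  grep -ciE '(VGA compatible|3D) controller: NVIDIA' || true
}

# Print the status of each prerequisite without aborting the calling shell.
check_cuda_prereqs() {
  local n
  n=$(lspci 2>/dev/null | count_nvidia_gpus)
  echo "NVIDIA GPUs found: ${n:-0}"

  if command -v gcc >/dev/null 2>&1; then
    echo "gcc: $(gcc -dumpversion)"
  else
    echo "gcc: MISSING"
  fi

  # Assumption: /lib/modules/<release>/build resolves to the installed headers.
  if [ -d "/lib/modules/$(uname -r)/build" ]; then
    echo "kernel headers for $(uname -r): present"
  else
    echo "kernel headers for $(uname -r): MISSING"
  fi
}
```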
Download and install the CUDA Toolkit
sudo dpkg -i cuda-repo-<distro>_<version>_<architecture>.deb
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
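After the `cuda` package is installed, a quick sanity check is to ask `nvcc` for its release and confirm that the driver responds to `nvidia-smi`. This is a minimal sketch with hypothetical helper names. The default `nvcc` path assumes the `/usr/local/cuda-10.0` prefix used in the environment settings. `nvcc --version` ends with a line such as `Cuda compilation tools, release 10.0, V10.0.130`, and the sketch extracts the release number from that line.

```shell
# Extract "X.Y" from `nvcc --version` output read on stdin.
cuda_release_from_nvcc() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# Report the toolkit release and whether the driver answers nvidia-smi.
verify_cuda_install() {
  # Assumed default prefix, matching the CUDA 10.0 paths in these notes.
  local nvcc="${1:-/usr/local/cuda-10.0/bin/nvcc}"
  if [ -x "$nvcc" ]; then
    echo "CUDA release: $("$nvcc" --version | cuda_release_from_nvcc)"
  else
    echo "nvcc not found at $nvcc" >&2
    return 1
  fi
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo "driver: nvidia-smi OK"
  else
    echo "driver: nvidia-smi missing or failed" >&2
    return 1
  fi
}
```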
Environment
Add the CUDA binaries to PATH:
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
In addition, when using the runfile installation method, the LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-10.0/lib64 on a 64-bit system, or /usr/local/cuda-10.0/lib on a 32-bit system.
To change the environment variables for 64-bit operating systems:
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
To change the environment variables for 32-bit operating systems:
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
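The `${VAR:+:${VAR}}` expansion adds `:old-value` only when the variable is already non-empty, so the result never ends in a stray colon. An empty entry would be treated as the current directory. This minimal sketch builds a helper on the same idiom. The helper also skips directories that are already present, so re-sourcing a shell startup file does not keep growing the variables. `prepend_path` is a hypothetical name, and `eval` is used only after the variable name has been validated.

```shell
# Prepend DIR to the colon-separated variable named VAR unless it is already present.
# Usage: prepend_path VAR DIR
prepend_path() {
  local var="$1" dir="$2" cur
  case $var in
    ''|*[!A-Za-z0-9_]*) echo "prepend_path: invalid variable name '$var'" >&2; return 1 ;;
  esac
  eval "cur=\${$var-}"                            # read the current value of $var
  case ":$cur:" in
    *":$dir:"*) ;;                                # already present: leave unchanged
    *) eval "export $var=\"\$dir\${cur:+:\$cur}\"" ;;
  esac
}

prepend_path PATH /usr/local/cuda-10.0/bin
prepend_path LD_LIBRARY_PATH /usr/local/cuda-10.0/lib64   # use .../lib on 32-bit systems
```

To make the settings persistent for interactive shells, place the helper and the two calls in `~/.bashrc`.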
Note that the above paths change when using a custom install path with the runfile installation method.
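With a custom runfile prefix, both variables should be derived from that prefix rather than from `/usr/local/cuda-10.0`. This minimal sketch assumes a hypothetical `CUDA_PREFIX` variable, which defaults here to an illustrative `/opt/cuda-10.0`. It uses `getconf LONG_BIT` to pick `lib64` on 64-bit systems and `lib` otherwise.

```shell
# Print the CUDA library directory for PREFIX; BITS defaults to the userspace word size.
cuda_lib_dir() {
  local prefix="$1" bits="${2:-$(getconf LONG_BIT)}"
  if [ "$bits" = 64 ]; then
    printf '%s/lib64\n' "$prefix"
  else
    printf '%s/lib\n' "$prefix"
  fi
}

# Hypothetical custom install location; set it to the prefix given to the runfile.
CUDA_PREFIX=${CUDA_PREFIX:-/opt/cuda-10.0}
export PATH="$CUDA_PREFIX/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="$(cuda_lib_dir "$CUDA_PREFIX")${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
```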
POWER9 setup
Because of new features specific to the NVIDIA POWER9 CUDA driver, POWER9 systems have additional setup requirements. Installing the CUDA packages does not perform these steps, and if they are not completed, the CUDA driver installation will not work.
Two changes must be made manually after installing the NVIDIA CUDA driver to ensure proper operation:
The NVIDIA Persistence Daemon should start automatically on POWER9 installations. Check that it is running:
systemctl status nvidia-persistenced
If it is not active, enable it so that it starts at boot:
sudo systemctl enable nvidia-persistenced
Disable the udev rule that some Linux distributions install by default to bring hot-pluggable memory online automatically when it is physically probed. This behavior prevents NVIDIA software from bringing NVIDIA device memory online with non-default settings. For that reason, the rule must be disabled for the NVIDIA CUDA driver to function properly on POWER9 systems.
On Red Hat Enterprise Linux 7, this rule can be found in:
/lib/udev/rules.d/40-redhat.rules
On Ubuntu 17.04, this rule can be found in:
/lib/udev/rules.d/40-vm-hotadd.rules
The rule typically detects the addition of a memory block and sets its 'state' attribute to online. For example, on RHEL 7 the rule looks like this:
SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/uname -p", RESULT!="s390*", ATTR{state}=="offline", ATTR{state}="online"
To disable the rule, copy the file to /etc/udev/rules.d, since a file there overrides the file of the same name in /lib/udev/rules.d. Then comment out, remove, or change the hot-pluggable memory rule in the /etc copy so that it does not apply to POWER9 NVIDIA systems. For example, on RHEL:
sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d
sudo sed -i '/SUBSYSTEM=="memory", ACTION=="add"/d' /etc/udev/rules.d/40-redhat.rules
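Commenting the rule out instead of deleting it keeps the original text visible in the `/etc` copy. This minimal sketch does that for either of the files above and assumes GNU `sed -i`. `disable_memory_hotadd_rule` is a hypothetical name. It prefixes every line that starts with `SUBSYSTEM=="memory", ACTION=="add"` with `# `, then prints how many such rules are now commented out.

```shell
# Copy a udev rules file into DESTDIR and comment out the memory hot-add rule(s) in the copy.
# Usage: disable_memory_hotadd_rule SRC_FILE [DESTDIR]
disable_memory_hotadd_rule() {
  local src="$1" destdir="${2:-/etc/udev/rules.d}" dest
  dest="$destdir/$(basename "$src")"
  cp "$src" "$dest" || return 1
  # Prefix matching lines with "# "; lines already commented out no longer match.
  sed -i 's/^\([[:space:]]*SUBSYSTEM=="memory", ACTION=="add".*\)$/# \1/' "$dest"
  # Print how many memory hot-add rules are now commented out in the copy.
  grep -c '^#[[:space:]]*SUBSYSTEM=="memory", ACTION=="add"' "$dest"
}
```

When targeting the real paths, run it from a root shell (for example after `sudo -i`). Use `disable_memory_hotadd_rule /lib/udev/rules.d/40-redhat.rules` on RHEL 7 or `disable_memory_hotadd_rule /lib/udev/rules.d/40-vm-hotadd.rules` on Ubuntu 17.04.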
Reboot the system for these changes to take effect.
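After the reboot, both requirements can be checked in one pass. This is a minimal sketch with hypothetical function names. The default rules file is the RHEL 7 copy in `/etc/udev/rules.d`; on Ubuntu, pass the `40-vm-hotadd.rules` copy instead. The `SYSTEMCTL` variable is a hypothetical hook that lets a stub stand in for `systemctl` during testing. `systemctl is-active --quiet` reports the unit's state only through its exit status.

```shell
# Count memory hot-add rules in FILE that are still active (not commented out).
active_hotadd_rules() {
  grep -c '^[[:space:]]*SUBSYSTEM=="memory", ACTION=="add"' "$1" || true
}

# Post-reboot check of both POWER9 requirements; returns non-zero if either fails.
verify_power9_setup() {
  local rules="${1:-/etc/udev/rules.d/40-redhat.rules}" systemctl="${SYSTEMCTL:-systemctl}" ok=0 n
  if "$systemctl" is-active --quiet nvidia-persistenced; then
    echo "nvidia-persistenced: active"
  else
    echo "nvidia-persistenced: NOT active" >&2; ok=1
  fi
  if [ ! -f "$rules" ]; then
    echo "rules file not found: $rules" >&2; ok=1
  else
    n=$(active_hotadd_rules "$rules")
    if [ "$n" = 0 ]; then
      echo "memory hot-add rule: disabled in $rules"
    else
      echo "memory hot-add rule: $n active line(s) left in $rules" >&2; ok=1
    fi
  fi
  return "$ok"
}
```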
Uninstall
Use the following command to uninstall a Toolkit runfile installation:
sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl
Use the following command to uninstall a Driver runfile installation:
sudo /usr/bin/nvidia-uninstall
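Each runfile toolkit install keeps its uninstaller under its own `/usr/local/cuda-X.Y` directory, so when several versions are installed it helps to list them first. This minimal sketch globs the same path pattern. The function name and its optional prefix argument are hypothetical.

```shell
# List toolkit runfile uninstallers under PREFIX (default /usr/local).
find_cuda_uninstallers() {
  local prefix="${1:-/usr/local}" f
  for f in "$prefix"/cuda-*/bin/uninstall_cuda_*.pl; do
    # An unmatched glob stays literal, so only print paths that exist.
    [ -e "$f" ] && printf '%s\n' "$f"
  done
  return 0
}
```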
Use the following commands to uninstall an RPM/Deb installation:
sudo yum remove <package_name> # RHEL/CentOS
sudo dnf remove <package_name> # Fedora
sudo zypper remove <package_name> # OpenSUSE/SLES
sudo apt-get --purge remove <package_name> # Ubuntu
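The four package-manager commands differ only in the tool used. This minimal sketch maps a package-manager name to the matching command from the list above and detects which manager is available. The function names are hypothetical. The command is printed rather than run, so it can be reviewed first.

```shell
# Print the removal command for package PKG under package manager MGR.
pkg_remove_cmd() {
  local mgr="$1" pkg="$2"
  case $mgr in
    yum|dnf|zypper) echo "sudo $mgr remove $pkg" ;;
    apt-get)        echo "sudo apt-get --purge remove $pkg" ;;
    *) echo "unsupported package manager: $mgr" >&2; return 1 ;;
  esac
}

# Print the first supported package manager found on PATH.
detect_pkg_manager() {
  local m
  for m in apt-get dnf yum zypper; do
    if command -v "$m" >/dev/null 2>&1; then
      echo "$m"
      return 0
    fi
  done
  return 1
}
```

For example, `pkg_remove_cmd "$(detect_pkg_manager)" cuda` prints the removal command for the `cuda` package installed earlier.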
NVIDIA-Docker
Make sure you have installed the NVIDIA driver and a supported version of Docker for your distribution.
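Both prerequisites can be confirmed before adding the repository. In this minimal sketch, `nvidia-smi` stands in for the driver and `docker` for Docker. `require_cmds` is a hypothetical helper that reports every missing command and fails if any is absent.

```shell
# Report each missing command on stderr; return 1 if any are missing.
require_cmds() {
  local missing=0 c
  for c in "$@"; do
    if ! command -v "$c" >/dev/null 2>&1; then
      echo "missing: $c" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Prerequisites for nvidia-docker2: the NVIDIA driver (nvidia-smi) and Docker.
if require_cmds nvidia-smi docker; then
  docker --version
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
fi
```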
Docker install
wget -qO- https://get.docker.com/ | sh
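Piping the installer straight into a shell runs it without review. This minimal sketch is a two-step variant. It saves the script from https://get.docker.com/ to a file, leaves room to read it, and only then runs it with `sudo`. The function names and the `get-docker.sh` file name are arbitrary choices.

```shell
# Step 1: download the convenience script to a file (the default file name is arbitrary).
fetch_docker_script() {
  local script="${1:-get-docker.sh}"
  wget -qO "$script" https://get.docker.com/ || return 1
  echo "saved to $script; read it before running it"
}

# Step 2: after reviewing the file, run it as root and print the installed Docker version.
run_docker_script() {
  local script="${1:-get-docker.sh}"
  if [ ! -s "$script" ]; then
    echo "no script found at $script" >&2
    return 1
  fi
  sudo sh "$script" && docker --version
}
```

Typical use: `fetch_docker_script && less get-docker.sh && run_docker_script`.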
If you have a custom /etc/docker/daemon.json, the nvidia-docker2 package might override it.
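Backing up the file before installing `nvidia-docker2` makes it possible to merge custom settings back afterwards. The package's own `daemon.json` registers the `nvidia` runtime under `"runtimes"`, so custom keys should be added alongside that entry rather than replacing it. In this minimal sketch, `backup_daemon_json` is a hypothetical helper that copies the file to a timestamped `.bak` name and prints that name. For the real `/etc/docker/daemon.json`, run it from a root shell.

```shell
# Copy CFG (default /etc/docker/daemon.json) to CFG.bak.<timestamp> and print the backup path.
backup_daemon_json() {
  local cfg="${1:-/etc/docker/daemon.json}" backup
  if [ ! -f "$cfg" ]; then
    echo "nothing to back up: $cfg not found" >&2
    return 0
  fi
  backup="$cfg.bak.$(date +%Y%m%d%H%M%S)"
  cp -p "$cfg" "$backup" || return 1
  echo "$backup"
}
```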
Ubuntu 14.04/16.04/18.04, Debian Jessie/Stretch
# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker
# Add the package repositories
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
# Test nvidia-smi with an official CUDA base image
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi
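Besides running `nvidia-smi` in a container, you can read the daemon's registered runtimes from `docker info`. Its plain-text output includes a `Runtimes:` line, which should list `nvidia` after the `SIGHUP` reload. This is a minimal sketch with hypothetical helper names, and the parsing assumes that plain-text format.

```shell
# Succeed if `docker info` text on stdin lists an "nvidia" runtime on its Runtimes: line.
has_nvidia_runtime() {
  grep -Eq '^[[:space:]]*Runtimes:(.*[[:space:]])?nvidia([[:space:]]|$)'
}

# Query the running daemon and report whether the nvidia runtime is registered.
check_nvidia_docker() {
  if docker info 2>/dev/null | has_nvidia_runtime; then
    echo "nvidia runtime registered"
  else
    echo "nvidia runtime NOT registered; check /etc/docker/daemon.json and reload dockerd" >&2
    return 1
  fi
}
```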