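A colleague updated something on the server (it is unclear what), and afterwards running nvidia-smi printed the following message: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
I spent a long time hunting for the cause. No matter how many times I reinstalled the NVIDIA driver, the error persisted; what finally worked was updating the kernel.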
Main reference: https://devtalk.nvidia.com/default/topic/1000340/cuda-setup-and-installation/-quot-nvidia-smi-has-failed-because-it-couldn-t-communicate-with-the-nvidia-driver-quot-ubuntu-16-04/post/5233711/#5233711
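Solution: update the Ubuntu kernel (our server went from Linux 3.13.0-24-generic to Linux 4.12.9-041209-generic), then install the latest driver, nvidia-390, following the normal procedure.
The specific steps are as follows: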
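Before changing anything, it may help to confirm which kernel is running and whether the nvidia module is loaded at all. The following is only a minimal sketch, not part of the original fix: the helper name `version_ge` and the variable `target` are hypothetical, and the comparison assumes GNU `sort -V` is available.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is >= version $2.
# Relies on GNU sort -V (version sort); this is an assumption about the system.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

current="$(uname -r)"   # running kernel, e.g. 3.13.0-24-generic
target="4.12.9"         # kernel version this post upgrades to

if version_ge "$current" "$target"; then
    echo "kernel $current is at least $target"
else
    echo "kernel $current is older than $target"
fi

# Check whether an nvidia kernel module is currently loaded.
if lsmod 2>/dev/null | grep -q '^nvidia'; then
    echo "nvidia module loaded"
else
    echo "nvidia module not loaded"
fi
```

If the module is not loaded even though the driver package is installed, that is consistent with the driver failing to work with the running kernel, which is the situation described here.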
# Update the system kernel
# Download the three kernel .deb packages
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12.9/linux-headers-4.12.9-041209_4.12.9-041209.201708242344_all.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12.9/linux-headers-4.12.9-041209-generic_4.12.9-041209.201708242344_amd64.deb
wget http://kernel.ubuntu.com/