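Preface
To build and run a GPU-enabled TensorFlow, you first need to install the CUDA Toolkit and cuDNN provided by NVIDIA.
A failed installation attempt
The installation guide on the TensorFlow Chinese community site calls for CUDA Toolkit 7.0 and cuDNN 6.5 V2. Checking the release notes on the CUDA website (http://docs.nvidia.com/cuda/#axzz4g719e0em), I found that the CUDA Toolkit mainly contains the following: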
Compiler
The CUDA-C and CUDA-C++ compiler
Tools
The following development tools are available in the bin/ directory (except for Nsight Visual Studio Edition (VSE) which is installed as a plug-in to Microsoft Visual Studio)
IDEs: nsight (Linux, Mac), Nsight VSE (Windows)
Debuggers: cuda-memcheck, cuda-gdb (Linux, Mac), Nsight VSE (Windows)
Profilers: nvprof, nvvp, Nsight VSE (Windows)
Utilities: cuobjdump, nvdisasm, gwiz
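cuDNN is a library from NVIDIA for GPU-accelerated neural networks.
I chose to install the latest version. Naturally, GPU acceleration on NVIDIA cards comes with system requirements, so I checked the installation guide on the official site, which says: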
To use CUDA on your system, you will need the following installed:
CUDA-capable GPU
A supported version of Linux with a gcc compiler and toolchain
NVIDIA CUDA Toolkit (available at http://developer.nvidia.com/cuda-downloads)
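GPU requirement: TensorFlow's GPU support only works on cards with NVIDIA Compute Capability >= 3.5.
Operating system and GCC version requirements.
Only after those are met comes installing the development toolkit.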
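The three requirements above can be checked from a terminal before starting the installer. This is only a rough sketch: the lspci, uname and gcc calls are ordinary system commands, while meets_min_cc is a hypothetical helper I made up for illustration, and the 3.5 threshold is simply the value quoted in this post. The compute capability itself still has to be looked up by hand for your card model.

```shell
#!/bin/sh
# Pre-installation checks (a sketch, not an official script).

# 1. Is an NVIDIA GPU visible on the PCI bus?
if command -v lspci >/dev/null 2>&1; then
    lspci | grep -i nvidia || echo "no NVIDIA device reported by lspci"
else
    echo "lspci not found (on Ubuntu it ships in the pciutils package)"
fi

# 2. Which kernel release, architecture and distribution is this?
uname -m -r
if [ -r /etc/os-release ]; then
    cat /etc/os-release
fi

# 3. Is gcc installed, and which version?
if command -v gcc >/dev/null 2>&1; then
    gcc --version | head -n 1
else
    echo "gcc not installed"
fi

# 4. Hypothetical helper: succeeds when compute capability $1 is at
#    least the minimum $2 (default 3.5, the value quoted in this post).
meets_min_cc() {
    awk -v cc="$1" -v min="${2:-3.5}" 'BEGIN { exit !(cc + 0 >= min + 0) }'
}
```

For example, `meets_min_cc 5.2 && echo ok` prints ok, while `meets_min_cc 3.0` returns a non-zero status.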
I chose the runfile installer. The official site says to simply run sudo sh cudaxxxx
Error:
It appears that an X server is running. Please exit X before installation. If you're sure that X is not running, but are getting this error, please delete any X lock files in /tmp.
The official site advises:
Disable the Nouveau drivers.
Reboot into text mode (runlevel 3).
Verify that the Nouveau drivers are not loaded. If the Nouveau drivers are still loaded, consult your distribution's documentation to see if further steps are needed to disable Nouveau.
Disabling Nouveau means adding the module to the modprobe blacklist and then making the kernel pick up the change, which involves mkinitramfs.
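The blacklist step can be sketched roughly as follows. I am assuming an Ubuntu system here, where the blacklist lives in /etc/modprobe.d and update-initramfs is the wrapper that calls mkinitramfs. The two function names are hypothetical helpers of my own; the file path is passed in as an argument so the result can be inspected in a scratch file before touching the real system.

```shell
#!/bin/sh
# Print the modprobe configuration that blacklists the Nouveau driver.
nouveau_blacklist_conf() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Write that configuration to the file given as $1.
write_nouveau_blacklist() {
    nouveau_blacklist_conf > "$1"
}

# Real use, as root (left commented out so nothing changes by accident):
#   write_nouveau_blacklist /etc/modprobe.d/blacklist-nouveau.conf
#   update-initramfs -u      # rebuild the initramfs (calls mkinitramfs)
#   reboot
```

After the reboot, lsmod | grep nouveau should print nothing if the blacklist took effect.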
Error:
E: Problem with MergeList /var/lib/apt/lists/ppa.launchpad.net_vincent-c_nevernote_ubuntu_dists_xenial_main_binary-amd64_Packages
Fix: sudo rm /var/lib/apt/lists/* -vf, then run sudo apt-get update to rebuild the package lists.
lsmod | grep nouveau
The output shows that the module is no longer loaded, but the installer then reports the following error:
please delete any x lock file in /tmp
Delete the .X0 lock file under /tmp.
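Before deleting anything, it may help to see which X lock files exist and whether the process recorded in each one is still alive. The sketch below assumes the usual layout, where a file such as /tmp/.X0-lock holds the PID of the X server that created it. check_x_locks is a hypothetical helper name, not part of any installer.

```shell
#!/bin/sh
# List X lock files in a directory (default /tmp) and report whether
# the PID stored in each one still belongs to a running process.
check_x_locks() {
    lock_dir="${1:-/tmp}"
    for lock_file in "$lock_dir"/.X*-lock; do
        # Skip the unexpanded pattern when no lock file exists.
        [ -e "$lock_file" ] || continue
        lock_pid="$(tr -d ' \n' < "$lock_file")"
        if [ -n "$lock_pid" ] && { kill -0 "$lock_pid" 2>/dev/null || [ -d "/proc/$lock_pid" ]; }; then
            echo "$lock_file: in use by PID $lock_pid"
        else
            echo "$lock_file: stale"
        fi
    done
}

# Real use:
#   check_x_locks                 # inspect /tmp
#   sudo rm -f /tmp/.X0-lock      # only for a lock reported as stale
```

A lock reported as "in use" means X is still running, so the display manager has to be stopped first rather than the file deleted.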
ERROR: The kernel module failed to load, because it was not signed by a key
that is trusted by the kernel. Please try installing the driver
again,