Beginner tutorial: CentOS 7 + NVIDIA Titan X + CUDA 9.0 + cuDNN 7.3 + Python 3.6.8 + TensorFlow 1.12.0

0. Techniques you may need

(1) Reference: mounting a USB drive

https://blog.csdn.net/ido1ok/article/details/79620746

1. Install CentOS 7 and connect to the network

(1) Reference: creating a bootable CentOS USB drive

https://jingyan.baidu.com/article/a681b0de5e33d03b1843460f.html

(2) Reference: fixing a CentOS 7 USB install that hangs before the installer UI

https://blog.csdn.net/qq_39996062/article/details/79328540

(3) Reference: CentOS 7.4 (1708) installation tutorial

http://baijiahao.baidu.com/s?id=1599601257937774752&wfr=spider&for=pc

(4) Reference: PPPoE dial-up networking

https://www.jianshu.com/p/43b10aff69ae

(5) Reference: networking through a router

https://jingyan.baidu.com/article/19192ad8f7c320e53e570728.html

2. Install dependency packages

(1) Switch to root first, to avoid typing sudo repeatedly

Note: in what follows, a `#` prompt means a root shell and `$` a normal user shell.

$ su root

Enter the root password when prompted.

(2) Then install updates

# yum -y update

This takes a long time. If yum reports that /var/run/yum.pid is locked, remove the stale lock file:

# rm -rf /var/run/yum.pid

(3) Install the following packages in turn

# yum -y install kernel-devel
# yum -y install epel-release
# yum -y install dkms
# yum -y install gcc-c++
# yum -y install gcc kernel-devel kernel-headers

3. Detect the GPU model and driver, then install the driver

(1) Add the ELRepo repository first

# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm

(2) Detect the required NVIDIA driver

# yum install nvidia-detect
# nvidia-detect -v

[root@localhost ripper]# nvidia-detect -v
Probing for supported NVIDIA devices…
[10de:17c2] NVIDIA Corporation GM200 [GeForce GTX TITAN X]
This device requires the current 418.74 NVIDIA driver kmod-nvidia

The detected driver here is 418.74. You can also go to the NVIDIA driver page at http://www.geforce.cn/drivers and search with GeForce GTX TITAN X as the product (setting the site language to English is recommended).

(3) Download the NVIDIA driver

When the search result shows 418.74, click Download to get the link:
http://us.download.nvidia.com/XFree86/Linux-x86_64/418.74/NVIDIA-Linux-x86_64-418.74.run
In general, for a version xxx.xx, the link has the form:
http://us.download.nvidia.com/XFree86/Linux-x86_64/xxx.xx/NVIDIA-Linux-x86_64-xxx.xx.run

Alternatively, SSH into the server and download the driver into /downloads.

Create the /downloads directory:

# mkdir /downloads

Change into /downloads:

# cd /downloads

Download it (replace xxx.xx with the version you found):

# wget https://us.download.nvidia.com/XFree86/Linux-x86_64/418.74/NVIDIA-Linux-x86_64-418.74.run
# wget https://us.download.nvidia.com/XFree86/Linux-x86_64/xxx.xx/NVIDIA-Linux-x86_64-xxx.xx.run
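The driver links above follow a fixed pattern, so if you script your downloads you can assemble the URL from the version string reported by nvidia-detect. A minimal sketch (the helper name `driver_url` is mine, and the URL pattern is assumed from the links above; verify against the official driver search page before relying on it):

```python
def driver_url(version: str) -> str:
    """Build the x86_64 Linux driver download URL for a version like '418.74'."""
    base = "https://us.download.nvidia.com/XFree86/Linux-x86_64"
    return f"{base}/{version}/NVIDIA-Linux-x86_64-{version}.run"

print(driver_url("418.74"))
# https://us.download.nvidia.com/XFree86/Linux-x86_64/418.74/NVIDIA-Linux-x86_64-418.74.run
```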

(4) Check the conflicting built-in nouveau driver first

The NVIDIA driver conflicts with the nouveau driver that ships with the system. Check whether nouveau is currently loaded:

# lsmod | grep nouveau

Output listing nouveau modules is the normal initial state; the next step disables them.

(5) Resolve the driver conflict

Edit the GRUB configuration (if vim is missing, install it with `# yum install vim`):

# vim /etc/default/grub

Press a to enter insert mode, edit as shown, then press Esc and type :wq to save. Change:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet"
GRUB_DISABLE_RECOVERY="true"

to:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb rd.driver.blacklist=nouveau nouveau.modeset=0 quiet"
GRUB_DISABLE_RECOVERY="true"

Generate the new GRUB config:

# grub2-mkconfig -o /boot/grub2/grub.cfg

Create the blacklist file (press a, type `blacklist nouveau` into the empty file, then press Esc and type :wq):

# vim /etc/modprobe.d/blacklist.conf

Back up the current initramfs image:

# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img

Rebuild the initramfs:

# dracut /boot/initramfs-$(uname -r).img $(uname -r)

Reboot:

# reboot

Check again; the command should now print nothing:

# lsmod | grep nouveau
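If you want to script the before/after check rather than eyeballing it, parsing the lsmod output is enough. A minimal sketch (the function name is mine; it assumes the module name appears in the first column of lsmod output, which is the standard format):

```python
import subprocess

def module_loaded(name: str, lsmod_output: str = None) -> bool:
    """Return True if `name` appears as a loaded kernel module in lsmod output."""
    if lsmod_output is None:
        lsmod_output = subprocess.run(["lsmod"], capture_output=True,
                                      text=True).stdout
    # The module name is the first whitespace-separated field of each line.
    return any(line.split()[:1] == [name] for line in lsmod_output.splitlines())

# Before the blacklist steps this should be True; after the reboot, False.
sample = "Module Size Used by\nnouveau 1622010 1\nvideo 24538 1 nouveau\n"
print(module_loaded("nouveau", sample))  # True
```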

(6) Install the driver; NVIDIA-Linux-x86_64-xxx.xx.run should now be in /downloads

# cd /downloads
# chmod +x NVIDIA-Linux-x86_64-xxx.xx.run   (use the actual file name; `ls` should now show it highlighted as executable)
# sh NVIDIA-Linux-x86_64-xxx.xx.run

By default the installer tends to fail at this point with an error.

If it reports "ERROR: You appear to be running an X server; please exit X before installing.", log out of the desktop session, then press Ctrl+Alt+F2 together to switch to a text console and log in:

localhost login: admin   (use your actual user name)

Password:

Switch to root:

[admin@localhost~]$  su root

Enter text mode with init 3:

[root@localhost  ~]# init 3

Change to the directory containing NVIDIA-Linux-x86_64-xxx.xx.run:

[root@localhost  ~]# cd /downloads

[root@localhost downloads]# ls

NVIDIA-Linux-x86_64-xxx.xx.run

Run the installer:

[root@localhost downloads]# sh NVIDIA-Linux-x86_64-xxx.xx.run

Restore the graphical mode:

# init 5

(7) Check the installation

# nvidia-smi

The output should be a table showing the driver version, the GPU, and its memory usage.

4. Install CUDA 9.0

(1) Download CUDA

Download the installer from the official page https://developer.nvidia.com/cuda-downloads and make sure it matches your setup. CUDA 9.0 specifically is available from the archive: https://developer.nvidia.com/cuda-90-download-archive

(2) Install CUDA 9.0

Put the file into /downloads:

# cd /downloads

Make the installer executable (use the actual file name):

# chmod +x cuda_9.0.176_384.81_linux.run

Verify with ls (the file name should be highlighted as executable):

# ls

Then run the installer (again, use the actual file name):

# sh cuda_9.0.176_384.81_linux.run

Keep pressing Enter through the license text until the questions appear. Important: answer n to the second question (whether to install the bundled NVIDIA driver), because the driver is already installed.

(3) Verify CUDA 9.0

Adjust cuda-x.x to the version you installed:

# cd /usr/local/cuda-x.x/samples/1_Utilities/deviceQuery
# make
# ./deviceQuery

Expected output:

[root@localhost deviceQuery]# ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX TITAN X"
  CUDA Driver Version / Runtime Version          10.1 / 9.0
  CUDA Capability Major/Minor version number:    5.2
  Total amount of global memory:                 12210 MBytes (12802916352 bytes)
  (24) Multiprocessors, (128) CUDA Cores/MP:     3072 CUDA Cores
  GPU Max Clock rate:                            1076 MHz (1.08 GHz)
  Memory Clock rate:                             3505 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 3145728 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
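As a quick sanity check on the figures above: the reported totals follow directly from the per-SM numbers copied out of the deviceQuery output.

```python
# Cross-check of the deviceQuery figures above (values copied from the output).
multiprocessors = 24
cores_per_sm = 128                      # Maxwell SMM (compute capability 5.2)
total_cores = multiprocessors * cores_per_sm
total_mem_bytes = 12802916352
print(total_cores)                      # 3072 CUDA cores
print(round(total_mem_bytes / 2**20))   # 12210 MBytes
```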

(4) Add CUDA to .bash_profile

Method 1

# vim ~/.bash_profile

Press a, append the following, then press Esc and type :wq:

PATH=$PATH:$HOME/bin:/usr/local/cuda/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64/
CUDA_HOME=/usr/local/cuda
export PATH
export LD_LIBRARY_PATH
export CUDA_HOME

Make the environment variables take effect immediately:

# source ~/.bash_profile

Method 2 (system-wide; seems less reliable)

# vim /etc/profile

Press a, append the following, then press Esc and type :wq:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Make the environment variables take effect immediately:

# source /etc/profile
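Whichever method you use, you can verify from Python that the CUDA directories actually landed in the environment. A minimal sketch (the helper name is mine):

```python
import os

def path_contains(var: str, directory: str) -> bool:
    """True if `directory` is one of the ':'-separated entries of env var `var`."""
    return directory in os.environ.get(var, "").split(":")

# On the configured machine, after sourcing the profile, both should print True.
print(path_contains("PATH", "/usr/local/cuda/bin"))
print(path_contains("LD_LIBRARY_PATH", "/usr/local/cuda/lib64"))
```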

(5) Verify the CUDA environment variables

Check the nvcc version:

# nvcc -V
# nvcc --version

Expected output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
# cuda    (press the Tab key twice to list completions)

Expected completions:

cudafe                       cuda-gdb                     cuda-install-samples-9.0.sh
cudafe++                     cuda-gdbserver               cuda-memcheck
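To pick up the toolkit version in a script rather than by eye, parsing the `nvcc -V` output shown above is enough. A sketch (the function name is mine):

```python
import re

def nvcc_release(nvcc_output: str) -> str:
    """Extract the CUDA release (e.g. '9.0') from `nvcc -V` output."""
    m = re.search(r"release (\d+\.\d+)", nvcc_output)
    return m.group(1) if m else ""

sample = ("nvcc: NVIDIA (R) Cuda compiler driver\n"
          "Cuda compilation tools, release 9.0, V9.0.176\n")
print(nvcc_release(sample))  # 9.0
```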

5. Install cuDNN 7.3


(1) Check version compatibility first

Reference: CUDA/cuDNN version requirements for each TensorFlow release
https://blog.csdn.net/qq_27825451/article/details/89082978
Reference: official cuDNN installation guide for Linux
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#install-linux
Reference: cudnn-8.0/9.0/10.0-linux-x64-v6.0/7.0/7.1/7.2/7.3/7.4.tgz downloads
https://blog.csdn.net/xiangxianghehe/article/details/79177833
Reference: cuDNN 7.3 download for CUDA 9
https://download.csdn.net/download/godfyun/10682330

(2) Download cuDNN 7.3

# cd /downloads
# wget http://developer.download.nvidia.com/compute/redist/cudnn/v7.3.0/cudnn-9.0-linux-x64-v7.3.0.29.tgz

(3) Unpack cuDNN 7.3

# cd /downloads
# tar -xvzf cudnn-9.0-linux-x64-v7.3.0.29.tgz

(4) Copy the cuDNN files into the CUDA tree

# cd /downloads/cuda
# cp include/* /usr/local/cuda/include  
# cp lib64/* /usr/local/cuda/lib64
# chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
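A common way to confirm the copy succeeded is to read the version macros out of the installed header; `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` are the #defines cudnn.h uses for its version. A sketch (the function name is mine):

```python
import re

def cudnn_version(header_text: str) -> str:
    """Parse the CUDNN_MAJOR/MINOR/PATCHLEVEL #defines from cudnn.h text."""
    fields = {}
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(rf"#define {key}\s+(\d+)", header_text)
        fields[key] = m.group(1) if m else "?"
    return ".".join(fields.values())

# On the installed system you would call:
#   cudnn_version(open("/usr/local/cuda/include/cudnn.h").read())
sample = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 3\n#define CUDNN_PATCHLEVEL 0\n"
print(cudnn_version(sample))  # 7.3.0
```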

6. Install Python 3.6.8

Reference: installing Python 3.6 on Linux
https://www.cnblogs.com/kimyeee/p/7250560.html

(1) Check whether Python is already installed

CentOS 7 ships with Python 2.7 by default, and system tools such as yum depend on it, so leave it in place.

# python -V   //check whether Python is installed
# which python   //locate the Python executable

(2) Install the build dependencies first

# yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gcc make
# yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel

(3) Download the Python source

Download the source tarball from the official site, or fetch it straight into /downloads:

# cd /downloads
# wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz

(4) Unpack Python

# cd /downloads
# tar -zxvf Python-3.6.8.tgz

(5) Build and install Python

# cd Python-3.6.8  //enter the source directory
# ./configure --prefix=/usr/local/python3  //configure the build

Then compile and install:

# make
# make install

When this finishes, python3 is installed under /usr/local/python3.

(6) Symlink the binary into /usr/bin

# ln -s /usr/local/python3/bin/python3 /usr/bin/python3   //symlink into /usr/bin

(7) Add Python to the PATH

Edit:

# vim ~/.bash_profile

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH

Press a, change the PATH line as shown below, then press Esc and type :wq:

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin:/usr/local/python3/bin

export PATH

(8) Make the environment variables take effect immediately

# source ~/.bash_profile

(9) Verify Python

# python3 -V   //should report Python 3.6.8
# python2 -V   //should still report the system Python 2.7

Expected output:

Python 3.6.8
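If you later wrap these steps in a script, a small guard (purely illustrative; the minimum version here is just this tutorial's requirement) can fail early when run under the wrong interpreter:

```python
import sys

def require_python(minimum=(3, 6)):
    """Raise RuntimeError if the running interpreter is older than `minimum`."""
    if sys.version_info[:2] < minimum:
        raise RuntimeError(f"Python {minimum[0]}.{minimum[1]}+ required, "
                           f"got {sys.version.split()[0]}")

require_python((3, 0))  # passes on any Python 3 interpreter
```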

7. Install TensorFlow 1.12.0

(1) Install the pip tool first

Check whether the python-pip package is available by running:

# yum install python-pip

If there is no python-pip package, run:

# yum -y install epel-release

and once that succeeds, run again:

# yum install python-pip

Upgrade the installed pip:

# pip install --upgrade pip

If all else fails, pip can also be installed from source:

# wget --no-check-certificate https://github.com/pypa/pip/archive/9.0.1.tar.gz  # download the source
# tar -zxvf 9.0.1.tar.gz  # unpack into pip-9.0.1
# cd pip-9.0.1
# python3 setup.py install  # install with Python 3
# ln -s /usr/local/python3/bin/pip /usr/bin/pip3  # create the symlink
# pip3 install --upgrade pip  # upgrade pip

(2) Install tensorflow-gpu==1.12.0 from the Tsinghua mirror

# pip3 install tensorflow-gpu==1.12.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

(3) Test TensorFlow

Run:

# python3
>>> import tensorflow as tf  
>>> hello = tf.constant('Hello, TensorFlow!')  
>>> sess = tf.Session()  
>>> print(sess.run(hello))  //note: in Python 3, print needs parentheses

Result:

[root@localhost Python-3.6.8]# python3
Python 3.6.8 (default, Jun  5 2019, 17:45:35) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2019-06-05 18:08:39.292310: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-05 18:08:39.352566: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-05 18:08:39.353113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:01:00.0
totalMemory: 11.92GiB freeMemory: 11.67GiB
2019-06-05 18:08:39.353137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-05 18:08:39.602924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-05 18:08:39.602960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-06-05 18:08:39.602984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-06-05 18:08:39.603087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11292 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0, compute capability: 5.2)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'

That's everything. Enjoy TensorFlow!

8. Follow-up

(1) Reference: creating users

https://blog.csdn.net/xudailong_blog/article/details/80518266
