Beginner tutorial: CentOS 7 + NVIDIA Titan X + CUDA 9.0 + cuDNN 7.3 + Python 3.6.8 + TensorFlow 1.12.0

0. Techniques you may need

(1) Reference: mounting a USB drive

https://blog.csdn.net/ido1ok/article/details/79620746

1. Install CentOS 7 and connect to the network

(1) Reference: creating a bootable CentOS USB drive

https://jingyan.baidu.com/article/a681b0de5e33d03b1843460f.html

(2) Reference: fixing a CentOS 7 USB install that hangs before the installer UI

https://blog.csdn.net/qq_39996062/article/details/79328540

(3) Reference: CentOS 7.4 (1708) installation tutorial

http://baijiahao.baidu.com/s?id=1599601257937774752&wfr=spider&for=pc

(4) Reference: PPPoE dial-up networking

https://www.jianshu.com/p/43b10aff69ae

(5) Reference: networking through a router

https://jingyan.baidu.com/article/19192ad8f7c320e53e570728.html

2. Install dependency packages

(1) Switch to root first, to avoid typing sudo repeatedly

Note: in what follows, a `#` prompt means a root shell and `$` a normal user shell.

$ su root

Enter the root password when prompted.

(2) Then install updates

# yum -y update

This takes a long time. If yum reports that /var/run/yum.pid is locked, remove the stale lock file:

# rm -rf /var/run/yum.pid

(3) Install the following packages in turn

# yum -y install kernel-devel
# yum -y install epel-release
# yum -y install dkms
# yum -y install gcc-c++
# yum -y install gcc kernel-devel kernel-headers

3. Detect the GPU model and driver, then install the driver

(1) Add the ELRepo repository first

# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm

(2) Detect the required NVIDIA driver

# yum install nvidia-detect
# nvidia-detect -v

[root@localhost ripper]# nvidia-detect -v
Probing for supported NVIDIA devices…
[10de:17c2] NVIDIA Corporation GM200 [GeForce GTX TITAN X]
This device requires the current 418.74 NVIDIA driver kmod-nvidia

The detected driver here is 418.74. You can also go to the NVIDIA driver page at http://www.geforce.cn/drivers and search with GeForce GTX TITAN X as the product (setting the site language to English is recommended).

(3) Download the NVIDIA driver

When the search result shows 418.74, click Download to get the link:
http://us.download.nvidia.com/XFree86/Linux-x86_64/418.74/NVIDIA-Linux-x86_64-418.74.run
In general, for a version xxx.xx, the link has the form:
http://us.download.nvidia.com/XFree86/Linux-x86_64/xxx.xx/NVIDIA-Linux-x86_64-xxx.xx.run

Alternatively, SSH into the server and download the driver into /downloads.

Create the /downloads directory:

# mkdir /downloads

Change into /downloads:

# cd /downloads

Download it (replace xxx.xx with the version you found):

# wget https://us.download.nvidia.com/XFree86/Linux-x86_64/418.74/NVIDIA-Linux-x86_64-418.74.run
# wget https://us.download.nvidia.com/XFree86/Linux-x86_64/xxx.xx/NVIDIA-Linux-x86_64-xxx.xx.run
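The driver links above follow a fixed pattern, so if you script your downloads you can assemble the URL from the version string reported by nvidia-detect. A minimal sketch (the helper name `driver_url` is mine, and the URL pattern is assumed from the links above; verify against the official driver search page before relying on it):

```python
def driver_url(version: str) -> str:
    """Build the x86_64 Linux driver download URL for a version like '418.74'."""
    base = "https://us.download.nvidia.com/XFree86/Linux-x86_64"
    return f"{base}/{version}/NVIDIA-Linux-x86_64-{version}.run"

print(driver_url("418.74"))
# https://us.download.nvidia.com/XFree86/Linux-x86_64/418.74/NVIDIA-Linux-x86_64-418.74.run
```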

(4) Check the conflicting built-in nouveau driver first

The NVIDIA driver conflicts with the nouveau driver that ships with the system. Check whether nouveau is currently loaded:

# lsmod | grep nouveau

Output listing nouveau modules is the normal initial state; the next step disables them.

(5) Resolve the driver conflict

Edit the GRUB configuration (if vim is missing, install it with `# yum install vim`):

# vim /etc/default/grub

Press a to enter insert mode, edit as shown, then press Esc and type :wq to save. Change:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet"
GRUB_DISABLE_RECOVERY="true"

to:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb rd.driver.blacklist=nouveau nouveau.modeset=0 quiet"
GRUB_DISABLE_RECOVERY="true"

Generate the new GRUB config:

# grub2-mkconfig -o /boot/grub2/grub.cfg

Create the blacklist file (press a, type `blacklist nouveau` into the empty file, then press Esc and type :wq):

# vim /etc/modprobe.d/blacklist.conf

Back up the current initramfs image:

# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img

Rebuild the initramfs:

# dracut /boot/initramfs-$(uname -r).img $(uname -r)

Reboot:

# reboot

Check again; the command should now print nothing:

# lsmod | grep nouveau
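If you want to script the before/after check rather than eyeballing it, parsing the lsmod output is enough. A minimal sketch (the function name is mine; it assumes the module name appears in the first column of lsmod output, which is the standard format):

```python
import subprocess

def module_loaded(name: str, lsmod_output: str = None) -> bool:
    """Return True if `name` appears as a loaded kernel module in lsmod output."""
    if lsmod_output is None:
        lsmod_output = subprocess.run(["lsmod"], capture_output=True,
                                      text=True).stdout
    # The module name is the first whitespace-separated field of each line.
    return any(line.split()[:1] == [name] for line in lsmod_output.splitlines())

# Before the blacklist steps this should be True; after the reboot, False.
sample = "Module Size Used by\nnouveau 1622010 1\nvideo 24538 1 nouveau\n"
print(module_loaded("nouveau", sample))  # True
```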

(6) Install the driver; NVIDIA-Linux-x86_64-xxx.xx.run should now be in /downloads

# cd /downloads
# chmod +x NVIDIA-Linux-x86_64-xxx.xx.run   (use the actual file name; `ls` should now show it highlighted as executable)
# sh NVIDIA-Linux-x86_64-xxx.xx.run

By default the installer tends to fail at this point with an error.

If it reports "ERROR: You appear to be running an X server; please exit X before installing.", log out of the desktop session, then press Ctrl+Alt+F2 together to switch to a text console and log in:

localhost login: admin   (use your actual user name)

Password:

Switch to root:

[admin@localhost~]$  su root

Enter text mode with init 3:

[root@localhost  ~]# init 3

Change to the directory containing NVIDIA-Linux-x86_64-xxx.xx.run:

[root@localhost  ~]# cd /downloads

[root@localhost downloads]# ls

NVIDIA-Linux-x86_64-xxx.xx.run

Run the installer:

[root@localhost downloads]# sh NVIDIA-Linux-x86_64-xxx.xx.run

Restore the graphical mode:

# init 5

(7) Check the installation

# nvidia-smi

The output should be a table showing the driver version, the GPU, and its memory usage.

4. Install CUDA 9.0

(1) Download CUDA

Download the installer from the official page https://developer.nvidia.com/cuda-downloads and make sure it matches your setup. CUDA 9.0 specifically is available from the archive: https://developer.nvidia.com/cuda-90-download-archive

(2) Install CUDA 9.0

Put the file into /downloads:

# cd /downloads

Make the installer executable (use the actual file name):

# chmod +x cuda_9.0.176_384.81_linux.run

Verify with ls (the file name should be highlighted as executable):

# ls

Then run the installer (again, use the actual file name):

# sh cuda_9.0.176_384.81_linux.run

Keep pressing Enter through the license text until the questions appear. Important: answer n to the second question (whether to install the bundled NVIDIA driver), because the driver is already installed.

(3) Verify CUDA 9.0

Adjust cuda-x.x to the version you installed:

# cd /usr/local/cuda-x.x/samples/1_Utilities/deviceQuery
# make
# ./deviceQuery

Expected output:

[root@localhost deviceQuery]# ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX TITAN X"
  CUDA Driver Version / Runtime Version          10.1 / 9.0
  CUDA Capability Major/Minor version number:    5.2
  Total amount of global memory:                 12210 MBytes (12802916352 bytes)
  (24) Multiprocessors, (128) CUDA Cores/MP:     3072 CUDA Cores
  GPU Max Clock rate:                            1076 MHz (1.08 GHz)
  Memory Clock rate:                             3505 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 3145728 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
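As a quick sanity check on the figures above: the reported totals follow directly from the per-SM numbers copied out of the deviceQuery output.

```python
# Cross-check of the deviceQuery figures above (values copied from the output).
multiprocessors = 24
cores_per_sm = 128                      # Maxwell SMM (compute capability 5.2)
total_cores = multiprocessors * cores_per_sm
total_mem_bytes = 12802916352
print(total_cores)                      # 3072 CUDA cores
print(round(total_mem_bytes / 2**20))   # 12210 MBytes
```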

(4) Add CUDA to .bash_profile

Method 1

# vim ~/.bash_profile

Press a, append the following, then press Esc and type :wq:

PATH=$PATH:$HOME/bin:/usr/local/cuda/bin
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64/
CUDA_HOME=/usr/local/cuda
export PATH
export LD_LIBRARY_PATH
export CUDA_HOME

Make the environment variables take effect immediately:

# source ~/.bash_profile

Method 2 (system-wide; seems less reliable)

# vim /etc/profile

Press a, append the following, then press Esc and type :wq:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Make the environment variables take effect immediately:

# source /etc/profile
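Whichever method you use, you can verify from Python that the CUDA directories actually landed in the environment. A minimal sketch (the helper name is mine):

```python
import os

def path_contains(var: str, directory: str) -> bool:
    """True if `directory` is one of the ':'-separated entries of env var `var`."""
    return directory in os.environ.get(var, "").split(":")

# On the configured machine, after sourcing the profile, both should print True.
print(path_contains("PATH", "/usr/local/cuda/bin"))
print(path_contains("LD_LIBRARY_PATH", "/usr/local/cuda/lib64"))
```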

(5) Verify the CUDA environment variables

Check the nvcc version:

# nvcc -V
# nvcc --version

Expected output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
# cuda    (press the Tab key twice to list completions)

Expected completions:

cudafe                       cuda-gdb                     cuda-install-samples-9.0.sh
cudafe++                     cuda-gdbserver               cuda-memcheck
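To pick up the toolkit version in a script rather than by eye, parsing the `nvcc -V` output shown above is enough. A sketch (the function name is mine):

```python
import re

def nvcc_release(nvcc_output: str) -> str:
    """Extract the CUDA release (e.g. '9.0') from `nvcc -V` output."""
    m = re.search(r"release (\d+\.\d+)", nvcc_output)
    return m.group(1) if m else ""

sample = ("nvcc: NVIDIA (R) Cuda compiler driver\n"
          "Cuda compilation tools, release 9.0, V9.0.176\n")
print(nvcc_release(sample))  # 9.0
```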

5. Install cuDNN 7.3


(1) Check version compatibility first

Reference: CUDA/cuDNN version requirements for each TensorFlow release
https://blog.csdn.net/qq_27825451/article/details/89082978
Reference: official cuDNN installation guide for Linux
https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#install-linux
Reference: cudnn-8.0/9.0/10.0-linux-x64-v6.0/7.0/7.1/7.2/7.3/7.4.tgz downloads
https://blog.csdn.net/xiangxianghehe/article/details/79177833
Reference: cuDNN 7.3 download for CUDA 9
https://download.csdn.net/download/godfyun/10682330

(2) Download cuDNN 7.3

# cd /downloads
# wget http://developer.download.nvidia.com/compute/redist/cudnn/v7.3.0/cudnn-9.0-linux-x64-v7.3.0.29.tgz

(3) Unpack cuDNN 7.3

# cd /downloads
# tar -xvzf cudnn-9.0-linux-x64-v7.3.0.29.tgz

(4) Copy the cuDNN files into the CUDA tree

# cd /downloads/cuda
# cp include/* /usr/local/cuda/include  
# cp lib64/* /usr/local/cuda/lib64
# chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
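A common way to confirm the copy succeeded is to read the version macros out of the installed header; `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` are the #defines cudnn.h uses for its version. A sketch (the function name is mine):

```python
import re

def cudnn_version(header_text: str) -> str:
    """Parse the CUDNN_MAJOR/MINOR/PATCHLEVEL #defines from cudnn.h text."""
    fields = {}
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(rf"#define {key}\s+(\d+)", header_text)
        fields[key] = m.group(1) if m else "?"
    return ".".join(fields.values())

# On the installed system you would call:
#   cudnn_version(open("/usr/local/cuda/include/cudnn.h").read())
sample = "#define CUDNN_MAJOR 7\n#define CUDNN_MINOR 3\n#define CUDNN_PATCHLEVEL 0\n"
print(cudnn_version(sample))  # 7.3.0
```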

6. Install Python 3.6.8

Reference: installing Python 3.6 on Linux
https://www.cnblogs.com/kimyeee/p/7250560.html

(1) Check whether Python is already installed

CentOS 7 ships with Python 2.7 by default, and system tools such as yum depend on it, so leave it in place.

# python -V   //check whether Python is installed
# which python   //locate the Python executable

(2) Install the build dependencies first

# yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gcc make
# yum install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel

(3) Download the Python source

Download the source tarball from the official site, or fetch it straight into /downloads:

# cd /downloads
# wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tgz

(4) Unpack Python

# cd /downloads
# tar -zxvf Python-3.6.8.tgz

(5) Build and install Python

# cd Python-3.6.8  //enter the source directory
# ./configure --prefix=/usr/local/python3  //configure the build

Then compile and install:

# make
# make install

When this finishes, python3 is installed under /usr/local/python3.

(6) Symlink the binary into /usr/bin

# ln -s /usr/local/python3/bin/python3 /usr/bin/python3   //symlink into /usr/bin

(7) Add Python to the PATH

Edit:

# vim ~/.bash_profile

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH

Press a, change the PATH line as shown below, then press Esc and type :wq:

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin:/usr/local/python3/bin

export PATH

(8) Make the environment variables take effect immediately

# source ~/.bash_profile

(9) Verify Python

# python3 -V   //should report Python 3.6.8
# python2 -V   //should still report the system Python 2.7

Expected output:

Python 3.6.8
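If you later wrap these steps in a script, a small guard (purely illustrative; the minimum version here is just this tutorial's requirement) can fail early when run under the wrong interpreter:

```python
import sys

def require_python(minimum=(3, 6)):
    """Raise RuntimeError if the running interpreter is older than `minimum`."""
    if sys.version_info[:2] < minimum:
        raise RuntimeError(f"Python {minimum[0]}.{minimum[1]}+ required, "
                           f"got {sys.version.split()[0]}")

require_python((3, 0))  # passes on any Python 3 interpreter
```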

7. Install TensorFlow 1.12.0

(1) Install the pip tool first

Check whether the python-pip package is available by running:

# yum install python-pip

If there is no python-pip package, run:

# yum -y install epel-release

and once that succeeds, run again:

# yum install python-pip

Upgrade the installed pip:

# pip install --upgrade pip

If all else fails, pip can also be installed from source:

# wget --no-check-certificate https://github.com/pypa/pip/archive/9.0.1.tar.gz  # download the source
# tar -zxvf 9.0.1.tar.gz  # unpack into pip-9.0.1
# cd pip-9.0.1
# python3 setup.py install  # install with Python 3
# ln -s /usr/local/python3/bin/pip /usr/bin/pip3  # create the symlink
# pip3 install --upgrade pip  # upgrade pip

(2) Install tensorflow-gpu==1.12.0 from the Tsinghua mirror

# pip3 install tensorflow-gpu==1.12.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

(3) Test TensorFlow

Run:

# python3
>>> import tensorflow as tf  
>>> hello = tf.constant('Hello, TensorFlow!')  
>>> sess = tf.Session()  
>>> print(sess.run(hello))  //note: in Python 3, print needs parentheses

Result:

[root@localhost Python-3.6.8]# python3
Python 3.6.8 (default, Jun  5 2019, 17:45:35) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
2019-06-05 18:08:39.292310: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-05 18:08:39.352566: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-05 18:08:39.353113: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX TITAN X major: 5 minor: 2 memoryClockRate(GHz): 1.076
pciBusID: 0000:01:00.0
totalMemory: 11.92GiB freeMemory: 11.67GiB
2019-06-05 18:08:39.353137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-05 18:08:39.602924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-05 18:08:39.602960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-06-05 18:08:39.602984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-06-05 18:08:39.603087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11292 MB memory) -> physical GPU (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:01:00.0, compute capability: 5.2)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'

That's everything. Enjoy TensorFlow!

8. Follow-up

(1) Reference: creating users

https://blog.csdn.net/xudailong_blog/article/details/80518266
