Ubuntu 18.04, CUDA 10.0: Building and Installing TensorFlow GPU from Source

Graphics card: GTX 1080

OS: Ubuntu 18.04

 

After several unsuccessful attempts to install the GPU version of TensorFlow on Windows, I decided to install it on Ubuntu instead.

 

The installation guide I followed: https://www.pytorials.com/how-to-install-tensorflow-gpu-with-cuda-10-0-for-python-on-ubuntu/

 

In the end neither route turned out to be easy, but at least it worked on Linux, so here is a record of the pitfalls I ran into.

 

1. Graphics card support in Ubuntu

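Although the Ubuntu installer detects the Nvidia card automatically, the graphics driver that 18.04 installs by default does not work properly with it: right after installation, the screen went black on standby and could not be woken up again. If this is not fixed, it directly prevents CUDA from installing successfully later on.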

 

1) Run nvidia-smi to check whether the card is working; if it prints an error, the driver is wrong.

2) Check that the graphics card is reported correctly under Settings > Details.

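3) In Software & Updates (Additional Drivers tab), switch the graphics driver (install the 390 version the system detected first) and set a Secure Boot (MOK) password when prompted; on the next boot, enroll the key with that password in the MOK management screen.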
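The checks in steps 1) to 3) can also be done from a terminal. The bash sketch below assumes the standard Ubuntu tools `ubuntu-drivers` and `mokutil` are installed; the function names `driver_ok` and `show_driver_commands` and the `NVIDIA_SMI` override variable are hypothetical, the override existing only so the check can be exercised on a machine without a GPU.

```bash
#!/usr/bin/env bash
# Sketch: terminal checks for the Nvidia driver on Ubuntu 18.04.
# NVIDIA_SMI is a hypothetical override (defaults to the real nvidia-smi),
# added only so the check can be exercised on a machine without a GPU.

# Exit status 0 if nvidia-smi runs cleanly, 1 otherwise.
driver_ok() {
  local smi="${NVIDIA_SMI:-nvidia-smi}"
  if ! command -v "$smi" >/dev/null 2>&1; then
    echo "$smi not found: the Nvidia driver is probably not installed" >&2
    return 1
  fi
  if ! "$smi" >/dev/null 2>&1; then
    echo "$smi failed: the loaded driver is not working" >&2
    return 1
  fi
  echo "driver OK"
}

# Terminal equivalents of the GUI steps above; printed, not executed.
show_driver_commands() {
  cat <<'EOF'
ubuntu-drivers devices              # list the GPU and the drivers Ubuntu recommends
sudo apt install nvidia-driver-390  # install the 390 driver the system detected
mokutil --sb-state                  # is Secure Boot (and thus MOK enrollment) in play?
EOF
}
```

For example, run `driver_ok` after rebooting into the new driver; a non-zero exit status means the driver still needs attention before moving on to CUDA.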

 

2. Some basic preparation

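1) Install a Chinese input method to make searching for solutions easier; after adding Chinese language support you need to reboot once before the Pinyin input method can be installed.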

2) Install Python, pip and the other required libraries as apt packages (sudo apt-get install python-*/python3-*), for example:

sudo apt-get install python3-sklearn

Using Anaconda is not recommended at this stage.

3) Configure domestic (China) mirrors for both Ubuntu's apt and pip; otherwise downloads are painfully slow.
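Step 3) boils down to pointing pip and apt at a nearby mirror. A minimal sketch, assuming the Tsinghua TUNA mirror (other domestic mirrors work the same way) and the per-user pip config at `~/.pip/pip.conf`; `set_pip_mirror` and `set_apt_mirror` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: point pip and apt at a domestic mirror (Tsinghua TUNA assumed).

# Write a per-user pip config. Args: [index_url] [config_path]
set_pip_mirror() {
  local url="${1:-https://pypi.tuna.tsinghua.edu.cn/simple}"
  local conf="${2:-$HOME/.pip/pip.conf}"
  mkdir -p "$(dirname "$conf")"
  printf '[global]\nindex-url = %s\n' "$url" > "$conf"
}

# Swap the Ubuntu archive/security hosts in an apt sources file,
# keeping a .bak copy of the original. Args: [mirror_host] [sources_file]
set_apt_mirror() {
  local mirror="${1:-mirrors.tuna.tsinghua.edu.cn}"
  local sources="${2:-/etc/apt/sources.list}"
  cp "$sources" "$sources.bak" || return 1
  sed -i -e "s|http://[a-z.]*archive\.ubuntu\.com|http://$mirror|g" \
         -e "s|http://security\.ubuntu\.com|http://$mirror|g" "$sources"
}
```

Run `set_pip_mirror` as your normal user. `set_apt_mirror` edits `/etc/apt/sources.list` by default, so it needs root, and should be followed by `sudo apt-get update`.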

 

3. Installing CUDA and cuDNN

Nvidia's website has detailed instructions, but I still ran into problems during installation:

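1) For Ubuntu 18.04, Nvidia recommends CUDA 10.0. After installing it following the official procedure, reboot and run nvcc -V to check that the version is correct. My first attempt, made before the driver problem was solved, failed; once the driver was fixed, CUDA installed normally.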

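2) cuDNN: after following the official procedure (installing the .deb files), you still have to copy cudnn.h and the libcudnn*.so files into the corresponding include (.h) and lib64 (.so) directories under /usr/local/cuda-10.0/ and make them readable.

The .deb installation did not copy these files to the right location.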

Official installation guide: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html

It also describes how to test the installation.

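3) NCCL: with only one GPU this can be skipped for now (it actually has the same problem as cuDNN: after the .deb installation the files were not in the right place).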

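4) Write down the CUDA, cuDNN and NCCL version numbers, because the build configuration asks for them. In this installation, CUDA was 10.0 and cuDNN was 7.4.2 (NCCL was not installed locally).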
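The cuDNN copy in step 2) and the version bookkeeping in steps 1) and 4) can be scripted. A sketch, assuming the cuDNN 7 .deb packages placed `cudnn.h` under `/usr/include` and the libraries under `/usr/lib/x86_64-linux-gnu` (check with `dpkg -L` on your machine), and that `nvcc` is on `PATH`, which the CUDA post-installation steps arrange with `export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}`. The helper names are hypothetical; the `cp` plus `chmod a+r` pair mirrors the copy-and-permission step described above.

```bash
#!/usr/bin/env bash
# Sketch: finish the cuDNN .deb install and collect the version numbers
# needed by TensorFlow's ./configure.

# Copy the cuDNN header and libraries into the CUDA tree, then make them
# world-readable. Args: [cuda_home] [src_include_dir] [src_lib_dir]
# The source defaults are assumptions about where the .deb put the files.
install_cudnn_files() {
  local cuda_home="${1:-/usr/local/cuda-10.0}"
  local src_include="${2:-/usr/include}"
  local src_lib="${3:-/usr/lib/x86_64-linux-gnu}"
  cp "$src_include"/cudnn*.h "$cuda_home/include/" || return 1
  cp "$src_lib"/libcudnn* "$cuda_home/lib64/" || return 1
  chmod a+r "$cuda_home"/include/cudnn*.h "$cuda_home"/lib64/libcudnn*
}

# Print MAJOR.MINOR.PATCH from a cuDNN 7 header (CUDNN_MAJOR/MINOR/PATCHLEVEL).
cudnn_version() {
  local header="${1:-/usr/local/cuda-10.0/include/cudnn.h}"
  awk '/^#define CUDNN_MAJOR/      { major = $3 }
       /^#define CUDNN_MINOR/      { minor = $3 }
       /^#define CUDNN_PATCHLEVEL/ { patch = $3 }
       END { if (major != "") print major "." minor "." patch }' "$header"
}

# Read `nvcc -V` output on stdin and print the CUDA release, e.g. 10.0.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

After loading these functions, run `install_cudnn_files` with root privileges, then `cudnn_version` and `nvcc -V | cuda_release` to obtain the two numbers that `./configure` asks for.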

 

4. Building TensorFlow GPU from source with Bazel

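At the time of writing, the prebuilt TensorFlow GPU packages (version 1.12) support only CUDA 9.0, not CUDA 10.0, so a plain pip install fails with errors about missing .so files (the CUDA 9.0 libraries). The only option is to build TensorFlow from source with Bazel; on Windows you also have to build from source.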

 

1) When configure asks for the NCCL version, enter 1.3; it is then downloaded and installed automatically during the build.

2) Both python3 and python2 must be installed, otherwise the build reports errors.

3) Having both pip and pip3 installed caused an error (a __main__ error). It can be fixed with: sudo python3 -m pip uninstall pip && sudo apt install --reinstall python3-pip

4) Remember to enable GPU (CUDA) support during configuration, otherwise all the effort is wasted.
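Steps 1) and 4) can also be answered up front. TensorFlow's `configure.py` reads environment variables and uses them instead of prompting; the sketch below assumes the variable names used in the 1.12 source tree (verify them in `configure.py` of your checkout), the paths from section 3, and compute capability 6.1 for the GTX 1080. `tf_configure_env` and `build_tf_gpu` are hypothetical names, and anything left unset is still asked interactively.

```bash
#!/usr/bin/env bash
# Sketch: pre-answer the CUDA-related ./configure prompts for TensorFlow 1.12.
# Variable names are assumed to match those read by configure.py in the 1.12
# tree; values already set in the environment are kept.

tf_configure_env() {
  export PYTHON_BIN_PATH="${PYTHON_BIN_PATH:-/usr/bin/python3}"
  export TF_NEED_CUDA=1   # step 4): without this the build is CPU-only
  export TF_CUDA_VERSION="${TF_CUDA_VERSION:-10.0}"
  export CUDA_TOOLKIT_PATH="${CUDA_TOOLKIT_PATH:-/usr/local/cuda-10.0}"
  export TF_CUDNN_VERSION="${TF_CUDNN_VERSION:-7}"   # major version, as in the prompt default
  export CUDNN_INSTALL_PATH="${CUDNN_INSTALL_PATH:-$CUDA_TOOLKIT_PATH}"  # where cuDNN was copied
  export TF_NCCL_VERSION="${TF_NCCL_VERSION:-1.3}"   # step 1): fetched during the build
  export TF_CUDA_COMPUTE_CAPABILITIES="${TF_CUDA_COMPUTE_CAPABILITIES:-6.1}"  # GTX 1080
}

# Run from the root of the TensorFlow source checkout.
build_tf_gpu() {
  tf_configure_env
  ./configure &&
    bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
}
```

Keeping existing values (the `${VAR:-default}` form) lets you override a single answer, e.g. a different compute capability, without editing the function.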

 

5. A few open questions

1) Can CUDA 9.0 be installed on Ubuntu 18.04, so that TensorFlow can simply be installed with pip install?

2) Does not using the latest NCCL version hurt performance?

 

Happy pitfall hunting!

Building TensorFlow:
1. Python 3.5, TensorFlow 1.12;
2. CUDA 10.0, cuDNN 7.3.1 and TensorRT-5.0.2.6-cuda10.0-cudnn7.3 support;
3. MKL support, no MPI.

Hardware/software environment: Ubuntu 16.04, GeForce GTX 1080

Configuration:

hp@dla:~/work/ts_compile/tensorflow$ ./configure
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.19.1 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Found possible Python library paths:
  /usr/local/lib/python3.5/dist-packages
  /usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]
Do you wish to build TensorFlow with XLA JIT support? [Y/n]:
XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 10.0]:
Please specify the location where CUDA 10.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-10.0
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7]: 7.3.1
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]:
Do you wish to build TensorFlow with TensorRT support? [y/N]: y
TensorRT support will be enabled for TensorFlow.
Please specify the location where TensorRT is installed. [Default is /usr/lib/x86_64-linux-gnu]:/home/hp/bin/TensorRT-5.0.2.6-cuda10.0-cudnn7.3/targets/x86_64-linux-gnu
Please specify the locally installed NCCL version you want to use. [Default is to use https://github.com/nvidia/nccl]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1,6.1,6.1]:
Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native -Wno-sign-compare]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=" to your build command. See .bazelrc for more details.
    --config=mkl              # Build with MKL support.
    --config=monolithic       # Config for mostly static monolithic build.
    --config=gdr              # Build with GDR support.
    --config=verbs            # Build with libverbs support.
    --config=ngraph           # Build with Intel nGraph support.
    --config=dynamic_kernels  # (Experimental) Build kernels into separate shared objects.
Preconfigured Bazel build configs to DISABLE default on features:
    --config=noaws            # Disable AWS S3 filesystem support.
    --config=nogcp            # Disable GCP support.
    --config=nohdfs           # Disable HDFS support.
    --config=noignite         # Disable Apacha Ignite support.
    --config=nokafka          # Disable Apache Kafka support.
    --config=nonccl           # Disable NVIDIA NCCL support.
Configuration finished

Build:

hp@dla:~/work/ts_compile/tensorflow$ bazel build --config=opt --config=mkl --verbose_failures //tensorflow/tools/pip_package:build_pip_package

Uninstall the existing TensorFlow:

hp@dla:~/temp$ sudo pip3 uninstall tensorflow

Install the self-built wheel:

hp@dla:~/temp$ sudo pip3 install tensorflow-1.12.0-cp35-cp35m-linux_x86_64.whl
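The log above goes straight from `bazel build` to installing a wheel; in between, the `build_pip_package` script that Bazel just built has to be run to actually produce the `.whl`. A sketch of that last stretch, assuming `/tmp/tensorflow_pkg` as the output directory (any directory works) and the TensorFlow 1.x checks `tf.test.is_built_with_cuda()` and `tf.test.is_gpu_available()`; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: turn the bazel output into a wheel, install it, and check the GPU.

# Print the most recently modified tensorflow-*.whl in a directory; fail if none.
newest_wheel() {
  local f
  f=$(ls -t "$1"/tensorflow-*.whl 2>/dev/null | head -n 1)
  [ -n "$f" ] && echo "$f"
}

# Run from the TensorFlow source root after `bazel build ... build_pip_package`.
package_and_install_tf() {
  local out_dir="${1:-/tmp/tensorflow_pkg}"
  ./bazel-bin/tensorflow/tools/pip_package/build_pip_package "$out_dir" || return 1
  local whl
  whl=$(newest_wheel "$out_dir") || return 1
  sudo pip3 uninstall -y tensorflow   # drop any previously installed build
  sudo pip3 install "$whl"
}

# Prints "True True" when TensorFlow was built with CUDA and can see the GPU.
check_tf_gpu() {
  python3 -c 'import tensorflow as tf; print(tf.test.is_built_with_cuda(), tf.test.is_gpu_available())'
}
```

Run `check_tf_gpu` from a directory outside the source tree, so that `import tensorflow` picks up the installed package rather than the source checkout.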
