Graphics Card, CUDA, TensorFlow, and PyTorch: A Summary of Version Compatibility Issues


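Getting a graphics card, CUDA, TensorFlow, and PyTorch to run stably on the same machine is not easy. Each comes in many versions, and mismatches between them cause all sorts of problems. This post discusses how to make the four work together. It covers Windows only.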

Graphics card

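The graphics card is one of the basic components of a personal computer. Its job is to convert the display information the system produces and drive the monitor. The display chip is the card's main processing unit, which is why it is also called a graphics processing unit (Graphics Processing Unit, GPU), a term popularized by NVIDIA.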

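Of the four components, the graphics card matters most. It is hardware and costs money, so you cannot swap it on a whim. The other three are software and free.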

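I assume an NVIDIA card is already installed. First check which CUDA version it supports: open the NVIDIA Control Panel (Help > System Information > Components, the NVCUDA.DLL entry).

The version shown there is the highest CUDA version your machine can support with the installed driver. My card is an RTX 2070 and it reports CUDA 10.1. Support is backward compatible, so CUDA 9 is supported as well.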

Graphics driver

The driver version has to match the CUDA version. NVIDIA publishes the official mapping, which you can look up at https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
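The driver check can be sketched as a small lookup. The minimum driver numbers below are values I copied by hand from the release notes for the first (GA) release of each toolkit on Windows, so treat them as illustrative and confirm them against the official table. The names `MIN_WINDOWS_DRIVER` and `driver_supports` are made up for this example.

```python
# Minimum Windows driver per CUDA toolkit (GA releases), transcribed by hand
# from the NVIDIA release notes. Illustrative only: re-check the official table.
MIN_WINDOWS_DRIVER = {
    "9.0": (385, 54),
    "9.2": (397, 44),
    "10.0": (411, 31),
    "10.1": (418, 96),
    "10.2": (441, 22),
}


def parse_version(text):
    """Turn '441.22' into (441, 22) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_supports(cuda_version, driver_version):
    """Return True if the installed driver meets the toolkit's minimum."""
    return parse_version(driver_version) >= MIN_WINDOWS_DRIVER[cuda_version]


print(driver_supports("10.1", "441.22"))  # True
print(driver_supports("10.1", "411.31"))  # False
```

Comparing tuples of integers avoids the trap of comparing version strings character by character.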

The driver version can also be found in the control panel.
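If you prefer the command line, the `nvidia-smi` tool that ships with the driver prints the same two numbers in its banner. Below is a minimal sketch that reads them. It assumes `nvidia-smi` is on your PATH (on older Windows driver installs it may not be) and that the banner contains the `Driver Version:` and `CUDA Version:` fields. Older drivers omit the CUDA field, in which case `None` is returned for it. The function names are hypothetical.

```python
import re
import subprocess


def parse_smi_header(text):
    """Extract (driver_version, max_cuda_version) from nvidia-smi output."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return (driver.group(1) if driver else None,
            cuda.group(1) if cuda else None)


def query_nvidia_smi():
    """Run nvidia-smi; return None if it is missing or exits with an error."""
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True,
                                text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_smi_header(result.stdout)


# A banner line in the format nvidia-smi prints, used here as sample input.
sample = "| NVIDIA-SMI 441.22       Driver Version: 441.22       CUDA Version: 10.2     |"
print(parse_smi_header(sample))  # ('441.22', '10.2')
```

On a machine with a working driver, `query_nvidia_smi()` returns the same kind of tuple for the real hardware.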

If the driver is too old, you can download and install a newer one manually from https://www.nvidia.cn/Download/index.aspx?lang=cn

CUDA

The official CUDA download page is https://developer.nvidia.com/cuda-toolkit-archive. Pick the version that suits your system, then download and install it.

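Note that some versions list patches below the main installer. Download those too, otherwise programs will occasionally fail. CUDA 9.0, for example, comes with four patches. I skipped them at the time, and one program kept throwing errors as a result.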

cuDNN

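cuDNN is an acceleration library that sits on top of CUDA. You can download it from https://developer.nvidia.com/rdp/cudnn-archive (registration required).

cuDNN versions are also tied to CUDA versions, as shown in the figure below. Note that a single CUDA version corresponds to many cuDNN versions.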

TensorFlow

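The mapping between TensorFlow and CUDA versions is shown below. I took this figure from someone else. Note that the cuDNN column on the far right is not fixed, for the reason explained above. If the TensorFlow version is too old and the CUDA version too new, TensorFlow will not run properly.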

That figure is someone else's summary. The official mapping is at https://www.tensorflow.org/install/source_windows

Neither figure is complete, so I suggest reading the two together.
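To make the mapping concrete, here is a minimal sketch of a lookup from CUDA version to TensorFlow releases. The pairs are my own transcription of the tested GPU configurations on the official page, so please verify them there before relying on them. cuDNN is left out on purpose because, as noted above, it is not fixed. `TESTED_CUDA` and `tensorflow_releases_for` are hypothetical names.

```python
# tensorflow_gpu release -> CUDA version it was tested with, transcribed by
# hand from the official "tested build configurations" table. Verify before use.
TESTED_CUDA = {
    "1.12": "9.0",
    "1.13": "10.0",
    "1.14": "10.0",
    "1.15": "10.0",
    "2.0": "10.0",
    "2.1": "10.1",
    "2.2": "10.1",
    "2.3": "10.1",
}


def tensorflow_releases_for(cuda_version):
    """List the TensorFlow releases tested against the given CUDA version."""
    return [tf for tf, cuda in TESTED_CUDA.items() if cuda == cuda_version]


print(tensorflow_releases_for("10.1"))  # ['2.1', '2.2', '2.3']
print(tensorflow_releases_for("9.0"))   # ['1.12']
```

An empty list means none of the releases in this small table were tested against that CUDA version, which is the "TensorFlow too old, CUDA too new" situation described above.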

PyTorch

PyTorch builds are listed at https://download.pytorch.org/whl/torch_stable.html

cu100 means CUDA 10.0, and cu101 means CUDA 10.1.

1.0.1 and 1.2.0 are the PyTorch versions.

cp36 means Python 3.6, and cp27 means Python 2.7.

win_amd64 means 64-bit Windows.
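The naming rules above can be applied mechanically. The sketch below splits one entry from that page into its parts. The regular expression is my own and only covers names shaped like `cu100/torch-1.0.1-cp36-cp36m-win_amd64.whl`. `describe_wheel` is a hypothetical helper, not part of PyTorch.

```python
import re

# Pattern for entries such as "cu100/torch-1.0.1-cp36-cp36m-win_amd64.whl".
WHEEL_PATTERN = re.compile(
    r"^(?:(?P<build>cpu|cu\d+)/)?"          # optional "cu100/" or "cpu/" prefix
    r"torch-(?P<version>[^-]+)-"            # PyTorch version
    r"(?P<python>cp\d+)-(?P<abi>[^-]+)-"    # Python tag and ABI tag
    r"(?P<platform>.+)\.whl$"               # platform tag
)


def describe_wheel(name):
    """Split a torch wheel name into version, CUDA, Python, and platform."""
    match = WHEEL_PATTERN.match(name)
    if match is None:
        raise ValueError("unrecognized wheel name: " + name)
    info = match.groupdict()
    build = info["build"]
    if build and build.startswith("cu"):
        digits = build[2:]                  # "100" -> "10.0", "92" -> "9.2"
        info["cuda"] = digits[:-1] + "." + digits[-1]
    else:
        info["cuda"] = None                 # CPU-only build or no prefix
    py = info["python"][2:]                 # "36" -> "3.6"
    info["python_version"] = py[0] + "." + py[1:]
    return info


info = describe_wheel("cu100/torch-1.0.1-cp36-cp36m-win_amd64.whl")
print(info["version"], info["cuda"], info["python_version"], info["platform"])
# 1.0.1 10.0 3.6 win_amd64
```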

As a supplement, the version correspondence between torchvision and PyTorch is as follows:

The latest table is available at https://github.com/pytorch/vision
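For quick reference, part of that correspondence can be written as a small lookup. The pairs below are my transcription of the table in the torchvision README and cover only a few releases, so check the repository for the current list. The names used here are hypothetical.

```python
# torch release -> matching torchvision release, transcribed by hand from the
# compatibility table in the torchvision README. Partial list; verify before use.
TORCHVISION_FOR_TORCH = {
    "1.1.0": "0.3.0",
    "1.2.0": "0.4.0",
    "1.3.0": "0.4.1",
    "1.3.1": "0.4.2",
    "1.4.0": "0.5.0",
    "1.5.0": "0.6.0",
}


def pip_requirements(torch_version):
    """Return a pinned requirement pair, or None if the version is not listed."""
    vision = TORCHVISION_FOR_TORCH.get(torch_version)
    if vision is None:
        return None
    return "torch=={} torchvision=={}".format(torch_version, vision)


print(pip_requirements("1.2.0"))  # torch==1.2.0 torchvision==0.4.0
```

Pinning both packages together avoids pip pulling in a torchvision that silently replaces your torch build with a different one.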

Summary

The lines in the figure connect components whose versions have to be coordinated.
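Once everything is installed, a quick way to confirm the pieces agree is to ask each framework what it was built against. This is only a sketch: it assumes nothing about which frameworks you installed and reports `None` for any that cannot be imported. The function names are made up for this example.

```python
def torch_report():
    """Report PyTorch's view of CUDA, or None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    gpu_visible = torch.cuda.is_available()
    return {
        "version": torch.__version__,
        "built_for_cuda": torch.version.cuda,   # None on CPU-only builds
        "gpu_visible": gpu_visible,
        "cudnn": torch.backends.cudnn.version() if gpu_visible else None,
    }


def tensorflow_report():
    """Report TensorFlow's CUDA build flag, or None if it is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
    }


def environment_report():
    """Collect both reports in one dictionary."""
    return {"torch": torch_report(), "tensorflow": tensorflow_report()}


for name, details in environment_report().items():
    print(name, details)
```

If `gpu_visible` is False while `built_for_cuda` shows a version, the usual suspects are the driver and CUDA versions discussed above.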
