Building TensorFlow 2.5.0 with CUDA 11.3 on Windows (Python 3.9.5, cuDNN 8.2.0, compute capability 3.5 - 8.6, build output available for download)

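This largely follows my earlier article, "Building TensorFlow 2.4.1 with CUDA 11.2 on Windows (Python 3.9.1, cuDNN 8.1.0, compute capability 3.5 - 8.6, build output available for download)", with a few changes.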

Environment setup

1. Memory requirements

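With 8 parallel jobs (by default the number of parallel jobs equals the number of CPU threads), you need at least 10 GB of RAM; otherwise the compiler fails with an "out of heap space" error.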
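The figure above works out to roughly 1.25 GB per parallel job (10 GB / 8). As a rough aid, the sketch below turns that ratio into a value for Bazel's `--jobs` flag. The 1.25 GB-per-job constant is an assumption extrapolated from that single data point, not a measured requirement, and `safe_jobs` is a hypothetical helper.

```python
import os
from typing import Optional

# Assumption: about 1.25 GB of RAM per compile job, derived from "8 jobs need >= 10 GB".
GB_PER_JOB = 10 / 8


def safe_jobs(total_ram_gb: float, cpu_threads: Optional[int] = None) -> int:
    """Heuristic --jobs value that should fit into total_ram_gb of memory."""
    if cpu_threads is None:
        # Default parallelism is the number of CPU threads.
        cpu_threads = os.cpu_count() or 1
    by_memory = int(total_ram_gb // GB_PER_JOB)
    # Never go below 1 job and never above the number of hardware threads.
    return max(1, min(cpu_threads, by_memory))
```

For example, a machine with 8 GB of RAM and 16 threads gets 6, which would then be passed as `bazel build --jobs=6 ...`.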

2. Python & Pip

First, Python needs a few packages: six, numpy, wheel, setuptools, keras_applications and keras_preprocessing. Open a command prompt as administrator and run:

pip install six numpy wheel setuptools
pip install Keras_applications Keras_preprocessing --no-deps
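Before running configure.py it can help to confirm that the interpreter you will use for the build actually sees these packages. A minimal check, to be run with the same python.exe you will give to configure.py (`missing_modules` is a hypothetical helper name):

```python
import importlib.util

# Import names of the packages installed above (pip names are case-insensitive,
# so Keras_applications is imported as keras_applications).
REQUIRED = ("six", "numpy", "wheel", "setuptools",
            "keras_applications", "keras_preprocessing")


def missing_modules(names=REQUIRED):
    """Return the module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("missing:", missing_modules() or "none")
```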

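Note that the Python path must not contain spaces: the default Windows install location C:\Program Files\Python39 causes errors during the build. If Python is installed there, use the mklink command to create a link (not a shortcut) to it in a directory without spaces.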
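A small sketch of that check, assuming C:\Python39 as the link location (any space-free directory works; it just happens to match the default shown by configure.py below). `mklink /D` creates a directory symbolic link and needs an elevated prompt or Developer Mode; `mklink /J` (a junction) is an alternative that does not. The helper names are hypothetical.

```python
import sys


def needs_link(python_exe: str = sys.executable) -> bool:
    """True if the interpreter path contains a space, which breaks the build."""
    return " " in python_exe


def mklink_command(link_dir: str, target_dir: str) -> str:
    """Build the cmd.exe command that creates a directory symbolic link."""
    return f'mklink /D "{link_dir}" "{target_dir}"'


# Usage: if needs_link() is True, run the printed command in an administrator cmd.exe,
# then point configure.py at the python.exe inside link_dir.
# print(mklink_command(r"C:\Python39", r"C:\Program Files\Python39"))
```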

3. CUDA

I chose CUDA 11.3. Download it from the official CUDA site and install it; nothing special here.

4. Bazel

Next comes Bazel. Bazel is just a single exe whose directory has to be on Path; out of laziness I simply put it in the CUDA bin directory. I used version 3.7.2.
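To confirm that the bazel.exe found on Path is the intended one, a sketch that runs `bazel --version` (which prints a line such as `bazel 3.7.2`) and extracts the version number. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional

EXPECTED = "3.7.2"  # the version used in this article


def parse_bazel_version(output: str) -> Optional[str]:
    """Extract 'X.Y.Z' from `bazel --version` output such as 'bazel 3.7.2'."""
    match = re.search(r"bazel (\d+\.\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_bazel_version() -> Optional[str]:
    """Return the version of the bazel found on Path, or None if there is none."""
    exe = shutil.which("bazel")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return parse_bazel_version(result.stdout)
```

Usage: `installed_bazel_version() == EXPECTED` should be True once Path is set up.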

5. MSYS2

Next, install MSYS2, and likewise add the msys64\usr\bin directory to Path.

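Once MSYS2 is installed, install a few packages with pacman. The default mirrors are extremely slow from within China, so users there should switch to a domestic mirror.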

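Go to msys64\etc\pacman.d and edit every mirrorlist file: in each one, add a line before all of its Server lines, using the Server entry of any one of the Tsinghua, USTC or BUPT mirrors.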
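Editing the files by hand is fine. As a scripted alternative, the sketch below puts a chosen Server line in front of the existing ones, so pacman tries it first. The mirror URL is deliberately a placeholder (`mirror.example.invalid`): copy the real Server line for each file (mirrorlist.msys, mirrorlist.mingw64 and so on) from your mirror's instructions. Bytes are read and written unchanged so the file's line endings are preserved. Helper names are hypothetical.

```python
from pathlib import Path


def prepend_server(text: str, server_line: str) -> str:
    """Insert server_line before the first active Server line (idempotent)."""
    lines = text.splitlines(keepends=True)
    if any(line.strip() == server_line.strip() for line in lines):
        return text  # already added on a previous run
    for i, line in enumerate(lines):
        if line.lstrip().startswith("Server"):  # commented "#Server" lines are skipped
            lines.insert(i, server_line + "\n")
            return "".join(lines)
    # No Server line at all: append, making sure the previous line is terminated.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + server_line + "\n"


def patch_mirrorlists(pacman_d: Path, server_lines: dict) -> None:
    """server_lines maps a mirrorlist file name to the Server line to prepend."""
    for name, line in server_lines.items():
        path = pacman_d / name
        data = path.read_bytes().decode("utf-8")
        path.write_bytes(prepend_server(data, line).encode("utf-8"))


# Example with a placeholder URL (replace with your mirror's real line):
# patch_mirrorlists(Path(r"C:\msys64\etc\pacman.d"),
#                   {"mirrorlist.msys": "Server = https://mirror.example.invalid/msys2/msys/$arch/"})
```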

Open the MSYS2 shell. The official guide leaves out the zip package, so the install command is:

pacman -S git patch unzip zip
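Because msys64\usr\bin is on Path, these tools should now be visible from an ordinary Windows command prompt as well. A quick sketch to verify that (bash is included because later build steps call MSYS2's bash; `missing_tools` is a hypothetical helper):

```python
import shutil

# Tools installed above, plus MSYS2's bash itself.
TOOLS = ("bash", "git", "patch", "unzip", "zip")


def missing_tools(tools=TOOLS, which=shutil.which):
    """Return the tools that cannot be found on Path."""
    return [tool for tool in tools if which(tool) is None]


print("missing:", missing_tools() or "none")
```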

6. Visual Studio 2019

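Then Visual Studio. Download the VS installer and, to avoid trouble, install to the default path on the C: drive (I did not try a path on another drive this time, so I don't know whether the "compiler not found" bug still exists). If you don't otherwise use VS, then besides the required components you only need the MSVC v142 - VS 2019 C++ x64/x86 build tools (any version; I picked the latest) and the Windows 10 SDK (again any version; I picked the latest).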

Building

Configuring the build

Download the TensorFlow 2.5.0 source, go to the extracted root directory and run:

D:\tensorflow-2.5.0>python configure.py
You have bazel 3.7.2 installed.
Please specify the location of python. [Default is C:\Python39\python.exe]:


Found possible Python library paths:
  C:\Python39\lib\site-packages
Please input the desired Python library path to use.  Default is [C:\Python39\lib\site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Found CUDA 11.3 in:
    D:/CUDA/lib/x64
    D:/CUDA/include
Found cuDNN 8 in:
    D:/CUDA/lib/x64
    D:/CUDA/include


Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 3.5,3.7,5.0,5.2,6.0,6.1,7.0,7.5,8.0,8.6


Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is /arch:AVX]: /arch:AVX2


Would you like to override eigen strong inline for some C++ compilation to reduce the compilation time? [Y/n]:
Eigen strong inline overridden.

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=mkl_aarch64    # Build with oneDNN and Compute Library for the Arm Architecture (ACL).
        --config=monolithic     # Config for mostly static monolithic build.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
        --config=v2             # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
        --config=noaws          # Disable AWS S3 filesystem support.
        --config=nogcp          # Disable GCP support.
        --config=nohdfs         # Disable HDFS support.
        --config=nonccl         # Disable NVIDIA NCCL support.
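The answers above can also be supplied up front through environment variables that configure.py reads before prompting. The names used below (PYTHON_BIN_PATH, TF_NEED_ROCM, TF_NEED_CUDA, TF_CUDA_COMPUTE_CAPABILITIES, CC_OPT_FLAGS) are the ones I believe configure.py checks, so treat them as an assumption and verify them against configure.py in your source tree; any question they do not cover is still asked interactively. The helper also rejects capabilities below 3.5, mirroring the minimum stated in the prompt, and only understands the "x.y" form (not compute_xy or sm_xy). All function names are hypothetical.

```python
import os
import re
import subprocess
import sys

# Capabilities and optimization flag from the configure session above.
CAPABILITIES = ("3.5", "3.7", "5.0", "5.2", "6.0", "6.1", "7.0", "7.5", "8.0", "8.6")


def check_capabilities(caps):
    """Validate 'x.y' entries >= 3.5 and return them comma-separated."""
    for cap in caps:
        if not re.fullmatch(r"\d+\.\d", cap):
            raise ValueError(f"expected 'x.y', got {cap!r}")
        major, minor = (int(part) for part in cap.split("."))
        if (major, minor) < (3, 5):
            raise ValueError(f"{cap} is below the supported minimum 3.5")
    return ",".join(caps)


def configure_env(python_exe, caps=CAPABILITIES, opt_flags="/arch:AVX2"):
    """Copy of the environment that pre-answers some configure.py questions."""
    env = dict(os.environ)
    env.update({
        "PYTHON_BIN_PATH": python_exe,  # must not contain spaces
        "TF_NEED_ROCM": "0",
        "TF_NEED_CUDA": "1",
        "TF_CUDA_COMPUTE_CAPABILITIES": check_capabilities(caps),
        "CC_OPT_FLAGS": opt_flags,
    })
    return env


def run_configure(source_dir, python_exe=sys.executable):
    """Run configure.py in source_dir with the pre-filled answers."""
    subprocess.run([python_exe, "configure.py"], cwd=source_dir,
                   env=configure_env(python_exe), check=True)


# Usage: run_configure(r"D:\tensorflow-2.5.0", r"C:\Python39\python.exe")
```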

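With this TensorFlow version, compiling for SM 3.5 produces an error; from what I could find, it seems TensorRT does not support such an old compute capability.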

Code changes

With MKL enabled

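Starting with 2.4, TensorFlow no longer downloads a prebuilt binary of the OpenMP runtime used by MKL; instead it downloads the open-source code from the LLVM project and compiles it, which causes a series of problems. I recommend installing LLVM from the official installer (Pre-Built Binaries), and remember to let the installer add it to the system environment variables. This serves two purposes:

  1. The ready-made libiomp5md.lib and libiomp5md.dll replace the compilation step, because compiling LLVM's OpenMP with MSVC fails.
  2. Copy the DLLs used by LLVM (in <LLVM install dir>\bin) to <MSYS2 install dir>\usr\bin. Otherwise later build steps that run MSYS2's bash fail, because that bash does not inherit the Path from the system environment variables (for example, building the Lite components fails with an error that api-ms-win-crt-locale-l1-1-0.dll cannot be found).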
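Step 2 can be scripted. The sketch below copies every .dll from LLVM's bin directory into MSYS2's usr\bin and, by default, leaves any DLL that MSYS2 already ships under the same name untouched. The two default paths are assumptions (typical install locations); adjust them to your setup. `copy_dlls` is a hypothetical helper.

```python
import shutil
from pathlib import Path

# Assumed install locations; change them to match your machine.
LLVM_BIN = Path(r"C:\Program Files\LLVM\bin")
MSYS_BIN = Path(r"C:\msys64\usr\bin")


def copy_dlls(src: Path = LLVM_BIN, dst: Path = MSYS_BIN, overwrite: bool = False) -> list:
    """Copy every *.dll from src to dst and return the names that were copied."""
    copied = []
    for dll in sorted(src.glob("*.dll")):
        target = dst / dll.name
        if target.exists() and not overwrite:
            continue  # keep MSYS2's own copy of a same-named DLL
        shutil.copy2(dll, target)
        copied.append(dll.name)
    return copied
```

Usage: run `copy_dlls()` once from an administrator prompt if the MSYS2 directory is write-protected.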

Put libiomp5md.lib in third_party\mkl, and insert the following after line 75 of third_party\mkl\mkl.BUILD:

cc_import(
    name = "iomp5",
    interface_library = "libiomp5md.lib",
    system_provided = 1,
)

Then change the Windows build configuration further down in the same file to:

cc_library(
    name = "mkl_libs_windows",
    deps = [
        "iomp5"
    ],
    visibility = ["//visibility:public"],
)
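The first of these two edits (inserting `cc_import` after line 75) can be applied by a script; the `mkl_libs_windows` change is still easiest to make by hand. The line number comes from the 2.5.0 source as described above, so check that line 75 is really where the block belongs before running it. Helper names are hypothetical.

```python
from pathlib import Path

CC_IMPORT = '''cc_import(
    name = "iomp5",
    interface_library = "libiomp5md.lib",
    system_provided = 1,
)
'''


def insert_after_line(text: str, lineno: int, snippet: str) -> str:
    """Insert snippet after the 1-based line lineno (no-op if already present)."""
    if snippet in text:
        return text
    lines = text.splitlines(keepends=True)
    if lineno > len(lines):
        raise ValueError(f"file has only {len(lines)} lines")
    head = "".join(lines[:lineno])
    if head and not head.endswith("\n"):
        head += "\n"  # the target line was the last one and had no newline
    return head + snippet + "".join(lines[lineno:])


def patch_mkl_build(tf_root: Path) -> None:
    build = tf_root / "third_party" / "mkl" / "mkl.BUILD"
    text = build.read_text(encoding="utf-8")
    build.write_text(insert_after_line(text, 75, CC_IMPORT), encoding="utf-8")


# Usage: patch_mkl_build(Path(r"D:\tensorflow-2.5.0"))
```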

Put libiomp5md.dll in a directory that is on the system Path.
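To check that the DLL is really reachable, search the Path directories directly. `shutil.which` is geared to executables (on Windows it appends the PATHEXT extensions), so it is not a reliable way to look up a .dll. `find_on_path` is a hypothetical helper.

```python
import os
from typing import Optional


def find_on_path(filename: str, path: Optional[str] = None) -> Optional[str]:
    """Return the first directory on Path containing filename, else None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


print("libiomp5md.dll found in:", find_on_path("libiomp5md.dll"))
```

Open a new command prompt after changing Path; already running prompts keep the old value.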

Change the value on line 74 of third_party\llvm_openmp\BUILD to 0, so that MSVC is no longer forced:

omp_vars_win = {
    "MSVC": 0,
}
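This one-value edit can also be scripted. The sketch assumes the original entry reads `"MSVC": 1` and that it is the only `"MSVC"` entry in the file; it refuses to guess if neither the old nor the new value is found. Helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the file contains exactly one "MSVC": 1 entry (inside omp_vars_win).
MSVC_ON = re.compile(r'("MSVC"\s*:\s*)1\b')
MSVC_OFF = re.compile(r'"MSVC"\s*:\s*0\b')


def disable_forced_msvc(text: str) -> str:
    """Turn "MSVC": 1 into "MSVC": 0; raise if neither value is present."""
    new_text, count = MSVC_ON.subn(r"\g<1>0", text)
    if count == 0 and not MSVC_OFF.search(text):
        raise ValueError('no "MSVC" entry found; check the file by hand')
    return new_text


def patch_openmp_build(tf_root: Path) -> None:
    build = tf_root / "third_party" / "llvm_openmp" / "BUILD"
    text = build.read_text(encoding="utf-8")
    build.write_text(disable_forced_msvc(text), encoding="utf-8")
```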

Without this step you get an error similar to the following (this output was copied from the 2.4.1 build, but 2.5.0 reports the same error):

ERROR: D:/output_base/external/llvm_openmp/BUILD.bazel:176:10: C++ compilation of rule '@llvm_openmp//:libiomp5md.dll' failed (Exit 1): ml64.exe failed: error executing command
  cd D:/output_base/execroot/org_tensorflow
  SET CUDA_TOOLKIT_PATH=D:/CUDA
    SET INCLUDE=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\ATLMFC\include;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include;C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt;C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared;C:\Progr