Building TensorFlow 2.5.0 with CUDA 11.3 on Windows (Python 3.9.5, cuDNN 8.2.0, compute capability 3.5 - 8.6, build artifacts available for download)


License: CC 4.0 BY-SA

Tags: #tensorflow #windows #cuda #gpu #mkl

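This post describes in detail how to build TensorFlow 2.5.0 on Windows: the required environment (Python 3.9.5, CUDA 11.3, cuDNN 8.2.0, MKL), the build steps, and points to watch out for. It covers installing CUDA and configuring Bazel, MSYS2, and Visual Studio 2019, as well as the special handling needed for MKL. Download links for the resulting pip package and C++ libraries are provided once the build is complete.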

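This largely follows my earlier post, "Building TensorFlow 2.4.1 with CUDA 11.2 on Windows (Python 3.9.1, cuDNN 8.1.0, compute capability 3.5 - 8.6, build artifacts available for download)", with changes in a few places.

Environment setup

1. Memory requirements

With 8 parallel jobs (the default job count equals the number of CPU threads), you need at least 10 GB of memory; otherwise the compiler fails with an out-of-heap-space error.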
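
As a rough sketch of how one might pick a job count on a machine with less memory: the helper below assumes memory use grows linearly at about 1.25 GB per job, which is simply the "10 GB for 8 jobs" figure divided out and not something that was measured. The name `suggest_jobs` is hypothetical; the result would be passed to Bazel's `--jobs` option.

```python
import math

# Assumption: memory scales linearly with the job count, using the
# "10 GB for 8 jobs" figure from the text (about 1.25 GB per job).
GB_PER_JOB = 10 / 8


def suggest_jobs(total_mem_gb: float, cpu_threads: int) -> int:
    """Return a job count that fits in memory, capped at the CPU thread count."""
    by_memory = math.floor(total_mem_gb / GB_PER_JOB)
    return max(1, min(cpu_threads, by_memory))


if __name__ == "__main__":
    # Example: 8 GB of RAM on a 16-thread CPU.
    print(f"suggested option: --jobs={suggest_jobs(8, 16)}")
```

Treat the number as a starting point only; the real per-job memory use depends on which targets are being compiled.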

2. Python & Pip

First, Python needs a few packages: six, numpy, wheel, setuptools, keras_applications, and keras_preprocessing. Open a Command Prompt with administrator privileges and run:

pip install six numpy wheel setuptools
pip install Keras_applications Keras_preprocessing --no-deps

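Note that the Python path must not contain spaces: the default Windows install path C:\Program Files\Python39 causes errors during the build. If Python is installed there, create a link (not a shortcut) in a directory whose path has no spaces, using the mklink command.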
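
The space check and the mklink call can be sketched as follows. This is a minimal illustration: `mklink_command` is a hypothetical helper, and `C:\Python39` is only an example of a space-free link location. It prints the command rather than running it, since mklink normally needs an elevated Command Prompt.

```python
import os
import sys
from typing import Optional


def mklink_command(python_dir: str, link_dir: str) -> Optional[str]:
    """Return a mklink command if python_dir contains a space, else None."""
    if " " not in python_dir:
        return None
    # /D creates a directory symbolic link; mklink takes <link> then <target>.
    return f'mklink /D "{link_dir}" "{python_dir}"'


if __name__ == "__main__":
    current = os.path.dirname(sys.executable)
    # C:\Python39 is only an example of a space-free location.
    print(mklink_command(current, r"C:\Python39") or "Python path has no spaces.")
```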

3. CUDA

I chose CUDA 11.3 here. Download it from the official CUDA site and install it; there is not much to say.

4. Bazel

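Next is Bazel. Bazel is simple: it is a single exe whose location must be added to the Path environment variable. I took a shortcut and dropped it into CUDA's bin directory. The version I chose is 3.7.2.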

5. MSYS2

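Then install MSYS2; the msys64\usr\bin directory likewise needs to be added to the Path environment variable.

Once it is installed, install a few more packages with pacman. The default mirrors are extremely slow, so users in mainland China are advised to switch mirrors.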

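Go to the msys64\etc\pacman.d directory and edit every mirrorlist file: in each one, add a line before all of its Server lines, copying in the Server line of any one of the Tsinghua, USTC, or BUPT mirrors.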
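
The mirrorlist edit described above could be scripted roughly like this. It is a sketch under assumptions: `add_mirror` is a hypothetical helper, and the URLs in the example are placeholders, not real mirror addresses; substitute the actual Server line of the mirror you choose.

```python
def add_mirror(mirrorlist_text: str, server_line: str) -> str:
    """Insert server_line before the first existing Server line."""
    lines = mirrorlist_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("Server"):
            lines.insert(i, server_line)
            break
    else:
        # No Server line found: append so the mirror is still listed.
        lines.append(server_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "## Primary\nServer = https://primary.invalid/$repo\n"
    # Placeholder URL: substitute the real line of the mirror you choose.
    print(add_mirror(sample, "Server = https://mirror.invalid/$repo"))
```

Placing the new line first matters because pacman tries the servers in the order they are listed.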

Open the msys64 shell. The official guide omits the zip package, so the install command is:

pacman -S git patch unzip zip
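
Before moving on, it may be worth confirming that Bazel and the MSYS2 tools are actually reachable through Path. The sketch below uses the standard-library `shutil.which`; the `REQUIRED_TOOLS` list and the `missing_tools` helper are illustrative names taken from the steps above, not part of any official checklist.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Tools mentioned in the Bazel and MSYS2 steps above.
REQUIRED_TOOLS = ["bazel", "git", "patch", "unzip", "zip"]


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the current Path."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    print("Missing from Path:", ", ".join(missing) if missing else "none")
```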

6. Visual Studio 2019

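Next is VS. Download the VS installer and, to avoid trouble, install to the default path on drive C (this time I did not try a path off drive C, so I don't know whether the compiler-not-found bug is still there). If you are not a VS user, beyond the mandatory components you only need MSVC v142 - VS 2019 C++ x64/x86 build tools (any version; I chose the latest) and the Windows 10 SDK (again any version; I chose the latest).

Build

Configuring the build

Download the TensorFlow 2.5.0 source, enter the root directory of the extracted archive, and run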

D:\tensorflow-2.5.0>python configure.py
You have bazel 3.7.2 installed.
Please specify the location of python. [Default is C:\Python39\python.exe]:


Found possible Python library paths:
  C:\Python39\lib\site-packages
Please input the desired Python library path to use.  Default is [C:\Python39\lib\site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]:
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Found CUDA 11.3 in:
    D:/CUDA/lib/x64
    D:/CUDA/include
Found cuDNN 8 in:
    D:/CUDA/lib/x64
    D:/CUDA/include


Please specify a list of comma-separated CUDA compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.
Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 3.5,3.7,5.0,5.2,6.0,6.1,7.0,7.5,8.0,8.6


Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is /arch:AVX]: /arch:AVX2


Would you like to override eigen strong inline for some C++ compilation to reduce the compilation time? [Y/n]:
Eigen strong inline overridden.

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.
        --config=mkl            # Build with MKL support.
        --config=mkl_aarch64    # Build with oneDNN and Compute Library for the Arm Architecture (ACL).
        --config=monolithic     # Config for mostly static monolithic build.
        --config=numa           # Build with NUMA support.
        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.
        --config=v2             # Build TensorFlow 2.x instead of 1.x.
Preconfigured Bazel build configs to DISABLE default on features:
        --config=noaws          # Disable AWS S3 filesystem support.
        --config=nogcp          # Disable GCP support.
        --config=nohdfs         # Disable HDFS support.
        --config=nonccl         # Disable NVIDIA NCCL support.

Building for SM 3.5 fails with this TensorFlow version; from what I could find, it appears that TensorRT does not support such a low version.
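
The answers in the transcript can also be supplied through environment variables so they do not have to be typed each time. The sketch below assumes the configure.py in your checkout reads the variable names shown; verify them against your copy before relying on this. `configure_env` is a hypothetical helper, and prompts it does not cover (such as the Python location) would still appear.

```python
from typing import Dict, Iterable


def configure_env(capabilities: Iterable[str], opt_flags: str = "/arch:AVX2") -> Dict[str, str]:
    """Build environment variables mirroring the answers in the transcript."""
    return {
        "TF_NEED_ROCM": "0",  # no ROCm support
        "TF_NEED_CUDA": "1",  # enable CUDA support
        "TF_CUDA_COMPUTE_CAPABILITIES": ",".join(capabilities),
        "CC_OPT_FLAGS": opt_flags,
        "TF_OVERRIDE_EIGEN_STRONG_INLINE": "1",
        "TF_SET_ANDROID_WORKSPACE": "0",
    }


if __name__ == "__main__":
    caps = ["3.5", "3.7", "5.0", "5.2", "6.0", "6.1", "7.0", "7.5", "8.0", "8.6"]
    for name, value in configure_env(caps).items():
        print(f"set {name}={value}")
```

To use it, merge the returned dictionary into a copy of `os.environ` and pass that as the `env` argument of `subprocess.run` when launching configure.py.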

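Code changes

When MKL is enabled

Starting with 2.4, TensorFlow switched the OpenMP used by MKL from downloading a prebuilt binary to downloading the open-source code from the LLVM project and compiling it, which introduced a series of problems. I recommend downloading the installer (Pre-Built Binaries) from the official site and installing LLVM (remember to have the installer add it to the system environment variables). This serves two purposes:

Use the ready-made libiomp5md.lib and libiomp5md.dll in place of the compilation step, because compiling LLVM's OpenMP with MSVC fails.
Copy the DLLs that LLVM uses (in the bin folder of the LLVM install directory) into usr\bin under the msys install directory. Otherwise, later build steps that run bash from msys do not inherit the Path from the system environment variables and fail (for example, when building the Lite-related parts, with an error that api-ms-win-crt-locale-l1-1-0.dll cannot be found).

Put libiomp5md.lib in the third_party\mkl directory, and insert the following after line 75 of third_party\mkl\mkl.BUILD: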

cc_import(
    name = "iomp5",
    interface_library = "libiomp5md.lib",
    system_provided = 1,
)

Then change the Windows build configuration below it to:

cc_library(
    name = "mkl_libs_windows",
    deps = [
        "iomp5"
    ],
    visibility = ["//visibility:public"],
)

Put libiomp5md.dll in a directory listed in the system Path environment variable.
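
A quick way to confirm that the DLL is visible is to scan the Path entries for it. This is a minimal sketch: `find_on_path` is a hypothetical helper that only checks for the file's presence and does not verify that the DLL loads.

```python
import os
from typing import Optional


def find_on_path(filename: str, path_value: str) -> Optional[str]:
    """Return the first directory in path_value that contains filename."""
    for directory in path_value.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None


if __name__ == "__main__":
    found = find_on_path("libiomp5md.dll", os.environ.get("PATH", ""))
    print(found or "libiomp5md.dll is not in any Path directory")
```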

In third_party\llvm_openmp\BUILD, change the value on line 74 to 0 so that MSVC is no longer forced:

omp_vars_win = {
    "MSVC": 0,
}
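
If you rebuild often, the same change can be applied with a small script instead of by hand. The sketch below is an assumption-laden illustration: `disable_msvc` is a hypothetical helper, and it rewrites every `"MSVC": <number>` entry in the text it is given, on the assumption that the file contains only the one entry shown above.

```python
import re


def disable_msvc(build_text: str) -> str:
    """Set every '"MSVC": <number>' entry in build_text to 0."""
    return re.sub(r'("MSVC"\s*:\s*)\d+', r"\g<1>0", build_text)


if __name__ == "__main__":
    # Assumed original content; the real file may differ around this entry.
    before = 'omp_vars_win = {\n    "MSVC": 1,\n}\n'
    print(disable_msvc(before))
```

Read the file's contents, pass them through the function, and write the result back; keep a backup of the original BUILD file in case line numbers or contents differ in your checkout.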

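Without setting MSVC to 0, an error like the following is reported (this error text was copied from version 2.4.1, but 2.5.0 reports the same error):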

ERROR: D:/output_base/external/llvm_openmp/BUILD.bazel:176:10: C++ compilation of rule '@llvm_openmp//:libiomp5md.dll' failed (Exit 1): ml64.exe failed: error executing command
  cd D:/output_base/execroot/org_tensorflow
  SET CUDA_TOOLKIT_PATH=D:/CUDA
    SET INCLUDE=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\ATLMFC\include;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\include;C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt;C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared;C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um;C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt;C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt
    SET LIB=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\ATLMFC\lib\x64;C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC