Building TensorFlow 2.4.1 with CUDA 11.2 on Windows


Reference links: https://blog.csdn.net/u012440550/article/details/113361176?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.control&dist_request_id=&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-4.control

https://zhuanlan.zhihu.com/p/259789357

Download links for the prebuilt dynamic libraries:

CPU version: https://download.csdn.net/download/weixin_43140187/15745733

GPU version: https://download.csdn.net/download/weixin_43140187/15745707

 

Environment Setup

1. Memory requirements

With 8 parallel jobs (the default number of jobs equals the CPU thread count), you should have at least 10 GB of RAM; otherwise the compiler fails with out-of-heap-space errors.
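
If you cannot spare that much memory, one workaround is to lower the parallelism by adding Bazel's --jobs flag to the build commands shown later; the value 4 here is only an example:

bazel build --jobs=4 //tensorflow:tensorflow_cc.dll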

2. Python & Pip

First, Python needs a few packages installed: six, numpy, wheel, setuptools, keras_applications, and keras_preprocessing. Open a Command Prompt with administrator privileges and run:

pip install six numpy wheel setuptools

pip install keras_applications --no-deps

pip install keras_preprocessing --no-deps

Note that the Python path must not contain spaces. The default Windows install path C:\Program Files\Python39 will cause errors during the build, so if Python is installed there, create a link (not a shortcut) under a directory without spaces using the mklink command.
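
For example, assuming Python sits in C:\Program Files\Python39 and D:\Python39 is any free path without spaces (both paths are placeholders), a directory link can be created from an administrator Command Prompt with:

mklink /D D:\Python39 "C:\Program Files\Python39"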

(I used the Python environment bundled with Anaconda3.)

3. CUDA

I went with CUDA 11.1 here; just download and install it from the NVIDIA CUDA website, nothing special to note.

4. Bazel

Next is Bazel. Bazel is simple: it is a single exe that just needs to be on the Path. To save effort I dropped it straight into CUDA's bin directory, which is already on the Path. I used version 3.7.2.

5. MSYS2

Then install MSYS2; its msys64\usr\bin directory likewise needs to be added to the Path environment variable.

The official tutorial omits the zip package, so the install command is:

pacman -S git patch unzip zip

6. Visual Studio 2019

Next is Visual Studio: download the VS installer and, to avoid trouble, install it to the default path on the C: drive (I did not try a non-C: path this time, so I don't know whether the "compiler not found" bug is still there). If you are not otherwise a VS user, besides the required components you only need the MSVC v142 - VS 2019 C++ x64/x86 build tools (any version works; I picked the latest) and a Windows 10 SDK (again, any version; I picked the latest).

 

Building

Configuration

Download the TensorFlow 2.4.1 source, open CMD in the extracted root directory, and run: python configure.py

(Note: for a CPU-only build you can answer N at the CUDA step.)

D:\tf2\tensorflow-2.4.1>python configure.py

You have bazel 3.7.2 installed.

Please specify the location of python. [Default is C:\Users\XJWT\anaconda3\python.exe]:

Found possible Python library paths:

  C:\Users\XJWT\anaconda3\lib\site-packages

Please input the desired Python library path to use.  Default is [C:\Users\XJWT\anaconda3\lib\site-packages]

Do you wish to build TensorFlow with ROCm support? [y/N]: N

No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y

CUDA support will be enabled for TensorFlow.

Found CUDA 11.1 in:

    C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/lib/x64

    C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/include

Found cuDNN 8 in:

    C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/lib/x64

    C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.1/include

Please specify a list of comma-separated CUDA compute capabilities you want to build with.

You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus. Each capability can be specified as "x.y" or "compute_xy" to include both virtual and binary GPU code, or as "sm_xy" to only include the binary code.

Please note that each additional compute capability significantly increases your build time and binary size, and that TensorFlow only supports compute capabilities >= 3.5 [Default is: 3.5,7.0]: 3.5,7.5

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is /arch:AVX]:

Would you like to override eigen strong inline for some C++ compilation to reduce the compilation time? [Y/n]: Y

Eigen strong inline overridden.

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N

Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See .bazelrc for more details.

        --config=mkl            # Build with MKL support.

        --config=mkl_aarch64    # Build with oneDNN support for Aarch64.

        --config=monolithic     # Config for mostly static monolithic build.

        --config=ngraph         # Build with Intel nGraph support.

        --config=numa           # Build with NUMA support.

        --config=dynamic_kernels        # (Experimental) Build kernels into separate shared objects.

        --config=v2             # Build TensorFlow 2.x instead of 1.x.

Preconfigured Bazel build configs to DISABLE default on features:

        --config=noaws          # Disable AWS S3 filesystem support.

        --config=nogcp          # Disable GCP support.

        --config=nohdfs         # Disable HDFS support.

        --config=nonccl         # Disable NVIDIA NCCL support.

 

 

Change the build output path:

Find the .bazelrc file in the source tree and append the following lines at the very end:

try-import %workspace%/.bazelrc.user

startup --output_user_root=D:/tf2/out

Then run the build commands:

GPU build commands:

Dll:
bazel --output_user_root=D:/tf2/out2 --output_base=D:/tf2/out1 build --config=mkl --config=numa --config=monolithic --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --experimental_strict_action_env=false //tensorflow:tensorflow_cc.dll

Lib:
bazel --output_user_root=D:/tf2/out2 --output_base=D:/tf2/out1 build --config=mkl --config=numa --config=monolithic --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --experimental_strict_action_env=false //tensorflow:tensorflow_cc_dll_import_lib

Include:
bazel --output_user_root=D:/tf2/out2 --output_base=D:/tf2/out1 build --config=mkl --config=numa --config=monolithic --define=tensorflow_enable_mlir_generated_gpu_kernels=0 --experimental_strict_action_env=false //tensorflow:install_headers

CPU build commands:

bazel build //tensorflow:tensorflow_cc.dll

bazel build //tensorflow:tensorflow_cc_dll_import_lib

bazel build //tensorflow:install_headers

Troubleshooting:

1. Errors when MKL is enabled

Starting with 2.4, TensorFlow no longer downloads a prebuilt binary for the OpenMP runtime used by MKL; instead it fetches the open-source code from the LLVM project and compiles it, which brings a series of problems. I recommend installing LLVM from the official Pre-Built Binaries installer (remember to let the installer add LLVM to the system environment variables). This serves two purposes:

(1) It provides ready-made libiomp5md.lib and libiomp5md.dll files to replace the compile step; building LLVM's OpenMP with MSVC fails.

(2) It lets you copy the DLLs that LLVM needs (under the LLVM install directory's bin folder) into the msys install directory's usr\bin folder. Otherwise, later build steps that run bash from msys, whose environment does not inherit the system Path, will fail (for example, building the Lite-related targets reports that api-ms-win-crt-locale-l1-1-0.dll cannot be found).
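
For example, assuming LLVM and MSYS2 were installed to their default locations C:\Program Files\LLVM and C:\msys64, the copy can be done with:

copy "C:\Program Files\LLVM\bin\*.dll" C:\msys64\usr\bin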

Place libiomp5md.lib in the third_party\mkl directory, and insert the following after line 75 of third_party\mkl\mkl.BUILD:

cc_import(
    name = "iomp5",
    interface_library = "libiomp5md.lib",
    system_provided = 1,
)

Then change the Windows build configuration further down in the same file to:

cc_library(
    name = "mkl_libs_windows",
    deps = [
        "iomp5",
    ],
    visibility = ["//visibility:public"],
)

Make sure libiomp5md.dll can be found via the system environment variables (i.e., its directory is on the PATH).

Change line 74 of third_party\llvm_openmp\BUILD to 0 so that MSVC is no longer forced:

omp_vars_win = {
    "MSVC": 0,
}

 

In addition, the DLL cannot be found at the final link step; you need to copy LLVM's libiomp5md.dll into the same directory as python.exe.
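
For example (the LLVM path assumes the default install location, and the Python path is the Anaconda environment from the configure step above; adjust both to your setup):

copy "C:\Program Files\LLVM\bin\libiomp5md.dll" C:\Users\XJWT\anaconda3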

 

2. Missing TensorFlow symbols when calling the library

After the build finishes, write any simple test program against the library and run it; you will get missing-symbol errors. For older TF versions someone published a patch that adds the symbols; for newer versions you have to add TF_EXPORT where needed, guided by the error messages.
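
A minimal sketch of such a test (my own illustration, not from the post; model.pb is a placeholder file name), which exercises only the session-creation path but is already enough to surface unresolved symbols at link time:

#include <memory>

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/public/session.h"

int main() {
    // Load a frozen graph; "model.pb" is a placeholder name.
    tensorflow::GraphDef graph_def;
    tensorflow::Status status = tensorflow::ReadBinaryProto(
        tensorflow::Env::Default(), "model.pb", &graph_def);
    if (!status.ok()) return 1;

    // Create a session and load the graph into it.
    std::unique_ptr<tensorflow::Session> session(
        tensorflow::NewSession(tensorflow::SessionOptions()));
    status = session->Create(graph_def);
    if (!status.ok()) return 1;

    // Session::Run(...) calls would follow here; even this much hits
    // NewSession / Session::Create and reveals any missing exports.
    return 0;
}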

If you only need simple inference on a pb graph, the changes from the links above are enough; the concrete edits are described below. If you need anything else, you will have to adapt them to your own code and the errors you get.

In tensorflow-master\tensorflow\core\public\session.h, add TF_EXPORT in front of the declarations that the missing-symbol errors point to, and add

#include "tensorflow/core/platform/macros.h"

Make the corresponding addition in /tensorflow/core/public/session_options.h as well.
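
TF_EXPORT comes from tensorflow/core/platform/macros.h and expands to __declspec(dllexport) while the DLL is being built, so annotated declarations end up in the export table. As a rough illustration only (which declarations need TF_EXPORT depends on your code and on the linker errors; Session, NewSession and SessionOptions below are just typical examples), the edits look like this:

// In tensorflow/core/public/session.h
#include "tensorflow/core/platform/macros.h"  // defines TF_EXPORT

class TF_EXPORT Session {
  // ... original class body unchanged ...
};

TF_EXPORT Session* NewSession(const SessionOptions& options);
TF_EXPORT Status NewSession(const SessionOptions& options,
                            Session** out_session);

// In /tensorflow/core/public/session_options.h
#include "tensorflow/core/platform/macros.h"

struct TF_EXPORT SessionOptions {
  // ... original struct body unchanged ...
};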

 

 
