Tensoflow c++ so编译 基于bazel

本文详细记录了手动编译TensorFlow 2.8.0以支持Ampere架构GPU的过程,包括选择编译环境、下载源码、配置编译选项、设置CUDA和TensorRT路径、编译参数以及最终整理库目录的步骤。通过这个过程,可以对比TF32和FP32的推理速度,并为特定硬件环境定制TensorFlow库。
摘要由CSDN通过智能技术生成

每日一歌,分享好心情: 叶卡捷琳娜战歌

一、起点

  1. 为什么编译tensorfflow?
    nvidia 2020年发布的Ampere架构gpu支持TF32,为了比对TF32、FP32推理速度,而公司现在用的Tensorflow版本太老,不支持TF32, so,升级TF。

  2. 编译环境怎么选?
    由于我需要编译支持GPU的tensorflow,所以需要在cuda环境编译。一般是选择nvidia官方devel镜像(关于这方面有很多有趣的话题,在以后的文章中聊吧),结合我的GPU硬件情况,选择nvidia/cuda:11.4.2-cudnn8-devel-ubuntu18.04, 编译过程都在此docker中进行(话说,在企业中离了docker真过不了~~)

二、环境准备

  1. 下载tf源码
    没什么可说的,clone吧 ,附上github源码地址: https://github.com/tensorflow/tensorflow
    我使用最新的2.8.0版本(2022-04-29)。

  2. 安装编译工具bazel
    这里有个小坑,不同版本的tf对bazel版本要求不同。
    下载完源码后,在源码顶层目录configure.py文件中有对bazel的要求,

_TF_MIN_BAZEL_VERSION = '4.2.1'
_TF_MAX_BAZEL_VERSION = '4.99.0'

本次编译选择 4.2.1版本,bazel安装过程嘛,可以按照tf官方给的教程https://www.tensorflow.org/install/source#install_bazel,其实,直接下载bazel可执行程序更简单,下载地址(https://github.com/bazelbuild/bazel/releases/download/4.2.1/bazel-4.2.1-linux-x86_64),下载完成后加入PATH路径,再ln一下,ok~~
在这里插入图片描述

三、编译配置

  1. 配置build
    在源码顶层目录下,运行
./configure

配置过程中最重要的步骤是开启GPU支持以及设置计算能力
ps:计算能力这个被nvidia造出来的概念也会在后续文章中聊聊,比如怎么识别gpu的计算能力、计算能力一般用在哪里、怎样在计算能力是7.5的环境中编译出支持8.0的库…。

  1. 选择python版本:
    在这里插入图片描述

  2. 开启gpu支持:
    在这里插入图片描述

  3. 开启TF-TRT支持:
    如果需要的话就开启,一般是需要滴…毕竟TRT是老黄的嫡系呀,加速真是杠杠的…
    这里需要提前下载好TensorRT 官方build包(晕,坑有点多,还需要去nvidia官网注册个会员),附上地址https://developer.nvidia.com/nvidia-tensorrt-8x-download
    下载好,我直接解压到根目录了,毕竟咱是在自己的docker中嘛,随便折腾…
    在这里插入图片描述
    后面编译需要用到TRT,那就得让TF知道TRT在哪儿呀,so,设置环境变量

export  TENSORRT_INSTALL_PATH=/TensorRT-8.2.1.8/

ps: 感兴趣的话,你可以在源码里可以看一下这个环境是什么时候起作用的,,我就不说了,说多了都是泪呀

  1. 设置计算能力:

这一步需要根据自己编译出来的库的gpu运行环境设置,我这里因为需要运行在安培架构的GPU上,需要额外设置,不能用默认的,其实也可以,只不过inference速度会慢一丢丢,具体设置步骤参考: 《计算能力那些事儿》。
在这里插入图片描述
其他选项使用默认即可。

其实,./configure程序会根据我们的输入产生一个叫**.tf_configure.bazelrc**的配置文件(如果你是大牛或者疯子,直接编辑这个文件也可以).

以下是此次生成的文件内容:

build --action_env PYTHON_BIN_PATH="/usr/bin/python3"
build --action_env PYTHON_LIB_PATH="/usr/lib/python3/dist-packages"
build --python_path="/usr/bin/python3"
build --config=tensorrt
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda-11.4"
build --action_env TENSORRT_INSTALL_PATH="/TensorRT-8.2.1.8/"
build --action_env TF_CUDA_COMPUTE_CAPABILITIES="6.1,7.0,7.5,8.0,8.6"
build --action_env LD_LIBRARY_PATH="/workspace/third_party/source/w16/third_party/prebuilt/tensorrt8_install/linux-x64-cuda114/lib:/workspace/third_party/source/w16/third_party/prebuilt/openvino_install/linux-x64-cuda114/lib/intel64:/workspace/third_party/source/w16/third_party/prebuilt/caffe_install/linux-x64-cuda114/lib:/workspace/third_party/source/w16/third_party/prebuilt/pytorch_install/linux-x64-cuda114/lib:/workspace/third_party/source/w16/third_party/prebuilt/onnx_install/linux-x64-cuda114/lib:/workspace/third_party/source/w16/third_party/prebuilt/ncnn_install/linux-x64-cuda114/lib"
build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/x86_64-linux-gnu-gcc-7"
build --config=cuda
build:opt --copt=-Wno-sign-compare
build:opt --host_copt=-Wno-sign-compare
test --flaky_test_attempts=3
test --test_size_filters=small,medium
test --test_env=LD_LIBRARY_PATH
test:v1 --test_tag_filters=-benchmark-test,-no_oss,-no_gpu,-oss_serial
test:v1 --build_tag_filters=-benchmark-test,-no_oss,-no_gpu
test:v2 --test_tag_filters=-benchmark-test,-no_oss,-no_gpu,-oss_serial,-v1only
test:v2 --build_tag_filters=-benchmark-test,-no_oss,-no_gpu,-v1only

四、开始真正的编译

提前说好,这是一个漫长的过程
在tensorflow源码顶层目录,运行

bazel build --config=opt --config=cuda --verbose_failures  //tensorflow:libtensorflow_cc.so //tensorflow:install_headers

如果你不了解bazel编译的话,这个命令乍看起来可能有点晕,这里只简单介绍以下,后续文章再详谈bazel编译系统。

  • –config=cuda: 指定编译配置,至于cuda都包含哪些配置,可以查看.bazelrc
  • –verbose_failures: 你懂的
  • //tensorflow:libtensorflow_cc.so:指定编译/tensorflow目录下的libtensorflow_cc.so目标(因为我需要tf的c++接口嘛)
  • //tensorflow:install_headers: 指定编译/tensorflow目录下的install_headers目标

让bazel飞一会儿…耐心等待…
编译过程是真的慢,用到了很多第三方库,我大胆预测,这将是你程序猿生丫编译程序最long的一次~~
下面是我的编译日志,贴出来吧,没啥用,就是为了纪念这次编译。

Starting local Bazel server and connecting to it...
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'build' from /workspace/tf/tensorflow-2.8.0/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /workspace/tf/tensorflow-2.8.0/.bazelrc:
  'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library
INFO: Reading rc options for 'build' from /workspace/tf/tensorflow-2.8.0/.tf_configure.bazelrc:
  'build' options: --action_env PYTHON_BIN_PATH=/usr/bin/python3 --action_env PYTHON_LIB_PATH=/usr/lib/python3/dist-packages --python_path=/usr/bin/python3 --config=tensorrt --action_env CUDA_TOOLKIT_PATH=/usr/local/cuda-11.4 --action_env TENSORRT_INSTALL_PATH=/TensorRT-8.2.1.8/ --action_env TF_CUDA_COMPUTE_CAPABILITIES=3.5,6.1,7.0,7.5,8.6 --action_env LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/TensorRT-8.2.1.8/lib:/opt/aibee/protobuf/lib/ --action_env GCC_HOST_COMPILER_PATH=/usr/bin/x86_64-linux-gnu-gcc-7 --config=cuda
INFO: Reading rc options for 'build' from /workspace/tf/tensorflow-2.8.0/.bazelrc:
  'build' options: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/fallback,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /workspace/tf/tensorflow-2.8.0/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /workspace/tf/tensorflow-2.8.0/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:tensorrt in file /workspace/tf/tensorflow-2.8.0/.bazelrc: --repo_env TF_NEED_TENSORRT=1
INFO: Found applicable config definition build:cuda in file /workspace/tf/tensorflow-2.8.0/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:opt in file /workspace/tf/tensorflow-2.8.0/.tf_configure.bazelrc: --copt=-Wno-sign-compare --host_copt=-Wno-sign-compare
INFO: Found applicable config definition build:cuda in file /workspace/tf/tensorflow-2.8.0/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:linux in file /workspace/tf/tensorflow-2.8.0/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /workspace/tf/tensorflow-2.8.0/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
Loading:  (1 packages loaded)
Loading: 1 packages loaded
Analyzing: 2 targets (2 packages loaded, 0 targets configured)
Analyzing: 2 targets (68 packages loaded, 218 targets configured)
Analyzing: 2 targets (149 packages loaded, 4788 targets configured)
Analyzing: 2 targets (215 packages loaded, 13237 targets configured)
Analyzing: 2 targets (241 packages loaded, 28753 targets configured)
INFO: Analyzed 2 targets (241 packages loaded, 28758 targets configured).
INFO: Found 2 targets...
[0 / 7] [Prepa] BazelWorkspaceStatusAction stable-status.txt
[376 / 3,292] checking cached actions
[624 / 3,584] Linking external/nsync/libnsync_cpp.so; 0s local
[791 / 3,589] Linking external/nsync/libnsync_cpp.so; 2s local ... (2 actions, 1 running)
[1,069 / 3,761] Linking external/nsync/libnsync_cpp.so; 6s local ... (2 actions running)
[1,315 / 3,888] Linking external/nsync/libnsync_cpp.so; 8s local ... (2 actions running)
[1,636 / 6,949] Linking external/com_google_absl/absl/base/libmalloc_internal.so; 1s local
[2,051 / 9,341] checking cached actions
[2,612 / 10,002] checking cached actions
[11,328 / 15,022] [Prepa] Writing file tensorflow/core/kernels/mlir_generated/libminimum_gpu_i8_i8_kernel_generator.so-2.params
[13,148 / 17,690] checking cached actions
[16,206 / 22,395] checking cached actions
[20,075 / 28,871] Linking external/llvm-project/mlir/libX86VectorToLLVMIRTranslation.so; 0s local ... (2 actions running)
[22,736 / 30,167] checking cached actions
[24,208 / 30,167] checking cached actions
[25,739 / 30,167] checking cached actions
[26,947 / 30,167] Linking tensorflow/core/kernels/linalg/libmatrix_set_diag_op_gpu.so; 0s local ... (2 actions, 1 running)
[28,002 / 30,167] Linking tensorflow/compiler/mlir/tfr/libgraph_decompose_pass.so; 0s local ... (6 actions, 5 running)
[30,165 / 30,167] [Prepa] Executing genrule //tensorflow:libtensorflow_cc.so.2_sym
INFO: Elapsed time: 284.425s, Critical Path: 156.23s
INFO: 6111 processes: 4073 internal, 2038 local.
INFO: Build completed successfully, 6111 total actions
INFO: Build completed successfully, 6111 total actions

五、整理库目录

编译完了,我们需要将编译成功的库拷贝出来给别人用,
下面将so及头文件整理一下

mkdir -p  tf_install/include/ #我们的头文件放在此文件夹下
mkdir -p  tf_install/lib/     #库文件放在此文件夹下
cp -rd bazel-bin/tensorflow/include/* tf_install/include/
cp -rd  bazel-bin/tensorflow/libtensorflow* tf_install/lib/

include 目录:
在这里插入图片描述
lib目录:
在这里插入图片描述

ok,万事大吉~~干饭

六、题外话

如果漂亮国真禁用github或者限个龟速,吾辈休矣!

  • 2
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

小白龙呢

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值