Building TensorFlow 2.4 as a .so Shared Library on CentOS 6.3

In production, TensorFlow (hereafter "tf") is frequently needed for online prediction: a model is trained in Python, then served for real-time inference. A common approach is to compile tf into a .so shared library that a C++ prediction service links against. Plenty of articles cover this, and on the officially supported Ubuntu the whole build is painless, with no real pitfalls. Unfortunately, our production machines run CentOS 6.3: not Ubuntu, a fairly old release, with correspondingly old components. I wanted to build the latest tf, found no directly relevant write-ups, and initially worried that the old glibc might not support it. In the end, by following the official docs and working through quite a few pitfalls, the build succeeded and ran. This post records the key points.

1. Environment and Components

1. OS: CentOS 6.3 (Final). I did not want to disturb the company machines, so I built in a VM using an image of the same version.
Download: https://archive.kernel.org/centos-vault/6.3/isos/x86_64/CentOS-6.3-x86_64-bin-DVD1.iso. Run a system update after installation, or you may hit odd problems.
2. Python 3: per the officially supported versions, I chose 3.7.3. Building only the .so does not actually require Python, but since building the pip package is similar, I installed it anyway.
Download: https://www.python.org/ftp/python/3.7.3/Python-3.7.3.tgz
3. gcc: to build the latest tf I followed the officially supported version, GCC 7.3.1. For convenience I installed it via yum and enabled it temporarily:
yum -y install centos-release-scl
yum -y install devtoolset-7-gcc devtoolset-7-gcc-c++ devtoolset-7-binutils
scl enable devtoolset-7 bash
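Note that `scl enable devtoolset-7 bash` only affects the spawned shell. A common alternative (assuming the standard SCL install location) is to source the enable script directly, e.g. from your shell profile:

```shell
# Enable GCC 7 from devtoolset-7 for the current shell session.
# /opt/rh/devtoolset-7/enable is the standard SCL path; adjust if yours differs.
source /opt/rh/devtoolset-7/enable
gcc --version    # should now report gcc 7.3.x
```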
4. bazel: the build tool tf uses. The latest tf needs 3.1.0, which is also hard to build on CentOS 6; fortunately someone has published prebuilt executables that can be used directly.
Download: https://github.com/sub-mod/bazel-builds/releases
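Assuming the 3.1.0 binary from that release page (the exact file name here is an assumption; use whatever the release provides), installation is just dropping it on the PATH:

```shell
chmod +x bazel-3.1.0
mv bazel-3.1.0 /usr/local/bin/bazel
bazel version    # TensorFlow 2.4 requires bazel 3.1.0
```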

2. Build Steps

(1) Build and install Python and its dependencies
If you only need the .so, this step can be skipped. To build the pip package on this machine, Python must be built with the matching GCC version, and the following dependencies installed:
pip install -U --user pip six numpy wheel setuptools mock 'future>=0.17.1'
pip install -U --user keras_applications --no-deps
pip install -U --user keras_preprocessing --no-deps
One real pitfall here: Python 3.7 requires a reasonably new SSL library, and the version shipped with CentOS 6 is too old, so pip fails with SSL-related errors. You need to build and install LibreSSL yourself; there are plenty of tutorials online, so I will not repeat them.
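For reference, here is a minimal sketch of building Python 3.7.3 against a locally built LibreSSL. The LibreSSL version and the install prefixes are examples; adapt them to your layout:

```shell
# Build LibreSSL into its own prefix (version is an example)
tar xzf libressl-2.9.2.tar.gz && cd libressl-2.9.2
./configure --prefix=/usr/local/libressl
make && make install
cd ..

# Build Python 3.7.3, pointing its ssl module at that LibreSSL
tar xzf Python-3.7.3.tgz && cd Python-3.7.3
./configure --prefix=/usr/local/python3.7 --with-openssl=/usr/local/libressl
make -j"$(nproc)" && make install
```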

(2) Install OpenJDK 1.8, which bazel needs; tutorials for this are also easy to find.
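A sketch of what that typically looks like on CentOS 6 (package names per the base repos; the JAVA_HOME detection is a convenience, not a requirement):

```shell
yum -y install java-1.8.0-openjdk java-1.8.0-openjdk-devel
# Resolve JAVA_HOME from the javac on PATH (the install dir varies by build)
export JAVA_HOME=$(dirname "$(dirname "$(readlink -f "$(which javac)")")")
```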

(3) git clone the tf repository and check out the branch you need; master was 2.4 at the time of writing:
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout branch_name  # e.g. r2.0, r2.1; skip this to stay on master

(4) Configure the build
./configure
Here is a configure transcript from the official docs. It is fairly old, but the flow is the same; answer each prompt as needed.

You have bazel 0.15.0 installed.
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python2.7

Found possible Python library paths:
  /usr/local/lib/python2.7/dist-packages
  /usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is [/usr/lib/python2.7/dist-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]:
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]:
Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]:
Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon AWS Platform support? [Y/n]:
Amazon AWS Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]:
Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]:
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]:
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]:
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: Y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 9.0

Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.0

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Do you wish to build TensorFlow with TensorRT support? [y/N]:
No TensorRT support will be enabled for TensorFlow.

Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]: 1.3

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus
Please note that each additional compute capability significantly increases your
build time and binary size. [Default is: 3.5,7.0] 6.1

Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
Configuration finished

Pay attention to the prompt "Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]". This selects the optimization target of the build output. The default, -march=native, optimizes for the instruction set of the machine doing the compiling; if you are building for a different platform, you must choose the matching architecture, as listed in the GCC docs:
https://gcc.gnu.org/onlinedocs/gcc-7.5.0/gcc/x86-Options.html#x86-Options. My build machine is AMD while the target machines are Intel Haswell, so I set -march=haswell.
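To see what -march=native would resolve to on the build machine, or to double-check a cross-target choice, gcc can print its effective target options:

```shell
# What -march=native picks on this machine
gcc -march=native -Q --help=target | grep -- '-march='

# Confirm the value used for the Intel target machines
gcc -march=haswell -Q --help=target | grep -- '-march='
```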

(5) Build the .so
bazel build --config=opt //tensorflow:libtensorflow_cc.so
I hit a link error at this step; setting export BAZEL_LINKLIBS=-l%:libstdc++.a before the build (which makes bazel link libstdc++ statically) resolved it.

3. Collecting the .so and Header Files to Build Your Own Service

(1) When the build finishes, you can find the produced .so files:
Under the bazel-bin directory are the .so files and related headers, namely libtensorflow_cc.so.2 and libtensorflow_framework.so.2.
(2) Find the headers you need. To use the .so you also need the corresponding include headers. I found no authoritative documentation on exactly which are required; it is only clear that the core and cc directories must be included. I tracked down the remaining third-party dependencies one by one from my service's compile errors, and in the end these were needed:
cc
bazel-bin/cc

core
bazel-bin/core

absl
/root/.cache/bazel/_bazel_root/948ab1d53a88eb529c08a9574bfb5faf/external/com_google_absl/absl

eigen3
third_party/eigen3
/root/.cache/bazel/_bazel_root/948ab1d53a88eb529c08a9574bfb5faf/external/org_tensorflow/bazel-tensorflow/external/eigen_archive/

protobuf
/root/.cache/bazel/_bazel_root/948ab1d53a88eb529c08a9574bfb5faf/external/com_google_protobuf/src/google/protobuf
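For the downstream build it is convenient to stage all of this into one prefix. A sketch using the paths above (the bazel cache hash 948ab1d53a88eb529c08a9574bfb5faf is machine-specific, and /opt/tf2.4 is just an example destination):

```shell
BAZEL_EXT=/root/.cache/bazel/_bazel_root/948ab1d53a88eb529c08a9574bfb5faf/external
DEST=/opt/tf2.4    # example install prefix

mkdir -p "$DEST/include" "$DEST/lib"
# The shared libraries (run from the tensorflow source root)
cp -a bazel-bin/tensorflow/libtensorflow_cc.so*        "$DEST/lib/"
cp -a bazel-bin/tensorflow/libtensorflow_framework.so* "$DEST/lib/"
# Third-party headers from the bazel external cache
cp -a "$BAZEL_EXT/com_google_absl/absl"           "$DEST/include/"
cp -a "$BAZEL_EXT/com_google_protobuf/src/google" "$DEST/include/"
cp -a "$BAZEL_EXT/org_tensorflow/bazel-tensorflow/external/eigen_archive" \
      "$DEST/include/eigen3"
```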

(3) Point your service's build at these headers and shared libraries, and call your trained model from it; I will not go into that here.
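As a quick smoke test that the staged headers and libraries fit together, a minimal C++ program can be compiled and linked against them. Everything below is a sketch: /opt/tf2.4 is an assumed staging prefix, and the program only opens an empty session.

```shell
cat > smoke.cc <<'EOF'
// Minimal check: create a TensorFlow session via the C++ API
#include "tensorflow/core/public/session.h"
#include <iostream>

int main() {
  tensorflow::Session* session = nullptr;
  tensorflow::Status s =
      tensorflow::NewSession(tensorflow::SessionOptions(), &session);
  std::cout << s.ToString() << std::endl;
  delete session;
  return s.ok() ? 0 : 1;
}
EOF

# TF 2.4 needs C++14; rpath keeps the runtime loader pointed at the staged libs
g++ -std=c++14 smoke.cc -o smoke \
    -I/opt/tf2.4/include \
    -L/opt/tf2.4/lib -ltensorflow_cc -ltensorflow_framework \
    -Wl,-rpath,/opt/tf2.4/lib
./smoke
```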

4. Tips

It is best to go through an overseas proxy, or downloading the repository and its many dependencies will be painful:
export HTTPS_PROXY=http://me:mypassword@myproxyserver.domain.com:myport
export HTTP_PROXY=http://me:mypassword@myproxyserver.domain.com:myport

References:
https://www.tensorflow.org/install/source#linux
github
stackoverflow
