Building the latest tensorflow 2.2.0 from source with gpu acceleration, verified with python and c++

迷途小书童的Note

Article link: https://blog.csdn.net/djstavaV/article/details/106290915

First published at

https://xugaoxiang.com/2020/05/22/compile-tensorflow2-with-gpu/

Software and hardware environment

ubuntu 18.04 64bit
anaconda3 with python 3.7.6
tensorflow 2.2.0
bazel 2.0.0
cuda 10.1
cudnn 7.6.5
gcc 7
nvidia gtx 1070Ti

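Introduction to tensorflow

tensorflow is an open-source machine learning framework from Google. It provides APIs for c++, python, java, javascript, go and other languages, is fast and flexible, and is suited to large-scale production use. It lets developers apply artificial intelligence to a wide range of practical problems, which is why it is so popular.

The name tensorflow comes from how it works. A tensor is an N-dimensional array, and flow refers to computation based on a dataflow graph; tensorflow is the process of tensors flowing from one end of the graph to the other while being computed.

A computation in tensorflow can be represented as a directed graph, also called a computation graph. Each operation is a node, and the connections between nodes are called edges. The graph describes how data is computed and is also responsible for maintaining and updating state; users can apply conditional control and loops to branches of the graph. Each node can have any number of inputs and outputs and describes one kind of operation, so a node can be seen as an instantiation of an operation.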
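The idea of a computation graph can be illustrated in a few lines of plain python. This is only a toy sketch and not how tensorflow is implemented: the class name Node and its evaluate method are made up for illustration. Each node holds one operation, and the edges are the references to its input nodes.

```python
# Toy dataflow graph: NOT tensorflow's implementation, only an illustration.
# `Node` and `evaluate` are hypothetical names made up for this sketch.

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op = op                # name of the operation this node performs
        self.inputs = list(inputs)  # incoming edges: nodes whose outputs feed this node
        self.value = value          # only used by constant nodes

    def evaluate(self):
        """Recursively evaluate the inputs, then apply this node's operation."""
        if self.op == "const":
            return self.value
        args = [node.evaluate() for node in self.inputs]
        if self.op == "add":
            return args[0] + args[1]
        if self.op == "mul":
            return args[0] * args[1]
        raise ValueError("unknown op: " + self.op)


# Build the graph for (2 + 3) * 4; nothing is computed until evaluate() is called.
a = Node("const", value=2)
b = Node("const", value=3)
c = Node("const", value=4)
total = Node("add", [a, b])
result = Node("mul", [total, c])

print(result.evaluate())  # 20
```

Building the graph and evaluating it are separate steps here, which mirrors the split between describing a computation and running it.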

Preparation

[Image: https://code.xugaoxiang.com/xugaoxiang/blog/raw/master/images/ai/tf/tf_build_01.png]

Installing the python environment

We use anaconda; for detailed installation and usage instructions, see the article "anaconda使用".

To speed up package installation with conda, use the Tsinghua mirror in China. Edit the file ~/.condarc and add

channels:
  - defaults
show_channel_urls: true
default_channels:
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/r
custom_channels:
  conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  msys2: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  menpo: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
  pytorch: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud

Next, create a separate, clean virtual environment named tfgpu

conda create -n tfgpu python=3.7
conda activate tfgpu

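Installing protobuf

protobuf is also a Google product; it is a format for data interchange and storage. We install it with conda, where the current default version is 3.11.4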

conda install protobuf

When I compiled tensorflow-2.2.0, protobuf-3.11.4 caused no errors. If it fails in your environment, check the protobuf version in the file tensorflow/workspace.bzl and install that version

[Image: https://code.xugaoxiang.com/xugaoxiang/blog/raw/master/images/ai/tf/tf_build_02.png]

For example, tensorflow-2.2.0 corresponds to protobuf-3.8.0, which can be installed as follows

wget https://github.com/protocolbuffers/protobuf/releases/download/v3.8.0/protobuf-all-3.8.0.tar.gz
tar xvf protobuf-all-3.8.0.tar.gz
cd protobuf-3.8.0
./autogen.sh
./configure
make
sudo make install

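Installing cuda and cuDNN

We choose the currently mainstream cuda 10.1 and cudnn 7.6.5; see the article "ubuntu安装CUDA"

tensorflow version

This article uses 2.2.0, the latest official release at the time of writing. Download the archive from the release page and extract it: https://github.com/tensorflow/tensorflow/releases/tag/v2.2.0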

tar xvf tensorflow-2.2.0.tar.gz

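Installing bazel

bazel is a build system from Google, and the version you choose directly affects whether tensorflow compiles from source. Look in the file configure.py in the tensorflow source directory and find the following statements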

_TF_MIN_BAZEL_VERSION = '2.0.0'
_TF_MAX_BAZEL_VERSION = '2.0.0'

So tensorflow 2.2.0 requires bazel 2.0.0. Download the binary bazel-2.0.0-linux-x86_64 from the bazel releases page at https://github.com/bazelbuild/bazel/releases and then run

sudo mv bazel-2.0.0-linux-x86_64 /usr/bin/bazel
sudo chmod a+x /usr/bin/bazel
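The version constraint can also be checked with a small script before starting a long build. The following is a minimal sketch, assuming the installed version string is supplied by hand; parse_version and bazel_version_ok are hypothetical helper names, and only the two bounds come from configure.py.

```python
# Sketch: check a bazel version against the range given in configure.py.
# `parse_version` and `bazel_version_ok` are hypothetical helper names.

TF_MIN_BAZEL_VERSION = "2.0.0"  # _TF_MIN_BAZEL_VERSION in tensorflow 2.2.0
TF_MAX_BAZEL_VERSION = "2.0.0"  # _TF_MAX_BAZEL_VERSION in tensorflow 2.2.0


def parse_version(text):
    """Turn '2.0.0' into the tuple (2, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def bazel_version_ok(installed,
                     minimum=TF_MIN_BAZEL_VERSION,
                     maximum=TF_MAX_BAZEL_VERSION):
    """Return True if minimum <= installed <= maximum."""
    return parse_version(minimum) <= parse_version(installed) <= parse_version(maximum)


print(bazel_version_ok("2.0.0"))   # True
print(bazel_version_ok("3.1.0"))   # False
print(bazel_version_ok("0.29.1"))  # False
```

Comparing tuples of integers rather than raw strings avoids mistakes such as treating "0.29.1" as newer than "0.3.0".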

Installing the required packages

These packages are also needed when compiling tensorflow

pip install numpy six keras_preprocessing

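GCC version

The default gcc on ubuntu 18.04 is 7.5.0, and no problems occurred during compilation here. If you do run into compiler-related problems, consider downgrading gcc, because this version has not been officially tested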

sudo apt install gcc-6 g++-6

Then switch versions

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-6 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-6 100

To switch back to gcc-7, use similar commands with a larger final argument, which is the priority, for example 101

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 101
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-7 101

Build process

First configure the project by running configure

cd tensorflow-2.2.0
./configure

A series of options then appears in the terminal; choose according to your needs. The configuration used in this article is

[Image: https://code.xugaoxiang.com/xugaoxiang/blog/raw/master/images/ai/tf/tf_build_06.png]

[Image: https://code.xugaoxiang.com/xugaoxiang/blog/raw/master/images/ai/tf/tf_build_07.png]

Next, run the build command

bazel build --verbose_failures --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

To build the shared library needed for c++ development, use the following command

bazel build --verbose_failures --noincompatible_do_not_split_linking_cmdline --config=opt --config=cuda //tensorflow:libtensorflow_cc.so //tensorflow:install_headers

Alternatively, all of the targets can be put in one command

bazel build --verbose_failures --noincompatible_do_not_split_linking_cmdline --config=opt --config=cuda //tensorflow:libtensorflow_cc.so //tensorflow:install_headers //tensorflow/tools/pip_package:build_pip_package

A common external library link error

[Image: https://code.xugaoxiang.com/xugaoxiang/blog/raw/master/images/ai/tf/tf_build_05.png]

[Image: https://code.xugaoxiang.com/xugaoxiang/blog/raw/master/images/ai/tf/tf_build_06.png]

This can be solved with the flag --noincompatible_do_not_split_linking_cmdline; for details see the issue https://github.com/tensorflow/tensorflow/issues/35623

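The total build time depends on your machine configuration

In testing, the same build command crashed a machine with 16G of memory because it ran out of memory, while a machine with 32G had no problem. The build can be completed by adjusting the following bazel flags

--local_ram_resources amount of RAM to use, in MB
--local_cpu_resources number of CPU cores to use
--jobs number of concurrent jobs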
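As a rough illustration of how these three flags might be chosen together, the sketch below derives conservative values from the machine's memory and core count. The helper name suggest_bazel_flags, the 50% memory share and the 2048 MB budget per job are all assumptions made for this example, not measured or recommended values; adjust them to what your machine tolerates.

```python
# Sketch: derive conservative values for the three bazel flags above.
# The 50% RAM share and the 2048 MB-per-job budget are assumptions.

def suggest_bazel_flags(total_ram_mb, cpu_cores, ram_share=0.5, ram_per_job_mb=2048):
    ram_limit = int(total_ram_mb * ram_share)          # RAM bazel may use, in MB
    jobs_by_ram = max(1, ram_limit // ram_per_job_mb)  # jobs that fit in that RAM
    jobs = min(cpu_cores, jobs_by_ram)                 # never more jobs than cores
    return [
        "--local_ram_resources={}".format(ram_limit),
        "--local_cpu_resources={}".format(jobs),
        "--jobs={}".format(jobs),
    ]


# Example: a machine with 16 GB of RAM and 8 cores.
print(" ".join(suggest_bazel_flags(16 * 1024, 8)))
# --local_ram_resources=8192 --local_cpu_resources=4 --jobs=4
```

The printed flags can be appended to the bazel build command; lowering the number of jobs trades build time for a smaller peak memory footprint.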

[Image: https://code.xugaoxiang.com/xugaoxiang/blog/raw/master/images/ai/tf/tf_build_08.png]

Finally, generate the whl file needed for installing with python

sudo ./tensorflow/tools/pip_package/build_pip_package.sh /tmp/tensorflow_pkg

[Image: https://code.xugaoxiang.com/xugaoxiang/blog/raw/master/images/ai/tf/tf_build_09.png]

Here /tmp/tensorflow_pkg is the directory where the whl file is stored, and you can choose any location. Once the file has been generated, install it

pip install /tmp/tensorflow_pkg/tensorflow-2.2.0-cp37-cp37m-linux_x86_64.whl
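The name of the whl file encodes which interpreter it was built for, so it only installs into a matching python. The sketch below splits the file name into its tags following the wheel naming scheme of PEP 427; parse_wheel_name and matches_interpreter are hypothetical helper names, and the code assumes a file name without the optional build tag.

```python
# Sketch: split a wheel file name into its tags (naming scheme from PEP 427).
# `parse_wheel_name` and `matches_interpreter` are hypothetical helper names.
import sys


def parse_wheel_name(filename):
    """Return name, version and tags; assumes there is no optional build tag."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    stem = filename[:-len(".whl")]
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    return {
        "name": name,
        "version": version,
        "python": python_tag,      # e.g. cp37 = CPython 3.7
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_interpreter(filename, version_info=sys.version_info):
    """True if a CPython tag such as cp37 matches the given interpreter version."""
    expected = "cp{}{}".format(version_info[0], version_info[1])
    return parse_wheel_name(filename)["python"] == expected


wheel = "tensorflow-2.2.0-cp37-cp37m-linux_x86_64.whl"
print(parse_wheel_name(wheel)["python"])      # cp37
print(matches_interpreter(wheel, (3, 7, 6)))  # True
print(matches_interpreter(wheel, (3, 8, 0)))  # False
```

In this article the wheel carries the cp37 tag because it was built inside the tfgpu environment with python 3.7.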

[Image: https://code.xugaoxiang.com/xugaoxiang/blog/raw/master/images/ai/tf/tf_build_10.png]

Verification with python

Open ipython, taking care not to start it inside the tensorflow source directory, otherwise an error is raised. Enter some simple tensorflow test code and check whether it raises errors and what it prints

import tensorflow as tf

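tf.__version__
# is_gpu_available() is deprecated in tensorflow 2.x but still works in 2.2.0
tf.test.is_gpu_available()
# the suggested replacement; returns the list of visible gpu devices
tf.config.list_physical_devices('GPU')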

[Image: https://code.xugaoxiang.com/xugaoxiang/blog/raw/master/images/ai/tf/tf_build_11.png]

As the figure above shows, the installed version is 2.2.0 and the gpu can be used normally

Verification with c++

After the libtensorflow_cc target has been built successfully, the bazel-bin/tensorflow directory contains the header files (include) and the shared libraries (libtensorflow_cc.so and libtensorflow_framework.so.2, both of which are symbolic links)

[Image: https://code.xugaoxiang.com/xugaoxiang/blog/raw/master/images/ai/tf/tf_build_03.png]

Use the clion IDE to create a cmake-based project and write the source file main.cpp; this example comes from the internet

#include <iostream>
#include <vector>
#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"

using namespace tensorflow;
using namespace tensorflow::ops;

int main()
{
    Scope root = Scope::NewRootScope();

    // Matrix A = [3 2; -1 0]
    auto A = Const(root, { {3.f, 2.f}, {-1.f, 0.f} });
    // Vector b = [3 5]
    auto b = Const(root, { {3.f, 5.f} });
    // v = Ab^T
    auto v = MatMul(root.WithOpName("v"), A, b, MatMul::TransposeB(true));

    std::vector<Tensor> outputs;
    ClientSession session(root);

    // Run and fetch v
    TF_CHECK_OK(session.Run({v}, &outputs));
    std::cout << "tensorflow session run ok" << std::endl;
    // Expect outputs[0] == [19; -3]
    std::cout << outputs[0].matrix<float>();

    return 0;
}
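The value expected in the comment of the c++ program can be checked by hand without tensorflow. The following plain-python sketch repeats the same arithmetic, v = A * b^T, using nested lists; matmul and transpose are small helpers written for this example only.

```python
# Plain-python check of v = A * b^T from the c++ example (no tensorflow needed).

def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(l * r for l, r in zip(row, col)) for col in zip(*right)]
        for row in left
    ]


def transpose(matrix):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


A = [[3.0, 2.0], [-1.0, 0.0]]  # Matrix A = [3 2; -1 0]
b = [[3.0, 5.0]]               # Vector b = [3 5]

v = matmul(A, transpose(b))
print(v)  # [[19.0], [-3.0]]
```

The first row gives 3*3 + 2*5 = 19 and the second gives -1*3 + 0*5 = -3, matching the [19; -3] stated in the c++ comment.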

Next, write the build rules in CMakeLists.txt

cmake_minimum_required(VERSION 3.0)
project(libtf)

add_definitions(-std=c++11)

set(TENSORFLOW_ROOT_DIR /home/xugaoxiang/Downloads/tensorflow-2.2.0-cc)

include_directories(
        ${TENSORFLOW_ROOT_DIR}/bazel-bin/tensorflow/include
)

aux_source_directory(./ DIR_SRCS)

link_directories(${TENSORFLOW_ROOT_DIR}/bazel-bin/tensorflow)

add_executable(libtf ${DIR_SRCS})
#target_link_libraries(libtf
#        tensorflow_cc
#        tensorflow_framework
#        )

target_link_libraries(libtf
        ${TENSORFLOW_ROOT_DIR}/bazel-bin/tensorflow/libtensorflow_cc.so
        ${TENSORFLOW_ROOT_DIR}/bazel-bin/tensorflow/libtensorflow_framework.so.2
        )

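Note that the two libraries follow different naming schemes (one ends in .so, the other in .so.2), so their full paths are given directly here. Then build and run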

mkdir build
cd build
cmake ..
make
./libtf

The output is as follows

(base) xugaoxiang@1070Ti:~/CLionProjects/libtf/build$ ./libtf 
2020-05-22 16:30:25.469170: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3092885000 Hz
2020-05-22 16:30:25.469647: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560f48a29630 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-05-22 16:30:25.469694: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-05-22 16:30:25.473522: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-22 16:30:25.613992: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560f48988550 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-05-22 16:30:25.614051: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1070 Ti, Compute Capability 6.1
2020-05-22 16:30:25.615325: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:03:00.0 name: GeForce GTX 1070 Ti computeCapability: 6.1
coreClock: 1.683GHz coreCount: 19 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 238.66GiB/s
2020-05-22 16:30:25.615715: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-22 16:30:25.619174: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-22 16:30:25.621307: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-22 16:30:25.621587: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-22 16:30:25.623604: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-22 16:30:25.625051: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-22 16:30:25.629452: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-22 16:30:25.630598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-05-22 16:30:25.630633: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-22 16:30:25.631317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-22 16:30:25.631335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2020-05-22 16:30:25.631342: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2020-05-22 16:30:25.632489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6477 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
tensorflow session run ok
19

Note that the gpu is correctly detected and used, which is exactly what we wanted