Training a TensorFlow model with the C++ API

keep_forward

Article link: https://blog.csdn.net/b876144622/article/details/79962583

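In earlier posts I installed TensorFlow from source and confirmed that C++ code compiles against it, so the next step is to build and train a TensorFlow model from C++. One option is to follow https://github.com/tensorflow/tensorflow/blob/master/tensorflow/cc/tutorials/example_trainer.cc#L49, which builds and trains a model entirely in C++. However, according to the discussion found via Google, auto-differentiation (automatic gradient computation) is incomplete in the C++ API, and many parts have not yet been integrated (https://github.com/tensorflow/tensorflow/issues/4130).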

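Another blog post (https://tebesu.github.io/posts/Training-a-TensorFlow-graph-in-C++-API) describes a different way to train a model with the C++ API. The network structure is still written in Python, but every node that will be used from C++ must be given a name; the C++ code then simply runs the equivalent of sess.run(node name).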

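1. Build the graph in Python. The code is as follows:

When the Python code runs, it warns that tf.initialize_variables is deprecated and that tf.global_variables_initializer() should be used instead. At the time I had not found a way to invoke that initializer from C++ (the op cannot be given a name), and tf.initialize_variables still runs correctly, so I keep using it for now.

(Update 2018/1/5: this can be solved with init = tf.group(tf.global_variables_initializer(), name = 'init').)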

Python code:

import tensorflow as tf

with tf.Session() as sess:
    # Named placeholders so the C++ side can feed them by name
    x = tf.placeholder(tf.float32, [None, 32], name="x")
    y = tf.placeholder(tf.float32, [None, 8], name="y")

    w1 = tf.Variable(tf.truncated_normal([32, 16], stddev=0.1))
    b1 = tf.Variable(tf.constant(0.0, shape=[16]))

    w2 = tf.Variable(tf.truncated_normal([16, 8], stddev=0.1))
    b2 = tf.Variable(tf.constant(0.0, shape=[8]))

    # Two tanh layers; y_out, cost and train are named for use from C++
    a = tf.nn.tanh(tf.nn.bias_add(tf.matmul(x, w1), b1))
    y_out = tf.nn.tanh(tf.nn.bias_add(tf.matmul(a, w2), b2), name="y_out")
    cost = tf.reduce_sum(tf.square(y-y_out), name="cost")
    optimizer = tf.train.AdamOptimizer().minimize(cost, name="train")

    # Named initializer op, run once from C++ before training
    init = tf.initialize_variables(tf.all_variables(), name='init_all_vars_op')
    # Serialize the graph definition to ./mlp.pb in binary form
    tf.train.write_graph(sess.graph_def, './', 'mlp.pb', as_text=False)
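
To make the shapes concrete, here is a minimal pure-Python sketch of what the nodes y_out and cost compute for a single example, assuming the same layer sizes (32, 16, 8). It is not TensorFlow code: the helper names dense_tanh, forward and sum_squared_error are made up for illustration, and the weights come from a plain Gaussian rather than a truncated normal.

```python
import math
import random


def dense_tanh(inputs, weights, biases):
    """tanh(inputs @ weights + biases) for one example, using lists of floats."""
    return [
        math.tanh(
            sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j]
        )
        for j in range(len(biases))
    ]


def forward(x, w1, b1, w2, b2):
    """Two tanh layers, mirroring the nodes `a` and `y_out` in the graph."""
    a = dense_tanh(x, w1, b1)
    return dense_tanh(a, w2, b2)


def sum_squared_error(y, y_out):
    """Mirrors the `cost` node: sum of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y, y_out))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
w1 = [[rng.gauss(0.0, 0.1) for _ in range(16)] for _ in range(32)]
b1 = [0.0] * 16
w2 = [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(16)]
b2 = [0.0] * 8

x = [rng.random() for _ in range(32)]  # one input row of width 32
y = [rng.random() for _ in range(8)]   # one target row of width 8
y_out = forward(x, w1, b1, w2, b2)
cost = sum_squared_error(y, y_out)
print(len(y_out), cost)
```

Because the last layer is a tanh, every component of y_out lies between -1 and 1, which bounds how small the cost can get when targets fall outside that range.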

2. Write C++ code that loads the pb file, prepares the data, and starts training

The code is as follows:
C++ code:

#include <iostream>
#include <string>
#include <vector>

#include "tensorflow/core/public/session.h"
#include "tensorflow/core/graph/default_device.h"
using namespace tensorflow;

int main(int argc, char* argv[]) {

  std::string graph_definition = "mlp.pb";
  Session* session;
  GraphDef graph_def;
  SessionOptions opts;
  std::vector<Tensor> outputs; // Store outputs
  TF_CHECK_OK(ReadBinaryProto(Env::Default(), graph_definition, &graph_def));

  // Set GPU options
  graph::SetDefaultDevice("/gpu:0", &graph_def);
  opts.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.5);
  opts.config.mutable_gpu_options()->set_allow_growth(true);

  // Create a new session
  TF_CHECK_OK(NewSession(opts, &session));

  // Load graph into session
  TF_CHECK_OK(session->Create(graph_def));

  // Initialize our variables by running the named init op from the Python script
  TF_CHECK_OK(session->Run({}, {}, {"init_all_vars_op"}, nullptr));

  // Random input and target batches: 100 rows of width 32 and 8
  Tensor x(DT_FLOAT, TensorShape({100, 32}));
  Tensor y(DT_FLOAT, TensorShape({100, 8}));
  auto _XTensor = x.matrix<float>();
  auto _YTensor = y.matrix<float>();

  _XTensor.setRandom();
  _YTensor.setRandom();

  for (int i = 0; i < 10; ++i) {

    TF_CHECK_OK(session->Run({{"x", x}, {"y", y}}, {"cost"}, {}, &outputs)); // Get cost
    float cost = outputs[0].scalar<float>()(0);
    std::cout << "Cost: " << cost << std::endl;
    TF_CHECK_OK(session->Run({{"x", x}, {"y", y}}, {}, {"train"}, nullptr)); // Train
    outputs.clear();
  }

  session->Close();
  delete session;
  return 0;
}
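
The loop above makes two calls per iteration: one that only fetches cost, and one that only runs the train target with the same feeds. A toy pure-Python sketch of that pattern follows. It assumes a one-parameter model and uses plain gradient descent in place of the Adam optimizer from the graph; ToySession and its methods are hypothetical stand-ins, not TensorFlow API.

```python
class ToySession:
    """Hypothetical stand-in for a session that holds a single variable `w`."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.learning_rate = learning_rate

    def fetch_cost(self, xs, ys):
        # Analogue of Run(feeds, {"cost"}, {}, &outputs): read only, no update.
        return sum((y - self.w * x) ** 2 for x, y in zip(xs, ys))

    def run_train(self, xs, ys):
        # Analogue of Run(feeds, {}, {"train"}, nullptr): update, fetch nothing.
        grad = sum(-2.0 * x * (y - self.w * x) for x, y in zip(xs, ys))
        self.w -= self.learning_rate * grad


xs = [0.0, 0.5, 1.0]
ys = [0.0, 1.0, 2.0]  # generated by y = 2 * x

session = ToySession()
costs = []
for _ in range(10):
    costs.append(session.fetch_cost(xs, ys))  # cost before this update
    session.run_train(xs, ys)
    print("Cost:", costs[-1])
```

Since the cost is fetched before the update, the printed value on each iteration reflects the parameters left by the previous iteration, which is also the case in the C++ loop.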

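3. Compile and run

As described in the earlier posts, there are two ways to compile and run the C++ code: build it directly with bazel, or compile with gcc against the shared library.

3.1 Compiling with libtensorflow.so + gcc
In some directory, for example /home/heke/test/train_with_cpp, save the code from step 1 as build_graph.py and run it to obtain the mlp.pb file.
Save the code from step 2 as train.cc.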

Enter the compile command:
Bash code:
gcc -std=c++11 -I /usr/local/include/tf -L /usr/local/lib train.cc -ltensorflow
However, the following error occurred (not yet resolved):
/usr/local/include/tf/third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1:42: fatal error: unsupported/Eigen/CXX11/Tensor: No such file or directory
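
The error means the compiler could not resolve unsupported/Eigen/CXX11/Tensor from any of the -I directories. As a diagnostic aid only (the problem remains unresolved in this post), the sketch below lists directories under a search root that contain a given relative header path. The function name find_include_roots is hypothetical, the demonstration tree is made up, and whether passing a reported directory to -I fixes the build is an untested assumption.

```python
import os
import tempfile


def find_include_roots(search_root, relative_header):
    """Return sorted directories D under search_root where D/relative_header is a file."""
    parts = relative_header.split("/")
    matches = []
    for dirpath, _dirnames, _filenames in os.walk(search_root):
        if os.path.isfile(os.path.join(dirpath, *parts)):
            matches.append(dirpath)
    return sorted(matches)


# Demonstration on a throwaway directory tree (this layout is made up).
with tempfile.TemporaryDirectory() as tmp:
    header_dir = os.path.join(tmp, "eigen", "unsupported", "Eigen", "CXX11")
    os.makedirs(header_dir)
    open(os.path.join(header_dir, "Tensor"), "w").close()
    found = [
        os.path.relpath(d, tmp)
        for d in find_include_roots(tmp, "unsupported/Eigen/CXX11/Tensor")
    ]

print(found)
```

On a real machine the call would be find_include_roots("/usr/local/include/tf", "unsupported/Eigen/CXX11/Tensor"). Note that the directory holding the forwarding header named in the error message will itself appear in the result, so only the other entries are candidates.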

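3.2 Compiling with bazel

Under tensorflow-master/tensorflow/heke, create a train_with_cpp directory, and inside it create two folders, cpp and py.
Save the code from step 1 in the py folder and run build_graph.py to obtain the mlp.pb file.
Save the code from step 2 in the cpp folder as train.cc.
Create a BUILD file in the cpp folder and paste in the following code: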
Bazel BUILD code:

load("//tensorflow:tensorflow.bzl", "tf_cc_binary")

# Target and source names match train.cc and the "bazel build :train" command
tf_cc_binary(
    name = "train",
    srcs = ["train.cc"],
    deps = [
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow",
    ],
)

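Then, in the cpp directory, run bazel build :train

When the build finishes, the executable is generated under tensorflow-master/bazel-bin/tensorflow/heke/train_with_cpp/cpp and can be run directly with ./train
The program prints a line of the form "Cost: <value>" for each of the 10 training iterations.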
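
Since train.cc writes "Cost: " followed by the value on every iteration, the output is easy to post-process. A small sketch, assuming the output has been captured as text; parse_costs is a hypothetical helper, and the sample string is made up to match the format, not real program output.

```python
def parse_costs(output_text):
    """Extract the float after each 'Cost: ' prefix, in order of appearance."""
    prefix = "Cost: "
    costs = []
    for line in output_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            costs.append(float(line[len(prefix):]))
    return costs


# Made-up sample in the format printed by train.cc; not real program output.
sample = "Cost: 12.5\nCost: 11.75\nsome other log line\nCost: 10.0\n"
costs = parse_costs(sample)
print(costs)
```

The text could come from something like subprocess.run(["./train"], capture_output=True, text=True).stdout, after which checking that the list decreases is a quick sanity test that the train op is actually updating the variables.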

4. Running on another machine

Copy the entire cpp directory to the other machine, and place libtensorflow_cc.so, libtensorflow.so and libtensorflow_framework.so in /usr/local/lib; the program can then be run there.
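
Before running the binary on the target machine, it can help to confirm that the three libraries are actually in place. A minimal sketch, assuming the file names listed above; missing_libs is a hypothetical helper, and the demonstration uses a throwaway directory rather than /usr/local/lib.

```python
import os
import tempfile

# Library names taken from the step above.
REQUIRED_LIBS = (
    "libtensorflow_cc.so",
    "libtensorflow.so",
    "libtensorflow_framework.so",
)


def missing_libs(lib_dir, names=REQUIRED_LIBS):
    """Return the entries of `names` that are not files inside lib_dir."""
    return [
        name for name in names
        if not os.path.isfile(os.path.join(lib_dir, name))
    ]


# Demonstration on a throwaway directory holding only one of the three files.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "libtensorflow.so"), "w").close()
    demo_missing = missing_libs(tmp)

print(demo_missing)
```

On the target machine the call would be missing_libs("/usr/local/lib"); an empty list means all three files are present.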

References:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/cc/tutorials/example_trainer.cc

https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f

https://tebesu.github.io/posts/Training-a-TensorFlow-graph-in-C++-API

https://www.tensorflow.org/api_guides/cc/guide

https://matrices.io/training-a-deep-neural-network-using-only-tensorflow-c/