Compiling and using the TensorFlow C++ interface on Ubuntu 14.04

2019-2-26 15:25:14
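Today I ran into trouble again in this environment while using OpenCV to read images and build tensor data. Damn.
After a long time I found the cause: once certain TensorFlow headers are included, OpenCV stops working correctly.
This is a TensorFlow 1.8 bug: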
https://github.com/tensorflow/tensorflow/issues/1924

https://github.com/tensorflow/tensorflow/commit/e6570147c4699518af50d2b08190290003d33aa8#diff-a45ad39a8349c3896b6decf56bcbe304

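Following the commit above, modify the tensorflow/BUILD file, add the file tensorflow/tf_framework_version_script.lds, and add the corresponding content.
Alternatively, following https://github.com/tensorflow/tensorflow/issues/14267, pass --config=monolithic when building:
bazel build --config=monolithic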

============================================================

Here is a record of how I built TensorFlow. Damn, it took me N days.
Configuration: CUDA 8.0, cuDNN 7.0.3, Ubuntu 14.04, protobuf 3.5.1

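At first I downloaded the master branch directly and used Bazel 0.21. Downloading the dependencies was extremely slow; when it finally finished and the build started, it failed with the following error: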
Starting local Bazel server and connecting to it…
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
INFO: Invocation ID: 2c4fe3c3-fc6e-4583-8906-358dd3bf3000
INFO: Analysed target //tensorflow:libtensorflow_cc.so (129 packages loaded, 10734 targets configured).
INFO: Found 1 target…
ERROR: /home/XXX/.cache/bazel/bazel_root/08a772d8033fea3caa85941d64fad1ab/external/protobuf_archive/BUILD:259:1: C++ compilation of rule '@protobuf_archive//:protoc_lib' failed (Exit 127): crosstool_wrapper_driver_is_not_gcc failed: error executing command
(cd /home/XXX/.cache/bazel/bazel_root/08a772d8033fea3caa85941d64fad1ab/execroot/org_tensorflow &&
exec env -
PATH=/bin:/usr/bin
PWD=/proc/self/cwd
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/host/bin/external/protobuf_archive/_objs/protoc_lib/csharp_reflection_class.d '-frandom-seed=bazel-out/host/bin/external/protobuf_archive/_objs/protoc_lib/csharp_reflection_class.o' -iquote external/protobuf_archive -iquote bazel-out/host/genfiles/external/protobuf_archive -iquote bazel-out/host/bin/external/protobuf_archive -isystem external/protobuf_archive/src -isystem bazel-out/host/genfiles/external/protobuf_archive/src -isystem bazel-out/host/bin/external/protobuf_archive/src '-std=c++11' -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -g0 '-march=native' -g0 -DHAVE_PTHREAD -Wall -Wwrite-strings -Woverloaded-virtual -Wno-sign-compare -Wno-unused-function -Wno-writable-strings -c external/protobuf_archive/src/google/protobuf/compiler/csharp/csharp_reflection_class.cc -o bazel-out/host/bin/external/protobuf_archive/_objs/protoc_lib/csharp_reflection_class.o)
Execution platform: @bazel_tools//platforms:host_platform
: No such file or directory
Target //tensorflow:libtensorflow_cc.so failed to build
INFO: Elapsed time: 10.713s, Critical Path: 0.37s
INFO: 0 processes.
FAILED: Build did NOT complete successfully

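I searched online for a long time to see how others had handled it, but found only one person who had hit the same problem, and nobody had answered how to fix it. I gradually came to suspect a compatibility problem between the versions of some of the tools. Then I saw someone build TensorFlow 1.8 instead: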
https://blog.csdn.net/qq_37674858/article/details/81095101

git clone https://github.com/tensorflow/tensorflow -b r1.8

At the same time, uninstall the existing Bazel and install version 0.18.
The uninstall command is:

rm -fr ~/.bazel ~/.bazelrc

Then:

# Run the configuration script first
./configure

# Then build the C++ shared library
bazel build --config=opt --config=cuda //tensorflow:libtensorflow_cc.so

After that, the build got going fairly smoothly, but I soon hit the first problem:
Cannot find libdevice.10.bc under /usr/local/cuda-8.0

This one is fairly simple; the fix is described on GitHub:

cd /usr/local/cuda-8.0/nvvm/libdevice
sudo ln -s libdevice.compute_50.10.bc libdevice.10.bc
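If you want to script this fix (for example in a setup tool that runs before bazel build), the same ln -s step can be written with C++17 std::filesystem. This is only a minimal sketch: ensure_libdevice_link is a hypothetical helper name, and it assumes the CUDA 8.0 layout above, where libdevice.compute_50.10.bc exists in nvvm/libdevice. It leaves an existing libdevice.10.bc alone and, like the sudo in the shell version, needs root to write under /usr/local.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper doing the same as:
//   cd /usr/local/cuda-8.0/nvvm/libdevice
//   sudo ln -s libdevice.compute_50.10.bc libdevice.10.bc
// Returns true if libdevice.10.bc is usable afterwards.
bool ensure_libdevice_link(const fs::path& libdevice_dir) {
  const fs::path link = libdevice_dir / "libdevice.10.bc";
  // Relative target, exactly like the ln -s above; it resolves against link's directory.
  const fs::path target = "libdevice.compute_50.10.bc";
  std::error_code ec;
  if (fs::exists(link, ec)) {
    return true;  // already present (real file or working symlink), nothing to do
  }
  if (!fs::exists(libdevice_dir / target, ec)) {
    return false;  // the compute_50 file is missing, so there is nothing to link to
  }
  fs::create_symlink(target, link, ec);  // creates link -> target
  return !ec;
}
```

Usage would be ensure_libdevice_link("/usr/local/cuda-8.0/nvvm/libdevice"). It needs a C++17 compiler; older GCC releases (e.g. GCC 8) also need -lstdc++fs at link time.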

That fixed it, but I soon hit the second problem:
ERROR: /home/XXX/toolCode/tensorflow1.8/tensorflow/tensorflow/core/kernels/BUILD:915:1: output 'tensorflow/core/kernels/_objs/where_op_gpu/where_op_gpu_impl_2.cu.pic.o' was not created
ERROR: /home/XXX/toolCode/tensorflow1.8/tensorflow/tensorflow/core/kernels/BUILD:915:1: not all outputs were created or valid
Target //tensorflow:libtensorflow_cc.so failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 272.823s, Critical Path: 66.17s
INFO: 1450 processes: 1450 local.
FAILED: Build did NOT complete successfully
After printing the detailed error output, the actual error is:
error: calling a host function("__builtin_expect") from a device function("cub::AgentSelectIf< ::cub::DispatchSelectIf< ::cub::CountingInputIterator<long long, long> , ::cub::TransformInputIterator<bool, ::tensorflow::functor::_NV_ANON_NAMESPACE::IsNonzero , const float *, long> , ::tensorflow::functor::WhereOutputIterator<(int)2> , long long *, ::cub::NullType, ::cub::NullType, int, (bool)0> ::PtxSelectIfPolicyT, ::cub::CountingInputIterator<long long, long> , ::cub::TransformInputIterator<bool, ::tensorflow::functor::_NV_ANON_NAMES

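A quick search showed that this is a problem with how the CUDA code is compiled. This issue is quite similar:
https://github.com/tensorflow/tensorflow/issues/19203#issue-321998060
Following it, edit tensorflow/core/platform/macros.h and change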

#if TF_HAS_BUILTIN(__builtin_expect) || (defined(__GNUC__) && __GNUC__ >= 3) 

to

#if (!defined(__NVCC__)) && (TF_HAS_BUILTIN(__builtin_expect) || (defined(__GNUC__) && __GNUC__ >= 3))
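To see what the edited condition does, here is a standalone sketch. It assumes that, as in TensorFlow, this #if selects between two definitions of TF_PREDICT_TRUE / TF_PREDICT_FALSE; the macro bodies below are a reconstruction for illustration, not copied from the r1.8 header, and safe_divide is a hypothetical function showing typical use. When nvcc is the compiler (__NVCC__ is defined), the macros fall back to the plain condition, so device code never calls the host-only __builtin_expect.

```cpp
#include <cassert>

// Standalone sketch of the guard in tensorflow/core/platform/macros.h after the edit.
#ifdef __has_builtin
#define TF_HAS_BUILTIN(x) __has_builtin(x)
#else
#define TF_HAS_BUILTIN(x) 0
#endif

// Edited condition: never use __builtin_expect under nvcc, which rejects it
// as a host function called from a __device__ function.
#if (!defined(__NVCC__)) && (TF_HAS_BUILTIN(__builtin_expect) || (defined(__GNUC__) && __GNUC__ >= 3))
#define TF_PREDICT_FALSE(x) (__builtin_expect(x, 0))
#define TF_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
#else
// Fallback (taken under nvcc or non-GCC compilers): no branch hint, same truth value.
#define TF_PREDICT_FALSE(x) (x)
#define TF_PREDICT_TRUE(x) (x)
#endif

// Hypothetical helper: mark the error branch as unlikely.
inline int safe_divide(int a, int b) {
  if (TF_PREDICT_FALSE(b == 0)) {
    return 0;  // rare path
  }
  return a / b;
}
```

The hint only affects branch layout, never the result, so dropping it for nvcc is safe: both branches of the #if give the same truth value for any condition.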

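I continued the build, and it finally reported a complete success. That took N days of work.
When you hit problems like this, fixing them by reading the source code is an amount of work you cannot estimate, so the only practical option is to switch versions. Something to keep in mind next time. People should be curious, of course, but if you had to understand the internals and details of every product or piece of software before you could use it, nobody would have that much time. All I can say is that I miss Steve Jobs...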

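After the .so is built, it can be found in the tensorflow/bazel-bin/tensorflow directory.
Installing dependencies: go to the tensorflow/contrib/makefile directory, find the build_all_xxx.sh script, and run it.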

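Summary: when writing about something like building a complex library, state clearly the versions of all the software used, so that others can reproduce it!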
