1. Environment
- CentOS
- CUDA 9
- cuDNN 7
- TensorFlow Serving r1.12 and TensorFlow 1.12, both compiled successfully.
2. Installing tf_serving (non-GPU build)
- Installing tf_serving (non-GPU version, on a CUDA 9 / cuDNN 7 machine)
- Build steps:
    - git clone -b r1.3 --recurse-submodules https://github.com/tensorflow/serving
    - Enter the serving/tensorflow directory and run ./configure to generate the build configuration.
    - The GPU build command is bazel build -c opt --config=cuda tensorflow_serving/...; for the CPU-only build used here, run:
      bazel build -c opt tensorflow_serving/...
      This compiles successfully.
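The steps above can be collected into one script. This is a sketch, not a verified one-shot build: it assumes bazel is already on PATH, and ./configure still prompts interactively for the build options.

```shell
# Sketch of the CPU-only build, following the steps above.
# Assumes bazel is installed and on PATH; ./configure is interactive.
set -e
git clone -b r1.3 --recurse-submodules https://github.com/tensorflow/serving
cd serving/tensorflow
./configure                                  # answer the configuration prompts
cd ..
bazel build -c opt tensorflow_serving/...    # CPU-only: no --config=cuda
```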
- Problems hit while building tf_serving, and their fixes
    - 1. no such package '@protobuf//': java.io.IOException: Error downloading
      Fix: sed -i '\@https://github.com/google/protobuf/archive/0b059a3d8a8f8aa40dde7bea55edca4ec5dfea66.tar.gz@d' tensorflow/workspace.bzl
    - 2. no such target '@org_tensorflow//third_party/gpus/crosstool:crosstool'
      Fix:
        - Edit the tools/bazel.rc file and change
          @org_tensorflow//third_party/gpus/crosstool
          to @local_config_cuda//crosstool:toolchain
        - Run: bazel clean --expunge && export TF_NEED_CUDA=1
        - Run: bazel query 'kind(rule, @local_config_cuda//...)'
    - 3. fatal error: stropts.h: No such file or directory
      Fix:
        - vim tensorflow/third_party/curl.BUILD
        - remove the line like:
          define HAVE_STROPTS_H 1
    - 4. If you hit permission problems under /tmp, set export TMPDIR=XXX.
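Item 4's TMPDIR workaround can be sketched as follows; the directory name here is a placeholder, any directory you own will do:

```shell
# If bazel fails with permission errors under /tmp, point TMPDIR at a
# directory you own before re-running the build (path is a placeholder):
mkdir -p "$HOME/bazel_tmp"
export TMPDIR="$HOME/bazel_tmp"
echo "$TMPDIR"
```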
3. Installing tf_serving (GPU build)
- Installing tf_serving (GPU version) (CUDA 9, cuDNN 7)
- Build steps:
    - bazel build -c opt --config=cuda tensorflow_serving/...
    - Final build command:
      export TF_NEED_CUDA=1 && export TMPDIR=/home/work/XXX/tools/serving/tmp && /home/work/XXX/bin/bazel build -c opt --config=cuda tensorflow_serving/...
- Problem summary
    - 1. ERROR: Building with --config=cuda but TensorFlow is not configured to build with GPU support (appears even though GPU support was already enabled during ./configure)
      Fix: export TF_NEED_CUDA=1. Final command: export TF_NEED_CUDA=1 && /home/work/XXX/bin/bazel build -c opt --config=cuda tensorflow_serving/...
    - 2. fatal error: third_party/nccl/nccl.h: No such file or directory
      Fix: install NCCL first, then set two variables:
      export TF_NCCL_VERSION='2.1.15'
      export NCCL_INSTALL_PATH=/usr/local/nccl2 (my preferred path)
      Final command: export TF_NEED_CUDA=1 && export TMPDIR=/home/work/XXX/tools/serving/tmp && export TF_NCCL_VERSION=2.3.4 && export NCCL_INSTALL_PATH=/usr/local/nccl_2.3.4 && /home/work/XXX/bin/bazel build -c opt --config=cuda tensorflow_serving/...
    - 3. error adding symbols: DSO missing from command line
      Fix: edit util/net_http/server/testing/BUILD and tensorflow_serving/util/net_http/client/testing/BUILD, adding 'linkopts = ["-lm"],' inside the cc_binary( ... ) rule; leave the rest of the configuration unchanged.
    - 4. error: possibly undefined macro: AC_PROG_LIBTOOL
      Fix: install libtool (yum on CentOS, apt-get on Ubuntu).
    - 5. When rebuilding tf_serving on the same machine, the NCCL variables look as if they are still set, yet the build fails with "NCCL_HDR_PATH" not found in dictionary.
      Fix: re-export TF_NCCL_VERSION, NCCL_HDR_PATH, and the related variables, e.g.:
      export TF_NEED_CUDA=1 && export TMPDIR=/home/XXX/tmp && export TF_NCCL_VERSION=2.3.4 && export NCCL_INSTALL_PATH=/home/XXX/tools/nccl_2.3.4/lib && export NCCL_HDR_PATH=/home/XXX/tools/nccl_2.3.4/include && /home/XXX/bin/bazel build -c opt --config=cuda tensorflow_serving/...
    - 6. undefined reference to symbol 'XXX@@GLIBC_2.2.5'
      Fix: see https://github.com/tensorflow/tensorflow/issues/2291. Find the relevant file with grep -rn "LINK_OPTS" */*/*/* and change the default link options to: "//conditions:default": ["-lpthread","-lrt","-lm"],
      Use the error message to locate the BUILD file to edit. For example, given ERROR: /home/XXX/tools/serving/tensorflow_serving/util/net_http/socket/testing/BUILD, edit the XX/socket/testing/BUILD file.
    - 7. error adding symbols: Bad value, with the message relocation R_X86_64_32 against '.rodata' can not be used when making a shared object; recompile with -fPIC
      Fix: add --copt="-fPIC" to the build. Final command:
      export TF_NEED_CUDA=1 && export TMPDIR=/home/XXX/tmp && export TF_NCCL_VERSION=2.3.4 && export NCCL_INSTALL_PATH=/home/XXX/tools/nccl_2.3.4/lib && export NCCL_HDR_PATH=/home/XXX/tools/nccl_2.3.4/include && /home/XXX/bin/bazel build -c opt --config=cuda --copt="-fPIC" tensorflow_serving/...
- Starting the tf_serving server
    - bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=bert --model_base_path=/XXX/model/bert/save_model
    - To control which GPUs tensorflow serving uses, set:
      export CUDA_VISIBLE_DEVICES="0"
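Putting the two steps together, a launch sketch that pins the server to GPU 0 (model name, port, and the /XXX paths are the placeholders from above; this assumes the bazel build has already produced the binary):

```shell
# Make only GPU 0 visible to the process, then start the model server.
export CUDA_VISIBLE_DEVICES="0"
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
  --port=9000 \
  --model_name=bert \
  --model_base_path=/XXX/model/bert/save_model
```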
- When running the client program, the error NDIMS == dims() (2 vs. 4) Asking for tensor of 2 dimensions from a tensor of 4 dimensions appears.
    - This is likely a TensorFlow version mismatch on the client side; adjust the client's TensorFlow version, e.g. to 1.12.
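A minimal sketch of aligning the client environment, assuming a pip-managed Python and that 1.12 is the version matching the server build:

```shell
# Pin the client's TensorFlow (and serving API) to the server's version,
# then confirm what the interpreter actually picks up.
pip install 'tensorflow==1.12.0' 'tensorflow-serving-api==1.12.0'
python -c 'import tensorflow as tf; print(tf.__version__)'
```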
The following is the build script for serving with TensorRT support:
export TF_NEED_CUDA=1
export TMPDIR=/home/xxx/tools/serving_tensorrt/tmp
export TF_NCCL_VERSION=2.3.4
export TF_NEED_TENSORRT=1
export TF_TENSORRT_VERSION=5.0.2
export TENSORRT_INSTALL_PATH=/home/xxx/tools/TensorRT-5.0.2.6/lib
#export TENSORRT_BIN_PATH=/home/xxx/tools/TensorRT-5.0.2.6
export NCCL_INSTALL_PATH=/home/xxx/tools/nccl_2.3.4/lib
export NCCL_HDR_PATH=/home/xxx/tools/nccl_2.3.4/include
/home/xxx/bin/bazel build -c opt \
--copt="-fPIC" \
--config=cuda tensorflow_serving/...