TensorFlow Serving Source Code Walkthrough

Launch command:

./tensorflow_model_server --port=8208 --rest_api_port=8501 --model_name=model --model_base_path=/models/model --model_config_file=/models/rank/model.config

Logging parameters:

export TF_CPP_MIN_VLOG_LEVEL=1 enables VLOG(1) output.

Debug environment setup:

  • dd if=/dev/zero of=/swapfile bs=1024k count=65536, then mkswap /swapfile and swapon /swapfile to enable swap;
  • Run swapoff /swapfile to disable swap before deleting /swapfile; otherwise deleting the image will fail because the file is reported as in use by the system;
  • Start the container: docker run -it -p 8208:8208 --privileged --cap-add sys_ptrace tensorflow/serving:latest-devel;
  • apt-get update and apt-get install vim; install gdb-8.3 and copy the related files into /usr/local/share/gdb/python;
  • bazel build -c opt tensorflow_serving/... — Elapsed time: 10817.682s, 17767 processes: 17767 local, 19787 total actions;
  • The opt build produces a 143 MB binary; the dbg build produces a 3.8 GB binary for gdb debugging, and attaching to that process is very slow;
  • Caveats: relocate the Docker image directory to guarantee enough disk space and enable swap to guarantee enough memory, otherwise all kinds of failures occur;

All main-program arguments:

// gRPC Server options
tensorflow::int32 grpc_port = 8500;
tensorflow::string grpc_channel_arguments;
tensorflow::string grpc_socket_path;

// HTTP Server options.
tensorflow::int32 http_port = 0;
tensorflow::int32 http_num_threads = 4.0 * port::NumSchedulableCPUs();
tensorflow::int32 http_timeout_in_ms = 30000; // 30 seconds.

// Model Server options.
bool enable_batching = false;
float per_process_gpu_memory_fraction = 0;
tensorflow::string batching_parameters_file;
tensorflow::string model_name;
tensorflow::int32 max_num_load_retries = 5;
tensorflow::int64 load_retry_interval_micros = 1LL * 60 * 1000 * 1000;
tensorflow::int32 file_system_poll_wait_seconds = 1;
bool flush_filesystem_caches = true;
tensorflow::string model_base_path;
tensorflow::string saved_model_tags;
// Tensorflow session parallelism of zero means that both inter and intra op
// thread pools will be auto configured.
tensorflow::int64 tensorflow_session_parallelism = 0;

// Zero means that the thread pools will be auto configured.
tensorflow::int64 tensorflow_intra_op_parallelism = 0;
tensorflow::int64 tensorflow_inter_op_parallelism = 0;
tensorflow::string platform_config_file;
tensorflow::string ssl_config_file;
tensorflow::string model_config_file;
bool enable_model_warmup = true;
tensorflow::string monitoring_config_file;

Key arguments:

  • model_servers/main.cc is the TFS main entry point; execution starts with argument parsing;
  • --port sets the gRPC port (grpc_port), default 8500;
  • --rest_api_port sets the HTTP/REST API port, default 0 (disabled);
  • --rest_api_num_threads sets the number of HTTP/REST API worker threads, default 4 × the number of CPU cores;
  • --rest_api_timeout_in_ms sets the HTTP/REST API timeout, default 30 seconds;
  • --enable_model_warmup warms the model up to reduce first-request latency, default true;

Server build and startup:

Function declaration: Status BuildAndStart(const Options& server_options) — builds and starts the gRPC (and optionally HTTP) server, ready to accept and process new requests over gRPC (and optionally HTTP/REST).
Function declaration: void WaitForTermination() — waits for the server started by BuildAndStart() above to terminate, blocking the current thread until shutdown.

Model configuration initialization:

  • If model_config_file is empty, i.e. no multi-model model.config file was given, a single-model configuration is initialized;
  • Otherwise model_config_file is parsed and each model entry is loaded into a ModelServerConfig object;
  • ModelServerConfig is defined in model_server_config.proto under the tensorflow_serving/config directory;
  • If platform_config_file is non-empty, an ASCII PlatformConfigMap protobuf is read from that file and used in place of the default TensorFlow platform configuration;

Serving interfaces:

  1. UNIX domain socket: the grpc_socket_path flag specifies a UNIX domain socket path, relative or absolute; exchanging data through the kernel avoids the overhead and failure modes of the network stack;
  2. HTTP: listens on the configured port and carries the payload over HTTP;
  3. gRPC: uses Google's remote procedure call framework for data exchange;

HTTP server implementation:

  1. BuildAndStart in server.cc calls CreateAndStartHttpServer in http_server.cc to perform initialization;
  2. Two key members are initialized: the port and the executor, a RequestExecutor object defined in http_server.cc;
  3. The function creates a server variable of type std::unique_ptr<HTTPServerInterface>, actually holding an EvHTTPServer object;
  4. It then calls EvHTTPServer::Initialize(), which is built on the libevent library and implements the HTTP transport;
  5. evhttp_set_gencb registers the dispatch function DispatchEvRequestFn, which hands requests off to the thread pool's worker threads;
  6. Finally the EvHTTPServer's event_base runs its event loop, waiting for work until event_base_loopexit terminates it;

main.cc::server.BuildAndStart → server.cc::CreateAndStartHttpServer → http_server.cc::net_http::CreateEvHTTPServer → httpserver.h::CreateEvHTTPServer → evhttp_server.cc::Initialize → DispatchEvRequestFn → DispatchEvRequest

DispatchEvRequest argument flow: evhttp_request* req → auto parsed_request = absl::make_unique<ParsedEvRequest>(req) → std::unique_ptr<EvHTTPRequest> ev_request(new EvHTTPRequest(std::move(parsed_request), this))

gRPC server implementation:

  1. BuildAndStart in server.cc creates a ServerBuilder object;
  2. The listen address and port are added to the builder;
  3. Server credentials are created and the services registered (the model service and the prediction service);
  4. The maximum message size is set;
  5. Finally the builder's BuildAndStart is called to finish construction;

gRPC call stack:

#0 tensorflow::serving::PredictionServiceImpl::Predict (this=0x555565f29dc0, context=0x7fffec00b440, request=0x7fffec009ee0, response=0x7fffdf7da200) at tensorflow_serving/model_servers/prediction_service_impl.cc:48
#1 0x0000555555a0227a in std::__invoke_impl<grpc::Status, grpc::Status (tensorflow::serving::PredictionService::Service::* const&)(grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*), tensorflow::serving::PredictionService::Service*, grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*> (
__f=@0x555565f2b0b8: &virtual tensorflow::serving::PredictionService::Service::Predict(grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*),
__t=@0x7fffdf7da0e8: 0x555565f29dc0, __args#0=@0x7fffdf7da0e0: 0x7fffec00b440, __args#1=@0x7fffdf7da0d8: 0x7fffec009ee0, __args#2=@0x7fffdf7da0d0: 0x7fffdf7da200) at /usr/include/c++/7/bits/invoke.h:73
#2 0x0000555555a0161e in std::__invoke<grpc::Status (tensorflow::serving::PredictionService::Service::* const&)(grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*), tensorflow::serving::PredictionService::Service*, grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*> (
__fn=@0x555565f2b0b8: &virtual tensorflow::serving::PredictionService::Service::Predict(grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*),
__args#0=@0x7fffdf7da0e8: 0x555565f29dc0, __args#1=@0x7fffdf7da0e0: 0x7fffec00b440, __args#2=@0x7fffdf7da0d8: 0x7fffec009ee0, __args#3=@0x7fffdf7da0d0: 0x7fffdf7da200)
at /usr/include/c++/7/bits/invoke.h:96
#3 0x0000555555a00124 in std::_Mem_fn_base<grpc::Status (tensorflow::serving::PredictionService::Service::*)(grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*), true>::operator()<tensorflow::serving::PredictionService::Service*, grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*> (this=0x555565f2b0b8,
__args#0=@0x7fffdf7da0e8: 0x555565f29dc0, __args#1=@0x7fffdf7da0e0: 0x7fffec00b440, __args#2=@0x7fffdf7da0d8: 0x7fffec009ee0, __args#3=@0x7fffdf7da0d0: 0x7fffdf7da200) at /usr/include/c++/7/functional:175
#4 0x00005555559fd6c1 in std::_Function_handler<grpc::Status (tensorflow::serving::PredictionService::Service*, grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*), std::_Mem_fn<grpc::Status (tensorflow::serving::PredictionService::Service::*)(grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*)> >::_M_invoke(std::_Any_data const&, tensorflow::serving::PredictionService::Service*&&, grpc::ServerContext*&&, tensorflow::serving::PredictRequest const*&&, tensorflow::serving::PredictResponse*&&) (__functor=...,
__args#0=@0x7fffdf7da0e8: 0x555565f29dc0, __args#1=@0x7fffdf7da0e0: 0x7fffec00b440, __args#2=@0x7fffdf7da0d8: 0x7fffec009ee0, __args#3=@0x7fffdf7da0d0: 0x7fffdf7da200)
at /usr/include/c++/7/bits/std_function.h:302
#5 0x0000555555a0eb86 in std::function<grpc::Status (tensorflow::serving::PredictionService::Service*, grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*)>::operator()(tensorflow::serving::PredictionService::Service*, grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*) const (this=0x555565f2b0b8,
__args#0=0x555565f29dc0, __args#1=0x7fffec00b440, __args#2=0x7fffec009ee0, __args#3=0x7fffdf7da200) at /usr/include/c++/7/bits/std_function.h:706
#6 0x0000555555a078e4 in grpc::internal::RpcMethodHandler<tensorflow::serving::PredictionService::Service, tensorflow::serving::PredictRequest, tensorflow::serving::PredictResponse>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&)::{lambda()#1}::operator()() const (__closure=0x7fffdf7da1c0) at external/grpc/include/grpcpp/impl/codegen/method_handler_impl.h:68
#7 0x0000555555a0ebbb in grpc::internal::CatchingFunctionHandler<grpc::internal::RpcMethodHandler<tensorflow::serving::PredictionService::Service, tensorflow::serving::PredictRequest, tensorflow::serving::PredictResponse>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&)::{lambda()#1}>(grpc::internal::RpcMethodHandler<tensorflow::serving::PredictionService::Service, tensorflow::serving::PredictRequest, tensorflow::serving::PredictResponse>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&)::{lambda()#1}&&) (handler=...)
at external/grpc/include/grpcpp/impl/codegen/method_handler_impl.h:42
#8 0x0000555555a07999 in grpc::internal::RpcMethodHandler<tensorflow::serving::PredictionService::Service, tensorflow::serving::PredictRequest, tensorflow::serving::PredictResponse>::RunHandler (this=0x555565f2b0b0, param=...) at external/grpc/include/grpcpp/impl/codegen/method_handler_impl.h:65
#9 0x0000555555a485f2 in grpc::Server::SyncRequest::CallData::ContinueRunAfterInterception (this=0x7fffec00b420) at external/grpc/src/cpp/server/server_cc.cc:306
#10 0x0000555555a48425 in grpc::Server::SyncRequest::CallData::Run (this=0x7fffec00b420, global_callbacks=..., resources=true) at external/grpc/src/cpp/server/server_cc.cc:293
#11 0x0000555555a498c1 in grpc::Server::SyncRequestThreadManager::DoWork (this=0x555565e25930, tag=0x555565ef59d0, ok=true, resources=true) at external/grpc/src/cpp/server/server_cc.cc:629
#12 0x0000555555a5bcf1 in grpc::ThreadManager::MainWorkLoop (this=0x555565e25930) at external/grpc/src/cpp/thread_manager/thread_manager.cc:200
#13 0x0000555555a5b4d9 in grpc::ThreadManager::WorkerThread::Run (this=0x555565efb480) at external/grpc/src/cpp/thread_manager/thread_manager.cc:42
#14 0x0000555555a5b3d2 in grpc::ThreadManager::WorkerThread::<lambda(void*)>::operator()(void *) const (__closure=0x0, th=0x555565efb480) at external/grpc/src/cpp/thread_manager/thread_manager.cc:36
#15 0x0000555555a5b3f2 in grpc::ThreadManager::WorkerThread::<lambda(void*)>::_FUN(void *) () at external/grpc/src/cpp/thread_manager/thread_manager.cc:36
#16 0x0000555555b674a7 in grpc_core::(anonymous namespace)::ThreadInternalsPosix::<lambda(void*)>::operator()(void *) const (__closure=0x0, v=0x555565f29330) at external/grpc/src/core/lib/gprpp/thd_posix.cc:100
#17 0x0000555555b674e4 in grpc_core::(anonymous namespace)::ThreadInternalsPosix::<lambda(void*)>::_FUN(void *) () at external/grpc/src/core/lib/gprpp/thd_posix.cc:103
#18 0x00007ffff7bbd6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#19 0x00007ffff6b9b88f in clone () from /lib/x86_64-linux-gnu/libc.so.6

gRPC call path:

  1. prediction_service_impl.cc::Predict → prediction_impl.cc::Predict → PredictWithModelSpec → RunPredict;
  2. The SessionBundlePredict path is deprecated; it is kept after RunPredict only for backward compatibility;

Key flow notes:

  1. PredictionServiceImpl::Predict first sets the deadline, then calls the Predict method in prediction_impl.cc, passing the configuration and the input/output objects;
  2. TensorflowPredictor::Predict first checks the model spec in the request; if it is missing, an error is returned, otherwise PredictWithModelSpec is called;
  3. TensorflowPredictor::PredictWithModelSpec calls internal::RunPredict, building the model name, tags, and version from the incoming request;
  4. PreProcessPrediction performs validation; its parameters are the model signature, the request, the input tensor vector, and the output tensor name and alias vectors;
  5. Validation has two parts: VerifySignature validates the model signature and VerifyRequestInputsSize validates the request input sizes; either failure returns an error;
  6. session->Run(...) is the entry point into model computation; session is a ServingSessionWrapper instance, derived from ServingSession (itself derived from Session);
  7. After computation finishes, PostProcessPredictionResult fills in the response object and the result is returned to the client;
  8. Note: the ServingSessionWrapper::Run → DirectSession::Run call shows that processing crosses from the TFS codebase into the TF codebase;

Model computation:

Function prototype:

Status Run(const RunOptions& run_options,
           const std::vector<std::pair<string, Tensor>>& inputs,
           const std::vector<string>& output_tensor_names,
           const std::vector<string>& target_node_names,
           std::vector<Tensor>* outputs,
           RunMetadata* run_metadata) override

  • Model computation enters through DirectSession::Run in tensorflow/core/common_runtime/direct_session.cc;
  • TF_RETURN_IF_ERROR first checks whether the session has been closed, then whether the graph has been created;
  • The vector<std::pair<string, Tensor>> container is traversed, appending each feed name to the input tensor name vector (i.e. extracting the input names for this session run);
  • It then checks whether an executor for these arguments already exists; if so it is reused, otherwise a new one is created;
  • A FunctionCallFrame is configured and constructed to feed inputs to and fetch outputs from the executor; the entry point is DirectSession::RunInternal;
  • Finally the outputs, i.e. the computation results, are received;

Logging notes:

  • The logging facility is defined and implemented in the TensorFlow codebase, not in TFS;
  • getenv("TF_CPP_MIN_VLOG_LEVEL") reads the environment variable;
  • TensorFlow Serving log severities only support 1 (WARNING) and 2 (ERROR);
  • export TF_CPP_MIN_VLOG_LEVEL=1 enables VLOG(1) output;

Optimizations:

  • Thread locks use GCC __attribute__ annotations, letting the compiler perform additional error checking and more aggressive optimization;
  • Branches use GCC's __builtin_expect(EXP, N), asserting that EXP most likely equals N, which reduces the cost of mispredicted jumps;
  • vector operations use emplace_back instead of push_back, constructing elements in place and avoiding a temporary (reportedly about twice as fast);

Author: 厉力文武