TensorFlow Serving Source Code Walkthrough

Launch command:

./tensorflow_model_server --port=8208 --rest_api_port=8501 --model_name=model --model_base_path=/models/model --model_config_file=/models/rank/model.config

Logging parameters:

export TF_CPP_MIN_VLOG_LEVEL=1 enables VLOG(1) output.

Debug environment setup:

  • dd if=/dev/zero of=/swapfile bs=1024k count=65536, then mkswap /swapfile and swapon /swapfile to enable swap;
  • Run swapoff /swapfile to disable swap before deleting /swapfile; otherwise deleting the image will fail because the file is reported as in use by the system;
  • Start the container: docker run -it -p 8208:8208 --privileged --cap-add sys_ptrace tensorflow/serving:latest-devel;
  • apt-get update and apt-get install vim; install gdb-8.3 and copy the related files into /usr/local/share/gdb/python;
  • bazel build -c opt tensorflow_serving/... — Elapsed time: 10817.682s, 17767 processes: 17767 local, 19787 total actions;
  • The opt build produces a 143 MB binary; the dbg build produces a 3.8 GB binary for gdb debugging, and attaching to that process is very slow;
  • Caveats: relocate the Docker image directory to guarantee enough disk space and enable swap to guarantee enough memory, otherwise all kinds of failures occur;

All main-program arguments:

// gRPC Server options
tensorflow::int32 grpc_port = 8500;
tensorflow::string grpc_channel_arguments;
tensorflow::string grpc_socket_path;

// HTTP Server options.
tensorflow::int32 http_port = 0;
tensorflow::int32 http_num_threads = 4.0 * port::NumSchedulableCPUs();
tensorflow::int32 http_timeout_in_ms = 30000; // 30 seconds.

// Model Server options.
bool enable_batching = false;
float per_process_gpu_memory_fraction = 0;
tensorflow::string batching_parameters_file;
tensorflow::string model_name;
tensorflow::int32 max_num_load_retries = 5;
tensorflow::int64 load_retry_interval_micros = 1LL * 60 * 1000 * 1000;
tensorflow::int32 file_system_poll_wait_seconds = 1;
bool flush_filesystem_caches = true;
tensorflow::string model_base_path;
tensorflow::string saved_model_tags;
// Tensorflow session parallelism of zero means that both inter and intra op
// thread pools will be auto configured.
tensorflow::int64 tensorflow_session_parallelism = 0;

// Zero means that the thread pools will be auto configured.
tensorflow::int64 tensorflow_intra_op_parallelism = 0;
tensorflow::int64 tensorflow_inter_op_parallelism = 0;
tensorflow::string platform_config_file;
tensorflow::string ssl_config_file;
tensorflow::string model_config_file;
bool enable_model_warmup = true;
tensorflow::string monitoring_config_file;

Key arguments:

  • model_servers/main.cc is the TFS main entry point; execution starts with argument parsing;
  • --port sets the gRPC port (grpc_port), default 8500;
  • --rest_api_port sets the HTTP/REST API port, default 0 (disabled);
  • --rest_api_num_threads sets the number of HTTP/REST API worker threads, default 4 × the number of CPU cores;
  • --rest_api_timeout_in_ms sets the HTTP/REST API timeout, default 30 seconds;
  • --enable_model_warmup warms the model up to reduce first-request latency, default true;

Server build and startup:

Function declaration: Status BuildAndStart(const Options& server_options) — builds and starts the gRPC (and optionally HTTP) server, ready to accept and process new requests over gRPC (and optionally HTTP/REST).
Function declaration: void WaitForTermination() — waits for the server started by BuildAndStart() above to terminate, blocking the current thread until shutdown.

Model configuration initialization:

  • If model_config_file is empty, i.e. no multi-model model.config file was given, a single-model configuration is initialized;
  • Otherwise model_config_file is parsed and each model entry is loaded into a ModelServerConfig object;
  • ModelServerConfig is defined in model_server_config.proto under the tensorflow_serving/config directory;
  • If platform_config_file is non-empty, an ASCII PlatformConfigMap protobuf is read from that file and used in place of the default TensorFlow platform configuration;

Serving interfaces:

  1. UNIX domain socket: the grpc_socket_path flag specifies a UNIX domain socket path, relative or absolute; exchanging data through the kernel avoids the overhead and failure modes of the network stack;
  2. HTTP: listens on the configured port and carries the payload over HTTP;
  3. gRPC: uses Google's remote procedure call framework for data exchange;

HTTP server implementation:

  1. BuildAndStart in server.cc calls CreateAndStartHttpServer in http_server.cc to perform initialization;
  2. Two key members are initialized: the port and the executor, a RequestExecutor object defined in http_server.cc;
  3. The function creates a server variable of type std::unique_ptr<HTTPServerInterface>, actually holding an EvHTTPServer object;
  4. It then calls EvHTTPServer::Initialize(), which is built on the libevent library and implements the HTTP transport;
  5. evhttp_set_gencb registers the dispatch function DispatchEvRequestFn, which hands requests off to the thread pool's worker threads;
  6. Finally the EvHTTPServer's event_base runs its event loop, waiting for work until event_base_loopexit terminates it;

main.cc::server.BuildAndStart → server.cc::CreateAndStartHttpServer → http_server.cc::net_http::CreateEvHTTPServer → httpserver.h::CreateEvHTTPServer → evhttp_server.cc::Initialize → DispatchEvRequestFn → DispatchEvRequest

DispatchEvRequest argument flow: evhttp_request* req → auto parsed_request = absl::make_unique<ParsedEvRequest>(req) → std::unique_ptr<EvHTTPRequest> ev_request(new EvHTTPRequest(std::move(parsed_request), this))

gRPC server implementation:

  1. BuildAndStart in server.cc creates a ServerBuilder object;
  2. The listen address and port are added to the builder;
  3. Server credentials are created and the services registered (the model service and the prediction service);
  4. The maximum message size is set;
  5. Finally the builder's BuildAndStart is called to finish construction;

gRPC call stack:

#0 tensorflow::serving::PredictionServiceImpl::Predict (this=0x555565f29dc0, context=0x7fffec00b440, request=0x7fffec009ee0, response=0x7fffdf7da200) at tensorflow_serving/model_servers/prediction_service_impl.cc:48
#1 0x0000555555a0227a in std::__invoke_impl<grpc::Status, grpc::Status (tensorflow::serving::PredictionService::Service::* const&)(grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*), tensorflow::serving::PredictionService::Service*, grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*> (
__f=@0x555565f2b0b8: &virtual tensorflow::serving::PredictionService::Service::Predict(grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*),
__t=@0x7fffdf7da0e8: 0x555565f29dc0, __args#0=@0x7fffdf7da0e0: 0x7fffec00b440, __args#1=@0x7fffdf7da0d8: 0x7fffec009ee0, __args#2=@0x7fffdf7da0d0: 0x7fffdf7da200) at /usr/include/c++/7/bits/invoke.h:73
#2 0x0000555555a0161e in std::__invoke<grpc::Status (tensorflow::serving::PredictionService::Service::* const&)(grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*), tensorflow::serving::PredictionService::Service*, grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*> (
__fn=@0x555565f2b0b8: &virtual tensorflow::serving::PredictionService::Service::Predict(grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*),
__args#0=@0x7fffdf7da0e8: 0x555565f29dc0, __args#1=@0x7fffdf7da0e0: 0x7fffec00b440, __args#2=@0x7fffdf7da0d8: 0x7fffec009ee0, __args#3=@0x7fffdf7da0d0: 0x7fffdf7da200)
at /usr/include/c++/7/bits/invoke.h:96
#3 0x0000555555a00124 in std::_Mem_fn_base<grpc::Status (tensorflow::serving::PredictionService::Service::*)(grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*), true>::operator()<tensorflow::serving::PredictionService::Service*, grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*> (this=0x555565f2b0b8,
__args#0=@0x7fffdf7da0e8: 0x555565f29dc0, __args#1=@0x7fffdf7da0e0: 0x7fffec00b440, __args#2=@0x7fffdf7da0d8: 0x7fffec009ee0, __args#3=@0x7fffdf7da0d0: 0x7fffdf7da200) at /usr/include/c++/7/functional:175
#4 0x00005555559fd6c1 in std::_Function_handler<grpc::Status (tensorflow::serving::PredictionService::Service*, grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*), std::_Mem_fn<grpc::Status (tensorflow::serving::PredictionService::Service::*)(grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*)> >::_M_invoke(std::_Any_data const&, tensorflow::serving::PredictionService::Service*&&, grpc::ServerContext*&&, tensorflow::serving::PredictRequest const*&&, tensorflow::serving::PredictResponse*&&) (__functor=...,
__args#0=@0x7fffdf7da0e8: 0x555565f29dc0, __args#1=@0x7fffdf7da0e0: 0x7fffec00b440, __args#2=@0x7fffdf7da0d8: 0x7fffec009ee0, __args#3=@0x7fffdf7da0d0: 0x7fffdf7da200)
at /usr/include/c++/7/bits/std_function.h:302
#5 0x0000555555a0eb86 in std::function<grpc::Status (tensorflow::serving::PredictionService::Service*, grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*)>::operator()(tensorflow::serving::PredictionService::Service*, grpc::ServerContext*, tensorflow::serving::PredictRequest const*, tensorflow::serving::PredictResponse*) const (this=0x555565f2b0b8,
__args#0=0x555565f29dc0, __args#1=0x7fffec00b440, __args#2=0x7fffec009ee0, __args#3=0x7fffdf7da200) at /usr/include/c++/7/bits/std_function.h:706
#6 0x0000555555a078e4 in grpc::internal::RpcMethodHandler<tensorflow::serving::PredictionService::Service, tensorflow::serving::PredictRequest, tensorflow::serving::PredictResponse>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&)::{lambda()#1}::operator()() const (__closure=0x7fffdf7da1c0) at external/grpc/include/grpcpp/impl/codegen/method_handler_impl.h:68
#7 0x0000555555a0ebbb in grpc::internal::CatchingFunctionHandler<grpc::internal::RpcMethodHandler<tensorflow::serving::PredictionService::Service, tensorflow::serving::PredictRequest, tensorflow::serving::PredictResponse>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&)::{lambda()#1}>(grpc::internal::RpcMethodHandler<tensorflow::serving::PredictionService::Service, tensorflow::serving::PredictRequest, tensorflow::serving::PredictResponse>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&)::{lambda()#1}&&) (handler=...)
at external/grpc/include/grpcpp/impl/codegen/method_handler_impl.h:42
#8 0x0000555555a07999 in grpc::internal::RpcMethodHandler<tensorflow::serving::PredictionService::Service, tensorflow::serving::PredictRequest, tensorflow::serving::PredictResponse>::RunHandler (this=0x555565f2b0b0, param=...) at external/grpc/include/grpcpp/impl/codegen/method_handler_impl.h:65
#9 0x0000555555a485f2 in grpc::Server::SyncRequest::CallData::ContinueRunAfterInterception (this=0x7fffec00b420) at external/grpc/src/cpp/server/server_cc.cc:306
#10 0x0000555555a48425 in grpc::Server::SyncRequest::CallData::Run (this=0x7fffec00b420, global_callbacks=..., resources=true) at external/grpc/src/cpp/server/server_cc.cc:293
#11 0x0000555555a498c1 in grpc::Server::SyncRequestThreadManager::DoWork (this=0x555565e25930, tag=0x555565ef59d0, ok=true, resources=true) at external/grpc/src/cpp/server/server_cc.cc:629
#12 0x0000555555a5bcf1 in grpc::ThreadManager::MainWorkLoop (this=0x555565e25930) at external/grpc/src/cpp/thread_manager/thread_manager.cc:200
#13 0x0000555555a5b4d9 in grpc::ThreadManager::WorkerThread::Run (this=0x555565efb480) at external/grpc/src/cpp/thread_manager/thread_manager.cc:42
#14 0x0000555555a5b3d2 in grpc::ThreadManager::WorkerThread::<lambda(void*)>::operator()(void *) const (__closure=0x0, th=0x555565efb480) at external/grpc/src/cpp/thread_manager/thread_manager.cc:36
#15 0x0000555555a5b3f2 in grpc::ThreadManager::WorkerThread::<lambda(void*)>::_FUN(void *) () at external/grpc/src/cpp/thread_manager/thread_manager.cc:36
#16 0x0000555555b674a7 in grpc_core::(anonymous namespace)::ThreadInternalsPosix::<lambda(void*)>::operator()(void *) const (__closure=0x0, v=0x555565f29330) at external/grpc/src/core/lib/gprpp/thd_posix.cc:100
#17 0x0000555555b674e4 in grpc_core::(anonymous namespace)::ThreadInternalsPosix::<lambda(void*)>::_FUN(void *) () at external/grpc/src/core/lib/gprpp/thd_posix.cc:103
#18 0x00007ffff7bbd6db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#19 0x00007ffff6b9b88f in clone () from /lib/x86_64-linux-gnu/libc.so.6

gRPC call path:

  1. prediction_service_impl.cc::Predict → prediction_impl.cc::Predict → PredictWithModelSpec → RunPredict;
  2. The SessionBundlePredict path is deprecated; it is kept after RunPredict only for backward compatibility;

Key flow notes:

  1. PredictionServiceImpl::Predict first sets the deadline, then calls the Predict method in prediction_impl.cc, passing the configuration and the input/output objects;
  2. TensorflowPredictor::Predict first checks the model spec in the request; if it is missing, an error is returned, otherwise PredictWithModelSpec is called;
  3. TensorflowPredictor::PredictWithModelSpec calls internal::RunPredict, building the model name, tags, and version from the incoming request;
  4. PreProcessPrediction performs validation; its parameters are the model signature, the request, the input tensor vector, and the output tensor name and alias vectors;
  5. Validation has two parts: VerifySignature validates the model signature and VerifyRequestInputsSize validates the request input sizes; either failure returns an error;
  6. session->Run(...) is the entry point into model computation; session is a ServingSessionWrapper instance, derived from ServingSession (itself derived from Session);
  7. After computation finishes, PostProcessPredictionResult fills in the response object and the result is returned to the client;
  8. Note: the ServingSessionWrapper::Run → DirectSession::Run call shows that processing crosses from the TFS codebase into the TF codebase;

Model computation:

Function prototype:

Status Run(const RunOptions& run_options,
           const std::vector<std::pair<string, Tensor>>& inputs,
           const std::vector<string>& output_tensor_names,
           const std::vector<string>& target_node_names,
           std::vector<Tensor>* outputs,
           RunMetadata* run_metadata) override

  • Model computation enters through DirectSession::Run in tensorflow/core/common_runtime/direct_session.cc;
  • TF_RETURN_IF_ERROR first checks whether the session has been closed, then whether the graph has been created;
  • The vector<std::pair<string, Tensor>> container is traversed, appending each feed name to the input tensor name vector (i.e. extracting the input names for this session run);
  • It then checks whether an executor for these arguments already exists; if so it is reused, otherwise a new one is created;
  • A FunctionCallFrame is configured and constructed to feed inputs to and fetch outputs from the executor; the entry point is DirectSession::RunInternal;
  • Finally the outputs, i.e. the computation results, are received;

Logging notes:

  • The logging facility is defined and implemented in the TensorFlow codebase, not in TFS;
  • getenv("TF_CPP_MIN_VLOG_LEVEL") reads the environment variable;
  • TensorFlow Serving log severities only support 1 (WARNING) and 2 (ERROR);
  • export TF_CPP_MIN_VLOG_LEVEL=1 enables VLOG(1) output;

Optimizations:

  • Thread locks use GCC __attribute__ annotations, letting the compiler perform additional error checking and more aggressive optimization;
  • Branches use GCC's __builtin_expect(EXP, N), asserting that EXP most likely equals N, which reduces the cost of mispredicted jumps;
  • vector operations use emplace_back instead of push_back, constructing elements in place and avoiding a temporary (reportedly about twice as fast);

Author: 厉力文武