TensorFlow Serving: Architecture and Code Walkthrough

TensorFlow Serving

Architecture

(Architecture diagram omitted.)

Roles
Servables

Servables are the central abstraction in TensorFlow Serving: the underlying objects that clients use to perform computation (for example, a lookup or inference). They are flexible in size and granularity, and they allow for:

streaming results
experimental APIs
asynchronous modes of operation

Models

TensorFlow Serving represents a model as one or more servables. A machine-learned model may include one or more algorithms (including learned weights) and lookup or embedding tables.

You can represent a composite model as either of the following:

multiple independent servables

single composite servable

A servable may also correspond to a fraction of a model. For example, a large lookup table could be sharded across many TensorFlow Serving instances.

Loaders

Loaders manage a servable's life cycle. The Loader API standardizes the mechanics of loading and unloading a servable, independently of the specific learning algorithm or data involved.

Sources

Sources are plugin modules that find and provide servables. For each servable version a Source makes available, it supplies one Loader instance. The typical pipeline is: Sources → SourceAdapters → Loaders.

Aspired versions

Aspired versions represent the set of servable versions that should be loaded and ready.

Managers

Managers handle the full lifecycle of servables, including:

loading servables

serving servables

unloading servables

A Manager may postpone an action when it cannot be completed safely; for example, it may defer loading a version when the required resources aren't available, or wait to unload an old version until a newer version finishes loading.

In short:
1. Sources create Loaders for servable versions.
2. Loaders are sent as aspired versions to the Manager, which loads and serves them to client requests.

In more detail:

1. A Source plugin creates a Loader for a specific version. The Loader contains whatever metadata it needs to load the servable.

2. The Source uses a callback to notify the Manager of the aspired version.

3. The Manager applies the configured Version Policy to determine the next action to take, which could be to unload a previously loaded version or to load the new version.

4. If the Manager determines that it's safe, it gives the Loader the required resources and tells the Loader to load the new version.

5. Clients ask the Manager for the servable, either specifying a version explicitly or just requesting the latest version. The Manager returns a handle for the servable.

As a concrete example:

1. The Source detects a new version of the model weights. It creates a Loader that contains a pointer to the model data on disk.

2. The Source notifies the Dynamic Manager of the aspired version.

3. The Dynamic Manager applies the Version Policy and decides to load the new version.

4. The Dynamic Manager tells the Loader that there is enough memory. The Loader instantiates the TensorFlow graph with the new weights.

5. A client requests a handle to the latest version of the model, and the Dynamic Manager returns a handle to the new version of the servable.

Loader vs. Manager

Loader: manages the life cycle of a single model version (loading and unloading it).

Manager: manages the life cycle of servables across the serving system (which versions are loaded, served, and unloaded).

Features

TensorFlow Serving can serve multiple versions of multiple models simultaneously, configured through the Model Version Policy;

by default, only the latest version of each model is loaded;

models are automatically discovered and loaded from the file system;

request-handling latency is low;

the server is stateless, so it scales out horizontally;

different model versions can be A/B tested;

TensorFlow models can be scanned and loaded from the local file system;

TensorFlow models can also be scanned and loaded from HDFS;

a gRPC interface is provided for client calls;

requests can be processed in batches.
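For example, multi-version serving and A/B testing are driven by a model config file passed to the server with --model_config_file. A sketch in protobuf text format, with made-up model names and paths:

```
model_config_list {
  config {
    name: "model_a"                # hypothetical model name
    base_path: "/models/model_a"   # hypothetical export directory
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1                # keep two versions loaded, e.g. for A/B
        versions: 2
      }
    }
  }
  config {
    name: "model_b"
    base_path: "/models/model_b"
    model_platform: "tensorflow"   # latest version only (the default)
  }
}
```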

Batcher

Batching of multiple requests into a single request can significantly reduce the cost of performing inference, especially in the presence of hardware accelerators such as GPUs. TensorFlow Serving includes a request batching widget that lets clients easily batch their type-specific inferences across requests into batch requests that algorithm systems can more efficiently process. See the Batching Guide for more information.

End-to-end flow:

The Source scans the local model files and creates a Loader for each model it finds, then passes the aspired-version information to the Manager. The Manager checks whether enough resources are available; if so, it tells the Loader to load the new model and unloads the old one. Once loading completes, the Manager makes the model available for serving.

A client sends a request, which reaches the servable through a ServableHandle obtained from the Manager; once the prediction completes, the result is returned.

Source Code Walkthrough
ServerCore (main.cc)
int main(int argc, char** argv) {
  ...

  ServerCore::Options options;
  options.model_server_config = model_server_config;
  options.servable_state_monitor_creator = &CreateServableStateMonitor;
  options.custom_model_config_loader = &LoadCustomModelConfig;

  // Register the SavedModelBundle source adapter for the "tensorflow"
  // platform, so discovered storage paths can be turned into Loaders.
  ::google::protobuf::Any source_adapter_config;
  SavedModelBundleSourceAdapterConfig
      saved_model_bundle_source_adapter_config;
  source_adapter_config.PackFrom(saved_model_bundle_source_adapter_config);
  (*(*options.platform_config_map.mutable_platform_configs())
      [kTensorFlowModelPlatform].mutable_source_adapter_config()) =
      source_adapter_config;

  // Build the ServerCore (which wires up Sources, adapters and the
  // Manager) and block serving gRPC requests.
  std::unique_ptr<ServerCore> core;
  TF_CHECK_OK(ServerCore::Create(options, &core));
  RunServer(port, std::move(core));

  return 0;
}

Building the file-system source config from the server's model configuration (each configured model becomes a servable to monitor, with its version policy):

FileSystemStoragePathSourceConfig ServerCore::CreateStoragePathSourceConfig(
    const ModelServerConfig& config) const {
  FileSystemStoragePathSourceConfig source_config;
  source_config.set_file_system_poll_wait_seconds(
      options_.file_system_poll_wait_seconds);
  for (const auto& model : config.model_config_list().config()) {
    LOG(INFO) << " (Re-)adding model: " << model.name();
    FileSystemStoragePathSourceConfig::ServableToMonitor* servable =
        source_config.add_servables();
    servable->set_servable_name(model.name());
    servable->set_base_path(model.base_path());
    // TODO(akhorlin): remove this logic once the corresponding deprecated
    // field is removed (b/62834753).
    if (!model.has_model_version_policy()) {
      switch (model.version_policy()) {
        case FileSystemStoragePathSourceConfig::LATEST_VERSION:
          servable->mutable_servable_version_policy()->mutable_latest();
          break;
        case FileSystemStoragePathSourceConfig::ALL_VERSIONS:
          servable->mutable_servable_version_policy()->mutable_all();
          break;
        default:
          LOG(FATAL) << "Unknown version policy: "  // Crash ok.
                     << model.version_policy();
      }
    } else {
      *servable->mutable_servable_version_policy() =
          model.model_version_policy();
    }
  }
  return source_config;
}

Starting the gRPC server:

void RunServer(int port, std::unique_ptr<ServerCore> core,
           bool use_saved_model) {
  // "0.0.0.0" is the way to listen on localhost in gRPC.
  const string server_address = "0.0.0.0:" + std::to_string(port);
  PredictionServiceImpl service(std::move(core), use_saved_model);
  ServerBuilder builder;
  std::shared_ptr<grpc::ServerCredentials> creds = InsecureServerCredentials();
  builder.AddListeningPort(server_address, creds);
  builder.RegisterService(&service);
  builder.SetMaxMessageSize(tensorflow::kint32max);
  std::unique_ptr<Server> server(builder.BuildAndStart());
  LOG(INFO) << "Running ModelServer at " << server_address << " ...";
  server->Wait();
}

Periodically scanning the file system and loading models, driven by PeriodicFunction:

void PeriodicFunction::RunLoop(const int64 start) {
  {
    if (options_.startup_delay_micros > 0) {
      const int64 deadline = start + options_.startup_delay_micros;
      options_.env->SleepForMicroseconds(deadline - start);
    }

    while (!stop_thread_.HasBeenNotified()) {
      VLOG(3) << "Running function.";
      const int64 begin = options_.env->NowMicros();
      function_();

      // Take the max() here to guard against time going backwards which
      // sometimes happens in multiproc machines.
      const int64 end =
          std::max(static_cast<int64>(options_.env->NowMicros()), begin);

      // The deadline is relative to when the last function started.
      const int64 deadline = begin + interval_micros_;

      // We want to sleep until 'deadline'.
      if (deadline > end) {
        if (end > begin) {
          VLOG(3) << "Reducing interval_micros from " << interval_micros_
                  << " to " << (deadline - end);
        }
        options_.env->SleepForMicroseconds(deadline - end);
      } else {
        VLOG(3) << "Function took longer than interval_micros, so not sleeping";
      }
    }
  }
}

The heart of the server is ServerCore, which internally wraps an AspiredVersionsManager.

ServerCore::Create() takes a ServerCore::Options parameter. Here are a few commonly used options:

ModelServerConfig that specifies models to be loaded. Models are declared either through model_config_list, which declares a static list of models, or through custom_model_config, which defines a custom way to declare a list of models that may get updated at runtime.

PlatformConfigMap that maps from the name of the platform (such as tensorflow) to the PlatformConfig, which is used to create the SourceAdapter. The SourceAdapter adapts a StoragePath (the path where a model version is discovered) to a model Loader (which loads the model version from the storage path and provides state transition interfaces to the Manager). If the PlatformConfig contains a SavedModelBundleSourceAdapterConfig, a SavedModelBundleSourceAdapter will be created, which we will explain later.

SavedModelBundle

SavedModelBundle is a key component of TensorFlow Serving. It represents a TensorFlow model loaded from a given path and provides the same Session::Run interface as TensorFlow to run inference. SavedModelBundleSourceAdapter adapts a storage path to a Loader so that the model's lifetime can be managed by the Manager. Note that SavedModelBundle is the successor of the deprecated SessionBundle; users are encouraged to use SavedModelBundle, as support for SessionBundle will soon be removed.

ServerCore does the following:

1. Instantiates a FileSystemStoragePathSource that monitors the model export paths declared in model_config_list.

2. Instantiates a SourceAdapter using the PlatformConfigMap with the model platform declared in model_config_list and connects the FileSystemStoragePathSource to it. This way, whenever a new model version is discovered under the export path, the SavedModelBundleSourceAdapter adapts it to a Loader (the hookup is done via SetAspiredVersionsCallback, which starts the watch for new models).

3. Instantiates a specific implementation of Manager called AspiredVersionsManager that manages all Loader instances created by the SavedModelBundleSourceAdapter. ServerCore exports the Manager interface by delegating the calls to AspiredVersionsManager.

References:

https://blog.csdn.net/wuhuaiyu/article/details/77336372

https://blog.csdn.net/xlie/article/details/81949947

https://blog.csdn.net/appletesttest/article/details/89647758

https://naurril.github.io/howtos/2018/08/22/inside_tfs.html#par20
