TorchServe Environment Setup + Model Updates + New Model Registration

Contents

1. Background

2. TorchServe Environment Setup

2.1 JDK Setup

2.2 Python Environment Setup

2.3 Starting the Service

2.3.1 Registering a Model

2.3.2 Listing Models

2.3.3 Calling the Inference API

3. Advanced Features

3.1 Managing Multiple Model Versions

3.2 Registering a New Model

3.3 Model Testing


1. Background

Due to a change in our technology roadmap, we need to replace our existing model inference service, TensorFlow Serving. After an initial survey, the candidate replacement frameworks are TorchServe and Triton. This article covers only how to deploy a TorchServe environment and introduces its basic features.

2. TorchServe Environment Setup

Basic runtime environment:

torch                   1.12.1
torch-model-archiver    0.6.0
torch-workflow-archiver 0.2.4
torchserve              0.6.0
torchvision             0.13.1


jdk>=11

2.1 JDK Setup

Download the JDK

There are plenty of references for this online, so it is not repeated here.

Extract it and configure the environment variables:

export JAVA_HOME=/usr/local/jdk11.0.10
export PATH=${PATH}:${JAVA_HOME}/bin
# CLASSPATH does not need to be set for TorchServe; jre/lib/ext, dt.jar, and tools.jar no longer exist in JDK 11

Verify the environment:

$ java -version
openjdk version "11.0.10" 2021-01-19
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.10+9)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.10+9, mixed mode)

2.2 Python Environment Setup

The official documentation is also quite detailed and can be used as a reference:

serve/README.md at master · pytorch/serve · GitHub

To set it up yourself:

conda create -n ts python=3.9

pip install torchserve torch-model-archiver torch-workflow-archiver

The pip packages that model inference depends on can be installed later.
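If you want to reproduce the exact environment listed at the top of section 2, a pinned installation roughly like the following should work (versions taken from the table above; adjust the torch/torchvision builds to match your CUDA setup):

# A minimal sketch: pin the versions from the runtime environment table (adjust for your CUDA version)
pip install torch==1.12.1 torchvision==0.13.1
pip install torchserve==0.6.0 torch-model-archiver==0.6.0 torch-workflow-archiver==0.2.4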

Once installation is complete, you can start the service to check that everything works.

2.3 Starting the Service

torchserve --start --model-store model_save

Since no model has been configured yet, calling the model listing API returns an empty result:

$ curl http://127.0.0.1:8081/models
{
  "models": []
}
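By default, TorchServe serves inference on port 8080 and the management API on port 8081, which is why the calls in this article target those ports. If you need different addresses or want models preloaded at startup, the standard mechanism is a config.properties file passed via --ts-config; a minimal sketch (paths here are placeholders for illustration):

# config.properties (a minimal sketch; adjust addresses and the model store path to your deployment)
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
model_store=model_store
load_models=all

Start the server with this file:

torchserve --start --ts-config config.properties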

2.3.1 Registering a Model

Before registering a model, it first needs to be packaged.

Package the model definition and weights into a single archive:

wget https://download.pytorch.org/models/resnet18-f37072fd.pth
torch-model-archiver --model-name resnet-18 --version 1.0 --model-file ./examples/image_classifier/resnet_18/model.py --serialized-file resnet18-f37072fd.pth --handler image_classifier --extra-files ./examples/image_classifier/index_to_name.json
mkdir model_store
mv resnet-18.mar model_store/
torchserve --start --model-store model_store --models resnet-18=resnet-18.mar
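The resulting .mar file is a zip archive, so its contents can be inspected directly; this is a quick way to confirm that the handler and extra files were packaged as intended (exact entry names may vary across torch-model-archiver versions):

# List the contents of the packaged model archive (a .mar is a zip archive)
unzip -l model_store/resnet-18.mar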

2.3.2 Listing Models

$ curl http://127.0.0.1:8081/models
{
  "models": [
    {
      "modelName": "resnet-18",
      "modelUrl": "resnet-18.mar"
    }
  ]
}
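Besides the list endpoint, the management API can describe a single registered model, including its workers and batch settings; the same call is used again in section 3.1:

curl http://127.0.0.1:8081/models/resnet-18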

2.3.3 Calling the Inference API

$ curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_classifier/kitten.jpg
{
  "tabby": 0.40966343879699707,
  "tiger_cat": 0.3467043936252594,
  "Egyptian_cat": 0.13002890348434448,
  "lynx": 0.023919543251395226,
  "bucket": 0.011532200500369072
}

This completes verification of the basic functionality.
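Two more calls that are handy during this kind of smoke test are the health check on the inference port and stopping the server; both are standard TorchServe features:

# Health check
curl http://127.0.0.1:8080/ping

# Stop the server when testing is finished
torchserve --stop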

3. Advanced Features

3.1 Managing Multiple Model Versions

If multiple versions of the same model are registered, the version can be added to the URL to select one:

/predictions/{model_name}/{version}

For example, the resnet-18 model registered with version 1.0 in section 2.3.1 can also be accessed by including the version number in the URL:

curl http://127.0.0.1:8080/predictions/resnet-18/1.0 -T ./examples/image_classifier/kitten.jpg
{
  "tabby": 0.40966343879699707,
  "tiger_cat": 0.3467043936252594,
  "Egyptian_cat": 0.13002890348434448,
  "lynx": 0.023919543251395226,
  "bucket": 0.011532200500369072
}

Updating a model

First, package the new model the same way as in section 2.3.1, taking care to bump the version number.
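A repackaging command along the lines of section 2.3.1, with the version bumped to 2.0, is sketched below; the model and weight file names are assumed to be the same as in that section, and the rename to resnet-18_2.mar simply matches the file registered in the next step:

# Package the updated model as version 2.0 (file names assumed to follow section 2.3.1)
torch-model-archiver --model-name resnet-18 --version 2.0 \
    --model-file ./examples/image_classifier/resnet_18/model.py \
    --serialized-file resnet18-f37072fd.pth \
    --handler image_classifier \
    --extra-files ./examples/image_classifier/index_to_name.json \
    --export-path model_save
mv model_save/resnet-18.mar model_save/resnet-18_2.mar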

$ curl -X POST "http://localhost:8081/models?url=/home/ubuntu/newspace/pytorchserve/deploy/model_save/resnet-18_2.mar"
2022-10-27T14:58:01,223 [DEBUG] epollEventLoopGroup-3-12 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 2.0 for model resnet-18
2022-10-27T14:58:01,223 [INFO ] epollEventLoopGroup-3-12 org.pytorch.serve.wlm.ModelManager - Model resnet-18 loaded.
2022-10-27T14:58:01,224 [INFO ] epollEventLoopGroup-3-12 ACCESS_LOG - /127.0.0.1:38340 "POST /models?url=/home/ubuntu/newspace/pytorchserve/deploy/model_save/resnet-18_2.mar HTTP/1.1" 200 1832
2022-10-27T14:58:01,224 [INFO ] epollEventLoopGroup-3-12 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:cb,timestamp:1666849981
{
  "status": "Model \"resnet-18\" Version: 2.0 registered with 0 initial workers. Use scale workers API to add workers for the model."
}
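As the response notes, the model is registered with 0 initial workers. The register-model API also accepts an initial_workers query parameter, so the separate scale-up step shown below can be avoided by registering and starting a worker in one call:

# Alternative: register the new version and start one worker in a single request
curl -X POST "http://localhost:8081/models?url=/home/ubuntu/newspace/pytorchserve/deploy/model_save/resnet-18_2.mar&initial_workers=1"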

Test the inference call:

$ curl http://127.0.0.1:8080/predictions/resnet-18/2.0 -T ./examples/image_classifier/kitten.jpg
{
  "code": 503,
  "type": "ServiceUnavailableException",
  "message": "Model \"resnet-18\" Version 2.0\" has no worker to serve inference request. Please use scale workers API to add workers."
}

Following the hint in the error message, add a worker for the new model version:

$ curl -v -X PUT "http://localhost:8081/models/resnet-18/2.0?min_worker=1"
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8081 (#0)
> PUT /models/resnet-18/2.0?min_worker=1 HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.60.0
> Accept: */*
> 
< HTTP/1.1 202 Accepted
< content-type: application/json
< x-request-id: d2989b0f-f154-4056-9c58-d5f44d558cce
< Pragma: no-cache
< Cache-Control: no-cache; no-store, must-revalidate, private
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< content-length: 47
< connection: keep-alive
< 
{
  "status": "Processing worker updates..."
}
* Connection #0 to host localhost left intact

$ curl http://localhost:8081/models/resnet-18/2.0
[
  {
    "modelName": "resnet-18",
    "modelVersion": "2.0",
    "modelUrl": "/home/ubuntu/newspace/pytorchserve/deploy/model_save/resnet-18_2.mar",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 1,
    "batchSize": 1,
    "maxBatchDelay": 100,
    "loadedAtStartup": true,
    "workers": [
      {
        "id": "9003",
        "startTime": "2022-10-27T15:03:22.541Z",
        "status": "UNLOADING",
        "memoryUsage": 0,
        "pid": 19643,
        "gpu": true,
        "gpuUsage": "gpuId::0 utilization.gpu [%]::2 % utilization.memory [%]::0 % memory.used [MiB]::2246 MiB"
      }
    ]
  }
]

$ curl http://127.0.0.1:8080/predictions/resnet-18/2.0 -T ./examples/image_classifier/kitten.jpg
{
  "tabby": 0.40966343879699707,
  "tiger_cat": 0.3467043936252594,
  "Egyptian_cat": 0.13002890348434448,
  "lynx": 0.023919543251395226,
  "bucket": 0.011532200500369072
}
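Once version 2.0 is verified, two follow-up management calls are worth knowing: setting 2.0 as the default version served by the unversioned /predictions/resnet-18 URL, and unregistering the old version when it is no longer needed (both are standard management API endpoints):

# Make version 2.0 the default for unversioned requests
curl -X PUT "http://localhost:8081/models/resnet-18/2.0/set-default"

# Optionally remove version 1.0 once nothing depends on it
curl -X DELETE "http://localhost:8081/models/resnet-18/1.0"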

3.2 Registering a New Model

In real-world deployments, a single serving instance typically hosts multiple models, so updating one model must not interrupt online inference for the others.

Add a new model, a TTS (text-to-speech) synthesizer:

$ cd ./examples/text_to_speech_synthesizer/
$ ./create_mar.sh
$ mv waveglow_synthesizer.mar model_store/

$ curl -X POST  "http://localhost:8081/models?url=/home/ubuntu/newspace/pytorchserve/deploy/model_save/waveglow_synthesizer.mar"


$ curl http://localhost:8081/models/waveglow_synthesizer/1.0
[
  {
    "modelName": "waveglow_synthesizer",
    "modelVersion": "1.0",
    "modelUrl": "/home/ubuntu/newspace/pytorchserve/deploy/model_save/waveglow_synthesizer.mar",
    "runtime": "python",
    "minWorkers": 0,
    "maxWorkers": 0,
    "batchSize": 1,
    "maxBatchDelay": 100,
    "loadedAtStartup": false,
    "workers": []
  }
]


$ curl -v -X PUT "http://localhost:8081/models/waveglow_synthesizer/1.0?min_worker=1"
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8081 (#0)
> PUT /models/waveglow_synthesizer/1.0?min_worker=1 HTTP/1.1
> Host: localhost:8081
> User-Agent: curl/7.60.0
> Accept: */*
> 
< HTTP/1.1 202 Accepted
< content-type: application/json
< x-request-id: 08ad7b99-86b5-4144-aff3-92583ed40aca
< Pragma: no-cache
< Cache-Control: no-cache; no-store, must-revalidate, private
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< content-length: 47
< connection: keep-alive
< 
{
  "status": "Processing worker updates..."
}
* Connection #0 to host localhost left intact





3.3 Model Testing
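The request body for this handler is a plain text file; for a quick test, something like the following is enough (the sentence below is an arbitrary example, not the sample file shipped with pytorch/serve):

# Create a small input file for the synthesizer (arbitrary example sentence)
echo "Hello world, this is a text to speech test." > sample_text.txt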

$ curl http://127.0.0.1:8080/predictions/waveglow_synthesizer -T sample_text.txt -o audio.wav

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    25    0     0    0    25      0      0 --:--:--  0:04:11 --:--:--     0
100  183k  100  183k    0    25    420      0  0:07:26  0:07:25  0:00:01 47879

The audio file audio.wav generated by the TTS model appears in the current directory.
