Triton model registration and access validation (error-prone points highlighted)

This article walks through pulling and running the Triton Inference Server Docker image, mounting a model repository to register models, and writing the model configuration files, paying particular attention to the mapping between platform names and inference frameworks, since a wrong mapping prevents the server from starting. It also shows how to inspect a model's configuration over HTTP and gives sample inference code, emphasizing the importance of the request data format.

  1. Download and run the Triton Server image

    docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3

Run the container with the model repository mounted:

docker run --gpus all --rm --net=host -v /path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 bash

Inside the container, start the server:

tritonserver --model-repository=/models

...

I0315 03:54:57.814177 829 server.cc:549] 
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so       | {}     |
| tensorrt    | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so       | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
| python      | /opt/tritonserver/backends/python/libtriton_python.so           | {}     |
+-------------+-----------------------------------------------------------------+--------+

I0315 03:54:57.814214 829 server.cc:592] 
+---------------+---------+--------+
| Model         | Version | Status |
+---------------+---------+--------+
| cust_py       | 1       | READY  |
| densenet_onnx | 1       | READY  |
| fc_model_pth  | 1       | READY  |
| kfd_trt       | 1       | READY  |
+---------------+---------+--------+

I0315 03:54:57.814295 829 tritonserver.cc:1920] 
+----------------------------------+------------------------------------------------------+
| Option                           | Value                                                |
+----------------------------------+------------------------------------------------------+
| server_id                        | triton                                               |
| server_version                   | 2.15.0                                               |
| server_extensions                | classification sequence model_repository model_repos |
|                                  | itory(unload_dependents) schedule_policy model_confi |
|                                  | guration system_shared_memory cuda_shared_memory bin |
|                                  | ary_tensor_data statistics                           |
| model_repository_path[0]         | /models                                              |
| model_control_mode               | MODE_NONE                                            |
| strict_model_config              | 1                                                    |
| rate_limit                       | OFF                                                  |
| pinned_memory_pool_byte_size     | 268435456                                            |
| cuda_memory_pool_byte_size{0}    | 67108864                                             |
| response_cache_byte_size         | 0                                                    |
| min_supported_compute_capability | 6.0                                                  |
| strict_readiness                 | 1                                                    |
| exit_timeout                     | 30                                                   |
+----------------------------------+------------------------------------------------------+

I0315 03:54:57.815274 829 grpc_server.cc:4117] Started GRPCInferenceService at 0.0.0.0:8001
I0315 03:54:57.815463 829 http_server.cc:2815] Started HTTPService at 0.0.0.0:8000
I0315 03:54:57.856651 829 http_server.cc:167] Started Metrics Service at 0.0.0.0:8002
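
Once the server reports the three ports above, a quick sanity check is the KServe v2 health endpoints. A minimal sketch, assuming the container is reachable at 172.17.0.2 as in the examples below:

    import requests

    # Returns HTTP 200 once the server and all loaded models are ready
    print(requests.get("http://172.17.0.2:8000/v2/health/ready").status_code)
    # Per-model readiness, e.g. the TensorRT model registered below
    print(requests.get("http://172.17.0.2:8000/v2/models/kfd_trt/ready").status_code)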

  2. Model registration (parameter configuration is error-prone)

Create one directory per model under the model repository path on the host:

.
├── cust_py
│   ├── 1
│   │   ├── model.py
│   └── config.pbtxt
├── densenet_onnx
│   ├── 1
│   │   └── model.onnx
│   └── config.pbtxt
├── fc_model_pth
│   ├── 1
│   │   └── model.pt
│   └── config.pbtxt
└── kfd_trt
    ├── 1
    │   └── model.plan
    └── config.pbtxt

Each model directory contains a version subdirectory and a model configuration file. Taking the TensorRT model as an example:

    name: "kfd_trt"                 #模型名称
    platform: "tensorrt_plan"       #推理框架名称,不同的推理框架和platform名称对应关系如下
    max_batch_size : 0 
    input [
    {
        name: "input" 
        data_type: TYPE_FP32 
        dims: [1, 3, 224, 224 ] 
    }
    ]
    output [
    {
        name: "465" 
        data_type: TYPE_FP32
        dims: [1, 8]
    }
    ]

Framework-to-platform name mapping:

pytorch    -> pytorch_libtorch
tensorrt   -> tensorrt_plan
onnx       -> onnxruntime_onnx
tensorflow -> tensorflow_graphdef

If the mapping is wrong, the server fails to start:

E0315 03:36:33.731745 792 model_repository_manager.cc:1890] Poll failed for model directory 'densenet_onnx': unexpected platform type onnxruntime for densenet_onnx

...

error: creating server: Internal - failed to load all models
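
The error above comes from setting the platform to "onnxruntime" instead of "onnxruntime_onnx". A corrected config.pbtxt sketch for densenet_onnx might look like the following; the tensor names and dims are illustrative placeholders and must match the inputs and outputs actually exported in model.onnx:

    name: "densenet_onnx"
    platform: "onnxruntime_onnx"    # ONNX models use the onnxruntime_onnx platform
    max_batch_size: 0
    input [
    {
        name: "data_0"              # placeholder: use the input name from the ONNX model
        data_type: TYPE_FP32
        dims: [ 3, 224, 224 ]
    }
    ]
    output [
    {
        name: "fc6_1"               # placeholder: use the output name from the ONNX model
        data_type: TYPE_FP32
        dims: [ 1000, 1, 1 ]
    }
    ]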

Official reference for the configuration options:

https://github.com/triton-inference-server/server/blob/main/docs/README.md#metrics

  3. After Triton has loaded the models, the model configuration can be inspected via an HTTP request

    import requests

    model_name = "kfd_trt"   # any model shown as READY in the startup table
    config_url = "http://172.17.0.2:8000/v2/models/{}/config".format(model_name)
    res = requests.get(url=config_url)
    print("model {} config: {}".format(model_name, res.json()))

The response looks like this:

    {
    "name": "kfd_trt",
    "platform": "tensorrt_plan",
    "backend": "tensorrt",
    "version_policy": {
        "latest": {
        "num_versions": 1
        }
    },
    "max_batch_size": 0,
    "input": [
        {
        "name": "input",
        "data_type": "TYPE_FP32",
        "format": "FORMAT_NONE",
        "dims": [
            1,
            3,
            224,
            224
        ],
        "is_shape_tensor": false,
        "allow_ragged_batch": false
        }
    ],
    "output": [
        {
        "name": "465",
        "data_type": "TYPE_FP32",
        "dims": [
            1,
            8
        ],
        "label_filename": "",
        "is_shape_tensor": false
        }
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
        "priority": "PRIORITY_DEFAULT",
        "input_pinned_memory": {
        "enable": true
        },
        "output_pinned_memory": {
        "enable": true
        },
        "gather_kernel_buffer_threshold": 0,
        "eager_batching": false
    },
    "instance_group": [
        {
        "name": "kfd_trt",
        "kind": "KIND_GPU",
        "count": 1,
        "gpus": [
            0
        ],
        "secondary_devices": [],
        "profile": [],
        "passive": false,
        "host_policy": ""
        }
    ],
    "default_model_filename": "model.plan",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {},
    "model_warmup": []
    }

  4. Model access (request data types are error-prone)

Send an inference request to the Triton service over HTTP:

    import numpy as np
    import requests

    model_name = "kfd_trt"   # the TensorRT model registered above
    # np.float was removed from NumPy; use float32 to match the FP32 input type
    input_data = np.ones((1, 3, 224, 224), dtype=np.float32)
    request_data = {
    "inputs": [{
        "name": "input",
        "shape": [
            1,
            3,
            224,
            224
        ],
        "datatype": "FP32",
        "data": input_data.tolist()
    }],
    "outputs": [{"name": "465"}]
    }
    req_url = "http://172.17.0.2:8000/v2/models/{}/versions/1/infer".format(model_name)
    res = requests.post(url=req_url,json=request_data)
    print("inference result:",res.json())

Note how request_data is built. The model configuration queried in step 3 reports the input type as

"data_type": "TYPE_FP32"

but the inference request uses a different convention: the "datatype" field must drop the TYPE_ prefix and be given as "FP32". Otherwise the server rejects the request:

{'error': 'invalid datatype for input input'}
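
As an alternative to hand-building the JSON payload, the official tritonclient package (pip install tritonclient[http]) wraps the same HTTP API and takes the datatype string without the TYPE_ prefix directly. A minimal sketch, assuming the same kfd_trt model and server address as above:

    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="172.17.0.2:8000")
    # Declare the input tensor and fill it from a NumPy array
    inp = httpclient.InferInput("input", [1, 3, 224, 224], "FP32")
    inp.set_data_from_numpy(np.ones((1, 3, 224, 224), dtype=np.float32))
    # Request only the "465" output tensor
    out = httpclient.InferRequestedOutput("465")
    result = client.infer(model_name="kfd_trt", inputs=[inp], outputs=[out])
    print("inference result:", result.as_numpy("465"))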
