This guide targets a Triton-based deployment architecture; the model conversion pipeline is PyTorch (.pt) → ONNX (.onnx) → TensorRT (.engine).
The YOLOv7 version used is the official original:
GitHub - WongKinYiu/yolov7: Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
1. Preparation
1.1 Base environment (details omitted)
CUDA : 11.7
cudnn : 8.4
2. Download Anaconda
Anaconda | The World’s Most Popular Data Science Platform
2.1 Add conda channels:
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
Check the resulting configuration:
gedit ~/.condarc
2.2 Temporary pip mirrors:
Append -i https://mirrors.aliyun.com/pypi/simple directly to a pip command; note that pip may fail while a VPN is active.
-i https://mirrors.aliyun.com/pypi/simple
-i https://pypi.tuna.tsinghua.edu.cn/simple
3. Install PyTorch 1.13.1 + TensorRT 8.4.2.4
3.1 Install PyTorch and TensorRT
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install pandas requests opencv-python tqdm pyyaml matplotlib seaborn onnx -i https://mirrors.aliyun.com/pypi/simple
pip install nvidia-tensorrt==8.4.2.4 --index-url https://pypi.ngc.nvidia.com
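After installation, a quick sanity check helps confirm the packages resolve inside the conda environment. This is a minimal sketch (not part of the original post); it degrades gracefully when a package is missing:

```python
import importlib.util


def report_version(module_name):
    """Return the package's __version__ if it is importable, else a marker string."""
    if importlib.util.find_spec(module_name) is None:
        return "not installed"
    module = __import__(module_name)
    return getattr(module, "__version__", "unknown")


# Expected output in the environment built above: torch 1.13.1, tensorrt 8.4.2.4
for name in ("torch", "torchvision", "tensorrt", "onnx"):
    print(f"{name}: {report_version(name)}")
```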
4. Model format conversion
4.1 Export an ONNX model with NMS from YOLOv7 by following the tutorial at this link:
GitHub - Monday-Leo/YOLOv7_Tensorrt: A simple implementation of Tensorrt YOLOv7
Remember to adjust the parameters in export_onnx.py.
4.2 Convert the ONNX model to an engine with TensorRT
4.2.1 Download TensorRT
Download version: 8.4.2.4 (8.4 GA Update 1)
https://developer.nvidia.com/nvidia-tensorrt-8x-download
4.2.2 ONNX to engine
(Loading .trt or .plan models in Triton caused a memory leak for reasons unknown; .engine works fine.)
Method 1:
(Fails on an RTX 4080 with ERROR 10 + 2; works on an RTX 3060.)
Run in TensorRT-8.4.2.4/bin:
./trtexec --onnx=./yolov7.onnx --saveEngine=./yolov7_fp16.engine --fp16 --workspace=200
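The trtexec invocation above can also be scripted. The sketch below (an illustration, not from the original post) only assembles the argument list using the same flags as the manual command; uncomment the subprocess call to actually run it from TensorRT-8.4.2.4/bin:

```python
import subprocess


def build_trtexec_cmd(onnx_path, engine_path, fp16=True, workspace_mb=200):
    """Assemble the trtexec argument list (flags taken from the command above)."""
    cmd = [
        "./trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--workspace={workspace_mb}",
    ]
    if fp16:
        cmd.append("--fp16")
    return cmd


cmd = build_trtexec_cmd("./yolov7.onnx", "./yolov7_fp16.engine")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # run inside TensorRT-8.4.2.4/bin
```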
Method 2:
Export the .engine model using the code from the link below; remember to adjust the parameters in export.py.
https://github.com/Linaom1214/TensorRT-For-YOLO-Series/tree/main
Export the model from the terminal:
python ./export.py -o best.onnx -e best.engine -p fp16
At this point the model conversion is complete.
5. Install Triton Inference Server
Reference:
Deploying Torch and ONNX models with Triton with integrated data preprocessing — adam-liu's blog (CSDN)
5.1 Install Docker
sudo apt update
sudo apt-get install ca-certificates curl gnupg lsb-release
curl -fsSL http://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] http://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get install docker-ce docker-ce-cli containerd.io
systemctl start docker
apt-get -y install apt-transport-https ca-certificates curl software-properties-common
service docker restart
nvidia-docker
5.2 Pull the Triton server image (version 22.08) via Docker
docker pull nvcr.io/nvidia/tritonserver:22.08-py3
5.3 Create the engine model repository
(1) Create a models folder under the home directory;
(2) create a folder for each model inside it;
(3) create a folder named 1 inside that and put the engine model into it;
(4) create the configuration file config.pbtxt in the model folder.
Example layout:
models/
└── yolov7_firesmoke_1280/
    ├── 1/
    │   └── firesmoke_1280_dec.engine
    └── config.pbtxt
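The steps above can be scripted. A minimal sketch (the model name and engine filename are taken from the example config; adjust them to your own model):

```python
from pathlib import Path


def make_model_repo(root, model_name, engine_filename):
    """Create the Triton layout:
    <root>/<model_name>/1/<engine_filename> plus <root>/<model_name>/config.pbtxt.
    """
    model_dir = Path(root) / model_name
    (model_dir / "1").mkdir(parents=True, exist_ok=True)
    (model_dir / "1" / engine_filename).touch()   # replace with your real .engine file
    (model_dir / "config.pbtxt").touch()          # fill in as shown in the example config
    return model_dir


model_dir = make_model_repo("models", "yolov7_firesmoke_1280", "firesmoke_1280_dec.engine")
print(model_dir)
```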
Example config file (adjust name, default_model_filename, and the input dims; size instance_group according to GPU capacity and workload):
name: "yolov7_firesmoke_1280"
platform: "tensorrt_plan"
max_batch_size: 0
input [
  {
    name: "images"
    data_type: TYPE_FP32
    dims: [ 3, 1280, 1280 ]
  }
]
output [
  {
    name: "det_boxes"
    data_type: TYPE_FP32
    dims: [ 100, 4 ]
  },
  {
    name: "det_classes"
    data_type: TYPE_INT32
    dims: [ 100 ]
  },
  {
    name: "det_scores"
    data_type: TYPE_FP32
    dims: [ 100 ]
  },
  {
    name: "num_dets"
    data_type: TYPE_INT32
    dims: [ 1 ]
  }
]
default_model_filename: "firesmoke_1280_dec.engine"
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
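The client-side tensor must match the configured input dims exactly (with max_batch_size: 0, Triton treats dims as the complete shape; with max_batch_size > 0, the client prepends a batch dimension). A small helper to catch mismatches before sending a request — an illustrative sketch, not part of the original post:

```python
def check_input_shape(array_shape, config_dims, max_batch_size=0):
    """Compare a client tensor shape to an input's dims from config.pbtxt.

    With max_batch_size == 0 Triton treats dims as the full shape; otherwise a
    leading batch dimension (at most max_batch_size) is expected from the client.
    """
    shape = list(array_shape)
    if max_batch_size > 0:
        if len(shape) != len(config_dims) + 1 or shape[0] > max_batch_size:
            return False
        shape = shape[1:]
    return shape == list(config_dims)


# Shapes from the config above:
print(check_input_shape((3, 1280, 1280), [3, 1280, 1280], max_batch_size=0))
```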
5.4 Start Triton Server
5.4.1 Load models manually from the terminal (multi-model management)
sudo docker run --gpus=1 --rm --net=host -p8000:8000 -p8001:8001 -p8002:8002 -v /home/isst-robot/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models --strict-model-config=false --model-control-mode explicit
Here --model-control-mode explicit tells Triton not to load models automatically at startup; models are loaded manually instead. --strict-model-config=false additionally lets Triton fill in missing config fields.
(Using --gpus here seems to require installing an extra dependency, likely the NVIDIA Container Toolkit.)
5.4.2 Start individual models (/load loads a model, /unload unloads it)
curl -X POST http://localhost:8000/v2/repository/models/yolov7_helmat_1280/load
curl -X POST http://localhost:8000/v2/repository/models/yolov7_mask_1280/load
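The same load/unload endpoints can be hit from Python with only the standard library. A sketch assuming the server from 5.4.1 is reachable on localhost:8000; no request is sent until control_model is called:

```python
import urllib.request


def model_control_url(model_name, action, host="localhost", port=8000):
    """Build the Triton repository API URL for 'load' or 'unload'."""
    assert action in ("load", "unload")
    return f"http://{host}:{port}/v2/repository/models/{model_name}/{action}"


def control_model(model_name, action, host="localhost", port=8000):
    """POST to the load/unload endpoint; raises urllib.error.HTTPError on failure."""
    req = urllib.request.Request(
        model_control_url(model_name, action, host, port), data=b"", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


# control_model("yolov7_helmat_1280", "load")
```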
5.4.3 Write the client code
Dependencies:
pip3 install nvidia-pyindex
pip3 install tritonclient[all]
pip3 install gevent
The core function is shown below, using Python over HTTP. It takes image data; write the pre- and post-processing to suit your application. For preprocessing, the tensor sent to Triton must match the config file exactly, so a resize with padding is required.
import numpy as np
import tritonclient.http as httpclient


def models_infer(triton_client, model_name, confidence_coefficient, image, human=False,
                 output0='det_boxes', output1='det_classes', output2='det_scores', output3='num_dets',
                 request_compression_algorithm=None,
                 response_compression_algorithm=None):
    inputs = []
    outputs = []
    input_shape = [1280, 1280]
    # ZED stereo frame: only half of the image is needed
    if human:
        image = image[:, :image.shape[1] // 2]
    # resize_image: user-supplied resize-with-padding (letterbox) helper
    image_data, offset = resize_image(image, (input_shape[0], input_shape[1]))
    image_data = np.array(image_data)
    # Rearrange into the tensor layout (1, 3, 1280, 1280)
    img = np.expand_dims(image_data.astype(np.float32).transpose((2, 0, 1)), axis=0)
    # Create the inference input
    inputs.append(httpclient.InferInput('images', [1, 3, 1280, 1280], "FP32"))
    inputs[0].set_data_from_numpy(img)
    # output0..output3 are the output node names from config.pbtxt
    outputs.append(httpclient.InferRequestedOutput(output0, binary_data=False))
    outputs.append(httpclient.InferRequestedOutput(output1, binary_data=False))
    outputs.append(httpclient.InferRequestedOutput(output2, binary_data=False))
    outputs.append(httpclient.InferRequestedOutput(output3, binary_data=False))
    # Run inference
    results = triton_client.infer(
        model_name=model_name,
        inputs=inputs,
        outputs=outputs,
        request_compression_algorithm=request_compression_algorithm,
        response_compression_algorithm=response_compression_algorithm)
    # Convert to numpy: output0 = boxes, output1 = classes, output2 = scores, output3 = detection count
    output0 = results.as_numpy(output0)
    output1 = results.as_numpy(output1)
    output2 = results.as_numpy(output2)
    output3 = results.as_numpy(output3)
    # Post-processing business logic goes here (filter by confidence_coefficient,
    # map boxes back to the original image using `offset`, ...)
    # TODO:
    return output0, output1, output2, output3
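The resize_image helper called in the function above is not shown in the post. Below is a minimal letterbox sketch (an assumption about how it works: scale to fit while keeping aspect ratio, pad the rest, and return what is needed to map boxes back). It uses nearest-neighbour sampling via NumPy indexing so it has no OpenCV dependency; in practice cv2.resize plus padding is the usual choice:

```python
import numpy as np


def resize_image(image, target_size):
    """Letterbox resize: scale to fit target_size keeping aspect ratio, pad with grey.

    Returns the padded image and (scale, pad_x, pad_y) so detections can be
    mapped back to the original image. Sketch only; the original post's helper
    is not shown.
    """
    th, tw = target_size
    h, w = image.shape[:2]
    scale = min(tw / w, th / h)
    nw, nh = int(round(w * scale)), int(round(h * scale))
    # Nearest-neighbour sampling grid (swap in cv2.resize for production quality)
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = image[ys][:, xs]
    # Grey (114) padding, centred placement
    canvas = np.full((th, tw, image.shape[2]), 114, dtype=image.dtype)
    pad_x, pad_y = (tw - nw) // 2, (th - nh) // 2
    canvas[pad_y:pad_y + nh, pad_x:pad_x + nw] = resized
    return canvas, (scale, pad_x, pad_y)
```

To undo the transform in post-processing, subtract (pad_x, pad_y) from box coordinates and divide by scale.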
Run results:
References:
Deploying Torch and ONNX models with Triton with integrated data preprocessing — adam-liu's blog (CSDN)
ubuntu-desktop 20.04 YOLOX + TensorRT and YOLOv7 + TensorRT inference deployment — lyb_8888's blog (CSDN)