Using MMDeploy Prebuilt Packages on Windows

Since MMDeploy reached v1, it has become much easier to install and use. This tutorial takes deploying the Faster R-CNN model from the MMDetection project as an example: the PyTorch model is converted to ONNX, then to a TensorRT engine, and deployed on the TensorRT backend for efficient inference. It is based mainly on the official documentation.

Note: at the time of writing, the MMDeploy version is v1.2.0.

Local environment
  • Windows 11

  • PowerShell 7

  • Visual Studio 2019

  • CUDA: 11.7

  • cuDNN: 8.6

  • Python: 3.8

  • PyTorch: 1.13.1

  • TensorRT: v8.5.3.1

  • mmdeploy: v1.2.0

  • mmdet: v3.0.0

1. Prepare the environment

There are plenty of online tutorials for each of these steps, so they are only outlined here.

  • Install Visual Studio 2019 and select the "Desktop development with C++" workload; be sure to include the Windows 10 SDK. VS2022 does not appear to be supported yet.

  • Install CUDA & cuDNN

    • Pay attention to the CUDA/cuDNN version compatibility matrix
    • Install VS2019 first; otherwise the Visual Studio Integration component will fail to install and you will hit errors later
    • The default installation options are fine; if you customize them, be sure to keep Visual Studio Integration checked
  • Anaconda3/MiniConda3

    After installation, create and activate an environment:

    conda create -n faster-rcnn-deploy python=3.8 -y
    conda activate faster-rcnn-deploy
    
  • Install the GPU build of PyTorch (a quick verification snippet follows this list):

    pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
    
  • Install OpenCV-Python:

    pip install opencv-python
    
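To confirm that the GPU build of PyTorch was installed correctly, a minimal sanity check:

import torch
print(torch.__version__)          # expect 1.13.1+cu117
print(torch.cuda.is_available())  # expect True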

2. Install TensorRT

Download it from the NVIDIA website (login required); here is the exact link I used:

https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/8.5.3/zip/TensorRT-8.5.3.1.Windows10.x86_64.cuda-11.8.cudnn8.6.zip

After downloading, extract the archive and enter the extracted folder:

  • Create a user/system environment variable TENSORRT_DIR whose value is this directory (see the PowerShell sketch after this list)

  • Restart PowerShell and activate the environment; the TensorRT installation directory is then available as $env:TENSORRT_DIR

  • Add $env:TENSORRT_DIR\lib to PATH

  • Restart PowerShell and activate the environment again

  • Install the wheel matching your Python version:

    pip install $env:TENSORRT_DIR\python\tensorrt-8.5.3.1-cp38-none-win_amd64.whl
    
  • Install pycuda:

    pip install pycuda
    
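To set the variables from the shell instead of the System Properties dialog, the following PowerShell commands are one option. This is a sketch: the path D:\TensorRT-8.5.3.1 is an assumption, substitute your own extraction directory.

# persist TENSORRT_DIR and extend the user PATH (assumed extraction path)
[Environment]::SetEnvironmentVariable("TENSORRT_DIR", "D:\TensorRT-8.5.3.1", "User")
$userPath = [Environment]::GetEnvironmentVariable("Path", "User")
[Environment]::SetEnvironmentVariable("Path", "$userPath;D:\TensorRT-8.5.3.1\lib", "User")

After restarting PowerShell, a quick import confirms the wheel is usable:

import tensorrt as trt
print(trt.__version__)  # expect 8.5.3.1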

3. Install mmdeploy and the runtime

  • mmdeploy: the model conversion API

  • mmdeploy-runtime: the model inference (SDK) API

    pip install mmdeploy==1.2.0
    pip install mmdeploy-runtime-gpu==1.2.0
    
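A quick sanity check that both packages import cleanly (a minimal sketch):

import mmdeploy
from mmdeploy_runtime import Detector  # the SDK inference API used later
print(mmdeploy.__version__)  # expect 1.2.0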

4. Clone the MMDeploy repository

Create a new folder; all of the repositories and files below live in this directory.

The mmdeploy repository is cloned mainly for the config files it contains:

git clone -b main https://github.com/open-mmlab/mmdeploy.git

5. Install MMDetection

MMCV must be installed first:

pip install -U openmim
mim install "mmcv>=2.0.0rc2"

Then clone mmdet and install it from source:

git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
git checkout v3.0.0
pip install -v -e .
cd ..
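
Optionally verify the installed versions (a minimal check):

import mmcv
import mmdet
print(mmcv.__version__)   # a 2.x release
print(mmdet.__version__)  # expect 3.0.0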

6. Run the conversion

The directory layout is as follows:

./faster-rcnn-deploy/
├── app.py
├── checkpoints
├── convert.py
├── infer.py
├── mmdeploy
├── mmdeploy_model
├── mmdetection
├── output_detection.png
└── tmp.py

  • Deployment config: mmdeploy/configs/mmdet/detection/detection_tensorrt-fp16_dynamic-320x320-1344x1344.py (TensorRT backend, fp16 precision, dynamic input shapes from 320×320 up to 1344×1344)

  • Model config: mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py

  • Model checkpoint: checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth. These are weights pretrained by OpenMMLab; paste the URL below into a browser, or download it with wget for Windows:

    wget -P checkpoints https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth
    
  • Test image: mmdetection/demo/demo.jpg

  • Output directory: mmdeploy_model/faster-rcnn-deploy-fp16

The contents of convert.py:

from mmdeploy.apis import torch2onnx
from mmdeploy.apis.tensorrt import onnx2tensorrt
from mmdeploy.backend.sdk.export_info import export2SDK
import os

img = "mmdetection/demo/demo.jpg"
work_dir = "mmdeploy_model/faster-rcnn-deploy-fp16"
save_file = "end2end.onnx"
deploy_cfg = "mmdeploy/configs/mmdet/detection/detection_tensorrt-fp16_dynamic-320x320-1344x1344.py"
model_cfg = "mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py"
model_checkpoint = "checkpoints/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth"
device = "cuda"

# 1. convert model to IR(onnx)
torch2onnx(img, work_dir, save_file, deploy_cfg, model_cfg, model_checkpoint, device)

# 2. convert IR to tensorrt
onnx_model = os.path.join(work_dir, save_file)
save_file = "end2end.engine"
model_id = 0
device = "cuda"
onnx2tensorrt(work_dir, save_file, model_id, deploy_cfg, onnx_model, device)

# 3. extract pipeline info for sdk use (dump-info)
export2SDK(deploy_cfg, model_cfg, work_dir, pth=model_checkpoint, device=device)

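Run the script; building the TensorRT engine can take a few minutes:

python convert.py

Afterwards the work directory should contain end2end.onnx, end2end.engine, and the SDK pipeline files (deploy.json, pipeline.json, detail.json) written by export2SDK.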

Output (final line of the engine build log):

[08/30/2023-17:36:13] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +84, GPU +109, now: CPU 84, GPU 109 (MiB)

7. Inference test

The contents of infer.py:

from mmdeploy.apis import inference_model

deploy_cfg = "mmdeploy/configs/mmdet/detection/detection_tensorrt-fp16_dynamic-320x320-1344x1344.py"
model_cfg = "mmdetection/configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py"
backend_files = ["mmdeploy_model/faster-rcnn-deploy-fp16/end2end.engine"]
img = "mmdetection/demo/demo.jpg"
device = "cuda"
result = inference_model(model_cfg, deploy_cfg, backend_files, img, device)

print(result)

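Run it with:

python infer.py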

Output:

08/30 17:42:43 - mmengine - INFO - Successfully loaded tensorrt plugins from F:\miniconda3\envs\faster-rcnn-deploy\lib\site-packages\mmdeploy\lib\mmdeploy_tensorrt_ops.dll
08/30 17:42:43 - mmengine - INFO - Successfully loaded tensorrt plugins from F:\miniconda3\envs\faster-rcnn-deploy\lib\site-packages\mmdeploy\lib\mmdeploy_tensorrt_ops.dll
...
...

inference_model reloads the model on every call, which is very inefficient; it is only meant for verifying that the converted model works and should not be used in production. For efficient use, integrate the Detector into your own application so the model is loaded once and used for many inferences, as shown below.

8. Integrate the detector into your own application

The contents of app.py:

from mmdeploy_runtime import Detector
import cv2

# read the test image
img = cv2.imread("mmdetection/demo/demo.jpg")

# create the detector; the converted model is loaded once here
detector = Detector(
    model_path="mmdeploy_model/faster-rcnn-deploy-fp16",
    device_name="cuda",
    device_id=0,
)
# run inference: bboxes is an (N, 5) array of [left, top, right, bottom, score]
bboxes, labels, _ = detector(img)
# filter results by score threshold and draw the boxes on the image
for bbox, label_id in zip(bboxes, labels):
    [left, top, right, bottom], score = bbox[0:4].astype(int), bbox[4]
    if score < 0.3:
        continue
    cv2.rectangle(img, (left, top), (right, bottom), (0, 255, 0))

cv2.imwrite("output_detection.png", img)

Run it with python app.py; the annotated image is written to output_detection.png. Through this API, a trained deep learning model can be seamlessly integrated into a web backend: load the model once, then serve many inference requests.
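
As an illustration, here is a minimal sketch of wrapping the Detector in a web service, assuming Flask is installed (pip install flask); the /detect route, port 8000, and the 0.3 threshold are illustrative choices, not part of the original tutorial:

from flask import Flask, request, jsonify
from mmdeploy_runtime import Detector
import numpy as np
import cv2

app = Flask(__name__)

# load the converted model once at startup
detector = Detector(
    model_path="mmdeploy_model/faster-rcnn-deploy-fp16",
    device_name="cuda",
    device_id=0,
)

@app.route("/detect", methods=["POST"])
def detect():
    # decode the raw image bytes from the request body
    buf = np.frombuffer(request.data, dtype=np.uint8)
    img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    if img is None:
        return jsonify({"error": "could not decode image"}), 400
    bboxes, labels, _ = detector(img)
    # keep detections above the (illustrative) 0.3 score threshold
    results = [
        {"bbox": bbox[:4].tolist(), "score": float(bbox[4]), "label": int(label)}
        for bbox, label in zip(bboxes, labels)
        if bbox[4] >= 0.3
    ]
    return jsonify(results)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)

A client can then post an image, for example: curl -H "Content-Type: application/octet-stream" --data-binary @demo.jpg http://localhost:8000/detect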

Original image: mmdetection/demo/demo.jpg

After detection: output_detection.png
