Converting PyTorch Models to TensorRT to Speed Up Inference

This article describes how to convert a trained PyTorch model into the common ONNX format so that it can be loaded on different platforms. It then explains how NVIDIA's TensorRT is used for high-performance deep learning inference, providing APIs that optimize a model for a specific platform. The main steps are demonstrated, from environment setup and model conversion through TensorRT initialization, noting that TensorRT may need a long initialization time while it searches for the best hardware-specific optimizations. Finally, the overall workflow and related resources are given to help readers understand and practice model conversion and acceleration.

1. What ONNX and TensorRT Are

ONNX

You can train your model in any framework of your choice and then convert it to ONNX format.
The huge benefit of having a common format is that the software or hardware that loads your model at run time only needs to be compatible with ONNX.
Models from different frameworks (PyTorch, TensorFlow, MXNet, etc.) can all be converted to one common format (ONNX), which makes it easy to load them on different software and hardware platforms.

TensorRT

NVIDIA’s TensorRT is an SDK for high performance deep learning inference.
It provides APIs to do inference for pre-trained models and generates optimized runtime engines for your platform.
TensorRT speeds up model inference along several axes: numerical precision (e.g. FP16/INT8), GPU memory usage, and hardware-specific kernel selection.

2. Environment

Install PyTorch, ONNX, and OpenCV
Install TensorRT
Download and install NVIDIA CUDA 10.0 or later following the official instructions: link
Download and extract the cuDNN library for your CUDA version (login required): link
Download and extract the NVIDIA TensorRT library for your CUDA version (login required): link. The minimum required version is 6.0.1.5. Please follow the Installation Guide for your system, and don't forget to install the Python bindings
Add the absolute path to CUDA, TensorRT, CuDNN libs to the environment variable PATH or LD_LIBRARY_PATH
Install PyCUDA
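
On Linux, the library-path and Python steps above might look like the following. The install locations below are assumptions for illustration; substitute the paths where you actually extracted each library, and note that the TensorRT wheel filename depends on your downloaded version:

```shell
# Assumed install locations -- adjust to your system
export CUDA_HOME=/usr/local/cuda-10.0
export TENSORRT_HOME=/opt/TensorRT   # path where the TensorRT tarball was extracted

# Make the CUDA, cuDNN, and TensorRT shared libraries discoverable
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$TENSORRT_HOME/lib:$LD_LIBRARY_PATH"

# Install the TensorRT Python bindings shipped inside the tarball, then PyCUDA
pip install "$TENSORRT_HOME"/python/tensorrt-*.whl
pip install pycuda
```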

3. Convert

1. Load and launch a pre-trained model using PyTorch

2. Convert the PyTorch model to ONNX format

3. Visualize ONNX Model

4. Initialize model in TensorRT

Now it's time to parse the ONNX model and initialize the TensorRT Context and Engine. To do this we need to create an instance of Builder. The builder can create a Network and generate an Engine (optimized for your platform/hardware) from that network. When we create the Network we can define its structure via flags, but in our case it's enough to use the default flag, which means all tensors have an implicit batch dimension. With the Network definition we can create an instance of Parser and, finally, parse our ONNX file.
Tips: Initialization can take a long time because TensorRT tries to find the best and fastest way to run your network on your platform. To do this only once and then reuse the already-built engine, you can serialize it. Note that serialized engines are not portable across different GPU models, platforms, or TensorRT versions; an engine is specific to the exact hardware and software it was built on.
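
A sketch of that initialization, written against the TensorRT 6/7-era Python API this article assumes (the implicit-batch builder attributes and `build_cuda_engine` below are deprecated in newer TensorRT releases, which instead require the EXPLICIT_BATCH network flag and a builder config). Running it requires an NVIDIA GPU with TensorRT installed:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path):
    """Parse an ONNX file and build a platform-optimized TensorRT engine."""
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 30  # up to 1 GiB of scratch memory
        builder.max_batch_size = 1
        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        # This call is the slow part: TensorRT profiles kernels for your GPU
        return builder.build_cuda_engine(network)

engine = build_engine("resnet50.onnx")  # file from the export step
context = engine.create_execution_context()

# Serialize once so later runs can skip the optimization search entirely
with open("resnet50.engine", "wb") as f:
    f.write(engine.serialize())
```

On later runs, deserialize `resnet50.engine` with `trt.Runtime` instead of rebuilding; as noted above, the serialized file is tied to the GPU model and TensorRT version that produced it.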

5. Main pipeline
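
The main pipeline then copies the input to the GPU, runs the engine, and copies the result back. A sketch using PyCUDA, matching the implicit-batch API above; the binding indices 0/1 assume a single-input, single-output network, and the engine path is the file serialized in the previous step:

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- importing this creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def infer(engine_path, input_array):
    """Run one inference pass with a serialized engine (one input, one output)."""
    with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())

    # Page-locked host buffers and matching device buffers for each binding
    h_input = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)),
                                    dtype=np.float32)
    h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)),
                                     dtype=np.float32)
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()

    np.copyto(h_input, input_array.ravel())
    with engine.create_execution_context() as context:
        cuda.memcpy_htod_async(d_input, h_input, stream)    # host -> device
        context.execute_async(bindings=[int(d_input), int(d_output)],
                              stream_handle=stream.handle)  # run the network
        cuda.memcpy_dtoh_async(h_output, d_output, stream)  # device -> host
        stream.synchronize()                                # wait for completion
    return h_output

# preds = infer("resnet50.engine", preprocessed_image)  # requires an NVIDIA GPU
```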

References (worth digging into)

https://learnopencv.com/how-to-convert-a-model-from-pytorch-to-tensorrt-and-speed-up-inference/
https://www.cnblogs.com/mrlonely2018/p/14842107.html
https://learnopencv.com/how-to-run-inference-using-tensorrt-c-api/
https://blog.csdn.net/yanggg1997/article/details/111587687
