一、What ONNX and TensorRT are
ONNX
You can train your model in any framework of your choice and then convert it to ONNX format.
The huge benefit of having a common format is that the software or hardware that loads your model at run time only needs to be compatible with ONNX.
Models from different frameworks (PyTorch, TF, MXNet, etc.) are converted into one common format (ONNX), which makes them easy to load on different software and hardware platforms.
TensorRT
NVIDIA’s TensorRT is an SDK for high performance deep learning inference.
It provides APIs to do inference for pre-trained models and generates optimized runtime engines for your platform.
It accelerates model inference along several axes: numerical precision, GPU memory usage, and hardware-specific optimization.
二、Environment
Install PyTorch, ONNX, and OpenCV
Install TensorRT
Download and install NVIDIA CUDA 10.0 or later following the official instructions: link
Download and extract the cuDNN library for your CUDA version (login required): link
Download and extract the NVIDIA TensorRT library for your CUDA version (login required): link. The minimum required version is 6.0.1.5. Please follow the Installation Guide for your system, and don't forget to install the Python bindings
Add the absolute paths of the CUDA, TensorRT, and cuDNN libraries to the PATH or LD_LIBRARY_PATH environment variable
Install PyCUDA
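
Once everything above is installed, a quick sanity check saves debugging time later. The snippet below is a minimal sketch that just imports every package and prints versions; it assumes the TensorRT and PyCUDA Python bindings were installed as described above.

```python
# Minimal environment sanity check: every import must succeed and the GPU
# must be visible to PyTorch.
import torch
import onnx
import tensorrt as trt
import pycuda.driver as cuda  # noqa: F401
import pycuda.autoinit        # creates a CUDA context as a side effect

print("PyTorch :", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("ONNX    :", onnx.__version__)
print("TensorRT:", trt.__version__)
```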
三、Convert
1. Load and launch a pre-trained model using PyTorch
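
As a concrete starting point, here is a minimal sketch of this step. ResNet-50 from torchvision is an assumption made for illustration; any trained model works the same way.

```python
import torch
from torchvision import models

# Load a pretrained classifier and switch it to inference mode
# (eval() freezes batch-norm statistics and disables dropout).
model = models.resnet50(pretrained=True)
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
with torch.no_grad():
    out = model(dummy_input)
print(out.shape)  # torch.Size([1, 1000]) -- ImageNet class scores
```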
2. Convert the PyTorch model to ONNX format
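
A sketch of the export call, reusing `model` and `dummy_input` from step 1; the file name `resnet50.onnx` is an arbitrary choice.

```python
# torch.onnx.export traces the model with the sample input and writes the graph.
torch.onnx.export(
    model,              # model in eval mode
    dummy_input,        # sample input; its shapes are baked into the graph
    "resnet50.onnx",    # output path
    export_params=True,         # embed the trained weights in the file
    input_names=["input"],      # readable names for the graph I/O
    output_names=["output"],
)
```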
3. Visualize ONNX Model
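
One convenient option is Netron, which renders the graph in a browser; running the ONNX checker first is also worthwhile. The sketch below assumes `pip install netron` and the `resnet50.onnx` file from step 2.

```python
import onnx

# Structural validation: raises an exception if the exported graph is malformed.
onnx_model = onnx.load("resnet50.onnx")
onnx.checker.check_model(onnx_model)

# Interactive visualization of the graph (opens in a browser).
import netron
netron.start("resnet50.onnx")
```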
4. Initialize model in TensorRT
Now it's time to parse the ONNX model and initialize the TensorRT Context and Engine. To do that we need to create an instance of Builder. The Builder can create a Network and generate an Engine (optimized for your platform/hardware) from that network. When we create the Network we can define its structure with flags, but in our case it's enough to use the default flag, which means all tensors have an implicit batch dimension. With the Network definition in hand we can create an instance of the Parser and, finally, parse our ONNX file. A sketch of this sequence follows.
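
The sketch below follows that Builder -> Network -> Parser -> Engine sequence, written against the implicit-batch Python API of TensorRT 6/7 that this article targets (newer 8.x releases require the explicit-batch flag and replace `build_cuda_engine` with `build_serialized_network`).

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path):
    # The Builder creates the Network; the OnnxParser fills it from the ONNX
    # file; the Builder then generates an Engine optimized for this GPU.
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network() as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 30  # 1 GiB of scratch space for tactic search
        builder.max_batch_size = 1            # implicit batch dimension
        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                raise RuntimeError("failed to parse the ONNX file")
        return builder.build_cuda_engine(network)

engine = build_engine("resnet50.onnx")
```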
Tip: initialization can take a long time because TensorRT tries to find the best and fastest way to run your network on your platform. To do this only once and then reuse the already-built engine, you can serialize it. Serialized engines are not portable across different GPU models, platforms, or TensorRT versions; an engine is specific to the exact hardware and software it was built on.
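
A sketch of that caching pattern, reusing `engine` and `TRT_LOGGER` from above; the `.engine` file name is arbitrary.

```python
# First run: build once and cache the optimized engine to disk.
with open("resnet50.engine", "wb") as f:
    f.write(engine.serialize())

# Later runs: skip the slow build and deserialize instead.
runtime = trt.Runtime(TRT_LOGGER)
with open("resnet50.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
```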
5. Main pipeline
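
The pipeline itself is: copy the input to the GPU, execute the engine, copy the result back. Below is a minimal PyCUDA sketch; it assumes a network with exactly one input binding (index 0) and one output binding (index 1), as produced by the export above.

```python
import numpy as np
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 -- sets up the CUDA context

def infer(engine, image):
    # Host-side buffers; the output size is read from the engine itself.
    h_input = np.ascontiguousarray(image, dtype=np.float32)
    h_output = np.empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)

    # Device-side buffers and an asynchronous stream.
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    stream = cuda.Stream()

    with engine.create_execution_context() as context:
        cuda.memcpy_htod_async(d_input, h_input, stream)    # host -> device
        context.execute_async(batch_size=1,
                              bindings=[int(d_input), int(d_output)],
                              stream_handle=stream.handle)
        cuda.memcpy_dtoh_async(h_output, d_output, stream)  # device -> host
        stream.synchronize()
    return h_output

# Usage: preprocess an image to a (3, 224, 224) float32 array, then
# scores = infer(engine, preprocessed_image)
```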
References (worth working through carefully)
https://learnopencv.com/how-to-convert-a-model-from-pytorch-to-tensorrt-and-speed-up-inference/
https://www.cnblogs.com/mrlonely2018/p/14842107.html
https://learnopencv.com/how-to-run-inference-using-tensorrt-c-api/
https://blog.csdn.net/yanggg1997/article/details/111587687
This article showed how to convert a trained PyTorch model into the common ONNX format so it can be loaded on different platforms, and how NVIDIA's TensorRT is used for high-performance deep-learning inference through APIs that optimize a model for a specific platform. It walked through the main steps, from environment setup and model conversion to TensorRT initialization, noting that TensorRT's initialization can take a long time while it searches for the best hardware-specific optimizations. Finally, it summarized the overall workflow and listed related resources to help readers understand and practice model conversion and acceleration.

