Developer Guide :: NVIDIA Deep Learning TensorRT Documentation
General workflow:
Step 1: build the TRT engine; step 2: run inference with that engine.
Building the TRT engine:
- The NetworkDefinition interface (C++, Python) is used to define the model:
  - from an ONNX model, via the ONNX parser;
  - from scratch, layer by layer.
- The BuilderConfig interface (C++, Python) is used to specify how TensorRT should optimize the model:
  - precision;
  - memory and runtime tradeoffs;
  - constraining the choice of CUDA kernels.
- Call the builder to build the engine.
- The builder creates the engine in a serialized form called a plan (see the build sketch after this list).
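
A minimal sketch of the build phase in C++, assuming TensorRT 8.5 or later; the file names `model.onnx` and `model.plan` are placeholders, and most error handling is omitted:

```cpp
#include <NvInfer.h>
#include <NvOnnxParser.h>

#include <fstream>
#include <iostream>
#include <memory>

using namespace nvinfer1;

// Minimal logger; the builder and runtime both require an ILogger.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // 1. Create the builder and an explicit-batch network definition.
    auto builder = std::unique_ptr<IBuilder>(createInferBuilder(gLogger));
    const auto flags = 1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto network = std::unique_ptr<INetworkDefinition>(builder->createNetworkV2(flags));

    // 2. Populate the network from an ONNX file (the "from onnx" path above).
    auto parser = std::unique_ptr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, gLogger));
    if (!parser->parseFromFile("model.onnx", static_cast<int>(ILogger::Severity::kWARNING)))
        return 1;

    // 3. BuilderConfig controls precision, memory/runtime tradeoffs, and
    //    (via e.g. setTacticSources) which CUDA kernels TensorRT may pick.
    auto config = std::unique_ptr<IBuilderConfig>(builder->createBuilderConfig());
    config->setFlag(BuilderFlag::kFP16);                                 // allow FP16 kernels
    config->setMemoryPoolLimit(MemoryPoolType::kWORKSPACE, 1ULL << 30);  // 1 GiB workspace cap

    // 4. Build the engine in serialized form (the "plan") and save it to disk.
    auto plan = std::unique_ptr<IHostMemory>(builder->buildSerializedNetwork(*network, *config));
    std::ofstream out("model.plan", std::ios::binary);
    out.write(static_cast<const char*>(plan->data()), plan->size());
    return 0;
}
```

Note that buildSerializedNetwork() produces the plan directly as host memory, so it can be written straight to disk without instantiating an engine in the build process.
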
Running the TRT engine:
- Deserialize a plan to create an engine.
- Create an execution context from the engine.
- Populate input buffers for inference.
- Call enqueueV3() on the execution context to run inference (see the runtime sketch below).
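
And a matching sketch of the runtime phase. The tensor names "input"/"output" and the MNIST-shaped buffer sizes are assumptions for illustration; real code should query them from the engine:

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>

#include <fstream>
#include <iostream>
#include <iterator>
#include <memory>
#include <vector>

using namespace nvinfer1;

// Same minimal logger as in the build sketch.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // 1. Read the serialized plan and deserialize it into an engine.
    std::ifstream in("model.plan", std::ios::binary);
    std::vector<char> plan((std::istreambuf_iterator<char>(in)), std::istreambuf_iterator<char>());
    auto runtime = std::unique_ptr<IRuntime>(createInferRuntime(gLogger));
    auto engine = std::unique_ptr<ICudaEngine>(runtime->deserializeCudaEngine(plan.data(), plan.size()));

    // 2. Create an execution context from the engine.
    auto context = std::unique_ptr<IExecutionContext>(engine->createExecutionContext());

    // 3. Populate input/output buffers. Sizes and tensor names assume an
    //    MNIST-style model (1x1x28x28 float in, 10 float scores out); query
    //    engine->getIOTensorName() / getTensorShape() for the real values.
    std::vector<float> hostInput(28 * 28, 0.f), hostOutput(10, 0.f);
    void *dInput = nullptr, *dOutput = nullptr;
    cudaMalloc(&dInput, hostInput.size() * sizeof(float));
    cudaMalloc(&dOutput, hostOutput.size() * sizeof(float));
    context->setTensorAddress("input", dInput);    // tensor names are assumptions
    context->setTensorAddress("output", dOutput);

    // 4. Copy input to the device, run inference with enqueueV3, copy back.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(dInput, hostInput.data(), hostInput.size() * sizeof(float),
                    cudaMemcpyHostToDevice, stream);
    context->enqueueV3(stream);  // asynchronous; results are ready after sync
    cudaMemcpyAsync(hostOutput.data(), dOutput, hostOutput.size() * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    cudaFree(dInput);
    cudaFree(dOutput);
    cudaStreamDestroy(stream);
    return 0;
}
```

Note that enqueueV3() resolves I/O by tensor name via setTensorAddress(), replacing the binding-index arrays used by the older enqueueV2() path.
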
C++ API:
TensorRT/samples/sampleOnnxMNIST at main · NVIDIA/TensorRT · GitHub
A hello-world example: converting an ONNX model to a TRT engine and running inference.
It essentially follows the steps above: include the relevant headers, then call the corresponding classes and functions through the API.