TensorRT supports two ways to deploy a network:
a. Import the network model with a parser (e.g. the ONNX parser)
b. Define and build the network layer by layer with the TensorRT API
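Approach b can be sketched as follows. This is a minimal, hedged example, not a full model: it builds a single-ReLU network purely through the TensorRT API. The tensor names ("input", "output") and the input shape are illustrative assumptions, and it reuses a `logger` object like the one defined in the conversion code later in these notes.

```cpp
#include <NvInfer.h>
#include <memory>

// assumes a `logger` implementing nvinfer1::ILogger, as in the ONNX-conversion code below
auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));
uint32_t flags = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(flags));

// declare the input tensor explicitly: name, dtype, shape (all chosen for illustration)
nvinfer1::ITensor* input = network->addInput("input", nvinfer1::DataType::kFLOAT,
                                             nvinfer1::Dims4{1, 3, 224, 224});

// add layers one by one; here a single ReLU stands in for a real model
nvinfer1::IActivationLayer* relu = network->addActivation(*input, nvinfer1::ActivationType::kRELU);

// name and mark the network output
relu->getOutput(0)->setName("output");
network->markOutput(*relu->getOutput(0));

// from here the build proceeds exactly as in the ONNX path:
// createBuilderConfig() → buildSerializedNetwork() → deserializeCudaEngine()
auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());
auto serialized = std::unique_ptr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));
```

A real model would add convolution, scale, etc. layers with their trained weights instead of the placeholder ReLU; the control flow stays the same.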
Model → ONNX → TRT Parser → TRT Builder → TRT Engine
a. The ONNX file contains both the network structure and the parameters; it can be visualized at https://netron.app/
b. Steps: convert the trained model to ONNX, feed the ONNX file to TensorRT, specify the optimization parameters, build the engine to obtain the TensorRT engine, then run inference
Converting ONNX to a TensorRT engine (code)
Create an explicit-batch network → parse the ONNX file with the parser → serialize the network with the optimization parameters specified in the config → deserialize with the runtime to obtain the engine
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <iostream>
#include <memory>
#include <string>

// logger required by the builder, parser, and runtime
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        // suppress info-level messages
        if (severity <= Severity::kWARNING)
            std::cout << msg << std::endl;
    }
} logger;

std::string onnx_filename = "model.onnx";  // path to the exported ONNX model
std::shared_ptr<nvinfer1::ICudaEngine> engine;

auto builder = std::unique_ptr<nvinfer1::IBuilder>(nvinfer1::createInferBuilder(logger));

// network definition: the ONNX parser requires an explicit-batch network
uint32_t explicitBatch = 1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
auto network = std::unique_ptr<nvinfer1::INetworkDefinition>(builder->createNetworkV2(explicitBatch));

// parser to parse the ONNX model
auto parser = std::unique_ptr<nvonnxparser::IParser>(nvonnxparser::createParser(*network, logger));

// import the ONNX model; check the result rather than silently continuing
if (!parser->parseFromFile(onnx_filename.c_str(), static_cast<int32_t>(nvinfer1::ILogger::Severity::kWARNING)))
    std::cerr << "failed to parse " << onnx_filename << std::endl;

// build engine
auto config = std::unique_ptr<nvinfer1::IBuilderConfig>(builder->createBuilderConfig());  // optimization config
auto serializedModel = std::unique_ptr<nvinfer1::IHostMemory>(builder->buildSerializedNetwork(*network, *config));

// deserialize
auto runtime = std::unique_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(logger));

// load engine
engine = std::shared_ptr<nvinfer1::ICudaEngine>(runtime->deserializeCudaEngine(serializedModel->data(), serializedModel->size()));
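The notes above end at the deserialized engine; the inference step they mention can be sketched as below. This is a minimal synchronous example, assuming the `engine` from the code above, that the input is binding 0 and the output is binding 1, and that the shapes (1×3×224×224 in, 1000 floats out) match the model; real code should query the binding indices and sizes from the engine instead of hard-coding them.

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <memory>
#include <vector>

// execution context holds per-inference state for the engine built above
auto context = std::unique_ptr<nvinfer1::IExecutionContext>(engine->createExecutionContext());

// device buffers, one per binding; sizes here are assumptions for illustration
size_t inputBytes  = 1 * 3 * 224 * 224 * sizeof(float);
size_t outputBytes = 1000 * sizeof(float);
void* buffers[2];
cudaMalloc(&buffers[0], inputBytes);   // assumed: binding 0 = input
cudaMalloc(&buffers[1], outputBytes);  // assumed: binding 1 = output

std::vector<float> hostInput(1 * 3 * 224 * 224, 0.f);  // preprocessed image goes here
std::vector<float> hostOutput(1000);

// host → device, synchronous inference, device → host
cudaMemcpy(buffers[0], hostInput.data(), inputBytes, cudaMemcpyHostToDevice);
context->executeV2(buffers);
cudaMemcpy(hostOutput.data(), buffers[1], outputBytes, cudaMemcpyDeviceToHost);

cudaFree(buffers[0]);
cudaFree(buffers[1]);
```

For throughput, the asynchronous `enqueueV2` (or `enqueueV3` in newer TensorRT versions) with a CUDA stream is normally preferred over the blocking `executeV2` shown here.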