TensorRT: Loading a Model Online and Serializing an Engine that Supports Dynamic Batch Sizes

First, the essential steps:

1. Create a builder
2. Create a network
3. Create a parser
4. Build the engine
5. Create the context

The difference from the fixed-batch case is that when building the engine, an OptimizationProfile must be set on the config.

1. The function that serializes and saves the engine

#include <iostream>
#include "NvInfer.h"
#include "NvOnnxParser.h"
#include "logging.h"
#include "opencv2/opencv.hpp"
#include <fstream>
#include <sstream>
#include "cuda_runtime_api.h"
static Logger gLogger;
using namespace nvinfer1;
 
 
 
 
bool saveEngine(const ICudaEngine& engine, const std::string& fileName)
{
    std::ofstream engineFile(fileName, std::ios::binary);
    if (!engineFile)
    {
        std::cout << "Cannot open engine file: " << fileName << std::endl;
        return false;
    }
 
    IHostMemory* serializedEngine = engine.serialize();
    if (serializedEngine == nullptr)
    {
        std::cout << "Engine serialization failed" << std::endl;
        return false;
    }
 
    engineFile.write(static_cast<char*>(serializedEngine->data()), serializedEngine->size());
    serializedEngine->destroy(); // release the serialized blob once it has been written out
    return !engineFile.fail();
}

2. Load the ONNX model and build a dynamic TensorRT engine

    // 1. Create a builder
    IBuilder* pBuilder = createInferBuilder(gLogger); 
    // 2. Create a network; the explicit-batch flag requires that the network has no implicit batch dimension
    INetworkDefinition* pNetwork = pBuilder->createNetworkV2(1U << static_cast<int>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
    
    // 3. Create a builder config
    nvinfer1::IBuilderConfig* config = pBuilder->createBuilderConfig();
    // 4. Set up the profile; this part is specific to dynamic batch
    IOptimizationProfile* profile = pBuilder->createOptimizationProfile();
    // OptProfileSelector chooses which bound is being set for an optimization parameter (e.g. a tensor's shape or a dynamic dimension)
 
    profile->setDimensions("input_1", OptProfileSelector::kMIN, Dims4(1, 1, 112, 112));
    profile->setDimensions("input_1", OptProfileSelector::kOPT, Dims4(2, 1, 112, 112));
    profile->setDimensions("input_1", OptProfileSelector::kMAX, Dims4(4, 1, 112, 112));
 
    config->addOptimizationProfile(profile);
 
    auto parser = nvonnxparser::createParser(*pNetwork, gLogger.getTRTLogger());
 
    const char* pchModelPth = "./res_hjxu_temp_dynamic.onnx";
 
    if (!parser->parseFromFile(pchModelPth, static_cast<int>(gLogger.getReportableSeverity())))
    {
        printf("Failed to parse the ONNX model\n");
    }
 
    // With an explicit-batch network, the batch range comes from the optimization
    // profile, so IBuilder::setMaxBatchSize is not needed; workspace size and
    // precision are set on the config instead of the builder.
    config->setMaxWorkspaceSize(1 << 30);
    // Set the inference precision to FP16
    config->setFlag(BuilderFlag::kFP16);
    ICudaEngine* engine = pBuilder->buildEngineWithConfig(*pNetwork, *config);
 
    std::string strTrtSavedPath = "./res_hjxu_temp_dynamic.trt";
    // Serialize and save the engine
    saveEngine(*engine, strTrtSavedPath);
    nvinfer1::Dims dim = engine->getBindingDimensions(0);
    // Print the input binding's dimensions
    print_dims(dim);

The print_dims helper prints the dimensions:

void print_dims(const nvinfer1::Dims& dim)
{
    for (int nIdxShape = 0; nIdxShape < dim.nbDims; ++nIdxShape)
    {
        
        printf("dim %d=%d\n", nIdxShape, dim.d[nIdxShape]);
        
    }
}
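
For the engine built above, only the batch axis is dynamic, and engine-level binding dimensions report a dynamic axis as -1 (it only becomes concrete once set on the execution context). The call would therefore be expected to print:

dim 0=-1
dim 1=1
dim 2=112
dim 3=112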

3. The IOptimizationProfile class and the OptProfileSelector enum

Compared with building a fixed-batch engine, the difference is that an IOptimizationProfile is added when setting up the build config. Setting the dimensions takes the following calls:

IOptimizationProfile* profile = pBuilder->createOptimizationProfile();
profile->setDimensions("input_1", OptProfileSelector::kMIN, Dims4(1, 1, 112, 112));
profile->setDimensions("input_1", OptProfileSelector::kOPT, Dims4(2, 1, 112, 112));
profile->setDimensions("input_1", OptProfileSelector::kMAX, Dims4(4, 1, 112, 112));

The IOptimizationProfile class looks like this:

class IOptimizationProfile
{
public:
    //! Set the minimum / optimum / maximum dimensions for a dynamic input tensor.
    //! This function must be called three times (for the minimum, optimum, and maximum)
    //! for every dynamic input tensor of the network, and the following must hold:
    //! (1) minDims.nbDims == optDims.nbDims == maxDims.nbDims == networkDims.nbDims
    //! (2) 0 <= minDims.d[i] <= optDims.d[i] <= maxDims.d[i] for i = 0, ..., networkDims.nbDims-1
    //! (3) if networkDims.d[i] != -1, then minDims.d[i] == optDims.d[i] == maxDims.d[i] == networkDims.d[i]
    //! If DLA is selected, all three values must be identical.
    //!
    virtual bool setDimensions(const char* inputName, OptProfileSelector select, Dims dims) noexcept = 0;
    //! \brief Get the minimum / optimum / maximum dimensions for a dynamic input tensor.
    virtual Dims getDimensions(const char* inputName, OptProfileSelector select) const noexcept = 0;
    //! \brief Set the minimum / optimum / maximum values for an input shape tensor.
    virtual bool setShapeValues(
        const char* inputName, OptProfileSelector select, const int32_t* values, int32_t nbValues) noexcept
        = 0;
    //! \brief Get the number of values for an input shape tensor.
    virtual int32_t getNbShapeValues(const char* inputName) const noexcept = 0;
    //! \brief Get the minimum / optimum / maximum values for an input shape tensor.
    virtual const int32_t* getShapeValues(const char* inputName, OptProfileSelector select) const noexcept = 0;
    //! \brief Set a target for extra GPU memory that may be used by this profile.
    //! \return true if the input is in the valid range (between 0 and 1 inclusive), else false
    //!
    virtual bool setExtraMemoryTarget(float target) noexcept = 0;
    //! \brief Get the extra memory target that has been defined for this profile.
    virtual float getExtraMemoryTarget() const noexcept = 0;
    //! \brief Check whether the optimization profile can be passed to an IBuilderConfig object.
    //! \return true if the optimization profile is valid and may be passed to an IBuilderConfig, else false
    virtual bool isValid() const noexcept = 0;
 
protected:
    ~IOptimizationProfile() noexcept = default;
};
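
The shape-value setters above work the same way but constrain the values of an input shape tensor rather than its dimensions. A minimal sketch, assuming a hypothetical input shape tensor named "shape_in" (the model in this post has none):

// Sketch: min / opt / max *values* for an input shape tensor, not its dims
const int32_t minVals[] = {1, 1, 56, 56};
const int32_t optVals[] = {1, 1, 112, 112};
const int32_t maxVals[] = {1, 1, 224, 224};
profile->setShapeValues("shape_in", OptProfileSelector::kMIN, minVals, 4);
profile->setShapeValues("shape_in", OptProfileSelector::kOPT, optVals, 4);
profile->setShapeValues("shape_in", OptProfileSelector::kMAX, maxVals, 4);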

The OptProfileSelector enum
This enum has just three values: minimum, optimum, and maximum.

// kMIN and kMAX give the smallest and largest shapes permitted at runtime.
// kOPT is the value used for kernel selection, so it should be the shape you
// expect most often at runtime. For example, if inference usually receives two
// streams of data, the batch is normally 2 and batches of 1 or 4 are rare;
// in that case choose 2 for kOPT and 1 for kMIN, and if at most four streams
// can arrive, choose 4 for kMAX.

enum class OptProfileSelector : int32_t
{
    kMIN = 0, //!< This is used to set or get the minimum permitted value for dynamic dimensions etc.
    kOPT = 1, //!< This is used to set or get the value that is used in the optimization (kernel selection).
    kMAX = 2  //!< This is used to set or get the maximum permitted value for dynamic dimensions etc.
};
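
A single engine can also carry several optimization profiles covering different shape ranges. A hedged sketch, reusing pBuilder and config from section 2 (the second profile and its ranges are illustrative, not from the original post):

// Profile 0 is the one added above; profile 1 targets larger batches, tuned for batch 8
IOptimizationProfile* profileLarge = pBuilder->createOptimizationProfile();
profileLarge->setDimensions("input_1", OptProfileSelector::kMIN, Dims4(2, 1, 112, 112));
profileLarge->setDimensions("input_1", OptProfileSelector::kOPT, Dims4(8, 1, 112, 112));
profileLarge->setDimensions("input_1", OptProfileSelector::kMAX, Dims4(16, 1, 112, 112));
config->addOptimizationProfile(profileLarge);

// At runtime, select the profile before setting binding dimensions. Each profile
// gets its own copy of the bindings, so with two profiles the input of profile 1
// is binding 2 (= 1 * nbBindingsPerProfile + 0), not binding 0:
// context->setOptimizationProfile(1);
// context->setBindingDimensions(2, Dims4(8, 1, 112, 112));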

 

4. Deserialize the dynamic-batch engine, create a dynamic context, and run dynamic inference

Define a function to load the engine; this is the same as in the fixed-batch case.

ICudaEngine* loadEngine(const std::string& engine, int DLACore)
{
    std::ifstream engineFile(engine, std::ios::binary);
    if (!engineFile)
    {
        std::cout << "Error opening engine file: " << engine << std::endl;
        return nullptr;
    }
 
    engineFile.seekg(0, engineFile.end);
    long int fsize = engineFile.tellg();
    engineFile.seekg(0, engineFile.beg);
 
    std::vector<char> engineData(fsize);
    engineFile.read(engineData.data(), fsize);
    if (!engineFile)
    {
        std::cout << "Error loading engine file: " << engine << std::endl;
        return nullptr;
    }
 
    IRuntime* runtime = createInferRuntime(gLogger);
    if (DLACore != -1)
    {
        runtime->setDLACore(DLACore);
    }
 
    // The third parameter is the deprecated IPluginFactory pointer; pass nullptr when no legacy plugins are used
    return runtime->deserializeCudaEngine(engineData.data(), fsize, nullptr);
}

Inference then follows these steps:

----  Create the engine
----  Create the context
----  Allocate device memory on the GPU for the maximum batch; doing this once up front avoids re-allocating on every inference and wasting time
----  Copy the dynamic tensor's data into that memory
----  Set the dynamic dimensions by calling context->setBindingDimensions() (sketched right after this list)
----  Run inference through the context
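
A minimal sketch of that dimension-setting step, assuming the engine and context created in test_engine below; every dynamic input must be given concrete dimensions before execution, which allInputDimensionsSpecified() can confirm:

// Fix the input shape for this inference (batch = 2 here), then verify
context->setBindingDimensions(0, Dims4(2, 1, 112, 112));
if (!context->allInputDimensionsSpecified())
{
    printf("some dynamic input dimensions are still unset\n");
}
// The output binding now also reports concrete dims for this batch size
nvinfer1::Dims outDims = context->getBindingDimensions(1);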

Check the output:

void test_engine()
{
    std::string strTrtSavedPath = "./res_hjxu_temp_dynamic.trt";
    int maxBatchSize = 4;
    // 1. Deserialize and load the engine
    ICudaEngine* engine = loadEngine(strTrtSavedPath, 0);
    // 2. Create the context
    IExecutionContext* context = engine->createExecutionContext();
 
    int nNumBindings = engine->getNbBindings();
    std::vector<void*> vecBuffers;
    vecBuffers.resize(nNumBindings);
    int nInputIdx = 0;
    int nOutputIndex = 1;
 
    int nInputSize =   1 * 112 * 112 * sizeof(float);
    // 3. Allocate device buffers once, sized for the maximum batch
    cudaMalloc(&vecBuffers[nInputIdx], nInputSize * maxBatchSize);
    cudaMalloc(&vecBuffers[nOutputIndex], maxBatchSize * 2 * sizeof(float));
 
    const char* pchImgPath = "./img.bmp";
    cv::Mat matImg = cv::imread(pchImgPath, -1);
    std::cout << matImg.rows << std::endl;
    cv::Mat matRzImg;
    cv::resize(matImg, matRzImg, cv::Size(112, 112));
    cv::Mat matF32Img;
    matRzImg.convertTo(matF32Img, CV_32FC1);
    matF32Img = matF32Img / 255.;
 
    for (int i = 0; i < maxBatchSize; ++i)
    {
        cudaMemcpy((unsigned char *)vecBuffers[nInputIdx] + nInputSize * i, matF32Img.data, nInputSize, cudaMemcpyHostToDevice);
 
    }
    
    // Dynamic dims: set batch = 1; the first argument (0) is the binding index of the input tensor
    context->setBindingDimensions(0, Dims4(1, 1, 112, 112));
    nvinfer1::Dims dim = context->getBindingDimensions(0);
 
    context->executeV2(vecBuffers.data());
    //context->execute(1, vecBuffers.data());
    float prob[8];
 
    cudaMemcpy(prob, vecBuffers[nOutputIndex], maxBatchSize * 2 * sizeof(float), cudaMemcpyDeviceToHost);
 
    for (int i = 0; i < 8; ++i)
    {
        std::cout << prob[i] << "  ";
    }
    std::cout <<"\n-------------------------" << std::endl;
 
    // Dynamic dims: set batch = 4
    context->setBindingDimensions(0, Dims4(4, 1, 112, 112));
    context->executeV2(vecBuffers.data());
    //context->execute(1, vecBuffers.data());
 
 
    cudaMemcpy(prob, vecBuffers[nOutputIndex], maxBatchSize * 2 * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 8; ++i)
    {
        std::cout << prob[i] << "  ";
    }
    std::cout << "\n-------------------------" << std::endl;
    //std::cout << prob[0] << "  " << prob[1] << std::endl;
 
    // Release the device buffers, context, and engine here
    // (see the cleanup sketch after this function)
    return ;
}
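
For completeness, the release step at the end of test_engine would look like this under the TensorRT 7.x API the code above uses (TensorRT 8 and later replace destroy() with plain delete):

    cudaFree(vecBuffers[nInputIdx]);
    cudaFree(vecBuffers[nOutputIndex]);
    context->destroy();
    engine->destroy();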

 

Output (this log was captured from a run whose profile fixed the batch at 1, which is why the second setBindingDimensions call with batch 4 is rejected; it illustrates the range check TensorRT applies against the profile):

28
10.3685  -62.4368  -177.49  -35.3763  108.109  -51.6993  -86.1457  21.3801
-------------------------
3: [executionContext.cpp::nvinfer1::rt::ExecutionContext::setBindingDimensions::976] Error Code 3: API Usage Error (Parameter check failed at: executionContext.cpp::nvinfer1::rt::ExecutionContext::setBindingDimensions::976, condition: profileMaxDims.d[i] >= dimensions.d[i]. Supplied binding dimension [4,1,28,28] for bindings[0] exceed min ~ max range at index 0, maximum dimension in profile is 1, minimum dimension in profile is 1, but supplied dimension is 4.
)
10.3685  -62.4368  -177.49  -35.3763  108.109  -51.6993  -86.1457  21.3801
