Deep Learning Series 01: TensorRT Model Deployment Workflow


Model Deployment Workflow

1. Model Preparation

PyTorch -> (ONNX) -> TensorRT engine

Use trtexec to build engines at different precisions from the exported ONNX model:

trtexec --onnx=output.onnx --saveEngine=outfp32.engine --workspace=2048 --minShapes=x:1x3x224x224 --optShapes=x:1x3x224x224 --maxShapes=x:1x3x224x224

trtexec --onnx=output.onnx --saveEngine=outfp16.engine --workspace=2048 --minShapes=x:1x3x224x224 --optShapes=x:1x3x224x224 --maxShapes=x:1x3x224x224 --fp16

trtexec --onnx=output.onnx --saveEngine=outint8.engine --workspace=2048 --minShapes=x:1x3x224x224 --optShapes=x:1x3x224x224 --maxShapes=x:1x3x224x224 --int8

trtexec --onnx=output.onnx --saveEngine=outbest.engine --workspace=2048 --minShapes=x:1x3x224x224 --optShapes=x:1x3x224x224 --maxShapes=x:1x3x224x224 --best

2. Prepare the Image Input

  1. Size adaptation: resize the image with its aspect ratio fixed, then pad to the input size the model expects
  2. Normalization: subtract the per-channel mean and divide by the standard deviation, yielding float values
  3. Flattening: unroll the image channel by channel into a one-dimensional float array (size = 3×w×h)
  4. This one-dimensional array is the final input
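Steps 2–4 above can be sketched in plain C++. This is a minimal sketch: the image is assumed to be already resized and padded to the network size, and the mean/std values are the common ImageNet statistics, used here as placeholders for whatever your model was actually trained with.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Convert an HWC uint8 image (already resized and padded to h×w) into a
// normalized, channel-major (CHW) float array of size 3*h*w, ready to be
// copied into the engine's input buffer.
std::vector<float> preprocess(const std::vector<uint8_t>& hwc, int h, int w) {
    // Placeholder ImageNet statistics; substitute your model's training values.
    const float mean[3] = {0.485f, 0.456f, 0.406f};
    const float stdv[3] = {0.229f, 0.224f, 0.225f};
    std::vector<float> chw(3 * h * w);
    for (int c = 0; c < 3; ++c)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                // Scale the pixel to [0,1], then normalize per channel.
                float v = hwc[(y * w + x) * 3 + c] / 255.0f;
                chw[c * h * w + y * w + x] = (v - mean[c]) / stdv[c];
            }
    return chw;
}
```

The loop order (channel outermost) is what produces the channel-by-channel flattening described in step 3.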

3. Output Handling

  1. Bind one buffer for each output head in the network structure
    void doInference(IExecutionContext& context, float* input, float* output, const int output_size, cv::Size input_shape) {
        const ICudaEngine& engine = context.getEngine();
    
        // Pointers to input and output device buffers to pass to the engine.
        // The engine requires exactly IEngine::getNbBindings() buffers.
        assert(engine.getNbBindings() == 2);
        void* buffers[2];
    
        // To bind the buffers, we need the names of the input and output tensors.
        // Indices are guaranteed to be less than IEngine::getNbBindings().
        const int inputIndex = engine.getBindingIndex(INPUT_BLOB_NAME);
        assert(engine.getBindingDataType(inputIndex) == nvinfer1::DataType::kFLOAT);
        const int outputIndex = engine.getBindingIndex(OUTPUT_BLOB_NAME);
        assert(engine.getBindingDataType(outputIndex) == nvinfer1::DataType::kFLOAT);
    
        // Create GPU buffers on the device.
        CHECK(cudaMalloc(&buffers[inputIndex], 3 * input_shape.height * input_shape.width * sizeof(float)));
        CHECK(cudaMalloc(&buffers[outputIndex], output_size * sizeof(float)));
    
        // Create a CUDA stream.
        cudaStream_t stream;
        CHECK(cudaStreamCreate(&stream));
    
        // DMA the input to the device, run inference asynchronously, and DMA the output back to the host.
        CHECK(cudaMemcpyAsync(buffers[inputIndex], input, 3 * input_shape.height * input_shape.width * sizeof(float), cudaMemcpyHostToDevice, stream));
        // The engine was built with explicit batch shapes (trtexec --minShapes/--optShapes/--maxShapes),
        // so use enqueueV2 rather than the implicit-batch enqueue.
        context.enqueueV2(buffers, stream, nullptr);
        CHECK(cudaMemcpyAsync(output, buffers[outputIndex], output_size * sizeof(float), cudaMemcpyDeviceToHost, stream));
        CHECK(cudaStreamSynchronize(stream));
    
        // Release the stream and buffers.
        CHECK(cudaStreamDestroy(stream));
        CHECK(cudaFree(buffers[inputIndex]));
        CHECK(cudaFree(buffers[outputIndex]));
    }
    
    
  2. Decode the output: score thresholding, non-maximum suppression (NMS), mapping box positions and sizes back to the original image, and recovering keypoint locations
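The thresholding and NMS part of the decode step can be sketched as follows. This is a minimal, self-contained sketch: the Box struct and the threshold values are hypothetical, and a real pipeline must additionally undo the resize/padding from preprocessing to map boxes back to the original image.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical detection box; real output layouts depend on the model.
struct Box { float x1, y1, x2, y2, score; };

// Intersection-over-union of two axis-aligned boxes.
static float iou(const Box& a, const Box& b) {
    float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
    float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
    float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
    float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
    float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (areaA + areaB - inter);
}

std::vector<Box> decode(std::vector<Box> boxes, float score_thr, float iou_thr) {
    // 1. Threshold: drop low-confidence candidates.
    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                [&](const Box& b) { return b.score < score_thr; }),
                boxes.end());
    // 2. Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much.
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& b : boxes) {
        bool keep = true;
        for (const Box& k : kept)
            if (iou(b, k) > iou_thr) { keep = false; break; }
        if (keep) kept.push_back(b);
    }
    return kept;
}
```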

References

https://blog.csdn.net/HaoZiHuang/article/details/125859167
https://blog.csdn.net/weixin_42492254/article/details/126028199
https://github.com/ifzhang/ByteTrack/blob/main/deploy/TensorRT/cpp/src/bytetrack.cpp
