使用C++API部署推理(重点)
step1:创建runtime
step2:反序列化创建engine
step3:创建context
step4:获取输入输出索引
step5:创建buffers
step6:为输入输出开辟GPU显存
step7:创建cuda流
step8:从CPU到GPU----拷贝input数据
step9:异步推理
step10:从GPU到CPU----拷贝output数据
step11:同步cuda流
step12:释放资源
IRuntime* runtime = createInferRuntime(gLogger);
assert(runtime != nullptr);
ICudaEngine* engine = runtime->deserializeCudaEngine(modelData, modelSize, nullptr);
assert(engine != nullptr);
printf("Bindings after deserializing:\n");
for (int bi = 0; bi < engine->getNbBindings(); bi++)
{
if (engine->bindingIsInput(bi) == true)
{
printf("Binding %d (%s): Input.\n", bi, engine->getBindingName(bi));
}
else
{
printf("Binding %d (%s): Output.\n", bi, engine->getBindingName(bi));
}
}
IExecutionContext *context = engine->createExecutionContext();
assert(context != nullptr);
int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
void* buffers[2];
buffers[inputIndex] = inputBuffer;
buffers[outputIndex] = outputBuffer;
CUDA_CHECK(cudaMalloc(&buffers[inputIndex], batchSize * inputDim.c() * inputDim.h() * inputDim.w() * sizeof(float)));
CUDA_CHECK(cudaMalloc(&buffers[outputIndex], batchSize * outputDim.c() * outputDim.h() * outputDim.w() * sizeof(float)));
cudaStream_t stream;
CUDA_CHECK(cudaStreamCreate(&stream));
CUDA_CHECK(cudaMemcpyAsync(buffers[inputIndex],
input,
batchSize * inputDim.c() * inputDim.h() * inputDim.w() * sizeof(float),
cudaMemcpyHostToDevice,
stream));
context->enqueueV2(buffers, stream, nullptr);
CUDA_CHECK(cudaMemcpyAsync(output,
buffers[outputIndex],
batchSize * outputDim.c() * outputDim.h() * outputDim.w() * sizeof(float),
cudaMemcpyDeviceToHost,
stream));
CUDA_CHECK(cudaStreamSynchronize(stream));
cudaStreamDestroy(stream);
context->destroy();
engine->destroy();
runtime->destroy();
CUDA_CHECK(cudaFree(buffers[inputIndex]));
CUDA_CHECK(cudaFree(buffers[outputIndex]));