[STR Text Recognition Project] The Latest SOTA Project PARSeq (Part 2): Converting to TensorRT and Calling It from C++

NPC里的玩家

Last modified on 2023-06-23 23:45:21

First published on 2023-05-15 09:52:45

Article link: https://blog.csdn.net/ailaier/article/details/130677052

Preface

This post continues from the previous one.

Environment

Python 3.10.9

CUDA 11.6

CUDNN 8.9.0

TensorRT 8.5.3.1

Optimizing the ONNX Model

Use onnx-simplifier to simplify the model graph. Project page:

https://github.com/daquexian/onnx-simplifier

It can be installed directly with pip:

pip install onnxsim

After installation, run the following command to simplify the model:

# input_onnx_model is the path of the input model
# output_onnx_model is the path of the output model
onnxsim input_onnx_model output_onnx_model

You can open the model in Netron to preview its structure and compare it before and after simplification. Project page:

https://github.com/lutzroeder/netron

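Converting ONNX to TensorRT

The bin folder under the TensorRT root directory contains an application called trtexec.

You can run it from that directory, or add the bin directory to your PATH environment variable and run it from anywhere.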

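# Build an engine with a static batch size.
# Nothing may follow a trailing backslash, otherwise the line continuation breaks.
# In TensorRT 8.x, --explicitBatch is deprecated: ONNX models are always explicit-batch.
# Add --fp16 for an FP16 engine (the engine file name used later suggests one was built).
./trtexec --onnx=<onnx_file> \
          --explicitBatch \
          --saveEngine=<tensorRT_engine_file>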

Running Inference from C++

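As with the ONNX version in the previous post, the pipeline has three stages: image preprocessing, inference, and post-processing of the results.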
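The preprocessing stage can be sketched as follows. This is a minimal illustration rather than the author's code. It assumes the image has already been resized to 32x128 and is stored as interleaved 8-bit RGB, and that the model expects the (x - 0.5) / 0.5 normalization; check both against the preprocessing used for the ONNX model in the previous post. The helper name hwcToChwNormalized is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved 8-bit RGB image (HWC layout), already resized to
// height x width, into a planar float tensor (CHW layout) normalized as
// (pixel / 255 - mean) / stddev.
// Assumption: mean = stddev = 0.5 for every channel, giving values in [-1, 1].
std::vector<float> hwcToChwNormalized(const std::vector<std::uint8_t>& rgb,
                                      int height, int width,
                                      float mean = 0.5f, float stddev = 0.5f)
{
    const int channels = 3;
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    std::vector<float> chw(channels * plane);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t pixel = static_cast<std::size_t>(y) * width + x;
            for (int c = 0; c < channels; ++c) {
                const float v = rgb[pixel * channels + c] / 255.0f;
                chw[c * plane + pixel] = (v - mean) / stddev;
            }
        }
    }
    return chw;
}
```

For a 32x128 image the returned vector holds 3 * 32 * 128 floats, and its data() pointer can be passed as the input argument of doInference with batchSize set to 1. If the image was loaded with OpenCV, remember that it arrives in BGR order and needs a channel swap first.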

Key inference code

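#include <cassert>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <cuda_runtime_api.h>
#include <NvInfer.h>
#include <../samples/common/logger.h>

using namespace nvinfer1;

// Abort with a message when a CUDA runtime call fails.
#define CHECK(status) \
    do\
    {\
        auto ret = (status);\
        if (ret != 0)\
        {\
            std::cerr << "Cuda failure: " << ret << std::endl;\
            abort();\
        }\
    } while (0)

static const int IN_H = 32;
static const int IN_W = 128;
static const int OUT_LENGTH = 26;
static const int OUT_CHARS = 95;
// Placeholder tensor names (assumption): replace them with the input and
// output names of your own ONNX model, which Netron displays.
static const char* IN_NAME = "input";
static const char* OUT_NAME = "output";

char* trtModelStream{ nullptr };
IRuntime* runtime{ nullptr };
ICudaEngine* engine{ nullptr };
IExecutionContext* context{ nullptr };

// Initialization: load the serialized engine and create an execution context
void init(){
	std::ifstream file("your_path/parseq_sim_fp16.engine", std::ios::binary);
	if (!file.good()) {
		std::cerr << "Failed to open engine file" << std::endl;
		abort();
	}
	file.seekg(0, file.end);
	const size_t size = file.tellg();
	file.seekg(0, file.beg);
	trtModelStream = new char[size];
	file.read(trtModelStream, size);
	file.close();

	// static: the logger must outlive the runtime created from it.
	// In the TensorRT 8.x samples the Logger class lives in namespace sample.
	static sample::Logger m_logger;
	runtime = createInferRuntime(m_logger);
	assert(runtime != nullptr);
	engine = runtime->deserializeCudaEngine(trtModelStream, size);
	assert(engine != nullptr);
	// The serialized blob is no longer needed once the engine exists.
	delete[] trtModelStream;
	trtModelStream = nullptr;
	context = engine->createExecutionContext();
	assert(context != nullptr);
}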

// Run inference
void doInference(IExecutionContext& context, float* input, float* output, int batchSize)
{
	const ICudaEngine& engine = context.getEngine();

	// Pointers to input and output device buffers to pass to engine.
	// Engine requires exactly ICudaEngine::getNbBindings() number of buffers.
	assert(engine.getNbBindings() == 2);
	void* buffers[2];

	// In order to bind the buffers, we need to know the names of the input and output tensors.
	// Note that indices are guaranteed to be less than ICudaEngine::getNbBindings()
	const int inputIndex = engine.getBindingIndex(IN_NAME);
	const int outputIndex = engine.getBindingIndex(OUT_NAME);

	// Create GPU buffers on device
	CHECK(cudaMalloc(&buffers[inputIndex], batchSize * 3 * IN_H * IN_W * sizeof(float)));
	CHECK(cudaMalloc(&buffers[outputIndex], batchSize * OUT_LENGTH * OUT_CHARS * sizeof(float)));

	// Create stream
	cudaStream_t stream;
	CHECK(cudaStreamCreate(&stream));

	// DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
	CHECK(cudaMemcpyAsync(buffers[inputIndex], input, batchSize * 3 * IN_H * IN_W * sizeof(float), cudaMemcpyHostToDevice, stream));
	if (!context.enqueueV2(buffers, stream, nullptr)) {
		std::cerr << "enqueueV2 failed" << std::endl;
		abort();
	}
	CHECK(cudaMemcpyAsync(output, buffers[outputIndex], batchSize * OUT_LENGTH * OUT_CHARS * sizeof(float), cudaMemcpyDeviceToHost, stream));
	CHECK(cudaStreamSynchronize(stream));

	// Release stream and buffers
	CHECK(cudaStreamDestroy(stream));
	CHECK(cudaFree(buffers[inputIndex]));
	CHECK(cudaFree(buffers[outputIndex]));
}
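The post-processing stage is not shown above. A minimal greedy-decoding sketch over the output buffer could look like the following. It rests on assumptions that are not confirmed by this post: the output is laid out row-major as [OUT_LENGTH][OUT_CHARS], class 0 is the end-of-sequence token, and class i (for i >= 1) maps to character i - 1 of the charset used in training. The function name greedyDecode is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Greedy decoding: at each position pick the class with the highest score
// and stop at the end-of-sequence class.
// Assumptions: scores are row-major [steps][numClasses], class 0 is EOS,
// and class i (i >= 1) maps to charset[i - 1].
std::string greedyDecode(const float* scores, int steps, int numClasses,
                         const std::string& charset)
{
    std::string text;
    for (int t = 0; t < steps; ++t) {
        const float* row = scores + static_cast<std::size_t>(t) * numClasses;
        int best = 0;
        for (int c = 1; c < numClasses; ++c) {
            if (row[c] > row[best]) best = c;
        }
        if (best == 0) break;  // EOS reached
        const std::size_t charIndex = static_cast<std::size_t>(best - 1);
        if (charIndex >= charset.size()) break;  // charset shorter than expected
        text.push_back(charset[charIndex]);
    }
    return text;
}
```

With the constants above, a call would be greedyDecode(output, OUT_LENGTH, OUT_CHARS, charset), where charset is the 94-character string the model was trained with. Argmax gives the same result whether the model emits logits or probabilities.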

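The key parts of the code above come from the MMDeploy tutorial, which explains the reasoning as well as giving the code. Highly recommended.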

mmdeploy/06_introduction_to_tensorrt.md at main · open-mmlab/mmdeploy · GitHub

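Note, however, that the tutorial calls context.enqueue. This is a TensorRT version difference: enqueue takes a batch size and is meant for implicit-batch networks, while an engine built from ONNX is explicit-batch and must be run with context.enqueueV2. Change the call accordingly, otherwise you will get an error.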

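You can also use the Python code from the link above to validate the converted model first, and move on to the C++ deployment once it checks out.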
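A similar check can be repeated on the C++ side by comparing the TensorRT output with a reference output for the same input, for example one saved from the ONNX model. The sketch below only computes the largest element-wise difference. The helper name maxAbsDiff is hypothetical, and the acceptable tolerance is an assumption you have to choose yourself; an FP16 engine will generally differ more from the reference than an FP32 one.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Largest absolute element-wise difference between two float buffers of
// length n. A small value suggests the two runtimes agree on this input.
float maxAbsDiff(const float* a, const float* b, std::size_t n)
{
    float worst = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

For a single image, n would be OUT_LENGTH * OUT_CHARS. Comparing the decoded strings as well is a useful second check, since small numeric differences rarely change the argmax.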

With the final implementation, a single inference takes about 2 - 3 ms.
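The post does not say how this figure was measured. One way to obtain a comparable number is to average the wall-clock time over many calls after a few warm-up runs, as in this sketch. The helper name averageMillis and the default warm-up count are assumptions made for illustration.

```cpp
#include <chrono>

// Average wall-clock time of fn in milliseconds over `runs` calls,
// measured after `warmup` untimed calls.
template <typename Fn>
double averageMillis(Fn&& fn, int runs, int warmup = 10)
{
    for (int i = 0; i < warmup; ++i) fn();
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) fn();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / runs;
}
```

A call such as averageMillis([&] { doInference(*context, input, output, 1); }, 100) times the whole function. Because doInference synchronizes the stream before returning, wall-clock timing is meaningful here, but the result also includes the per-call cudaMalloc, cudaFree and stream creation, so allocating the buffers once outside the loop would give a lower figure.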