ONNX Runtime C++ Inference Example

Latest recommended article published on 2025-03-05 09:34:04

拧螺丝专业户2024

Tags: Artificial Intelligence

Article link: https://blog.csdn.net/weixin_43632469/article/details/142389172

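A note on releasing resources. Calling release() on an ONNX Runtime C++ object does not free anything: judging from the API source, it only sets the wrapper's internal pointer to null (handing the raw handle back to the caller), so the underlying resource is not actually released. The resource is freed when the wrapper object itself is destroyed. To control when that happens, create the objects on the heap through pointers and delete them explicitly.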

An example follows:

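#include <array>
#include <string>
#include <vector>
#include <tchar.h>             // _T macro (Windows)
#include <glog/logging.h>      // LOG(WARNING); glog is assumed from the macro name
#include <opencv2/opencv.hpp>
#include <onnxruntime_cxx_api.h>

using namespace cv;

void testimage()
{
	Mat image = imread("ae14.jpg", IMREAD_UNCHANGED);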
	// Create the session options and the environment
	Ort::SessionOptions session_options;
	Ort::Env * env = new Ort::Env(ORT_LOGGING_LEVEL_WARNING, "segment");

	// Disable the CPU memory arena so that memory is not held between runs
	session_options.DisableCpuMemArena();
	// Maximum number of threads used to parallelize work inside a single op;
	// raising it can improve speed
	session_options.SetIntraOpNumThreads(1);
	session_options.SetGraphOptimizationLevel(ORT_ENABLE_EXTENDED);
	// Run on the GPU (CUDA device 0); throws if the provider cannot be added
	Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0));
	// Load the model
	Ort::Session * session = new Ort::Session(*env, _T("seg.onnx"), session_options);

	// Query the number of input and output nodes
	int input_nodes_num = static_cast<int>(session->GetInputCount());
	int output_nodes_num = static_cast<int>(session->GetOutputCount());
	std::vector<std::string> input_node_names;
	std::vector<std::string> output_node_names;
	Ort::AllocatorWithDefaultOptions allocator;

	int input_h = 0;
	int input_w = 0;

	// Collect input names and the input size (assumes an NCHW input shape)
	for (int i = 0; i < input_nodes_num; i++) {
		auto input_name = session->GetInputNameAllocated(i, allocator);
		input_node_names.push_back(input_name.get());
		auto inputShapeInfo = session->GetInputTypeInfo(i).GetTensorTypeAndShapeInfo().GetShape();
		input_h = static_cast<int>(inputShapeInfo[2]);
		input_w = static_cast<int>(inputShapeInfo[3]);
	}

	// Collect output names (the model may have several outputs)
	for (int i = 0; i < output_nodes_num; i++) {
		auto output_name = session->GetOutputNameAllocated(i, allocator);
		output_node_names.push_back(output_name.get());
	}
	
	// Prepare the input data
	Mat input0;
	cv::cvtColor(image, image, cv::COLOR_GRAY2RGB);  // assumes a single-channel source image
	image.convertTo(input0, CV_32F, 1.0 / 255, 0);
	// Resize to the model's input size and convert HWC to NCHW,
	// so the blob matches the tensor dimensions declared below
	cv::Mat blob = cv::dnn::blobFromImage(input0, 1.0, cv::Size(input_w, input_h));

	size_t tpixels = static_cast<size_t>(input_w) * input_h * 3;
	std::vector<int64_t> input_tensor_dims = { 1, 3, input_h, input_w }; // batch size, channels, height, width
	auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
	Ort::Value input_tensor = Ort::Value::CreateTensor<float>(memory_info, blob.ptr<float>(), tpixels, input_tensor_dims.data(), input_tensor_dims.size());


	// A single input
	const std::array<const char*, 1> inputNames = { input_node_names[0].c_str() };
	// Only the first output is requested here, even if the model has several
	const std::array<const char*, 1> outNames = { output_node_names[0].c_str() };
	// Run the model
	std::vector<Ort::Value> ort_outputs;
	ort_outputs = session->Run(Ort::RunOptions{ nullptr }, inputNames.data(), &input_tensor, inputNames.size(), outNames.data(), outNames.size());

	// Read the output data
	// Use the requested output as the final mask (assumes the model emits an int32 tensor)
	const int32_t* mask_data = ort_outputs[0].GetTensorMutableData<int32_t>();
	auto outShape = ort_outputs[0].GetTensorTypeAndShapeInfo().GetShape();
	// Take height and width from the last two output dimensions rather than
	// hard-coding them, so the loop cannot run past the end of the mask
	int out_h = static_cast<int>(outShape[outShape.size() - 2]);
	int out_w = static_cast<int>(outShape[outShape.size() - 1]);
	LOG(WARNING) << out_w << " " << out_h << " ";
	cv::Mat segmentation_result(out_h, out_w, CV_8UC1);
	for (int row = 0; row < out_h; row++) {
		for (int col = 0; col < out_w; col++) {
			segmentation_result.at<uchar>(row, col) = static_cast<uchar>(mask_data[row * out_w + col]);
		}
	}
	cv::Mat out_eq_img;
	cv::equalizeHist(segmentation_result, out_eq_img);
	
	
	// Release resources
	delete session;
	delete env;
}
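The release() behavior described at the top of this article has the same semantics as std::unique_ptr::release in the C++ standard library, so it can be demonstrated without ONNX Runtime. The following is a minimal sketch under that analogy. Resource, live_after_release and live_after_scope_exit are hypothetical names invented for illustration and are not part of the ONNX Runtime API.

```cpp
#include <memory>

// Hypothetical stand-in for a native resource; counts live instances.
struct Resource {
    static int live;
    Resource() { ++live; }
    ~Resource() { --live; }
};
int Resource::live = 0;

// release(): the owner forgets the pointer, but nothing is destroyed.
// Returns the number of live resources observed right after release().
int live_after_release() {
    std::unique_ptr<Resource> owner(new Resource());
    Resource* raw = owner.release();  // owner is now empty; the object still exists
    int live = Resource::live;        // the resource is still alive here
    delete raw;                       // the caller has to free it explicitly
    return live;
}

// Scope exit: the owner's destructor is what actually frees the resource.
// Returns the number of live resources after the owner has been destroyed.
int live_after_scope_exit() {
    {
        std::unique_ptr<Resource> owner(new Resource());
    }  // destructor runs here and deletes the Resource
    return Resource::live;
}
```

In testimage() above, delete session and delete env play the role of the scope exit in this sketch: they run the wrapper destructors at a point the caller chooses.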

Notes on Ort::SessionOptions.

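DisableProfiling is a session option that controls whether profiling data is collected during inference. When profiling is enabled, ONNX Runtime records performance metrics during execution, such as the execution time of each operator and memory usage. This data is very useful for debugging and for tuning inference performance.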

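Calling DisableProfiling turns this collection off. Profiling is off by default, so the call mainly matters when profiling was enabled earlier on the same options object.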

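DisableCpuMemArena is a session option that controls whether ONNX Runtime uses a memory arena on the CPU. A memory arena is a pre-allocated region of memory that holds the intermediate data produced during inference. It avoids frequent allocation and deallocation, which improves performance, especially when many inference requests are processed.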

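In some situations, however, such as multithreaded environments, the arena can make memory usage keep growing, because memory it has allocated is not returned to the operating system right away. This can look like a memory leak: the memory stays reserved even after inference has finished.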

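With DisableCpuMemArena set, ONNX Runtime manages memory with ordinary malloc and free calls instead of the arena. Memory can then be released after each inference, which lowers the footprint. This is particularly useful for memory-sensitive applications and when several inference tasks run in a multithreaded environment.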
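The trade-off described above can be illustrated with a toy bump arena. This is only a sketch of the general idea, not ONNX Runtime's implementation: ToyArena and reserved_after_run are hypothetical names, alignment is ignored, and the sizes are arbitrary.

```cpp
#include <cstddef>
#include <vector>

// Toy bump arena: hypothetical and far simpler than a production arena.
class ToyArena {
public:
    // Reserve the whole block up front.
    explicit ToyArena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Hand out the next `size` bytes of the reserved block, or nullptr when full.
    void* allocate(std::size_t size) {
        if (size > buffer_.size() - offset_) return nullptr;
        void* p = buffer_.data() + offset_;
        offset_ += size;
        return p;
    }

    // Mark everything as reusable; the block itself stays reserved.
    void reset() { offset_ = 0; }

    std::size_t reserved_bytes() const { return buffer_.size(); }
    std::size_t used_bytes() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};

// Simulates one inference pass that needs two scratch buffers.
// Returns the number of bytes the arena still holds afterwards.
std::size_t reserved_after_run(ToyArena& arena) {
    arena.allocate(256);
    arena.allocate(512);
    arena.reset();                  // the buffers can be reused by the next run
    return arena.reserved_bytes();  // but the block is still held by the arena
}
```

Allocation inside the arena is a pointer bump, which is why it is fast, but the reserved block is never handed back while the arena lives. A malloc/free strategy gives each buffer back as soon as it is freed, at the cost of one allocator call per buffer.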

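DisableMemPattern is a session option that controls whether the memory pattern optimization is used during inference. With this optimization, ONNX Runtime allocates and reuses memory according to the model's memory access pattern, which reduces allocation and deallocation overhead and improves inference performance.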

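Setting DisableMemPattern tells ONNX Runtime not to use this optimization. Some execution providers require it, for example the DirectML execution provider, which supports neither the memory pattern optimization nor parallel execution. If the optimization is left enabled in that case, errors or performance problems can result.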
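As a rough intuition for what a memory pattern buys, the sketch below records the buffer sizes requested during a first run and then plans a single block with fixed offsets for later runs. It is an assumption-laden illustration, not ONNX Runtime's actual mechanism: ToyMemPattern and its methods are hypothetical, and it assumes every run requests the same buffers in the same order.

```cpp
#include <cstddef>
#include <vector>

// Toy planner: a hypothetical illustration of reusing a recorded allocation pattern.
class ToyMemPattern {
public:
    // First run: remember the size of every buffer that gets requested.
    void record(std::size_t size) { sizes_.push_back(size); }

    // Later runs: size of one block large enough for all recorded buffers.
    std::size_t total_bytes() const {
        std::size_t total = 0;
        for (std::size_t s : sizes_) total += s;
        return total;
    }

    // Offset of the buffer with the given index inside that single block.
    std::size_t offset_of(std::size_t index) const {
        std::size_t offset = 0;
        for (std::size_t i = 0; i < index && i < sizes_.size(); ++i) offset += sizes_[i];
        return offset;
    }

private:
    std::vector<std::size_t> sizes_;
};
```

After recording buffers of 100, 200 and 50 bytes, a later run would need one 350-byte block, with the third buffer starting at offset 300. The plan is only valid while the requests repeat exactly, which hints at why a backend that manages memory differently may need the optimization switched off.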