Deploying with ONNX Runtime in C++, Based on YOLOv5 and YOLOv8 (a Line-by-Line Walkthrough of ONNX Runtime)

才疏学浅小菜鸡

Published 2024-09-06 16:11:02

Tags: YOLO, C++

Article link: https://blog.csdn.net/qq_54700713/article/details/141959567

1. Introduction to ONNX Runtime

2. Deploying YOLO Detection in C++ with ONNX Runtime

2.1 Basic Setup

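onnxruntime version: 1.20.0, downloaded from https://github.com/microsoft/onnxruntime/releases?page=2. Note that the CMake paths below point at the onnxruntime-win-x64-gpu-1.12.0 package, and the GetInputName/GetOutputName calls used in section 2.2 belong to that older API; later releases replace them with GetInputNameAllocated/GetOutputNameAllocated.

opencv version: 4.5.5

Add the include and lib directories from the extracted archive to the project's build environment. I use CLion with a CMakeLists.txt; the relevant lines are: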

include_directories(${CMAKE_SOURCE_DIR}/onnxruntime-win-x64-gpu-1.12.0/include)
link_directories(${CMAKE_SOURCE_DIR}/onnxruntime-win-x64-gpu-1.12.0/lib)

2.2 ONNX Runtime Initialization Code Explained

The complete initialization code is shown first:

YOLO::YOLO(float nms_threshold, float objThreshold, const std::string& model_path, const std::vector<std::string>& class_names, int inpWidth, int inpHeight,bool useCuda)
        : nms_threshold(nms_threshold), objThreshold(objThreshold), class_names(class_names), inpWidth(inpWidth), inpHeight(inpHeight),useCuda(useCuda)
{
    this->env = Env(ORT_LOGGING_LEVEL_ERROR, "yolo"); // log only messages of ERROR severity or higher

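    // On Windows the session takes a wide-character model path.
    // This element-wise widening is only correct for ASCII paths.
    std::wstring widestr = std::wstring(model_path.begin(), model_path.end());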

    if (useCuda){
        // Register the CUDA (GPU) execution provider
        OrtCUDAProviderOptions cuda_options;
        cuda_options.device_id = 0;  // use CUDA device 0
        sessionOptions.AppendExecutionProvider_CUDA(cuda_options); // append the provider to the session options
    }

    sessionOptions.SetGraphOptimizationLevel(ORT_ENABLE_BASIC);
    ort_session = std::make_unique<Ort::Session>(env, widestr.c_str(), sessionOptions);
    // Query the model's input and output nodes
    Ort::AllocatorWithDefaultOptions allocator; // default allocator; the returned name strings are allocated through it

    std::cout << "Model Input Nodes:" << std::endl;
    for (size_t i = 0; i < ort_session->GetInputCount(); ++i) {
        char* input_name = ort_session->GetInputName(i, allocator);
        input_names.push_back(input_name); // store each input name exactly once
        std::cout << "Input " << i << ": " << input_name << std::endl;
    }

    for (size_t i = 0; i < ort_session->GetOutputCount(); ++i)
    {
        output_names.push_back(ort_session->GetOutputName(i, allocator));
        Ort::TypeInfo output_type_info = ort_session->GetOutputTypeInfo(i);
        auto output_dims = output_type_info.GetTensorTypeAndShapeInfo().GetShape();
        output_node_dims.push_back(output_dims);
    }


    num_anchors = output_node_dims.at(0).at(1);
}

The code is explained line by line below.

this->env = Env(ORT_LOGGING_LEVEL_ERROR, "yolo");

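This line creates the basic environment that ONNX Runtime runs in. ORT_LOGGING_LEVEL_ERROR sets the logging threshold so that only messages of ERROR severity or higher are reported, which makes problems during execution visible without flooding the output with lower-severity messages.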

 sessionOptions.AppendExecutionProvider_CUDA(cuda_options); // append the CUDA provider to the session options
 sessionOptions.SetGraphOptimizationLevel(ORT_ENABLE_BASIC);

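sessionOptions can be thought of as a set of configuration options for the run: whether to enable CUDA, the number of threads, the graph optimization level, and so on. The two lines above enable CUDA and set the optimization level to basic. The available options can be found in onnxruntime_cxx_api.h, as shown in the figure.

Once the required options are set, everything that is needed (the environment, the model, and the session options) is loaded together, which is what this line does: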

ort_session = std::make_unique<Ort::Session>(env, widestr.c_str(), sessionOptions);

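The session is held in a smart pointer (std::unique_ptr), so its memory is released automatically.

Next, information is read from the loaded session. During initialization the main things to read are the model's input and output names, which is what these two lines do: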
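The model path passed to Ort::Session is an ORTCHAR_T string, which is wchar_t on Windows and char elsewhere, so the wstring conversion in the constructor only applies to Windows builds. The following is a minimal sketch of a portable wrapper around that conversion. The names OrtPathString and to_ort_path are hypothetical helpers for illustration, not part of ONNX Runtime, and the Windows branch assumes an ASCII-only path just as the original conversion does.

```cpp
#include <cassert>
#include <string>

// Ort::Session takes an ORTCHAR_T path: wchar_t on Windows, char elsewhere.
#ifdef _WIN32
using OrtPathString = std::wstring;
#else
using OrtPathString = std::string;
#endif

// Hypothetical helper (not an ONNX Runtime function): converts a narrow
// path into the string type expected on the current platform.
inline OrtPathString to_ort_path(const std::string& path)
{
#ifdef _WIN32
    for (unsigned char c : path) {
        // Element-wise widening is only valid for 7-bit ASCII characters.
        assert(c < 0x80 && "non-ASCII paths need a real UTF-8 to UTF-16 conversion");
    }
    return std::wstring(path.begin(), path.end());
#else
    return path; // char-based platforms can use the path unchanged
#endif
}
```

With such a helper, the session would be created from to_ort_path(model_path).c_str() on either platform.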

input_names.push_back(ort_session->GetInputName(i, allocator));
output_names.push_back(ort_session->GetOutputName(i, allocator));

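Printing the contents of input_names and output_names shows that the input is named images

and the outputs are

There are 4 outputs. Briefly:

Output 0: 25200 predicted boxes (3 × (80×80 + 40×40 + 20×20)), each with 85 values (x, y, w, h, the objectness score, and 80 class scores).

Output 1: batch of 1 image, 3 anchor boxes per grid cell, an 80x80 grid, 85 values per box

Output 2: batch of 1 image, 3 anchor boxes per grid cell, a 40x40 grid, 85 values per box

Output 3: batch of 1 image, 3 anchor boxes per grid cell, a 20x20 grid, 85 values per box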
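The 25200 figure for Output 0 can be checked with a few lines of arithmetic. The sketch below assumes a square 640x640 input and strides of 8, 16, and 32, which is what produces the 80x80, 40x40, and 20x20 grids listed above; count_predictions is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Number of candidate boxes produced by a YOLOv5-style detection head.
// Assumes a square input and one square grid per stride, with each grid
// cell carrying `anchors_per_cell` anchor boxes.
inline long count_predictions(int input_size,
                              const std::vector<int>& strides,
                              int anchors_per_cell)
{
    long total = 0;
    for (int stride : strides) {
        assert(stride > 0 && input_size % stride == 0);
        const long grid = input_size / stride;   // 80, 40, 20 for a 640 input
        total += anchors_per_cell * grid * grid; // boxes contributed by this scale
    }
    return total;
}
```

For example, count_predictions(640, {8, 16, 32}, 3) gives 3 × (6400 + 1600 + 400) = 25200.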

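To go one step further, Output 0 is assembled from the other three. Those three correspond to different scales and are used to detect objects of different sizes, as shown in the figure below.

What Output 0 mainly carries is the 25200 predicted boxes, each with 85 values (for the COCO dataset). If I had to pick one picture that conveys what it looks like, it would be the one in the CSDN blog post by 同济子豪兄 on studying the YOLOv1 model (part 1).

That picture shows a 7×7×30 cuboid (49 cells × 30 values), whereas what we end up with here is a flat 25200×85 matrix.

Obtaining this is the ultimate purpose of the initialization, namely the last line of code: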

num_anchors = output_node_dims.at(0).at(1);

num_anchors is the 25200 in output[0], whose shape is [1, 25200, 85]: it is the number of predictions, each of which holds 85 values.
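Because output[0] is stored as one contiguous float buffer, prediction i starts at offset i × 85, and its fields follow the x, y, w, h, objectness, class-scores order described above. The following is a minimal sketch of reading one row under that layout. RowScore and best_class are hypothetical names for illustration, and the layout is an assumption taken from this article rather than something the function verifies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: the winning class of one prediction row.
struct RowScore {
    int class_id;  // index of the highest-scoring class
    float score;   // objectness multiplied by that class score
};

// Reads prediction `row` from a flat [num_rows, 5 + num_classes] buffer.
// Assumed row layout: x, y, w, h, objectness, then one score per class.
inline RowScore best_class(const float* out, std::size_t row, int num_classes)
{
    assert(out != nullptr && num_classes > 0);
    const std::size_t row_len = static_cast<std::size_t>(num_classes) + 5;
    const float* p = out + row * row_len;  // start of this prediction
    int best = 0;
    for (int j = 1; j < num_classes; ++j) {
        if (p[5 + j] > p[5 + best]) {
            best = j;
        }
    }
    return RowScore{ best, p[4] * p[5 + best] };
}
```

For a COCO model num_classes is 80, so each row is 85 floats long and the loop over rows runs num_anchors times.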

2.3 ONNX Runtime Inference Code Explained

The detection code is shown first:

void YOLO::detect(Mat& srcimg)
{
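    int newh = 0, neww = 0, top = 0, left = 0;
    generate_boxes.clear(); // drop boxes left over from a previous call (generate_boxes is not a local variable)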

    Mat cv_image = srcimg.clone();

    cvtColor(cv_image, cv_image, COLOR_BGR2RGB);

    Mat dst = resize_image(cv_image, &newh, &neww, &top, &left);

    vector<float> input_image_ = normalize_(dst);

    array<int64_t, 4> input_shape_{ 1, 3, inpHeight, inpWidth };

    auto allocator_info = MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU); // describes CPU memory

    Value input_tensor_ = Value::CreateTensor<float>(allocator_info, input_image_.data(), input_image_.size(), input_shape_.data(), input_shape_.size());

    vector<Value> ort_outputs = ort_session->Run(RunOptions{ nullptr }, &input_names[0], &input_tensor_, 1, output_names.data(), output_names.size());

    const float* outs = ort_outputs[0].GetTensorMutableData<float>();

    float ratioh = static_cast<float>(srcimg.rows) / newh, ratiow = static_cast<float>(srcimg.cols) / neww;
    int nout = class_names.size() + 5;

    for (int i = 0; i < num_anchors; i++)
    {
        const float* pdata = outs + i * nout;
        float obj_conf = pdata[4];

        if (obj_conf > objThreshold)
        {
            int max_ind = 0;
            float max_class_score = 0;

            // pick the class with the highest score for this box
            for (int j = 0; j < class_names.size(); j++)
            {
                if (pdata[5 + j] > max_class_score)
                {
                    max_class_score = pdata[5 + j];
                    max_ind = j;
                }
            }

            // convert (cx, cy, w, h) in network-input coordinates to corner coordinates in the source image
            float x0 = max<float>((pdata[0] - 0.5f * pdata[2] - left) * ratiow, 0.f);
            float y0 = max<float>((pdata[1] - 0.5f * pdata[3] - top) * ratioh, 0.f);
            float x1 = min<float>((pdata[0] + 0.5f * pdata[2] - left) * ratiow, static_cast<float>(cv_image.cols));
            float y1 = min<float>((pdata[1] + 0.5f * pdata[3] - top) * ratioh, static_cast<float>(cv_image.rows));

            generate_boxes.push_back(BoxInfo{ x0, y0, x1, y1, max_class_score * obj_conf, max_ind });
        }
    }

    nms(generate_boxes);

    for (const auto& box : generate_boxes)
    {
        rectangle(srcimg, Point(static_cast<int>(box.x1), static_cast<int>(box.y1)), Point(static_cast<int>(box.x2), static_cast<int>(box.y2)), Scalar(0, 0, 255), 2);
        string label = class_names[box.label] + ": " + format("%.2f%%", box.score * 100);
        putText(srcimg, label, Point(static_cast<int>(box.x1), static_cast<int>(box.y1) - 5), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0, 255, 0), 1);
    }
}
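The coordinate conversion inside the loop above can be isolated as a small function, which makes it easier to check by hand. This is a sketch under the assumption that resize_image (not shown in this article) performs a letterbox resize, scaling the image to neww × newh and padding it by left and top pixels. Box, Letterbox, and to_source_coords are hypothetical names introduced only for this example.

```cpp
#include <algorithm>
#include <cassert>

struct Box { float x0, y0, x1, y1; };               // corner coordinates (hypothetical type)
struct Letterbox { int new_w, new_h, left, top; };  // assumed outputs of resize_image

// Maps a (cx, cy, w, h) box from network-input coordinates back to the
// source image, using the same formula as the detect() loop.
inline Box to_source_coords(float cx, float cy, float w, float h,
                            const Letterbox& lb, int src_w, int src_h)
{
    assert(lb.new_w > 0 && lb.new_h > 0);
    const float ratio_w = static_cast<float>(src_w) / lb.new_w;
    const float ratio_h = static_cast<float>(src_h) / lb.new_h;
    Box b;
    // Remove the padding offset, rescale, then clamp to the image bounds.
    b.x0 = std::max((cx - 0.5f * w - lb.left) * ratio_w, 0.f);
    b.y0 = std::max((cy - 0.5f * h - lb.top) * ratio_h, 0.f);
    b.x1 = std::min((cx + 0.5f * w - lb.left) * ratio_w, static_cast<float>(src_w));
    b.y1 = std::min((cy + 0.5f * h - lb.top) * ratio_h, static_cast<float>(src_h));
    return b;
}
```

As a worked case, a 1280x720 source letterboxed into 640x640 would be scaled to 640x360 with top = 140, so a 100x100 box centered at (320, 320) in network coordinates maps to (540, 260) to (740, 460) in the source image.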

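After the image has been resized and normalized with OpenCV (skipped here, since this article focuses on ONNX Runtime), the first step is to set up the input:

    auto allocator_info = MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU); // describes CPU memory

This line describes where the tensor's memory lives. MemoryInfo::CreateCpu does not allocate anything itself; it creates a memory-info descriptor stating that the buffer resides in CPU memory.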

    Value input_tensor_ = Value::CreateTensor<float>(allocator_info, input_image_.data(), input_image_.size(), input_shape_.data(), input_shape_.size());

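This line creates the input tensor that passes the image data to the model for inference. The tensor wraps the existing buffer rather than copying it, so input_image_ must stay alive until Run returns. The arguments are:

allocator_info: the memory info defined above, indicating that the buffer resides in CPU memory

input_image_.data(): pointer to the float data of the preprocessed input image

input_image_.size(): number of float elements in the buffer (1 × 3 × 640 × 640 = 1,228,800 for a 640x640 input), not the number of pixels

input_shape_.data(): the shape of the input tensor, i.e. the dimensions of the input data, for example [1, 3, 640, 640]: batch size 1, a 3-channel RGB image, 640x640 in size.

input_shape_.size(): the number of dimensions in the shape, here 4 (for [1, 3, 640, 640])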
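The element count passed to CreateTensor has to equal the product of the shape dimensions. The sketch below shows that check together with one plausible way of filling the buffer. Since normalize_ is not shown in this article, hwc_to_chw is a hypothetical stand-in that assumes the usual conversion from interleaved 8-bit pixels to planar floats in [0, 1]; element_count is likewise an illustrative helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Product of all dimensions, e.g. {1, 3, 640, 640} -> 1228800.
inline std::size_t element_count(const std::vector<std::int64_t>& shape)
{
    std::size_t n = 1;
    for (std::int64_t d : shape) {
        assert(d > 0);
        n *= static_cast<std::size_t>(d);
    }
    return n;
}

// Hypothetical stand-in for normalize_: converts interleaved HWC bytes
// into planar CHW floats scaled to [0, 1].
inline std::vector<float> hwc_to_chw(const std::vector<unsigned char>& hwc,
                                     std::size_t height, std::size_t width,
                                     std::size_t channels)
{
    assert(hwc.size() == height * width * channels);
    std::vector<float> chw(hwc.size());
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t y = 0; y < height; ++y) {
            for (std::size_t x = 0; x < width; ++x) {
                chw[(c * height + y) * width + x] =
                    hwc[(y * width + x) * channels + c] / 255.0f;
            }
        }
    }
    return chw;
}
```

Before creating the tensor, input_image_.size() can be compared against element_count of the shape to catch a mismatch between the preprocessing and inpHeight/inpWidth.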

    vector<Value> ort_outputs = ort_session->Run(RunOptions{ nullptr }, &input_names[0], &input_tensor_, 1, output_names.data(), output_names.size());

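RunOptions{ nullptr }: the options for this run; passing nullptr selects the default behavior

&input_names[0]: pointer to the array of input names

&input_tensor_: pointer to the input tensor, which holds the image's pixel data, its shape, and so on

1: the number of input tensors, here 1.

output_names.data(): the array of output names, i.e. which outputs inference should return; usually one or more.

output_names.size(): the number of output tensors. The returned vector holds one Value per requested output, in the same order as output_names.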
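Run receives its input and output names as plain arrays of C strings (const char* const*), so something else has to own the characters for as long as the session is used. A minimal sketch of one way to organize this is to keep the names in std::string objects and build the pointer array from them. The helper as_c_str_array is hypothetical and is not how the class in this article stores its names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the pointer array that Run expects from a
// vector of owning strings. The pointers stay valid only while `names`
// is alive and unmodified.
inline std::vector<const char*> as_c_str_array(const std::vector<std::string>& names)
{
    std::vector<const char*> ptrs;
    ptrs.reserve(names.size());
    for (const std::string& name : names) {
        ptrs.push_back(name.c_str());
    }
    assert(ptrs.size() == names.size());
    return ptrs;
}
```

The result's data() and size() can then be passed where the code above passes output_names.data() and output_names.size().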