OpenCV 4.4.0 + YOLOv4 Video Test

Following on from the previous article, after configuring OpenCV 4.4.0 I tested YOLOv4 and its tiny variant in a Qt environment, and also ran YOLOv3 and its tiny variant for comparison. Perhaps because the test video was fairly simple, I did not notice an obvious difference between the two versions.

This article is based mainly on the article "OpenCV4实现YoloV3算法" (implementing YOLOv3 with OpenCV 4), but the video-processing part of that code has a few bugs, so I made some changes.

1. Download YOLOv4

Download the YOLO configuration and weight files from AlexeyAB/darknet.
Cloning the whole repository is fine; all we need from it are the cfg files.
The weight files are linked in the README further down; look around and you will find them.
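
Before wiring up the full pipeline, a quick way to confirm that the cfg/weights pair is usable is to load it once and print the layer count. This is just a minimal sketch; the paths below are placeholders for wherever you saved the files, and readNetFromDarknet will throw a cv::Exception if either file is missing or mismatched.

#include <iostream>
#include <opencv2/dnn.hpp>

int main()
{
    // Placeholder paths -- point these at your own cfg/weights location
    cv::dnn::Net net = cv::dnn::readNetFromDarknet(
        "/home/cloud/darknet-master4/cfg/yolov4.cfg",
        "/home/cloud/darknet-master4/yolov4.weights");
    std::cout << "Loaded " << net.getLayerNames().size() << " layers" << std::endl;
    return 0;
}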

2. Test Code

yolo.cpp

#include "yolo.h"

YOLOV3::YOLOV3(float confThreshold, float nmsThreshold, int inpWidth, int inpHeight)
{
    mfConfThreshold = confThreshold;
    mfNmsThreshold = nmsThreshold;

    mInpWidth = inpWidth;
    mInpHeight = inpHeight;
}

void YOLOV3::detect_image(string image_path, string modelWeights, string modelConfiguration, string classesFile, std::string& outputFile)
{
    // Load names of vClasses
    ifstream ifs(classesFile.c_str());
    std::string line;
    while (getline(ifs, line)) vClasses.push_back(line);

    // Load the network
    dnn::Net net = dnn::readNetFromDarknet(modelConfiguration, modelWeights);
    net.setPreferableBackend(dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(dnn::DNN_TARGET_CPU);

    // Create a window
    static const string kWinName = "Deep learning object detection in OpenCV";
    namedWindow(kWinName, WINDOW_AUTOSIZE);

    // Create a 4D blob from a frame.
    cv::Mat blob;
    cv::Mat frame = cv::imread(image_path);
    if (frame.empty()) {
        cout << "Could not read the input image: " << image_path << endl;
        return;
    }

    // Normalize pixel values to [0,1], resize to the network input size,
    // subtract no mean, and swap BGR to RGB
    dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(mInpWidth, mInpHeight), Scalar(0, 0, 0), true, false);

    // Sets the input to the network
    net.setInput(blob);

    // Runs the forward pass to get output of the output layers
    vector<Mat> outs;
    net.forward(outs, getOutputsNames(net));

    // Remove the bounding boxes with low confidence
    postprocess(frame, outs);

    // Put efficiency information. The function getPerfProfile returns the overall time for inference(t) and the timings for each of the layers(in layersTimes)
    std::vector<double> layersTimes;
    double freq = getTickFrequency() / 1000;
    double t = net.getPerfProfile(layersTimes) / freq;
    std::string label = format("Inference time for a frame : %.2f ms", t);

    putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 255));

    // Write the frame with the detection boxes
    imshow(kWinName, frame);
    cv::imwrite(outputFile, frame);
}

void YOLOV3::detect_video(string video_path, string modelWeights, string modelConfiguration, string classesFile, std::string& outputFile)
{
    // Load names of vClasses
    ifstream ifs(classesFile.c_str());
    string line;
    while (getline(ifs, line)) vClasses.push_back(line);

    // Load the network
    dnn::Net net = dnn::readNetFromDarknet(modelConfiguration, modelWeights);
    net.setPreferableBackend(dnn::DNN_BACKEND_OPENCV);
    net.setPreferableTarget(dnn::DNN_TARGET_CPU);

    // Open a video file or an image file or a camera stream.
    VideoCapture cap;
    VideoWriter video;

    Mat frame, blob;

    // Open the video file and verify that it can actually be read
    if (!cap.open(video_path)) {
        cout << "Could not open the input image/video stream" << endl;
        return;
    }

    // Initialize the video writer to save the output video. Use the source fps
    // so the output plays back at the original speed (fall back to 28 if unknown)
    double fps = cap.get(CAP_PROP_FPS);
    if (fps <= 0) fps = 28;
    video.open(outputFile, VideoWriter::fourcc('M', 'J', 'P', 'G'), fps,
               Size((int)cap.get(CAP_PROP_FRAME_WIDTH), (int)cap.get(CAP_PROP_FRAME_HEIGHT)));

    // Create a window
    static const string kWinName = "Deep learning object detection in OpenCV";
    namedWindow(kWinName, WINDOW_NORMAL);

    // Process frames.
    while (waitKey(1) < 0)
    {
        // get frame from the video
        cap >> frame;

        // Stop the program if reached end of video
        if (frame.empty()) {
            cout << "Done processing !!!" << endl;
            cout << "Output file is stored as " << outputFile << endl;
            waitKey(3000);
            break;
        }

        // Create a 4D blob from a frame.
        dnn::blobFromImage(frame, blob, 1 / 255.0, cv::Size(mInpWidth, mInpHeight), Scalar(0, 0, 0), true, false);

        //Sets the input to the network
        net.setInput(blob);

        // Runs the forward pass to get output of the output layers
        vector<Mat> outs;
        net.forward(outs, getOutputsNames(net));

        // Remove the bounding boxes with low confidence
        postprocess(frame, outs);

        // Put efficiency information. The function getPerfProfile returns the overall time for inference(t) and the timings for each of the layers(in layersTimes)
        vector<double> layersTimes;
        double freq = getTickFrequency() / 1000;
        double t = net.getPerfProfile(layersTimes) / freq;

        string label = format("Inference time for a frame : %.2f ms", t);

        cv::putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 255));

        // Write the frame with the detection boxes
        Mat detectedFrame;
        frame.convertTo(detectedFrame, CV_8U);
        video.write(detectedFrame);

        imshow(kWinName, frame);
    }

    cap.release();
    video.release();
}

// Scan through all the bounding boxes output from the network and keep only the
// ones with high confidence scores. Assign the box's class label as the class
// with the highest score for the box.
void YOLOV3::postprocess(cv::Mat& frame, const std::vector<cv::Mat>& outs)
{
    for (size_t i = 0; i < outs.size(); ++i)
    {
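        // Each row of outs[i] is one detection:
        // [center_x, center_y, width, height, objectness, class_0 score, ..., class_N score],
        // where the box coordinates are normalized to [0, 1] relative to the frame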
        float* data = (float*)outs[i].data;
        for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)
        {
            cv::Point2i classIdPoint;
            double confidence;
            cv::Mat scores = outs[i].row(j).colRange(5, outs[i].cols);

            // Get the maximum score value in a matrix or vector and locate it
            minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);

            if (confidence > mfConfThreshold)
            {
                // Get the parameters of the rectangular box.
                int centerX = (int)(data[0] * frame.cols);
                int centerY = (int)(data[1] * frame.rows);
                int width = (int)(data[2] * frame.cols);
                int height = (int)(data[3] * frame.rows);

                int left = centerX - width / 2;
                int top = centerY - height / 2;

                if (left < 0) left = 0;
                if (top < 0) top = 0;

                vClassIds.push_back(classIdPoint.x);
                vConfidences.push_back((float)confidence);
                vBoxes.push_back(Rect(left, top, width, height));
            }
        }
    }

    // Perform non maximum suppression to eliminate redundant overlapping boxes with lower confidences
    dnn::NMSBoxes(vBoxes, vConfidences, mfConfThreshold, mfNmsThreshold, vIndices);

    for (size_t i = 0; i < vIndices.size(); ++i)
    {
        int idx = vIndices[i];
        Rect box = vBoxes[idx];

        int right = box.x + box.width;
        int bottom = box.y + box.height;

        if (right > frame.cols) right = frame.cols;
        if (bottom > frame.rows) bottom = frame.rows;


        drawPred(vClassIds[idx], vConfidences[idx], box.x, box.y, right, bottom, frame);
    }
    // Clear the per-frame results; note that it is vClassIds (not vClasses, which
    // holds the label names) that must be cleared, otherwise class indices
    // accumulate across frames and labels go wrong from the second frame on
    vIndices.clear();
    vBoxes.clear();
    vClassIds.clear();
    vConfidences.clear();
}

// Draw the predicted bounding box
void YOLOV3::drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
    //Draw a rectangle displaying the bounding box
    rectangle(frame, Point(left, top), Point(right, bottom), Scalar(255, 178, 50), 3);

    //Get the label for the class name and its confidence
    string label = format("%.2f", conf);
    if (!vClasses.empty())
    {
        CV_Assert(classId < (int)vClasses.size());
        label = vClasses[classId] + ":" + label;
    }

    //Display the label at the top of the bounding box
    int baseLine = 0;
    Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
    top = max(top, labelSize.height);

    cv::rectangle(frame, Point(left, top - round(1.5*labelSize.height)), Point(left + round(1.5*labelSize.width), top + labelSize.height), Scalar(255, 255, 255), FILLED);
    cv::putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0, 0, 0), 1);
}

// Get the names of the output layers
std::vector<String> YOLOV3::getOutputsNames(const cv::dnn::Net& net)
{
    static vector<String> names;
    if (names.empty())
    {
        //Get the indices of the output layers, i.e. the layers with unconnected outputs
        vector<int> outLayers = net.getUnconnectedOutLayers();

        //get the names of all the layers in the network
        vector<String> layersNames = net.getLayerNames();

        // Get the names of the output layers in names
        names.resize(outLayers.size());
        for (size_t i = 0; i < outLayers.size(); ++i)
            names[i] = layersNames[outLayers[i] - 1];
    }
    return names;
}
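
A side note on getOutputsNames: in OpenCV 4.x the same lookup is available as a single built-in call, so the function above can be reduced to a one-liner if you prefer:

// Built-in equivalent in OpenCV 4.x
std::vector<cv::String> outNames = net.getUnconnectedOutLayersNames();
net.forward(outs, outNames);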

yolo.h

#ifndef YOLO_H
#define YOLO_H

#include <fstream>
#include <sstream>
#include <iostream>

#include <opencv2/opencv.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>

using namespace cv;
using namespace std;

class YOLOV3
{
public:
    YOLOV3(float confThreshold = 0.5, float nmsThreshold = 0.4, int inpWidth = 416, int inpHeight = 416);

    void detect_image(std::string image_path, std::string modelWeights, std::string modelConfiguration, std::string classesFile, std::string& outputFile);
    void detect_video(std::string video_path, std::string modelWeights, std::string modelConfiguration, std::string classesFile, std::string& outputFile);

    // Remove the bounding boxes with low confidence using non-maxima suppression
    void postprocess(cv::Mat& frame, const std::vector<cv::Mat>& outs);

    // Get the names of the output layers
    std::vector<String> getOutputsNames(const cv::dnn::Net& net);

    // Draw the predicted bounding box
    void drawPred(int classId, float conf, int left, int top, int right, int bottom, cv::Mat& frame);

private:

    // Initialize the parameters
    float mfConfThreshold;          // Confidence threshold
    float mfNmsThreshold;           // Non-maximum suppression threshold
    int mInpWidth;                  // Width of network's input image
    int mInpHeight;                 // Height of network's input image

    std::vector<int> vClassIds;     // The index corresponding to the category name
    std::vector<string> vClasses;   // Classification name of a category
    std::vector<float> vConfidences;// Maximum confidence greater than confidence threshold
    std::vector<cv::Rect> vBoxes;   // Various category boxes
    std::vector<int> vIndices;      // Candidate box index after non-maximum suppression
};


#endif // YOLO_H

main.cpp

#include <string>
#include "yolo.h"

#define YOLOV3CFG      "/home/cloud/darknet-master4/cfg/yolov3.cfg"
#define YOLOV3WEIGHTS  "/home/cloud/darknet-master4/yolov3.weights"
#define YOLOV3tCFG     "/home/cloud/darknet-master4/cfg/yolov3-tiny.cfg"
#define YOLOV3tWEIGHTS "/home/cloud/darknet-master4/yolov3-tiny.weights"
#define YOLOV4CFG      "/home/cloud/darknet-master4/cfg/yolov4.cfg"
#define YOLOV4WEIGHTS  "/home/cloud/darknet-master4/yolov4.weights"
#define YOLOV4tCFG     "/home/cloud/darknet-master4/cfg/yolov4-tiny.cfg"
#define YOLOV4tWEIGHTS "/home/cloud/darknet-master4/yolov4-tiny.weights"

int main()
{
    // Give the configuration and weight files for the model
    string modelConfiguration =  YOLOV4CFG;
    string modelWeights =  YOLOV4WEIGHTS;
    string classesFile = "/home/cloud/darknet-master4/data/coco.names";

    // Enter an image or video
    string image_path = "/home/cloud/darknet-master4/data/person.jpg";
    string video_path = "/home/cloud/darknet-master4/1.mp4";

    // Output path settings
    std::string image_outputFile = "/home/cloud/darknet-master4/yolov4.jpg";
    std::string video_outputFile = "/home/cloud/darknet-master4/yolov4_out.avi";

    // Confidence threshold, NMS threshold, network input width, network input height
    YOLOV3 yolov3(0.5, 0.3, 416, 416);

    // Test on an image
//    yolov3.detect_image(image_path, modelWeights, modelConfiguration, classesFile, image_outputFile);

    // Test on a video
    yolov3.detect_video(video_path, modelWeights, modelConfiguration, classesFile, video_outputFile);

    cv::waitKey(0);
    return 0;
}
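
One more note: the code above pins inference to the CPU via DNN_BACKEND_OPENCV / DNN_TARGET_CPU. If your OpenCV 4.4.0 was built with the CUDA DNN module enabled (an assumption; it is off in default builds), the two setPreferable* lines in detect_image/detect_video can be switched to run on the GPU:

// Requires an OpenCV build with CUDA DNN support (available since OpenCV 4.2)
net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);  // or DNN_TARGET_CUDA_FP16 on newer GPUs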

3. Results

[Result image]
