OpenCV: Some Study Notes on DNN

夜雨_小学徒

Article link: https://blog.csdn.net/m0_37992521/article/details/105156693

1. What is a DNN?

DNN stands for deep neural network. It is the foundation of deep learning.

2. Commonly used DNN APIs in OpenCV

(1) APIs for loading a network model

Net cv::dnn::readNet(const String &model, const String &config = "", const String &framework = "")
Net cv::dnn::readNetFromCaffe(const String &prototxt, const String &caffeModel = String())
Net cv::dnn::readNetFromTensorflow(const String &model, const String &config = String())
Net cv::dnn::readNetFromTorch(const String &model, bool isBinary = true, bool evaluate = true)
Net cv::dnn::readNetFromDarknet(const String &cfgFile, const String &darknetModel = String())

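The model file is a binary file containing the trained weights; models from different frameworks use different file extensions.

The config file is a text file containing the network configuration; its extension also depends on the framework.

(1) Caffe: *.caffemodel (model), *.prototxt (config);

(2) TensorFlow: *.pb (model), *.pbtxt (config);

(3) Darknet: *.weights (model), *.cfg (config);

(4) Torch: *.t7 (model), with no separate config file;

(5) DLDT: *.bin (model), *.xml (config).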
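To make the pairing above concrete, here is a minimal sketch that maps a model file's extension to its framework. `frameworkFromExtension` is a hypothetical helper written for this note, not an OpenCV function; `cv::dnn::readNet` does its own detection and this sketch does not claim to reproduce it.

```cpp
#include <string>

// Hypothetical helper (not part of OpenCV): returns the framework that
// conventionally uses the given model-file extension, or "unknown".
std::string frameworkFromExtension(const std::string& path)
{
    const std::string::size_type dot = path.rfind('.');
    if (dot == std::string::npos)
        return "unknown";                      // no extension at all
    const std::string ext = path.substr(dot + 1);
    if (ext == "caffemodel") return "caffe";
    if (ext == "pb")         return "tensorflow";
    if (ext == "weights")    return "darknet";
    if (ext == "t7")         return "torch";
    if (ext == "bin")        return "dldt";
    return "unknown";
}
```

For example, `frameworkFromExtension("MobileNetSSD_deploy.caffemodel")` yields `"caffe"`, which matches the loader (`readNetFromCaffe`) used in the application at the end of these notes.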

(2) Converting the input image to the model's standard input

Mat cv::dnn::blobFromImage (InputArray image, double scalefactor = 1.0, 
const Size & size = Size(), 
const Scalar & mean = Scalar(), 
bool swapRB = false, 
bool crop = false, 
int ddepth = CV_32F )

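Apart from the first parameter, the input image, the values passed to this function are determined entirely by the network model being used. The second parameter (scalefactor) is the multiplier applied to pixel values. The third (size) is the spatial size the image is resized to, which must match the model's input size. The fourth (mean) is the per-channel mean subtracted from the image; the subtraction is applied before the scaling. The fifth (swapRB) indicates whether the R and B channels are swapped. The sixth (crop) indicates whether the image is cropped after resizing. The seventh (ddepth) is the depth of the output blob. The returned blob is a 4-D Mat in NCHW order.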
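The arithmetic behind these parameters can be sketched without OpenCV. The function below is an illustration only, not the library's implementation: `toBlob` is a hypothetical name, it skips resizing and cropping, and it assumes the mean is given in the order of the output channels.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the per-pixel arithmetic behind blobFromImage (no resize, no crop).
// Input:  interleaved 8-bit pixels in H x W x 3 order (BGR, as imread returns them).
// Output: planar floats in 3 x H x W order, each value = (pixel - mean) * scale.
// Assumption of this sketch: mean[c] is indexed by OUTPUT channel.
std::vector<float> toBlob(const std::vector<std::uint8_t>& bgr,
                          std::size_t height, std::size_t width,
                          double scale, const double mean[3], bool swapRB)
{
    std::vector<float> blob(3 * height * width);
    for (std::size_t c = 0; c < 3; ++c)
    {
        // With swapRB, output channel 0 reads input channel 2 and vice versa.
        const std::size_t src = swapRB ? 2 - c : c;
        for (std::size_t y = 0; y < height; ++y)
        {
            for (std::size_t x = 0; x < width; ++x)
            {
                const double pixel = bgr[(y * width + x) * 3 + src];
                blob[(c * height + y) * width + x] =
                    static_cast<float>((pixel - mean[c]) * scale);
            }
        }
    }
    return blob;
}
```

With a mean of 127.5 and a scale of 0.007843 (about 1/127.5), the values used in the SSD application in section 4, 8-bit pixel values in [0, 255] are mapped to roughly [-1, 1].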

(3) Setting the model input

void cv::dnn::Net::setInput (InputArray blob, const String & name = "", 
double scalefactor = 1.0, const Scalar & mean = Scalar() )

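The first parameter is the model's standard input data, i.e. the result produced by blobFromImage. The second parameter is the name of the model's input layer, which has to be looked up in the model definition.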

(4) Running the model and getting the output

Mat cv::dnn::Net::forward (const String & outputName = String())

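The parameter is the name of the output layer: forward runs the network up to that layer and returns its output. For a detection network such as SSD, the result is a 4-D array: the first two dimensions are 1, the third is the number of detected boxes, and the fourth holds each box's class label, score, and coordinates. The box coordinates are floating-point ratios of the image size, so they must be converted to pixel coordinates before drawing.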
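The ratio-to-pixel conversion mentioned above can be sketched as follows. `PixelBox`, `clampTo`, and `toPixelBox` are hypothetical names for this note. Clamping to the image bounds is an extra precaution added here, on the assumption that a ratio may fall slightly outside [0, 1].

```cpp
#include <algorithm>

// Pixel-space box: top-left corner plus width and height.
struct PixelBox { int x; int y; int width; int height; };

// Restricts v to the closed range [lo, hi].
static int clampTo(int v, int lo, int hi)
{
    return std::max(lo, std::min(v, hi));
}

// Converts corner coordinates given as fractions of the image size
// (left, top, right, bottom) into a pixel box clamped to the image.
PixelBox toPixelBox(float left, float top, float right, float bottom,
                    int imageWidth, int imageHeight)
{
    const int x1 = clampTo(static_cast<int>(left   * imageWidth),  0, imageWidth  - 1);
    const int y1 = clampTo(static_cast<int>(top    * imageHeight), 0, imageHeight - 1);
    const int x2 = clampTo(static_cast<int>(right  * imageWidth),  0, imageWidth  - 1);
    const int y2 = clampTo(static_cast<int>(bottom * imageHeight), 0, imageHeight - 1);
    PixelBox box;
    box.x = x1;
    box.y = y1;
    box.width  = std::max(0, x2 - x1);   // width and height come from the corner difference
    box.height = std::max(0, y2 - y1);
    return box;
}
```

Note that the result stores a width and a height rather than the second corner, which is also the convention of the four-integer `cv::Rect` constructor.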

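3. Parsing the network output

(1) If the object detection network is SSD/RCNN/Faster-RCNN, the output has the form N*7, so it is parsed as follows

Mat detectionMat(out.size[2], out.size[3], CV_32F, out.ptr<float>());

Here 7 means seven columns per row: the first column is the image index within the batch, the second is the class label, the third is the confidence, and the fourth to seventh are the box coordinates (left, top, right, bottom).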
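The seven-column layout can be read from a plain float buffer, as in the sketch below. `Detection` and `parseDetections` are hypothetical names for this note; the column meanings follow the description above.

```cpp
#include <cstddef>
#include <vector>

// One row of the N x 7 detection matrix (the image index in column 0 is skipped).
struct Detection
{
    int   classId;                    // column 1
    float confidence;                 // column 2
    float left, top, right, bottom;   // columns 3..6, as fractions of the image size
};

// Walks a row-major N x 7 float buffer and keeps the rows whose
// confidence is above the threshold.
std::vector<Detection> parseDetections(const float* data, std::size_t rows,
                                       float threshold)
{
    std::vector<Detection> kept;
    for (std::size_t i = 0; i < rows; ++i)
    {
        const float* row = data + i * 7;
        if (row[2] > threshold)
        {
            kept.push_back({static_cast<int>(row[1]), row[2],
                            row[3], row[4], row[5], row[6]});
        }
    }
    return kept;
}
```

With OpenCV, `data` would be `out.ptr<float>()` and `rows` would be `out.size[2]`, the same values passed to the `Mat` constructor above.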

(2) If the object detection network is a region-based YOLO network, the output is parsed like this instead

Mat scores=outs[i].row(j).colRange(5,outs[i].cols);

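This selects columns 5 through cols-1 of row j of the i-th output layer, all of which are class scores. The first five columns are cx, cy, w, h, and the confidence.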
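A single YOLO row can be decoded as in the sketch below, assuming the layout just described: cx, cy, w, h, confidence, then one score per class. `YoloHit` and `parseYoloRow` are hypothetical names for this note. The box is converted from center form to top-left form in the same units as the inputs.

```cpp
#include <cstddef>

// Result of scanning one YOLO output row.
struct YoloHit
{
    int   classId;                    // index of the best class score, -1 if there are no class columns
    float score;                      // value of that best score
    float left, top, width, height;   // box in the same units as cx, cy, w, h
};

// Assumed row layout: [cx, cy, w, h, confidence, score_0, ..., score_{n-1}].
// The caller must pass a row with at least 5 columns.
YoloHit parseYoloRow(const float* row, std::size_t cols)
{
    YoloHit hit;
    hit.classId = -1;
    hit.score = 0.0f;
    // Columns 5..cols-1 are class scores; keep the largest one.
    for (std::size_t k = 5; k < cols; ++k)
    {
        if (hit.classId < 0 || row[k] > hit.score)
        {
            hit.classId = static_cast<int>(k - 5);
            hit.score = row[k];
        }
    }
    // Convert the center-based box to a top-left-based box.
    hit.width  = row[2];
    hit.height = row[3];
    hit.left   = row[0] - row[2] / 2.0f;
    hit.top    = row[1] - row[3] / 2.0f;
    return hit;
}
```

With OpenCV, the same search for the best score could be done by calling `cv::minMaxLoc` on the `scores` matrix.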

4. Application: object detection with SSD

Code

#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include<iostream>
using namespace cv;
using namespace cv::dnn;
using namespace std;

const int width = 300;// the SSD model's input size is 300*300*3
const int height = 300;
string label_file = "labelmap_det.txt";
string model_file = "MobileNetSSD_deploy.caffemodel";
string model_text_file = "MobileNetSSD_deploy.prototxt";
String objNames[] = { "background",
"aeroplane", "bicycle", "bird", "boat",
"bottle", "bus", "car", "cat", "chair",
"cow", "diningtable", "dog", "horse",
"motorbike", "person", "pottedplant",
"sheep", "sofa", "train", "tvmonitor" };

int main()
{
	Mat frame = imread("2.jpg");
	if (frame.empty())
	{
		cout << "could not read image..." << endl;
		return -1;
	}
	Net net = readNetFromCaffe(model_text_file, model_file);
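	Mat blobImage = blobFromImage(frame, 0.007843, Size(width, height), Scalar(127.5, 127.5, 127.5), true, false);// convert the input image to the model's standard input
	// the blob is 4-D (NCHW), so its size is read from size[] rather than cols/rows
	cout << "blobImage width=" << blobImage.size[3] << ",height=" << blobImage.size[2] << endl;
	net.setInput(blobImage, "data");// set the model input
	Mat detection = net.forward("detection_out");// run the network; the output is 4-D: 1 x 1 x N x 7 (N detections, 7 values each)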
	vector<double>layersTiming;// per-layer timings
	double freq = getTickFrequency() / 1000;// ticks per millisecond
	double time = net.getPerfProfile(layersTiming) / freq;// total inference time in milliseconds
	cout << "execute time:" << time << " ms" << endl;
	
	Mat detectionMat(detection.size[2], detection.size[3], CV_32F, detection.ptr<float>());// view the detections as an N x 7 matrix
	float confidence_threshold = 0.5;
	for (int i = 0; i < detectionMat.rows; i++)
	{
		float confidence = detectionMat.at<float>(i, 2);// the third column is the confidence
		if(confidence>confidence_threshold)
		{
			size_t objectIdx = (size_t)detectionMat.at<float>(i, 1);// the second column is the class label
			float tl_x = detectionMat.at<float>(i, 3)*frame.cols;// the last four columns are the box corner coordinates
			float tl_y = detectionMat.at<float>(i, 4)*frame.rows;// they are floating-point ratios, so convert them to pixel coordinates
			float br_x = detectionMat.at<float>(i, 5)*frame.cols;
			float br_y = detectionMat.at<float>(i, 6)*frame.rows;
			// build the Rect from its two corners; Rect(x, y, w, h) would treat br_x/br_y as width/height
			Rect object_box(Point((int)tl_x, (int)tl_y), Point((int)br_x, (int)br_y));
			rectangle(frame, object_box, Scalar(0, 0, 255), 2, 8, 0);
			putText(frame, format("confidence %.2f,%s", confidence, objNames[objectIdx].c_str()), Point((int)tl_x - 10, (int)tl_y - 5),
				FONT_HERSHEY_SIMPLEX, 0.7, Scalar(255, 0, 0), 2, 8);
			cout << "confidence:" << confidence << ",object name:" << objNames[objectIdx].c_str() << endl;
		}
	}
	imshow("frame", frame);
	waitKey(0);
	return 0;
}

Result