1 ncnn
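ncnn is a high-performance neural network inference framework optimized for mobile platforms. It was designed from the start with mobile deployment and use in mind: it has no third-party dependencies, it is cross-platform, and according to the project it runs faster on mobile CPUs than any other known open-source framework. With ncnn, developers can port deep learning algorithms to phones, run them efficiently, and build AI-powered apps. ncnn is already used in several Tencent applications, including QQ, Qzone, WeChat, and Pitu (天天P图).
Feature overview
- Supports convolutional neural networks, including multi-input and multi-branch structures; individual branches can be computed on their own
- No third-party library dependencies; does not rely on BLAS, NNPACK, or other compute libraries
- Pure C++ implementation, cross-platform, supports Android, iOS, and others
- Hand-tuned ARM NEON assembly-level optimization for fast computation
- Careful memory management and data structure design for a very low memory footprint
- Multi-core parallel acceleration, with scheduling optimized for ARM big.LITTLE CPUs
- GPU acceleration based on the low-overhead Vulkan API
- Extensible model design; supports 8-bit quantization and half-precision floating-point storage; imports caffe/pytorch/mxnet/onnx/darknet/keras/tensorflow(mlir) models
- Loads network models by direct zero-copy memory reference
- Custom layer implementations can be registered to extend the framework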
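2 MTCNN face detection
Original paper: MTCNN_face_detection_alignment
MTCNN, short for Multi-task Cascaded Convolutional Networks, trains a cascade of convolutional neural networks with multi-task learning to combine face detection and face alignment. It follows the idea of the Viola-Jones (VJ) face detector, which cascades weak classifiers into one strong classifier; in the deep learning era, the weak classifiers are replaced by convolutional neural networks, which are far better at feature extraction. MTCNN has three stages: in the first, P-Net generates a large number of candidate face windows; in the second, R-Net rejects most of the non-face windows; in the third, O-Net produces the final face bounding boxes and the coordinates of five facial landmarks.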
P-Net
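P-Net stands for Proposal Network and is a fully convolutional network. Kernel shapes below are written as input channels × output channels × height × width. The input image is $12\times12\times3$. It passes through convolution layer `conv1` (kernel $3\times10\times3\times3$), activation layer `PReLU1`, and max pooling layer `pool1` ($2\times2$), giving $5\times5\times10$; then through convolution layer `conv2` (kernel $10\times16\times3\times3$) and activation layer `PReLU2`, giving $3\times3\times16$; then through convolution layer `conv3` (kernel $16\times32\times3\times3$) and activation layer `PReLU3`, giving $1\times1\times32$. From here the network splits into two branches. The first branch goes through convolution layer `conv4-1`, giving $1\times1\times2$, and then `Softmax`, giving $1\times1\times2$; the second branch goes through convolution layer `conv4-2`, giving $1\times1\times4$. Convolution layers use stride 1, pooling layers use stride 2, and no layer uses padding.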
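P-Net has the same structure in training and in inference, but its input differs. During training the input size is fixed at $12\times12\times3$, and the outputs are $1\times1\times2$ and $1\times1\times4$: the face classification result and the relative bounding-box offsets for that $12\times12$ window. During inference, because P-Net is fully convolutional, an image of arbitrary size can be used as input. Convolution replaces an explicit sliding $12\times12$ window, and the outputs are $n\times m\times2$ and $n\times m\times4$, the results for $n\times m$ windows on the image.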
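To make the relation between the input size and the $n\times m$ output grid concrete, here is a minimal sketch that follows the layer list above. The function names are hypothetical, and the sketch assumes that pooling rounds up (Caffe-style ceil mode) while convolution rounds down; it works on the resized image and leaves out the mapping back to the original image.

```cpp
#include <cmath>

// Output side of a sliding window (convolution or pooling) without padding.
// ceil_mode = true rounds up, which is an assumption about the pooling layers.
int window_out(int in, int kernel, int stride, bool ceil_mode) {
    const double steps = static_cast<double>(in - kernel) / stride;
    return static_cast<int>(ceil_mode ? std::ceil(steps) : std::floor(steps)) + 1;
}

// Side of the P-Net output grid for an input side of `in` pixels.
int pnet_out(int in) {
    int s = window_out(in, 3, 1, false);  // conv1: 3x3, stride 1
    s = window_out(s, 2, 2, true);        // pool1: 2x2, stride 2
    s = window_out(s, 3, 1, false);       // conv2: 3x3, stride 1
    s = window_out(s, 3, 1, false);       // conv3: 3x3, stride 1
    return s;
}

struct Window { int x1, y1, x2, y2; };

// 12x12 window of the resized image that corresponds to output cell (row, col).
// The only stride in the network is the pooling stride of 2.
Window cell_to_window(int row, int col) {
    const int stride = 2;
    const int cell = 12;
    return { col * stride, row * stride, col * stride + cell, row * stride + cell };
}
```

A $12\times12$ input gives a single output cell, while a $100\times100$ input gives a $45\times45$ grid, so one forward pass scores 2025 windows.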
P-Net parameter counts are shown in the table below.
Layer | Parameters (thousands) |
---|---|
conv1 | 0.27 |
conv2 | 1.44 |
conv3 | 4.61 |
conv4-1 and conv4-2 | 0.51 |
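P-Net has only 6.83k parameters, which at 4 bytes per floating-point parameter is about 26 KB of memory. Testing nevertheless shows that it accounts for a large share of the inference time. The reason is that the image goes through an image pyramid before entering P-Net, and every rescaled image has to be run through P-Net to obtain candidate boxes on the original image, which is very time-consuming.
Why is an image pyramid needed? What is an image pyramid? Which parameters determine the number of pyramid levels?
Faces in an image appear at different scales, and a face detector must be robust to the scale of its target. MTCNN handles this with an image pyramid: the original image is rescaled repeatedly by a fixed ratio, preserving the aspect ratio, to produce images at multiple scales.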
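The arithmetic behind these figures can be sketched as follows. The helper names are hypothetical, the weight count ignores biases, and a float is assumed to occupy 4 bytes as stated above.

```cpp
#include <cstddef>

// Weights of a convolution layer: input channels * output channels * k * k
// (biases are ignored in this sketch).
std::size_t conv_weights(std::size_t in_ch, std::size_t out_ch, std::size_t k) {
    return in_ch * out_ch * k * k;
}

// Bytes needed to store `params` values at 4 bytes each (float32 assumed).
std::size_t float32_bytes(std::size_t params) {
    return params * 4;
}
```

With the kernel shapes given earlier, `conv_weights` yields 270, 1440 and 4608 for `conv1`, `conv2` and `conv3`, matching the first three table rows, and `float32_bytes(6830)` is 27,320 bytes, roughly 26.7 KiB.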
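In MTCNN, the number of pyramid levels determines how many rescaled images are fed to P-Net; the fewer the levels, the faster P-Net runs. The number of levels depends on three parameters: the input image size, `minsize`, and `factor`. The pyramid is built as follows: first rescale the original image by 12/`minsize`, then repeatedly rescale the previous result by `factor` until the shorter side is less than or equal to 12. For a given image size, a larger `minsize` and a smaller `factor` therefore produce fewer levels and less computation.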
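The pyramid construction described above can be sketched as a small standalone function. The name `pyramid_scales` and its signature are illustrative only; the loop follows the same steps as the description (first rescale by 12/minsize, then multiply by the factor until the shorter side is at most 12).

```cpp
#include <vector>

// Scales at which the image is fed to P-Net.
// min_side: shorter side of the original image, in pixels.
std::vector<float> pyramid_scales(int min_side, int minsize, float factor = 0.709f) {
    const int kMinDetSize = 12;                         // P-Net input size
    float m = static_cast<float>(kMinDetSize) / minsize;
    float minl = min_side * m;                          // shorter side after the first rescale
    std::vector<float> scales;
    while (minl > kMinDetSize) {                        // stop once the shorter side is <= 12
        scales.push_back(m);
        minl *= factor;
        m *= factor;
    }
    return scales;
}
```

The size of the returned vector is the number of pyramid levels, that is, the number of P-Net forward passes for one image.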
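The official scaling factor is 0.709, because $0.709 \approx \sqrt{2}/2$: each rescaling halves the image area, a compromise between computational cost and the number of pyramid levels.
It follows that for high-resolution images, such as 1080p, the number of pyramid levels grows and P-Net becomes quite slow. In that case, increasing `minsize` is usually the most effective way to speed it up.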
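As a worked estimate under the construction described above, write $L$ for the shorter side of the image and assume $L > minsize$. Level $n$ is processed while $L \cdot \frac{12}{minsize} \cdot factor^{\,n} > 12$, so the number of levels $N$ is

$$N = \left\lceil \frac{\ln(L / minsize)}{\ln(1 / factor)} \right\rceil$$

For a 1080p image ($L = 1080$) with $factor = 0.709$, this gives $\lceil 9.58 \rceil = 10$ levels for $minsize = 40$ and $\lceil 7.57 \rceil = 8$ levels for $minsize = 80$. Because each level has half the area of the previous one, the first levels dominate the cost, which is why raising `minsize` pays off.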
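How do non-maximum suppression and bounding box regression work in MTCNN at inference time?
Once the face regions found at each scale have been mapped back to positions on the original image, they go through NMS and bounding box regression. NMS (non-maximum suppression) removes redundant boxes. All face boxes are first sorted by confidence, and the highest-scoring box is selected and kept. Every remaining box whose IoU with it exceeds a preset threshold is deleted. The highest-scoring box among those not yet processed is selected next, and the procedure repeats until no boxes remain to be processed, leaving the suppressed set of face boxes.
Bounding box regression corrects the position of the boxes output by P-Net. P-Net also outputs four two-dimensional matrices dx1, dy1, dx2, dy2, the same size as the face score matrix, which hold relative offsets for the top-left and bottom-right corners of each face region.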
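The greedy procedure can be sketched as follows. The `Box` type and function names are hypothetical, and the sketch uses plain union-based IoU on floating-point corners. It is written as "keep a box unless it overlaps an already kept box", which selects the same boxes as the delete-based description above.

```cpp
#include <algorithm>
#include <vector>

struct Box { float x1, y1, x2, y2, score; };

// Intersection over union of two axis-aligned boxes.
float iou(const Box& a, const Box& b) {
    const float iw = std::min(a.x2, b.x2) - std::max(a.x1, b.x1);
    const float ih = std::min(a.y2, b.y2) - std::max(a.y1, b.y1);
    if (iw <= 0.0f || ih <= 0.0f) return 0.0f;   // no overlap
    const float inter = iw * ih;
    const float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    return inter / (area_a + area_b - inter);
}

// Greedy NMS: visit boxes from highest to lowest score and keep a box only if
// its IoU with every box kept so far does not exceed the threshold.
std::vector<Box> greedy_nms(std::vector<Box> boxes, float threshold) {
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.score > b.score; });
    std::vector<Box> kept;
    for (const Box& candidate : boxes) {
        bool suppressed = false;
        for (const Box& k : kept) {
            if (iou(candidate, k) > threshold) { suppressed = true; break; }
        }
        if (!suppressed) kept.push_back(candidate);
    }
    return kept;
}
```

For example, two boxes that overlap with an IoU of about 0.68 collapse to the higher-scoring one at a threshold of 0.5, while a distant third box is kept.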
$$x_{1(cal)} = x_{1(origin)} + bbw \times dx_1, \quad y_{1(cal)} = y_{1(origin)} + bbh \times dy_1$$
$$x_{2(cal)} = x_{2(origin)} + bbw \times dx_2, \quad y_{2(cal)} = y_{2(origin)} + bbh \times dy_2$$
where $bbw = x_{2(origin)} - x_{1(origin)}$ and $bbh = y_{2(origin)} - y_{1(origin)}$.
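A minimal sketch of these four equations, with a hypothetical `Rect4` type and the offsets passed in the order dx1, dy1, dx2, dy2:

```cpp
struct Rect4 { float x1, y1, x2, y2; };

// Applies relative offsets to a box: each corner moves by a fraction of the
// box width (x coordinates) or box height (y coordinates).
Rect4 apply_regression(const Rect4& box, const float offsets[4]) {
    const float bbw = box.x2 - box.x1;   // box width
    const float bbh = box.y2 - box.y1;   // box height
    return { box.x1 + bbw * offsets[0], box.y1 + bbh * offsets[1],
             box.x2 + bbw * offsets[2], box.y2 + bbh * offsets[3] };
}
```

For a box (0, 0, 80, 40) and offsets (0.125, -0.125, -0.25, 0.25), the corrected box is (10, -5, 60, 50); coordinates that leave the image still have to be clamped afterwards.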
This yields the face proposal boxes from P-Net. Each candidate box is then resized to $24\times24$ to form the input of R-Net.
R-Net
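R-Net stands for Refine Network and is a convolutional neural network. The input image is $24\times24\times3$. It passes through convolution layer `conv1` (kernel $3\times28\times3\times3$), activation layer `PReLU1`, and max pooling layer `pool1` ($3\times3$), giving $11\times11\times28$; then through convolution layer `conv2` (kernel $28\times48\times3\times3$), activation layer `PReLU2`, and max pooling layer `pool2` ($3\times3$), giving $4\times4\times48$; then through convolution layer `conv3` (kernel $48\times64\times2\times2$) and activation layer `PReLU3`, giving $3\times3\times64$; then through fully connected layer `conv4` and activation layer `PReLU4`. From here the network splits into two branches. One goes through layer `conv5-1`, giving $1\times1\times2$, and then `Softmax`, giving $1\times1\times2$; the other goes through layer `conv5-2`, giving $1\times1\times4$. Convolution layers use stride 1, pooling layers use stride 2, and no layer uses padding.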
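The feature-map sizes quoted for R-Net only work out if the pooling layers round up, so the following sketch traces them explicitly. The helper names are hypothetical, and ceil-mode pooling is an assumption inferred from the stated sizes.

```cpp
#include <cmath>

// Output side of a convolution without padding.
int conv_out(int in, int kernel, int stride = 1) {
    return (in - kernel) / stride + 1;
}

// Output side of a pooling layer without padding that rounds up (ceil mode).
int pool_out_ceil(int in, int kernel, int stride) {
    return static_cast<int>(std::ceil(static_cast<double>(in - kernel) / stride)) + 1;
}

// Side of the feature map that enters R-Net's fully connected layer.
int rnet_feature_side(int in = 24) {
    int s = conv_out(in, 3);       // conv1: 24 -> 22
    s = pool_out_ceil(s, 3, 2);    // pool1: 22 -> 11 (rounding down would give 10)
    s = conv_out(s, 3);            // conv2: 11 -> 9
    s = pool_out_ceil(s, 3, 2);    // pool2: 9 -> 4
    s = conv_out(s, 2);            // conv3: 4 -> 3
    return s;
}
```

The result is 3, so the fully connected layer sees $3\times3\times64 = 576$ inputs. The same helpers reproduce the O-Net sizes (48 → 46 → 23 → 21 → 10 → 8 → 4 → 3).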
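R-Net parameter counts are shown in the table below.
Layer | Parameters (thousands) |
---|---|
conv1 | 0.76 |
conv2 | 12 |
conv3 | 12.2 |
conv4 | 73.7 |
conv5-1 and conv5-2 | 2 |
R-Net has only 100.66k parameters; at 4 bytes per floating-point parameter this requires about 393 KB of memory.
Compared with P-Net, R-Net has an additional fully connected layer, so its input must have a fixed size, $24\times24$. R-Net rejects a large number of the non-face boxes from the first stage, and NMS and bounding box regression are applied again to produce more refined face boxes. Each candidate box is then resized to $48\times48$ to form the input of O-Net.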
O-Net
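O-Net stands for Output Network and is a convolutional neural network. The input image is $48\times48\times3$. It passes through convolution layer `conv1` (kernel $3\times32\times3\times3$), activation layer `PReLU1`, and max pooling layer `pool1` ($3\times3$), giving $23\times23\times32$; then through convolution layer `conv2` (kernel $32\times64\times3\times3$), activation layer `PReLU2`, and max pooling layer `pool2` ($3\times3$), giving $10\times10\times64$; then through convolution layer `conv3` (kernel $64\times64\times3\times3$), activation layer `PReLU3`, and max pooling layer `pool3` ($2\times2$), giving $4\times4\times64$; then through convolution layer `conv4` (kernel $64\times128\times2\times2$) and activation layer `PReLU4`, giving $3\times3\times128$; then through fully connected layer `conv5` and a `dropout` layer. From here the network splits into three branches. The first goes through layer `conv6-1` and then `Softmax`, giving $1\times1\times2$; the second goes through layer `conv6-2`, giving $1\times1\times4$; the third goes through layer `conv6-3`, giving $1\times1\times10$. Convolution layers use stride 1, pooling layers use stride 2, and no layer uses padding.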
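O-Net parameter counts are shown in the table below.
Layer | Parameters (thousands) |
---|---|
conv1 | 0.9 |
conv2 | 18.5 |
conv3 | 36.9 |
conv4 | 32.9 |
conv5 | 295.2 |
conv6-1 to conv6-3 | 4.1 |
O-Net has 388.5k parameters; at 4 bytes per floating-point parameter this requires about 1.51 MB of memory.
Compared with R-Net, O-Net has one more convolution layer, and its input must also have a fixed size, $48\times48$. O-Net again applies NMS and bounding box regression to refine the boxes, producing the final face boxes and five facial landmarks.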
Loss functions
- Face classification (cross-entropy)
$$L_i^{det} = -\left(y_i^{det}\log(p_i) + (1 - y_i^{det})\log(1 - p_i)\right), \quad y_i^{det} \in \{0, 1\}$$
- Bounding box regression
$$L_i^{box} = \|\hat{y}_i^{box} - y_i^{box}\|_2^2, \quad y_i^{box} \in \mathbb{R}^4$$
- Landmark localization
$$L_i^{landmark} = \|\hat{y}_i^{landmark} - y_i^{landmark}\|_2^2, \quad y_i^{landmark} \in \mathbb{R}^{10}$$
- Multi-source training
$$\min \sum_{i=1}^{n} \sum_{j \in \{det, box, landmark\}} \alpha_j \beta_i^j L_i^j$$
Here $\alpha_j$ is the weight of task $j$ and $\beta_i^j \in \{0, 1\}$ indicates whether sample $i$ contributes to task $j$. Specifically,
P-Net: $\alpha_{det}=1$, $\alpha_{box}=0.5$, $\alpha_{landmark}=0$
R-Net: $\alpha_{det}=1$, $\alpha_{box}=0.5$, $\alpha_{landmark}=0$
O-Net: $\alpha_{det}=1$, $\alpha_{box}=0.5$, $\alpha_{landmark}=1$
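The per-sample terms and their weighted combination can be sketched as follows. All names are hypothetical. The classification term is written as the standard cross-entropy, and the mask stands for a per-sample 0/1 indicator that switches a task on or off for that sample, which is an assumption about how the three losses are combined.

```cpp
#include <cmath>
#include <cstddef>

// Cross-entropy for one sample: y is 0 or 1, p is the predicted face
// probability and must lie strictly between 0 and 1.
double det_loss(int y, double p) {
    return -(y * std::log(p) + (1 - y) * std::log(1.0 - p));
}

// Squared Euclidean distance between prediction and target
// (n = 4 for the box, n = 10 for the landmarks).
double l2_loss(const double* pred, const double* target, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double d = pred[i] - target[i];
        sum += d * d;
    }
    return sum;
}

struct TaskWeights { double det, box, landmark; };  // task weights (alpha)
struct SampleMask  { int det, box, landmark; };     // 0/1 indicators (beta)

// Weighted loss of one sample; the training objective sums this over all samples.
double sample_loss(const TaskWeights& a, const SampleMask& b,
                   double l_det, double l_box, double l_landmark) {
    return a.det * b.det * l_det
         + a.box * b.box * l_box
         + a.landmark * b.landmark * l_landmark;
}
```

With the P-Net weights listed above (1, 0.5, 0), a sample with a classification loss of 0.2 and a box loss of 4.0 contributes 0.2 + 0.5 × 4.0 = 2.2, whatever its landmark loss.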
3 ncnn-based inference code
Reference projects: qaz734913414/Ncnn_FaceTrack, moli232777144/mtcnn_ncnn, ElegantGod/ncnn
mtcnn.h
#ifndef __MTCNN_NCNN_H__
#define __MTCNN_NCNN_H__
#include "net.h"
#include <string>
#include <vector>
#include <time.h>
#include <algorithm>
#include <map>
struct Bbox
{
    float score;          // confidence
    int x1;
    int y1;
    int x2;
    int y2;               // top-left (x1, y1) and bottom-right (x2, y2) corners
    float area;           // box area
    float ppoint[10];     // five facial landmarks: x0..x4 followed by y0..y4
    float regreCoord[4];  // regression offsets for the four coordinates
};
class MTCNN {
private:
    ncnn::Mat img;                      // input image
    int img_w, img_h;                   // width and height of the original input image
    int minsize = 80;                   // smallest face size to detect in the original image
    const int MIN_DET_SIZE = 12;        // P-Net input size
    const float pre_facetor = 0.7090f;  // pyramid scaling factor, 0.709 ≈ 1.414213 / 2
    const float threshold[3] = { 0.8f, 0.9f, 0.9f };      // face score thresholds, one per network
    const float nms_threshold[3] = { 0.5f, 0.7f, 0.7f };  // NMS thresholds, one per stage
    const float mean_vals[3] = { 127.5f, 127.5f, 127.5f };
    const float norm_vals[3] = { 0.0078125f, 0.0078125f, 0.0078125f };  // 1/128 = 0.0078125
    std::vector<Bbox> firstBbox_, secondBbox_, thirdBbox_;  // candidate boxes of the three stages
    ncnn::Net Pnet, Rnet, Onet;         // the three networks
    void PNet();
    void RNet();
    void ONet();
    void generateBbox(ncnn::Mat score, ncnn::Mat location, std::vector<Bbox>& boundingBox_, float scale);
    void nms(std::vector<Bbox>& boundingBox_, const float overlap_threshold, std::string modelname = "Union");
    void refine(std::vector<Bbox>& vecBbox, const int& height, const int& width, bool square);
    float iou(Bbox& b1, Bbox& b2, std::string modelname = "Union");
public:
    MTCNN(const std::string& model_path);
    ~MTCNN();
    void SetMinFace(int minSize);
    void detect(ncnn::Mat& img_, std::vector<Bbox>& finalBbox);
};
#endif //__MTCNN_NCNN_H__
mtcnn.cpp
#include <cmath>
#include <cstdio>
#include "mtcnn.h"
#include <opencv2/opencv.hpp>
using namespace cv;
// Orders boxes by ascending score
bool cmpScore(Bbox lsh, Bbox rsh) {
    return lsh.score < rsh.score;
}
// Orders boxes by descending area; the comparison is strict, as std::sort requires
bool cmpArea(Bbox lsh, Bbox rsh) {
    return lsh.area > rsh.area;
}
MTCNN::MTCNN(const std::string& model_path) {
    // A model converted to ncnn from another framework consists of two files: .param and .bin
    std::vector<std::string> param_files = {
        model_path + "/det1.param",
        model_path + "/det2.param",
        model_path + "/det3.param"
    };
    std::vector<std::string> bin_files = {
        model_path + "/det1.bin",
        model_path + "/det2.bin",
        model_path + "/det3.bin"
    };
    Pnet.load_param(param_files[0].c_str());
    Pnet.load_model(bin_files[0].c_str());
    Rnet.load_param(param_files[1].c_str());
    Rnet.load_model(bin_files[1].c_str());
    Onet.load_param(param_files[2].c_str());
    Onet.load_model(bin_files[2].c_str());
}
MTCNN::~MTCNN() {
    Pnet.clear();
    Rnet.clear();
    Onet.clear();
}
void MTCNN::SetMinFace(int minSize) {
    minsize = minSize;
}
void MTCNN::PNet() {
    firstBbox_.clear();                          // discard candidates from a previous run
    float minl = img_w < img_h ? img_w : img_h;  // shorter side of the original image
    float m = (float)MIN_DET_SIZE / minsize;     // MIN_DET_SIZE is 12, the P-Net input size
    // The image size, minsize and factor together determine the number of pyramid levels:
    // shorter side at level n = org_L * (12 / minsize) * factor^n, n = 0, 1, 2, ...
    minl *= m;                                   // first rescale the original image by 12 / minsize
    float factor = pre_facetor;                  // 0.709
    std::vector<float> scales_;                  // scale of each pyramid level
    while (minl > MIN_DET_SIZE) {                // until the shorter side is <= 12
        scales_.push_back(m);                    // m: 12/minsize, 12/minsize * 0.709, 12/minsize * 0.709^2, ...
        minl *= factor;
        m = m * factor;
    }  // each scale yields one resized image that is fed to P-Net
    for (size_t i = 0; i < scales_.size(); ++i) {
        // ceil(x) returns the smallest integer value not less than x
        int hs = (int)ceil(img_h * scales_[i]);
        int ws = (int)ceil(img_w * scales_[i]);  // scaled height and width
        ncnn::Mat in;                            // network input
        resize_bilinear(img, in, ws, hs);        // resize the image
        ncnn::Extractor ex = Pnet.create_extractor();  // runs one forward pass
        //ex.set_num_threads(2);
        ex.set_light_mode(true);
        ex.input("data", in);
        ncnn::Mat score_, location_;
        ex.extract("prob1", score_);             // face probability per window, shape n*m*2
        ex.extract("conv4-2", location_);        // regression offsets per window, shape n*m*4
        std::vector<Bbox> boundingBox_;          // candidates of this pyramid level
        // Generate candidates from the P-Net output and the score threshold
        generateBbox(score_, location_, boundingBox_, scales_[i]);
        nms(boundingBox_, nms_threshold[0]);     // non-maximum suppression within this level
        firstBbox_.insert(firstBbox_.end(), boundingBox_.begin(), boundingBox_.end());  // append to the first-stage candidates
        boundingBox_.clear();
    }
}
void MTCNN::RNet() {
    secondBbox_.clear();
    for (std::vector<Bbox>::iterator it = firstBbox_.begin(); it != firstBbox_.end(); it++) {  // iterate over the first-stage candidates
        ncnn::Mat tempIm;
        copy_cut_border(img, tempIm, (*it).y1, img_h - (*it).y2, (*it).x1, img_w - (*it).x2);  // crop the box region from the original image
        ncnn::Mat in;
        resize_bilinear(tempIm, in, 24, 24);  // resize to 24*24
        ncnn::Extractor ex = Rnet.create_extractor();
        //ex.set_num_threads(2);
        ex.set_light_mode(true);
        ex.input("data", in);
        ncnn::Mat score, bbox;
        ex.extract("prob1", score);   // 1*1*2
        ex.extract("conv5-2", bbox);  // 1*1*4
        if ((float)score[1] > threshold[1]) {
            for (int channel = 0; channel < 4; channel++) {
                it->regreCoord[channel] = (float)bbox[channel];
            }
            it->area = (it->x2 - it->x1) * (it->y2 - it->y1);
            it->score = (float)score[1];  // the same element that was compared with the threshold
            secondBbox_.push_back(*it);
        }
    }
}
void MTCNN::ONet() {
    thirdBbox_.clear();
    for (std::vector<Bbox>::iterator it = secondBbox_.begin(); it != secondBbox_.end(); it++) {  // iterate over the second-stage candidates
        ncnn::Mat tempIm;
        copy_cut_border(img, tempIm, (*it).y1, img_h - (*it).y2, (*it).x1, img_w - (*it).x2);  // crop the box region from the original image
        ncnn::Mat in;
        resize_bilinear(tempIm, in, 48, 48);  // resize to 48*48
        ncnn::Extractor ex = Onet.create_extractor();
        //ex.set_num_threads(2);
        ex.set_light_mode(true);
        ex.input("data", in);
        ncnn::Mat score, bbox, keyPoint;
        ex.extract("prob1", score);       // 1*1*2
        ex.extract("conv6-2", bbox);      // 1*1*4
        ex.extract("conv6-3", keyPoint);  // 10 values: five x offsets, then five y offsets
        if ((float)score[1] > threshold[2]) {
            for (int channel = 0; channel < 4; channel++) {
                it->regreCoord[channel] = (float)bbox[channel];
            }
            it->area = (it->x2 - it->x1) * (it->y2 - it->y1);
            it->score = (float)score[1];
            for (int num = 0; num < 5; num++) {
                (it->ppoint)[num] = it->x1 + (it->x2 - it->x1) * keyPoint[num];
                (it->ppoint)[num + 5] = it->y1 + (it->y2 - it->y1) * keyPoint[num + 5];
            }  // five facial landmarks, mapped from box-relative to image coordinates
            thirdBbox_.push_back(*it);
        }
    }
}
void MTCNN::detect(ncnn::Mat& img_, std::vector<Bbox>& finalBbox_) {
    img = img_;  // ncnn::Mat assignment shares the pixel buffer, so the normalization below also modifies img_
    img_w = img.w;
    img_h = img.h;  // width and height of the original input image
    double t5 = (double)cv::getTickCount();
    img.substract_mean_normalize(mean_vals, norm_vals);  // normalize every pixel as (x - 127.5) / 128 before it enters the networks
    printf("normalize total %gms\n", ((double)cv::getTickCount() - t5) * 1000 / cv::getTickFrequency());
    double t1 = (double)cv::getTickCount();
    PNet();  // fills firstBbox_ with face candidates
    printf("P-Net total %gms\n", ((double)cv::getTickCount() - t1) * 1000 / cv::getTickFrequency());
    printf("firstBbox_.size()=%zu\n", firstBbox_.size());
    // first stage: NMS across all pyramid levels
    if (firstBbox_.size() < 1) return;
    double t1_1 = (double)cv::getTickCount();
    nms(firstBbox_, nms_threshold[0]);       // drop candidates whose IoU with a better box is too high
    refine(firstBbox_, img_h, img_w, true);  // correct the box coordinates with the regression offsets
    printf("NMS_1+refine_1 total %gms\n", ((double)cv::getTickCount() - t1_1) * 1000 / cv::getTickFrequency());
    // second stage
    double t2 = (double)cv::getTickCount();
    RNet();  // fills secondBbox_ with face candidates
    printf("R-Net total %gms\n", ((double)cv::getTickCount() - t2) * 1000 / cv::getTickFrequency());
    printf("secondBbox_.size()=%zu\n", secondBbox_.size());
    if (secondBbox_.size() < 1) return;
    double t2_2 = (double)cv::getTickCount();
    nms(secondBbox_, nms_threshold[1]);       // drop candidates whose IoU with a better box is too high
    refine(secondBbox_, img_h, img_w, true);  // correct the box coordinates with the regression offsets
    printf("NMS_2+refine_2 total %gms\n", ((double)cv::getTickCount() - t2_2) * 1000 / cv::getTickFrequency());
    // third stage
    double t3 = (double)cv::getTickCount();
    ONet();  // fills thirdBbox_ with face candidates
    printf("O-Net total %gms\n", ((double)cv::getTickCount() - t3) * 1000 / cv::getTickFrequency());
    printf("thirdBbox_.size()=%zu\n", thirdBbox_.size());
    if (thirdBbox_.size() < 1) return;
    double t3_3 = (double)cv::getTickCount();
    refine(thirdBbox_, img_h, img_w, true);    // correct the box coordinates with the regression offsets
    nms(thirdBbox_, nms_threshold[2], "Min");  // NMS using intersection over the smaller area
    printf("NMS_3+refine_3 total %gms\n", ((double)cv::getTickCount() - t3_3) * 1000 / cv::getTickFrequency());
    finalBbox_ = thirdBbox_;
}
void MTCNN::generateBbox(ncnn::Mat score, ncnn::Mat location, std::vector<Bbox>& boundingBox_, float scale) {
    const int stride = 2;     // stride of the output grid relative to the network input
    const int cellsize = 12;  // side of the window covered by one output cell
    float* p = score.channel(1);  // channel 1 holds the face probability
    Bbox bbox;
    float inv_scale = 1.0f / scale;  // factor that maps coordinates back to the original image
    for (int row = 0; row < score.h; row++) {
        for (int col = 0; col < score.w; col++) {
            if (*p > threshold[0]) {  // only windows above the score threshold become candidates
                bbox.score = *p;
                // Convolution replaces the sliding window, so every probability
                // corresponds to one 12*12*3 window of the resized image;
                // multiplying by inv_scale gives the position in the original image
                bbox.x1 = round((stride * col + 1) * inv_scale);
                bbox.y1 = round((stride * row + 1) * inv_scale);
                bbox.x2 = round((stride * col + 1 + cellsize) * inv_scale);
                bbox.y2 = round((stride * row + 1 + cellsize) * inv_scale);  // round() rounds to the nearest integer
                bbox.area = (bbox.x2 - bbox.x1) * (bbox.y2 - bbox.y1);      // candidate area
                const int index = row * score.w + col;                     // flat index of this cell
                for (int channel = 0; channel < 4; channel++) {
                    bbox.regreCoord[channel] = location.channel(channel)[index];  // regression offsets of this cell
                }
                boundingBox_.push_back(bbox);  // append the candidate
            }
            p++;
        }
    }
}
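float MTCNN::iou(Bbox& b1, Bbox& b2, std::string modelname)
{
    // Corners of the overlap region
    float maxX = std::max(b1.x1, b2.x1);
    float maxY = std::max(b1.y1, b2.y1);
    float minX = std::min(b1.x2, b2.x2);
    float minY = std::min(b1.y2, b2.y2);
    // Overlap width and height; +1 because pixel coordinates are inclusive
    float interW = ((minX - maxX + 1) > 0) ? (minX - maxX + 1) : 0;
    float interH = ((minY - maxY + 1) > 0) ? (minY - maxY + 1) : 0;
    float inter = interW * interH;  // overlap area
    // Box areas use the same inclusive convention as the overlap, so a valid box
    // (x2 >= x1, y2 >= y1) compared with itself gives exactly 1
    float area1 = (float)(b1.x2 - b1.x1 + 1) * (b1.y2 - b1.y1 + 1);
    float area2 = (float)(b2.x2 - b2.x1 + 1) * (b2.y2 - b2.y1 + 1);
    float IOU = 0;
    if (!modelname.compare("Union"))
        IOU = inter / (area1 + area2 - inter);  // intersection over union
    else if (!modelname.compare("Min")) {
        IOU = inter / ((area1 < area2) ? area1 : area2);  // intersection over the smaller area
    }
    return IOU;
}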
// Non-maximum suppression
void MTCNN::nms(std::vector<Bbox>& boundingBox_, const float overlap_threshold, std::string modelname) {
    if (boundingBox_.empty()) {
        return;
    }
    std::vector<int> vPick;             // indices of the boxes that survive NMS
    std::multimap<float, int> vScores;  // score -> box index, ordered by ascending score
    const int num_boxes = boundingBox_.size();
    vPick.reserve(num_boxes);
    for (int i = 0; i < num_boxes; ++i) {
        vScores.insert(std::pair<float, int>(boundingBox_[i].score, i));
    }
    while (!vScores.empty()) {
        int last = vScores.rbegin()->second;  // index of the highest-scoring remaining box
        vPick.push_back(last);                // keep it
        vScores.erase(--vScores.end());       // remove it explicitly so every round makes progress
        for (std::multimap<float, int>::iterator it = vScores.begin(); it != vScores.end(); ) {  // compare with all remaining boxes
            int it_idx = it->second;
            float IOU = iou(boundingBox_.at(it_idx), boundingBox_.at(last), modelname);
            if (IOU > overlap_threshold) {
                it = vScores.erase(it);  // overlaps the kept box too much: suppress it
            }
            else {
                it++;
            }
        }
    }
    std::vector<Bbox> tmp_;
    tmp_.reserve(vPick.size());
    for (size_t i = 0; i < vPick.size(); i++) {
        tmp_.push_back(boundingBox_[vPick[i]]);  // collect the kept boxes by index
    }
    boundingBox_ = tmp_;  // hand the surviving boxes back to the caller
}
// Bounding box regression
void MTCNN::refine(std::vector<Bbox>& vecBbox, const int& height, const int& width, bool square) {
    if (vecBbox.empty()) {
        return;
    }
    float bbw = 0, bbh = 0, maxSide = 0;
    float h = 0, w = 0;
    float x1 = 0, y1 = 0, x2 = 0, y2 = 0;
    for (std::vector<Bbox>::iterator it = vecBbox.begin(); it != vecBbox.end(); it++) {
        bbw = (*it).x2 - (*it).x1 + 1;
        bbh = (*it).y2 - (*it).y1 + 1;  // width and height of the candidate box
        x1 = (*it).x1 + (*it).regreCoord[0] * bbw;
        y1 = (*it).y1 + (*it).regreCoord[1] * bbh;
        x2 = (*it).x2 + (*it).regreCoord[2] * bbw;
        y2 = (*it).y2 + (*it).regreCoord[3] * bbh;  // apply the regression offsets
        if (square) {
            w = x2 - x1 + 1;
            h = y2 - y1 + 1;  // inclusive pixel width and height
            maxSide = (h > w) ? h : w;
            x1 = x1 + w * 0.5 - maxSide * 0.5;
            y1 = y1 + h * 0.5 - maxSide * 0.5;  // extend the shorter side around the box centre
            x2 = x1 + maxSide - 1;
            y2 = y1 + maxSide - 1;
        }
        // Write the result back in both cases, so square == false still applies the regression
        (*it).x1 = round(x1);
        (*it).y1 = round(y1);
        (*it).x2 = round(x2);
        (*it).y2 = round(y2);
        // boundary check: keep the coordinates inside the image
        if ((*it).x1 < 0) (*it).x1 = 0;
        if ((*it).y1 < 0) (*it).y1 = 0;
        if ((*it).x2 > width - 1) (*it).x2 = width - 1;
        if ((*it).y2 > height - 1) (*it).y2 = height - 1;
        it->area = (it->x2 - it->x1) * (it->y2 - it->y1);
    }
}
main.cpp
#include "mtcnn.h"
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
using namespace std;
using namespace cv;
cv::Rect SquarePadding(cv::Rect facebox, int margin_rows, int margin_cols, bool max_b);
int main()
{
    std::string model_path = "./models";
    string path = "5-12.png";
    cv::Mat img = cv::imread(path);
    if (img.empty()) {  // imread returns an empty Mat when the file cannot be read
        printf("failed to read %s\n", path.c_str());
        return -1;
    }
    std::vector<Bbox> finalBbox;  // detected faces
    double t0 = (double)cv::getTickCount();
    double t4 = (double)cv::getTickCount();
    ncnn::Mat ncnn_img = ncnn::Mat::from_pixels(img.data, ncnn::Mat::PIXEL_BGR2RGB, img.cols, img.rows);
    printf("from_pixels total %gms\n", ((double)cv::getTickCount() - t4) * 1000 / cv::getTickFrequency());
    double t6 = (double)cv::getTickCount();
    MTCNN detector(model_path);
    printf("new mtcnn total %gms\n", ((double)cv::getTickCount() - t6) * 1000 / cv::getTickFrequency());
    detector.SetMinFace(40);  // smallest face size to detect
    detector.detect(ncnn_img, finalBbox);
    printf("total %gms\n", ((double)cv::getTickCount() - t0) * 1000 / cv::getTickFrequency());
    printf("------------------\n");
    const int num_box = finalBbox.size();
    std::vector<cv::Rect> bbox;  // face rectangles to be squared
    bbox.resize(num_box);
    for (int i = 0; i < num_box; i++) {
        bbox[i] = cv::Rect(finalBbox[i].x1, finalBbox[i].y1, finalBbox[i].x2 - finalBbox[i].x1 + 1,
            finalBbox[i].y2 - finalBbox[i].y1 + 1);  // build a cv::Rect from the detected box
        bbox[i] = SquarePadding(bbox[i], img.rows, img.cols, true);  // expand to a square
        finalBbox[i].x1 = bbox[i].x;
        finalBbox[i].y1 = bbox[i].y;
        finalBbox[i].x2 = bbox[i].x + bbox[i].width;
        finalBbox[i].y2 = bbox[i].y + bbox[i].height;
    }
    cv::Scalar color = Scalar(0, 0, 255);
    srand((unsigned int)time(0));  // seed the random number generator with the current time
    for (int i = 0; i < num_box; i++)
    {
        const Bbox& info = finalBbox[i];
        cv::Rect rect;
        rect.x = info.x1;
        rect.y = info.y1;
        rect.width = info.x2 - info.x1;
        rect.height = info.y2 - info.y1;
        rectangle(img, rect, color, 2);
        for (int j = 0; j < 5; j++)
        {
            cv::Point p = cv::Point(info.ppoint[j], info.ppoint[j + 5]);
            cv::circle(img, p, 2, color, 2);
        }
    }
    imshow("image", img);
    waitKey(0);
    return 0;
}
// Expands (max_b == true) or shrinks (max_b == false) a face box to a square inside the image
cv::Rect SquarePadding(cv::Rect facebox, int margin_rows, int margin_cols, bool max_b)
{
    int c_x = facebox.x + facebox.width / 2;
    int c_y = facebox.y + facebox.height / 2;
    int large = 0;  // half of the square's side
    if (max_b)
        large = std::max(facebox.height, facebox.width) / 2;
    else
        large = std::min(facebox.height, facebox.width) / 2;
    // Corners of the square around the box centre, clamped to the image
    int left = std::max(0, c_x - large);
    int top = std::max(0, c_y - large);
    int right = std::min(c_x + large, margin_cols - 1);
    int bottom = std::min(c_y + large, margin_rows - 1);
    // If clamping broke the square, retry with the shorter side
    if (bottom - top != right - left)
        return SquarePadding(cv::Rect(left, top, right - left, bottom - top), margin_rows, margin_cols, false);
    return cv::Rect(left, top, right - left, bottom - top);
}
Detection result and speed on a single 1080p image, with minsize = 40