An Introduction to MTCNN for Face Detection

Overview

Paper:
Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks

Overall structure:
[Figure: overall structure of the MTCNN cascade]

The network's advantages:
1. It uses a cascade of classifiers, each of which is relatively simple, so it runs fast.
2. It uses multi-task training, exploiting the intrinsic connection between landmark localization and face classification, which improves accuracy.
3. It outputs landmarks, so face alignment can be done directly during face recognition without adding another network.

Networks and Loss Functions

Overall network architecture:
[Figure: P-Net, R-Net, and O-Net architectures]

Now let's look at the loss of each stage.
Because training is multi-task, all three networks handle classification, bounding box regression, and landmark localization.

  1. Face classification: The learning objective is formulated as a two-class classification problem. For each sample x_i, we use the cross-entropy loss:

     L_i^{det} = -( y_i^{det} log(p_i) + (1 - y_i^{det}) log(1 - p_i) )

     where p_i is the probability produced by the network that indicates sample x_i being a face, and y_i^{det} ∈ {0, 1} denotes the ground-truth label.
  2. Bounding box regression: For each candidate window, we predict the offset between it and the nearest ground truth (i.e., the bounding boxes' left top, height, and width). The learning objective is formulated as a regression problem, and we employ the Euclidean loss for each sample x_i:

     L_i^{box} = || ŷ_i^{box} - y_i^{box} ||_2^2

     where ŷ_i^{box} is the regression target obtained from the network and y_i^{box} is the ground-truth coordinate. There are four coordinates, including left top, height, and width, and thus y_i^{box} ∈ R^4.
  3. Facial landmark localization: Similar to the bounding box regression task, facial landmark detection is formulated as a regression problem and we minimize the Euclidean loss:

     L_i^{landmark} = || ŷ_i^{landmark} - y_i^{landmark} ||_2^2

     where ŷ_i^{landmark} is the facial landmark's coordinate obtained from the network and y_i^{landmark} is the ground-truth coordinate. There are five facial landmarks (left eye, right eye, nose, left mouth corner, right mouth corner), and thus y_i^{landmark} ∈ R^{10}.

Overall loss:

min Σ_{i=1}^{N} Σ_{j ∈ {det, box, landmark}} α_j β_i^j L_i^j

where N is the number of training samples, α_j denotes the task importance (α_det = 1, α_box = 0.5, α_landmark = 0.5 in P-Net and R-Net, while α_landmark = 1 in O-Net), and β_i^j ∈ {0, 1} is the sample-type indicator.
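The combined objective weights the three per-sample task losses and masks out tasks that do not apply to a given sample type. A minimal numpy sketch, using the paper's P-Net/R-Net α values as defaults:

```python
import numpy as np

def total_loss(det_losses, box_losses, lm_losses,
               det_mask, box_mask, lm_mask,
               a_det=1.0, a_box=0.5, a_lm=0.5):
    """Weighted multi-task loss over a mini-batch.

    Each *_losses array holds one task's per-sample loss; each *_mask is
    the beta indicator (1 if that sample type trains that task, else 0).
    The alpha defaults follow the P-Net/R-Net setting; for O-Net the
    landmark weight would be a_lm=1.0.
    """
    return (a_det * np.sum(det_mask * det_losses)
            + a_box * np.sum(box_mask * box_losses)
            + a_lm * np.sum(lm_mask * lm_losses))
```

For example, a negative sample would have det_mask = 1 but box_mask = lm_mask = 0, so only its classification loss contributes.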

Training

Overview

Training datasets

In the paper the authors mainly use the WIDER FACE and CelebA datasets: WIDER FACE is used primarily to train the detection task, and CelebA primarily to train the landmarks. The training set is divided into four sample types: negatives, positives, part faces, and landmark faces, in a ratio of 3 : 1 : 1 : 2.

we use four different kinds of data annotation in our training process: (i) Negatives: Regions that the Intersection-over-Union (IoU) ratio less than 0.3 to any ground-truth faces; (ii) Positives: IoU above 0.65 to a ground truth face; (iii) Part faces: IoU between 0.4 and 0.65 to a ground truth face; and (iv) Landmark faces: faces labeled 5 landmarks' positions.
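The IoU thresholds above can be sketched as a small labeling helper. One caveat: the paper does not say what happens to crops in the 0.3–0.4 IoU band; common implementations simply discard them, which is what the "ignore" branch below assumes:

```python
import numpy as np

def iou(box, boxes):
    """IoU of one box [x1, y1, x2, y2] against an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def label_crop(crop, gt_boxes):
    """Assign a sample type to a crop using the paper's IoU thresholds."""
    best = iou(crop, gt_boxes).max()
    if best < 0.3:
        return "negative"
    if best >= 0.65:
        return "positive"
    if best >= 0.4:
        return "part"
    return "ignore"  # 0.3 <= IoU < 0.4: typically discarded (assumption)
```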

Training covers three tasks:

Face classification: trained with positives and negatives
Bounding box regression: trained with positives and part faces
Landmark localization: trained with landmark faces

For the details of the training steps, see:
https://github.com/AITTSMD/MTCNN-Tensorflow

1. Download the WIDER FACE training part only from the official website, unzip it to replace WIDER_train, and put it into the prepare_data folder.
2. Download the landmark training data from here, unzip it, and put it into the prepare_data folder.
3. Run prepare_data/gen_12net_data.py to generate training data (face detection part) for PNet.
4. Run gen_landmark_aug_12.py to generate training data (facial landmark part) for PNet.
5. Run gen_imglist_pnet.py to merge the two parts of training data.
6. Run gen_PNet_tfrecords.py to generate tfrecords for PNet.
7. After training PNet, run gen_hard_example to generate training data (face detection part) for RNet.
8. Run gen_landmark_aug_24.py to generate training data (facial landmark part) for RNet.
9. Run gen_imglist_rnet.py to merge the two parts of training data.
10. Run gen_RNet_tfrecords.py to generate tfrecords for RNet (run this script four times to generate the tfrecords of neg, pos, part, and landmark respectively).
11. After training RNet, run gen_hard_example to generate training data (face detection part) for ONet.
12. Run gen_landmark_aug_48.py to generate training data (facial landmark part) for ONet.
13. Run gen_imglist_onet.py to merge the two parts of training data.
14. Run gen_ONet_tfrecords.py to generate tfrecords for ONet (run this script four times to generate the tfrecords of neg, pos, part, and landmark respectively).

P-NET

Collecting the P-Net training set involves two steps:
1. Randomly crop positive and negative samples, constrained by the agreed IoU ranges for positives and negatives. Note that the crop size can be adjusted to the needs of the actual project; by default cropping starts from 12 × 12.
2. From the landmark dataset, crop out the face, resize it to 12 × 12, and adjust the corresponding landmark positions.
One thing worth noting: MTCNN's activation function is PReLU.
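For reference, PReLU is a leaky-style activation with a learned negative slope; a minimal sketch (in MTCNN alpha is a learned per-channel parameter, so the fixed 0.25 here is only an illustrative initial value):

```python
import numpy as np

def prelu(x, alpha=0.25):
    """PReLU: identity for positive inputs, alpha * x for negative ones.

    alpha would be a trainable per-channel parameter in the real network;
    a scalar is used here purely for illustration.
    """
    return np.where(x > 0, x, alpha * x)
```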

Remarks:
In https://github.com/AITTSMD/MTCNN-Tensorflow the author mentions that CelebA has annotation errors and uses http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm instead. My own question: isn't that dataset rather small? And how would the data-balance problem be handled? Personally I still lean toward using the CelebA data.
To compensate for the shortage of samples, the author adds some data augmentation methods during sample processing, which is worth borrowing.

R-NET

R-Net: We use the first stage of our framework to detect faces from WIDER FACE [24] to collect positives, negatives, and part faces, while landmark faces are detected from CelebA [23].
In other words, the detection boxes produced by P-Net are sorted into positives, negatives, and part (hard) samples, which are then fed to R-Net for training.
Landmark samples are collected the same way as for P-Net, except the crops are resized to 24 × 24.
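The landmark adjustment after cropping and resizing amounts to a coordinate transform. A sketch of it below; storing the targets normalized to [0, 1] relative to the crop is an assumption based on common MTCNN training code, not something the paper specifies:

```python
import numpy as np

def adjust_landmarks(landmarks, crop_box):
    """Map absolute landmark coordinates into a cropped face region.

    landmarks: (5, 2) array of (x, y) in the original image.
    crop_box:  [x1, y1, x2, y2] of the cropped face.
    Returns coordinates normalized to [0, 1] relative to the crop;
    multiply by the target size (e.g. 24) for pixel coordinates
    in the resized patch.
    """
    x1, y1, x2, y2 = crop_box
    w, h = x2 - x1, y2 - y1
    return (landmarks - np.array([x1, y1])) / np.array([w, h])
```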

O-NET

O-Net is trained the same way as R-Net, except that its positives and negatives are produced by running the first two stages of the cascade.

online hard sample mining

In particular, in each mini-batch, we sort the loss computed in the forward propagation phase from all samples and select the top 70% of them as hard samples. Then we only compute the gradient from the hard samples in the backward propagation phase. That means we ignore the easy samples that are less helpful to strengthen the detector while training.
In other words, when computing the loss, only the samples whose losses rank in the top 70% are counted. (PS: personally I find this a bit hand-wavy, but then again, plenty of things are...)
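The top-70% selection described above can be sketched in a few lines of numpy (a real implementation would do this inside the training graph so that easy samples receive zero gradient):

```python
import numpy as np

def ohem_loss(per_sample_losses, keep_ratio=0.7):
    """Online hard sample mining: keep only the largest 70% of the
    per-sample losses in the mini-batch and average those; the easy
    samples are dropped and contribute no gradient."""
    losses = np.asarray(per_sample_losses, dtype=float)
    keep = max(1, int(len(losses) * keep_ratio))
    hard = np.sort(losses)[::-1][:keep]
    return hard.mean()
```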

That's all for now; I'll note any other pitfalls once I actually run the training.
