CVPR2018_Real-Time Rotation-Invariant Face Detection with Progressive Calibration Network

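CVPR 2018. A real-time rotation-invariant face detection method based on a progressive calibration network (pardon my rough translation of the title), from Prof. Shiguang Shan's group at the Chinese Academy of Sciences;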

Terminology:
PCN: Progressive Calibration Network;
RIP angle: rotation-in-plane angle, i.e. how far a face is rotated within the image plane;

Abstract
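The authors propose Progressive Calibration Networks (PCN) for rotation-invariant face detection. PCN has three stages arranged in a coarse-to-fine manner, and each stage does three things: face/non-face classification, face bbox regression, and RIP angle estimation. Stages 1 and 2 only make a coarse, discrete classification of the angle, while stage 3 makes a fine, continuous regression. The actual orientation correction (rotating the face by 180° or 90° in stages 1 and 2) is a post-processing step performed outside the network. Together the stages progressively calibrate each candidate to an upright face (calibrates the RIP orientation of each face candidate to upright);
Using a single model to predict faces at every rotation angle tends to cost both accuracy and time, so the paper splits calibration into three progressive steps (dividing the calibration process into several progressive steps). Stages 1 and 2 only classify a coarse, discrete orientation (e.g. 180° or ±90°); only stage 3 does continuous regression and outputs the RIP angle of the calibrated face. Because the angle has already been calibrated into [-45, 45] by then, the face can be detected directly with no further calibration step. PCN runs in real time and still performs well;
Because the face angle is calibrated progressively and the RIP range shrinks step by step, the method can handle faces rotated by any angle (with gradually decreasing RIP ranges, PCN can accurately detect faces with full 360° RIP angles).
On two challenging datasets, multi-oriented FDDB and a subset of the WIDER FACE test set that the authors hand-picked and annotated (multi-oriented FDDB and a challenging subset of WIDER FACE containing rotated faces in the wild), the method performs well: the methods that are faster are less accurate, and the methods that are more accurate are slower;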

1 Introduction
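CNN-based face detectors benefit from the strong non-linear feature learning of CNNs, but they do only moderately well on rotated faces, because faces at arbitrary RIP angles vary so much in appearance that a model has trouble fitting them during training (The CNN-based detectors enjoy the natural advantage of strong capability in non-linear feature learning, detecting faces with full rotation-in-plane (RIP) angles leading to significant divergence in face appearances.)
There are three existing approaches to rotated face detection: data augmentation, divide-and-conquer, and a rotation router.
Data augmentation: the simplest and most direct approach. Upright face images are rotated uniformly over the full 360° to generate training data, and a high-capacity model is trained on it. Pro: an existing upright face detection scheme can be used as-is, with no extra operations (the same scheme as that of the upright face detectors can be directly used without extra operations). Con: fitting such a wide range of rotated appearances takes a large model, which is slow, so it cannot run in real time (to characterize such large variations of face appearances in single detector, one usually needs to use large neural networks with high time cost); see Fig 2(a);
Divide-and-conquer: train multiple detectors, each covering only a small range of RIP angles, and merge their results so that all orientations are covered (trains multiple detectors, one for a small range of RIP angles, to cover the full RIP range). Pro: each detector only handles faces within its own rotation range, so a shallow, fast model is enough for each one. Con: every detector has to be run, so the total time cost goes up; see Fig 2(b);
Rotation router: the straightforward approach. A CNN (the rotation router) estimates the RIP angle of a rotated face, the face is rotated upright by that angle, and an existing upright face detector is run on the calibrated face candidates. Pro: it matches intuition, and all it adds is a rotation router that estimates the face angle. Con: precise angle estimation is hard, and doing it accurately usually takes a large CNN, so time cost becomes the bottleneck again; see Fig 2(c);
Our solution: since estimating a precise RIP angle in one shot with a rotation router is hard, compute it progressively in a cascade, from coarse to fine. The first network makes a rough orientation decision and calibrates once; the second network does the same and calibrates further, narrowing the RIP range; the third network, working on candidates already calibrated by the first two, directly outputs the face classification, the precise RIP angle, and the bbox. The overall time cost is low enough for real time;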
PCN progressively calibrates the RIP orientation of each face candidate to upright.
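Specifically:
step1: filter the face candidates (obtained with an image pyramid plus sliding window, as in MTCNN) by face/non-face binary classification, and calibrate faces that point down so they point up (a simple up-down flip), halving the range of RIP angles from [-180, 180] to [-90, 90];
step2: similar to step1: filter candidates (face/non-face binary classification) and further calibrate the faces from step1 into [-45, 45], shrinking the RIP range by half again;
step3: output the face classification, the RIP angle (the continuous precise RIP angle), and the bbox;
Advantages: only coarse orientations are predicted in the early stages 1 and 2, and the precise prediction is left to stage 3; stages 1 and 2 only apply ±90° and ±180° rotations, which are cheap; performing binary classification of face vs. non-face on the calibrated faces (gradually decreasing RIP ranges) is both accurate and fast;
Note: stages 1 and 2 only output the discrete angle by which to calibrate; the image operation itself (flip, rotate) is not done inside the CNN. Stage 3 only needs to regress the continuous angle, and no further calibration follows;
Highlights of the paper:
1 PCN calibrates the RIP angle in progressive steps. Each small step is a simple task handled by a shallow CNN, and together they gradually reduce and correct the face rotation, giving high accuracy at low time cost;
2 Steps 1 and 2 only do coarse calibration (e.g. a 180° down-to-up correction, or a 90° sideways-to-up correction), which has two advantages: 1 the coarse calibration operation is cheap and easy to implement (easier to implement as flipping original image with quite low time cost); 2 coarse angle classification is also easy to get right, and can be added as one more branch next to face classification and bbox regression in a multi-task setup (a robust and accurate RIP angle prediction for this coarse calibration is easier to attain without extra time cost, by jointly learning calibration task with the classification task and bounding box regression task in a multi-task learning manner.);
3 On two challenging datasets, multi-oriented FDDB and a subset of the WIDER FACE test set that the authors hand-picked and annotated (multi-oriented FDDB and a challenging subset of WIDER FACE containing rotated faces in the wild), the method performs well: the methods that are faster are less accurate, and the methods that are more accurate are slower;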

2 Progressive Calibration Networks (PCN)
2.1. Overall Framework
Given an image, all face candidates are obtained according to the sliding window and image pyramid principle, and each candidate window goes through the detector stage by stage. In each stage of PCN, the detector simultaneously rejects most candidates with low face confidences (similar to AdaBoost), regresses the bounding boxes of remaining face candidates, and calibrates the RIP orientations of the face candidates. After each stage, non-maximum suppression (NMS) is used to merge those highly overlapped candidates as most existing methods do. The cascade structure is similar to MTCNN: after each stage classifies, calibrates, and regresses the face candidates, a round of NMS follows;
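To make the NMS step concrete, here is a minimal greedy NMS sketch in plain Python. It is not taken from the PCN code: the `(x1, y1, x2, y2)` box format, the function names, and the 0.3 IoU threshold are all assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: return indices of kept boxes, highest score first.

    iou_threshold is a hypothetical value, not one quoted from the paper.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

In a cascade like this one, such a routine would run once after each stage on the candidates that survived that stage's face/non-face filter.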

2.2 PCN1 in 1st stage
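For each sliding-window input x, PCN-1 does three things: face/non-face classification, bbox regression, and calibration:
[f, t, g] = F1(x);
F1: the stage-1 CNN;
f: face confidence score, used to separate face from non-face;
t: bbox regression vector (a vector representing the prediction of bounding box regression);
g: orientation score (a binary classification with a score in 0~1; the output is simply up or down);
Objective 1: face vs. non-face classification;
Objective 2: the bbox regression loss, following Fast R-CNN, except that the bbox is a square (w = h);
Objective 3: for PCN-1 this is a simple up/down binary classification, trained with softmax;
Overall objective: the combination of the three objectives above;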
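The formulas for these objectives did not survive in this copy of the notes. The block below is a sketch of forms consistent with the descriptions above, assuming standard binary cross-entropy for the two classification branches and a smooth-L1 penalty S for regression. The labels y (face or not, up or down), the target t*, and the weights λ_reg and λ_cal are assumed notation and should be checked against the paper.

```latex
\begin{aligned}
L_{cls} &= -\bigl[\, y \log f + (1 - y) \log (1 - f) \,\bigr] \\
L_{reg}(t, t^{*}) &= S(t - t^{*}) \\
L_{cal} &= -\bigl[\, y \log g + (1 - y) \log (1 - g) \,\bigr] \\
L &= L_{cls} + \lambda_{reg} \, L_{reg} + \lambda_{cal} \, L_{cal}
\end{aligned}
```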
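What this means: like AdaBoost, PCN-1 first removes a large number of easily classified false-positive candidates (face/non-face), then regresses the bbox, and finally flips candidates according to the up/down result so that every face bbox points up. After this, the RIP range is [-90, 90];
For training, commonly used upright face datasets are rotated over [-180, 180] to build a rotated dataset (Based on the upright faces dataset, we rotate the training images with different RIP angles, forming a new training set containing faces with 360° RIP angles)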
pos samples: IoU with gt > 0.7
neg samples: IoU with gt < 0.3
suspected samples: IoU with gt ∈ (0.4, 0.7)
face/non-face classification: pos & neg;
bbox regression && calibration: pos & suspected;
For the calibration branch in particular, among pos & suspected samples:
face-up: RIP angles ∈ (-65, 65)
face-down: RIP angles ∈ (-180, -115) & (115, 180)
Samples whose RIP angle falls outside these ranges are not used to train the calibration branch;
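The sampling rules above can be written down as a small helper. This is a sketch, not the authors' training code: the function names, the string labels, and the choice of 1 for face-up and 0 for face-down are assumptions; only the thresholds come from the notes.

```python
def sample_type(iou):
    """Classify a training window by its IoU with the ground-truth box."""
    if iou > 0.7:
        return "positive"
    if iou < 0.3:
        return "negative"
    if 0.4 < iou < 0.7:
        return "suspected"
    return None  # windows in the remaining gaps are not used


def stage1_calibration_label(rip_angle):
    """Assumed encoding: 1 = face-up, 0 = face-down, None = excluded."""
    a = abs(rip_angle)
    if a < 65:
        return 1
    if 115 < a < 180:
        return 0
    return None  # ambiguous sideways faces are left out of calibration training


def training_tasks(iou, rip_angle):
    """List which PCN-1 branches a window contributes to."""
    kind = sample_type(iou)
    tasks = []
    if kind in ("positive", "negative"):
        tasks.append("classification")
    if kind in ("positive", "suspected"):
        tasks.append("bbox_regression")
        if stage1_calibration_label(rip_angle) is not None:
            tasks.append("calibration")
    return tasks
```

For example, a window with IoU 0.5 and a face at 90° would contribute to bbox regression only, since its angle falls in the excluded band between face-up and face-down.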

2.3. PCN2 in 2nd stage
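Almost identical to PCN-1, with the same three objectives, except that the calibration branch is a three-class problem (Differently, the coarse orientation prediction in this stage is a ternary classification of the RIP angle range, i.e. [-90, -45], [-45, 45], or [45, 90]. Rotation calibration is conducted with the predicted RIP angle in the second stage)
For example: a face at -65° is classified by the PCN-2 calibration branch as id=0, gets rotated by +90°, and ends up at 25°, which is within [-45, 45];
For training, commonly used upright face datasets are rotated over [-90, 90] to build a rotated dataset; the data is run through the trained PCN-1 to select hard negative samples; the calibration branch is trained on pos & suspected samples;
Meaning of the calibration branch class ids:
0: [-90, -60], needs a +90° rotation;
1: [-30, 30], do nothing;
2: [60, 90], needs a -90° rotation;
Samples outside these ranges are not used for training;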
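As a worked version of the class-id table above, the following sketch maps a RIP angle to its stage-2 training label and applies the matching correction. The function and constant names are hypothetical, and treating the range endpoints as inclusive is an assumption.

```python
# Correction (in degrees) applied to the face for each stage-2 class id,
# following the table in the notes: id 0 -> +90, id 1 -> 0, id 2 -> -90.
STAGE2_ROTATION = {0: 90, 1: 0, 2: -90}


def stage2_calibration_label(rip_angle):
    """Training label for the stage-2 calibration branch, or None if excluded."""
    if -90 <= rip_angle <= -60:
        return 0
    if -30 <= rip_angle <= 30:
        return 1
    if 60 <= rip_angle <= 90:
        return 2
    return None  # angles near +-45 are ambiguous and skipped during training


def apply_stage2(rip_angle, class_id):
    """Face angle after the stage-2 correction for the predicted class id."""
    return rip_angle + STAGE2_ROTATION[class_id]
```

With these definitions, `stage2_calibration_label(-65)` is 0 and `apply_stage2(-65, 0)` is 25, matching the example in the notes.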

2.4. PCN3 in 3rd stage
After stages 1 and 2, the face RIP angle has been calibrated into [-45, 45] (calibrated to an upright quarter of RIP angle range). At this point the face is fairly easy to detect, so PCN-3 can accurately classify it and regress its bbox, and can also accurately regress the continuous RIP angle (the PCN-3 in the third stage can easily make the final decision as most existing face detectors do to accurately determine whether it is a face and regress the bounding box. Since the RIP angle has been reduced to a small range in previous stages, PCN-3 attempts to directly regress the precise RIP angles of face candidates instead of coarse orientations)
The RIP angle regression is in a coarse-to-fine cascade regression style: the final face angle is simply the sum of the angles computed in the three stages;
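The summing of the three stage angles can be illustrated with an idealized simulation in which every stage predicts perfectly. This is a sketch only: real stages make errors, and the sign convention (θ2 = -90 when the residual is below -45) and the function names are assumptions.

```python
def wrap(angle):
    """Wrap an angle in degrees into (-180, 180]."""
    a = angle % 360.0
    return a - 360.0 if a > 180.0 else a


def progressive_calibration(true_angle):
    """Return (theta1, theta2, theta3) for an idealized three-stage PCN."""
    residual = wrap(true_angle)

    # Stage 1: up/down decision; a face pointing down contributes 180 degrees.
    theta1 = 180.0 if abs(residual) > 90.0 else 0.0
    residual = wrap(residual - theta1)  # now within [-90, 90]

    # Stage 2: three-way decision brings the residual into [-45, 45].
    if residual < -45.0:
        theta2 = -90.0
    elif residual > 45.0:
        theta2 = 90.0
    else:
        theta2 = 0.0
    residual -= theta2

    # Stage 3: continuous regression of whatever angle is left.
    theta3 = residual
    return theta1, theta2, theta3


# The three partial angles add back up to the original RIP angle.
t1, t2, t3 = progressive_calibration(130)
assert wrap(t1 + t2 + t3) == 130.0
```

For a face at 130°, the stages contribute 180°, -90°, and 40°, so stage 3 only ever has to regress an angle inside [-45, 45].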
During the third stage's training phase, we rotate the initial training images uniformly in the range of [-45, 45], and filter out the hard negative samples via the trained PCN-2. The calibration branch is a regression task trained with smooth L1 loss.
PS: once you understand the paper you can tell what this figure is doing, but the way the procedure is drawn is easy to misread;
2.5. Accurate and Fast Calibration
Our proposed PCN progressively calibrates the face candidates in a cascade scheme, aiming for fast and accurate calibration:
1) the early stages only predict coarse RIP orientations, which is robust to the large diversity and further benefits the prediction of successive stages. In other words, the early steps only need coarse classification, which is fast and also works better than fine-grained classification would;
2) the calibration based on the coarse RIP prediction can be efficiently achieved via flipping original image three times, which brings almost no additional time cost. With just the ±180° and ±90° operations, the face angle is calibrated into [-45, 45]; stage 3 can then output the detection result directly, and no further rotation calibration is needed;
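One way to read "flipping original image three times" is that the three rotated copies of the image (by 90°, -90°, and 180°) are built once with cheap flips and transposes, and each candidate window is then cropped from whichever copy its coarse prediction selects. The sketch below shows those rotations on a plain list-of-rows image. The function names and the clockwise/counter-clockwise naming are assumptions, not the authors' code.

```python
def rotate_180(img):
    """Rotate by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in img[::-1]]


def rotate_90_cw(img):
    """Rotate 90 degrees clockwise: transpose, then reverse each row."""
    return [list(col)[::-1] for col in zip(*img)]


def rotate_90_ccw(img):
    """Rotate 90 degrees counter-clockwise: transpose, then reverse row order."""
    return [list(col) for col in zip(*img)][::-1]


def build_rotated_views(img):
    """Precompute the original image plus its three rotated copies."""
    return {
        "original": img,
        "rot180": rotate_180(img),
        "rot90_cw": rotate_90_cw(img),
        "rot90_ccw": rotate_90_ccw(img),
    }


views = build_rotated_views([[1, 2], [3, 4]])
```

Because these operations only reindex pixels, they avoid the interpolation cost of rotating by an arbitrary angle, which fits the claim that calibration adds almost no time.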
3 Experiments
3.1. Implementation Details
As with MTCNN and Cascade CNN, the training set of WIDER FACE is used for training, and annotated faces are adjusted to squares (w = h)
During the training process, the ratio of positive samples, negative samples, and suspected samples is about 2 : 2 : 1 in every mini-batch
3.2. Methods for Comparison
The comparison covers three families of methods: 1) Data Augmentation; 2) Divide-and-Conquer; 3) Rotation Router. The authors re-implemented some of them; details omitted here;
3.3. Benchmark Datasets
Multi-Oriented FDDB: FDDB augmented with copies rotated by -90°, 90°, and 180°;
For evaluation of the detection results, we apply the official evaluation tool to obtain the ROC curves. To be compatible with the evaluation rules, we ignore the RIP angles of detection boxes, and simply use horizontal boxes for evaluation. That is, horizontal bboxes are used rather than rotated ones;
Rotation WIDER FACE: We manually select some images that contain rotated faces from the WIDER FACE test set, obtain a rotation subset with 370 images and 987 rotated faces in the wild, we manually annotate the faces in this subset, use the same evaluation tool of FDDB to obtain the ROC curves. In short, 370 images were picked from the test set and annotated for evaluation;

3.4. Evaluation Results
3.4.1 Results of Rotation Calibration
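For our PCN, the orientation classification accuracy in 1st stage and the 2nd stage is 95% and 96%. The mean error of calibration in 3rd stage is 8°.
Fig 7: the router network re-implemented by the authors, which estimates the angle in a continuous angle regression manner, has a large error. This suggests that predicting the RIP angle directly on the original image really is hard.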
3.4.2 Results on Multi-Oriented FDDB
Leaving time cost aside, doesn't this section read more like praise for how well Faster R-CNN and SSD hold up?
3.4.3 Results on Rotation WIDER FACE
Faster R-CNN does great here.
3.4.4 Speed and Accuracy Comparison
our PCN can run with almost the same speed as Cascade CNN, benefited from the fast calibration operation of image flipping
A look at the results:
4. Conclusion
In this paper, we propose a novel rotation-invariant face detector, i.e. progressive calibration networks (PCN). Our PCN progressively calibrates the RIP orientation of each face candidate to upright for better distinguishing faces from non-faces. PCN divides the calibration process into several progressive steps and implements calibration as flipping original image, resulting in accurate calibration but with quite low time cost. By performing binary classification of face vs. non-face with gradually decreasing RIP ranges, the proposed PCN can accurately detect faces with arbitrary RIP angles. Compared with the similarly structured upright face detector, the time cost of PCN almost remains the same, naturally resulting in a real-time rotation-invariant face detector.
I think the conclusion is well written, so I pasted it as-is.

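Summary:
Handling rotated faces with a single all-in-one model is slow (e.g. covering them with Faster R-CNN or SSD), so the authors propose PCN. It follows the cascade pipeline of MTCNN and Cascade CNN and adjusts the rotated face angle progressively until it lies within [-45, 45], at which point an ordinary face detector can detect the face. The progressive adjustment in stages 1 and 2 is very fast, simple to carry out, and accurate to predict (it is a classification problem); stage 3 then classifies the faces and refines the bbox on the already adjusted candidates, and the overall result is very good;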

A runnable demo is available;

Paper reference
1 CVPR2018_Real-Time Rotation-Invariant Face Detection with Progressive Calibration Network

