A Convolutional Neural Network Cascade for Face Detection: Notes
I. Achievement
In the Introduction, the authors describe two difficulties in face detection: the first is the large visual variation of human faces against cluttered backgrounds, and the second is the large search space of possible face positions and sizes. The first calls for a highly discriminative model, while the second calls for a computationally cheap one. To address these two conflicting challenges, the paper proposes a cascade architecture built on convolutional neural networks (CNNs) that has very powerful discriminative capability while maintaining high performance. The main idea is to reject false detections quickly in the early, low-resolution stages and to verify the surviving detections carefully in the later, high-resolution stages. The contributions are four-fold:
(1) they propose a CNN cascade for fast face detection;
(2) they introduce a CNN-based face bounding box calibration step in the cascade to help accelerate the CNN cascade and obtain high-quality localization;
(3) they present a multi-resolution CNN architecture that can be more discriminative than a single-resolution CNN with only a fractional overhead;
(4) they further improve the state-of-the-art performance on the Face Detection Data Set and Benchmark (FDDB).
What's more, the proposed method learns the classifier directly from the image instead of relying on hand-crafted features, and it can be faster than model-based and exemplar-based detection systems.
II. Architectures and Methods
1.Architectures
The pipeline of their detector is shown in Fig.1:
As outlined above, the 12-net scans the whole image densely across different scales and, together with the 24-net that re-examines the surviving windows, quickly rejects more than 90% of the detection windows. The 48-net then evaluates the windows that remain. Each classification stage is paired with a calibration net: the 12-calibration-net adjusts the size and location of each surviving window so that it approaches a potential face nearby, after which non-maximum suppression (NMS) is applied to eliminate highly overlapped detection windows. The 24-calibration-net adjusts the windows that pass the 24-net in the same way, and NMS is applied again to further reduce their number. After the 48-net, global NMS eliminates overlapped detection windows whose Intersection-over-Union (IoU) ratio exceeds a pre-set threshold, and the 48-calibration-net calibrates the remaining bounding boxes to produce the final output. The details of the networks are shown in Fig.2 and Fig.3.
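The coarse-to-fine rejection in this pipeline can be sketched as a chain of filters. This is only a minimal illustration: the `run_cascade` helper, the toy scorers, and the thresholds are hypothetical stand-ins for the trained 12-net, 24-net and 48-net, and the calibration and NMS steps are left out.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)
Stage = Tuple[str, Callable[[Box], float], float]  # (name, scorer, threshold)


def run_cascade(windows: Sequence[Box], stages: Sequence[Stage]) -> List[Box]:
    """Pass windows through the stages in order, dropping low scorers early."""
    survivors = list(windows)
    for name, scorer, threshold in stages:
        # Only windows that pass this stage reach the next, costlier stage.
        survivors = [w for w in survivors if scorer(w) >= threshold]
        print(f"{name}: {len(survivors)} windows left")
    return survivors


if __name__ == "__main__":
    # Toy scorers (assumption): pretend that wider windows look more face-like.
    stages = [
        ("12-net", lambda w: w[2] / 100.0, 0.2),
        ("24-net", lambda w: w[2] / 100.0, 0.5),
        ("48-net", lambda w: w[2] / 100.0, 0.8),
    ]
    windows = [(0.0, 0.0, float(s), float(s)) for s in range(10, 101, 10)]
    print(run_cascade(windows, stages))
```

Because every stage only sees the survivors of the previous one, the expensive high-resolution network runs on a small fraction of the original windows.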
The authors use 6 CNNs in the cascade: 3 CNNs for face vs. non-face binary classification and 3 CNNs for bounding box calibration, which is formulated as multi-class classification over discretized displacement patterns.
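To make "multi-class classification of discretized displacement patterns" concrete, here is a minimal sketch. Each class stands for one (scale, x-offset, y-offset) triple, and calibration undoes the predicted pattern. The pattern values, the `calibrate` helper, the update rule, and the choice to use only the top-scoring class are my assumptions for illustration; these notes do not record the paper's exact settings, so check them against the paper before relying on them.

```python
from itertools import product
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), top-left corner

# Illustrative pattern grid (assumed values): 5 scales x 3 x-offsets x 3 y-offsets.
SCALES = (0.83, 0.91, 1.0, 1.10, 1.21)
OFFSETS = (-0.17, 0.0, 0.17)
PATTERNS: List[Tuple[float, float, float]] = list(product(SCALES, OFFSETS, OFFSETS))


def calibrate(box: Box, class_scores: Sequence[float]) -> Box:
    """Adjust a box using the highest-scoring displacement pattern."""
    if len(class_scores) != len(PATTERNS):
        raise ValueError("expected one score per displacement pattern")
    best = max(range(len(PATTERNS)), key=lambda i: class_scores[i])
    s, dx, dy = PATTERNS[best]
    x, y, w, h = box
    # Undo the predicted pattern: move the corner back by the (scaled)
    # offset and divide the width and height by the scale.
    return (x - dx * w / s, y - dy * h / s, w / s, h / s)


if __name__ == "__main__":
    scores = [0.0] * len(PATTERNS)
    scores[PATTERNS.index((0.83, 0.17, 0.0))] = 1.0  # hypothetical network output
    print(calibrate((50.0, 50.0, 83.0, 83.0), scores))
```

Treating calibration as classification over a small, fixed set of patterns keeps the calibration nets as simple as the detection nets, at the cost of only being able to correct a box in coarse steps.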
2.Methods
The authors apply non-maximum suppression (NMS) after the 12- and 24-calibration-nets, and as a global NMS before the 48-calibration-net, to eliminate highly overlapped detection windows and keep only the one with the highest confidence score. The goal of NMS is to reduce this overlap by preserving the best bounding boxes and discarding the redundant ones. The paper does not explain NMS in more detail, so I show the detailed workflow in Fig.4.
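As a complement to the workflow figure, the common greedy form of NMS can be sketched in a few lines. This is a generic version, not necessarily the exact procedure used in the paper; the `iou` and `nms` names, the corner-based box format, and the 0.5 default threshold are assumptions for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any higher-scoring kept box
        # by more than the threshold.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # the second box overlaps the first and is dropped
```

Processing boxes in descending score order guarantees that whenever two boxes overlap too much, the one that survives is the more confident one.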
3.Dataset
(1) AFLW: The AFLW face database is a large-scale, multi-pose, multi-view face database in which each face is annotated with 21 feature points. It is highly varied, covering a wide range of poses, expressions, lighting conditions, ethnicities, and other factors that affect face appearance.
(2) AFW: The AFW dataset is a face image collection built from images on the photo-sharing site Flickr. It contains 205 images with 473 labeled faces. Each face has a rectangular bounding box, 6 landmarks, and associated pose angles.
(3) FDDB: The FDDB dataset is mainly used for research on unconstrained face detection. It contains 2,845 images taken in the wild, in which 5,171 faces are annotated.
4.Measurement
The authors use two kinds of measures to evaluate the network: precision/recall, and the two types of evaluation defined by FDDB. Precision/recall is already familiar, so I do not discuss it here. The discontinuous score evaluation counts the number of detected faces versus the number of false alarms; a detection bounding box is regarded as a true positive only if its Intersection-over-Union (IoU) ratio with a ground-truth face is above 0.5. The continuous score evaluation measures how well the faces are localized by using the IoU ratio itself as the matching score of each detection bounding box.
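The difference between the two FDDB scores is easiest to see on a toy example. The sketch below is a simplification under my own assumptions: it matches detections to ground-truth faces greedily in confidence order, whereas the official evaluation uses its own matching procedure, and the `fddb_style_scores` name and corner-based box format are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fddb_style_scores(detections: List[Box], ground_truth: List[Box]) -> Tuple[float, float]:
    """Return (discontinuous, continuous) score sums for one image.

    Detections are assumed to be sorted by decreasing confidence; each one is
    greedily matched to the unused ground-truth face with the highest IoU.
    """
    unused = list(range(len(ground_truth)))
    discrete, continuous = 0.0, 0.0
    for det in detections:
        if not unused:
            break
        best = max(unused, key=lambda g: iou(det, ground_truth[g]))
        overlap = iou(det, ground_truth[best])
        if overlap > 0.0:
            unused.remove(best)
            continuous += overlap                       # score is the IoU itself
            discrete += 1.0 if overlap > 0.5 else 0.0   # score is 1 only above 0.5
    return discrete, continuous


if __name__ == "__main__":
    faces = [(0, 0, 10, 10), (20, 20, 30, 30)]
    dets = [(1, 1, 11, 11), (24, 24, 34, 34), (50, 50, 60, 60)]
    print(fddb_style_scores(dets, faces))
```

In this example the second detection overlaps a face only loosely, so it adds nothing to the discontinuous score but still contributes its IoU to the continuous score.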
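III. Sentence Expression
sharing the advantages of…: having the advantages of… (in papers, this is typically used to say that one's own method inherits the strengths of earlier methods)
Summary
This work was completed during the author's internship. The paper presents a cascade of convolutional neural networks for fast face detection. On the FDDB dataset, the proposed detector outperforms state-of-the-art methods.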