(1) Ultra-Light-Fast-Generic-Face-Detector, abbreviated in the code as ultraface
(2) LFFD: A Light and Fast Face Detector for Edge Devices, abbreviated in the code as lffdface
(3) CenterFace, abbreviated in the code as centerface
(4) DBFace, abbreviated in the code as dbface
(5) RetinaFace, abbreviated in the code as retinaface
(6) MTCNN, abbreviated in the code as mtcnn
(7) SSD, abbreviated in the code as ssdface
(8) facebox, abbreviated in the code as facebox
(9) yoloface, abbreviated in the code as yoloface
(10) libfacedetection, proposed by Shiqi Yu, abbreviated in the code as libface
Weighing detection accuracy against runtime, retinaface and lffdface are the best choices. retinaface runs slightly slower than lffdface, but its output includes 5 facial landmarks. The dnn version of retinaface and libface both support two input sizes, 320 and 640, selectable through an input parameter. Unless you are detecting faces in dense scenes, the dnn version of retinaface and libface are the best options.
The face detector bundled with OpenCV before version 3.4 was based on Haar + AdaBoost. Its speed is acceptable, but the detection rate is poor: faces at even a moderate angle are missed, dim lighting causes misses, and false positives abound, with all sorts of random objects reported as faces. It is hard to recommend. Fortunately, the rise of deep learning has produced a batch of much stronger face detection algorithms, such as MTCNN, which give us far more room to work with. The image below was produced by such a detector; the results are quite striking. Source code here.
MTCNN works very well, but it was trained with Caffe, which is notorious for its painful setup: a dozen or so dependency libraries, each with several versions, and incompatibilities between versions everywhere. A beginner can easily spend a week just getting it configured, which quietly slows adoption. Fortunately, someone has packaged MTCNN deployments for all the major platforms (see github), greatly reducing the work required. Still, wouldn't it be nice if OpenCV shipped a deep-learning-based detector of its own?
Long awaited, it finally arrived: version 3.4 substantially enhanced the dnn module, notably adding support for Faster R-CNN along with OpenCL acceleration. The bundled res10_300x300_ssd_iter_140000.caffemodel performs quite well in practice. The one shortcoming is that no training code is provided, so the model cannot be fine-tuned on your own data.
Profiling its FLOPs shows that the conv layers are where optimization effort should go: the channel counts need to be cut down.
layer name Filter Shape Output Size Params FLOPs Ratio
conv1_h (32, 3, 7, 7) (1, 32, 80, 80) 4704 30105600 9.186%
layer_64_1_conv1_h (32, 32, 3, 3) (1, 32, 40, 40) 9216 14745600 4.499%
layer_64_1_conv2_h (32, 32, 3, 3) (1, 32, 40, 40) 9216 14745600 4.499%
layer_128_1_conv1_h (128, 32, 3, 3) (1, 128, 20, 20) 36864 14745600 4.499%
layer_128_1_conv2 (128, 128, 3, 3) (1, 128, 20, 20) 147456 58982400 17.996%
layer_128_1_conv_expand_h (128, 32, 1, 1) (1, 128, 20, 20) 4096 1638400 0.5%
layer_256_1_conv1 (256, 128, 3, 3) (1, 256, 10, 10) 294912 29491200 8.998%
layer_256_1_conv2 (256, 256, 3, 3) (1, 256, 10, 10) 589824 58982400 17.996%
layer_256_1_conv_expand (256, 128, 1, 1) (1, 256, 10, 10) 32768 3276800 1.0%
layer_512_1_conv1_h (128, 256, 3, 3) (1, 128, 10, 10) 294912 29491200 8.998%
layer_512_1_conv2_h (256, 128, 3, 3) (1, 256, 10, 10) 294912 29491200 8.998%
layer_512_1_conv_expand_h (256, 256, 1, 1) (1, 256, 10, 10) 65536 6553600 2.0%
conv6_1_h (128, 256, 1, 1) (1, 128, 10, 10) 32768 3276800 1.0%
conv6_2_h (256, 128, 3, 3) (1, 256, 5, 5) 294912 7372800 2.25%
conv7_1_h (64, 256, 1, 1) (1, 64, 5, 5) 16384 409600 0.125%
conv7_2_h (128, 64, 3, 3) (1, 128, 3, 3) 73728 663552 0.202%
conv8_1_h (64, 128, 1, 1) (1, 64, 3, 3) 8192 73728 0.022%
conv8_2_h (128, 64, 3, 3) (1, 128, 3, 3) 73728 663552 0.202%
conv9_1_h (64, 128, 1, 1) (1, 64, 3, 3) 8192 73728 0.022%
conv9_2_h (128, 64, 3, 3) (1, 128, 3, 3) 73728 663552 0.202%
conv4_3_norm_mbox_loc (16, 128, 3, 3) (1, 16, 20, 20) 18432 7372800 2.25%
conv4_3_norm_mbox_conf (8, 128, 3, 3) (1, 8, 20, 20) 9216 3686400 1.125%
fc7_mbox_loc (24, 256, 3, 3) (1, 24, 10, 10) 55296 5529600 1.687%
fc7_mbox_conf (12, 256, 3, 3) (1, 12, 10, 10) 27648 2764800 0.844%
conv6_2_mbox_loc (24, 256, 3, 3) (1, 24, 5, 5) 55296 1382400 0.422%
conv6_2_mbox_conf (12, 256, 3, 3) (1, 12, 5, 5) 27648 691200 0.211%
conv7_2_mbox_loc (24, 128, 3, 3) (1, 24, 3, 3) 27648 248832 0.076%
conv7_2_mbox_conf (12, 128, 3, 3) (1, 12, 3, 3) 13824 124416 0.038%
conv8_2_mbox_loc (16, 128, 3, 3) (1, 16, 3, 3) 18432 165888 0.051%
conv8_2_mbox_conf (8, 128, 3, 3) (1, 8, 3, 3) 9216 82944 0.025%
conv9_2_mbox_loc (16, 128, 3, 3) (1, 16, 3, 3) 18432 165888 0.051%
conv9_2_mbox_conf (8, 128, 3, 3) (1, 8, 3, 3) 9216 82944 0.025%
Layers num: 53
Total number of parameters: 2656352
Total number of FLOPs: 327745024
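The Params and FLOPs columns above follow the usual conv-layer convention, counting one multiply-accumulate as one FLOP: params = C_out x C_in x k_h x k_w (bias ignored) and FLOPs = params x H_out x W_out. A quick sketch reproducing a few rows of the table from the shapes listed above:

```python
# Reproduce the Params/FLOPs columns of the profiling table above.
# Convention: params = C_out * C_in * kh * kw (bias ignored),
# FLOPs = params * H_out * W_out (one multiply-accumulate = one FLOP).

def conv_cost(filter_shape, output_size):
    c_out, c_in, kh, kw = filter_shape
    _, _, h, w = output_size
    params = c_out * c_in * kh * kw
    flops = params * h * w
    return params, flops

# A few rows copied from the table: name -> (filter shape, output size)
layers = {
    "conv1_h": ((32, 3, 7, 7), (1, 32, 80, 80)),
    "layer_128_1_conv2": ((128, 128, 3, 3), (1, 128, 20, 20)),
    "conv9_2_mbox_conf": ((8, 128, 3, 3), (1, 8, 3, 3)),
}

total_flops = 327745024  # from the table footer
for name, (fshape, oshape) in layers.items():
    p, f = conv_cost(fshape, oshape)
    print(f"{name}: params={p} flops={f} ratio={100 * f / total_flops:.3f}%")
```

Running this matches the table, e.g. conv1_h gives params=4704, flops=30105600, ratio=9.186%, and makes clear why the wide 3x3 convs (layer_128_1_conv2, layer_256_1_conv2) dominate the budget.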
To address this, this article trains an ultra-small detector on ssd-models and the widerface dataset: only 2.8 MB, with a compute cost of just 84M FLOPs. The model can be downloaded from releases; use Face_aizoo28_320x320_iter_5120.caffemodel, and see Face_aizoo28_320x320.log for the training log.
The Python code for the dnn module's bundled model is below; with an 80x80 input it takes only 6 ms per frame, comfortably beyond real-time.
import numpy as np
import argparse
import cv2
import time

input_shape = (80, 80)
mean = (104.0, 177.0, 123.0)  # per-channel BGR means used when the model was trained

def get_args():
    ap = argparse.ArgumentParser()
    ap.add_argument("--prototxt", default="face_detector/deploy.prototxt")
    ap.add_argument("--model", default="face_detector/res10_300x300_ssd_iter_140000.caffemodel")
    ap.add_argument("--confidence", type=float, default=0.5)
    return ap.parse_args()

if __name__ == "__main__":
    args = get_args()
    net = cv2.dnn.readNetFromCaffe(args.prototxt, args.model)
    cap = cv2.VideoCapture(0)
    while True:
        ret, image = cap.read()
        if not ret:
            break
        (h, w) = image.shape[:2]
        blob = cv2.dnn.blobFromImage(cv2.resize(image, input_shape), 1.0, input_shape, mean)
        net.setInput(blob)
        start = time.time()
        detections = net.forward()
        end = time.time()
        cost = "%0.2fms" % ((end - start) * 1000)
        # Each row: [image_id, label, confidence, x1, y1, x2, y2], coords normalized to [0, 1]
        detections = detections.reshape(-1, 7)
        for i in range(detections.shape[0]):
            confidence = detections[i, 2]
            if confidence > args.confidence:
                box = detections[i, 3:7] * np.array([w, h, w, h])
                (startX, startY, endX, endY) = box.astype("int")
                text = "{:.2f} ".format(confidence * 100) + cost
                y = startY - 10 if startY - 10 > 10 else startY + 10
                cv2.rectangle(image, (startX, startY), (endX, endY), (255, 0, 0), 2)
                cv2.putText(image, text, (startX, y), 1, 1, (0, 0, 255), 2)
        cv2.imshow("img", image)
        cv2.waitKey(1)