Among face detection models, MTCNN and FaceBoxes combine high detection accuracy with fast inference, which makes them two of the models most commonly used in real-world engineering.
Paper link: http://cn.arxiv.org/pdf/1708.05234v4
The key figure from the paper:
There are already plenty of good articles online that explain FaceBoxes in detail, which you can find with a quick search. This post focuses on the two points I still found confusing after reading them: the receptive fields and the anchors.
As the figure above shows, the three layers Inception3, Conv3_2 and Conv4_2 each have seven receptive fields. I had long wondered how those were computed, until I came across this article: https://www.jianshu.com/p/cd76d16cbc46.
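The calculation in that article boils down to the standard receptive-field recursion over kernel sizes and strides. Here is a minimal sketch of that recursion; the RDCL layer parameters in the example are taken from my reading of the paper (Conv1 7x7/4, Pool1 3x3/2, Conv2 5x5/2, Pool2 3x3/2), so treat them as an assumption to verify against the figure:

```python
# Standard receptive-field recursion: walking input->output, each layer
# with kernel k and stride s grows the receptive field by (k - 1) * jump,
# where jump is the product of all strides seen so far.
def receptive_field(layers):
    """layers: list of (kernel_size, stride) tuples, in input-to-output order."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# FaceBoxes' RDCL as I read it from the paper (an assumption, not verified
# against the official code): Conv1 7x7/4, Pool1 3x3/2, Conv2 5x5/2, Pool2 3x3/2.
rdcl = [(7, 4), (3, 2), (5, 2), (3, 2)]
print(receptive_field(rdcl))  # 79
```

The same helper reproduces the receptive field of any later layer once you append its (kernel, stride) pairs to the list.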
For the anchors, see the paper's explanation of its anchor densification strategy. If the paper's explanation still isn't clear, it helps to read it alongside the open-source implementation: https://github.com/zisianw/FaceBoxes.PyTorch
Its prior_box.py is defined as follows; I have added explanatory comments:
from itertools import product
from math import ceil

import torch


class PriorBox(object):
    def __init__(self, cfg, image_size=None, phase='train'):
        super(PriorBox, self).__init__()
        # self.aspect_ratios = cfg['aspect_ratios']
        self.min_sizes = cfg['min_sizes']  # [[32, 64, 128], [256], [512]]
        self.steps = cfg['steps']          # [32, 64, 128]
        self.clip = cfg['clip']            # False
        self.image_size = image_size       # [1024, 1024]
        self.feature_maps = [[ceil(self.image_size[0] / step),
                              ceil(self.image_size[1] / step)]
                             for step in self.steps]

    def forward(self):
        anchors = []
        for k, f in enumerate(self.feature_maps):  # feature_maps: [[32,32],[16,16],[8,8]]
            # k=0, f=[32,32]
            # k=1, f=[16,16]
            # k=2, f=[8,8]
            min_sizes = self.min_sizes[k]
            # k=0, min_sizes=[32,64,128]
            # k=1, min_sizes=[256]
            # k=2, min_sizes=[512]
            for i, j in product(range(f[0]), range(f[1])):
                # Take k=2, f=[8,8], min_sizes=[512] as an example.
                # (i, j) runs over the Cartesian product:
                # (0,0), (0,1), (0,2), ..., (0,7)
                # (1,0), (1,1), (1,2), ..., (1,7)
                # ......
                # (7,0), (7,1), (7,2), ..., (7,7)
                for min_size in min_sizes:
                    # min_size = 512
                    s_kx = min_size / self.image_size[1]
                    s_ky = min_size / self.image_size[0]
                    if min_size == 32:
                        # 4x densification: 4*4 = 16 anchors per cell
                        dense_cx = [x * self.steps[k] / self.image_size[1]
                                    for x in [j + 0, j + 0.25, j + 0.5, j + 0.75]]
                        dense_cy = [y * self.steps[k] / self.image_size[0]
                                    for y in [i + 0, i + 0.25, i + 0.5, i + 0.75]]
                        for cy, cx in product(dense_cy, dense_cx):
                            anchors += [cx, cy, s_kx, s_ky]
                    elif min_size == 64:
                        # 2x densification: 2*2 = 4 anchors per cell
                        dense_cx = [x * self.steps[k] / self.image_size[1]
                                    for x in [j + 0, j + 0.5]]
                        dense_cy = [y * self.steps[k] / self.image_size[0]
                                    for y in [i + 0, i + 0.5]]
                        for cy, cx in product(dense_cy, dense_cx):
                            anchors += [cx, cy, s_kx, s_ky]
                    else:
                        # No densification: one anchor at the cell center.
                        cx = (j + 0.5) * self.steps[k] / self.image_size[1]
                        cy = (i + 0.5) * self.steps[k] / self.image_size[0]
                        # In grid units (i.e. before dividing by image_size),
                        # the centers (cy, cx) run over the Cartesian product:
                        # (0.5,0.5), (0.5,1.5), ..., (0.5,7.5)
                        # (1.5,0.5), (1.5,1.5), ..., (1.5,7.5)
                        # ......
                        # (7.5,0.5), (7.5,1.5), ..., (7.5,7.5)
                        anchors += [cx, cy, s_kx, s_ky]
        # back to torch land
        output = torch.Tensor(anchors).view(-1, 4)
        if self.clip:
            output.clamp_(max=1, min=0)
        return output
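As a sanity check on the loop above, here is a small standalone sketch of mine (not from the repo) that counts how many anchors this configuration produces for a 1024*1024 input:

```python
from math import ceil

# Mirror of the PriorBox configuration: three feature maps, their steps,
# their anchor sizes, and the densification factor per anchor size.
image_size = 1024
steps = [32, 64, 128]
min_sizes = [[32, 64, 128], [256], [512]]
density = {32: 4, 64: 2}  # n-fold densification means n*n anchors per cell; default 1

total = 0
for step, sizes in zip(steps, min_sizes):
    cells = ceil(image_size / step) ** 2                  # 32*32, 16*16, 8*8 cells
    total += cells * sum(density.get(s, 1) ** 2 for s in sizes)
print(total)  # 21824
```

The 32*32 map contributes 1024 * (16 + 4 + 1) = 21504 anchors, the 16*16 map 256, and the 8*8 map 64, for 21824 in total.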
A few thoughts on the anchor sizes. The three branches output feature maps of 32*32, 16*16 and 8*8, so mapped back to the 1024*1024 input, the distance between neighboring cells (the step) is 32, 64 and 128. If the step itself were used as the anchor size, the largest anchor would be only 128*128 and many large faces would be missed, so the base anchor size is set to step*4, giving 128*128, 256*256 and 512*512. Meanwhile, to keep the recall on small faces, two extra sizes, 64*64 and 32*32, are added to the smallest branch, with 2x and 4x anchor densification respectively. The resulting anchor sizes are therefore [[32,64,128],[256],[512]].
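Those 2x and 4x factors can be checked against the paper's density definition, A_density = A_scale / A_interval, where the interval is the anchor's tiling step on the input. A small sketch of that arithmetic (my own illustration):

```python
# Density of an anchor = its scale divided by its tiling interval on the input.
# Before densification the small anchors tile the image too sparsely; the 4x
# and 2x offset copies bring every scale up to the same density.
scales    = [32, 64, 128, 256, 512]
intervals = [32, 32, 32, 64, 128]   # step of the feature map each scale sits on
densify   = [4, 2, 1, 1, 1]         # one anchor becomes n*n offset copies

for scale, interval, n in zip(scales, intervals, densify):
    raw = scale / interval          # density before densification: 1, 2, 4, 4, 4
    print(scale, raw, raw * n)      # after densification every scale reaches 4
```

So the strategy is exactly what equalizes the density: 32 and 64 start at densities 1 and 2 and are boosted 4x and 2x, matching the density of 4 that the larger scales already have.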