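Note: all later code updates are published on GitHub, where they are faster and more accurate; this blog post will not be updated.
2019.3.20
The code in this post draws on:
Reference code 1: https://github.com/AITTSMD/MTCNN-Tensorflow
Reference code 2: https://github.com/Seanlinx/mtcnn
Reference code 3: https://github.com/CongWeilin/mtcnn-caffe
Reference code 4: https://github.com/kpzhang93/MTCNN_face_detection_alignment
My sincere thanks to their authors.
Face detection based on MTCNN
Project environment and configuration: Windows 10 + GTX 1060 + Python 3.6 + Anaconda 5.2.0 + Spyder + Tensorflow 1.9 (GPU)
This post reproduces the paper "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks". There are already many write-ups of the paper and many code reproductions online, of uneven quality, so pick the ones that suit you.
1. Data acquisition
The training set here is built from WIDER_FACE and lfw_net. I use a different dataset from the paper because the Celeba annotations contain many errors, and while reading the reference code I found that another dataset also works well.
The WIDER_FACE annotation format is described in readme.txt inside its wider_face_split folder. The format is: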
The format of txt ground truth.
File name
Number of bounding box
x1, y1, w, h, blur, expression, illumination, invalid, occlusion, pose
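The first line is the image name, the second line is the number of face boxes, and each following line is one annotation. This post only uses the first four values of each annotation (x1, y1, w, h).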
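The format above can be read with a small state-free loop. The following is a minimal sketch, not the author's code: the function name `parse_wider_annotations` is hypothetical, and it assumes that an image with a count of 0 is still followed by one placeholder annotation row, which you should verify against your copy of the file.

```python
def parse_wider_annotations(lines):
    """Yield (image_name, [[x, y, w, h], ...]) from WIDER FACE ground-truth lines."""
    it = iter(lines)
    for name in it:
        name = name.strip()
        if not name:
            continue  # skip blank lines
        count = int(next(it).strip())
        boxes = []
        # Assumption: a count of 0 is still followed by one placeholder row,
        # so at least one annotation line is always consumed.
        for _ in range(max(count, 1)):
            fields = next(it).split()
            if count > 0:
                # Only the first four values (x1, y1, w, h) are kept.
                boxes.append([int(v) for v in fields[:4]])
        yield name, boxes


sample = [
    "0--Parade/a.jpg",
    "2",
    "10 20 30 40 0 0 0 0 0 0",
    "50 60 70 80 1 0 0 0 0 0",
    "1--X/b.jpg",
    "0",
    "0 0 0 0 0 0 0 0 0 0",
]
parsed = list(parse_wider_annotations(sample))
```

Reading the count first and then consuming exactly that many rows avoids the mode-switching counter and keeps images without faces from desynchronising the parser.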
The lfw_net annotation format is described in the readme.txt of the "Face detector code" package on its download page. The format is:
Each line starts with the image name
followed by the left, right, top, and bottom boundary positions of the face bounding boxes.
The first string on each line is the image name, followed by the left, right, top, and bottom coordinates, in that order.
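Because the order is left, right, top, bottom rather than the more common x1, y1, x2, y2, it is easy to mix up the fields. A minimal sketch of reading one line, with a hypothetical helper name `parse_lfw_net_line` and a made-up sample line; any fields after the first five are ignored here:

```python
def parse_lfw_net_line(line):
    """Return (image_name, [x1, y1, x2, y2]) from one lfw_net annotation line."""
    fields = line.split()
    name = fields[0]
    # File order is left, right, top, bottom.
    left, right, top, bottom = (int(float(v)) for v in fields[1:5])
    # Reorder into the corner convention x1, y1, x2, y2.
    return name, [left, top, right, bottom]


name, box = parse_lfw_net_line("img_0001.jpg 84 161 92 169 extra")
```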
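Note: the annotation formats assumed by the dataset-building code in the reference repositories do not match the downloaded files, yet the format they use when generating hard samples is correct. I do not know whether those authors worked from a renamed annotation file or from the .mat version; either way, the txt files from a normal download look completely different from what their code expects. If you plan to follow their code directly, read my dataset-building section first.
2. Building the Pnet dataset
All datasets in this project use Tensorflow's TFRecord format, which is convenient.
When TFRecord data is read back, the shuffle is uneven. I noticed this while building data in phase 1 of the project, so each batch has to mix in some positives and some negatives. The authors of the reference code also point out that putting everything into a single TFRecord gives unbalanced sample ratios when training ONet and RNet, so the datasets for all three networks here consist of four TFRecord files each.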
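With four separate TFRecord files, the mix is controlled by how many samples each file contributes to a batch. A minimal sketch of deriving those four sizes, where `split_batch` is a hypothetical helper and the 1:1:3:1 ratio (pos, part, neg, landmark) is purely illustrative, not a value taken from this post:

```python
def split_batch(total, ratio):
    """Split `total` samples across sources in proportion to `ratio`."""
    weight = sum(ratio)
    sizes = [total * r // weight for r in ratio]
    # Give any rounding remainder to the last source so the sizes sum to `total`.
    sizes[-1] += total - sum(sizes)
    return sizes


# Order assumed here: pos, part, neg, landmark (illustrative ratio only).
batch_sizes = split_batch(384, (1, 1, 3, 1))
```

The resulting list has the same shape as the `_batch_size` argument unpacked by `read_multi_tfrecords` in tool.py.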
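Because the model also has regression tasks, the bounding_box and landmark of every cropped face image must be recorded. Following the reference code, I store them in txt files.
The link to the code is at the end of the post.
First, the IoU concept again.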
(tool.py):
# -*- coding: utf-8 -*-
"""
@author: friedhelm
"""
import numpy as np
import cv2
import tensorflow as tf
def IoU(box, boxes):
    """
    Compute IoU between detect box and face boxes
    Parameters:
    ----------
    box: numpy array , shape (4, ): x1, y1, x2, y2
         random produced box
    boxes: numpy array, shape (n, 4): x1, y1, w, h
         input ground truth face boxes
    Returns:
    -------
    ovr: numpy.array, shape (n, )
        IoU
    """
    box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    area = boxes[:, 2]*boxes[:, 3]
    x_right=boxes[:, 2]+boxes[:, 0]
    y_bottom=boxes[:, 3]+boxes[:, 1]
    xx1 = np.maximum(box[0], boxes[:, 0])
    yy1 = np.maximum(box[1], boxes[:, 1])
    xx2 = np.minimum(box[2], x_right)
    yy2 = np.minimum(box[3], y_bottom)
    # compute the width and height of the bounding box
    w = np.maximum(0, xx2 - xx1 + 1)
    h = np.maximum(0, yy2 - yy1 + 1)
    inter = w * h
    ovr = inter / (box_area + area - inter)
    return ovr
def NMS(box,_overlap):
    if len(box) == 0:
        return []
    #xmin, ymin, xmax, ymax, score, cropped_img, scale
    box.sort(key=lambda x :x[4])
    box.reverse()
    pick = []
    x_min = np.array([box[i][0] for i in range(len(box))],np.float32)
    y_min = np.array([box[i][1] for i in range(len(box))],np.float32)
    x_max = np.array([box[i][2] for i in range(len(box))],np.float32)
    y_max = np.array([box[i][3] for i in range(len(box))],np.float32)
    area = (x_max-x_min)*(y_max-y_min)
    idxs = np.array(range(len(box)))
    while len(idxs) > 0:
        i = idxs[0]
        pick.append(i)
        xx1 = np.maximum(x_min[i],x_min[idxs[1:]])
        yy1 = np.maximum(y_min[i],y_min[idxs[1:]])
        xx2 = np.minimum(x_max[i],x_max[idxs[1:]])
        yy2 = np.minimum(y_max[i],y_max[idxs[1:]])
        w = np.maximum(xx2-xx1,0)
        h = np.maximum(yy2-yy1,0)
        overlap = (w*h)/(area[idxs[1:]] + area[i] - w*h)
        idxs = np.delete(idxs, np.concatenate(([0],np.where(((overlap >= _overlap) & (overlap <= 1)))[0]+1)))
    return [box[i] for i in pick]
def featuremap(sess,graph,img,scale,map_shape,stride,threshold):
    left=0
    up=0
    boundingBox=[]
    images=graph.get_tensor_by_name("input/image:0")
    label= graph.get_tensor_by_name("output/label:0")
    roi= graph.get_tensor_by_name("output/roi:0")
    landmark= graph.get_tensor_by_name("output/landmark:0")
    img1=np.reshape(img,(-1,img.shape[0],img.shape[1],img.shape[2]))
    a,b,c=sess.run([label,roi,landmark],feed_dict={images:img1})
    a=np.reshape(a,(-1,2))
    b=np.reshape(b,(-1,4))
    c=np.reshape(c,(-1,10))
    for idx,prob in enumerate(a):
        if prob[1]>threshold:
            biasBox=[]
            biasBox.extend([float(left*stride)/scale,float(up*stride)/scale, float(left*stride+map_shape)/scale, float(up*stride+map_shape)/scale,prob[1]])
            biasBox.extend(b[idx])
            biasBox.extend(c[idx])
            boundingBox.append(biasBox)
        # keep the sliding window from running past the horizontal or bottom image edge
        if (left*stride+map_shape<img.shape[1]):
            left+=1
        elif (up*stride+map_shape<img.shape[0]):
            left=0
            up+=1
        else : break
    return boundingBox
def flip(img,facemark):
    img=cv2.flip(img,1)
    facemark[[0,1]]=facemark[[1,0]]
    facemark[[3,4]]=facemark[[4,3]]
    return (img,facemark)
def read_single_tfrecord(addr,_batch_size,shape):
    filename_queue = tf.train.string_input_producer([addr],shuffle=True)
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(serialized_example,
                                   features={
                                   'img':tf.FixedLenFeature([],tf.string),
                                   'label':tf.FixedLenFeature([],tf.int64),
                                   'roi':tf.FixedLenFeature([4],tf.float32),
                                   'landmark':tf.FixedLenFeature([10],tf.float32),
                                   })
    img=tf.decode_raw(features['img'],tf.uint8)
    label=tf.cast(features['label'],tf.int32)
    roi=tf.cast(features['roi'],tf.float32)
    landmark=tf.cast(features['landmark'],tf.float32)
    img = tf.reshape(img, [shape,shape,3])
    min_after_dequeue = 10000
    batch_size = _batch_size
    capacity = min_after_dequeue + 10 * batch_size
    image_batch, label_batch, roi_batch, landmark_batch = tf.train.shuffle_batch([img,label,roi,landmark],
                                                        batch_size=batch_size,
                                                        capacity=capacity,
                                                        min_after_dequeue=min_after_dequeue,
                                                        num_threads=7)
    label_batch = tf.reshape(label_batch, [batch_size])
    roi_batch = tf.reshape(roi_batch,[batch_size,4])
    landmark_batch = tf.reshape(landmark_batch,[batch_size,10])
    return image_batch, label_batch, roi_batch, landmark_batch
def read_multi_tfrecords(addr,_batch_size,shape):
    pos_dir,part_dir,neg_dir,landmark_dir = addr
    pos_batch_size,part_batch_size,neg_batch_size,landmark_batch_size = _batch_size
    pos_image,pos_label,pos_roi,pos_landmark = read_single_tfrecord(pos_dir, pos_batch_size, shape)
    part_image,part_label,part_roi,part_landmark = read_single_tfrecord(part_dir, part_batch_size, shape)
    neg_image,neg_label,neg_roi,neg_landmark = read_single_tfrecord(neg_dir, neg_batch_size, shape)
    landmark_image,landmark_label,landmark_roi,landmark_landmark = read_single_tfrecord(landmark_dir, landmark_batch_size, shape)
    images = tf.concat([pos_image,part_image,neg_image,landmark_image], 0, name="concat/image")
    labels = tf.concat([pos_label,part_label,neg_label,landmark_label],0,name="concat/label")
    rois = tf.concat([pos_roi,part_roi,neg_roi,landmark_roi],0,name="concat/roi")
    landmarks = tf.concat([pos_landmark,part_landmark,neg_landmark,landmark_landmark],0,name="concat/landmark")
    return images,labels,rois,landmarks
def image_color_distort(inputs):
    inputs = tf.image.random_contrast(inputs, lower=0.5, upper=1.5)
    inputs = tf.image.random_brightness(inputs, max_delta=0.2)
    inputs = tf.image.random_hue(inputs,max_delta= 0.2)
    inputs = tf.image.random_saturation(inputs,lower = 0.5, upper= 1.5)
    return inputs
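As a quick sanity check of the IoU convention used in tool.py (candidate box given as x1, y1, x2, y2; ground-truth boxes given as x, y, w, h), here is a plain-Python restatement of the same arithmetic. It is a sketch for illustration only: the name `iou_xyxy_vs_xywh` is hypothetical, and it keeps the listing's "+1" pixel convention on the candidate box and the intersection.

```python
def iou_xyxy_vs_xywh(box, gt_boxes):
    """IoU of one candidate (x1, y1, x2, y2) against ground truths (x, y, w, h)."""
    bx1, by1, bx2, by2 = box
    box_area = (bx2 - bx1 + 1) * (by2 - by1 + 1)
    result = []
    for gx, gy, gw, gh in gt_boxes:
        # Width and height of the intersection, clamped at zero when disjoint.
        inter_w = max(0, min(bx2, gx + gw) - max(bx1, gx) + 1)
        inter_h = max(0, min(by2, gy + gh) - max(by1, gy) + 1)
        inter = inter_w * inter_h
        result.append(inter / (box_area + gw * gh - inter))
    return result


candidate = [0, 0, 9, 9]
faces = [[0, 0, 10, 10], [5, 0, 10, 10], [100, 100, 10, 10]]
overlaps = iou_xyxy_vs_xywh(candidate, faces)
```

The three ground-truth boxes cover the full-overlap, partial-overlap, and disjoint cases, which is a convenient way to confirm the two coordinate conventions have not been swapped.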
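We start by building the face samples: pos, part, and neg samples all need to be generated.
Here w and h are the image width and height, x and y are the top-left corner of a crop, and x1, y1, w1, h1 describe a face box.
1. For neg samples, the crop size can be chosen at random in [12, min(w,h)/2]. The top-left corner is chosen in [0, w-size] horizontally and [0, h-size] vertically, because a negative crop must not extend beyond the original image; the bottom-right corner then reaches at most min(w,h)/2 + (w|h - min(w,h)/2) = w|h.
2. Some hard samples are also needed, i.e. negative crops with IoU below 0.3 taken around each face box. An offset is introduced here: size is still in [12, min(w,h)/2], the top-left offset delta is in [max(-size, -x1|y1), w1|h1], and the top-left corner is max(0, x1|y1 + delta). This yields more hard samples, since the top-left corner then lies in [x1|y1 + max(-size, -x1|y1), x1|y1 + w1|h1]. The point of this construction is that a hard sample always partly intersects the face box.
3. Pos samples are crops with IoU above 0.65 taken around each face box. An offset is needed again. The face box centre is x1|y1 + (w1|h1)/2. The crop size is drawn from [(w1|h1)*0.8, (w1|h1)*1.2], i.e. 0.8 to 1.2 times the face box width and height. The offset delta is drawn from [(w1|h1)*-0.2, (w1|h1)*0.2], i.e. -0.2 to 0.2 times the face box width and height. The top-left corner of the pos crop is the face box centre plus the offset minus size/2.
4. Part samples are produced the same way as pos samples: crops around each face box whose IoU is below 0.65 and above 0.3.
5. The regression target that gets saved is an offset, i.e. the displacement between the crop and the true face box. With nx1 the x coordinate of the crop, compute and save offset_x1 = (x1 - nx1) / float(size).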
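Steps 3 and 5 above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the helper names `sample_candidate` and `regression_offsets` are hypothetical, the size range is read as 0.8 times the shorter side up to 1.2 times the longer side, and the offsets for y1, x2, and y2 are assumed to follow the same pattern the post gives for offset_x1.

```python
import random


def sample_candidate(face, rng):
    """Draw one square crop (nx1, ny1, size) around a face box (x1, y1, w1, h1)."""
    x1, y1, w1, h1 = face
    # Side length between 0.8x and 1.2x of the face box sides (assumed reading of step 3).
    size = rng.randint(max(1, int(min(w1, h1) * 0.8)), int(max(w1, h1) * 1.2))
    # Shift of the crop centre: -0.2x to 0.2x of the face width / height.
    delta_x = rng.randint(int(-0.2 * w1), int(0.2 * w1))
    delta_y = rng.randint(int(-0.2 * h1), int(0.2 * h1))
    # Top-left corner = face centre + offset - size / 2, clamped to the image.
    nx1 = int(max(0, x1 + w1 / 2 + delta_x - size / 2))
    ny1 = int(max(0, y1 + h1 / 2 + delta_y - size / 2))
    return nx1, ny1, size


def regression_offsets(face, crop):
    """Offsets of the face box relative to the crop, normalised by the crop size."""
    x1, y1, w1, h1 = face
    nx1, ny1, size = crop
    x2, y2 = x1 + w1, y1 + h1
    nx2, ny2 = nx1 + size, ny1 + size
    return (
        (x1 - nx1) / float(size),  # the formula given in step 5
        (y1 - ny1) / float(size),  # assumed analogue for y1
        (x2 - nx2) / float(size),  # assumed analogue for x2
        (y2 - ny2) / float(size),  # assumed analogue for y2
    )


rng = random.Random(0)
face = (100, 100, 50, 50)
crops = [sample_candidate(face, rng) for _ in range(100)]
offsets = regression_offsets(face, (90, 95, 60))
```

Each sampled crop still has to be scored with IoU against the face box and kept as pos (above 0.65) or part (between 0.3 and 0.65), as described in steps 3 and 4.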
(gen_classify_regression_data.py):
# -*- coding: utf-8 -*-
"""
@author: friedhelm
"""
from core.tool import IoU
import numpy as np
from numpy.random import randint
import cv2
import os
import time
def main():
    f1 = open(os.path.join(save_dir, 'pos_%d.txt'%(img_size)), 'w')
    f2 = open(os.path.join(save_dir, 'neg_%d.txt'%(img_size)), 'w')
    f3 = open(os.path.join(save_dir, 'par_%d.txt'%(img_size)), 'w')
    with open(WIDER_spilt_dir) as filenames:
        p=0
        neg_idx=0
        pos_idx=0
        par_idx=0
        for line in filenames.readlines():
            line=line.strip().split(' ')
            if(p==0):
                pic_dir=line[0]
                p=1
                boxes=[]
            elif(p==1):
                k=int(line[0])
                p=2
            elif(p==2):
                b=[]
                k=k-1
                if(k==0):
                    p=0
                for i in range(4):
                    b.append(int(line[i]))
                boxes.append(b)
                # format of boxes is [x,y,w,h]
                if(p==0):
                    img=cv2.imread(os.path.join(WIDER_dir,pic_dir).replace('/','\\'))
                    h,w,c=img.shape
                    #save num negative pics whose IoU less than 0.3
                    num=50
                    while(num):
                        size=randint(12,min(w,h)/2)
                        x=randint(0,w-size)
                        y=randint(0,h-size)
                        if(np.max(IoU(np.array([x,y,x+size,y+size]),np.array(boxes)))<0.3):
                            resized_img = cv2.resize(img[y:y+size,x:x+size,:], (img_size, img_size))
                            cv2.imwrite(os.path.join(negative_dir,'neg_%d.jpg'%(neg_idx)),resized_img)
                            f2.write(os.path.join(negative_dir,'neg_%d.jpg'%(neg_idx)) + ' 0\n')
                            neg_idx=neg_idx+1
                            num=num-1
                    for box in boxes:
                        if((box[0]<0)|(box[1]<0)|(max(box[2],box[3])<20)|(min(box[2],box[3])<=5)):
                            continue
                        x1, y1, w1, h1 = box
                        # crop images near the bounding box if IoU less than 0.3, save as negative samples
                        for i in range(10):
                            size = randint(12, min(w, h) / 2)
                            delta_x = randint(max(-size, -x1), w1)
                            delta_y = randint(max(-size, -y1), h1)
                            nx1 = int(max(0, x1 + delta_x))
ny1 = int(max(0, y1 &#