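Preface
Goal: given an image, draw a box around each face in it.
Implementing face detection
1·Data format
1·1 Preparing face images
Start from face and non-face images that have already been sorted into separate classes.
Label format
aaa.jpg x1,y1 x2,y2
The two coordinate pairs are opposite corners of one annotated face box, i.e. the region whose characteristics the model needs to learn.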
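A minimal sketch of reading one label line in the format above. It assumes integer pixel coordinates and whitespace between the three fields; the function name `parse_label_line` is made up for illustration.

```python
def parse_label_line(line):
    """Parse 'name.jpg x1,y1 x2,y2' into (filename, (x1, y1, x2, y2))."""
    filename, corner_a, corner_b = line.split()
    ax, ay = (int(v) for v in corner_a.split(","))
    bx, by = (int(v) for v in corner_b.split(","))
    # Normalize so that (x1, y1) is the top-left and (x2, y2) the bottom-right
    # corner, whichever order the two corners were written in (assumption).
    return filename, (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


print(parse_label_line("aaa.jpg 30,40 130,160"))
```

Normalizing the corner order up front keeps later width and height computations non-negative.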
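Do we need to prepare the data ourselves?
For our own project, of course. For learning purposes, though, others have already done this work, and we can use their annotated datasets directly. Two places to look:
1. Benchmarks. A benchmark is an industry reference point (dataset, paper, source code, baseline, results). A paper is a record of experiments that others have already run, and the dataset that accompanies it can usually be downloaded. Most of these come from abroad and can be requested with a university email address for academic use. Because commercial use is not allowed, a request from a non-edu address will probably be rejected.
2. Forums, where people discuss and share datasets.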
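Suppose we now have 10,000 face images.
Each image contains an annotated face box. If we randomly crop a square from the image, its overlap with the face box is measured by the IoU (intersection over union): the area of the overlap divided by the area of the union of the two boxes.
1. Crops taken from inside the face box are used as face data. Random crops whose IoU with the face box is greater than 0.8 also count as faces.
2. Crops whose IoU with the face box is less than 0.3 are used as non-face data.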
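The two rules above can be sketched as follows. Boxes are assumed to be `(x1, y1, x2, y2)` tuples; the helper names and the `min_size` default are illustrative, and crops that fall between the two thresholds are simply discarded here, which is an assumption.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0  # the boxes do not overlap
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)


def label_crop(crop, face_box, pos_thresh=0.8, neg_thresh=0.3):
    """Return 1 (face), 0 (non-face) or None (ambiguous, discard)."""
    overlap = iou(crop, face_box)
    if overlap > pos_thresh:
        return 1
    if overlap < neg_thresh:
        return 0
    return None


def random_square_crop(img_w, img_h, min_size=40, rng=random):
    """Pick a random square that lies fully inside the image.
    Assumes min(img_w, img_h) >= min_size."""
    size = rng.randint(min_size, min(img_w, img_h))
    x1 = rng.randint(0, img_w - size)
    y1 = rng.randint(0, img_h - size)
    return (x1, y1, x1 + size, y1 + size)


face = (100, 100, 200, 200)
crop = random_square_crop(640, 480)
print(crop, label_crop(crop, face))
```

Sampling many crops per image and keeping only those labeled 0 or 1 is what grows the dataset beyond the original image count.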
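This yields 15,000 face samples and 15,000 non-face samples, a substantial expansion of the data.
Partially occluded faces are also common in real life, so including partial crops actually makes detection more robust.
However, the original annotations are not guaranteed to be accurate. They are made by hand, so missed or mislabeled faces do occur, which means some faces end up in the non-face set. This is very damaging.
For the training set this is tolerable.
That covers image preparation.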
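2·Generating the LMDB files
This step is explained in detail in the post below. After running it you will have the training and validation LMDBs.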
https://blog.csdn.net/lidashent/article/details/121464092
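3·Editing the network definition file
AlexNet can be used as is. Just change num_output of the last fully connected layer to the number of classes you need.
The architecture is general purpose, so it works for face detection too.
train.prototxt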
############################# DATA Layer #############################
name: "face_train_val"
layer {
top: "data"
top: "label"
name: "data"
type: "Data"
data_param {
source: "C:/Users/Administrator.DESKTOP-KMH7HN6/Downloads/li_test_net/my_face_detect/data_set/face_train_lmdb"
backend:LMDB
batch_size: 64
}
transform_param {
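# Whether to subtract the mean. In practice it made little difference, although the mean-subtracted data is nicely and evenly distributed around zero.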
#mean_file: "C:/Users/Administrator.DESKTOP-KMH7HN6/Downloads/li_test_net/my_face_detect/data_set/imagenet_mean.binaryproto"
# Random mirroring, which effectively doubles the data
mirror: true
}
include: { phase: TRAIN }
}
layer {
top: "data"
top: "label"
name: "data"
type: "Data"
data_param {
source: "C:/Users/Administrator.DESKTOP-KMH7HN6/Downloads/li_test_net/my_face_detect/data_set/face_val_lmdb"
backend:LMDB
batch_size: 64
}
transform_param {
#mean_file: "C:/Users/Administrator.DESKTOP-KMH7HN6/Downloads/li_test_net/my_face_detect/data_set/imagenet_mean.binaryproto"
mirror: true
}
include: {
phase: TEST
}
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
# Multiplier applied to the base learning rate for this layer's weights (1x here)
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 96
kernel_size: 11
stride: 4
# Initialization of the weights (w) and biases (b)
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 2
kernel_size: 5
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "conv2"
top: "conv2"
}
layer {
name: "norm2"
type: "LRN"
bottom: "conv2"
top: "norm2"
lrn_param {
local_size: 5
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "norm2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "conv3"
type: "Convolution"
bottom: "pool2"
top: "conv3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu3"
type: "ReLU"
bottom: "conv3"
top: "conv3"
}
layer {
name: "conv4"
type: "Convolution"
bottom: "conv3"
top: "conv4"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 384
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu4"
type: "ReLU"
bottom: "conv4"
top: "conv4"
}
layer {
name: "conv5"
type: "Convolution"
bottom: "conv4"
top: "conv5"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
convolution_param {
num_output: 256
pad: 1
kernel_size: 3
group: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu5"
type: "ReLU"
bottom: "conv5"
top: "conv5"
}
layer {
name: "pool5"
type: "Pooling"
bottom: "conv5"
top: "pool5"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "fc6"
type: "InnerProduct"
bottom: "pool5"
top: "fc6"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu6"
type: "ReLU"
bottom: "fc6"
top: "fc6"
}
layer {
name: "drop6"
type: "Dropout"
bottom: "fc6"
top: "fc6"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "fc7"
type: "InnerProduct"
bottom: "fc6"
top: "fc7"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 4096
weight_filler {
type: "gaussian"
std: 0.005
}
bias_filler {
type: "constant"
value: 0.1
}
}
}
layer {
name: "relu7"
type: "ReLU"
bottom: "fc7"
top: "fc7"
}
layer {
name: "drop7"
type: "Dropout"
bottom: "fc7"
top: "fc7"
dropout_param {
dropout_ratio: 0.5
}
}
# Final fully connected layer (the classifier)
layer {
name: "fc8-expr"
type: "InnerProduct"
bottom: "fc7"
top: "fc8-expr"
param {
lr_mult: 10
decay_mult: 1
}
param {
lr_mult: 20
decay_mult: 0
}
# Face detection is a binary classification (face / non-face), so there are 2 outputs
inner_product_param {
num_output: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc8-expr"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc8-expr"
bottom: "label"
top: "loss"
}
Solver file (solver.prototxt)
net: "C:/Users/Administrator.DESKTOP-KMH7HN6/Downloads/li_test_net/my_face_detect/data_set/train.prototxt"
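# Number of test batches per test pass. test_iter x the test batch_size should ideally cover the whole validation set, so raise it if your machine can handle it.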
test_iter: 100
# Run a test pass every this many training iterations
test_interval: 500
# lr for fine-tuning should be lower than when starting from scratch
# Base learning rate. Each layer's actual learning rate is base_lr times its lr_mult, so the values set here are hyperparameters.
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
# stepsize should also be lower, as we're closer to being done
stepsize: 20000
# Print the training status every this many iterations
display: 100
# Maximum number of iterations
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
# Save a snapshot of the model every this many iterations
snapshot: 10000
snapshot_prefix: "C:/Users/Administrator.DESKTOP-KMH7HN6/Downloads/li_test_net/my_face_detect/data_set/model"
# uncomment the following to default to CPU mode solving
# solver_mode: CPU
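With lr_policy "step", Caffe multiplies the learning rate by gamma once every stepsize iterations, i.e. lr = base_lr * gamma^floor(iter / stepsize), and each layer then scales that by its own lr_mult. The sketch below just evaluates this formula with the values from the solver above; the function names are illustrative.

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=20000):
    """Global learning rate under the "step" policy."""
    return base_lr * gamma ** (iteration // stepsize)


def layer_lr(iteration, lr_mult):
    """Effective learning rate of a parameter with the given lr_mult."""
    return step_lr(iteration) * lr_mult


def images_per_test_pass(test_iter=100, batch_size=64):
    """How many validation images one test pass covers."""
    return test_iter * batch_size


for it in (0, 20000, 40000, 99999):
    # lr_mult 10 is the weight multiplier of the fc8-expr layer above
    print(it, step_lr(it), layer_lr(it, 10))
print(images_per_test_pass())
```

With max_iter at 100000 and stepsize at 20000 the rate drops four times during training, and test_iter 100 with batch_size 64 evaluates 6400 validation images per test pass.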
4·Training the network
I added the directory containing caffe.exe to the system PATH so that I don't have to type the full path every time.
caffe.exe train --solver=C:\Users\Administrator.DESKTOP-KMH7HN6\Downloads\li_test_net\my_face_detect\data_set\solver.prototxt
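Wait for training to finish.
Factors that affect training speed:
1. Model size. A network with hundreds of layers trains far more slowly than one with a handful of layers.
2. Input size. 32x32 inputs train much faster than 227x227 inputs, since each image has roughly 50 times fewer pixels.
At this point the model that decides whether an image contains a face has been trained.
But that is not the end goal. We still need to know where the face is.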
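5·Locating the face
Sliding window
Take a fixed box and slide it across the image, marking the positions where it lands on a face.
But faces come in different sizes. How can the box still fit a face as the scale changes?
One option is to grow the box step by step: try 27x27 first, enlarge it if nothing is found, and stop when, say, the 227x227 box finds the face.
In practice it is the image that is scaled, not the box. Generate a series of scaled copies of the image from large to small (an image pyramid), slide the fixed-size box over each copy, and scale the coordinates of any hit back to the original image size.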
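A minimal sketch of choosing the pyramid scales. It mirrors the loop used in the detection script later in this post (start at up to 2x, shrink by a factor of about 2^(-1/3) each step, stop once the shorter side would drop below the 227-pixel window); the function name and keyword arguments are illustrative.

```python
def pyramid_scales(height, width, window=227, factor=0.793700526,
                   max_scale=2.0, max_side=4000.0):
    """Return the list of scales, largest first, at which the image's
    shorter side is still at least `window` pixels."""
    largest = min(max_scale, max_side / max(height, width))
    scales = []
    scale = largest
    min_side = largest * min(height, width)
    while min_side >= window:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


print(pyramid_scales(400, 500))
```

Because the window size is fixed, a large scale lets the window match small faces and a small scale lets it match large faces.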
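The problem is that the classifier ends in fully connected layers, whose weights connect to every input and therefore fix the input size. That does not suit images that are about to be rescaled to many different sizes.
So the fully connected layers are dropped and converted into convolution layers, which makes the network fully convolutional.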
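Why this conversion works can be shown on a toy example: a fully connected layer over a k x k window is the same computation as a k x k convolution with the same weights, and on a larger input the convolution simply produces one output per window position. The sketch below uses a single input channel and plain Python lists; all names and numbers are illustrative and unrelated to the real AlexNet weights.

```python
def fc(window, weights, bias):
    """Fully connected layer applied to one flattened k x k window."""
    flat = [v for row in window for v in row]
    return [sum(w * v for w, v in zip(w_row, flat)) + b
            for w_row, b in zip(weights, bias)]


def conv_from_fc(image, weights, bias, k, stride=1):
    """Slide the same FC weights over the image as a k x k convolution.
    Returns out[channel][row][col]."""
    rows = (len(image) - k) // stride + 1
    cols = (len(image[0]) - k) // stride + 1
    out = [[[0.0] * cols for _ in range(rows)] for _ in weights]
    for r in range(rows):
        for c in range(cols):
            window = [row[c * stride:c * stride + k]
                      for row in image[r * stride:r * stride + k]]
            for ch, value in enumerate(fc(window, weights, bias)):
                out[ch][r][c] = value
    return out


k = 2
weights = [[1, 0, 0, 1], [0.5, 0.5, 0.5, 0.5]]  # 2 outputs, k*k inputs each
bias = [0.0, 1.0]
small = [[1, 2], [3, 4]]                   # exactly one window
large = [[1, 2, 0], [3, 4, 1], [0, 1, 2]]  # four window positions

print(fc(small, weights, bias))
print(conv_from_fc(large, weights, bias, k))
```

On the 2x2 input the convolution gives a 1x1 map equal to the FC output; on the 3x3 input it gives a 2x2 map, which is the kind of probability map the detector reads.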
So is this training or testing?
Testing. Detection is just inference; the face classification model was already built above.
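Overall pipeline:
1. Convert the model to a fully convolutional network.
2. Build multiple scales of the image.
3. Run a forward pass at each scale to get a feature map, i.e. a matrix of face probabilities per scale.
4. Find the face locations on the feature map, record the box coordinates, and map them back to the original image by undoing the stride and the scale.
5. Apply NMS (non-maximum suppression) to remove heavily overlapping boxes. Several boxes may cover the same face, so keep only the one with the highest probability.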
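Steps 4 and 5 can be sketched in a few lines. The stride of 32 and window of 227 are the values used for this AlexNet; the function names, the 0.3 threshold and the sample boxes are illustrative. This is plain greedy NMS, shown only to make the idea concrete.

```python
def cell_to_box(row, col, scale, stride=32, cell=227):
    """Map one feature-map cell back to [x1, y1, x2, y2] in the original image."""
    x1 = stride * col
    y1 = stride * row
    return [x1 / float(scale), y1 / float(scale),
            (x1 + cell - 1) / float(scale), (y1 + cell - 1) / float(scale)]


def box_iou(a, b):
    """IoU of two boxes whose first four entries are x1, y1, x2, y2."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / float(union)


def nms(boxes, iou_thresh=0.3):
    """Greedy NMS over [x1, y1, x2, y2, prob] boxes: repeatedly keep the
    most probable box and drop every box that overlaps it too much."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if box_iou(best, b) <= iou_thresh]
    return kept


print(cell_to_box(1, 2, 0.5))
print(nms([[0, 0, 10, 10, 0.9], [1, 1, 11, 11, 0.8], [50, 50, 60, 60, 0.7]]))
```

Dividing by the scale is what lets boxes found at different pyramid levels be compared in the same coordinate system before NMS runs.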
The result is shown in the figure.
import math
import os
import random
import sys

import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image, ImageDraw, ImageFont

# caffe_root = '/home/matt/Documents/caffe/'
# sys.path.insert(0, caffe_root + 'python')
# Silence Caffe's INFO and WARNING logs; must be set before caffe is imported
os.environ['GLOG_minloglevel'] = '2'
import caffe

# caffe.set_device(0)
caffe.set_mode_cpu()


class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Rect(object):
    def __init__(self, p1, p2):
        '''Store the left, right, bottom and top values of a rectangle.
        p1 and p2 are two opposite corners, in either order.
        Note that bottom is the smaller y value and top the larger one.
        '''
        self.left = min(p1.x, p2.x)
        self.right = max(p1.x, p2.x)
        self.bottom = min(p1.y, p2.y)
        self.top = max(p1.y, p2.y)

    def __str__(self):
        return "Rect[%d, %d, %d, %d]" % (self.left, self.top, self.right, self.bottom)


def calculateDistance(x1, y1, x2, y2):
    # Euclidean distance between two points
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


def range_overlap(a_min, a_max, b_min, b_max):
    '''True if the two 1-D ranges overlap, i.e. neither range lies
    completely beyond the other.
    '''
    return (a_min <= b_max) and (b_min <= a_max)


def rect_overlaps(r1, r2):
    return (range_overlap(r1.left, r1.right, r2.left, r2.right)
            and range_overlap(r1.bottom, r1.top, r2.bottom, r2.top))


def rect_merge(r1, r2, mergeThresh):
    '''Return 1 if the IoU of r1 and r2 exceeds mergeThresh, otherwise 0.'''
    if rect_overlaps(r1, r2):
        # Intersection area
        SI = abs(min(r1.right, r2.right) - max(r1.left, r2.left)) * abs(max(r1.bottom, r2.bottom) - min(r1.top, r2.top))
        # Areas of the two rectangles
        SA = abs(r1.right - r1.left) * abs(r1.bottom - r1.top)
        SB = abs(r2.right - r2.left) * abs(r2.bottom - r2.top)
        # Union area
        S = SA + SB - SI
        ratio = float(SI) / float(S)
        if ratio > mergeThresh:
            return 1
    return 0


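# Heat map -> candidate boxes
def generateBoundingBox(featureMap, scale):
    boundingBox = []
    # Total stride of the network: the product of the strides of all layers
    # before the output (conv1 4 x pool1 2 x pool2 2 x pool5 2 = 32)
    stride = 32
    # Size of the detection window (the network's input size)
    cellSize = 227
    # Each heat-map cell corresponds to a 227 x 227 window in the scaled
    # image, and neighbouring windows are 32 pixels apart.
    # Return the coordinates of each box together with its face probability.
    for (row, col), prob in np.ndenumerate(featureMap):
        if prob >= 0.95:
            print(prob)
            # Map from heat-map coordinates back to the original image:
            # multiply by the stride, then divide by the image scale.
            boundingBox.append(
                [float(stride * col) / scale, float(stride * row) / scale,
                 float(stride * col + cellSize - 1) / scale,
                 float(stride * row + cellSize - 1) / scale, prob])
    return boundingBox

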
def nms_average(boxes, groupThresh=2, overlapThresh=0.2):
    rects = []
    for i in range(len(boxes)):
        if boxes[i][4] > 0.2:
            # Convert [x1, y1, x2, y2] to integer [x, y, w, h] for cv2.groupRectangles
            rects.append([int(boxes[i, 0]), int(boxes[i, 1]),
                          int(boxes[i, 2] - boxes[i, 0]), int(boxes[i, 3] - boxes[i, 1])])
    # Cluster similar rectangles; clusters with groupThresh or fewer members are rejected
    rects, weights = cv2.groupRectangles(rects, groupThresh, overlapThresh)
    rectangles = []
    for i in range(len(rects)):
        # A______
        # |      |
        # -------B
        # A is the top-left corner, B the bottom-right corner
        testRect = Rect(Point(rects[i, 0], rects[i, 1]),
                        Point(rects[i, 0] + rects[i, 2], rects[i, 1] + rects[i, 3]))
        rectangles.append(testRect)
    # Merge rectangles that still overlap by averaging their coordinates
    clusters = []
    for rect in rectangles:
        matched = 0
        for cluster in clusters:
            if rect_merge(rect, cluster, 0.2):
                matched = 1
                cluster.left = (cluster.left + rect.left) / 2
                cluster.right = (cluster.right + rect.right) / 2
                cluster.top = (cluster.top + rect.top) / 2
                cluster.bottom = (cluster.bottom + rect.bottom) / 2
        if not matched:
            clusters.append(rect)
    # Each result is [x1, y1, x2, y2, 1]
    result_boxes = []
    for i in range(len(clusters)):
        result_boxes.append([clusters[i].left, clusters[i].bottom, clusters[i].right, clusters[i].top, 1])
    return result_boxes


# Face detection
def face_detection(imgFile):
    # Load the fully convolutional deploy network. deploy_full_conv.prototxt has
    # the same structure as the training net but no data layers. The second
    # argument is the model weights and caffe.TEST selects the test phase.
    net_full_conv = caffe.Net(r'C:\Users\Administrator.DESKTOP-KMH7HN6\Downloads\li_test_net\my_face_detect\data_set\deploy_full_conv.prototxt',
                              r'C:\Users\Administrator.DESKTOP-KMH7HN6\Downloads\li_test_net\my_face_detect\data_set\alexnet_iter_50000_full_conv.caffemodel',
                              caffe.TEST)
    # Image scales
    scales = []
    factor = 0.793700526
    img = cv2.imread(imgFile)
    print(img.shape)
    # Build the image pyramid: the same image at progressively smaller sizes
    largest = min(2, 4000 / max(img.shape[0:2]))
    scale = largest
    minD = largest * min(img.shape[0:2])
    # The shorter side of the scaled image must be at least 227 pixels
    while minD >= 227:
        scales.append(scale)
        scale *= factor
        minD *= factor
    total_boxes = []
    # Temporary file that holds the scaled image
    scaledFile = r'C:\Users\Administrator.DESKTOP-KMH7HN6\Downloads\li_test_net\my_face_detect\data_set\myscale.jpg'
    for scale in scales:
        # Resize while keeping the aspect ratio. img.shape is (height, width,
        # channels), but cv2.resize takes the target size as (width, height).
        scale_img = cv2.resize(img, (int(img.shape[1] * scale), int(img.shape[0] * scale)))
        # Save the scaled image
        cv2.imwrite(scaledFile, scale_img)
        # Read the saved scaled image back in the format Caffe expects
        im = caffe.io.load_image(scaledFile)
        # The image size changes with every scale, so reshape the input blob to (N, C, H, W)
        net_full_conv.blobs['data'].reshape(1, 3, scale_img.shape[0], scale_img.shape[1])
        # Set up preprocessing for the new input shape
        transformer = caffe.io.Transformer({'data': net_full_conv.blobs['data'].data.shape})
        # Subtract the per-channel mean. train.prototxt above has mean_file
        # commented out; preprocessing here should match what training used.
        transformer.set_mean('data', np.load(
            r"C:\Users\Administrator.DESKTOP-KMH7HN6\Downloads\Compressed\caffer_data\caffe-windows\python\caffe\imagenet\ilsvrc_2012_mean.npy").mean(
            1).mean(1))
        # Reorder the dimensions from H x W x C to C x H x W
        transformer.set_transpose('data', (2, 0, 1))
        # Swap the channels from RGB to BGR
        transformer.set_channel_swap('data', (2, 1, 0))
        # Rescale pixels from [0, 1] to [0, 255]; this must match the pixel range used in training
        transformer.set_raw_scale('data', 255.0)
        # Forward pass: gives, for each sliding-window position, the
        # probability that the window contains a face
        out = net_full_conv.forward_all(data=np.asarray([transformer.preprocess('data', im)]))
        # Channel 0 is the non-face probability, channel 1 the face probability
        print(out['prob'][0, 1].shape)
        # The heat map is the final convolution output; each cell is a probability.
        # Convert it to boxes in original-image coordinates with their probabilities.
        boxes = generateBoundingBox(out['prob'][0, 1], scale)
        # plt.subplot(1, 2, 1)
        # plt.imshow(transformer.deprocess('data', net_full_conv.blobs['data'].data[0]))
        # plt.subplot(1, 2, 2)
        # plt.imshow(out['prob'][0,1])
        # plt.show()
        # There may be several faces, and one face may also produce many boxes
        if boxes:
            total_boxes.extend(boxes)
    print(total_boxes)
    # Non-maximum suppression: when one face has many boxes, keep a single one.
    # Boxes whose overlap exceeds the threshold (0.2 here) are treated as the same face.
    boxes_nms = np.array(total_boxes)
    true_boxes = nms_average(boxes_nms, 1, 0.2)
    # Draw every remaining box on the original image
    for box in true_boxes:
        (x1, y1, x2, y2) = box[:-1]
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0))
    cv2.namedWindow('test win', flags=0)
    cv2.imshow('test win', img)
    cv2.waitKey(0)


if __name__ == "__main__":
    imgFile = r'C:\Users\Administrator.DESKTOP-KMH7HN6\Downloads\li_test_net\my_face_detect\data_set\123.jpg'
    face_detection(imgFile)
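The drawback is obvious, though: every scale needs its own forward pass, and the model is large, so detection is slow, at roughly 10 s per frame.
6·Speeding up detection
One approach is the following:
1. First scan the image with a very small window to find the rough face regions, then increase the window size step by step to lock onto and classify those regions.
24x24, 48x48, 127x127, ...
Starting with a 227x227 window is very time-consuming.
2. The detected box is sometimes not the best fit for the face; a calibration network can correct it.
The calibration network turns one window into 45 variants: 5 scale changes (sn), 3 horizontal offsets (xn) and 3 vertical offsets (yn), giving 5x3x3 = 45.
The calibration network serves the scanning windows in step 1 by fine-tuning the face position each window marks.
This can reach about 25 frames per second, which is far faster.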
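The 45 calibration patterns can be enumerated as below. The post only gives the counts (5, 3, 3); the concrete values and the adjustment formula here are assumptions, taken from the values commonly used in cascade-CNN style detectors, and should be treated as placeholders.

```python
from itertools import product

# Assumed pattern values (not stated in this post)
SCALES = (0.83, 0.91, 1.0, 1.10, 1.21)   # sn: scale changes
X_OFFSETS = (-0.17, 0.0, 0.17)           # xn: horizontal offsets
Y_OFFSETS = (-0.17, 0.0, 0.17)           # yn: vertical offsets

PATTERNS = list(product(SCALES, X_OFFSETS, Y_OFFSETS))


def apply_pattern(box, pattern):
    """Adjust a window (x, y, w, h) by one calibration pattern.
    The formula is an assumption: shift by the offset, then rescale."""
    x, y, w, h = box
    sn, xn, yn = pattern
    return (x - xn * w / sn, y - yn * h / sn, w / sn, h / sn)


print(len(PATTERNS))
print(apply_pattern((10, 20, 100, 100), (0.83, 0.17, -0.17)))
```

In such a setup the calibration network would be a 45-way classifier, and the predicted pattern (or an average over confident patterns) is applied to the window.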
7·Improving accuracy
In theory, a deeper network gives better accuracy.
AlexNet falls short of VGGNet, which has a deeper architecture.
8·Data augmentation
Transforming the images (translation, rotation, ...) can expand 1,000 images to 30,000.
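A minimal sketch of how one image becomes 30, using only mirroring and small translations on an image stored as a nested list. The offsets are chosen purely so that 2 x 5 x 3 = 30 matches the 1,000 to 30,000 figure above; rotation is not covered, and all names are illustrative.

```python
from itertools import product


def mirror(img):
    """Flip the image left to right."""
    return [row[::-1] for row in img]


def shift(img, dx, dy, fill=0):
    """Translate by (dx, dy) pixels; uncovered pixels are set to `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def augment(img, dxs=(-2, -1, 0, 1, 2), dys=(-1, 0, 1)):
    """Return every combination of (mirror or not) x dx x dy."""
    variants = []
    for flip, dx, dy in product((False, True), dxs, dys):
        base = mirror(img) if flip else img
        variants.append(shift(base, dx, dy))
    return variants


img = [[1, 2, 3], [4, 5, 6]]
print(len(augment(img)))
print(shift(img, 1, 0))
```

On real data the same idea is usually applied with an image library, and the shifts are kept small enough that the face stays inside the crop.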
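9·Overfitting
If the training loss keeps falling while the test loss stays very high, the model is overfitting.
Lowering the learning rate can help.
For example, if overfitting starts around iteration 50,000, take the weights from iteration 50,000 as the initial weights, lower the learning rate, train again and check the new results.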
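One way to pick the snapshot to restart from is to take the saved iteration with the lowest test loss. The sketch below assumes the test losses have already been collected from the training log as (iteration, loss) pairs; the numbers are made up for illustration and are not results from this post.

```python
def best_snapshot(history, snapshot=10000):
    """Return the snapshot iteration with the lowest test loss.
    history: list of (iteration, test_loss) pairs.
    Only iterations at which a snapshot was saved are considered."""
    candidates = [(loss, it) for it, loss in history
                  if it > 0 and it % snapshot == 0]
    if not candidates:
        raise ValueError("no test result falls on a snapshot iteration")
    return min(candidates)[1]


# Hypothetical test losses: falling until 50,000, rising afterwards
history = [(10000, 0.60), (20000, 0.45), (30000, 0.38), (40000, 0.33),
           (50000, 0.30), (60000, 0.36), (70000, 0.44)]
print(best_snapshot(history))
```

Training can then be restarted from that snapshot's .caffemodel by passing it to caffe train with the --weights flag, together with a solver whose base_lr has been lowered.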