A previous article introducing the MTCNN face detection algorithm covered the model architecture; this post walks through the implementation of MTCNN's three networks.
1. P-Net
The P-Net model structure is as follows:
Implementation:
# ----------------------------------------------------#
# P-Net: coarse face-box proposal network.
# Outputs a face/non-face classification and a bbox
# regression; pretrained weights are read from weight_path.
# ----------------------------------------------------#
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input, MaxPool2D, Permute, PReLU
from tensorflow.keras.models import Model

def create_Pnet(weight_path):
    # Fully convolutional: accepts inputs of any spatial size
    inputs = Input(shape=[None, None, 3])

    x = Conv2D(10, (3, 3), strides=1, padding='valid', name='conv1')(inputs)
    x = PReLU(shared_axes=[1, 2], name='PReLU1')(x)
    x = MaxPool2D(pool_size=2)(x)

    x = Conv2D(16, (3, 3), strides=1, padding='valid', name='conv2')(x)
    x = PReLU(shared_axes=[1, 2], name='PReLU2')(x)

    x = Conv2D(32, (3, 3), strides=1, padding='valid', name='conv3')(x)
    x = PReLU(shared_axes=[1, 2], name='PReLU3')(x)

    classifier = Conv2D(2, (1, 1), activation='softmax', name='conv4-1')(x)
    # No activation on the bbox head: linear regression output
    bbox_regress = Conv2D(4, (1, 1), name='conv4-2')(x)

    model = Model([inputs], [classifier, bbox_regress])
    # Load the locally stored weight file
    model.load_weights(weight_path, by_name=True)
    return model
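Because P-Net is fully convolutional (its Input shape is [None, None, 3]), the size of its output map depends on the input size. The spatial arithmetic can be traced with a small sketch (pnet_output_size is a hypothetical helper, not part of the original code):

```python
def pnet_output_size(side):
    """Spatial side length of P-Net's output map for a square input.

    conv1 (3x3, valid):             side - 2
    maxpool (2x2, stride 2, valid): floor(... / 2)
    conv2 (3x3, valid):             - 2
    conv3 (3x3, valid):             - 2
    """
    side = side - 2   # conv1
    side = side // 2  # maxpool
    side = side - 2   # conv2
    side = side - 2   # conv3
    return side

# A 12x12 crop (P-Net's training size) maps to a single 1x1 prediction.
print(pnet_output_size(12))  # -> 1
```

This is why a larger input yields a dense grid of predictions, each corresponding to a 12x12 receptive field in the scaled image.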
In the code above, the weights are stored locally as an HDF5 file and loaded via model.load_weights():

model.load_weights(filepath, by_name=False)

This loads weights from an HDF5 file created by save_weights. By default, the model architecture is expected to be unchanged. To load weights into a different model that shares some layers, set by_name=True so that only layers with matching names receive weights.
2. R-Net
The R-Net model structure is as follows:
Implementation:
#---------------------------------------------------------------------------#
# R-Net: filters out most false candidates, calibrates boxes with
# bounding-box regression, and merges candidates with NMS.
#---------------------------------------------------------------------------#
def create_Rnet(weight_path):
    inputs = Input(shape=[24, 24, 3])

    # 24,24,3 -> 22,22,28 -> 11,11,28
    x = Conv2D(28, (3, 3), strides=1, padding='valid', name='conv1')(inputs)
    x = PReLU(shared_axes=[1, 2], name='prelu1')(x)
    x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)

    # 11,11,28 -> 9,9,48 -> 4,4,48
    x = Conv2D(48, (3, 3), strides=1, padding='valid', name='conv2')(x)
    x = PReLU(shared_axes=[1, 2], name='prelu2')(x)
    x = MaxPool2D(pool_size=3, strides=2)(x)

    # 4,4,48 -> 3,3,64
    x = Conv2D(64, (2, 2), strides=1, padding='valid', name='conv3')(x)
    x = PReLU(shared_axes=[1, 2], name='prelu3')(x)

    # 3,3,64 -> 64,3,3
    x = Permute((3, 2, 1))(x)
    x = Flatten()(x)

    # 576 -> 128
    x = Dense(128, name='conv4')(x)
    x = PReLU(name='prelu4')(x)

    # 128 -> 2
    classifier = Dense(2, activation='softmax', name='conv5-1')(x)
    # 128 -> 4
    bbox_regress = Dense(4, name='conv5-2')(x)

    model = Model([inputs], [classifier, bbox_regress])
    # Load the locally stored weights
    model.load_weights(weight_path, by_name=True)
    # Optionally save the full model (structure + weights)
    # model.save("rnet_model.h5")
    return model
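The header comment above mentions merging candidates with NMS, but the author's NMS code is not shown. As an illustration only, a minimal greedy NMS over boxes given as [x1, y1, x2, y2] might look like this (iou, nms, and the sample boxes are all hypothetical, not taken from the original code):

```python
def iou(a, b):
    """Intersection-over-union of two boxes [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: keep highest-scoring boxes, drop overlaps above threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= threshold]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]: the two overlapping boxes merge into one
```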
3. O-Net
The O-Net model structure is as follows:
Implementation:
# ----------------------------------#
# O-Net: the third-stage network of MTCNN.
# Refines the boxes further and outputs 5 landmarks.
# ----------------------------------#
def create_Onet(weight_path):
    inputs = Input(shape=[48, 48, 3])

    # 48,48,3 -> 46,46,32 -> 23,23,32
    x = Conv2D(32, (3, 3), strides=1, padding='valid', name='conv1')(inputs)
    x = PReLU(shared_axes=[1, 2], name='prelu1')(x)
    x = MaxPool2D(pool_size=3, strides=2, padding='same')(x)

    # 23,23,32 -> 21,21,64 -> 10,10,64
    x = Conv2D(64, (3, 3), strides=1, padding='valid', name='conv2')(x)
    x = PReLU(shared_axes=[1, 2], name='prelu2')(x)
    x = MaxPool2D(pool_size=3, strides=2)(x)

    # 10,10,64 -> 8,8,64 -> 4,4,64
    x = Conv2D(64, (3, 3), strides=1, padding='valid', name='conv3')(x)
    x = PReLU(shared_axes=[1, 2], name='prelu3')(x)
    x = MaxPool2D(pool_size=2)(x)

    # 4,4,64 -> 3,3,128
    x = Conv2D(128, (2, 2), strides=1, padding='valid', name='conv4')(x)
    x = PReLU(shared_axes=[1, 2], name='prelu4')(x)

    # 3,3,128 -> 128,3,3
    x = Permute((3, 2, 1))(x)
    x = Flatten()(x)

    # 1152 -> 256
    x = Dense(256, name='conv5')(x)
    x = PReLU(name='prelu5')(x)

    # Three heads: classification, bbox regression, and landmarks
    # 256 -> 2
    classifier = Dense(2, activation='softmax', name='conv6-1')(x)
    # 256 -> 4
    bbox_regress = Dense(4, name='conv6-2')(x)
    # 256 -> 10
    landmark_regress = Dense(10, name='conv6-3')(x)

    model = Model([inputs], [classifier, bbox_regress, landmark_regress])
    model.load_weights(weight_path, by_name=True)
    # Optionally save the full model (structure + weights)
    # model.save("onet_model.h5")
    return model
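The 10 landmark values are 5 (x, y) points predicted relative to the candidate box. As a sketch of how they map back to image coordinates, the following assumes the common MTCNN convention of 5 x-coordinates followed by 5 y-coordinates, each normalized to the box (other implementations interleave the pairs; landmarks_to_image is a hypothetical helper):

```python
def landmarks_to_image(landmarks, box):
    """Map O-Net's 10 normalized landmark values to image coordinates.

    Assumes landmarks = [x1..x5, y1..y5] in [0, 1], relative to
    box = [x1, y1, x2, y2] in image coordinates.
    """
    bx1, by1, bx2, by2 = box
    w, h = bx2 - bx1, by2 - by1
    xs = [bx1 + v * w for v in landmarks[:5]]
    ys = [by1 + v * h for v in landmarks[5:]]
    return list(zip(xs, ys))

# Five points (eyes, nose, mouth corners) inside a 48-pixel box.
pts = landmarks_to_image([0.3, 0.7, 0.5, 0.35, 0.65,
                          0.4, 0.4, 0.55, 0.75, 0.75],
                         [100, 100, 148, 148])
print(pts[0])  # first landmark, roughly (114.4, 119.2)
```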
4. Parameter counts and FLOPs
Here is a small TensorFlow 2 tool for profiling the parameter counts and FLOPs of the three MTCNN models:
import tensorflow as tf

#-----------------------------------------------------------
# Profile a graph: report its FLOPs and parameter count
#-----------------------------------------------------------
def stats_graph(graph):
    flops = tf.compat.v1.profiler.profile(graph, options=tf.compat.v1.profiler.ProfileOptionBuilder.float_operation())
    params = tf.compat.v1.profiler.profile(graph, options=tf.compat.v1.profiler.ProfileOptionBuilder.trainable_variables_parameter())
    print('FLOPs: {}; Total params: {}'.format(flops.total_float_ops, params.total_parameters))

#-----------------------------------------------------------
# Get the parameter count and FLOPs of the model at model_path
#-----------------------------------------------------------
def get_flops_params(model_path):
    # Reset the graph so that calling this function multiple times
    # still gives correct results
    tf.compat.v1.reset_default_graph()
    session = tf.compat.v1.Session()
    graph = tf.compat.v1.get_default_graph()
    with graph.as_default():
        with session.as_default():
            # Loading the model registers its ops in the default graph
            model = tf.keras.models.load_model(model_path)
            stats_graph(graph)

#---------------------------------------------------------------
# Profile the three stage models.
# Note: each model file must contain both structure and weights;
# an H5 file holding only weights will raise an error.
#---------------------------------------------------------------
if __name__ == "__main__":
    print("---------------- MTCNN PNet Stats ----------------------\n")
    get_flops_params("pnet_model.h5")
    print("---------------- MTCNN RNet Stats ----------------------\n")
    get_flops_params("rnet_model.h5")
    print("---------------- MTCNN ONet Stats ----------------------\n")
    get_flops_params("onet_model.h5")
The profiler prints a lot of detail, but only two numbers matter here. The relevant lines of the output are:
---------------- MTCNN PNet Stats ----------------------
FLOPs: 13083; Total params: 6632
---------------- MTCNN RNet Stats ----------------------
FLOPs: 199546; Total params: 100178
---------------- MTCNN ONet Stats ----------------------
FLOPs: 776424; Total params: 389040
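The reported parameter counts can be cross-checked by hand. For P-Net, summing each layer's weights and biases, plus one PReLU slope per channel (shared_axes=[1, 2] shares the slope across spatial positions), reproduces the profiler's 6632 (conv_params is a hypothetical helper, not part of the original code):

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out

# P-Net: three convs with channel-shared PReLU (one alpha per channel),
# then the two 1x1 output heads.
total = (conv_params(3, 3, 10) + 10    # conv1 + PReLU1
         + conv_params(3, 10, 16) + 16 # conv2 + PReLU2
         + conv_params(3, 16, 32) + 32 # conv3 + PReLU3
         + conv_params(1, 32, 2)       # conv4-1 (classifier)
         + conv_params(1, 32, 4))      # conv4-2 (bbox regression)
print(total)  # -> 6632, matching the profiler output
```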
Note that the P-Net FLOPs figure above is the cost of a single forward pass. In the first stage, however, the input image is repeatedly downscaled to build an image pyramid, and every pyramid level is fed through P-Net, so the actual P-Net workload is roughly 13083*N, where N is the number of pyramid levels.
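To make N concrete, here is a sketch of how the pyramid scales are typically generated in MTCNN implementations. The 0.709 shrink factor and 20-pixel minimum face size are common defaults, not values taken from this article, and pyramid_scales is a hypothetical helper:

```python
def pyramid_scales(height, width, min_face=20, factor=0.709, net_size=12):
    """Scales at which the image is resized before being fed to P-Net.

    Scaling starts so that a min_face-pixel face fills P-Net's 12x12
    receptive field, then shrinks by `factor` until the shorter image
    side would drop below 12 pixels.
    """
    scale = net_size / min_face
    min_side = min(height, width) * scale
    scales = []
    while min_side >= net_size:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales

# P-Net runs once per scale, so its real cost is N times the
# single-pass FLOPs, with N = len(scales).
print(len(pyramid_scales(480, 640)))  # -> 10 levels for a 480x640 image
```

Larger images or a smaller minimum face size produce more pyramid levels, and hence a proportionally larger P-Net workload.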
The statistics above are summarized in the table below. These numbers, especially the FLOPs, deviate somewhat from real-world measurements, so treat them as rough references.

Model    Total params    FLOPs
P-Net    6,632           13,083
R-Net    100,178         199,546
O-Net    389,040         776,424