Face-keypoint networks are usually a CNN backbone followed by fully connected (FC) layers for regression. The plan here is to replace the FC head with global weighted average pooling: make the convolution kernel the same size as the input feature map, i.e. K = H = W, so the output map is C*1*1 — a vector of length C. This is called Global Depthwise Convolution (GDC); see MobileFaceNet. It can be viewed as global weighted pooling. The difference from Global Average Pooling (GAP) is that GDC assigns a learnable weight to each spatial position (very effective for aligned images such as faces, where the centre and the borders should naturally carry different weights), whereas GAP weights every position equally and simply takes the global average, as illustrated here:
https://www.pianshen.com/article/76591199542/
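The difference can be made concrete with a small NumPy sketch (toy sizes and hand-picked weights, purely for illustration — in a real network the GDC weights are learned): GAP weighs every position by 1/(H*W), while GDC has one weight per position per channel.

```python
import numpy as np

# Toy feature map: H = W = 3, C = 2 (channels-last, as in Keras).
fmap = np.arange(18, dtype=np.float64).reshape(3, 3, 2)

# GAP: every spatial position contributes equally, weight 1/(H*W).
gap = fmap.mean(axis=(0, 1))                 # shape (2,)

# GDC: one weight per position per channel. The weights below are
# hand-picked (emphasising one corner) just to show the mechanism.
w = np.full((3, 3, 2), 0.05)
w[0, 0, :] = 0.6
gdc = (fmap * w).sum(axis=(0, 1))            # shape (2,)

print(gap)   # [8. 9.]
print(gdc)   # [3.6 4.6]
```

Same input, different pooled vectors: GDC can favour whichever positions training finds informative, which is exactly what GAP cannot do.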
Benefits of replacing the FC head:
Drawbacks of fully connected layers
A fatal weakness of fully connected layers is their huge parameter count, especially in the FC layer connected to the last convolutional layer. On one hand this adds computation at both training and test time and slows things down; on the other hand, so many parameters overfit easily. Tricks such as dropout help, but dropout is a hyper-parameter — neither elegant nor easy to tune in practice.
GAP (Global Average Pooling) can replace the fully connected layer. This link explains global average pooling clearly: https://www.cnblogs.com/jins-note/p/9769324.html.
Let's be clear about one thing: after the FC layer flattens the conv features into a vector, it still has to produce a prediction from every feature map. The idea of GAP is to merge those two steps into one. As shown in the figure:
The operation above reminded me of using depthwise separable convolution to do this — and it turns out this article already uses that idea! https://blog.csdn.net/u011995719/article/details/79435615
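To see how large the FC parameter count actually gets, here is a rough back-of-the-envelope comparison (the 7x7x512 feature map and 1000 outputs are hypothetical sizes, not taken from this article):

```python
# Hypothetical head: final feature map 7x7x512, 1000 outputs.
h, w, c, n_out = 7, 7, 512, 1000

# Flatten + Dense: every flattened unit connects to every output.
fc_params = h * w * c * n_out + n_out        # weights + biases

# Map to n_out channels with a 1x1 conv, then GAP (GAP itself has
# no parameters at all).
gap_params = c * n_out + n_out

print(fc_params)    # 25089000
print(gap_params)   # 513000
```

The FC head carries roughly 50x the parameters of the GAP-based head at these sizes, which is why it dominates both model size and overfitting risk.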
# How the GAP head is used (image_size and n_classes are defined elsewhere)
def get_model():
    input_shape = (image_size, image_size, 3)
    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3), padding='same',
                     input_shape=input_shape))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, kernel_size=(3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(128, kernel_size=(3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Last conv maps to n_classes channels, so GAP directly yields the output vector
    model.add(Conv2D(n_classes, kernel_size=(3, 3), padding='same'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(GlobalAveragePooling2D())
    print(model.summary())
    model.compile(loss=keras.losses.mean_squared_error,
                  optimizer=keras.optimizers.Adadelta())
    return model
A quick recap of how depthwise separable convolution works: a single-channel kernel convolves each channel's feature map separately. If the kernel size equals the feature-map size, each channel produces exactly one value — and that value can serve directly as one keypoint coordinate.
The figure below compares a regular CNN with a depthwise separable CNN: https://blog.csdn.net/tintinetmilou/article/details/81607721
This experiment uses the Keras framework. Here is Keras's depthwise convolution, DepthwiseConv2D:
https://blog.csdn.net/c_chuxin/article/details/88581411
keras.layers.DepthwiseConv2D(kernel_size, strides=(1, 1), padding='valid', depth_multiplier=1, data_format=None, activation=None, use_bias=True, depthwise_initializer='glorot_uniform', bias_initializer='zeros', depthwise_regularizer=None, bias_regularizer=None, activity_regularizer=None, depthwise_constraint=None, bias_constraint=None)
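A quick sanity check on what such a layer produces and costs, computed by hand rather than by instantiating the layer (the 5x5x128 feature map is a hypothetical size): with kernel_size equal to the feature-map size and 'valid' padding, each channel collapses to a single value.

```python
# Depthwise-conv bookkeeping for a hypothetical 5x5x128 feature map.
h = w = 5
c = 128
k = 5                        # kernel_size == feature-map size
depth_multiplier = 1

# Output spatial size with 'valid' padding and stride 1: (n - k) + 1.
out_h = h - k + 1            # 1
out_w = w - k + 1            # 1
out_channels = c * depth_multiplier

# Parameter count (use_bias=False): one k x k kernel per input channel.
params = k * k * c * depth_multiplier

print(out_h, out_w, out_channels)   # 1 1 128
print(params)                       # 3200
```

So the layer yields a 1x1xC map — one learned weighted sum per channel, i.e. the GDC described above — for only k*k*C parameters.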
Theory done — time to practise. I defined two regression networks, model_key and model_key2; the focus is on model_key2.
# -*- coding: utf-8 -*-
import numpy as np
import os
import keras.backend as K
from keras.optimizers import *
from keras.models import *
from keras.layers import *

def model_key():
    imgsize = 178
    model = Sequential()  # CelebA images are 218*178*3
    model.add(Conv2D(32, (3, 3), input_shape=(imgsize, imgsize, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(64))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10))
    model.compile(optimizer=Nadam(lr=5e-3), loss='mse', metrics=['mae'])
    # model.compile(optimizer=Nadam(lr=5e-4), loss=l2, metrics=['accuracy'])
    return model

def model_key2(height=320, width=320, channel=3):
    input = Input(shape=(height, width, channel))
    conv1_1 = Conv2D(16, 3, strides=(2, 2), padding='same', use_bias=False, kernel_initializer='he_normal')(input)
    conv1_1 = BatchNormalization(axis=3)(conv1_1)
    conv1_1 = Activation('relu')(conv1_1)
    conv1_1 = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same')(conv1_1)
    conv2_1 = Conv2D(64, 3, strides=(2, 2), padding='same', use_bias=False, kernel_initializer='he_normal')(conv1_1)
    conv2_1 = BatchNormalization(axis=3)(conv2_1)
    conv2_1 = Activation('relu')(conv2_1)
    conv2_1 = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same')(conv2_1)
    conv3_1 = Conv2D(128, 3, strides=(2, 2), padding='same', use_bias=False, kernel_initializer='he_normal')(conv2_1)
    conv3_1 = BatchNormalization(axis=3)(conv3_1)
    conv3_1 = Activation('relu')(conv3_1)
    conv3_1 = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same')(conv3_1)
    # conv4_1 = Conv2D(256, 3, strides=(1, 1), padding='same', use_bias=False, kernel_initializer='he_normal')(conv3_1)
    # conv4_1 = BatchNormalization(axis=3)(conv4_1)
    # conv4_1 = Activation('relu')(conv4_1)
    # conv4_1 = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='same')(conv4_1)
    convDepthwiseConv2D_1 = DepthwiseConv2D((5, 5),
                                            padding='valid',
                                            depth_multiplier=1,
                                            strides=(1, 1),  # irrelevant here: the kernel already covers the whole 5x5 map
                                            use_bias=False)(conv3_1)
    conv5_1 = Conv2D(10, 1, strides=(1, 1), padding='same', use_bias=False, kernel_initializer='he_normal')(convDepthwiseConv2D_1)
    flat = Flatten()(conv5_1)
    activation = Activation('relu', name='Classification')(flat)
    model = Model(inputs=input, outputs=activation)
    print(model.output_shape)
    model.compile(optimizer=Nadam(lr=5e-3), loss='mse', metrics=['mae'])
    return model
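Why is the depthwise kernel (5, 5)? Tracing the spatial size of the 320x320 input through model_key2's three stride-2 convolutions and three stride-2 poolings (each with 'same' padding, which halves the size, rounding up):

```python
import math

# Spatial size of a 320x320 input through model_key2's three stages.
size = 320
for _ in range(3):                   # three conv + pool stages
    size = math.ceil(size / 2)       # Conv2D, strides=(2, 2), padding='same'
    size = math.ceil(size / 2)       # MaxPooling2D, pool 2, stride 2, 'same'

print(size)   # 5 -> a (5, 5) 'valid' depthwise kernel reduces it to 1x1
```

320 -> 160 -> 80 -> 40 -> 20 -> 10 -> 5, so the (5, 5) depthwise kernel reduces each of the 128 channels to a single value, and the final 1x1 conv mixes those into the 10 output coordinates.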
Use summary() to inspect the network structure — the printout below actually corresponds to the baseline model_key (its 178x178 input and Dense layers are visible in it). In model_key2 I use depthwise convolution to fit the keypoints, and note that the purpose is not to reduce the parameter count!
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 176, 176, 32) 896
_________________________________________________________________
activation_1 (Activation) (None, 176, 176, 32) 0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 88, 88, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 86, 86, 32) 9248
_________________________________________________________________
activation_2 (Activation) (None, 86, 86, 32) 0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 43, 43, 32) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 41, 41, 64) 18496
_________________________________________________________________
activation_3 (Activation) (None, 41, 41, 64) 0
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 20, 20, 64) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 25600) 0
_________________________________________________________________
dense_1 (Dense) (None, 64) 1638464
_________________________________________________________________
activation_4 (Activation) (None, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 64) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 650
=================================================================
Total params: 1,667,754
Trainable params: 1,667,754
Non-trainable params: 0
_________________________________________________________________
Test script:
import cv2
import numpy as np
from keras.models import load_model

def predict():
    model = load_model(r'./key_model.h5')
    image = cv2.imread("000300.jpg", 1)
    image = cv2.resize(image, (320, 320))
    image = image.astype('float32')  # astype returns a new array; the result must be assigned
    image = np.expand_dims(image, axis=0)
    # normalisation
    result = model.predict(image)
    print("...........")
    print("key: ", result)

if __name__ == "__main__":
    predict()
In testing, the loss of my network drops very quickly, and the final results are not far apart!
Predicted keypoint coordinates:
Ground-truth keypoint coordinates:
000300.jpg,68,71,109,72,86,93,67,110,111,112
Reference blog: https://www.bbsmax.com/A/MAzAjNORJ9/