I. Overview

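MobileFaceNets are a class of highly efficient CNN models that use fewer than one million parameters and are tailored for high-accuracy, real-time face verification on mobile and embedded devices. MobileFaceNet achieves significantly better accuracy than MobileNetV2 and runs about twice as fast in practice. A single MobileFaceNet of about 4 MB, trained with the ArcFace loss on the MS-Celeb-1M dataset, reaches 99.55% accuracy on LFW, which is comparable to some large CNNs that are hundreds of MB in size. The fastest MobileFaceNet needs only 18 ms per inference on a mobile phone, a large efficiency gain for face verification over earlier mobile CNNs.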

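As the technology matures, face recognition algorithms are increasingly deployed on embedded terminals. Because these devices have limited compute and storage, face detection and recognition models need to be both lightweight and accurate. Compared with deep, wide models, a lightweight model has fewer parameters and fewer multiply-add operations, but it must not give up much prediction accuracy.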

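In recent years, lightweight networks such as MobileNetV1, ShuffleNet, and MobileNetV2 have been widely used for visual recognition on mobile devices. Because of the particular structure of faces, however, these networks have not delivered satisfactory results on face recognition. To address this, Sheng Chen et al. of Beijing Jiaotong University proposed MobileFaceNet, a lightweight network designed specifically for face recognition, in the paper "MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices".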

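As shown in the figure below, when MobileNetV2 and similar networks are used for face recognition, the global average pooling layer gives the corner units and the center units of the final feature map (FMap-end) the same weight. For face recognition, however, a center unit is clearly more important than a corner unit, so the network needs targeted optimization. The most important change in the paper is replacing Global Average Pooling (GAP) with Global Depthwise Convolution (GDConv): the GDConv weights act as learnable importance coefficients for the different spatial positions.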

1. MobileNet-v1

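Depthwise separable convolution:

A depthwise separable convolution reduces both the parameter count and the amount of computation.

For example, take a 100×100×3 input. A standard convolution with 3×3×3×52 kernels produces a 100×100×52 output and has 3×3×3×52 = 1404 parameters.

With a depthwise separable convolution, the first step applies a 3×3×3 kernel (one 3×3 filter per input channel) without summing across channels, so the result still has 3 channels. The second step applies 1×1×3×52 kernels and yields the same 100×100×52 output. The parameter count is 27 + 156 = 183, roughly 7.7 times fewer than the standard convolution.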
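
The arithmetic above can be checked directly in PyTorch. The following is a minimal sketch, assuming bias-free layers and `padding=1` so that the 100×100 spatial size is preserved; the helper name `n_weights` is introduced here for illustration only.

```python
import torch
from torch import nn

def n_weights(module):
    """Count the learnable parameters of a module."""
    return sum(p.numel() for p in module.parameters())

# Standard convolution: 3 input channels -> 52 output channels, 3x3 kernel.
standard = nn.Conv2d(3, 52, kernel_size=3, padding=1, bias=False)

# Depthwise separable convolution = depthwise 3x3 (groups == channels)
# followed by a pointwise 1x1 convolution that mixes the channels.
depthwise = nn.Conv2d(3, 3, kernel_size=3, padding=1, groups=3, bias=False)
pointwise = nn.Conv2d(3, 52, kernel_size=1, bias=False)

standard_params = n_weights(standard)                          # 3*3*3*52 = 1404
separable_params = n_weights(depthwise) + n_weights(pointwise)  # 27 + 156 = 183

x = torch.randn(1, 3, 100, 100)
standard_shape = tuple(standard(x).shape)
separable_shape = tuple(pointwise(depthwise(x)).shape)

print(standard_params, separable_params)
print(standard_shape, separable_shape)
```

Both paths produce a tensor of the same shape; only the number of weights differs.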

2. MobileNet-v2

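MobileNet-v2 uses an inverted residual block. It is "inverted" because the original residual block first reduces the channel dimension with a 1×1 convolution and then applies a 3×3 convolution, whereas the new block first expands the channel dimension with a 1×1 convolution, then applies the larger 3×3 convolution, and finally projects back down with another 1×1 convolution.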

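One might ask: doesn't this increase the amount of computation?

In fact, the savings come from the second layer, which is a depthwise convolution: each channel is filtered by a single kernel of its own, so the cost stays low even though the layer is wider.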

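For example:

Suppose the second layer has 20 input channels and must also produce 20 output channels. Without a depthwise convolution, it needs 3×3×20×20 = 3600 parameters.

With a depthwise convolution, it needs only 3×3×20×1 = 180 parameters. The per-channel results are not summed, so the number of output channels is unchanged while the computation drops.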
[Figure: network structure table with columns t, c, n, s]
where:

t is the channel expansion factor,
c is the number of output channels,
n is the number of times the block is repeated,
s is the stride.
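
The inverted residual block described above can be sketched as follows. This is an illustrative sketch, not the reference implementation: the class name `InvertedResidual` is hypothetical, and the use of batch normalization and ReLU6 follows the usual MobileNetV2 convention. The values `in_c=10` and `t=2` are chosen only so that the depthwise layer has 20 channels, matching the example in the text.

```python
import torch
from torch import nn

class InvertedResidual(nn.Module):
    """1x1 expand -> 3x3 depthwise -> 1x1 linear projection (illustrative sketch)."""

    def __init__(self, in_c, out_c, t, stride=1):
        super().__init__()
        hidden = in_c * t  # t is the channel expansion factor
        # A shortcut is only possible when input and output shapes match.
        self.use_shortcut = stride == 1 and in_c == out_c
        self.expand = nn.Sequential(
            nn.Conv2d(in_c, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
        )
        self.depthwise = nn.Sequential(
            # groups == channels: one 3x3 filter per channel
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
        )
        self.project = nn.Sequential(
            nn.Conv2d(hidden, out_c, 1, bias=False),
            nn.BatchNorm2d(out_c),  # linear bottleneck: no activation here
        )

    def forward(self, x):
        y = self.project(self.depthwise(self.expand(x)))
        return x + y if self.use_shortcut else y

block = InvertedResidual(in_c=10, out_c=10, t=2)
dw_weights = block.depthwise[0].weight.numel()                    # 3*3*20*1 = 180
full_weights = nn.Conv2d(20, 20, 3, bias=False).weight.numel()    # 3*3*20*20 = 3600
out_shape = tuple(block(torch.randn(1, 10, 56, 56)).shape)
print(dw_weights, full_weights, out_shape)
```

The 3×3 layer operates on the expanded 20 channels, yet it holds only 180 weights because each channel has its own single filter.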

3. MobileFaceNet

MobileFaceNet is essentially an improved version of MobileNetV2. The main changes are the following:

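1. Average pooling. Many studies have observed that a global average pooling layer degrades network performance, but no theoretical account had been given, so the authors offer an explanation in the paper:
In the final 7×7 feature map, the receptive fields of a center unit and a corner unit have the same size, but the center unit's receptive field covers the complete image while the corner unit's covers only part of it. The units should therefore carry different weights, yet average pooling treats them all equally, which hurts performance, as the figure shows: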

[Figure: receptive fields of a center unit and a corner unit in the final feature map]
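The authors therefore replace global average pooling with a depthwise convolution: a 7×7×512 depthwise convolution layer (512 is the number of input feature-map channels), which lets the network learn a separate weight for each position on its own.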

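The paper calls this layer a global depthwise convolution: "global" because the kernel covers the whole feature map, and "depthwise" because the convolution is applied channel by channel. It is the depthwise step of the separable convolution described earlier: a 7×7×512 kernel replaces a 7×7×512×512 kernel.
Note that the latter is simply an ordinary full convolution, which on a 7×7 input is equivalent to a fully connected layer.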

2. Training uses the loss function from InsightFace (the ArcFace loss).

3. Some smaller details: the channel expansion factor is reduced; PReLU replaces ReLU; batch normalization is used.
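
The first change in the list can be checked numerically. The sketch below, written under the 7×7×512 setting described in the text, shows that global average pooling is the special case of a global depthwise convolution in which every weight is fixed to 1/49; GDConv makes those 49 weights per channel learnable. The variable names are illustrative only.

```python
import torch
from torch import nn
import torch.nn.functional as F

channels, size = 512, 7

# Global depthwise convolution: one 7x7 filter per channel, no padding,
# so a 7x7 feature map collapses to 1x1.
gdconv = nn.Conv2d(channels, channels, kernel_size=size, groups=channels, bias=False)

with torch.no_grad():
    # Setting every weight to 1/49 reproduces global average pooling exactly.
    gdconv.weight.fill_(1.0 / (size * size))
    x = torch.randn(2, channels, size, size)
    gap_out = F.adaptive_avg_pool2d(x, 1)
    gd_out = gdconv(x)

same = torch.allclose(gap_out, gd_out, atol=1e-5)
out_shape = tuple(gd_out.shape)

gdconv_params = gdconv.weight.numel()                # 7*7*512 = 25088
full_conv_params = size * size * channels * channels # 7*7*512*512 = 12845056
print(same, out_shape, gdconv_params, full_conv_params)
```

During training the GDConv weights move away from the uniform value, which is how the network can come to weight center units differently from corner units.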

II. Overall MobileFaceNet Network Structure

[Figure: MobileFaceNet network structure table with columns t, c, n, s]
where:

t is the channel expansion factor,
c is the number of output channels,
n is the number of times the block is repeated,
s is the stride.
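
To make the legend concrete, the sketch below expands one (t, c, n, s) row into per-block settings. It assumes the usual MobileNetV2 convention that only the first block of a repeated group uses stride s and the rest use stride 1. The helper `expand_row` is hypothetical, and the example values are chosen to mirror `conv_23` plus `conv_3` in the model code of Section IV (one stride-2 block followed by four stride-1 blocks, each with 128 expanded channels).

```python
def expand_row(in_c, t, c, n, s):
    """Expand one (t, c, n, s) table row into a list of per-block settings."""
    blocks = []
    for i in range(n):
        block_in = in_c if i == 0 else c  # later blocks take c channels as input
        blocks.append({
            "in_c": block_in,
            "hidden_c": block_in * t,       # channels after the 1x1 expansion
            "out_c": c,
            "stride": s if i == 0 else 1,   # only the first block downsamples
        })
    return blocks

rows = expand_row(in_c=64, t=2, c=64, n=5, s=2)
for block in rows:
    print(block)
```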

III. Experimental Results

1. Training set: CASIA-WebFace; loss function: ArcFace loss

[Figure: results for models trained on CASIA-WebFace]

2. Training set: cleaned MS-Celeb-1M; loss function: ArcFace loss

[Figure: results for models trained on cleaned MS-Celeb-1M]

IV. MobileFaceNets Model Code

from torch import nn
from torch.nn import BatchNorm2d, Conv2d, Module, PReLU, Sequential

class Flatten(Module):
    def forward(self, input):
        return input.view(input.size(0), -1)

class Linear_block(Module):
    def __init__(self, in_c, out_c, kernel=(1, 1), stride=(1, 1), padding=(0, 0), groups=1):
        super(Linear_block, self).__init__()
        self.conv   = Conv2d(in_c, out_channels=out_c, kernel_size=kernel, groups=groups, stride=stride, padding=padding, bias=False)
        self.bn     = BatchNorm2d(out_c)
    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return x

class Residual_Block(Module):
    # Bottleneck block: 1x1 expansion -> 3x3 depthwise conv -> 1x1 linear projection.
    # Note: `groups` doubles as the expanded (hidden) channel count of the block.
    def __init__(self, in_c, out_c, residual=False, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=1):
        super(Residual_Block, self).__init__()
        self.conv       = Conv_block(in_c, out_c=groups, kernel=(1, 1), padding=(0, 0), stride=(1, 1))
        self.conv_dw    = Conv_block(groups, groups, groups=groups, kernel=kernel, padding=padding, stride=stride)
        self.project    = Linear_block(groups, out_c, kernel=(1, 1), padding=(0, 0), stride=(1, 1))
        self.residual   = residual

    def forward(self, x):
        if self.residual:
            short_cut = x
        x = self.conv(x)
        x = self.conv_dw(x)
        x = self.project(x)
        # The shortcut is only valid when in_c == out_c and stride == 1.
        if self.residual:
            output = short_cut + x
        else:
            output = x
        return output

class Residual(Module):
    def __init__(self, c, num_block, groups, kernel=(3, 3), stride=(1, 1), padding=(1, 1)):
        super(Residual, self).__init__()
        modules = []
        for _ in range(num_block):
            modules.append(Residual_Block(c, c, residual=True, kernel=kernel, padding=padding, stride=stride, groups=groups))
        self.model = Sequential(*modules)
    def forward(self, x):
        return self.model(x)

class Conv_block(Module):
    def __init__(self, in_c, out_c, kernel=(1, 1), stride=(1, 1), padding=(0, 0), groups=1):
        super(Conv_block, self).__init__()
        self.conv   = Conv2d(in_c, out_channels=out_c, kernel_size=kernel, groups=groups, stride=stride, padding=padding, bias=False)
        self.bn     = BatchNorm2d(out_c)
        self.prelu  = PReLU(out_c)
    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.prelu(x)
        return x

class MobileFaceNet(Module):
    def __init__(self, embedding_size):
        super(MobileFaceNet, self).__init__()
        # 112,112,3 -> 56,56,64
        self.conv1      = Conv_block(3, 64, kernel=(3, 3), stride=(2, 2), padding=(1, 1))

        # 56,56,64 -> 56,56,64
        self.conv2_dw   = Conv_block(64, 64, kernel=(3, 3), stride=(1, 1), padding=(1, 1), groups=64)

        # 56,56,64 -> 28,28,64
        self.conv_23    = Residual_Block(64, 64, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=128)
        self.conv_3     = Residual(64, num_block=4, groups=128, kernel=(3, 3), stride=(1, 1), padding=(1, 1))

        # 28,28,64 -> 14,14,128
        self.conv_34    = Residual_Block(64, 128, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=256)
        self.conv_4     = Residual(128, num_block=6, groups=256, kernel=(3, 3), stride=(1, 1), padding=(1, 1))

        # 14,14,128 -> 7,7,128
        self.conv_45    = Residual_Block(128, 128, kernel=(3, 3), stride=(2, 2), padding=(1, 1), groups=512)
        self.conv_5     = Residual(128, num_block=2, groups=256, kernel=(3, 3), stride=(1, 1), padding=(1, 1))

        self.sep        = nn.Conv2d(128, 512, kernel_size=1, bias=False)
        self.sep_bn     = nn.BatchNorm2d(512)
        self.prelu      = nn.PReLU(512)

        self.GDC_dw     = nn.Conv2d(512, 512, kernel_size=7, bias=False, groups=512)
        self.GDC_bn     = nn.BatchNorm2d(512)

        self.features   = nn.Conv2d(512, embedding_size, kernel_size=1, bias=False)
        self.last_bn    = nn.BatchNorm2d(embedding_size)
  
        self._initialize_weights()

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    m.bias.data.zero_()
                    
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2_dw(x)
        x = self.conv_23(x)
        x = self.conv_3(x)
        x = self.conv_34(x)
        x = self.conv_4(x)
        x = self.conv_45(x)
        x = self.conv_5(x)

        x = self.sep(x)
        x = self.sep_bn(x)
        x = self.prelu(x)
        
        x = self.GDC_dw(x)
        x = self.GDC_bn(x)

        x = self.features(x)
        x = self.last_bn(x)
        return x


def get_mbf(embedding_size, pretrained):
    if pretrained:
        raise ValueError("No pretrained model for mobilefacenet")
    return MobileFaceNet(embedding_size)
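
The last stage of the model above (`sep`, `GDC_dw`, `features` and their batch-norm layers) can be traced in isolation to see how a 7×7×128 feature map becomes an embedding. This is a standalone sketch, not part of the original code: the `embedding_size` of 128 is an assumed value, and the final `nn.Flatten()` is an assumption about how a downstream consumer would reshape the result, since `forward` above returns a tensor of shape (N, embedding_size, 1, 1) without flattening.

```python
import torch
from torch import nn

embedding_size = 128  # assumed value for illustration

head = nn.Sequential(
    nn.Conv2d(128, 512, kernel_size=1, bias=False),               # sep: 7x7x128 -> 7x7x512
    nn.BatchNorm2d(512),
    nn.PReLU(512),
    nn.Conv2d(512, 512, kernel_size=7, groups=512, bias=False),   # GDConv: 7x7x512 -> 1x1x512
    nn.BatchNorm2d(512),
    nn.Conv2d(512, embedding_size, kernel_size=1, bias=False),    # features: 1x1x512 -> 1x1xE
    nn.BatchNorm2d(embedding_size),
    nn.Flatten(),                                                 # assumed: (N, E, 1, 1) -> (N, E)
)
head.eval()  # use running statistics in batch norm for this shape check

with torch.no_grad():
    feature_map = torch.randn(2, 128, 7, 7)
    embedding = head(feature_map)

embedding_shape = tuple(embedding.shape)
print(embedding_shape)
```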