[WMCA] Biometric Face Presentation Attack Detection with Multi-Channel Convolutional Neural Network

Original post: https://blog.csdn.net/bryant_meng/article/details/125285392


IEEE Transactions on Information Forensics and Security（TIFS）-2019

Table of Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Method
5 Experiments
6 Conclusion (own) / Future work

1 Background and Motivation

Face recognition is a mainstream biometric authentication method.

However, its vulnerability to presentation attacks (a.k.a. spoofing) limits its usability in unsupervised applications (unattended scenarios).


As attacks become more sophisticated (especially 3D masks), building a reliable presentation attack detection (PAD, i.e., face liveness detection) system from the visible spectrum (RGB) alone is very challenging. The authors argue that using multiple channels (multi-modal input) helps mitigate this problem: "Tricking a multi-channel system is harder than a visual spectral one. An attacker would have to mimic real facial features across different representations."

In this paper, the authors release the Wide Multi-Channel presentation Attack (WMCA) database, which covers RGB / NIR / Depth / Thermal channels.

They also propose a multi-channel CNN (MC-CNN) for face presentation attack detection, which can detect a variety of 2D and 3D attacks in obfuscation or impersonation settings.

2 Related Work

Feature based approaches for face PAD
CNN based approaches for face PAD
Multi-channel based approaches and datasets for face PAD


3 Advantages / Contributions

Releases WMCA, an open multi-modal dataset (RGB / NIR / Depth / Thermal).

Proposes MC-CNN to tackle multi-channel face presentation attack detection.

4 Method

1）Preprocessing

Face detection with MTCNN

Facial landmark localization with the Supervised Descent Method (SDM)

Face alignment, then resizing to 128x128
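Face alignment is commonly done by rotating and scaling the face so that the eye centres land on fixed positions in the crop. The post does not detail the paper's exact alignment procedure, so the following is only a minimal sketch under that assumption: it computes the rotation angle and scale from two eye landmarks (for example, derived from the SDM points). The function name and the target eye positions inside the 128x128 crop (`target_left`, `target_right`) are hypothetical.

```python
import math


def eye_alignment_params(left_eye, right_eye, target_left=(44.0, 48.0), target_right=(84.0, 48.0)):
    """Rotation angle (degrees) and scale that map detected eye centres onto fixed crop positions.

    target_left / target_right are HYPOTHETICAL eye positions inside a 128x128 crop.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    # Rotating the eye-to-eye vector by -angle makes it horizontal.
    angle = math.degrees(math.atan2(dy, dx))
    # Scale so the inter-eye distance matches the target distance in the crop.
    scale = math.dist(target_left, target_right) / math.hypot(dx, dy)
    return angle, scale


print(eye_alignment_params((30, 50), (70, 50)))  # (0.0, 1.0)
print(eye_alignment_params((0, 0), (10, 10)))    # tilted by 45 degrees, needs about 2.83x upscaling
```

The angle and scale (plus a translation) define a similarity transform; applying it and cropping 128x128 gives the aligned face.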

Data normalization (to 8-bit format)

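The paper points out that the non-RGB channels (for example, depth, which may arrive as a 16-bit stream) are converted to 8-bit format with a Median Absolute Deviation (MAD) based normalization.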

When it comes to normalization, the first method that usually comes to mind is linear (min-max) normalization:

$\frac{x-\min(x)}{\max(x)-\min(x)}$

There is also Z-score normalization:

$\frac{x-\mu}{\sigma}$

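MAD-based normalization is another option, built on $\mathrm{MAD} = \operatorname{median}(|x-\operatorname{median}(x)|)$; for details see "Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median".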
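To compare the three schemes on a 16-bit depth-like signal, here is a minimal sketch. The post does not say how the MAD statistic is turned into an 8-bit image, so `mad_to_uint8` uses a hypothetical scheme: clip to median ± k·MAD (k = 4 is an arbitrary choice) and scale linearly to 0..255. Pixels are a flat list to keep the example dependency-free.

```python
from statistics import mean, median, pstdev


def min_max(xs):
    """Linear (min-max) normalization to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def z_score(xs):
    """Z-score normalization: zero mean, unit (population) standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def mad(xs):
    """Median absolute deviation: median(|x - median(x)|)."""
    med = median(xs)
    return median(abs(x - med) for x in xs)


def mad_to_uint8(xs, k=4.0):
    """HYPOTHETICAL MAD-based 8-bit conversion: clip to median +/- k*MAD, scale to 0..255."""
    med, m = median(xs), mad(xs)
    if m == 0:  # constant input: nothing to stretch
        return [0 for _ in xs]
    lo, hi = med - k * m, med + k * m
    return [round((min(max(x, lo), hi) - lo) / (hi - lo) * 255) for x in xs]


# A toy 16-bit depth "image" (flattened) with one outlier pixel.
depth = [1000, 1010, 1020, 1030, 60000]
print(mad(depth))           # 10
print(mad_to_uint8(depth))  # [64, 96, 128, 159, 255]
print(min_max(depth)[:4])   # the outlier squashes the other pixels close to 0
```

The median and MAD are barely affected by the single 60000 outlier, so the remaining pixels keep their contrast after the 8-bit conversion, whereas min-max normalization compresses them into a tiny range.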

2）Network architecture

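PAD datasets are usually small (compared with datasets for natural-image classification or face recognition), being insufficient to train a deep architecture from scratch.

A pre-trained model (usually a face recognition model) is therefore needed.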

The paper "Heterogeneous face recognition using domain specific units" argues that

high-level features of Deep Convolutional Neural Networks trained in visual spectra images are potentially domain independent and can be used to encode faces sensed in different image domains

Therefore, domain-specific low-level feature detectors can be learned separately, while the same set of high-level features from the source domain is shared without re-training them.

The authors follow this idea for transfer learning:

adaptation of lower layers of CNN, instead of adapting the whole network when limited amount of target data is available

Based on the 29-layer LightCNN, the authors design the following PAD framework (MC-CNN):

[Figure: MC-CNN framework built on LightCNN (29 layers)]

The gray parts are not trained; they are transferred directly from the pre-trained model.

The loss function is Binary Cross Entropy (BCE).
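To make the training objective concrete, here is a minimal pure-Python sketch of a BCE-trained fusion head. It assumes that each channel's LightCNN branch has already produced an embedding, that the per-channel embeddings are concatenated, and that a single linear unit followed by a sigmoid yields the score fed to BCE. The single linear unit and the names `fused_score`, `sigmoid`, `bce` are hypothetical simplifications, not the paper's exact head.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one sample; y is 0 or 1."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def fused_score(embeddings, weights, bias):
    """HYPOTHETICAL head: concatenate per-channel embeddings, one linear unit, sigmoid."""
    x = [v for emb in embeddings for v in emb]
    if len(x) != len(weights):
        raise ValueError("weights must match the concatenated embedding length")
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


# Toy 2-d "embeddings" for grayscale, depth, infrared and thermal (real ones are much larger).
emb = [[0.2, 0.1], [0.0, 0.4], [0.3, 0.3], [0.5, 0.1]]
w = [0.1] * 8
p = fused_score(emb, w, bias=0.0)
print(p, bce(p, 1))
```

In a real framework the weights would be learned by minimizing the mean BCE over a batch; only the low-level (non-gray) layers and the head would receive gradient updates.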

5 Experiments

5.1 Datasets and Metrics

1）Camera setup for data collection

The Intel RealSense SR300 sensor captures RGB / NIR / Depth.

[Figure: RealSense technology in the SR300 camera]

The Seek Thermal Compact PRO sensor captures the thermal images.

[Figure: See a different world: the SEEK Compact Pro smartphone thermal imaging lens]

[Figure: sample images captured by the sensors]

2）Camera integration and calibration

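RGB / NIR / Depth need no extra work: they come from a single device that is already calibrated by the manufacturer. What needs calibrating is the relation between RGB / NIR / Depth and thermal, so that the information captured by all channels corresponds both in time and in space.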
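The temporal half of this alignment can be illustrated by pairing each frame of one stream with the nearest-in-time frame of the other. This is a minimal sketch, not the paper's procedure: the function names, the millisecond timestamps, the frame rates and the `max_gap` tolerance are all hypothetical.

```python
from bisect import bisect_left


def nearest_index(timestamps, t):
    """Index of the value in sorted, non-empty `timestamps` closest to t (ties go to the earlier one)."""
    if not timestamps:
        raise ValueError("timestamps must be non-empty")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def pair_frames(ref_ts, other_ts, max_gap):
    """Pair each reference frame with the nearest other-sensor frame no more than max_gap away."""
    pairs = []
    for i, t in enumerate(ref_ts):
        j = nearest_index(other_ts, t)
        if abs(other_ts[j] - t) <= max_gap:
            pairs.append((i, j))
    return pairs


# Hypothetical timestamps in ms: RealSense stream at ~30 fps, thermal stream at 25 fps.
realsense_ts = [0, 33, 67, 100]
thermal_ts = [0, 40, 80, 120]
print(pair_frames(realsense_ts, thermal_ts, max_gap=15))  # [(0, 0), (1, 1), (2, 2)]
```

Frames without a close enough partner (here the one at 100 ms) are simply dropped; the spatial half of the alignment comes from the checkerboard calibration described next.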

The sensors are mounted on a rig built from standard optical mounting posts.

Calibration target: a checkerboard pattern made from materials with different thermal characteristics.

The checkerboard is heated so that it shows up on the thermal camera: "For the pattern to be visible on the thermal channel, the target was illuminated by high power halogen lamps." (Clever trick.)


3）Data collection procedure


Session four was dedicated to presentation attacks only.

The masks and mannequins were heated using a blower prior to capture to make the attack more challenging. (I did not see this coming: they deliberately made things harder for themselves!)

Data is recorded from the sensors for 10 seconds.

4）Presentation attacks

glasses (very hard to detect)
fake head (mannequin), heated with a blower prior to capture
print
replay
rigid mask (plastic masks)
flexible mask (silicone masks)
paper mask

5）Data partitioning

From each video, 50 frames are sampled uniformly in the temporal domain.
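Uniform temporal sampling amounts to picking evenly spaced frame indices that include the first and last frame. A minimal sketch; the function name and the 300-frame video length (10 s at a hypothetical 30 fps) are illustrative assumptions.

```python
def uniform_frame_indices(num_frames, k=50):
    """k frame indices spread evenly over [0, num_frames - 1], both ends included."""
    if not 2 <= k <= num_frames:
        raise ValueError("need 2 <= k <= num_frames")
    # Integer arithmetic keeps the indices exact and strictly increasing.
    return [i * (num_frames - 1) // (k - 1) for i in range(k)]


# Hypothetical: a 10 s video at 30 fps has 300 frames.
idx = uniform_frame_indices(300)
print(len(idx), idx[:3], idx[-1])  # 50 [0, 6, 12] 299
```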

grandtest protocol: the PA categories are distributed uniformly across the three splits (train, dev, and eval)
unseen attack: uses the leave-one-out (LOO) technique, as in leave-one-out cross-validation. The training and tuning are done on data which doesn't contain the attack of interest.
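A leave-one-out protocol can be built by filtering the samples per held-out attack. The sketch below follows the rule above for train and dev (no samples of the held-out attack) and additionally assumes, as a hypothetical choice, that the eval split keeps only bona fide samples plus the held-out attack. The label strings and sample format are made up for illustration.

```python
# Hypothetical label names for the seven WMCA attack categories.
ATTACKS = ("glasses", "fakehead", "print", "replay", "rigidmask", "flexiblemask", "papermask")


def loo_protocol(samples, held_out):
    """samples: iterable of (split, label), label is 'bonafide' or an attack name."""
    if held_out not in ATTACKS:
        raise ValueError(f"unknown attack: {held_out}")
    out = {"train": [], "dev": [], "eval": []}
    for split, label in samples:
        if split in ("train", "dev") and label != held_out:
            out[split].append((split, label))  # held-out attack never seen in training/tuning
        elif split == "eval" and label in ("bonafide", held_out):
            out[split].append((split, label))  # ASSUMPTION: eval = bona fide + unseen attack only
    return out


samples = [("train", "bonafide"), ("train", "print"), ("train", "glasses"),
           ("dev", "glasses"), ("dev", "replay"),
           ("eval", "bonafide"), ("eval", "glasses"), ("eval", "print")]
split = loo_protocol(samples, "glasses")
print(split["eval"])  # [('eval', 'bonafide'), ('eval', 'glasses')]
```

Running this once per entry in `ATTACKS` yields the seven unseen-attack protocols.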

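6）Evaluation metrics

In presentation attack detection, attacks are usually treated as positive samples and bona fide (real) faces as negative samples.

APCER (Attack Presentation Classification Error Rate): FN / (TP + FN), the proportion of attacks wrongly accepted as bona fide
NPCER (Normal Presentation Classification Error Rate): FP / (TN + FP), the proportion of bona fide presentations wrongly rejected as attacks
BPCER (Bona Fide Presentation Classification Error Rate): same as NPCER
ACER (Average Classification Error Rate): (APCER + BPCER) / 2.0

The decision threshold is fixed on the dev set at BPCER = 1%.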
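Here is a minimal sketch of the metrics and of the dev-set threshold selection. It assumes that a higher score means "more bona fide" (a sample is accepted as bona fide when score >= threshold), which matches the threshold discussion in 5.2; the function names and the simple quantile-style threshold choice are assumptions, not the paper's exact procedure.

```python
import math


def error_rates(attack_scores, bonafide_scores, threshold):
    """Return (APCER, BPCER, ACER); a sample is accepted as bona fide when score >= threshold."""
    apcer = sum(s >= threshold for s in attack_scores) / len(attack_scores)  # attacks accepted
    bpcer = sum(s < threshold for s in bonafide_scores) / len(bonafide_scores)  # bona fide rejected
    return apcer, bpcer, (apcer + bpcer) / 2.0


def threshold_at_bpcer(bonafide_dev_scores, target=0.01):
    """Pick a threshold from the dev bona fide scores so that dev BPCER <= target."""
    b = sorted(bonafide_dev_scores)
    return b[math.floor(target * len(b))]


dev_bonafide = [i / 100 for i in range(1, 101)]
t = threshold_at_bpcer(dev_bonafide)
print(t)  # 0.02
print(error_rates([0.0, 0.01, 0.03, 0.5], [0.01, 0.5, 0.9, 0.95], t))  # (0.5, 0.25, 0.375)
```

The threshold is chosen on dev only and then reused unchanged on eval, so the eval BPCER can differ from the 1% target.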

5.2 Experiments with grandtest protocol

1）Baseline results

[Figure: results of the individual channels with conventional feature-extraction and classification baselines]

For score fusion, each channel's scores are first normalized to the range 0 to 1, and then a mean fusion is performed to obtain the final PA score.
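A minimal sketch of this fusion. The post does not say how scores are mapped to 0..1, so this assumes min-max normalization with the minimum and maximum estimated on each channel's dev scores, and clipping of eval scores that fall outside that range. All function names are hypothetical.

```python
def fit_min_max(dev_scores):
    """Per-channel normalization parameters estimated on dev scores."""
    return min(dev_scores), max(dev_scores)


def to_unit_range(scores, lo, hi):
    """Min-max normalize, clipping because eval scores may fall outside the dev range."""
    return [min(max((s - lo) / (hi - lo), 0.0), 1.0) for s in scores]


def mean_fusion(*channel_scores):
    """Average the normalized scores of all channels, sample by sample."""
    if len({len(c) for c in channel_scores}) != 1:
        raise ValueError("all channels must score the same samples")
    return [sum(vals) / len(vals) for vals in zip(*channel_scores)]


lo, hi = fit_min_max([-2.0, 2.0])                 # e.g. dev scores of the grayscale channel
gray = to_unit_range([-3.0, 0.0, 2.0], lo, hi)    # [0.0, 0.5, 1.0]
thermal = [0.2, 0.4, 1.0]                         # already normalized thermal scores
print(mean_fusion(gray, thermal))  # [0.1, 0.45, 1.0]
```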

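The addition of multiple channels helps boost the performance of PAD systems, but conventional methods have not yet fully exploited the potential of multi-channel fusion.

P.S.: for a PAD system, BPCER only needs to be reasonably low; the key requirement is a low APCER.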

2）Results with MC-CNN
MC-CNN clearly outperforms the conventional methods.

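Why are the MC-CNN and FASNet curves in this figure shorter than the others? The authors offer an explanation.

Let us analyze it carefully. Figure 7 is plotted from the APCER and BPCER values computed at different thresholds.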

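As the threshold increases, the system tends to classify everything as an attack: APCER → 0, BPCER → 1, so 1-BPCER → 0, which is the lower-left end of the curve.
As the threshold decreases, the system tends to classify everything as bona fide: APCER → 1, BPCER → 0, so 1-BPCER → 1, which is the upper-right end of the curve.

The authors explain that the CNN scores are bimodal: they concentrate near 0 and 1 with small variance (e.g., clustered around 0 and 0.9). Once the threshold is high enough (e.g., 0.9), APCER and BPCER stop changing, so the curve does not extend any further (it degenerates into a point).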
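The effect is easy to reproduce: sweep the threshold and collect the distinct (APCER, BPCER) pairs. With bimodal scores only a handful of operating points exist, so the plotted curve is short; with spread-out scores the sweep visits many points. A minimal sketch with made-up scores, assuming as above that score >= threshold means bona fide.

```python
def operating_points(attack_scores, bonafide_scores, thresholds):
    """Distinct (APCER, BPCER) pairs over a threshold sweep; accept as bona fide if score >= t."""
    points = set()
    for t in thresholds:
        apcer = sum(s >= t for s in attack_scores) / len(attack_scores)
        bpcer = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        points.add((apcer, bpcer))
    return points


thresholds = [i / 100 for i in range(101)]
bimodal = operating_points([0.0] * 5, [0.9] * 5, thresholds)          # CNN-like scores
spread = operating_points([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], thresholds)
print(sorted(bimodal))  # [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
print(len(spread))      # 9
```

With the bimodal scores every threshold between the two clusters gives the same point (0.0, 0.0), which is why the curve collapses instead of tracing a continuous line.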

Next, the results for the individual attack types: every attack type except glasses reaches 100%.

5.3 Generalization to unseen attacks


Glasses are weaker (which is reasonable), rigid masks are somewhat easier than flexible masks, and all other attacks reach 100%.

Similarly, attacks in lower chin could be harder to detect due to variability introduced by bonafide samples with facial hair and so on.

The authors further infer that obfuscation-type PAIs may be harder to detect than impersonation-type PAIs.

5.4 Analysis of MC-CNN framework

1）Experiments with adapting different layers
Fine-tuning different layers:

The fine-tuned conv layers are the gray parts in the figure.

The performance becomes worse when all layers are adapted. This can be attributed to over-fitting as the number of parameters to learn is very large

Looking at the code, there seem to be only layers 1-9, not 1-10.

https://github.com/AlfredXiangWu/LightCNN/blob/master/light_cnn.py

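import torch
import torch.nn as nn
import torch.nn.functional as F

class mfm(nn.Module):
    # Max-Feature-Map: compute 2*out_channels responses, keep the element-wise max of the two halves
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, type=1):
        super(mfm, self).__init__()
        self.out_channels = out_channels
        if type == 1:  # convolutional MFM
            self.filter = nn.Conv2d(in_channels, 2 * out_channels, kernel_size=kernel_size, stride=stride, padding=padding)
        else:          # fully connected MFM (used by self.fc below)
            self.filter = nn.Linear(in_channels, 2 * out_channels)

    def forward(self, x):
        out = torch.split(self.filter(x), self.out_channels, 1)
        return torch.max(out[0], out[1])

class group(nn.Module):
    # 1x1 MFM followed by a kxk MFM
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding):
        super(group, self).__init__()
        self.conv_a = mfm(in_channels, in_channels, 1, 1, 0)
        self.conv   = mfm(in_channels, out_channels, kernel_size, stride, padding)

    def forward(self, x):
        return self.conv(self.conv_a(x))

class resblock(nn.Module):
    # two 3x3 MFM convs with an identity shortcut (in_channels == out_channels here)
    def __init__(self, in_channels, out_channels):
        super(resblock, self).__init__()
        self.conv1 = mfm(in_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.conv2 = mfm(in_channels, out_channels, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        return self.conv2(self.conv1(x)) + x

class network_29layers(nn.Module):
    # Input: 1-channel 128x128 face; four 2x2 max-poolings leave an 8x8x128 map for self.fc
    def __init__(self, block, layers, num_classes=79077):
        super(network_29layers, self).__init__()
        self.conv1  = mfm(1, 48, 5, 1, 2)
        self.pool1  = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        self.block1 = self._make_layer(block, layers[0], 48, 48)
        self.group1 = group(48, 96, 3, 1, 1)
        self.pool2  = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        self.block2 = self._make_layer(block, layers[1], 96, 96)
        self.group2 = group(96, 192, 3, 1, 1)
        self.pool3  = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        self.block3 = self._make_layer(block, layers[2], 192, 192)
        self.group3 = group(192, 128, 3, 1, 1)
        self.block4 = self._make_layer(block, layers[3], 128, 128)
        self.group4 = group(128, 128, 3, 1, 1)
        self.pool4  = nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)
        self.fc     = mfm(8*8*128, 256, type=0)
        self.fc2    = nn.Linear(256, num_classes)

    def _make_layer(self, block, num_blocks, in_channels, out_channels):
        layers = []
        for i in range(0, num_blocks):
            layers.append(block(in_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.pool1(x)

        x = self.block1(x)
        x = self.group1(x)
        x = self.pool2(x)

        x = self.block2(x)
        x = self.group2(x)
        x = self.pool3(x)

        x = self.block3(x)
        x = self.group3(x)
        x = self.block4(x)
        x = self.group4(x)
        x = self.pool4(x)

        x = x.view(x.size(0), -1)
        fc = self.fc(x)                             # 256-d face embedding
        fc = F.dropout(fc, training=self.training)
        out = self.fc2(fc)                          # identity logits
        return out, fc

def LightCNN_29Layers(**kwargs):
    # [1, 2, 3, 4] residual blocks per stage: 1 + 2*10 + 2*4 = 29 conv layers
    return network_29layers(resblock, [1, 2, 3, 4], **kwargs)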
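The "adapt only the lower layers" idea can be sketched as splitting the modules of `network_29layers` (in `forward()` order) into a fine-tuned prefix and a frozen remainder. This is a hypothetical helper: counting the modules as 10 units in this order is an assumption and is not necessarily how the paper numbers its layers.

```python
# Module names of network_29layers in forward() order (numbering them 1..10 is an assumption).
LIGHTCNN_MODULES = ("conv1", "block1", "group1", "block2", "group2",
                    "block3", "group3", "block4", "group4", "fc")


def split_adapted_frozen(k, modules=LIGHTCNN_MODULES):
    """Return (adapted, frozen): the first k modules are fine-tuned, the rest stay frozen."""
    if not 0 <= k <= len(modules):
        raise ValueError("k must be between 0 and the number of modules")
    return list(modules[:k]), list(modules[k:])


adapted, frozen = split_adapted_frozen(3)
print(adapted)  # ['conv1', 'block1', 'group1']

# With PyTorch and a network_29layers instance `model`, freezing could look like:
#   for name in frozen:
#       for p in getattr(model, name).parameters():
#           p.requires_grad = False
```

The pooling layers have no parameters, so they do not need to appear in either list.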

2）Experiments with different combinations of channels

All results in this part use the grandtest protocol.

G : Only Grayscale channel is used.
D : Only Depth channel is used.
I : Only Infrared channel is used.
T : Only Thermal channel is used.

Single-channel performance ranks T > I > D > G.

The performance boost in the proposed framework is achieved with the use of multiple channels.

6 Conclusion (own) / Future work

presentation attack (PA): the ISO standard defines a presentation attack as "a presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system".

presentation attack instrument (PAI): the artifact or means used to carry out the attack.

For example, if we have silicone masks in the training set, then classifying mannequins as an attack is rather easy.

spatially and temporally aligned channels

Registration and alignment of depth and color images