【FAS-FRN】《Recognizing Multi-modal Face Spoofing with Face Recognition Networks》

Latest recommended article published on 2022-09-07 19:38:54

苏堤春不晓


Article link: https://blog.csdn.net/bryant_meng/article/details/109305936

CVPR-2019 workshop

Code: https://github.com/AlexanderParkin/ChaLearn_liveness_challenge

1 Background and Motivation

In deployed face recognition systems, the importance of face anti-spoofing algorithms goes without saying!

Large-scale face anti-spoofing datasets are harder to build than face recognition ones (attacks come in endless varieties), but anti-spoofing algorithms can benefit from additional image modalities such as infrared and depth maps.

IR (infrared) cameras are insensitive to electronic displays and can prevent attacks from phones and tablets, while the depth channel makes it easier to distinguish flat printed surfaces from face shapes.

In this paper, the authors aim to solve the face anti-spoofing problem.

What is multi-modal machine learning?

2 Related Work

In biometric security systems, face liveness detection falls into two types:

cooperative liveness detection: requires interaction with the user in the form of certain actions
non-cooperative liveness detection: aims to detect liveness from just a single image of a person

3 Advantages / Contributions

1st place in the Chalearn LAP multi-modal (RGB-IR-Depth) face anti-spoofing attack detection challenge
Pretrains on face recognition and gender classification datasets, then ensembles the resulting models
Building on the baseline from the challenge dataset paper, 【CASIA-SURF】《A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing》, the authors introduce a multi-level feature aggregation module, so that the model can find inter-modal correlations not only at a fine level but also at a coarse one

4 Datasets

1）CASIA-SURF

From the paper 【CASIA-SURF】《A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing》

It is the largest multi-modal anti-spoofing dataset, with 6 attack types per subject (3 in train, 3 in test); every sample comes in three modalities: RGB / IR / Depth

The Chalearn LAP challenge uses a subset of CASIA-SURF: 30K frames for training and 9.6K frames for validation

Note that the attack types in the training and test sets are different

2）Evaluation metrics

The evaluation metric is based on the ROC curve, specifically:

True Positive Rate（TPR） at some fixed False Positive Rate（FPR）

The paper 【CASIA-SURF】《A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing》 describes this in more detail!

This approach enables to measure how many real samples will pass the anti-spoofing test while accepting no more than some percentage of spoofing attacks.

The metric used in this paper is TPR at FPR = $10^{-4}$
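To make the metric concrete, here is a minimal pure-Python sketch of "TPR at a fixed FPR". It assumes that real faces are the positive class and that a higher score means "more likely real"; the function name `tpr_at_fpr` and the toy data are illustrative and are not taken from the paper or the challenge code.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Best TPR reachable while keeping FPR <= max_fpr.

    scores: liveness scores, higher means more likely a real face (assumption).
    labels: 1 for real (positive), 0 for spoof (negative).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one real and one spoof sample")

    # Sweep the threshold from the highest score downwards; every sample
    # with score >= threshold is accepted as real.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples tied at this score are accepted together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / n_neg > max_fpr:
            break  # FPR only grows from here on
        best_tpr = tp / n_pos
    return best_tpr


# Toy data: three real faces and three spoofs.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
example_labels = [1, 1, 0, 1, 0, 0]
strict_tpr = tpr_at_fpr(example_scores, example_labels, max_fpr=0.0)
loose_tpr = tpr_at_fpr(example_scores, example_labels, max_fpr=0.34)
```

At an operating point as strict as FPR = $10^{-4}$, at most one spoof in ten thousand may be accepted, so the result is dominated by the highest-scoring spoof samples.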

Despite being significantly larger than previous anti-spoofing datasets, CASIA-SURF is still orders of magnitude smaller compared to standard datasets for face recognition

The authors therefore pretrain on 4 face attribute / identity recognition datasets (which provides rich face-specific features), then train on CASIA-SURF, and finally ensemble the models

5 Method

5.1 Attack specific folds

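This is a way of splitting the training set, motivated by generalization

Of the three attack types in the training set (e.g., A, B, C), two (e.g., A, B) are used for training and the remaining one (e.g., C) for validation

This way the same network architecture yields three models (AB, AC, BC), whose prediction scores are then simply averaged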
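The leave-one-attack-out split described above can be sketched as follows. This is only an illustration: the sample representation (`(sample_id, attack)` tuples with `attack=None` for real faces) and the round-robin handling of real faces are assumptions of this sketch, since the notes do not say how real samples are distributed across folds.

```python
def attack_specific_folds(samples, attack_types):
    """Yield (held_out_attack, train, val) for every attack type.

    samples: list of (sample_id, attack) tuples; attack is None for real faces.
    attack_types: list of attack identifiers, e.g. ["A", "B", "C"].
    """
    reals = [s for s in samples if s[1] is None]
    for k, held_out in enumerate(attack_types):
        # Train on every attack except the held-out one.
        train = [s for s in samples if s[1] is not None and s[1] != held_out]
        # Validate on the held-out attack only.
        val = [s for s in samples if s[1] == held_out]
        # Assumption: real faces are dealt round-robin, so each fold keeps
        # some real samples for validation and the rest for training.
        for i, real in enumerate(reals):
            (val if i % len(attack_types) == k else train).append(real)
        yield held_out, train, val


# Toy data: one spoof per attack type plus three real faces.
example_samples = [
    ("s0", "A"), ("s1", "B"), ("s2", "C"),
    ("s3", None), ("s4", None), ("s5", None),
]
example_folds = list(attack_specific_folds(example_samples, ["A", "B", "C"]))
```

Each fold's validation set contains an attack type the model never saw during training, so the validation score approximates performance on unseen attacks.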

5.2 Transfer learning

The network is first pretrained on face recognition and gender classification datasets, and then trained on the CASIA-SURF dataset

5.3 Model architecture

Structurally, compared with the baseline method (【CASIA-SURF】《A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing》), only the part marked with a red box in the architecture figure is newly introduced by the authors

Aggregation blocks: multi-level feature aggregation

making model capable of finding inter-modal correlations not only at a fine level but also at a coarse one

2×3×4 = 24 neural networks

Training uses 2 initial random seeds, 3 training data splits (attack-specific folds), and 4 pretrained models; the final liveness score is the average over the 24 networks
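The 2×3×4 grid can be enumerated with `itertools.product`. In this sketch the seed values and the pretrained-model names are placeholders (the notes only give the counts), and `toy_predict` is a hypothetical stand-in for running one trained network.

```python
from itertools import product

SEEDS = (0, 1)                      # placeholder seed values
FOLDS = ("AB", "AC", "BC")          # training pairs from the attack-specific folds
PRETRAINED = ("init_1", "init_2", "init_3", "init_4")  # placeholder names

# One configuration per trained network: 2 * 3 * 4 = 24.
ENSEMBLE_CONFIGS = list(product(SEEDS, FOLDS, PRETRAINED))


def ensemble_liveness(predict, sample, configs=ENSEMBLE_CONFIGS):
    """Average the liveness scores of all networks for one sample."""
    scores = [predict(config, sample) for config in configs]
    return sum(scores) / len(scores)


def toy_predict(config, sample):
    """Stand-in for a trained network: ignores its inputs, returns a constant."""
    return 0.5


example_score = ensemble_liveness(toy_predict, sample=None)
```

Plain averaging treats every network equally, which matches the description above; it also means inference cost grows linearly with the number of networks.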

6 Experiments

1）Baseline

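First row of Table 3. Ha, rather weak, right? The baseline does fine at TPR at FPR = $10^{-2}$, reaching 96.7% (【CASIA-SURF】《A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing》); the baseline method uses ResNet-18

The numbers here are the authors' own reproduction, trained with a 5-fold cross-validation strategy!

The 5 folds are split by subject (person)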

2） Attack-specific folds

Rows 3~4 of Table 3: performance goes from 74.55 to 78.89

The difference is that the authors split by attack type rather than by subject!

We explain this by the improved generalization to new attacks due to the training for different types of attacks.

3） Initialization matters

Rows 4~6 of Table 3: pretraining on face recognition datasets turns out to help this much. Amazing, lesson learned

4）Multi-level feature aggregation

multi-level feature aggregation（MLFA）

Rows 6~7 of Table 3: adding it improves the results further

5）Ensembling

Row 11 of Table 3: this is the winning result. That said, it is an ensemble of 24 models, so inference speed still has room for improvement

6）Solution stability

Robustness

The authors' method holds up quite well

7）Multi-modality

For a fair comparison, the following input combinations are evaluated:

RGB + RGB + RGB
vs
IR + IR + IR
vs
Depth + Depth + Depth
vs
RGB + IR + Depth

The depth modality is remarkably strong

7 Conclusion (own)

readily by-passing human-level performance
IR (infrared) cameras are insensitive to electronic displays and can prevent attacks from phones and tablets, while the depth channel makes it easier to distinguish flat printed surfaces from face shapes.
IR and Depth bring additional information, e.g., light distribution, eye reflection, face surface
Datasets: Replay-Attack, CASIA-FASD and SiW datasets contain still RGB images. MSU-MFSD, Replay-Mobile and OULU-NPU provide video recordings of attacks from mobile devices
Pretraining on face recognition and gender classification, combined with ensembling: lesson learned
The data split (different attacks in training and testing) helps the network generalize better to unseen attacks