[CASIA-SURF] "A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing"

bryant_meng

Last modified: 2023-02-09 10:20:44

First published: 2020-10-29 23:23:53

Link to this post: https://blog.csdn.net/bryant_meng/article/details/109334761

CVPR-2019

This dataset was used in the ChaLearn Face Anti-spoofing Attack Detection Challenge@CVPR2019.

The top three teams at the end of the challenge were:

1st: [FAS-FRN] "Recognizing Multi-modal Face Spoofing with Face Recognition Networks" (face-recognition pre-training + an ensemble of 20+ models + a multi-level feature aggregation module)

2nd: [FaceBagNet] "FaceBagNet: Bag-of-local-features Model for Multi-modal Face Anti-spoofing" (patch input + erase fusion)

3rd: [FeatherNets] "FeatherNets: Convolutional Neural Networks as Light as Feather for Face Anti-spoofing" (Streaming Module, plus ensemble + cascade fusion)

For details on each team, see the article "CVPR2019 | Face Anti-spoofing Detection Challenge: Russian startup takes first place, Chinese and US companies come second and third (with papers, code, and an analysis of the submitted models)"

Contents

1 Background and Motivation
2 Related Work
3 Advantages / Contributions
4 Datasets
5 Method
- 5.1 Naive halfway fusion
- 5.2 Squeeze and excitation fusion
6 Experiments
7 Conclusion（own）

1 Background and Motivation

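As CNNs have matured, face recognition has made it into real products, for example phone unlock, access control, and face payment. However, face recognition systems are easily fooled by various attacks, e.g. print attacks, video replay attacks, and 2D / 3D mask attacks. Face presentation attack detection (PAD) is therefore a vital step in keeping a face recognition system secure and reliable!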

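PAD algorithms have recently been performing well, and part of that success is owed to the creation of face anti-spoofing datasets! Still, existing face anti-spoofing datasets are mere appetizers next to the full banquets that are the image classification and face recognition datasets.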

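So the authors built the largest face anti-spoofing dataset available at the time, with the most subjects and the most videos, covering 3 modalities (RGB / Depth / IR), to push face anti-spoofing forward. A comparison with other public datasets:

[Image: comparison of CASIA-SURF with other public face anti-spoofing datasets]
Some sample images:

[Image: sample images from the dataset]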
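On top of this, the authors take a practical view (the false positive rate, FPR, matters most, i.e. accepting a fake face as real, which is the most damaging error) and adopt the receiver operating characteristic (ROC) curve, with TPR on the vertical axis and FPR on the horizontal axis, as an evaluation metric!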

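The evaluation metrics commonly used before are listed below (live faces are the positive class)

[Image: commonly used evaluation metrics]

APCER: attack presentation classification error rate (error rate on fake samples), $\frac{FP}{FP+TN}$
BPCER: bona fide presentation classification error rate (error rate on real samples), $\frac{FN}{TP+FN}$
ACER: average classification error rate (the mean of APCER and BPCER), $\frac{\frac{FP}{FP+TN} + \frac{FN}{TP+FN}}{2}$
HTER: half total error rate (half the sum of the misclassification rates on real and fake faces; same form as ACER)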
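
The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `pad_error_rates`, the label convention (1 = live, 0 = attack), and the toy labels are my own assumptions, not something from the paper, and it assumes both classes are present so no denominator is zero.

```python
def pad_error_rates(labels, preds):
    """Compute APCER, BPCER and ACER from hard decisions.

    Assumed convention (not from the paper): 1 = live (positive), 0 = attack.
    """
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    apcer = fp / (fp + tn)      # attacks wrongly accepted as live
    bpcer = fn / (tp + fn)      # live faces wrongly rejected
    acer = (apcer + bpcer) / 2  # mean of the two error rates
    return {"APCER": apcer, "BPCER": bpcer, "ACER": acer}


# Toy data: 4 live faces and 5 attacks, with one mistake of each kind.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 1, 0, 0, 0, 0]
rates = pad_error_rates(labels, preds)
print(rates)
```

With this toy data one of five attacks slips through and one of four live faces is rejected, so APCER and BPCER come out different even though each class has exactly one error.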

2 Related Work

Datasets
Table 1 already lists them in detail. Existing datasets share two common limitations
- Limited subjects and samples, so PAD algorithms easily overfit on them
- Most contain only the RGB modality, so they tend to fail on new types of PAs (3D and custom-made silicone masks)
Methods
- Traditional methods, which use eye blinking, context information, motion information, HSV and YCbCr color spaces, the Fourier spectrum, and score-level or feature-level fusion
- CNN-based methods (treated as a binary classification problem)

3 Advantages / Contributions

Built and released a large-scale multi-modal (RGB / IR / Depth) dataset for face anti-spoofing: the CASIA-SURF dataset
Designed a multi-modal face liveness detection network for CASIA-SURF and conducted extensive experiments

4 Datasets

CASIA-SURF: the official website link given in the paper no longer works

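6 types of photo (print) attacks, e.g. cropping, bending the printed paper, and varying the stand-off distance. Details:

[Image: table of the six attack types]

This table from [FAS-FRN] "Recognizing Multi-modal Face Spoofing with Face Recognition Networks" sums it up well: Surface describes how the paper surface is deformed; Eyes means the eye region of the printout is cut out so that the real person's eyes show through; Nose and Mouth work the same way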

4.1 Acquisition details

Camera: Intel RealSense SR300
Printed face size: A4
Captured image size: 1280×720 for RGB; 640×480 for Depth, IR, and aligned images

4.2 Data preprocessing

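The background is removed entirely, keeping only the region inside the face contour

[Figure 4: the preprocessing pipeline, five columns from left to right]

Figure 4, column 1 to column 2: detect the face (a rectangular region) with Dlib ("Dlib-ml: A machine learning toolkit")

Figure 4, column 2 to column 3: use PRNet ("Joint 3D face reconstruction and dense alignment with position map regression network") to obtain the accurate face area (face reconstruction area)

Figure 4, column 3 to column 4: generate a mask

Figure 4, column 4 to column 5: combine the rectangular face region from Dlib with the mask, and crop out the region that contains only the face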
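
The last step (box + mask, then crop) can be sketched as below. This is a toy version under my own assumptions: the detector box and the PRNet mask are given as plain Python data rather than produced by Dlib or PRNet, and `crop_and_mask` is a hypothetical helper, not code from the paper.

```python
def crop_and_mask(image, mask, box):
    """Crop `box` out of `image` and zero every pixel the mask rejects.

    image: H x W grid of pixel values (nested lists).
    mask:  H x W grid of 0/1, 1 = inside the face area.
    box:   (top, left, bottom, right), bottom/right exclusive.
    All three inputs are stand-ins for the Dlib / PRNet outputs.
    """
    top, left, bottom, right = box
    return [
        [image[r][c] if mask[r][c] else 0 for c in range(left, right)]
        for r in range(top, bottom)
    ]


image = [[10, 11, 12, 13],
         [20, 21, 22, 23],
         [30, 31, 32, 33],
         [40, 41, 42, 43]]
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
face = crop_and_mask(image, mask, box=(1, 1, 3, 4))
print(face)  # [[21, 22, 0], [31, 0, 0]]
```

Pixels inside the box but outside the mask become 0, which is how the background disappears while the crop stays rectangular.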

4.3 Statistics

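The authors sample one frame out of every 10 from the raw videos; the gender and age distributions are shown in Figure 5

[Figure 5: gender and age distribution]

All subjects are Chinese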

4.4 Evaluation protocol

1) Intra-testing

Live faces and Attacks 4, 5, 6 are used for training

Live faces and Attacks 1, 2, 3 are used for validation and testing

All of the data comes from CASIA-SURF itself
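
A rough sketch of the attack-type side of this protocol. The sample layout (dicts with an `attack_type` key, `None` for a live face) and the helper `select_for_split` are hypothetical; how subjects are assigned to each split is not covered in these notes, so the sketch assumes the pool passed in already belongs to the right subjects.

```python
TRAIN_ATTACKS = {4, 5, 6}
EVAL_ATTACKS = {1, 2, 3}  # shared by validation and testing


def select_for_split(samples, split):
    """Keep live faces plus the attack types allowed in `split`.

    samples: list of dicts with a hypothetical 'attack_type' key
             (None for a live face, otherwise 1..6).
    split:   'train', 'val' or 'test'.
    """
    allowed = TRAIN_ATTACKS if split == "train" else EVAL_ATTACKS
    return [s for s in samples
            if s["attack_type"] is None or s["attack_type"] in allowed]


pool = [{"id": i, "attack_type": t}
        for i, t in enumerate([None, 1, 2, 3, 4, 5, 6])]
train = select_for_split(pool, "train")
test = select_for_split(pool, "test")
print([s["attack_type"] for s in train])  # [None, 4, 5, 6]
print([s["attack_type"] for s in test])   # [None, 1, 2, 3]
```

The point of the split is that the attack types seen at test time never appear in training, so the model is always evaluated on unseen attack variants.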

2) Cross-testing

Pre-train on CASIA-SURF, then fine-tune and test on other datasets

5 Method

5.1 Naive halfway fusion

The three modalities start out solo, each in its own branch, and team up after a certain network stage

The drawback:

However, direct concatenating these features cannot make full use of the characteristics between different modalities

5.2 Squeeze and excitation fusion

Against different types of PAs, the three modalities complement each other

RGB: rich appearance details
Depth: sensitive to the distance between the image plane and the corresponding face
IR: measure the amount of heat radiated from a face

The authors borrow from [SENet] "Squeeze-and-Excitation Networks" to fuse the information extracted from the three modalities, rather than simply concatenating the per-modality features

[Figure 6: the fusion network with the SE fusion modules]

The squeeze and excitation fusion module performs modal-dependent feature re-weighting to select the more informative channel features while suppressing less useful features from each modality
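
To make the difference between the two fusion styles concrete, here is a minimal pure-Python sketch. It is not the authors' network: the tiny shapes, the random weights, and every function name are my own assumptions, and the SE block follows the generic squeeze (global average pool), excitation (two dense layers, ReLU then sigmoid), scale recipe from SENet.

```python
import math
import random


def squeeze(feature):
    """Global average pooling: one scalar per channel (feature is C x H x W)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def dense(vec, weights):
    """Plain matrix-vector product; weights is an out x in matrix."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def excite(z, w1, w2):
    """Bottleneck MLP producing one gate in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(z, w1)]                    # ReLU
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2)]  # sigmoid


def se_reweight(feature, w1, w2):
    """Scale every channel of one modality by its learned gate."""
    gates = excite(squeeze(feature), w1, w2)
    return [[[g * x for x in row] for row in ch] for g, ch in zip(gates, feature)]


def fuse(features, se_weights=None):
    """Concatenate modalities along the channel axis.

    With se_weights=None this is plain halfway fusion; otherwise each
    modality is re-weighted by its own SE block before concatenation.
    """
    fused = []
    for i, feat in enumerate(features):
        if se_weights is not None:
            feat = se_reweight(feat, *se_weights[i])
        fused.extend(feat)
    return fused


rng = random.Random(0)
C, H, W, R = 4, 2, 2, 2  # channels, height, width, bottleneck size (illustrative)


def random_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Three fake feature maps standing in for the RGB, Depth and IR branches.
modalities = [[[[rng.random() for _ in range(W)] for _ in range(H)]
               for _ in range(C)] for _ in range(3)]
se_weights = [(random_matrix(R, C), random_matrix(C, R)) for _ in range(3)]

halfway = fuse(modalities)
se_fused = fuse(modalities, se_weights)
print(len(halfway), len(se_fused))  # 12 12
```

Both variants end with the same channel count; the only difference is that the SE version scales each channel by a gate between 0 and 1 before the concatenation, which is what lets the network play down less useful channels per modality.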

6 Experiments

Face region: 112 × 112
Data augmentation: random flipping, rotation, resizing, cropping, and color distortion

6.1 Model analysis

[Image: model analysis results]
Halfway fusion is simply Figure 6 with the SE fusion module removed; SE fusion does bring a sizable gain

6.2 Dataset analysis

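1) Effect of the number of modalities

[Image: results for different modality combinations]
Depth on its own already does quite well, but teaming up the modalities is still the way to go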

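2) Effect of the number of subjects

As described in "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era" (ICCV 2017), there is a logarithmic relation between the amount of training data and the performance of deep neural network methods

The authors put this to the test

[Figures 7 and 8: ROC curves and ACER for different numbers of training subjects]
Figure 7 shows the ROC curves for different numbers of training subjects (50, 100, 200, 300); Figure 8 shows the ACER for each (lower is better)

You can still see that while the network is "hungry", feeding it more data helps quite a bit. And if the goal is to see the log relation, isn't a plot like mine more direct, haha

[Image: my own re-plot of the results to show the log relation]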

6.3 Generalization capability

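Pre-train on CASIA-SURF, then fine-tune on other datasets and measure the results to verify generalization

1) SiW dataset

The FAS-TD-SF model (which makes use of depth information) is pre-trained on the RGB and Depth data of CASIA-SURF, then fine-tuned and tested on SiW (RGB only)

[Image: results on SiW]
2) CASIA-MFSD dataset

[Image: results on CASIA-MFSD]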
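HTER is the half total error rate: half the sum of the misclassification rates on real and fake faces. In the confusion matrix below, columns are the actual class and rows are the predicted class (1 = real, 0 = fake)

Predicted \ Actual	1	0
1	TP	FP
0	FN	TN

$\frac{\frac{FP}{FP+TN} + \frac{FN}{TP+FN}}{2}$

Reference: "Interpreting HTER (half total error rate), an evaluation criterion for liveness detection"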

7 Conclusion（own）

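CASIA: Institute of Automation, Chinese Academy of Sciences
Not much to add; papers by Chinese authors do read quite smoothly
Multi-modal input, fused with SE attention plus concatenation (halfway fusion); overall it looks pretty good
Brushed up on ROC again, since I keep forgetting it over time! Note that each threshold gives a different confusion matrix, which in turn corresponds to one point on the ROC curve (see "ROC and AUC: how they are computed and why")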
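
As a memory aid for the ROC note, a small sketch that sweeps the threshold and collects one (FPR, TPR) point per threshold. The scores and labels are made up, the convention (1 = live, higher score = more likely live) is my assumption, and `tpr_at_fpr` is just one illustrative way of reading a single operating point off the curve.

```python
def roc_points(labels, scores):
    """One (FPR, TPR) point per distinct threshold, from strict to lenient.

    Assumed convention: 1 = live, 0 = attack, higher score = more likely live.
    A sample is accepted as live when its score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing accepted
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points


def tpr_at_fpr(points, max_fpr):
    """Best TPR among operating points whose FPR does not exceed max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
points = roc_points(labels, scores)
print(points)
print(tpr_at_fpr(points, 0.0))   # 0.5
print(tpr_at_fpr(points, 0.25))  # 0.75
```

Each loop iteration is one threshold, hence one confusion matrix, hence one point on the curve. Reading TPR at a small fixed FPR is the practical view from Section 1: how many real faces get through while almost no fake ones do.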