CVPR-2019 workshop
code: https://github.com/AlexanderParkin/ChaLearn_liveness_challenge
1 Background and Motivation
In deployed face recognition systems, the importance of face anti-spoofing algorithms goes without saying!
Large face anti-spoofing datasets are harder to build than face recognition datasets (attacks come in many varieties), but anti-spoofing algorithms can benefit from different image modalities, such as infrared and depth images.
Infrared (IR) cameras are insensitive to electronic displays and can prevent attacks from phones and tablets, while the depth channel makes it easier to distinguish flat printed surfaces from face shapes.
In this paper, the authors aim to solve the face anti-spoofing problem.
2 Related Work
In biometric security systems, face liveness detection falls into two types:
- cooperative liveness detection: requires interaction with the user in the form of certain actions
- non-cooperative liveness detection: aims at detecting liveness from just a single image of a person
3 Advantages / Contributions
- 1st place in the Chalearn LAP multi-modal (RGB-IR-Depth) face anti-spoofing attack detection challenge
- Pretraining on face recognition and gender classification datasets, then ensembling the resulting models
- Building on the baseline method proposed with the competition dataset, [CASIA-SURF] "A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing", the authors add a multi-level feature aggregation module, so that the model can find correlations between modalities at a coarse level as well as a fine level
4 Datasets
1) CASIA-SURF
From the paper: [CASIA-SURF] "A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing"
The largest multi-modal anti-spoofing dataset: 6 attack types per subject (3 in train, 3 in test), and every sample comes in three modalities, RGB / IR / Depth.
The Chalearn LAP challenge uses a subset of the CASIA-SURF images: 30K frames for training and 9.6K frames for validation.
Note that the attack types in the training and test sets differ.
2) Evaluation metrics
The evaluation is based on the ROC curve, specifically:
True Positive Rate (TPR) at some fixed False Positive Rate (FPR)
[CASIA-SURF] "A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing" describes this in more detail!
This approach measures how many real samples pass the anti-spoofing test while accepting no more than a given percentage of spoofing attacks.
The metric used in this paper is TPR at FPR = $10^{-4}$.
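To make the metric concrete, here is a minimal pure-Python sketch of TPR at a fixed FPR. The function name `tpr_at_fpr`, the "accept when score > threshold" rule, and the convention that a higher score means "more likely real" are assumptions for illustration; they are not taken from the paper or from the challenge code.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Return the TPR at the most permissive threshold whose FPR <= max_fpr.

    scores: liveness scores, higher means "more likely a real face" (assumed).
    labels: 1 for a real face, 0 for a spoofing attack.
    """
    real = [s for s, y in zip(scores, labels) if y == 1]
    spoof = [s for s, y in zip(scores, labels) if y == 0]
    if not real or not spoof:
        raise ValueError("need at least one real and one spoof sample")

    # Number of spoof samples that may be wrongly accepted.
    allowed = int(max_fpr * len(spoof))

    # A sample is accepted when score > threshold. Using the (allowed + 1)-th
    # highest spoof score as threshold accepts at most `allowed` spoofs.
    spoof_desc = sorted(spoof, reverse=True)
    if allowed < len(spoof_desc):
        threshold = spoof_desc[allowed]
    else:
        threshold = float("-inf")  # every spoof may be accepted

    accepted_real = sum(1 for s in real if s > threshold)
    return accepted_real / len(real)
```

With fewer than 10,000 spoof samples, `allowed` is 0 at FPR = $10^{-4}$, so not a single spoof may be accepted. That is what makes this operating point so strict.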
Despite being significantly larger than previous anti-spoofing datasets, CASIA-SURF is still orders of magnitude smaller compared to standard datasets for face recognition
The authors therefore pretrain on 4 face attribute / identity recognition datasets (which provides rich face-specific features), then train on CASIA-SURF, and finally ensemble the models.
5 Method
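5.1 Attack-specific folds
This is a way of splitting the training set, motivated by generalization.
Of the three attack types in the training set (e.g. A, B, C), two (e.g. A, B) are used for training and the remaining one (e.g. C) for validation.
The same network architecture thus yields three models (AB, AC, BC), whose prediction scores are simply averaged.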
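The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `attack_specific_folds` and the `(sample_id, attack)` layout are hypothetical, and the round-robin handling of real faces is an assumption, since these notes do not say how real samples are divided between training and validation.

```python
def attack_specific_folds(samples, attack_types):
    """Build one leave-one-attack-out fold per attack type.

    samples: list of (sample_id, attack) pairs; attack is None for real faces.
    Returns a list of (held_out_attack, train, val) tuples.
    """
    n_folds = len(attack_types)
    real = [s for s in samples if s[1] is None]
    folds = []
    for k, held_out in enumerate(attack_types):
        train, val = [], []
        # Spoof samples: the held-out attack validates, the others train.
        for sample in samples:
            if sample[1] is None:
                continue
            (val if sample[1] == held_out else train).append(sample)
        # Real faces: spread round-robin over the folds (assumption).
        for i, sample in enumerate(real):
            (val if i % n_folds == k else train).append(sample)
        folds.append((held_out, train, val))
    return folds
```

Each fold trains on two attack types and validates on the third, so the validation score reflects performance on an attack the model has never seen.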
5.2 Transfer learning
The network is first pretrained on face recognition and gender classification datasets, then trained on the CASIA-SURF dataset.
5.3 Model architecture
In the architecture figure, only the part outlined in red is introduced by the authors; the rest follows the baseline method ([CASIA-SURF] "A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing").
The aggregation blocks implement multi-level feature aggregation,
making the model capable of finding inter-modal correlations not only at a fine level but also at a coarse one.
2 × 3 × 4 = 24 neural networks
Models are trained with 2 initial random seeds, 3 training splits (the attack-specific folds), and 4 pretrained models; the final liveness score is the average over all 24 networks.
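The 2 × 3 × 4 bookkeeping can be written out as a short Python sketch. The seed values, the `pretrain_*` names, and the `predict(config, image)` callback are placeholders invented for illustration; the notes only give the counts and the fact that the final score is a plain average.

```python
from itertools import product
from statistics import mean

SEEDS = (0, 1)                      # 2 random seeds (values are placeholders)
FOLDS = ("AB", "AC", "BC")          # 3 attack-specific training folds
PRETRAINED = ("pretrain_1", "pretrain_2",
              "pretrain_3", "pretrain_4")  # 4 pretrained models (placeholder names)


def ensemble_configs():
    """Enumerate every (seed, fold, pretrained) combination."""
    return list(product(SEEDS, FOLDS, PRETRAINED))


def ensemble_score(predict, image, configs):
    """Average the liveness scores of all networks for one input.

    predict: hypothetical callable (config, image) -> float liveness score.
    """
    return mean(predict(config, image) for config in configs)
```

A usage example would pass a function that loads the network trained under `config` and runs it on `image`; here the averaging itself is the only part taken from the notes.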
6 Experiments
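1) Baseline
First row of Table 3. Ha, rather weak, right? At TPR at FPR = $10^{-2}$ the baseline is decent, reaching 96.7% (as reported in [CASIA-SURF] "A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing"). The baseline method uses ResNet-18.
The numbers here are the authors' own reproduction, trained with a 5-fold cross-validation strategy!
The 5 folds are split by subject (person).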
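A subject-wise split means that all images of one person land in the same fold, so no identity appears in both training and validation. A minimal sketch, assuming a hypothetical `(sample_id, subject_id)` layout and a simple round-robin assignment of subjects to folds (the notes do not specify how subjects are assigned):

```python
def subject_folds(samples, n_folds=5):
    """Split samples into n_folds so that each subject stays in one fold.

    samples: list of (sample_id, subject_id) pairs.
    Returns a list of (train, val) pairs, one per fold.
    """
    subjects = sorted({subject for _, subject in samples})
    # Round-robin assignment of subjects to folds (assumption).
    fold_of = {subject: i % n_folds for i, subject in enumerate(subjects)}

    folds = []
    for k in range(n_folds):
        train = [s for s in samples if fold_of[s[1]] != k]
        val = [s for s in samples if fold_of[s[1]] == k]
        folds.append((train, val))
    return folds
```

Compared with the attack-specific folds, this split keeps identities apart but lets every attack type appear in both training and validation.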
2) Attack-specific folds
Rows 3 to 4 of Table 3: performance improves from 74.55 to 78.89.
The difference is that the authors split by attack type rather than by subject!
We explain this by the improved generalization to new attacks due to the training for different types of attacks.
3) Initialization matters
Rows 4 to 6 of Table 3: pretraining on face recognition datasets turns out to help this much. Impressive, lesson learned.
4) Multi-level feature aggregation
multi-level feature aggregation (MLFA)
Rows 6 to 7 of Table 3: adding MLFA improves the results further.
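5) Ensembling
Row 11 of Table 3: this is the result that tops the challenge. It is an ensemble of 24 models, though, so inference speed still has room for improvement.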
6) Solution stability
This is about robustness.
The results show that the authors' method holds up quite well.
7) Multi-modality
For a fair comparison, the following input configurations are compared:
RGB + RGB + RGB
vs
IR + IR + IR
vs
Depth + Depth + Depth
vs
RGB + IR + Depth
Depth still comes out strong.
7 Conclusion (own)
- Phrase worth noting: "readily by-passing human-level performance", i.e. easily surpassing human-level performance
- Infrared (IR) cameras are insensitive to electronic displays and can prevent attacks from phones and tablets, while the depth channel makes it easier to distinguish flat printed surfaces from face shapes.
- IR and Depth bring extra information, e.g. light distribution, eye reflection, face surface
- Datasets: Replay-Attack, CASIA-FASD and SiW datasets contain still RGB images. MSU-MFSD, Replay-Mobile and OULU-NPU provide video recordings of attacks from mobile devices
- Pretraining on face recognition and gender classification, combined with ensembling: lesson learned
- The way the data is split (training and test attacks differ) helps the network generalize better to unseen attacks