Towards Universal Representation Learning for Deep Face Recognition
Paper link: https://arxiv.org/pdf/2002.11841.pdf
This paper is from NEC Laboratories America. I have only skimmed it for now. The core ideas are:
1. Traditional face recognition relies on high-quality source data to "infer" recognition of low-quality images.
2. Ensemble methods improve recognition accuracy by combining multiple models trained on different distributions.
3. This paper proposes working directly on the original training data, avoiding the accuracy loss caused by the mismatch between the training domain and the test domain.
Abstract
The paper states the problems with current face recognition: "Recognizing wild faces is extremely hard as they appear with all kinds of variations. Traditional methods either train with specifically annotated variation data from target domains, or by introducing unlabeled target variation data to adapt from the training data." In other words, they make the target domain as close as possible to the training domain.
To remove these limitations, the paper proposes "a universal representation learning framework that can deal with larger variation unseen in the given training data without leveraging target domain knowledge".
The framework covers some semantically meaningful variations, such as low-resolution, occlusion and head pose.
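Two of these variations (low-resolution and occlusion) can be sketched as simple image augmentations. This is a minimal NumPy sketch; the function names and parameters here are my own illustration, not the paper's implementation:

```python
import numpy as np

def augment_low_res(img: np.ndarray, factor: int = 4) -> np.ndarray:
    """Simulate a low-resolution capture: downsample then upsample by pixel repetition."""
    h, w = img.shape[:2]
    small = img[::factor, ::factor]                              # downsample
    up = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return up[:h, :w]                                            # back to original size

def augment_occlusion(img: np.ndarray, rng: np.random.Generator,
                      max_frac: float = 0.4) -> np.ndarray:
    """Zero out a random rectangle to mimic occlusion (e.g. sunglasses, a mask)."""
    h, w = img.shape[:2]
    oh = rng.integers(1, int(h * max_frac) + 1)
    ow = rng.integers(1, int(w * max_frac) + 1)
    top = rng.integers(0, h - oh + 1)
    left = rng.integers(0, w - ow + 1)
    out = img.copy()
    out[top:top + oh, left:left + ow] = 0                        # occluded region
    return out

rng = np.random.default_rng(0)
face = rng.random((112, 112))          # stand-in for an aligned face crop
low = augment_low_res(face)
occ = augment_occlusion(face, rng)
```

Head-pose augmentation (the third variation the paper names) needs 3D face synthesis and is not sketched here.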
However, directly using the augmented data for training does not give good results, mainly because the augmented samples are hard examples. To solve this, the paper further proposes to:
- split the feature embedding into multiple sub-embeddings, and associate different confidence values with each sub-embedding to smooth the training procedure;
- further decorrelate the sub-embeddings by regularizing variation classification loss and variation adversarial loss on different partitions of them.
Introduction
The common practice in current face recognition algorithms is to map input images to a feature space with small intra-identity distance and large inter-identity distance.
However, large public datasets such as MS-Celeb-1M manifest strong biases, e.g. ethnicity imbalance. These biases cause a large drop in accuracy when the model is applied to different datasets (target domains).
To mitigate these problems, researchers have proposed:
- identifying relevant factors of variation and augmenting datasets to incorporate them through domain adaptation methods; but such variations are hard to identify, so these methods are usually limited to aligning features between the training and test domains;
- training individual models on various datasets and ensembling them.
All of the above approaches either only handle specific variations, require access to test data distributions, or accrue additional runtime complexity to handle wider variations. The paper therefore proposes learning a single "universal" deep feature representation that handles the variations in face recognition.
Proposed Approach
- Confidence-aware Identification Loss
Rearranging the formula via Bayes' rule gives:
After applying the L2-norm:
If only Eq. 5 is used, the learned prototype will lie at the center of all samples.
To give different samples different confidences, and to produce a stronger push for low-quality f_i to be closer to the prototype, the final loss is:
The paper then discusses the advantages of this loss over the plain cosine loss.
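The idea of scaling each sample's logits by its own confidence s_i can be sketched as follows. This is a NumPy sketch; the margin placement follows the usual cosine-margin convention and may differ from the paper's exact equation:

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def confidence_aware_loss(f, W, s, y, m=0.35):
    """Confidence-aware softmax: each sample's cosine logits are scaled by
    its own confidence s_i, with a margin m on the target class.
    f: (N, D) embeddings, W: (C, D) class prototypes,
    s: (N,) confidences, y: (N,) integer labels."""
    f = l2_normalize(f)                           # unit-norm embeddings
    W = l2_normalize(W)                           # unit-norm prototypes
    cos = f @ W.T                                 # (N, C) cosine similarities
    cos[np.arange(len(y)), y] -= m                # margin on the target class
    logits = s[:, None] * cos                     # per-sample confidence scaling
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

rng = np.random.default_rng(0)
f = rng.normal(size=(8, 64))
W = rng.normal(size=(10, 64))
s = rng.uniform(1.0, 16.0, size=8)
y = rng.integers(0, 10, size=8)
loss = confidence_aware_loss(f, W, s, y)
```

A low-confidence (small s_i) sample produces softer gradients, which is what smooths training on hard augmented examples.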
- Confidence-aware Sub-Embeddings
The paper argues: "Though the embedding f_i learned through a sample-specific gating s_i can deal with sample-level variations, we argue that the correlation among the entries of f_i itself is still high." That is, the entries of the feature are still correlated with each other. To reduce this correlation, it proposes:
Accordingly, the prototype vector w_j and the confidence scalar s_i are also partitioned into the same K groups.
Meanwhile, L2 regularization is applied to reduce overfitting.
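The partitioning into K sub-embeddings, each with its own confidence and prototype slice, might look like this (a hedged NumPy sketch; the shapes and names are my assumptions):

```python
import numpy as np

def split_groups(x, K):
    """Split the last dimension into K equal sub-embeddings: (N, D) -> (N, K, D//K)."""
    N, D = x.shape
    assert D % K == 0, "embedding size must be divisible by K"
    return x.reshape(N, K, D // K)

def grouped_logits(f, W, s, K):
    """Sum of per-group confidence-scaled cosine terms.
    f: (N, D) embeddings, W: (C, D) prototypes, s: (N, K) per-group confidences."""
    fg = split_groups(f, K)                                # (N, K, D/K)
    Wg = split_groups(W, K)                                # (C, K, D/K)
    fg = fg / (np.linalg.norm(fg, axis=-1, keepdims=True) + 1e-12)
    Wg = Wg / (np.linalg.norm(Wg, axis=-1, keepdims=True) + 1e-12)
    cos = np.einsum('nkd,ckd->nck', fg, Wg)                # per-group cosines
    return (s[:, None, :] * cos).sum(axis=-1)              # (N, C) total logits

rng = np.random.default_rng(1)
f = rng.normal(size=(4, 64))
W = rng.normal(size=(5, 64))
s = rng.uniform(1.0, 8.0, size=(4, 4))                     # K = 4 groups
logits = grouped_logits(f, W, s, K=4)
```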
- Sub-Embeddings Decorrelation
Using the above sub-embeddings alone "does not guarantee the features in different groups are learning complementary information", as shown in the figure below.
The solution is to regularize a variation classification loss and a variation adversarial loss on different partitions of the sub-embeddings.
- **Mining More Variations**
To introduce more variations for better generalization ability, the paper aims to explore more variations with semantic meaning.
- Uncertainty-Guided Probabilistic Aggregation
Considering the metric for inference, simply taking the average of the learned sub-embeddings is sub-optimal, so the paper aggregates the sub-embeddings guided by their uncertainties.
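A sketch of confidence-weighted pairwise scoring (not the paper's exact probabilistic formulation): per-group cosine similarities are averaged with weights given by the product of the two samples' group confidences, so unreliable groups contribute less.

```python
import numpy as np

def pairwise_score(fa, sa, fb, sb, K):
    """Compare two faces by aggregating per-group cosine similarities,
    weighted by the joint confidence of each group.
    fa, fb: (D,) embeddings; sa, sb: (K,) per-group confidences."""
    ga = fa.reshape(K, -1)
    gb = fb.reshape(K, -1)
    ga = ga / (np.linalg.norm(ga, axis=1, keepdims=True) + 1e-12)
    gb = gb / (np.linalg.norm(gb, axis=1, keepdims=True) + 1e-12)
    cos = (ga * gb).sum(axis=1)            # (K,) per-group similarity
    w = sa * sb                            # joint confidence per group
    return (w * cos).sum() / (w.sum() + 1e-12)

rng = np.random.default_rng(3)
a = rng.normal(size=64)
b = a + 0.05 * rng.normal(size=64)         # near-duplicate of a (same identity)
sa = rng.uniform(0.5, 1.0, size=4)
sb = rng.uniform(0.5, 1.0, size=4)
score_same = pairwise_score(a, sa, b, sb, K=4)
score_diff = pairwise_score(a, sa, rng.normal(size=64), sb, K=4)
```

The genuine pair scores close to 1 while an unrelated pair scores near 0, which is the behavior a verification metric needs.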
Experiments
Reference
- https://arxiv.org/pdf/2002.11841.pdf