Class-Variant Margin Normalized Softmax Loss for Deep Face Recognition


A translation of a 2020 TNN paper, for study purposes only. Since the translation was produced with Youdao Translate, the first person is used.

Original paper link

Title: Class-Variant Margin Normalized Softmax Loss for Deep Face Recognition

Class-variant margin normalized softmax loss for deep face recognition (CVM normalized softmax loss).

Abstract

In deep face recognition, the commonly used softmax loss and its newly proposed variations are not yet sufficiently effective to handle the class imbalance and softmax saturation issues during the training process while extracting discriminative features. In this brief, to address both issues, we propose a class-variant margin (CVM) normalized softmax loss, by introducing a true-class margin and a false-class margin into the cosine space of the angle between the feature vector and the class-weight vector. The true-class margin alleviates the class imbalance problem, and the false-class margin postpones the early individual saturation of softmax. With negligible computational complexity increment during training, the new loss function is easy to implement in the common deep learning frameworks. Comprehensive experiments on the LFW, YTF, and MegaFace protocols demonstrate the effectiveness of the proposed CVM loss function.

Introduction:

[Fig. 1: The deep face recognition pipeline: preprocessing (detection, cropping, alignment), CNN feature extraction, and recognition.]

Convolutional neural networks (CNNs) have been proved to be effective in numerous computer vision tasks, such as image classification [1], [2], object detection [3], [4], semantic segmentation [5], [6], and, particularly, face recognition [7], [8]. As shown in Fig. 1, face images need to be preprocessed first, i.e., by face detection, cropping, and alignment, and then, the processed images will be inputted into a CNN to extract features for recognition.

For both testing settings, i.e., face verification [9] and face identification [10], the key to recognition is the discriminative feature. To construct such a powerful feature extractor, it is critical to design an appropriate loss function, which supervises the learning of the network parameters. Hence, many studies have focused on designing loss functions.

The existing loss functions can be largely divided into two categories: metric-based and margin-based. Metric-based loss functions are often based on metric learning to simultaneously make intraclass features compact and interclass features remote from each other. Popular metric-based losses include the contrastive loss [11], the triplet loss [12], and the center loss [13]. However, these losses either need a complicated sampling strategy or demand time-consuming training [11], [12]. Margin-based loss functions generally add a margin to the classical softmax loss to make the separation stricter. Typical margin-based losses are L-softmax [8], A-softmax [14], CosFace [15], and ArcFace [16], which add margins in the angular or the cosine space.
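As a concrete sketch of how these margin-based losses reshape the logits, the following minimal PyTorch snippet (an illustration based on the cited papers, not code from this brief) assumes L2-normalized features and class weights so that each logit equals $\cos\theta$, with margin `m` and scale `s` as hyperparameters:

```python
import torch

def margin_logits(cos_theta, target, m=0.35, s=30.0, kind="cosface"):
    """Sketch of margin-based logit modifications on normalized logits.

    cos_theta: (N, C) cosines between normalized features and class weights.
    target:    (N,) ground-truth class indices.
    """
    logits = cos_theta.clone()
    idx = torch.arange(cos_theta.size(0))
    if kind == "cosface":      # CosFace: additive margin in the cosine space
        logits[idx, target] = cos_theta[idx, target] - m
    elif kind == "arcface":    # ArcFace: additive margin in the angular space
        theta = torch.acos(cos_theta[idx, target].clamp(-1 + 1e-7, 1 - 1e-7))
        logits[idx, target] = torch.cos(theta + m)
    return s * logits          # scaled logits, then fed into cross-entropy
```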

However, there are two hard issues negatively affecting the training of CNNs for face recognition: class imbalance and softmax saturation. Class imbalance is severe in face recognition, as the number of face images per person varies greatly in most training data sets. This issue may cause the trained network to favor the categories that have more images in the training data and bias the deep feature learning. Hence, it is necessary to treat different categories differently, and some successful efforts have recently been made by the range loss [17] and the focal loss [18] to alleviate the class imbalance problem. The focal loss focuses on the sparse set of hard examples by down-weighting the cross-entropy loss of well-classified examples, while the range loss aims to reduce the intraclass difference and enlarge the interclass difference simultaneously within a mini-batch. Early softmax saturation refers to the short-lived gradient propagation that the softmax produces, which impedes the exploration of stochastic gradient descent [19]. To mitigate this early saturation problem, Noisy-Softmax was proposed in [19], which injects noise into the softmax loss. Besides improvements on loss functions, there have been other works to enhance face recognition, such as the separability and compactness network (SCNet) [20], semisupervised sparse representation-based classification (S3RC) [21], and specific face data sets [22].
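For reference, the focal loss cited here takes the standard form from [18], where $p_t$ is the predicted probability of the true class and $\gamma \ge 0$ controls the down-weighting (at $\gamma = 0$ it reduces to the cross-entropy):

$$\mathrm{FL}(p_t) = -(1-p_t)^{\gamma}\log p_t$$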

In this brief, we aim to propose a simple yet effective loss function called class-variant margin (CVM) softmax loss to address the class imbalance and softmax saturation problems for deep facial feature learning. More specifically, we introduce two margin functions into the cosine space of the softmax loss to address the two problems. For the class imbalance problem, we first introduce a reduced margin function to the cosine of the angle between the feature vector and true class weight vector, which we call a true-class margin, such that the misclassified examples can obtain a larger true-class margin to contribute more to the network optimization. For the softmax saturation problem, we introduce an additive margin function to the cosine of the angle between the feature vector and the false class weight vector, which we call a false-class margin, such that the examples near saturation can obtain a larger false-class margin to postpone the softmax saturation.

Contributions:

1. We propose a novel loss termed CVM loss that can simultaneously alleviate the class imbalance and softmax saturation problems in the training of CNNs.

2. The proposed CVM loss can be easily implemented under common CNN architectures and directly optimized by the standard SGD method.

3. We train our model on the publicly available CASIA-WebFace data set and verify its effectiveness on three popular benchmarks: LFW, YTF, and MegaFace.

CVM softmax loss:

[Fig. 2: Number of images per person in the CASIA-WebFace data set, showing a clear long-tail distribution.]

Two problems in face recognition:

1. Class Imbalance: Class imbalance is severe in most training data sets for face recognition. For example, for the popular CASIA-WebFace data set, the curve of the number of images per person is plotted in Fig. 2, where a clear long-tail distribution can be observed. In fact, empirical experiments and analysis have shown that classes with more samples have a greater impact on feature learning [17]. Hence, effectively handling imbalanced data is a critical issue for improving feature discrimination in face recognition.
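As a quick way to reproduce this long-tail observation on a local copy of the data, one can simply count images per identity (a minimal sketch; the path and the one-folder-per-identity layout are assumptions):

```python
import os

# Assumed layout: root/<identity>/<image files>, one subfolder per person.
root = "CASIA-WebFace"  # hypothetical local path
counts = sorted((len(os.listdir(os.path.join(root, d)))
                 for d in os.listdir(root)), reverse=True)
print("max:", counts[0], "median:", counts[len(counts) // 2], "min:", counts[-1])
```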

2. Softmax Saturation: The softmax loss is commonly applied in classification tasks. However, as rightly pointed out in [19], the softmax function suffers from early individual saturation. As the curves in Fig. 3 illustrate, early-saturated individuals (whose output scores are already close to 1) contribute little to the gradient updates during subsequent backpropagation. Hence, to make full use of the information carried by these individuals, it is better to postpone their early saturation.

[Fig. 3: Softmax output curves illustrating early individual saturation.]
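To see the saturation numerically: for softmax cross-entropy, the gradient with respect to the true-class logit is $p_y - 1$, so an already-confident example propagates almost no gradient. A minimal illustration (not from the paper):

```python
import torch
import torch.nn.functional as F

# Two-class logits: one near-saturated example and one hard example.
logits = torch.tensor([[8.0, 0.0],    # p_true ~ 0.9997: near saturation
                       [0.5, 0.0]],   # p_true ~ 0.62: still informative
                      requires_grad=True)
labels = torch.tensor([0, 0])
F.cross_entropy(logits, labels, reduction="sum").backward()
print(logits.grad)  # row 0 is ~0 everywhere: almost no gradient to learn from
```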

Class-Variant Margin Softmax Loss:

In this brief, we propose a new, simple yet effective loss function, the CVM softmax loss, to address the class imbalance and softmax saturation problems. The original normalized softmax loss (NLS) is

$$L_{\mathrm{NLS}}=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos\theta_{y_i}}}{\sum_{j=1}^{C}e^{s\cos\theta_{j}}}$$
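In code, this NLS amounts to cross-entropy over scaled cosines (a minimal sketch under the paper's normalization convention; not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def nls(features, weights, labels, s=30.0):
    """Normalized softmax loss: cross-entropy over s * cos(theta).

    features: (N, D) deep features; weights: (C, D) class-weight vectors.
    """
    cos_theta = F.normalize(features) @ F.normalize(weights).t()  # (N, C)
    return F.cross_entropy(s * cos_theta, labels)
```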

The two rationales behind the CVM loss function are as follows.

1. Since training is dominated by the majority classes while the numerous minority classes (the people on the long tail in Fig. 2) are unfortunately overwhelmed, the new loss function is expected to strengthen the influence of the tail data during training. Boundary features whose angle $\theta_y$ is distributed around 90° correspond to hard samples, and these features are the key to ensuring the intraclass and interclass variations. Therefore, to enhance the influence of these points in network training, when $\theta_y$ is around 90°, we apply a larger margin to the cosine of the angle between the feature vector and the true class weight vector. When the angle to the true class is larger than 90°, a smaller margin is used, because such training samples are likely outliers. Accordingly, we construct the true-class margin function $h(\theta)$ (see Fig. 4), which is applied to the cosine of the angle between the feature vector and the true class weight vector. (My understanding: during training, samples whose angle is around 90° have high label uncertainty, so a larger margin should be used to separate them; samples with angles above 90° may be extreme values, so the margin can be reduced.)

[Fig. 4: The true-class margin function h(θ) and the false-class margin function g(θ) as nonlinear mappings of the angle θ.]

2. Since the early individual saturation of the softmax loss leads to short-lived gradient propagation, which is harmful to the generalization and robust learning of the network, the new loss function should be able to postpone the early saturation, e.g., by enlarging the softmax inputs $f_j$ for the false classes $j \neq y_i$. To address the softmax saturation problem, we construct the false-class margin function $g(\theta)$ to postpone the early individual saturation. When the confidence that the feature vector belongs to the $j$-th class is low, i.e., when the angles for $j \neq y_i$ are distributed close to 180°, the feature vector is easy to classify correctly and is close to the softmax saturation zone. Therefore, we add a larger margin to the cosine of such an angle (see Fig. 4) to maintain effective gradient propagation.

Combining the true-class margin function and the false-class margin function, the proposed CVM loss is

$$L_{\mathrm{CVM}}=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s(\cos\theta_{y_i}-h(\theta_{y_i}))}}{e^{s(\cos\theta_{y_i}-h(\theta_{y_i}))}+\sum_{j\neq y_i}e^{s(\cos\theta_{j}+g(\theta_{j}))}}$$

where the subscripts $j$ and $y_i$ denote the $j$-th and the $y_i$-th of the $C$ classes; $N$ is the mini-batch size; $s$ is the scale factor; $f_j$ (short for $f_j(x_i)$) is the $j$-th element of the softmax input for $x_i$; $\theta_j$ (short for $\theta_j(x_i)$) is the angle between the feature vector $x_i$ of the $i$-th sample and the weight vector of the $j$-th class. $h(\theta_{y_i})$ is the margin function applied to the cosine of the angle between the feature vector and the true class weight vector, called the true-class margin; $g(\theta_j)$ is the margin function added to the cosine of the angle between the feature vector and a false class weight vector, called the false-class margin; $m_1$ and $m_2$ are two preset hyperparameters: $m_1$ is the upper bound of the true-class margin, and $m_2$ is the upper bound of the false-class margin.

In Fig. 4, the true-class margin $h(\theta_{y_i})$ and the false-class margin $g(\theta_j)$ are nonlinear mappings of the angles $\theta_{y_i}$ and $\theta_j$, respectively. We design these two margin functions to alleviate the class imbalance problem and to postpone the early individual saturation, as described in the two rationales above.
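Putting the pieces together, a CVM loss sketch might look as follows. The exact forms of $h(\theta)$ and $g(\theta)$ are given by the paper's Fig. 4, which is not reproduced here; the `h_margin` and `g_margin` below are placeholder shapes chosen only to match the described behavior ($h$ peaks near 90° and shrinks beyond it, bounded by $m_1$; $g$ grows toward 180°, bounded by $m_2$) and are assumptions, not the authors' definitions:

```python
import torch
import torch.nn.functional as F

def h_margin(theta, m1=0.4):
    # Placeholder true-class margin: largest near 90 deg, smaller beyond it
    # (hard samples get a bigger margin; likely outliers get a smaller one).
    return m1 * torch.sin(theta)

def g_margin(theta, m2=0.2):
    # Placeholder false-class margin: grows as theta approaches 180 deg,
    # where examples are close to softmax saturation.
    return m2 * (1.0 - torch.cos(theta)) / 2.0

def cvm_loss(features, weights, labels, s=30.0):
    """Sketch of the CVM normalized softmax loss with illustrative h and g."""
    cos_theta = F.normalize(features) @ F.normalize(weights).t()     # (N, C)
    theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
    true_mask = F.one_hot(labels, cos_theta.size(1)).bool()
    # Subtract the true-class margin; add the false-class margin.
    logits = torch.where(true_mask,
                         cos_theta - h_margin(theta),
                         cos_theta + g_margin(theta))
    return F.cross_entropy(s * logits, labels)
```

Since the margins only reshape the logits before a standard cross-entropy, the loss is directly optimizable by SGD with negligible extra cost, consistent with the contribution list above.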

Discussion:

Margins: Here, we take two-class classification as an example and list the decision boundaries and margins of several popular loss functions. As shown in Table I, several loss functions are commonly used in face recognition. The softmax loss helps the convolutional neural network converge quickly, but it cannot ensure that the extracted features are very discriminative. To improve the accuracy of face verification and face identification, CosFace and ArcFace apply a constant margin in the cosine and angular spaces, respectively, to make the features more discriminative. Although the interclass distance can be enlarged with a constant margin, these two methods apply the same margin to all classes and do not take the class discrepancy into consideration. The proposed CVM NLS instead applies a class-variant margin to the NLS by constructing the true-class margin function and the false-class margin function.
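For the two-class case discussed here, the decision boundaries summarized in Table I take the following standard forms (the first three follow the cited papers; the CVM row follows directly from the loss above, written for a class-1 sample):

$$
\begin{aligned}
\text{NLS:}\quad & \cos\theta_1 = \cos\theta_2\\
\text{CosFace:}\quad & \cos\theta_1 - m = \cos\theta_2\\
\text{ArcFace:}\quad & \cos(\theta_1 + m) = \cos\theta_2\\
\text{CVM:}\quad & \cos\theta_1 - h(\theta_1) = \cos\theta_2 + g(\theta_2)
\end{aligned}
$$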

We also illustrate some of them in the cosine space in Fig. 5, in which the blue areas represent class 1, while the red areas belong to class 2. We can observe that the decision margin of the NLS [see Fig. 5(a)] is zero, making the loss function not very robust for the features around the decision boundary.

[Fig. 5: Decision margins in the cosine space: (a) NLS, (b) with the true-class margin, (c) with the false-class margin, and (d) the CVM loss.]

To illustrate the effectiveness of the two terms (the true-class margin and the false-class margin), we first apply the true-class margin to the cosine of the angle between the feature vector and the true class weight vector. The samples of the minority classes are often distributed near the class boundary, which leads to small cosine values. Therefore, as larger margins are introduced by applying the true-class margin when both cosine values are small, we manage to make the features of those rare samples more discriminative, as shown in Fig. 5(b). Then, for the early individual saturation problem, we introduce the false-class margin to the cosine of the angle between the feature vector and the false class weight vector. This approach, as shown in Fig. 5(c), enables us to enlarge the margins for those samples near the saturation status and, thus, to postpone the saturation process. Finally, we combine the true-class margin and the false-class margin to attain the final decision margin of our CVM loss, which addresses both the class imbalance problem and the early saturation problem, as shown in Fig. 5(d).

To understand the CVM loss further from a geometric view, we also draw an illustration of adding the CVMs on the hypersphere, as shown in Fig. 6.

In Fig. 6(left), the original feature vectors x1 and x2 both belong to class 1, and the angle between x1 and W1 is larger than the angle between x2 and W1. That is, it is harder to classify the sample x1.

[Fig. 6: Geometric illustration of adding the CVMs on the hypersphere.]

Therefore, we apply the true-class margins $h(\theta_{x_1,1})$ and $h(\theta_{x_2,1})$ to the cosines of the angles $\theta_{x_1,1}$ and $\theta_{x_2,1}$, with a larger margin for $x_1$, which converts the original feature vectors $x_1$ and $x_2$ into the new feature vectors $x_1'$ and $x_2'$ and strengthens the impact of the sample $x_1$ on the training. Geometrically, through the two different true-class margins, we enhance the influence of $x_1$ and extract more discriminative features.
