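The Evolution of Softmax-based Losses in Face Recognition
Paper: ArcFace: Additive Angular Margin Loss for Deep Face Recognition
1. Problem statement
1. Features need to be discriminative.
2. Earlier methods such as triplet loss, center loss, L-Softmax, A-Softmax, and CosFace all have shortcomings.
2. Shortcomings of the main earlier methods
1. Center Loss
2. Triplet loss
3. High computational complexity: an auxiliary function is needed (to guarantee monotonicity), cos(m*theta) has to be computed, and training is very unstable and hard to converge, so it relies on many tricks, such as interpolating with the softmax loss at the start of training.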
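- The main challenge in extracting features with deep learning is designing a loss function that is sufficiently discriminative.
- Several recent methods address this problem, such as center loss and SphereFace.
- ArcFace is proposed to obtain highly discriminative features for face recognition, and it has a clear geometric interpretation.
- It achieves strong results on many face recognition tasks.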
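1. Problem statement
- A DCNN is used to learn face representations, mapping face images into a feature space where intra-class distances are small and inter-class distances are large.
- There are generally two main lines of methods, represented by the softmax loss and the triplet loss, but both have shortcomings.
- Some methods strengthen the discriminative power of the features, such as center loss, but they also fall short.
- The strengths and weaknesses of SphereFace and CosFace.
- ArcFace is proposed, its algorithmic framework is given, and its advantages are summarised.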
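The two main approaches to training DCNNs for face recognition, and their drawbacks
1. The Softmax Classifier
Drawbacks: (1) the size of the linear transformation matrix $W \in \mathbb{R}^{d \times n}$ grows linearly with the number of identities $n$;
(2) the learned features are separable for the closed-set classification problem but not discriminative enough for the open-set face recognition problem. The softmax loss does not explicitly optimise the feature embedding to enforce higher similarity for intra-class samples and greater diversity for inter-class samples. This leaves a performance gap for deep face recognition under large intra-class appearance variations (e.g. pose variations [28, 44] and age gaps [19, 45]) and in large-scale test scenarios (e.g. million [12, 37, 18] or trillion pairs [1]).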
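To make drawback (1) concrete, here is a minimal sketch that counts the entries of $W$ for a few identity counts. The embedding size of 512 and the identity counts are illustrative assumptions, not figures from the paper, and `classifier_weight_count` is a hypothetical helper name.

```python
def classifier_weight_count(embedding_dim: int, num_identities: int) -> int:
    """Number of entries in W, whose shape is (d, n)."""
    return embedding_dim * num_identities


# Illustrative values only (assumed, not taken from the paper).
for n in (10_000, 100_000, 1_000_000):
    print(n, classifier_weight_count(512, n))
```

Doubling the number of identities doubles the size of the classifier, regardless of how large the backbone is.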
2. The Triplet Loss
Variants of the softmax loss that improve its discriminative power
1. The centre loss
(2) Nevertheless, updating the actual centres during training is extremely difficult as the number of face classes available for training has recently dramatically increased.
The angle $\theta$ is multiplied by the decision margin $m$, the weights are normalised, and the bias term is set to zero ($\|W_i\| = 1$, $b_i = 0$).
CosFace [35, 33] directly adds cosine margin penalty to the target logit, which obtains better performance compared to SphereFace but admits much easier implementation and relieves the need for joint supervision from the softmax loss.
Performance: further improves the discriminative power of the face recognition model and stabilises the training process.
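The margin variants discussed above differ only in how they penalise the target-class logit $\cos\theta$: a multiplicative angular margin $\cos(m\theta)$, an additive cosine margin $\cos\theta - m$, and ArcFace's additive angular margin $\cos(\theta + m)$. The sketch below compares them at one angle. The function names and the margin values (4, 0.35, 0.5) are assumptions chosen for illustration.

```python
import math


def sphereface_logit(theta: float, m: int = 4) -> float:
    """Multiplicative angular margin: cos(m * theta).

    Only meaningful for theta in [0, pi / m]; the full loss needs a
    piecewise auxiliary function outside that range (not shown here).
    """
    return math.cos(m * theta)


def cosface_logit(theta: float, m: float = 0.35) -> float:
    """Additive cosine margin: cos(theta) - m."""
    return math.cos(theta) - m


def arcface_logit(theta: float, m: float = 0.5) -> float:
    """Additive angular margin: cos(theta + m)."""
    return math.cos(theta + m)


theta = math.radians(20)  # toy angle between the feature and its class weight
print(math.cos(theta), sphereface_logit(theta), cosface_logit(theta), arcface_logit(theta))
```

In all three cases the penalised logit is smaller than the plain $\cos\theta$, so the network has to shrink $\theta$ further to reach the same softmax probability.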
The paper describes the treatment of $x_i$ and $W$ in detail as follows:
For simplicity, we fix the bias $b_j = 0$ as in [15]. Then, we transform the logit [24] as $W_j^T x_i = \|W_j\| \|x_i\| \cos\theta_j$, where $\theta_j$ is the angle between the weight $W_j$ and the feature $x_i$. Following [15, 35, 34], we fix the individual weight $\|W_j\| = 1$ by $L_2$ normalisation. Following [26, 35, 34, 33], we also fix the embedding feature $\|x_i\|$ by $L_2$ normalisation and re-scale it to $s$. The normalisation step on features and weights makes the predictions only depend on the angle between the feature and the weight. The learned embedding features are thus distributed on a hypersphere with a radius of $s$.
The processing of $x_i$ and $W$ can be written as:
1. $x_i \rightarrow \frac{x_i}{\|x_i\|} \rightarrow s\frac{x_i}{\|x_i\|}$
2. $\left\| s\frac{x_i}{\|x_i\|} \right\| = \frac{s}{\|x_i\|}\|x_i\| = s$
3. $W_j^T x_i = s\cos\theta_j$ (with $\|W_j\| = 1$ and $\|x_i\| = s$)
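The three steps above can be checked numerically. This is a minimal sketch in plain Python with a toy two-dimensional feature and weight. The helper names and the scale `s = 64.0` are assumptions for illustration.

```python
import math


def l2_normalize(v):
    """Return v scaled to unit L2 norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


s = 64.0          # feature scale (assumed value)
x = [3.0, 4.0]    # toy feature x_i
w = [1.0, 0.0]    # toy class weight W_j

x_scaled = [s * c for c in l2_normalize(x)]        # step 1: normalise, then rescale to s
w_unit = l2_normalize(w)                           # ||W_j|| = 1

feature_norm = math.sqrt(dot(x_scaled, x_scaled))  # step 2: should equal s
logit = dot(w_unit, x_scaled)                      # step 3: should equal s * cos(theta_j)
cos_theta = dot(l2_normalize(x), w_unit)

print(feature_norm, logit, s * cos_theta)
```

After both normalisations the logit depends only on the angle between the feature and the weight, scaled by the constant $s$.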
$$L_2 = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j=1,j\neq y_i}^{n}e^{s\cos\theta_j}}$$
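As a rough illustration of the formula, the sketch below evaluates the term inside the sum for a single sample, given the angles $\theta_j$ directly. The full loss is the average of this quantity over the $N$ samples in a batch. The function name, the toy angles, and the defaults `s=64.0`, `m=0.5` are assumptions.

```python
import math


def arcface_loss_single(thetas, target, s=64.0, m=0.5):
    """Per-sample loss: -log softmax over the margin-adjusted, scaled logits.

    thetas: angles (radians) between the feature and each class weight.
    target: index y_i of the ground-truth class.
    """
    logits = [
        s * math.cos(t + m) if j == target else s * math.cos(t)
        for j, t in enumerate(thetas)
    ]
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


thetas = [0.9, 1.2, 1.4]  # toy angles; class 0 is the target
print(arcface_loss_single(thetas, target=0, m=0.0))  # normalised softmax, no margin
print(arcface_loss_single(thetas, target=0, m=0.5))  # with the additive angular margin
```

With the margin switched on, the same angles give a larger loss, which is what pushes the target angle to shrink during training.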
(3) ArcFace loss: code walkthrough
Why is ArcFace easy to implement?
Easy: ArcFace only needs several lines of code as given in Algorithm 1 and is extremely easy to implement in the computational-graph-based deep learning frameworks, e.g. MxNet [5], Pytorch [23] and Tensorflow [2].
ArcMarginProduct official source code
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcMarginProduct(nn.Module):
    r"""Implementation of the large-margin arc distance, cos(theta + m).

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        s: norm of input feature
        m: margin
    """

    def __init__(self, in_features, out_features, s=30.0, m=0.50, easy_margin=False):
        super(ArcMarginProduct, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.s = s
        self.m = m
        self.weight = nn.Parameter(torch.FloatTensor(out_features, in_features))
        # torch.FloatTensor(...) is uninitialised memory, so initialise it explicitly.
        nn.init.xavier_uniform_(self.weight)

        self.easy_margin = easy_margin
        self.cos_m = math.cos(m)
        self.sin_m = math.sin(m)
        self.th = math.cos(math.pi - m)
        self.mm = math.sin(math.pi - m) * m

    def forward(self, input, label):
        # --------------------------- cos(theta) & phi(theta) ---------------------------
        # L2-normalise the feature x and the weight w, then take their dot product
        # (a fully connected layer); the result is cos(theta).
        cosine = F.linear(F.normalize(input), F.normalize(self.weight))
        # sin(theta)
        sine = torch.sqrt((1.0 - torch.pow(cosine, 2)).clamp(0, 1))
        # cos(theta + m) = cos(theta) * cos(m) - sin(theta) * sin(m)
        phi = cosine * self.cos_m - sine * self.sin_m
        if self.easy_margin:
            phi = torch.where(cosine > 0, phi, cosine)
        else:
            phi = torch.where(cosine > self.th, phi, cosine - self.mm)
        # --------------------------- convert label to one-hot ---------------------------
        # Allocate on the same device as the logits rather than hard-coding 'cuda'.
        one_hot = torch.zeros(cosine.size(), device=cosine.device)
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        # Apply the margin to the target class only; other classes keep cos(theta).
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
        output *= self.s
        return output
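The constants `th` and `mm` in the code above are easier to follow with a scalar example. $\cos(\theta + m)$ stops decreasing once $\theta + m$ passes $\pi$, so when $\cos\theta \le \cos(\pi - m)$ the code falls back to $\cos\theta - m\sin(\pi - m)$. The sketch below is a plain-Python restatement of that branch (the function name `phi` and the sampling grid are assumptions) and checks that the result keeps decreasing over the whole range $[0, \pi]$, whereas the raw $\cos(\theta + m)$ does not.

```python
import math


def phi(theta: float, m: float = 0.5) -> float:
    """Target logit before scaling, mirroring the non-easy-margin branch."""
    th = math.cos(math.pi - m)       # threshold on cos(theta)
    mm = math.sin(math.pi - m) * m   # constant penalty used past the threshold
    cosine = math.cos(theta)
    if cosine > th:                  # theta + m < pi: use cos(theta + m)
        return math.cos(theta + m)
    return cosine - mm               # theta + m >= pi: fall back to cos(theta) - mm


angles = [i * math.pi / 200 for i in range(201)]  # grid over 0 .. pi
with_fallback = [phi(t) for t in angles]
raw = [math.cos(t + 0.5) for t in angles]

# Is each sequence strictly decreasing in theta?
print(all(a > b for a, b in zip(with_fallback, with_fallback[1:])))
print(all(a > b for a, b in zip(raw, raw[1:])))
```

Keeping the target logit monotonically decreasing in $\theta$ means a larger angle is always penalised more, which is the property the fallback is there to preserve.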
I worked through a detailed derivation of the ArcMarginProduct official source code in the figure below to make it easier to follow.
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class Arcface_Head(nn.Module):
    def __init__(self, embedding_size=128, num_classes=10575, s=64., m=0.5):
        super(Arcface_Head, self).__init__()
        self.s = s
        self.m = m
        self.weight = nn.Parameter(torch.FloatTensor(num_classes, embedding_size))
        nn.init.xavier_uniform_(self.weight)  # the tensor above is uninitialised
        self.cos_m = math.cos(m)
        self.sin_m = math.sin(m)
        self.th = math.cos(math.pi - m)
        self.mm = math.sin(math.pi - m) * m

    def forward(self, input, label):
        # The dot product of the normalised x and w is cos(theta). Normalising the
        # input here is harmless if the backbone already returns unit-length features.
        cosine = F.linear(F.normalize(input), F.normalize(self.weight))
        sine = torch.sqrt((1.0 - torch.pow(cosine, 2)).clamp(0, 1))
        phi = cosine * self.cos_m - sine * self.sin_m  # equals cos(theta + m)
        # torch.where(cond, a, b) returns a where cond holds and b elsewhere
        phi = torch.where(cosine.float() > self.th, phi.float(), cosine.float() - self.mm)
        # One-hot mask with the same dtype and device as phi
        one_hot = torch.zeros_like(phi)
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        # Apply the margin to the target class only, then scale by s
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
        output *= self.s
        return output