Notes on the problems I ran into while porting the ArcFace loss for face recognition.

1. What is ArcFace loss?

Paper link: https://arxiv.org/pdf/1801.07698.pdf

Authors' open-source code: https://github.com/deepinsight/insightface

There are two main lines of research in face recognition. One treats it as a classification problem and trains with a softmax loss on the training set; the other learns an embedding directly in a metric space, for example with the triplet loss. Both approaches have drawbacks. For the softmax loss: (1) the weight of the last fully connected layer grows linearly with the number of identities in the training set, and (2) the learned features separate the closed set (the training identities) well, but are not discriminative enough for open-set face recognition, where the test identities are disjoint from the training set. For the triplet loss: (1) on large datasets the number of triplets explodes combinatorially, which sharply increases the number of training iterations, and (2) effective training depends on semi-hard sample mining, which is itself a relatively difficult problem.

The Additive Angular Margin Loss (ArcFace) further improves the discriminative power of face recognition models and also makes training more stable (the earlier A-Softmax had to be trained jointly with a plain softmax term in order to converge). After the features and the weights of the last fully connected layer are normalized, their dot product equals the cosine of the angle between them. One can therefore first recover the angle between the feature and the weight vector with the arccosine function, and then add a margin to that angle.

Without further ado, here is the formula:

$$
L = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}
$$

In the formula above, $x$ is the embedding output by the final FC layer of the backbone (it has not gone through any softmax), $W$ is the weight of the classification FC layer, $\theta_j$ is the angle between the normalized $x$ and the normalized $j$-th column of $W$, $s$ is the scale factor, and $m$ is the additive angular margin applied to the target class $y_i$.
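To make the geometry concrete, here is a minimal toy sketch (my own example with made-up numbers, not from the paper or the code below) of how the normalized dot product gives the cosine, how the angle is recovered with arccos, and how the margin changes the target logit:

import torch
import torch.nn.functional as F

# Toy feature and class-weight vectors (hypothetical values).
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, 1.0, -0.2])

s, m = 30.0, 0.50                                           # scale and angular margin

cos_theta = F.normalize(x, dim=0) @ F.normalize(w, dim=0)   # unit vectors: dot product = cos(theta)
theta = torch.acos(cos_theta.clamp(-1.0, 1.0))              # recover the angle
logit_plain = s * cos_theta                                 # what a plain normalized softmax would use
logit_margin = s * torch.cos(theta + m)                     # ArcFace: add the margin in angle space, then rescale

print(cos_theta.item(), logit_plain.item(), logit_margin.item())

Because the cosine is decreasing on [0, π], adding m to the angle lowers the target-class logit (as long as θ + m stays within [0, π], which is what the easy_margin / threshold handling in the code below takes care of), and that is what forces same-class features to be pulled closer to their class weight.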

2. Porting the code

For the mathematical theory behind the ArcFace loss, read the original paper. Fully understanding it is ideal, but it is not strictly necessary in order to use it; it is enough to know that the paper's extensive experiments show it works. So the rest of this post is about how to use it. I recently wanted to port this loss into the torchreid framework for person re-identification, and I am recording the problems I ran into here.

First of all, I simply grabbed an implementation someone else had posted online. The code I used is as follows:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcFaceLoss(nn.Module):
    r"""Implement of large margin arc distance: :
        Args:
            in_features: size of each input sample
            out_features: size of each output sample
            s: norm of input feature
            m: margin

            cos(theta + m)
        """

    def __init__(self, in_features, num_classes, s=30.0, m=0.50, easy_margin=False, use_gpu=True):
        super(ArcFaceLoss, self).__init__()
        self.in_features = in_features
        self.out_features = num_classes
        self.s = s
        self.m = m
        self.use_gpu = use_gpu
        self.logsoftmax = nn.LogSoftmax(dim=1)
        # What nn.Parameter does:
        # it wraps a plain (non-trainable) Tensor into a trainable Parameter
        # and registers that Parameter on this module, so it shows up in
        # net.parameters() and can be updated during optimization.
        # https://www.jianshu.com/p/d8b77cc02410
        # Initialize the weight.
        self.weight = nn.Parameter(torch.randn(num_classes, in_features))
        # self.weight = Parameter(torch.FloatTensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

        self.easy_margin = easy_margin
        self.cos_m = math.cos(m)
        self.sin_m = math.sin(m)
        self.th = math.cos(math.pi - m)
        self.mm = math.sin(math.pi - m) * m

    def forward(self, input, label):
        # --------------------------- cos(theta) & phi(theta) ---------------------------
        # torch.nn.functional.linear(input, weight, bias=None)
        # y=x*W^T+b
        cosine = F.linear(F.normalize(input), F.normalize(self.weight))
        sine = torch.sqrt(1.0 - torch.pow(cosine, 2))
        # cos(a+b) = cos(a)*cos(b) - sin(a)*sin(b)
        phi = cosine * self.cos_m - sine * self.sin_m
        if self.easy_margin:
            # torch.where(condition, x, y) → Tensor
            # condition (ByteTensor) – When True (nonzero), yield x, otherwise yield y
            # x (Tensor) – values selected at indices where condition is True
            # y (Tensor) – values selected at indices where condition is False
            # return:
            # A tensor of shape equal to the broadcasted shape of condition, x, y
            # cosine > 0 means the feature and the class weight are similar, so apply the margin (phi) in that case
            phi = torch.where(cosine > 0, phi, cosine)
        else:
            phi = torch.where(cosine > self.th, phi, cosine - self.mm)
        # --------------------------- convert label to one-hot ---------------------------
        # one_hot = torch.zeros(cosine.size(), requires_grad=True, device='cuda')
        # write cos(theta + m) into the corresponding positions of the tensor
        one_hot = torch.zeros(cosine.size(), device='cuda')
        # scatter_(dim, index, src)
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        # ------------- torch.where(out_i = {x_i if condition_i else y_i}) -------------
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
        # you can use torch.where if your torch.__version__ is 0.4
        output *= self.s
        return output

There are two points worth mentioning. The first one is this line:

self.weight = nn.Parameter(torch.randn(num_classes, in_features))

This weight is trained as part of the network, so it needs to be moved onto the GPU; otherwise you may get a device-mismatch error complaining that the computation cannot run with the weight still on the CPU. Changing it to the following form fixes it:

        if self.use_gpu:
            self.weight = nn.Parameter(torch.randn(num_classes, in_features).cuda())
        else:
            self.weight = nn.Parameter(torch.randn(num_classes, in_features))
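As a side note, a slightly more idiomatic alternative (my own suggestion, not the fix used above) is to leave the Parameter on the CPU at construction time and move the whole loss module with .to(device), since nn.Module.to() carries every registered parameter along with it. A minimal, stripped-down sketch:

import torch
import torch.nn as nn


class WeightOnlyHead(nn.Module):
    """Hypothetical stand-in for ArcFaceLoss, kept only to show device handling."""

    def __init__(self, in_features, num_classes):
        super().__init__()
        # Registered as a Parameter but NOT moved to the GPU here.
        self.weight = nn.Parameter(torch.randn(num_classes, in_features))


# Moving the module moves every registered parameter with it.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
head = WeightOnlyHead(in_features=512, num_classes=751).to(device)  # 512/751 are placeholder sizes
print(head.weight.device)  # cuda:0 when a GPU is available, otherwise cpu

If you go this route, the one_hot tensor created in forward() also has to follow the input's device instead of being hard-coded to 'cuda' (the final version below does this via cosine.device).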

The second point is the type of the value being returned: output is a tensor of scaled logits, not the final loss value. It still has to go through a softmax and then be reduced to a scalar the same way a cross-entropy loss is computed. Using the code without actually reading it really did not work, so I appended the remaining computation. The code is as follows:

# ArcFace
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ArcFaceLoss(nn.Module):
    r"""Implement of large margin arc distance: :
        Args:
            in_features: size of each input sample
            out_features: size of each output sample
            s: norm of input feature
            m: margin

            cos(theta + m)
        """

    def __init__(self, in_features, num_classes, s=30.0, m=0.50, easy_margin=False, use_gpu=True):
        super(ArcFaceLoss, self).__init__()
        self.in_features = in_features
        self.out_features = num_classes
        self.s = s
        self.m = m
        self.use_gpu = use_gpu
        self.logsoftmax = nn.LogSoftmax(dim=1)
        # What nn.Parameter does:
        # it wraps a plain (non-trainable) Tensor into a trainable Parameter
        # and registers that Parameter on this module, so it shows up in
        # net.parameters() and can be updated during optimization.
        # https://www.jianshu.com/p/d8b77cc02410
        # Initialize the weight.
        if self.use_gpu:
            self.weight = nn.Parameter(torch.randn(num_classes, in_features).cuda())
        else:
            self.weight = nn.Parameter(torch.randn(num_classes, in_features))
        # self.weight = Parameter(torch.FloatTensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

        self.easy_margin = easy_margin
        self.cos_m = math.cos(m)
        self.sin_m = math.sin(m)
        self.th = math.cos(math.pi - m)
        self.mm = math.sin(math.pi - m) * m

    def forward(self, input, label):
        # --------------------------- cos(theta) & phi(theta) ---------------------------
        # torch.nn.functional.linear(input, weight, bias=None)
        # y=x*W^T+b
        cosine = F.linear(F.normalize(input), F.normalize(self.weight))
        sine = torch.sqrt(1.0 - torch.pow(cosine, 2))
        # cos(a+b) = cos(a)*cos(b) - sin(a)*sin(b)
        phi = cosine * self.cos_m - sine * self.sin_m
        if self.easy_margin:
            # torch.where(condition, x, y) → Tensor
            # condition (ByteTensor) – When True (nonzero), yield x, otherwise yield y
            # x (Tensor) – values selected at indices where condition is True
            # y (Tensor) – values selected at indices where condition is False
            # return:
            # A tensor of shape equal to the broadcasted shape of condition, x, y
            # cosine > 0 means the feature and the class weight are similar, so apply the margin (phi) in that case
            phi = torch.where(cosine > 0, phi, cosine)
        else:
            phi = torch.where(cosine > self.th, phi, cosine - self.mm)
        # --------------------------- convert label to one-hot ---------------------------
        # one_hot = torch.zeros(cosine.size(), requires_grad=True, device='cuda')
        # write cos(theta + m) into the corresponding positions of the tensor
        one_hot = torch.zeros(cosine.size(), device=cosine.device)  # follow the logits' device so a CPU run also works
        # scatter_(dim, index, src)
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        # ------------- torch.where(out_i = {x_i if condition_i else y_i}) -------------
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
        # you can use torch.where if your torch.__version__ is 0.4
        output *= self.s
        log_probs = self.logsoftmax(output)
        # print(output)

        # return output.sum()
        # cross-entropy on the margin-adjusted, scaled logits, averaged over the batch
        return (-one_hot * log_probs).mean(0).sum()
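Finally, a rough usage sketch of how I would plug this module into a training step. Everything here (the Linear stand-in for the backbone, the batch size, the 512/751 dimensions) is hypothetical and not taken from torchreid; the one thing worth highlighting is that the class weights live inside the loss module, so criterion.parameters() must be handed to the optimizer as well:

import itertools
import torch

backbone = torch.nn.Linear(2048, 512)         # stand-in for a real feature extractor
criterion = ArcFaceLoss(in_features=512, num_classes=751, use_gpu=False)

# Optimize the backbone AND the ArcFace class weights.
optimizer = torch.optim.SGD(
    itertools.chain(backbone.parameters(), criterion.parameters()), lr=0.01)

imgs = torch.randn(8, 2048)                   # dummy batch
labels = torch.randint(0, 751, (8,))

optimizer.zero_grad()
features = backbone(imgs)
loss = criterion(features, labels)            # a scalar now, ready for backward()
loss.backward()
optimizer.step()

As far as I can tell, the last two lines of forward() are numerically equivalent to F.cross_entropy(output, label), so that built-in could be used instead of the explicit log-softmax / one-hot product.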
