Implementing CenterLoss

Latest recommended article published on 2024-03-07 17:16:31

Natuski_

Tags: pytorch, face recognition, neural network, convolution

Article link: https://blog.csdn.net/kobayashi_/article/details/108029539

1. Why CenterLoss

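First, define a simple fully connected network. To make the features easy to visualize, the layer just before the output layer produces only 2 values, so every sample is mapped to a 2-dimensional feature. The network is then trained on the MNIST dataset, and the features are plotted while training proceeds.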

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFCNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.fc1=nn.Sequential(
        nn.Linear(784,512),
        nn.BatchNorm1d(512), 
        nn.ReLU(),

        nn.Linear(512,256),
        nn.BatchNorm1d(256),
        nn.ReLU(),

        nn.Linear(256,128),
        nn.BatchNorm1d(128),
        nn.ReLU(),

        nn.Linear(128,2)
    )
    self.fc2=nn.Sequential(
        nn.Linear(2,10),
        nn.Softmax(dim=1),
    )

  def forward(self,x):
    fc1_out=self.fc1(x)
    out=self.fc2(fc1_out)
    return out,fc1_out

Next, define a simple convolutional network. As before, the feature extraction layers output 2 features.

class SimpleConvNet(nn.Module):
  def __init__(self):
    super().__init__()
    self.conv1=nn.Sequential(
        nn.Conv2d(1,56,3),      # 1x28x28 -> 56x26x26
        nn.BatchNorm2d(56),
        nn.ReLU(),
        nn.MaxPool2d(3,2),      # -> 56x12x12
        nn.Conv2d(56,128,3),    # -> 128x10x10
        nn.BatchNorm2d(128),
        nn.ReLU(),
        nn.MaxPool2d(3,2),      # -> 128x4x4
        nn.Conv2d(128,256,3),   # -> 256x2x2
        nn.BatchNorm2d(256),
        nn.ReLU(),
    )
    self.fc1=nn.Sequential(
        nn.Linear(2*2*256,2),
    )
    self.fc2=nn.Sequential(
        nn.Linear(2,10),
        nn.Softmax(dim=1),
    )

  def forward(self,x):
    conv_out=self.conv1(x)
    fc1_out=self.fc1(conv_out.reshape(-1,256*2*2))
    fc2_out=self.fc2(fc1_out)

    return fc2_out,fc1_out

Train on the MNIST dataset for 100 epochs.

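# Setup assumed by the loop below (not shown in the original post):
# train_loader is a DataLoader over MNIST that yields (image, label) batches.
DEVICE=torch.device("cuda" if torch.cuda.is_available() else "cpu")
EPOCH=100

def visualFC():
  net=SimpleFCNet().to(DEVICE)
  # net=SimpleConvNet().to(DEVICE)
  opt=torch.optim.Adam(net.parameters())

  for epoch in range(EPOCH):
    feature_loader=[]
    label_loader=[]

    for i,(x,y) in enumerate(train_loader):
      inputs=x.reshape(-1,784).to(DEVICE)
      # inputs=x.to(DEVICE)  # input for the convolutional network
      target=F.one_hot(y,num_classes=10).float()
      target=target.to(DEVICE)

      output,feat=net(inputs)
      # The loss function name is missing in the source; F.mse_loss is an
      # assumption that fits the softmax output and the one-hot float target.
      loss=F.mse_loss(output,target)

      opt.zero_grad()
      loss.backward()
      opt.step()

      # detach so the stored features do not keep the autograd graph alive
      feature_loader.append(feat.detach())
      label_loader.append(y)

    feat=torch.cat(feature_loader,0)
    labels=torch.cat(label_loader,0)
    visual2d(feat.cpu().numpy(),labels.cpu().numpy(),epoch)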

Visualizing the output:

import matplotlib.pyplot as plt

def visual2d(features,labels,epoch):
  plt.ion()
  c = ['#ff0000', '#ffff00', '#00ff00', '#00ffff', '#0000ff',
         '#ff00ff', '#990000', '#999900', '#009900', '#009999']
  plt.clf()
  for i in range(10):
      # features: (N, 2) array of 2-D features; labels: (N,) array of class ids
      # the boolean mask labels == i selects the features of the samples labeled i,
      # so each class is drawn in its own color
      plt.plot(features[labels == i, 0], features[labels == i, 1], '.', c=c[i])
  plt.legend(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], loc = 'upper right')
  plt.title("epoch=%d" % epoch)
  plt.draw()
  plt.pause(0.001)

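Results:

1. Fully connected network
(image: 2-D features of the fully connected network)

2. Convolutional network
(image: 2-D features of the convolutional network)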

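Comparing the two figures:

After 100 epochs, the fully connected network has a smaller intra-class distance than the convolutional network, but a larger inter-class distance.

Clearly, the fully connected network does not fully separate the data by class. At 100 epochs neither network separates the data completely, but the convolutional network separates the classes better than the fully connected one.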
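The intra-class and inter-class distances discussed above can also be measured rather than judged by eye. The following is a minimal sketch that is not part of the original post: the helper name `class_distances` and the two definitions it uses (mean distance of features to their own class mean, and mean pairwise distance between class means) are assumptions chosen for illustration.

```python
import torch

def class_distances(features, labels, num_classes):
    """Return (mean intra-class distance, mean inter-class distance).

    features: (N, d) float tensor, labels: (N,) integer tensor.
    At least two classes must be present in `labels`.
    """
    centers, intra = [], []
    for k in range(num_classes):
        feats_k = features[labels == k]
        if feats_k.shape[0] == 0:
            continue  # class k does not occur in this batch
        center_k = feats_k.mean(dim=0)
        centers.append(center_k)
        # mean distance of the class-k features to their own mean
        intra.append(torch.linalg.norm(feats_k - center_k, dim=1).mean())
    centers = torch.stack(centers)                      # (K, d)
    # pairwise distances between class means; the diagonal is exactly zero
    pairwise = torch.linalg.norm(centers[:, None, :] - centers[None, :, :], dim=2)
    k_present = centers.shape[0]
    inter = pairwise.sum() / (k_present * (k_present - 1))
    return torch.stack(intra).mean().item(), inter.item()

# Toy example: two tight clusters that are far apart.
toy_features = torch.tensor([[0., 0.], [0., 2.], [10., 0.], [10., 2.]])
toy_labels = torch.tensor([0, 0, 1, 1])
intra_d, inter_d = class_distances(toy_features, toy_labels, num_classes=2)
```

With the features collected in the training loop, a small first value and a large second value would correspond to the compact, well-separated clusters that face recognition needs.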

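After 500 epochs, taking the best result, the convolutional network gives:
(image: 2-D features of the convolutional network after 500 epochs)

The figure shows that although the convolutional network can split the data into 10 clusters, the features within each cluster are widely spread (large intra-class distance) and the clusters lie close to each other (small inter-class distance).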

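Conclusion: for face recognition, the network must be able to discriminate face features precisely (discriminative¹, which requires a small intra-class distance), but the softmax loss only guides the network to make the features separable, not discriminative.

fig.1. Typical structure of a convolutional network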

For face recognition task, the deeply learned features need to be not only separable but also discriminative. Since it is impractical to pre-collect all the possible testing identities for training, the label prediction in CNNs is not always applicable. The deeply learned features are required to be discriminative and generalized enough for identifying new unseen classes without label prediction. Discriminative power characterizes features in both the compact intra-class variations and separable inter-class differences, as shown in Fig. 1. Discriminative features can be well-classified by nearest neighbor (NN) or k-nearest neighbor (k-NN) algorithms, which do not necessarily depend on the label prediction. However, the softmax loss only encourage the separability of features. The resulting features are not sufficiently effective for face recognition.

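To train a feature extractor that is adequate for face recognition, the extractor must spread the features of the data far enough apart. In terms of fig.1, the feature extraction network needs both the ability to separate features and the ability to discriminate them; that is, a large inter-class distance and a small intra-class distance.

For a face recognition task (deciding whose face is in the image), the usual pipeline is to extract the face features first, then compare them with the face feature templates registered in a database, and finally return the result.

fig.2. Face comparison²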
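The comparison step of this pipeline can be sketched as a nearest-neighbor search over registered templates. This is only an illustration under stated assumptions: the post does not say which similarity measure is used, so cosine similarity is an assumption here, and the function name `identify`, the names in `names`, and the threshold value are hypothetical.

```python
import torch
import torch.nn.functional as F

def identify(query, gallery, names, threshold=0.5):
    """Match one feature vector against registered templates.

    query:   (d,) feature of the face to identify
    gallery: (n, d) registered feature templates, one row per person
    names:   list of n identities, aligned with the rows of gallery
    Returns (name or None, best similarity).
    """
    # cosine similarity between the query and every template -> shape (n,)
    sims = F.cosine_similarity(query.unsqueeze(0), gallery, dim=1)
    best = int(torch.argmax(sims))
    score = sims[best].item()
    if score < threshold:
        return None, score  # no registered face is similar enough
    return names[best], score

# Hypothetical 2-D templates for two registered people.
gallery = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
names = ["alice", "bob"]

match, score = identify(torch.tensor([0.9, 0.1]), gallery, names)
unknown, low_score = identify(torch.tensor([-1.0, -1.0]), gallery, names)
```

This kind of matching only works well when features of the same person lie close together and features of different people lie far apart, which is why the loss has to shape the feature space and not just the class predictions.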

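In one sentence: we need a loss function that guides the network to increase the inter-class distance and decrease the intra-class distance.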

2. How CenterLoss works

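The CenterLoss proposed in the paper learns a center for the features of each class; the center has the same dimension as the features. During training, the class centers are updated while the features of each class are pulled toward their center, and the center loss is trained jointly with the softmax loss. The softmax loss keeps the features of different classes apart, while the center loss effectively reduces the distances within a class. Joint training therefore both enlarges the inter-class distance and reduces the intra-class distance.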

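The center loss function:
$\mathcal{L}_C=\frac{1}{2}\sum_{i=1}^{m}\|x_i-c_{y_i}\|_2^2$
Here $c_{y_i}\in\mathbb{R}^d$ is the feature center of class $y_i$, and $m$ is the number of samples. Ideally, $c_{y_i}$ is updated as the features change during training, which means the whole dataset would have to be used in every iteration to compute the expectation of each class's features (the feature center). That is very inefficient.

$\mathcal{L}_C$ measures the intra-class distance, so minimizing $\mathcal{L}_C$ minimizes the intra-class distance.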
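As a worked example of the formula, the sketch below evaluates $\mathcal{L}_C$ on three hand-picked 2-D features. The function name `center_loss_value` and all numbers are made up for illustration; the centers are taken as given here, not learned.

```python
import torch

def center_loss_value(features, labels, centers):
    """L_C = 1/2 * sum_i ||x_i - c_{y_i}||_2^2 for one batch."""
    centers_batch = centers[labels]  # c_{y_i} for every sample, shape (m, d)
    return 0.5 * ((features - centers_batch) ** 2).sum()

features = torch.tensor([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
labels = torch.tensor([0, 0, 1])
centers = torch.tensor([[0.0, 0.0], [2.0, 3.0]])

# squared distances to the class centers: 1, 4 and 1, so L_C = (1 + 4 + 1) / 2
loss = center_loss_value(features, labels, centers)
print(loss.item())  # 3.0
```

Moving a feature closer to its own center lowers the value, and moving it toward another class's center has no effect on it, which is why this term alone only controls the intra-class distance.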

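To address this inefficiency, the authors propose two modifications:

1. Update the centers on mini-batches: in each iteration, a center is computed by averaging the features of the corresponding class;

2. To avoid large perturbations caused by a few mislabeled samples, use a scalar $\alpha$ to control the learning rate of the centers.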

The gradient of $\mathcal{L}_C$ with respect to $x_i$ and the update of $c_j$ are:
$\frac{\partial\mathcal{L}_C}{\partial x_i}=x_i-c_{y_i}$
$\Delta c_j=\frac{\sum_{i=1}^{m}\delta(y_i=j)\cdot(c_j-x_i)}{1+\sum_{i=1}^{m}\delta(y_i=j)}$
where $\delta(condition)=1$ if the condition $y_i=j$ holds, and 0 otherwise.

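The condition $y_i=j$ holds when the label of sample $i$ is class $j$. If $n_j$ samples in the mini-batch belong to class $j$, then:

$\Delta c_j=\frac{1}{1+n_j}\sum_{i:\,y_i=j}(c_j-x_i)$

If no sample in the mini-batch belongs to class $j$, $\Delta c_j=0$.

Each center is then updated as $c_j\leftarrow c_j-\alpha\cdot\Delta c_j$, with $\alpha\in[0,1]$.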
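The center update can be sketched in a few lines of PyTorch. This is a minimal illustration of the $\Delta c_j$ formula and the step $c_j\leftarrow c_j-\alpha\cdot\Delta c_j$, assuming integer labels and float features; the function names `center_deltas` and `update_centers`, the default `alpha=0.5`, and the numbers are chosen for the example only.

```python
import torch

def center_deltas(features, labels, centers):
    """Delta c_j = sum_{i: y_i = j} (c_j - x_i) / (1 + n_j), for every class j."""
    num_classes = centers.shape[0]
    # n_j: how many samples of each class are in the mini-batch
    counts = torch.bincount(labels, minlength=num_classes).to(features.dtype)
    # per-class sum of the features x_i
    feat_sum = torch.zeros_like(centers).index_add_(0, labels, features)
    # sum over the class of (c_j - x_i) equals n_j * c_j - sum of x_i
    numer = counts.unsqueeze(1) * centers - feat_sum
    return numer / (1.0 + counts).unsqueeze(1)

def update_centers(features, labels, centers, alpha=0.5):
    """One update step c_j <- c_j - alpha * Delta c_j."""
    return centers - alpha * center_deltas(features, labels, centers)

features = torch.tensor([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
labels = torch.tensor([0, 0, 1])
# class 2 has no sample in this batch, so its center must stay where it is
centers = torch.tensor([[0.0, 0.0], [2.0, 3.0], [5.0, 5.0]])

delta = center_deltas(features, labels, centers)
new_centers = update_centers(features, labels, centers, alpha=0.5)
```

The `1 +` in the denominator keeps the division well defined for classes that are absent from the batch, and a smaller `alpha` makes each center move less per batch, which limits the influence of mislabeled samples.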

Joint loss of center loss and softmax loss:
$\mathcal{L}=\mathcal{L}_S+\lambda\mathcal{L}_C$
The scalar $\lambda$ balances the two losses; different values of $\lambda$ produce different results.
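A training step with the joint loss might look like the sketch below. It is an illustration under several assumptions: the classifier head outputs raw logits so that `F.cross_entropy` can serve as the softmax loss $\mathcal{L}_S$, the centers are a learnable `nn.Parameter` trained by the same optimizer, and the layer sizes, the random data, `lam=0.1` and the name `joint_loss` are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def joint_loss(logits, features, labels, centers, lam=0.1):
    """L = L_S + lambda * L_C."""
    softmax_loss = F.cross_entropy(logits, labels)                 # L_S (batch mean)
    center_loss = 0.5 * ((features - centers[labels]) ** 2).sum()  # L_C (batch sum)
    return softmax_loss + lam * center_loss

torch.manual_seed(0)
backbone = nn.Linear(8, 2)    # stand-in feature extractor with 2-D features
classifier = nn.Linear(2, 3)  # stand-in 3-class head producing raw logits
centers = nn.Parameter(torch.randn(3, 2))
params = list(backbone.parameters()) + list(classifier.parameters()) + [centers]
opt = torch.optim.SGD(params, lr=0.01)

x = torch.randn(16, 8)            # random stand-in for a mini-batch
y = torch.randint(0, 3, (16,))
for _ in range(5):
    feats = backbone(x)
    loss = joint_loss(classifier(feats), feats, y, centers, lam=0.1)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

With `lam=0` the objective reduces to the plain softmax loss; raising `lam` puts more weight on pulling the features toward their class centers.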

(image not available)

The full algorithm is as follows:

(image: algorithm listing)

$j$ is the class index: with 10 classes, $j = 1, 2, \ldots, 10$.

3. Code implementation of CenterLoss

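class center_loss(nn.Module):
    def __init__(self, cls_num, feature_nums):
        '''
        Randomly initialize the centers.
        :param cls_num: number of classes, i.e. how many feature centers there are
        :param feature_nums: number of features, i.e. the feature dimension (2 for two features)
        '''
        super(center_loss, self).__init__()
        self.cls_num = cls_num
        # randomly initialize the centers; as an nn.Parameter they are learned by the optimizer
        self.center = nn.Parameter(torch.randn(cls_num, feature_nums))

    def forward(self, features, labels):
        '''
        :param features: input features, shape (batch, feature_nums)
        :param labels: class labels, shape (batch,)
        :return: center loss
        '''
        # Step 1: pick the center that corresponds to each sample.
        # Suppose features=[0,1,2,3,4,5,6,7,8,9]
        #         labels  =[0,0,0,0,0,1,1,1,1,1]
        #         center  =[5,6]
        # then    center_ =[5,5,5,5,5,6,6,6,6,6]
        center_ = self.center.index_select(dim=0, index=labels.long())
        # Step 2: squared Euclidean distance from each sample to its center,
        # ||x_i - c_{y_i}||_2^2, as in the formula for L_C
        sq_dist = torch.sum(torch.pow(features - center_, 2), dim=1)
        # Step 3: average within each class.
        # count the samples of each class; torch.histc needs a floating-point tensor
        n = torch.histc(labels.float(), bins=self.cls_num, min=0, max=self.cls_num-1)
        # divide each sample's distance by the size of its class, so every class
        # contributes its mean squared distance (the formula above sums without
        # this normalization)
        n_ = n.index_select(dim=0, index=labels.long())
        mean = torch.div(sq_dist, n_)
        return mean.sum()/2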