ECCV 2020 | Smooth-AP: A Smoothed Loss Function for Large-Scale Image Retrieval That Works Around Non-Differentiability

AI算法修炼营

Article link: https://blog.csdn.net/sinat_17456165/article/details/108289465

First published at: https://zhuanlan.zhihu.com/p/163413041

Paper: https://arxiv.org/abs/2007.12163

Code: https://github.com/Andrew-Brown1/Smooth_AP

Video walkthrough: https://www.bilibili.com/video/BV1dD4y1m7fA

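1. Introduction

Image retrieval is usually posed as follows: given a query image containing a specific instance (for example a particular object, scene, or building), find the images in a database that contain the same instance. Because images differ in viewpoint, lighting, and occlusion, designing retrieval algorithms that are both effective and efficient under this intra-class variation remains an open research problem.

A typical image retrieval pipeline has three steps. First, extract a suitable representation vector from each image. Second, run a nearest-neighbour search over these vectors using Euclidean or cosine distance to find similar images. Finally, optionally refine the retrieved results with post-processing. The performance of a retrieval algorithm therefore depends chiefly on the quality of the extracted image representation.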
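The second step of the pipeline can be sketched in a few lines of plain Python. This is only an illustration: the 2-D embeddings are made-up values, and the helper names `cosine_similarity` and `retrieve` are hypothetical, not taken from the paper's code.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query, database, top_k=3):
    """Return the indices of the top_k database vectors, most similar first."""
    scores = [cosine_similarity(query, v) for v in database]
    order = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]

# Hypothetical 2-D embeddings, chosen only for illustration.
database = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
query = [1.0, 0.05]
print(retrieve(query, database, top_k=2))  # [0, 2]
```

A real system would use learned high-dimensional embeddings and an approximate nearest-neighbour index, but the ranking logic is the same.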

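Unlike earlier loss functions based on metric learning, the authors propose a loss that optimises the ranking directly. The quantity being optimised is AP (Average Precision). AP is not differentiable, so they propose Smooth-AP: write out the expression for AP, then replace its non-differentiable part with a sigmoid function. Experiments on Stanford Online Products, VehicleID, INaturalist, VGGFace2, and IJB-C show good results.

[Figure: illustration of the results]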

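2. Method

Notations

$\Omega = \{I_i,\ i = 0, \dots, m\}$: retrieval set
$I_q$: query instance
$\mathcal{P}_q$, $\mathcal{N}_q$: positive and negative set
$\mathcal{S}_\Omega = \{ s_i = \langle \frac{v_q}{\|v_q\|}, \frac{v_i}{\|v_i\|} \rangle,\ i = 0, \dots, m \}$: cosine similarity. $\mathcal{S}_P$, $\mathcal{S}_N$: positive and negative relevance score sets; $v_q$: query vector; $v_i$: vectorized retrieval set.
$AP_q = \frac{1}{|\mathcal{S}_P|} \sum_{i \in \mathcal{S}_P} \frac{\mathcal{R}(i, \mathcal{S}_P)}{\mathcal{R}(i, \mathcal{S}_\Omega)}$: AP. $\mathcal{R}(i, \mathcal{S}_P)$, $\mathcal{R}(i, \mathcal{S}_\Omega)$: the rankings of the instance $i$ in $\mathcal{P}_q$ and in $\Omega$.
$\mathcal{R}(i, \mathcal{S}) = 1 + \sum_{j \in \mathcal{S},\, j \neq i} \mathbb{1}\{ s_i - s_j < 0 \}$: ranking R. $\mathbb{1}\{\cdot\}$: an indicator function.
$D \in \mathbb{R}^{m \times m}$ with $D_{ij} = s_j - s_i$: a difference matrix

$$AP_q = \frac{1}{|\mathcal{S}_P|} \sum_{i \in \mathcal{S}_P} \frac{1 + \sum_{j \in \mathcal{S}_P,\, j \neq i} \mathbb{1}\{D_{ij} > 0\}}{1 + \sum_{j \in \mathcal{S}_P,\, j \neq i} \mathbb{1}\{D_{ij} > 0\} + \sum_{j \in \mathcal{S}_N} \mathbb{1}\{D_{ij} > 0\}}$$

Optimising AP therefore amounts to minimising $\sum_{j \in \mathcal{S}_N} \mathbb{1}\{D_{ij} > 0\}$, that is, not letting negative samples rank ahead of positive samples.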
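To make the ranking-based definition of AP concrete, here is a minimal sketch for a single query in plain Python. The scores and positive indices are made-up values, and the function names `rank` and `average_precision` are illustrative rather than taken from the paper's code.

```python
def rank(i, scores, candidates):
    """Rank of item i within `candidates`: 1 + number of other items scoring higher."""
    return 1 + sum(1 for j in candidates if j != i and scores[j] > scores[i])

def average_precision(scores, positives):
    """AP of one query: mean over positives of (rank among positives) / (rank among all)."""
    everyone = range(len(scores))
    total = sum(rank(i, scores, positives) / rank(i, scores, everyone)
                for i in positives)
    return total / len(positives)

# Hypothetical relevance scores for a retrieval set of four items.
scores = [0.9, 0.8, 0.7, 0.6]

# Positives ranked 1st and 2nd: no negative is ahead of a positive.
ap_perfect = average_precision(scores, positives=[0, 1])

# Positives ranked 1st and 3rd: one negative (item 1) is ahead of a positive.
ap_mixed = average_precision(scores, positives=[0, 2])

print(ap_perfect, ap_mixed)
```

In the second case the negative at rank 2 inflates the denominator for the positive at rank 3, so AP drops from 1 to (1/1 + 2/3) / 2 = 5/6.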

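Smooth AP

The indicator function above cannot be optimised with gradient-based methods.

It is therefore replaced with a sigmoid: $\mathcal{G}(x; \tau) = \frac{1}{1 + e^{-x/\tau}}$, where $\tau$ is the smoothing coefficient (temperature).

The AP estimate is rewritten as (with $D_{ij} = s_j - s_i$ the pairwise score difference):

$$AP_q \approx \frac{1}{|\mathcal{S}_P|} \sum_{i \in \mathcal{S}_P} \frac{1 + \sum_{j \in \mathcal{S}_P,\, j \neq i} \mathcal{G}(D_{ij}; \tau)}{1 + \sum_{j \in \mathcal{S}_P,\, j \neq i} \mathcal{G}(D_{ij}; \tau) + \sum_{j \in \mathcal{S}_N} \mathcal{G}(D_{ij}; \tau)}$$

The loss function is: $\mathcal{L}_{AP} = \frac{1}{m} \sum_{k=1}^{m} (1 - \tilde{AP}_k)$, where $\tilde{AP}_k$ is the smoothed AP of the $k$-th query.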
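The sigmoid relaxation can be tried out on a single query without any deep learning framework. The sketch below is an assumption-laden toy version: the scores are made up, the names `smooth_rank` and `smooth_ap` are hypothetical, and clamping the exponent is a numerical-stability choice of this example.

```python
import math

def sigmoid(x, tau):
    """Temperature-controlled sigmoid 1 / (1 + exp(-x / tau))."""
    z = max(-50.0, min(50.0, -x / tau))  # clamp the exponent to avoid overflow
    return 1.0 / (1.0 + math.exp(z))

def smooth_rank(i, scores, candidates, tau):
    """Soft rank of item i: 1 + sum of sigmoid(s_j - s_i) over the other candidates."""
    return 1.0 + sum(sigmoid(scores[j] - scores[i], tau)
                     for j in candidates if j != i)

def smooth_ap(scores, positives, tau):
    """Smoothed AP for one query; approaches the exact AP as tau shrinks."""
    everyone = range(len(scores))
    total = sum(smooth_rank(i, scores, positives, tau) /
                smooth_rank(i, scores, everyone, tau)
                for i in positives)
    return total / len(positives)

# Hypothetical scores; positives sit at ranks 1 and 3, so the exact AP is 5/6.
scores = [0.9, 0.8, 0.7, 0.6]
positives = [0, 2]

approx = smooth_ap(scores, positives, tau=0.01)
loss = 1.0 - approx
print(approx, loss)
```

With `tau=0.01` and score gaps of 0.1, each sigmoid is almost saturated, so the smoothed value lands very close to the exact AP of 5/6.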

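The paper also offers three points of analysis.

First, the smaller the smoothing coefficient, the closer the AP estimate is to the true AP. A larger smoothing coefficient, on the other hand, gives a wider operating region, meaning the range of score differences over which the derivative of the sigmoid is non-negligible (the area under the derivative curve in Figure 2 of the paper), and so provides more gradient information.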
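This trade-off is easy to see numerically. The following sketch evaluates the sigmoid and its derivative at one fixed score gap for two temperatures; the gap of 0.3 and the temperatures 0.01 and 0.1 are arbitrary values picked for illustration.

```python
import math

def sigmoid(x, tau):
    """Temperature-controlled sigmoid 1 / (1 + exp(-x / tau))."""
    z = max(-50.0, min(50.0, -x / tau))  # clamp the exponent to avoid overflow
    return 1.0 / (1.0 + math.exp(z))

def sigmoid_grad(x, tau):
    """Derivative of the sigmoid with respect to x: G * (1 - G) / tau."""
    g = sigmoid(x, tau)
    return g * (1.0 - g) / tau

gap = 0.3  # hypothetical score difference s_j - s_i > 0, so the step function gives 1

results = {}
for tau in (0.01, 0.1):
    error = abs(sigmoid(gap, tau) - 1.0)  # distance from the step function
    grad = sigmoid_grad(gap, tau)         # gradient signal available at this gap
    results[tau] = (error, grad)
    print(f"tau={tau}: approximation error={error:.3e}, gradient={grad:.3e}")
```

At this gap the small temperature matches the step function almost exactly but passes back a vanishing gradient, while the larger temperature is a looser approximation that still provides a usable gradient.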

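Second, the triplet loss behaves more like a metric loss than a ranking objective.

Third, compared with other AP-optimisation methods, FastAP and Blackbox AP, this method is simpler and approximates AP more tightly. Like the triplet loss, those two methods may also behave more like metric losses.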

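import torch


# NOTE: forward() calls the two helpers below, but the post does not show them.
# These are minimal definitions consistent with how they are called.
def sigmoid(tensor, temp=1.0):
    """Temperature-controlled sigmoid: 1 / (1 + exp(-tensor / temp))."""
    # clamp the exponent for numerical stability
    exponent = torch.clamp(-tensor / temp, min=-50, max=50)
    return 1.0 / (1.0 + torch.exp(exponent))


def compute_aff(x):
    """Affinity (dot-product) matrix between the rows of x: (B, D) -> (B, B)."""
    return torch.mm(x, x.t())


class SmoothAP(torch.nn.Module):
    """PyTorch implementation of the Smooth-AP loss.
    Takes as input the mini-batch of CNN-produced feature embeddings and returns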
    the value of the Smooth-AP loss. The mini-batch must be formed of a defined number of classes. Each class must
    have the same number of instances represented in the mini-batch and must be ordered sequentially by class.
    e.g. the labels for a mini-batch with batch size 9, and 3 represented classes (A,B,C) must look like:
        labels = ( A, A, A, B, B, B, C, C, C)
    (the order of the classes however does not matter)
    For each instance in the mini-batch, the loss computes the Smooth-AP when it is used as the query and the rest of the
    mini-batch is used as the retrieval set. The positive set is formed of the other instances in the batch from the
    same class. The loss returns the average Smooth-AP across all instances in the mini-batch.
    Args:
        anneal : float
            the temperature of the sigmoid that is used to smooth the ranking function. A low value of the temperature
            results in a steep sigmoid, that tightly approximates the heaviside step function in the ranking function.
        batch_size : int
            the batch size being used during training.
        num_id : int
            the number of different classes that are represented in the batch.
        feat_dims : int
            the dimension of the input feature embeddings
    Shape:
        - Input (preds): (batch_size, feat_dims) (must be a cuda torch float tensor)
        - Output: scalar
    Examples::
        >>> loss = SmoothAP(0.01, 60, 6, 256)
        >>> input = torch.randn(60, 256, requires_grad=True).cuda()
        >>> output = loss(input)
        >>> output.backward()
    """

    def __init__(self, anneal, batch_size, num_id, feat_dims):
        """
        Parameters
        ----------
        anneal : float
            the temperature of the sigmoid that is used to smooth the ranking function
        batch_size : int
            the batch size being used
        num_id : int
            the number of different classes that are represented in the batch
        feat_dims : int
            the dimension of the input feature embeddings
        """
        super(SmoothAP, self).__init__()

        # every class must contribute the same number of instances to the batch
        assert batch_size % num_id == 0

        self.anneal = anneal
        self.batch_size = batch_size
        self.num_id = num_id
        self.feat_dims = feat_dims

    def forward(self, preds):
        """Forward pass for all input predictions: preds - (batch_size x feat_dims) """


        # ------ differentiable ranking of all retrieval set ------
        # compute the mask which ignores the relevance score of the query to itself
        mask = 1.0 - torch.eye(self.batch_size) 
        mask = mask.unsqueeze(dim=0).repeat(self.batch_size, 1, 1)
        # compute the relevance scores via cosine similarity of the CNN-produced embedding vectors
        sim_all = compute_aff(preds)
        sim_all_repeat = sim_all.unsqueeze(dim=1).repeat(1, self.batch_size, 1)
        # compute the difference matrix
        sim_diff = sim_all_repeat - sim_all_repeat.permute(0, 2, 1)
        # pass through the sigmoid
        sim_sg = sigmoid(sim_diff, temp=self.anneal) * mask.to(preds.device)
        # compute the rankings
        sim_all_rk = torch.sum(sim_sg, dim=-1) + 1

        # ------ differentiable ranking of only positive set in retrieval set ------
        # compute the mask which only gives non-zero weights to the positive set
        xs = preds.view(self.num_id, int(self.batch_size / self.num_id), self.feat_dims)
        pos_mask = 1.0 - torch.eye(int(self.batch_size / self.num_id))
        pos_mask = pos_mask.unsqueeze(dim=0).unsqueeze(dim=0).repeat(self.num_id, int(self.batch_size / self.num_id), 1, 1)
        # compute the relevance scores (dot products; these equal cosine similarity only if preds are L2-normalised)
        sim_pos = torch.bmm(xs, xs.permute(0, 2, 1))
        sim_pos_repeat = sim_pos.unsqueeze(dim=2).repeat(1, 1, int(self.batch_size / self.num_id), 1)
        # compute the difference matrix
        sim_pos_diff = sim_pos_repeat - sim_pos_repeat.permute(0, 1, 3, 2)
        # pass through the sigmoid
        sim_pos_sg = sigmoid(sim_pos_diff, temp=self.anneal) * pos_mask.to(preds.device)
        # compute the rankings of the positive set
        sim_pos_rk = torch.sum(sim_pos_sg, dim=-1) + 1

        # sum the values of the Smooth-AP for all instances in the mini-batch
        ap = torch.zeros(1, device=preds.device)
        group = int(self.batch_size / self.num_id)
        for ind in range(self.num_id):
            pos_divide = torch.sum(sim_pos_rk[ind] / (sim_all_rk[(ind * group):((ind + 1) * group), (ind * group):((ind + 1) * group)]))
            ap = ap + ((pos_divide / group) / self.batch_size)

        return (1-ap)