Originally published at: https://zhuanlan.zhihu.com/p/163413041
Paper: https://arxiv.org/abs/2007.12163
Code: https://github.com/Andrew-Brown1/Smooth_AP
Video walkthrough: https://www.bilibili.com/video/BV1dD4y1m7fA
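1. Introduction
Image retrieval is usually framed as follows: given a query image containing a particular instance (a specific object, scene, or building, for example), find the images in a database that contain the same instance. Because viewpoint, lighting, and occlusion differ from image to image, designing a retrieval algorithm that is both effective and efficient under this intra-class variation remains an open research problem.
A typical image retrieval pipeline has three steps. First, extract a suitable representation vector from each image. Second, run nearest-neighbour search over these vectors, using Euclidean or cosine distance, to find similar images. Finally, optionally refine the retrieved results with post-processing. The quality of the extracted image representation is therefore the key factor in retrieval performance.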
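The second step of the pipeline (nearest-neighbour search by cosine similarity) can be sketched in a few lines. This is only an illustration in plain Python: the function names `l2_normalize` and `cosine_rank` and the toy 2-D vectors are assumptions made for the example, not part of the paper or its code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_rank(query, database):
    """Return (indices sorted from most to least similar, similarity per item)."""
    q = l2_normalize(query)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(d))) for d in database]
    order = sorted(range(len(database)), key=lambda i: sims[i], reverse=True)
    return order, sims


# Toy 2-D "embeddings" (hypothetical values, for illustration only).
database = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
order, sims = cosine_rank([2.0, 0.1], database)
print(order)  # item 0 points almost the same way as the query, item 1 is nearly orthogonal
```

In a real system the vectors would come from a CNN and the search would use a batched matrix product or an approximate nearest-neighbour index rather than a Python loop.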
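Unlike earlier loss functions based on metric learning, the authors propose a loss that optimises the ranking directly. The optimisation target is AP (Average Precision). AP is not differentiable, so they propose Smooth-AP: write AP in terms of rankings, then replace the non-differentiable part with a sigmoid function. Experiments on Stanford Online Products, VehicleID, INaturalist, VGGFace2, and IJB-C show good results. Illustration of the results: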
2. Method
Notations
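$\Omega = \{I_i,\ i = 0, \dots, m\}$: retrieval set
$I_q$: query instance
$\mathcal{P}_q$, $\mathcal{N}_q$: positive and negative sets, with $\Omega = \mathcal{P}_q \cup \mathcal{N}_q$
$\mathcal{S}_\Omega = \{ s_i = \langle v_q / \lVert v_q \rVert,\ v_i / \lVert v_i \rVert \rangle,\ i = 0, \dots, m \}$: cosine similarity. $\mathcal{S}_P$, $\mathcal{S}_N$: positive and negative relevance score sets, with $\mathcal{S}_\Omega = \mathcal{S}_P \cup \mathcal{S}_N$; $v_q$: query vector; $v_i$: vectorized retrieval set.
$AP_q = \frac{1}{|\mathcal{S}_P|} \sum_{i \in \mathcal{S}_P} \frac{\mathcal{R}(i, \mathcal{S}_P)}{\mathcal{R}(i, \mathcal{S}_\Omega)}$: AP. $\mathcal{R}(i, \mathcal{S}_P)$, $\mathcal{R}(i, \mathcal{S}_\Omega)$: the rankings of instance $i$ in the positive set and in the whole retrieval set.
$\mathcal{R}(i, \mathcal{S}) = 1 + \sum_{j \in \mathcal{S},\, j \neq i} \mathbb{1}\{(s_i - s_j) < 0\}$: ranking R. $\mathbb{1}\{\cdot\}$: an indicator function.
$D$: a difference matrix with entries $D_{ij} = s_j - s_i$, so that $\mathcal{R}(i, \mathcal{S}) = 1 + \sum_{j \in \mathcal{S},\, j \neq i} \mathbb{1}\{D_{ij} > 0\}$ and
$$AP_q = \frac{1}{|\mathcal{S}_P|} \sum_{i \in \mathcal{S}_P} \frac{1 + \sum_{j \in \mathcal{S}_P,\, j \neq i} \mathbb{1}\{D_{ij} > 0\}}{1 + \sum_{j \in \mathcal{S}_P,\, j \neq i} \mathbb{1}\{D_{ij} > 0\} + \sum_{j \in \mathcal{S}_N} \mathbb{1}\{D_{ij} > 0\}}.$$
Optimising AP therefore amounts to minimising $\sum_{j \in \mathcal{S}_N} \mathbb{1}\{D_{ij} > 0\}$ for every positive $i$: negatives should not be ranked ahead of positives.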
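AP for one query is the mean, over the positives, of (rank among positives) / (rank in the whole retrieval set), where each rank is one plus a count of indicator-function hits. The sketch below evaluates exactly that in plain Python; the function name `average_precision` and the toy scores are assumptions made for illustration.

```python
def average_precision(scores, is_positive):
    """Exact AP for one query, written with the ranking formulation.

    scores[i]      : relevance score s_i of retrieval-set item i
    is_positive[i] : True if item i belongs to the positive set
    """
    n = len(scores)
    pos = [i for i in range(n) if is_positive[i]]
    total = 0.0
    for i in pos:
        # rank among positives: 1 + number of other positives with s_j - s_i > 0
        rank_pos = 1 + sum(1 for j in pos if j != i and scores[j] - scores[i] > 0)
        # rank in the whole set: 1 + number of other items with s_j - s_i > 0
        rank_all = 1 + sum(1 for j in range(n) if j != i and scores[j] - scores[i] > 0)
        total += rank_pos / rank_all
    return total / len(pos)


# Toy example: positives sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2 = 5/6.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [True, False, True, False]
print(average_precision(scores, labels))
```

Moving the negative with score 0.8 below both positives would raise AP to 1, which is the sense in which optimising AP means pushing negatives out from in front of positives.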
Smooth AP
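The indicator function $\mathbb{1}\{\cdot\}$ above cannot be optimised with gradient-based methods.
It is therefore replaced by a sigmoid: $\mathcal{G}(x; \tau) = \frac{1}{1 + e^{-x/\tau}}$, where $\tau$ is the smoothing coefficient (temperature).
The AP approximation is rewritten as $\tilde{AP}_q$, where $D_{ij} = s_j - s_i$ is the difference between the relevance scores of instances $j$ and $i$:
$$\tilde{AP}_q = \frac{1}{|\mathcal{S}_P|} \sum_{i \in \mathcal{S}_P} \frac{1 + \sum_{j \in \mathcal{S}_P,\, j \neq i} \mathcal{G}(D_{ij}; \tau)}{1 + \sum_{j \in \mathcal{S}_P,\, j \neq i} \mathcal{G}(D_{ij}; \tau) + \sum_{j \in \mathcal{S}_N} \mathcal{G}(D_{ij}; \tau)}.$$
The loss function is: $\mathcal{L}_{AP} = \frac{1}{Q} \sum_{k=1}^{Q} (1 - \tilde{AP}_k)$, where $Q$ is the number of queries in the mini-batch.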
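To see the relaxation at work, here is a minimal single-query sketch in plain Python that swaps each indicator for a temperature-controlled sigmoid. The names `sigmoid` and `smooth_ap` and the toy scores are assumptions for illustration; a trainable version needs an autograd framework, as in the PyTorch class in this post.

```python
import math


def sigmoid(x, tau):
    """Temperature-controlled sigmoid G(x; tau) = 1 / (1 + exp(-x / tau))."""
    z = max(-50.0, min(50.0, -x / tau))  # clamp the exponent for numerical stability
    return 1.0 / (1.0 + math.exp(z))


def smooth_ap(scores, is_positive, tau=0.01):
    """Smooth-AP for one query: every indicator is replaced by a sigmoid."""
    n = len(scores)
    pos = [i for i in range(n) if is_positive[i]]
    neg = [i for i in range(n) if not is_positive[i]]
    total = 0.0
    for i in pos:
        # soft count of positives / negatives scored above item i (D_ij = s_j - s_i)
        pos_term = sum(sigmoid(scores[j] - scores[i], tau) for j in pos if j != i)
        neg_term = sum(sigmoid(scores[j] - scores[i], tau) for j in neg)
        total += (1.0 + pos_term) / (1.0 + pos_term + neg_term)
    return total / len(pos)


scores = [0.9, 0.8, 0.7, 0.6]
labels = [True, False, True, False]  # exact AP of this ranking is 5/6
tight = smooth_ap(scores, labels, tau=0.01)
loose = smooth_ap(scores, labels, tau=1.0)
print(tight, loose, 1.0 - tight)  # the last value is the loss for this query
```

With the small temperature the estimate sits very close to the exact AP of 5/6, while the large temperature gives a much looser estimate.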
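The paper also makes three analytical points.
First, the smaller the smoothing coefficient, the closer the approximation is to the true AP. A larger smoothing coefficient gives a wider operating region, meaning the range of score differences over which the sigmoid's derivative (the curve shown in Figure 2) is non-negligible, and so provides more gradient information.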
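The trade-off can be checked numerically from the sigmoid's derivative, $\mathcal{G}'(x; \tau) = \mathcal{G}(x; \tau)\,(1 - \mathcal{G}(x; \tau)) / \tau$. The sketch below is an illustration only; the function name `sigmoid_grad` and the chosen score differences and temperatures are assumptions for the example.

```python
import math


def sigmoid_grad(x, tau):
    """Derivative of G(x; tau) = 1 / (1 + exp(-x / tau)) with respect to x."""
    g = 1.0 / (1.0 + math.exp(-x / tau))
    return g * (1.0 - g) / tau


# Gradient at a score difference of 0 (on the decision boundary) and 0.1 (away from it).
for tau in (0.01, 0.1):
    print(tau, sigmoid_grad(0.0, tau), sigmoid_grad(0.1, tau))
```

A small temperature concentrates the gradient in a narrow band around zero, so pairs whose scores differ by 0.1 contribute almost nothing. A larger temperature spreads a weaker peak over a wider band, so those pairs still receive a useful gradient.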
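Second, triplet loss behaves more like a metric loss than a ranking objective.
Third, compared with the other AP-optimising methods, FastAP and Blackbox AP, Smooth-AP is simpler and approximates AP more accurately. Like triplet loss, those two methods may also behave more like metric losses.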
import torch


# Helpers called by SmoothAP below.
def sigmoid(tensor, temp=1.0):
    """Temperature-controlled sigmoid: 1 / (1 + exp(-tensor / temp))."""
    # clamp the exponent for numerical stability
    exponent = torch.clamp(-tensor / temp, min=-50, max=50)
    return 1.0 / (1.0 + torch.exp(exponent))


def compute_aff(x):
    """Dot-product affinity matrix of x with itself.

    This equals cosine similarity only if the rows of x are L2-normalised.
    """
    return torch.mm(x, x.t())


class SmoothAP(torch.nn.Module):
    """PyTorch implementation of the Smooth-AP loss.

    Takes as input the mini-batch of CNN-produced feature embeddings and returns
    the value of the Smooth-AP loss. The mini-batch must be formed of a defined number of classes. Each class must
    have the same number of instances represented in the mini-batch and must be ordered sequentially by class.
    e.g. the labels for a mini-batch with batch size 9, and 3 represented classes (A,B,C) must look like:
        labels = ( A, A, A, B, B, B, C, C, C)
    (the order of the classes however does not matter)

    For each instance in the mini-batch, the loss computes the Smooth-AP when it is used as the query and the rest of the
    mini-batch is used as the retrieval set. The positive set is formed of the other instances in the batch from the
    same class. The loss returns the average Smooth-AP across all instances in the mini-batch.

    Args:
        anneal : float
            the temperature of the sigmoid that is used to smooth the ranking function. A low value of the temperature
            results in a steep sigmoid, that tightly approximates the heaviside step function in the ranking function.
        batch_size : int
            the batch size being used during training.
        num_id : int
            the number of different classes that are represented in the batch.
        feat_dims : int
            the dimension of the input feature embeddings

    Shape:
        - Input (preds): (batch_size, feat_dims) (a torch float tensor, on CPU or CUDA)
        - Output: scalar

    Examples::
        >>> loss = SmoothAP(0.01, 60, 6, 256)
        >>> input = torch.randn(60, 256, requires_grad=True)
        >>> output = loss(input)
        >>> output.backward()
    """

    def __init__(self, anneal, batch_size, num_id, feat_dims):
        """See the class docstring for a description of the parameters."""
        super(SmoothAP, self).__init__()
        # every class must contribute the same number of instances
        assert batch_size % num_id == 0
        self.anneal = anneal
        self.batch_size = batch_size
        self.num_id = num_id
        self.feat_dims = feat_dims

    def forward(self, preds):
        """Forward pass for all input predictions: preds - (batch_size x feat_dims)"""
        device = preds.device  # build masks on the same device as the input
        group = self.batch_size // self.num_id  # instances per class

        # ------ differentiable ranking of all retrieval set ------
        # compute the mask which ignores the relevance score of the query to itself
        mask = 1.0 - torch.eye(self.batch_size, device=device)
        mask = mask.unsqueeze(dim=0).repeat(self.batch_size, 1, 1)
        # compute the relevance scores via cosine similarity of the CNN-produced embedding vectors
        sim_all = compute_aff(preds)
        sim_all_repeat = sim_all.unsqueeze(dim=1).repeat(1, self.batch_size, 1)
        # compute the difference matrix: entry [q, i, j] is s_j - s_i for query q
        sim_diff = sim_all_repeat - sim_all_repeat.permute(0, 2, 1)
        # pass through the sigmoid
        sim_sg = sigmoid(sim_diff, temp=self.anneal) * mask
        # compute the rankings
        sim_all_rk = torch.sum(sim_sg, dim=-1) + 1

        # ------ differentiable ranking of only positive set in retrieval set ------
        # compute the mask which only gives non-zero weights to the positive set
        xs = preds.view(self.num_id, group, self.feat_dims)
        pos_mask = 1.0 - torch.eye(group, device=device)
        pos_mask = pos_mask.unsqueeze(dim=0).unsqueeze(dim=0).repeat(self.num_id, group, 1, 1)
        # compute the relevance scores
        sim_pos = torch.bmm(xs, xs.permute(0, 2, 1))
        sim_pos_repeat = sim_pos.unsqueeze(dim=2).repeat(1, 1, group, 1)
        # compute the difference matrix
        sim_pos_diff = sim_pos_repeat - sim_pos_repeat.permute(0, 1, 3, 2)
        # pass through the sigmoid
        sim_pos_sg = sigmoid(sim_pos_diff, temp=self.anneal) * pos_mask
        # compute the rankings of the positive set
        sim_pos_rk = torch.sum(sim_pos_sg, dim=-1) + 1

        # sum the values of the Smooth-AP for all instances in the mini-batch
        ap = torch.zeros(1, device=device)
        for ind in range(self.num_id):
            # rows/columns of sim_all_rk that belong to class `ind`
            block = slice(ind * group, (ind + 1) * group)
            pos_divide = torch.sum(sim_pos_rk[ind] / sim_all_rk[block, block])
            ap = ap + ((pos_divide / group) / self.batch_size)
        return 1 - ap
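The loss expects every mini-batch to contain `num_id` classes with the same number of instances each, laid out class by class (A, A, A, B, B, B, ...). One way to build such a batch is sketched below with the standard library only. The function `sample_class_ordered_batch` and its parameters are hypothetical and are not taken from the paper's repository; a real training loop would usually wrap this logic in a PyTorch batch sampler.

```python
import random
from collections import defaultdict


def sample_class_ordered_batch(labels, num_id, per_class, seed=0):
    """Pick `num_id` classes and `per_class` items from each.

    Returns dataset indices grouped class by class, which is the layout
    the loss assumes for its input mini-batch.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    # only classes with enough instances can fill a full group
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= per_class]
    chosen = rng.sample(eligible, num_id)
    batch = []
    for c in chosen:
        batch.extend(rng.sample(by_class[c], per_class))
    return batch


# Toy label list (hypothetical): class D has too few instances to be picked.
labels = ["A"] * 5 + ["B"] * 4 + ["C"] * 6 + ["D"] * 2
batch = sample_class_ordered_batch(labels, num_id=3, per_class=3)
print([labels[i] for i in batch])
```

The resulting index list has length `num_id * per_class`, which must match the `batch_size` passed to the loss.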
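3. Experiments
Datasets:
Results on SOP and the AP approximation; the smaller the smoothing coefficient, the more accurate the approximation:
Results on the VehicleID and INaturalist datasets:
Results on the face datasets:
Ablation study. In the second table, a smaller P (positive set size) means more other classes in each batch and therefore more negatives. Negatives are then more likely to be ranked ahead of positives, which gives the network more to learn from.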
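The arithmetic behind this observation is simple. Assuming, as in the loss above, that each class contributes the same number of instances and that each instance is used as a query against the rest of the batch, the counts per query work out as follows. The function name `batch_composition` and the batch size of 112 are illustrative assumptions.

```python
def batch_composition(batch_size, per_class):
    """Return (classes in the batch, positives per query, negatives per query)."""
    if batch_size % per_class != 0:
        raise ValueError("batch_size must be divisible by per_class")
    num_classes = batch_size // per_class
    positives = per_class - 1          # same-class items other than the query itself
    negatives = batch_size - per_class  # everything from the other classes
    return num_classes, positives, negatives


# Illustrative batch size; fewer instances per class leaves room for more negatives.
for per_class in (4, 8, 16):
    print(per_class, batch_composition(112, per_class))
```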
Qualitative results: