Margin Based Loss

大坡山小霸王

Last modified 2022-06-30 23:35:34

First published 2022-06-28 23:02:29

Article link: https://blog.csdn.net/weixin_44742887/article/details/125297945

17-ICCV-Sampling Matters in Deep Embedding Learning

semi-hard negative mining

Distance weighted sampling

Margin based loss

Relationship to isotonic regression

17-ICCV-Sampling Matters in Deep Embedding Learning

Preliminaries

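In my view, the paper's main contribution is the loss function it proposes later (which combines contrastive and triplet loss), not the sampling strategy introduced first. This "uniform" sampling strategy runs counter to almost every other paper, which relies on hard (non-trivial) examples.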

contrastive loss

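Positive pairs are pulled as close together as possible; negative pairs are pushed apart to at least a fixed distance α.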

Drawback: visually diverse classes are embedded in the same small space as visually similar ones. The embedding space does not allow for distortions.
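For reference, the contrastive loss as written in the paper is ℓ_contrast(i, j) = y_ij·D_ij² + (1 − y_ij)·[α − D_ij]₊², where D_ij = ‖f(x_i) − f(x_j)‖ and y_ij = 1 for a positive pair, 0 otherwise. The NumPy sketch below evaluates it per pair; the function name `contrastive_loss` and the default α = 1.0 are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np


def contrastive_loss(emb_i, emb_j, y, alpha=1.0):
    """Per-pair contrastive loss y*D^2 + (1 - y)*[alpha - D]_+^2 (sketch).

    emb_i, emb_j: (B, dim) embeddings of the two elements of each pair.
    y: (B,) array with 1 for positive pairs and 0 for negative pairs.
    alpha: fixed margin for negatives (illustrative default, not from the paper).
    """
    d = np.linalg.norm(emb_i - emb_j, axis=1)              # D_ij
    pos_term = y * d ** 2                                   # pull positives together
    neg_term = (1 - y) * np.maximum(alpha - d, 0.0) ** 2    # push negatives out to alpha
    return pos_term + neg_term


emb_i = np.zeros((3, 2))
emb_j = np.array([[0.5, 0.0], [0.3, 0.0], [1.5, 0.0]])
y = np.array([1, 0, 0])
losses = contrastive_loss(emb_i, emb_j, y)   # positive, close negative, far negative
```

The far negative (D = 1.5 > α) contributes nothing, while every negative inside α is pushed toward the same fixed radius regardless of how similar its class is to the anchor's; this is the rigidity described above.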

triplet loss

Triplet loss is always used together with a sampling strategy (loss + sampling strategy).

Advantages: it allows the embedding space to be arbitrarily distorted and does not impose a constant (absolute) margin α.
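The triplet loss in the paper is ℓ_triplet(a, p, n) = [D_ap² − D_an² + α]₊. It only compares D_ap with D_an, so negatives have no absolute target distance, which is why the space can distort. A minimal NumPy sketch; the function name and α = 0.2 are illustrative assumptions.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Per-triplet loss [D_ap^2 - D_an^2 + alpha]_+ (sketch; alpha is illustrative)."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)   # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative, axis=1)   # anchor-negative distance
    return np.maximum(d_ap ** 2 - d_an ** 2 + alpha, 0.0)


anchor = np.zeros((2, 2))
positive = np.array([[0.5, 0.0], [0.5, 0.0]])
negative = np.array([[1.0, 0.0], [0.4, 0.0]])
losses = triplet_loss(anchor, positive, negative)   # easy triplet, violating triplet
```

The first triplet already satisfies the margin and yields zero loss; only the second, whose negative is closer than the positive, produces a gradient.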

hard negative mining

→ can lead to a collapsed model (e.g., f(x) = 0 for every input)

semi-hard negative mining

online selection: one triplet is sampled for every (a, p) pair

offline selection: a batch has 1/3 of its images as anchors, 1/3 as positives, and 1/3 as negatives
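To contrast the two mining rules above: hard negative mining picks the closest negative, n* = argmin_n D_an, while semi-hard mining, as summarized in the paper, picks the closest negative that is still farther than the positive, n*_ap = argmin_{n : D_an > D_ap} D_an. The sketch below applies the semi-hard rule online, one negative per (a, p) pair. All function names, and returning `None` when a pair has no semi-hard negative, are hypothetical choices for illustration.

```python
import numpy as np


def pairwise_distances(emb):
    """Euclidean distance matrix for (B, dim) embeddings."""
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    return np.sqrt(np.maximum(d2, 0.0))   # clamp tiny negatives caused by round-off


def select_negative(dist, labels, a, p, mode="semi-hard"):
    """Pick one negative index for the (anchor a, positive p) pair."""
    neg = np.flatnonzero(labels != labels[a])   # all candidate negatives
    d_an = dist[a, neg]
    if mode == "hardest":
        return int(neg[np.argmin(d_an)])         # closest negative overall
    farther = d_an > dist[a, p]                  # semi-hard: must be farther than the positive
    if not farther.any():
        return None                              # hypothetical fallback: no semi-hard negative
    cand = neg[farther]
    return int(cand[np.argmin(d_an[farther])])   # closest among those


def online_semi_hard_triplets(emb, labels):
    """One semi-hard triplet per (a, p) pair in the batch."""
    dist = pairwise_distances(emb)
    triplets = []
    for a in range(len(labels)):
        for p in np.flatnonzero(labels == labels[a]):
            if p == a:
                continue
            n = select_negative(dist, labels, a, int(p))
            if n is not None:
                triplets.append((a, int(p), n))
    return triplets


emb = np.array([[0.0], [0.3], [0.2], [0.5], [2.0]])
labels = np.array([0, 0, 1, 1, 1])
dist = pairwise_distances(emb)
hardest = select_negative(dist, labels, 0, 1, mode="hardest")
semi_hard = select_negative(dist, labels, 0, 1)
triplets = online_semi_hard_triplets(emb, labels)
```

For anchor 0 with positive 1 (D_ap = 0.3), the hardest negative is index 2 (D = 0.2, closer than the positive), whereas the semi-hard rule skips it and picks index 3 (D = 0.5).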

With the right sampling strategy, even a simple pairwise loss is highly effective.

Distance weighted sampling

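Pairwise distance distribution on the unit sphere in n dimensions: fix a point a on the sphere and pick another point uniformly at random on the sphere; q(d) is the probability density that the distance between this point and a equals d. Here d is the Euclidean (chord) distance, d ∈ [0, 2], which is what the 1 − d²/4 factor reflects: q(d) ∝ d^(n−2) · [1 − d²/4]^((n−3)/2).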

In high-dimensional space, q(d) approaches the normal distribution N(√2, 1/(2n)).
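The concentration around √2 can be checked from the closed form of q(d). Setting the derivative of log q(d) = (n − 2)·log d + ((n − 3)/2)·log(1 − d²/4) to zero gives the mode d² = 4(n − 2)/(2n − 5), which tends to 2 as n grows. The NumPy sketch below finds the mode on a grid; `log_q` and `mode_of_q` are hypothetical helper names.

```python
import numpy as np


def log_q(d, n):
    """Unnormalized log-density of the chord distance d between two points drawn
    uniformly on the unit sphere in R^n (valid for 0 < d < 2, n >= 3)."""
    return (n - 2) * np.log(d) + (n - 3) / 2.0 * np.log(1.0 - 0.25 * d ** 2)


def mode_of_q(n, num=200001):
    """Grid search for the most likely distance."""
    d = np.linspace(1e-6, 2.0 - 1e-6, num)   # stay strictly inside (0, 2) so log() is finite
    return d[np.argmax(log_q(d, n))]


mode_128 = mode_of_q(128)   # analytic value: sqrt(4 * 126 / 251)
mode_512 = mode_of_q(512)   # closer still to sqrt(2)
```

As n increases the mode moves toward √2 and the density becomes sharply peaked there, which is the behavior the normal approximation describes.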

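If the negatives are spread uniformly and we sample them at random, we most likely get examples at distance about √2 from the anchor. For thresholds (margins) smaller than √2 these incur no loss, so training makes no progress.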
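A quick Monte Carlo check of this argument, as a sketch: draw an anchor and 10,000 "negatives" uniformly on the unit sphere in R^128 (normalized Gaussian vectors), then count how many fall inside a hinge margin α < √2. The margin value 1.2 and the helper name `random_sphere_points` are hypothetical choices.

```python
import numpy as np


def random_sphere_points(num, n, rng):
    """Uniform samples on the unit sphere in R^n via normalized Gaussians."""
    x = rng.standard_normal((num, n))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


rng = np.random.default_rng(0)
n = 128
anchor = random_sphere_points(1, n, rng)[0]
negatives = random_sphere_points(10000, n, rng)
d = np.linalg.norm(negatives - anchor, axis=1)   # anchor-to-negative chord distances

alpha = 1.2                     # hypothetical margin below sqrt(2)
active = np.mean(d < alpha)     # fraction of negatives with nonzero hinge [alpha - d]_+
print(f"mean={d.mean():.3f} std={d.std():.4f} active fraction={active:.4f}")
```

The distances cluster tightly around √2 with a standard deviation near sqrt(1/(2n)) ≈ 0.0625, so almost no randomly drawn negative lands inside the margin.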

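Gradient with respect to a negative example: writing h_an = f(x_a) − f(x_n), the gradient of the loss with respect to f(x_n) has the form (h_an / ‖h_an‖) · w(‖h_an‖), where w(·) is a scalar function.

The unit vector h_an / ‖h_an‖ determines the gradient direction. If ‖h_an‖ is very small (a hard negative) and the embedding carries noise z, the direction (h_an + z) / ‖h_an + z‖ is dominated by the noise.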
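To illustrate the noise argument numerically, the sketch below compares the clean direction h/‖h‖ with the noisy direction (h + z)/‖h + z‖ for a short h (hard negative) and a long h (easier negative). The isotropic Gaussian noise, σ = 0.1, and dimension 32 are hypothetical assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, sigma, trials = 32, 0.1, 2000
direction = np.zeros(dim)
direction[0] = 1.0   # true unit direction of h_an


def mean_cosine(norm_h):
    """Average cosine between h/||h|| and the noisy (h + z)/||h + z||."""
    h = norm_h * direction
    z = sigma * rng.standard_normal((trials, dim))   # hypothetical isotropic noise
    noisy = h + z
    noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)
    return float(np.mean(noisy @ direction))


cos_hard = mean_cosine(0.05)   # hard negative: small ||h_an||
cos_easy = mean_cosine(1.0)    # easier negative: large ||h_an||
```

With ‖h_an‖ = 0.05 the noisy direction is nearly uncorrelated with the true one, while with ‖h_an‖ = 1 it stays well aligned.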

Probability that negative n is selected for anchor a: Pr(n* = n | a) ∝ min(λ, q⁻¹(D_an))

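Sample uniformly with respect to distance by weighting each negative by q(D_an)⁻¹ (the sampling probability is inversely proportional to how often that distance occurs: the more likely a distance is under random sampling, the smaller its weight, which makes the result uniform) [this keeps the samples from all clustering around √2]
Clip the weights at λ [samples that are very close or very far are rarely drawn at random, so their weights are large; to avoid noisy samples, λ keeps the weights at both ends from growing too large]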
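The two rules above can be sketched in NumPy for a single anchor: compute log q(D_an) from the sphere distribution, take its negative as the log weight, clip it at log λ, mask out same-class entries, normalize, and sample. Working in log space keeps the very large and very small weights finite. The function names and λ = 1e3 are hypothetical illustration choices, and the distances are assumed to come from unit-norm embeddings (so 0 ≤ D ≤ 2).

```python
import numpy as np


def distance_weighted_probs(dists, labels, anchor_label, dim, lam=1e3, eps=1e-6):
    """Pr(n* = n | a) proportional to min(lam, 1 / q(D_an)), negatives only (sketch)."""
    d = np.clip(dists, eps, 2.0 - eps)                 # keep log() finite
    log_q = (dim - 2) * np.log(d) + (dim - 3) / 2.0 * np.log(1.0 - 0.25 * d ** 2)
    log_w = np.minimum(np.log(lam), -log_q)            # clip weights at lambda, in log space
    w = np.exp(log_w - log_w.max())                    # rescale for numerical stability
    w[labels == anchor_label] = 0.0                    # never sample the anchor or positives
    total = w.sum()
    if total == 0.0:
        raise ValueError("no negatives available for this anchor")
    return w / total


def sample_negatives(dists, labels, anchor_label, dim, k, rng, lam=1e3):
    """Draw k negative indices (with replacement) for one anchor."""
    p = distance_weighted_probs(dists, labels, anchor_label, dim, lam)
    return rng.choice(len(dists), size=k, replace=True, p=p)


dists = np.array([0.0, 0.8, 1.0, 1.414, 1.6])   # anchor itself, one positive, three negatives
labels = np.array([0, 0, 1, 1, 1])
probs = distance_weighted_probs(dists, labels, anchor_label=0, dim=128)
samples = sample_negatives(dists, labels, 0, 128, k=500, rng=np.random.default_rng(0))
```

In 128 dimensions the negative at distance ≈ √2 (the most common distance) gets the smallest probability, and the one at distance 1.0 hits the λ cap. Unlike the PyTorch snippet below, this sketch applies the λ clip explicitly.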

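Distance weighted sampling covers a wide range of distances and steadily produces informative examples while keeping the variance under control.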

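    import numpy as np
    import torch

    def inverse_sphere_distances(self, batch, anchor_to_all_dists, labels, anchor_label):
        # Method of a sampler class: returns sampling probabilities over the batch for one anchor.
        # anchor_to_all_dists is assumed to be clamped away from 0 (and kept below 2) by the
        # caller; otherwise the log() terms below become inf/nan.
        dists = anchor_to_all_dists
        dim = batch.shape[-1]  # embedding dimension n

        # Negated log-distribution of distances on the unit sphere in dimension <dim>:
        # log q(d)^-1 with q(d) ∝ d^(n-2) * (1 - d^2/4)^((n-3)/2)
        log_q_d_inv = ((2.0 - float(dim)) * torch.log(dists)
                       - (float(dim - 3) / 2) * torch.log(1.0 - 0.25 * dists.pow(2)))
        # Neutralize same-class entries (incl. the anchor) so they do not dominate the max below
        log_q_d_inv[np.where(labels == anchor_label)[0]] = 0

        q_d_inv = torch.exp(log_q_d_inv - torch.max(log_q_d_inv))  # - max(log) for stability
        q_d_inv[np.where(labels == anchor_label)[0]] = 0           # positives are never sampled

        ### NOTE: Cutting off values with high distances made the results slightly worse. It can also lead to
        # errors where there are no available negatives (for high samples_per_class cases).
        # q_d_inv[np.where(dists.detach().cpu().numpy()>self.upper_cutoff)[0]]    = 0

        q_d_inv = q_d_inv / q_d_inv.sum()  # normalize to a probability distribution
        return q_d_inv.detach().cpu().numpy()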

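[Actual implementation: the weights are computed from the log of the distance distribution and no λ clipping is applied; according to the code comment, additionally cutting off large distances made results slightly worse, so dropping the cutoff may work better.]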