Loss Functions in Contrastive Learning

Latest recommended article published 2025-04-09 13:42:49

IMU_YY



Article link: https://blog.csdn.net/yyhaohaoxuexi/article/details/113824125



Preface

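I have recently been running experiments based on contrastive learning. GitHub has many implementations, and they can be used as-is, but on closer inspection the loss-function part left me quite confused, so I worked through it and am writing it down here. Plenty has already been written online about contrastive learning itself, so I won't repeat it; this post focuses on two ways of implementing InfoNCE.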

1. Info Noise-Contrastive Estimation (InfoNCE)

1.1 Description

In MoCo, InfoNCE is described as:
$\mathcal{L}_{q}=-\log \frac{\exp \left(q \cdot k_{+} / \tau\right)}{\sum_{i=0}^{K} \exp \left(q \cdot k_{i} / \tau\right)} \tag{1}$
where $\tau$ is a temperature hyperparameter.

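Numerator: the dot product of $q$ and $k_+$ (divided by $\tau$ and exponentiated). The dot product measures how similar the two vectors $q$ and $k_+$ are: the larger it is, the more similar they are (for L2-normalized vectors it is their cosine similarity), so it is a similarity rather than a distance.
Denominator: the same quantity computed between $q$ and every key $k$. "Every key" means the positive sample plus all negative samples, so the sum runs from $i = 0$ to $K$, for a total of $K + 1$ terms.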
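
To make Eq. (1) concrete, here is a minimal pure-Python sketch for a single query. It is only an illustration under my own assumptions: the function name info_nce, plain lists instead of tensors, and the default tau value are all hypothetical and do not come from MoCo.

```python
import math


def info_nce(q, k_pos, k_negs, tau=0.07):
    """Eq. (1) for one query: -log(exp(q.k+/tau) / sum_{i=0..K} exp(q.k_i/tau)).

    q and k_pos are equal-length lists of floats; k_negs holds the K negative keys.
    The positive key is also part of the denominator, so it has K + 1 terms.
    The default tau is only a placeholder value (an assumption).
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    pos = math.exp(dot(q, k_pos) / tau)
    denom = pos + sum(math.exp(dot(q, k) / tau) for k in k_negs)
    return -math.log(pos / denom)


# Query aligned with its positive key, plus two negative keys, tau = 1.
print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]], tau=1.0))
```

When the query is equally similar to every key, the loss is log(K + 1). It gets smaller as q moves closer to k_+ than to the negatives.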

1.2 Implementation

In the MoCo source code, \moco\builder.py implements it as follows:

	# compute logits
	# Einstein sum is more intuitive
	# positive logits: Nx1
	l_pos = torch.einsum('nc,nc->n', [q, k]).unsqueeze(-1)
	# negative logits: NxK
	l_neg = torch.einsum('nc,ck->nk', [q, self.queue.clone().detach()])
	
	# logits: Nx(1+K)
	logits = torch.cat([l_pos, l_neg], dim=1)
	
	# apply temperature
	logits /= self.T
	
	# labels: positive key indicators
	labels = torch.zeros(logits.shape[0], dtype=torch.long).cuda()
	...
	return logits, labels

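I also looked up what the variable name logits means. It refers to the raw, unnormalized scores before softmax is applied. They are not probabilities yet; softmax is what turns them into probabilities.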

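The code is easy to follow from its comments:

- l_pos holds the positive-sample score for each query (N×1).
- l_neg holds the scores of all negative samples (N×K).
- logits concatenates the two along the column dimension (N×(1+K)), so the positive score always ends up in column 0.

What deserves attention is how labels is built. It is a tensor of zeros whose length is logits.shape[0], i.e. batch_size zeros.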
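
The layout described above can be mirrored without PyTorch. The following pure-Python sketch builds the same N×(1+K) logits and all-zero labels. It makes several assumptions of its own: the name moco_logits is hypothetical, explicit loops replace torch.einsum, and the queue is stored as a list of K vectors rather than MoCo's C×K tensor. It also skips the feature normalization and queue update that the real builder.py performs.

```python
def moco_logits(q, k, queue, T=0.07):
    """Pure-Python analogue of the logits/labels construction in builder.py.

    q, k: N query / key vectors (lists of floats), k[n] being the positive for q[n].
    queue: K negative keys, stored as a list of vectors (an assumption, not C x K).
    Returns an N x (1 + K) logits matrix and a list of N labels.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    logits = []
    for q_n, k_n in zip(q, k):
        l_pos = [dot(q_n, k_n)]                        # column 0: positive logit
        l_neg = [dot(q_n, neg) for neg in queue]       # columns 1..K: negative logits
        logits.append([v / T for v in l_pos + l_neg])  # apply temperature
    labels = [0] * len(logits)                         # positive always sits in column 0
    return logits, labels
```

Because every row stores its positive in the first column, the "correct class" index is 0 for every sample. That is all the zeros in labels encode.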

Next, the loss-function part, in \main_moco.py:

	# define loss function (criterion) and optimizer
	criterion = nn.CrossEntropyLoss().cuda(args.gpu)
	...
	# compute output
	output, target = model(im_q=images[0], im_k=images[1])
	loss = criterion(output, target)

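Here the cross-entropy between the output logits and the generated labels is computed directly and used as the model's loss. This is the part I did not quite understand. Let's keep the question in mind for now.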

2. HCL

2.1 Description

The paper "Contrastive Learning with Hard Negative Samples" gives the loss function with negative samples as:
$\mathbb{E}_{x \sim p, x^{+} \sim p_{x}^{+}}\left[-\log \frac{e^{f(x)^{T} f\left(x^{+}\right)}}{e^{f(x)^{T} f\left(x^{+}\right)}+\frac{Q}{N} \sum_{i=1}^{N} e^{f(x)^{T} f\left(x_{i}^{-}\right)}}\right] \tag{2}$

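Numerator: $e^{f(x)^{T} f(x^{+})}$ is the exponential of the dot product between the learned representation $f(x)$ and the positive sample's representation $f(x^+)$. In other words, it is the positive sample's score.
Denominator: the first term is the positive sample's score. The second term is the sum of the scores of the $N$ negative samples $x_i^-$, scaled by $Q/N$; with $Q = N$ it is a plain sum.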

In essence, this follows the same principle as InfoNCE. Both compute mean(-log(positive score / sum of all scores)).
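
The pure-Python sketch below checks the claim that Eq. (2) has the same shape as InfoNCE. It evaluates Eq. (2) for a single anchor, which replaces the expectation with one sample; that simplification is my own. The function name hcl_objective and the default Q = N are also my choices. Eq. (2) as written has no temperature, so the sketch has none either.

```python
import math


def hcl_objective(fx, f_pos, f_negs, Q=None):
    """Eq. (2) for one anchor x (expectation replaced by a single sample).

    fx, f_pos: the representations f(x) and f(x+); f_negs: the N negatives f(x_i^-).
    The negative term is weighted by Q / N; Q defaults to N here (an assumption),
    which turns it into a plain sum, exactly the InfoNCE form with tau = 1.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    n = len(f_negs)
    if Q is None:
        Q = n
    pos = math.exp(dot(fx, f_pos))
    neg = (Q / n) * sum(math.exp(dot(fx, f)) for f in f_negs)
    return -math.log(pos / (pos + neg))
```

With Q = N this is -log(positive / (positive + negatives)), the same quantity as Eq. (1) with τ = 1.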

2.2 Implementation

In the implementation accompanying this paper (\image\main.py), however, the loss is written as follows:

	import numpy as np
	import torch
	
	# temperature, device and get_negative_mask are not defined inside this
	# function; in this excerpt they must come from the enclosing script.
	def criterion(out_1,out_2,tau_plus,batch_size,beta, estimator):
		# neg score: exp(similarity / temperature) between all 2 * batch_size samples
		out = torch.cat([out_1, out_2], dim=0)
		neg = torch.exp(torch.mm(out, out.t().contiguous()) / temperature)
		old_neg = neg.clone()  # not used further in this excerpt
		# keep, for each row, the 2 * batch_size - 2 negatives (drop self and the positive)
		mask = get_negative_mask(batch_size).to(device)
		neg = neg.masked_select(mask).view(2 * batch_size, -1)
		
		# pos score: out_1[i] and out_2[i] are the two views of sample i
		pos = torch.exp(torch.sum(out_1 * out_2, dim=-1) / temperature)
		pos = torch.cat([pos, pos], dim=0)
		
		# negative samples similarity scoring
		if estimator=='hard':
			N = batch_size * 2 - 2
			imp = (beta* neg.log()).exp()
			reweight_neg = (imp*neg).sum(dim = -1) / imp.mean(dim = -1)
			Ng = (-tau_plus * N * pos + reweight_neg) / (1 - tau_plus)
			# constrain (optional)
			Ng = torch.clamp(Ng, min = N * np.e**(-1 / temperature))
		elif estimator=='easy':
			Ng = neg.sum(dim=-1)
		else:
			raise Exception('Invalid estimator selected. Please use any of [hard, easy]')
		
		# contrastive loss
		loss = (- torch.log(pos / (pos + Ng) )).mean()
		
		return loss
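
The following pure-Python sketch shows what this criterion computes row by row. It is my own re-implementation, not the authors' code, and it makes several assumptions:

- The name hcl_loss and the default argument values are hypothetical.
- temperature is passed as a parameter instead of being read from the script.
- An explicit index loop stands in for get_negative_mask: each of the 2B rows drops itself and its partner view.
- Unknown estimators raise ValueError rather than Exception.

```python
import math


def hcl_loss(out_1, out_2, temperature=0.5, tau_plus=0.1, beta=1.0, estimator="easy"):
    """Pure-Python sketch of the HCL criterion for a batch of B pairs.

    out_1[i] and out_2[i] are two views of sample i (lists of floats). Row j of
    the 2B stacked samples uses its partner view as the positive; the other
    2B - 2 samples are its negatives.
    """
    out = out_1 + out_2                      # like torch.cat([out_1, out_2], dim=0)
    b = len(out_1)
    n = 2 * b - 2                            # negatives per row
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    losses = []
    for j in range(2 * b):
        partner = (j + b) % (2 * b)          # the other view of the same sample
        pos = math.exp(dot(out[j], out[partner]) / temperature)
        # Mask: skip self-similarity (k == j) and the positive pair (k == partner).
        neg = [math.exp(dot(out[j], out[k]) / temperature)
               for k in range(2 * b) if k not in (j, partner)]
        if estimator == "hard":
            imp = [x ** beta for x in neg]   # (beta * neg.log()).exp() == neg ** beta
            reweight_neg = sum(w * x for w, x in zip(imp, neg)) / (sum(imp) / len(imp))
            ng = (-tau_plus * n * pos + reweight_neg) / (1 - tau_plus)
            ng = max(ng, n * math.e ** (-1 / temperature))   # the optional clamp
        elif estimator == "easy":
            ng = sum(neg)
        else:
            raise ValueError("estimator must be 'hard' or 'easy'")
        losses.append(-math.log(pos / (pos + ng)))
    return sum(losses) / len(losses)
```

Setting beta = 0 makes every importance weight equal to 1, so reweight_neg is just the plain sum of negatives. Setting tau_plus = 0 as well removes the debiasing term. Together, the "hard" estimator collapses to the "easy" one.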

As you can see, the final loss is computed as:

	loss = (- torch.log(pos / (pos + Ng) )).mean()

This indeed matches my understanding above. But why does this implementation not need an all-zeros label at all?

3. Explanation

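Since these are two implementations of the same method, and I already understand the second one (HCL), the problem comes down to this: why are the labels in the first implementation generated this way? So I looked up how nn.CrossEntropyLoss computes the loss for a single sample: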
$\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right)= -x[class] + \log\left(\sum_j \exp(x[j])\right) \tag{3}$

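In cross-entropy, the label is used as an index to pick the entry $x[class]$ out of $x$, and that entry is treated as the correct class. If every label is 0, this means that the first column of $x$ is the correct class (the positive sample) and all the other columns are negative samples. Substituting into Eq. (3) gives $-\log\left(\frac{\exp(x[0])}{\sum_j \exp(x[j])}\right)$. In MoCo, $x[0] = q \cdot k_{+} / \tau$ and the remaining columns are $q \cdot k_i / \tau$ for the negatives, so this is exactly Eq. (1). CrossEntropyLoss then averages it over the batch with its default mean reduction. The all-zeros labels therefore only tell the cross-entropy which column holds the positive score. The HCL code writes the same -log(positive / all) form out explicitly as -log(pos / (pos + Ng)), so it needs no labels.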
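
A quick numerical check of this equivalence is sketched below in pure Python. cross_entropy re-implements Eq. (3) (with the usual max-subtraction trick for numerical stability) rather than calling torch.nn.CrossEntropyLoss. The scores l_pos, l_neg and tau are made-up toy values, and both function names are hypothetical.

```python
import math


def cross_entropy(x, cls):
    """Eq. (3) for one row of logits x and an integer class index cls."""
    m = max(x)  # subtract the max before exp for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in x))
    return -x[cls] + log_sum_exp


def info_nce_from_scores(l_pos, l_neg, tau):
    """-log(pos / (pos + sum(neg))) computed directly from raw dot products."""
    pos = math.exp(l_pos / tau)
    return -math.log(pos / (pos + sum(math.exp(v / tau) for v in l_neg)))


l_pos, l_neg, tau = 0.8, [0.1, -0.3, 0.5], 0.2   # toy scores (assumed values)
logits = [v / tau for v in [l_pos] + l_neg]      # MoCo layout: positive in column 0
ce = cross_entropy(logits, 0)                    # label 0 selects the positive column
print(ce, info_nce_from_scores(l_pos, l_neg, tau))
```

The two printed numbers agree up to floating-point rounding. Passing a label other than 0 would instead treat one of the negatives as the positive and give a different loss.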