SimCSE: Simple Contrastive Learning of Sentence Embeddings (Reading Notes)

“1 Introduction”

“We present SimCSE, a simple contrastive sentence embedding framework, which can produce superior sentence embeddings from either unlabeled or labeled data.”

“Our unsupervised SimCSE simply predicts the input sentence itself with only dropout used as noise.”

“Our supervised SimCSE builds upon the recent success of using natural language inference (NLI) datasets for sentence embeddings and incorporates annotated sentence pairs in contrastive learning (Figure 1(b)).”

“2 Background: Contrastive Learning”

“Contrastive learning aims to learn effective representation by pulling semantically close neighbors together and pushing apart non-neighbors.”

“we encode input sentences using a pre-trained language model such as BERT (Devlin et al., 2019) or RoBERTa,”

“and then fine-tune all the parameters using the contrastive learning objective (Eq. 1).” (Eq. 1 is written out after the notation bullets below.)

  • “D = {(x_i, x_i^+)}_{i=1}^m, where x_i and x_i^+ are semantically related.”

  • “let h_i and h_i^+ denote the representations of x_i and x_i^+,”
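For reference, the objective referred to as Eq. 1 is the standard in-batch InfoNCE-style loss. Written out (my reconstruction using the notation above, with cosine similarity sim(·,·), temperature τ, and mini-batch size N):

```latex
% For a mini-batch of N positive pairs, with temperature \tau:
\ell_i = -\log
  \frac{e^{\mathrm{sim}(h_i,\, h_i^{+})/\tau}}
       {\sum_{j=1}^{N} e^{\mathrm{sim}(h_i,\, h_j^{+})/\tau}},
\qquad
\mathrm{sim}(h_1, h_2) = \frac{h_1^{\top} h_2}{\lVert h_1 \rVert \, \lVert h_2 \rVert}
```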

“Positive instances.”

“One critical question in contrastive learning is how to construct (x_i, x_i^+) pairs.”

“applying augmentation techniques such as word deletion, reordering, and substitution.”
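As a concrete, purely illustrative example of such a discrete augmentation, a word-deletion transform could look like the sketch below. This is not the recipe of any cited method, and `delete_prob` is a made-up parameter name:

```python
import random

def word_deletion(sentence: str, delete_prob: float = 0.1) -> str:
    """Randomly drop words to produce an augmented view of a sentence."""
    words = sentence.split()
    kept = [w for w in words if random.random() > delete_prob]
    # Keep at least one word so the augmented sentence is never empty.
    return " ".join(kept) if kept else random.choice(words)

# The original sentence and its augmented copy would form a positive pair (x_i, x_i^+).
print(word_deletion("two dogs are running in a field"))
```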

“In NLP, a similar contrastive learning objective has been explored in different contexts.”

“In these cases, (x_i, x_i^+) are collected from supervised datasets such as question-passage pairs.”

“Alignment and uniformity.”

“two key properties related to contrastive learning, alignment and uniformity, and propose to use them to measure the quality of representations.”

  • “p_data denotes the data distribution.”

“These two metrics are well aligned with the objective of contrastive learning: positive instances should stay close and embeddings for random instances should scatter on the hypersphere.”
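Written out (following Wang and Isola, 2020, which the paper adopts), alignment is the expected distance between embeddings of positive pairs drawn from p_pos, and uniformity measures how evenly embeddings spread over the unit hypersphere; f denotes the sentence encoder:

```latex
\ell_{\mathrm{align}} \triangleq
  \mathop{\mathbb{E}}_{(x,\,x^{+}) \sim p_{\mathrm{pos}}}
  \bigl\lVert f(x) - f(x^{+}) \bigr\rVert^{2}

\ell_{\mathrm{uniform}} \triangleq
  \log \mathop{\mathbb{E}}_{x,\,y \,\overset{\mathrm{i.i.d.}}{\sim}\, p_{\mathrm{data}}}
  e^{-2 \lVert f(x) - f(y) \rVert^{2}}
```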

“3 Unsupervised SimCSE”

“we take a collection of sentences {x_i}_{i=1}^m and use x_i^+ = x_i. The key ingredient to get this to work with identical positive pairs is through the use of independently sampled dropout masks for x_i and x_i^+.”

“Dropout noise as data augmentation.”

“the positive pair takes exactly the same sentence, and their embeddings only differ in dropout masks.”
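In code, the unsupervised recipe boils down to encoding the same batch twice with dropout active, so the two forward passes use different dropout masks, and then treating the two embeddings of each sentence as a positive pair with in-batch negatives. A minimal PyTorch-style sketch, assuming a Hugging Face-style encoder that exposes `last_hidden_state` and using [CLS] pooling (the function name and temperature value are illustrative, not the official implementation):

```python
import torch
import torch.nn.functional as F

def unsup_simcse_loss(encoder, input_ids, attention_mask, temperature=0.05):
    """Unsupervised SimCSE sketch: the same sentences are encoded twice in training
    mode, so dropout yields two different 'views' that form positive pairs."""
    encoder.train()  # dropout must be active
    # Two independent forward passes -> two independently sampled dropout masks.
    h1 = encoder(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]  # [CLS]
    h2 = encoder(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]

    # Cosine similarity between every sentence in view 1 and every sentence in view 2.
    sim = F.cosine_similarity(h1.unsqueeze(1), h2.unsqueeze(0), dim=-1) / temperature

    # Diagonal entries are the positive pairs; other columns act as in-batch negatives.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```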

“4 Supervised SimCSE”

“we instead directly take (x_i, x_i^+) pairs from supervised datasets and use them to optimize Eq. 1.”

“Choices of labeled data.”

  • “QQP” (Quora question pairs)

  • “Flickr30k”

  • “ParaNMT”

  • “NLI datasets”

“Contradiction as hard negatives.”

“Finally, we further take the advantage of the NLI datasets by using its contradiction pairs as hard negatives.”

“Formally, we extend (x_i, x_i^+) to (x_i, x_i^+, x_i^-), where x_i is the premise, x_i^+ and x_i^- are entailment and contradiction hypotheses.”
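With hard negatives, the training objective extends Eq. 1 so that each premise is contrasted against both the entailment and the contradiction hypotheses of every example in the batch. My reconstruction, with h_i^- denoting the embedding of x_i^-:

```latex
\ell_i = -\log
  \frac{e^{\mathrm{sim}(h_i,\, h_i^{+})/\tau}}
       {\sum_{j=1}^{N}\left(
          e^{\mathrm{sim}(h_i,\, h_j^{+})/\tau}
        + e^{\mathrm{sim}(h_i,\, h_j^{-})/\tau}\right)}
```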

“5 Connection to Anisotropy”

“Recent work identifies an anisotropy problem in language representations, i.e., the learned embeddings occupy a narrow cone in the vector space, which severely limits their expressiveness.”

“In this work, we show that, both theoretically and empirically, the contrastive objective can also alleviate the anisotropy problem.”
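A simple empirical proxy for anisotropy (my own illustrative check, not a measurement from the paper) is the average cosine similarity between embeddings of random, unrelated sentences; in a narrow-cone space this average sits well above zero:

```python
import torch
import torch.nn.functional as F

def avg_pairwise_cosine(embeddings: torch.Tensor) -> float:
    """Average cosine similarity between all distinct pairs of sentence embeddings.
    Values close to 1 indicate a highly anisotropic (narrow-cone) embedding space."""
    normed = F.normalize(embeddings, dim=-1)
    sim = normed @ normed.T                      # (N, N) cosine similarity matrix
    n = sim.size(0)
    off_diag = sim.sum() - sim.diagonal().sum()  # exclude self-similarity
    return (off_diag / (n * (n - 1))).item()
```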

“6 Experiment”

“We conduct our experiments on 7 semantic textual similarity (STS) tasks.”

“Semantic textual similarity tasks.”

“Training details.”

“We start from pre-trained checkpoints of BERT (Devlin et al., 2019) (uncased) or RoBERTa (Liu et al., 2019) (cased) and take the [CLS] representation as the sentence embedding.”
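For context, pulling a [CLS]-based sentence embedding out of a pre-trained checkpoint with the Hugging Face transformers library looks roughly like the sketch below; bert-base-uncased is just an example checkpoint, and any training-time projection head used by SimCSE is omitted:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["A man is playing a guitar.", "Someone plays an instrument."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Take the hidden state of the [CLS] token (position 0) as the sentence embedding.
cls_embeddings = outputs.last_hidden_state[:, 0]
print(cls_embeddings.shape)  # torch.Size([2, 768])
```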

“6.2 Main Results”

“6.3 Ablation Studies”

“Pooling methods.”

“7 Analysis”

“Uniformity and alignment.”

“models which have both better alignment and uniformity achieve better performance” (the two metrics can be computed as in the sketch after this list)

  • “though pre-trained embeddings have good alignment, their uniformity is poor”

  • “post-processing methods like BERT-flow and BERT-whitening greatly improve uniformity but also suffer a degeneration in alignment”

  • “unsupervised SimCSE effectively improves uniformity of pre-trained embeddings whereas keeping a good alignment;”

  • “incorporating supervised data in SimCSE further amends alignment”
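A minimal sketch for computing the two metrics, following the definitions given in the Background section and assuming the inputs are L2-normalized positive-pair embeddings of shape (N, d):

```python
import torch

def alignment(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """l_align: mean squared L2 distance between positive-pair embeddings x[i], y[i]."""
    return (x - y).norm(p=2, dim=1).pow(2).mean()

def uniformity(x: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """l_uniform: log of the average Gaussian potential over all distinct pairs in x."""
    sq_dists = torch.pdist(x, p=2).pow(2)
    return sq_dists.mul(-t).exp().mean().log()
```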

“Qualitative comparison.”

“As several examples shown in Table 8, the retrieved sentences by SimCSE have a higher quality compared to those retrieved by SBERT.”

“9 Conclusion”

“It provides a new perspective on data augmentation with text input, and can be extended to other continuous representations and integrated in language model pre-training.”
