Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Original paper: https://arxiv.org/abs/1908.10084

Abstract

STS: Semantic Textual Similarity

The BERT architecture is ill-suited for semantic similarity search and for unsupervised tasks such as clustering.

SBERT: Sentence-BERT

This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT.

Finding which of the over 40 million existing questions on Quora is the most similar to a new question could be modeled as a pair-wise comparison with BERT; however, answering a single query would require over 50 hours.

1 Introduction

By using optimized index structures, finding the most similar Quora question can be reduced from 50 hours to a few milliseconds (Johnson et al., 2017).

With pair-wise BERT, a new Quora question takes about 50 hours to answer; with SBERT embeddings and an optimized index, this drops to a few milliseconds.
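
A minimal sketch of this embed-once, search-fast pattern, assuming the sentence-transformers library; the checkpoint name all-MiniLM-L6-v2 and the toy corpus are illustrative choices, not details from the paper:

```python
# Sketch: embed the corpus once offline, then answer queries with a cheap
# cosine-similarity scan instead of millions of pair-wise BERT passes.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint

# Each question becomes one fixed-size vector, computed once.
corpus = ["How do I learn Python?", "What is the capital of France?"]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# At query time: one forward pass plus a vector similarity search.
query_embedding = model.encode("Best way to start with Python?",
                               convert_to_tensor=True)
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
best = scores.argmax().item()
print(corpus[best], scores[best].item())
```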

3 Model

SBERT adds a pooling operation to the output of BERT / RoBERTa to derive a fixed-size sentence embedding.

A pooling layer is added so that the sentence embedding has a fixed size.

We experiment with three pooling strategies: Using the output of the CLS-token, computing the mean of all output vectors (MEAN strategy), and computing a max-over-time of the output vectors (MAX-strategy). The default configuration is MEAN.

The three pooling strategies.
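
A sketch of the three pooling strategies over BERT's token outputs; the tensor shapes and the function name pool are assumptions for illustration:

```python
import torch

def pool(token_embeddings, attention_mask, strategy="mean"):
    """token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len)."""
    mask = attention_mask.unsqueeze(-1).float()  # (batch, seq_len, 1)
    if strategy == "cls":
        # Output vector of the CLS token (first position).
        return token_embeddings[:, 0]
    if strategy == "mean":
        # MEAN strategy (the default): average all token vectors,
        # ignoring padding positions.
        summed = (token_embeddings * mask).sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1e-9)
        return summed / counts
    if strategy == "max":
        # MAX strategy: max-over-time of the output vectors.
        masked = token_embeddings.masked_fill(mask == 0, float("-inf"))
        return masked.max(dim=1).values
    raise ValueError(f"unknown strategy: {strategy}")
```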

3.1 Training Details

We fine-tune SBERT with a 3-way softmax classifier objective function for one epoch. We used a batch-size of 16, Adam optimizer with learning rate 2e−5, and a linear learning rate warm-up over 10% of the training data. Our default pooling strategy is MEAN.

The pooling strategies and the fine-tuning hyperparameters.
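
A sketch of this optimizer setup; torch's AdamW and transformers' get_linear_schedule_with_warmup stand in for the paper's Adam with linear warm-up, and the helper name is hypothetical:

```python
import torch
from transformers import get_linear_schedule_with_warmup

def make_optimizer_and_scheduler(model, num_training_steps):
    # Learning rate 2e-5, as in the paper; AdamW is a substitute for Adam.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    # Linear warm-up over 10% of the training data (one epoch, batch size 16).
    warmup_steps = int(0.1 * num_training_steps)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=warmup_steps,
        num_training_steps=num_training_steps,
    )
    return optimizer, scheduler
```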

4.1 Unsupervised STS

Comparison of SBERT against other models on STS tasks; SRoBERTa yields only a limited improvement over SBERT.

4.2 Supervised STS

4.3 Argument Facet Similarity

AFS: Argument Facet Similarity

STS data is usually descriptive, while AFS data are argumentative excerpts from dialogs. To be considered similar, arguments must not only make similar claims, but also provide a similar reasoning.

Judging similarity is harder on AFS data than on STS data.

6 Ablation Study

For the classification objective, concatenating (u, v, |u−v|) as input to the softmax classifier works best.

When trained with the classification objective function on NLI data, the pooling strategy has a rather minor impact. The impact of the concatenation mode is much larger.

When trained with the regression objective function, we observe that the pooling strategy has a large impact.

Different objectives are sensitive to different design choices.
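
A sketch of the classification head with the (u, v, |u−v|) concatenation; the hidden size 768 and the 3 NLI labels match the paper's setup, while the module name is illustrative:

```python
import torch
import torch.nn as nn

class SoftmaxHead(nn.Module):
    def __init__(self, embedding_dim=768, num_labels=3):
        super().__init__()
        # The concatenated feature (u, v, |u - v|) has 3 * embedding_dim dims.
        self.classifier = nn.Linear(3 * embedding_dim, num_labels)

    def forward(self, u, v):
        # u, v: pooled sentence embeddings of shape (batch, embedding_dim).
        features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
        return self.classifier(features)  # logits for cross-entropy loss

head = SoftmaxHead()
u = torch.randn(16, 768)  # embedding of sentence A (batch of 16)
v = torch.randn(16, 768)  # embedding of sentence B
logits = head(u, v)       # shape (16, 3): entailment / neutral / contradiction
```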

7 Computational Efficiency

For improved computation of sentence embeddings, we implemented a smart batching strategy: Sentences with similar lengths are grouped together and are only padded to the longest element in a mini-batch. This drastically reduces computational overhead from padding tokens.

A strategy to reduce computational cost: group sentences of similar length together and pad only to the longest sentence within each mini-batch.
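
A sketch of smart batching, assuming a HuggingFace-style tokenizer; the function and parameter names are hypothetical:

```python
def smart_batches(sentences, tokenizer, batch_size=32):
    # Sort by token length so each mini-batch holds similarly long sentences.
    order = sorted(range(len(sentences)),
                   key=lambda i: len(tokenizer.tokenize(sentences[i])))
    for start in range(0, len(order), batch_size):
        batch_sents = [sentences[i] for i in order[start:start + batch_size]]
        # padding="longest" pads only up to the longest sentence in this
        # mini-batch, so almost no compute is wasted on padding tokens.
        yield tokenizer(batch_sents, padding="longest",
                        truncation=True, return_tensors="pt")
```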
