Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Original paper: https://arxiv.org/abs/1908.10084

Abstract

STS: Semantic Textual Similarity

The BERT architecture is ill-suited for semantic similarity search and for unsupervised tasks such as clustering.

SBERT: Sentence-BERT

This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT.

Finding which of the over 40 million existing questions on Quora is the most similar to a new question could be modeled as a pair-wise comparison with BERT; however, answering a single query would require over 50 hours.

1 Introduction

By using optimized index structures, finding the most similar Quora question can be reduced from 50 hours to a few milliseconds (Johnson et al., 2017).

With pair-wise BERT, a new Quora question takes about 50 hours to answer; with SBERT embeddings and an optimized index, this drops to a few milliseconds.
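
A minimal sketch of this embed-once, search-fast pattern, assuming the sentence-transformers library; the checkpoint name all-MiniLM-L6-v2 and the toy corpus are illustrative choices, not details from the paper:

```python
# Sketch: embed the corpus once offline, then answer queries with a cheap
# cosine-similarity scan instead of millions of pair-wise BERT passes.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint

# Each question becomes one fixed-size vector, computed once.
corpus = ["How do I learn Python?", "What is the capital of France?"]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

# At query time: one forward pass plus a vector similarity search.
query_embedding = model.encode("Best way to start with Python?",
                               convert_to_tensor=True)
scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
best = scores.argmax().item()
print(corpus[best], scores[best].item())
```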

3 Model

SBERT adds a pooling operation to the output of BERT / RoBERTa to derive a fixed-size sentence embedding.

A pooling layer is added so that the sentence embedding has a fixed size.

We experiment with three pooling strategies: Using the output of the CLS-token, computing the mean of all output vectors (MEAN strategy), and computing a max-over-time of the output vectors (MAX-strategy). The default configuration is MEAN.

The three pooling strategies.
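
A sketch of the three pooling strategies over BERT's token outputs; the tensor shapes and the function name pool are assumptions for illustration:

```python
import torch

def pool(token_embeddings, attention_mask, strategy="mean"):
    """token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len)."""
    mask = attention_mask.unsqueeze(-1).float()  # (batch, seq_len, 1)
    if strategy == "cls":
        # Output vector of the CLS token (first position).
        return token_embeddings[:, 0]
    if strategy == "mean":
        # MEAN strategy (the default): average all token vectors,
        # ignoring padding positions.
        summed = (token_embeddings * mask).sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1e-9)
        return summed / counts
    if strategy == "max":
        # MAX strategy: max-over-time of the output vectors.
        masked = token_embeddings.masked_fill(mask == 0, float("-inf"))
        return masked.max(dim=1).values
    raise ValueError(f"unknown strategy: {strategy}")
```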

3.1 Training Details

We fine-tune SBERT with a 3-way softmax classifier objective function for one epoch. We used a batch-size of 16, Adam optimizer with learning rate 2e−5, and a linear learning rate warm-up over 10% of the training data. Our default pooling strategy is MEAN.

The pooling strategies and the fine-tuning hyperparameters.
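
A sketch of this optimizer setup; torch's AdamW and transformers' get_linear_schedule_with_warmup stand in for the paper's Adam with linear warm-up, and the helper name is hypothetical:

```python
import torch
from transformers import get_linear_schedule_with_warmup

def make_optimizer_and_scheduler(model, num_training_steps):
    # Learning rate 2e-5, as in the paper; AdamW is a substitute for Adam.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    # Linear warm-up over 10% of the training data (one epoch, batch size 16).
    warmup_steps = int(0.1 * num_training_steps)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=warmup_steps,
        num_training_steps=num_training_steps,
    )
    return optimizer, scheduler
```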

4.1 Unsupervised STS

Comparison of SBERT against other models on STS tasks; SRoBERTa yields only a limited improvement over SBERT.

4.2 Supervised STS

4.3 Argument Facet Similarity

AFS: Argument Facet Similarity

STS data is usually descriptive, while AFS data are argumentative excerpts from dialogs. To be considered similar, arguments must not only make similar claims, but also provide a similar reasoning.

Judging similarity is harder on AFS data than on STS data.

6 Ablation Study

For the classification objective, concatenating (u, v, |u−v|) as input to the softmax classifier works best.

When trained with the classification objective function on NLI data, the pooling strategy has a rather minor impact. The impact of the concatenation mode is much larger.

When trained with the regression objective function, we observe that the pooling strategy has a large impact.

Different objectives are sensitive to different design choices.
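
A sketch of the classification head with the (u, v, |u−v|) concatenation; the hidden size 768 and the 3 NLI labels match the paper's setup, while the module name is illustrative:

```python
import torch
import torch.nn as nn

class SoftmaxHead(nn.Module):
    def __init__(self, embedding_dim=768, num_labels=3):
        super().__init__()
        # The concatenated feature (u, v, |u - v|) has 3 * embedding_dim dims.
        self.classifier = nn.Linear(3 * embedding_dim, num_labels)

    def forward(self, u, v):
        # u, v: pooled sentence embeddings of shape (batch, embedding_dim).
        features = torch.cat([u, v, torch.abs(u - v)], dim=-1)
        return self.classifier(features)  # logits for cross-entropy loss

head = SoftmaxHead()
u = torch.randn(16, 768)  # embedding of sentence A (batch of 16)
v = torch.randn(16, 768)  # embedding of sentence B
logits = head(u, v)       # shape (16, 3): entailment / neutral / contradiction
```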

7 Computational Efficiency

For improved computation of sentence embeddings, we implemented a smart batching strategy: Sentences with similar lengths are grouped together and are only padded to the longest element in a mini-batch. This drastically reduces computational overhead from padding tokens.

A strategy to reduce computational cost: group sentences of similar length together and pad only to the longest sentence within each mini-batch.
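
A sketch of smart batching, assuming a HuggingFace-style tokenizer; the function and parameter names are hypothetical:

```python
def smart_batches(sentences, tokenizer, batch_size=32):
    # Sort by token length so each mini-batch holds similarly long sentences.
    order = sorted(range(len(sentences)),
                   key=lambda i: len(tokenizer.tokenize(sentences[i])))
    for start in range(0, len(order), batch_size):
        batch_sents = [sentences[i] for i in order[start:start + batch_size]]
        # padding="longest" pads only up to the longest sentence in this
        # mini-batch, so almost no compute is wasted on padding tokens.
        yield tokenizer(batch_sents, padding="longest",
                        truncation=True, return_tensors="pt")
```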
