BART for Paraphrasing with Simple Transformers

This article explains how to use the BART model for paraphrasing tasks, focusing on the Simple Transformers library.

Introduction

BART is a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.

- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension -

Don’t worry if that sounds a little complicated; we are going to break it down and see what it all means. To add a little bit of background before we dive into BART, it’s time for the now-customary ode to Transfer Learning with self-supervised models. It’s been said many times over the past couple of years, but Transformers really have achieved incredible success in a wide variety of Natural Language Processing (NLP) tasks.

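To make the "corrupt, then reconstruct" idea concrete, here is a minimal sketch (my own illustration, not part of the original article) that masks a span of text and asks a pretrained BART checkpoint to fill it back in. It uses the Hugging Face transformers library directly rather than Simple Transformers:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Load a pretrained BART checkpoint and its tokenizer.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# (1) Corrupt the text: replace a span with BART's <mask> token.
corrupted = "BART is trained by corrupting text and learning to <mask> the original text."
inputs = tokenizer(corrupted, return_tensors="pt")

# (2) Denoise: the model generates its best guess at the original sequence.
output_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=40)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```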

BART uses a standard Transformer architecture (Encoder-Decoder) like the original Transformer model used for neural machine translation but also incorporates some changes from BERT (only uses the encoder) and GPT (only uses the decoder). You can refer to the 2.1 Architecture section of the BART paper for more details.

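If you want to see that structure for yourself, one quick way (again using Hugging Face transformers as an illustration; none of this comes from the paper) is to load a BART checkpoint and inspect it:

```python
from transformers import BartModel

# facebook/bart-base is a standard Transformer encoder-decoder checkpoint.
model = BartModel.from_pretrained("facebook/bart-base")

# Both halves are present, unlike BERT (encoder only) or GPT (decoder only).
print(type(model.encoder).__name__)  # BartEncoder
print(type(model.decoder).__name__)  # BartDecoder
print(model.config.encoder_layers, model.config.decoder_layers)
```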

Pre-Training BART

BART is pre-trained by minimizing the cross-entropy loss between the decoder output and the original sequence.

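Concretely, the decoder is asked to regenerate the uncorrupted sequence token by token, and the training signal is the usual sequence-to-sequence cross-entropy. A rough sketch of that loss computation with Hugging Face transformers follows; the corrupted/original sentence pair is made up purely for illustration:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# A corrupted input and the original sequence it was derived from.
corrupted = "My friends are <mask>, but they eat too many carbs."
original = "My friends are good, but they eat too many carbs."

input_ids = tokenizer(corrupted, return_tensors="pt").input_ids
labels = tokenizer(original, return_tensors="pt").input_ids

# Passing labels makes the model return the cross-entropy loss between the
# decoder output and the original sequence, the quantity minimized in pre-training.
loss = model(input_ids=input_ids, labels=labels).loss
print(loss.item())
```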

Masked Language Modeling (MLM)

MLM models such as BERT are pre-trained to predict masked tokens. This process can be broken down as follows:

  1. Replace a random subset of the input with a mask token [MASK]. (Adding noise/corruption)

  2. The model predicts the original tokens for each of the [MASK] tokens. (Denoising)

Importantly, BERT models can “see” the full input sequence (with some tokens replaced with [MASK]) when attempting to predict the original tokens. This makes BERT a bidirectional model, i.e. it can “see” the tokens before and after the masked tokens.

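Both steps, and the bidirectional context, are easy to see in practice. The following sketch (my own example, using the Hugging Face fill-mask pipeline with a standard BERT checkpoint) masks a single token and lets the model predict it from the words on either side:

```python
from transformers import pipeline

# BERT-style masked language modeling: one token is replaced with [MASK]
# and the model predicts it using the full (bidirectional) context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The quick brown fox [MASK] over the lazy dog."):
    print(prediction["token_str"], round(prediction["score"], 3))
```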

Figure 1(a) from the BART paper.

This is suited for tasks like classification where you can use information from the full sequence to perform the prediction. However, it is less suited for text generation tasks where the prediction depends only on the previous words.
