Implementing BERT in PyTorch: Fine-Tuning BERT and RoBERTa for High-Accuracy Text Classification

This article describes how to fine-tune BERT and RoBERTa in PyTorch to achieve high-accuracy text classification. It is translated from a Data Science article that walks through the implementation in detail.

As of the time of writing, state-of-the-art results on NLP and NLU tasks are obtained with Transformer models, and there is a clear trend of performance improving as models become deeper and larger; GPT-3 comes to mind. Training even small versions of such models from scratch takes a significant amount of time, even with a GPU. This problem can be solved via pre-training, in which a model is trained on a large text corpus using a high-performance cluster. Later it can be fine-tuned for a specific task in a much shorter amount of time. During the fine-tuning stage, additional layers can be added to the model for specific tasks, which can be different from those for which the model was initially trained. This technique is related to transfer learning, a concept applied to areas of machine learning beyond NLP (see here and here for a quick intro).

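To make the transfer-learning idea above concrete, here is a minimal sketch of reusing a pre-trained encoder and adding a fresh task-specific layer on top, using the Hugging Face transformers library. The checkpoint name and the number of labels are illustrative assumptions, not values prescribed by this article.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

# A pre-trained checkpoint; any BERT/RoBERTa checkpoint can be used the same way.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
encoder = AutoModel.from_pretrained(checkpoint)  # pre-trained encoder, no task head

# Task-specific layer added for fine-tuning: a randomly initialised
# classification head on top of the pre-trained encoder.
num_labels = 2  # assumed binary classification, purely illustrative
classifier = nn.Linear(encoder.config.hidden_size, num_labels)

# Forward pass only, shown without gradients for brevity; during fine-tuning
# both the encoder and the new head would be trained on labelled data.
batch = tokenizer(["an example document to classify"], return_tensors="pt")
with torch.no_grad():
    hidden_states = encoder(**batch).last_hidden_state  # (batch, seq_len, hidden)
logits = classifier(hidden_states[:, 0, :])              # (batch, num_labels)
```

Updating the pre-trained weights together with the newly added head on task-specific labelled data is what makes fine-tuning so much cheaper than pre-training from scratch.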

In this post, I would like to share my experience of fine-tuning BERT and RoBERTa, available from the transformers library by Hugging Face, for a document classification task. Both models share a Transformer architecture, which consists of at least two distinct blocks: an encoder and a decoder. Both the encoder and the decoder consist of multiple layers built around the attention mechanism. The encoder processes the input token sequence into a vector of floating-point numbers, a hidden state, which is picked up by the decoder. It is the hidden state that serves as the input to the task-specific layers added during fine-tuning.
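
As a sketch of how this looks with the transformers library, both model families can be loaded through the same Auto classes, which attach a randomly initialised classification head on top of the pre-trained encoder. The checkpoints and num_labels below are assumptions for illustration, not the exact configuration used later in the article.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# BERT and RoBERTa checkpoints are loaded the same way; only the name differs.
for checkpoint in ("bert-base-uncased", "roberta-base"):
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    # The encoder maps the token sequence to per-token hidden states, and the
    # classification head reduces them to one logit vector per document.
    inputs = tokenizer("an example document", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    print(checkpoint, logits.shape)  # -> torch.Size([1, 2]) for both models
```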
