Implementing BERT in PyTorch: Fine-Tuning BERT and RoBERTa for High-Accuracy Text Classification

This article describes how to fine-tune BERT and RoBERTa in PyTorch to achieve high-accuracy text classification. It is translated from a Data Science article that walks through the implementation in detail.

As of the time of writing, state-of-the-art results on NLP and NLU tasks are obtained with Transformer models, and there is a clear trend of performance improving as models become deeper and larger; GPT-3 comes to mind. Training even small versions of such models from scratch takes a significant amount of time, even with a GPU. This problem can be solved via pre-training, in which a model is trained on a large text corpus using a high-performance cluster. Later it can be fine-tuned for a specific task in a much shorter amount of time. During the fine-tuning stage, additional layers can be added to the model for specific tasks, which can be different from those for which the model was initially trained. This technique is related to transfer learning, a concept applied to areas of machine learning beyond NLP (see here and here for a quick intro).

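To make the transfer-learning idea above concrete, here is a minimal sketch of reusing a pre-trained encoder and adding a fresh task-specific layer on top, using the Hugging Face transformers library. The checkpoint name and the number of labels are illustrative assumptions, not values prescribed by this article.

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

# A pre-trained checkpoint; any BERT/RoBERTa checkpoint can be used the same way.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
encoder = AutoModel.from_pretrained(checkpoint)  # pre-trained encoder, no task head

# Task-specific layer added for fine-tuning: a randomly initialised
# classification head on top of the pre-trained encoder.
num_labels = 2  # assumed binary classification, purely illustrative
classifier = nn.Linear(encoder.config.hidden_size, num_labels)

# Forward pass only, shown without gradients for brevity; during fine-tuning
# both the encoder and the new head would be trained on labelled data.
batch = tokenizer(["an example document to classify"], return_tensors="pt")
with torch.no_grad():
    hidden_states = encoder(**batch).last_hidden_state  # (batch, seq_len, hidden)
logits = classifier(hidden_states[:, 0, :])              # (batch, num_labels)
```

Updating the pre-trained weights together with the newly added head on task-specific labelled data is what makes fine-tuning so much cheaper than pre-training from scratch.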

In this post, I would like to share my experience of fine-tuning BERT and RoBERTa, available from the transformers library by Hugging Face, for a document classification task. Both models share a Transformer architecture, which consists of at least two distinct blocks: an encoder and a decoder. Both the encoder and the decoder consist of multiple layers built around the attention mechanism. The encoder processes the input token sequence into a vector of floating-point numbers, a hidden state, which is picked up by the decoder. It is the hidden state that serves as the input to the task-specific layers added during fine-tuning.
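
As a sketch of how this looks with the transformers library, both model families can be loaded through the same Auto classes, which attach a randomly initialised classification head on top of the pre-trained encoder. The checkpoints and num_labels below are assumptions for illustration, not the exact configuration used later in the article.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# BERT and RoBERTa checkpoints are loaded the same way; only the name differs.
for checkpoint in ("bert-base-uncased", "roberta-base"):
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    # The encoder maps the token sequence to per-token hidden states, and the
    # classification head reduces them to one logit vector per document.
    inputs = tokenizer("an example document", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    print(checkpoint, logits.shape)  # -> torch.Size([1, 2]) for both models
```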
