Bidirectional Encoder Representations from Transformers (BERT) is a major advancement in the field of Natural Language Processing (NLP) in recent years. BERT achieves strong performance on many NLP tasks, such as text classification, text summarisation and question answering.
In this article, I will walk through how to fine-tune a BERT model on your own dataset to do text classification (sentiment analysis, in my case). When browsing the web for guides, I came across mostly PyTorch implementations, or fine-tuning on pre-existing datasets such as GLUE. Therefore, I would like to provide a guide to the TensorFlow implementation using my own customised dataset.
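To make "customised dataset" concrete, here is a minimal sketch of the kind of data this guide assumes: plain (text, label) pairs shuffled and split into train and test portions. The example sentences, label encoding and the 80/20 split ratio are made up for illustration; your own data loading (e.g. from a CSV) would replace the hard-coded list.

```python
# Hypothetical sentiment dataset: (text, label) pairs, 1 = positive, 0 = negative.
import random

samples = [
    ("The movie was fantastic", 1),
    ("Terrible service, never again", 0),
    ("Absolutely loved it", 1),
    ("Not worth the money", 0),
]

random.seed(42)            # reproducible shuffle
random.shuffle(samples)

split = int(0.8 * len(samples))  # illustrative 80% train / 20% test split
train_set, test_set = samples[:split], samples[split:]

print(len(train_set), len(test_set))  # → 3 1
```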
The Hugging Face library provides convenient pre-trained transformer models, including BERT. We will be using TFBertForSequenceClassification, the TensorFlow implementation for fine-tuning a BERT model. This pretrained model is trained on the Wikipedia and Brown
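A minimal sketch of loading TFBertForSequenceClassification from the Hugging Face `transformers` library and compiling it for fine-tuning. The checkpoint name, learning rate and label count below are illustrative assumptions (binary sentiment), not values prescribed by the library; the imports sit inside the function so the sketch only needs `transformers` and `tensorflow` installed when the model is actually built.

```python
# Illustrative constants: checkpoint and label count are assumptions for
# a binary sentiment task, not requirements of the library.
MODEL_NAME = "bert-base-uncased"
NUM_LABELS = 2  # 0 = negative, 1 = positive

def build_model(learning_rate: float = 2e-5):
    """Load TFBertForSequenceClassification and compile it for fine-tuning."""
    import tensorflow as tf
    from transformers import TFBertForSequenceClassification

    model = TFBertForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=NUM_LABELS
    )
    # The classification head returns raw logits, hence from_logits=True.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model()` downloads the pretrained weights on first use; the returned Keras model can then be trained with the usual `model.fit(...)` on tokenised inputs.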